Senior Site Reliability Engineer

Boston, MA

Apply directly to creposa@syrinx.com

U.S. Citizens and those authorized to work in the U.S. are encouraged to apply. We are unable to sponsor at this time. No Corp to Corp.

Our e-commerce partner is hiring to build the next generation of its products, which deliver engaging, adaptive, and personalized experiences to optimally support every learner.

We are hiring a Sr. Site Reliability Engineer who will work with software engineers to build reliable, high-capacity, and high-performance infrastructure in support of our mission to reimagine learning for millions of students worldwide.

If you know AWS services inside out, have solid networking experience, and like
engineering solutions to site reliability and operations problems, you will thrive in
this position.

Essential Accountabilities:
● Hands-on design, analysis, and troubleshooting of highly distributed, large-scale
production systems
● Ownership of reliability, uptime, capacity, and performance, and analysis thereof
● Ensuring the repeatability, traceability, and transparency of our infrastructure automation
● Identifying highest-impact opportunities to optimize existing systems
● System design consulting for teams seeking to leverage or improve their production
infrastructure
● Anticipating, planning, and building capacity for upcoming product/feature launches

Required Skills:
● Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, AutoScaling, RDS and
replication techniques, VPC, Subnets, Elastic IP, Route53, CloudWatch, CloudFront,
Lambda, CloudFormation, ECS, SNS, ElastiCache);
● Expertise in container/container-fleet-orchestration technologies (like Docker,
Kubernetes, AWS ECS);
● Expertise in designing and managing escalation response plans, from monitoring through
reaction, response, remediation, and retrospective, in culturally aligned (proactive,
customer-focused, collaborative, data-driven, and AUTOMATED) ways;
● Mastery of infrastructure build and configuration automation technologies (like
Terraform, Ansible, Puppet, CodeDeploy, Chef);
● Strong skills in reading, understanding, and writing code in at least two of: JavaScript,
Python, PHP, Go, or Ruby;
● Strong network engineering skills;
● Cloud- and container-native Linux administration/build/management skills (AWS AMIs,
Packer, etc.);
● Significant experience troubleshooting concurrent and distributed system interactions;
● Expertise with continuous-deployment software development lifecycles in the cloud (e.g.
CI/CD);
● Cloud database operations and deployment experience (RDS MySQL/Postgres/Aurora), and
caching operations and deployments (Memcached, Redis);
● Expertise with Lean/Agile deployment processes (ZDT: Blue/Green, Canary, DNS
strategies);
● Familiarity with site and infrastructure monitoring systems (CloudWatch, Datadog, New
Relic, Sumo Logic, ThousandEyes);
● Strong problem-solving, root cause analysis, and systems engineering skills;
● Good presentation and communication skills;
● Expertise with SDLC branching, SCM, and code deployment systems (Git/Gitflow,
Jenkins, CircleCI, etc.).
