We are hiring a Sr. Site Reliability Engineer who will work with the software engineers to build reliable, high-capacity, and high-performance infrastructure in support of our mission to reimagine learning for millions of students worldwide.
If you know AWS services inside out, have solid networking experience, and like engineering solutions to site reliability and operations problems, you will thrive in this position. The position will be located at our Boston, MA facility.
Essential Accountabilities:
- Hands-on design, analysis, and troubleshooting of highly distributed, large-scale production systems
- Ownership of reliability, uptime, capacity, and performance, including their analysis
- Ensuring the repeatability, traceability, and transparency of our infrastructure automation
- Identifying highest-impact opportunities to optimize existing systems
- System design consulting for teams seeking to leverage or improve their production infrastructure
- Anticipating, planning, and building capacity for upcoming product/feature launches
Required Skills:
- Mastery of AWS services (IAM, EC2, S3, EBS/EFS, ELB/ALB, Auto Scaling, RDS and replication techniques, VPC, Subnets, Elastic IP, Route 53, CloudWatch, CloudFront, Lambda, CloudFormation, ECS, SNS, ElastiCache);
- Expertise in container/container-fleet-orchestration technologies (like Docker, Kubernetes, AWS ECS);
- Expertise in designing and managing escalation response plans that span monitoring, reaction, response, remediation, and retrospection, in culturally aligned (proactive, customer-focused, collaborative, data-driven, and AUTOMATED) ways;
- Mastery of infrastructure build and configuration automation technologies (like Terraform, Ansible, Puppet, CodeDeploy, Chef);
- Strong skills in reading, understanding, and writing code in at least two of: JavaScript, Python, PHP, Go, or Ruby;
- Strong network engineering skills;
- Cloud- and container-native Linux administration/build/management skills (AWS AMIs, Packer, etc.);
- Significant experience troubleshooting concurrent and distributed system interactions;
- Expertise with continuous-deployment software development lifecycles in the Cloud (CI/CD);