Jobs at Syrinx

Senior Site Reliability Engineer (SRE)

Remote, US Only

This role is remote, with a Syrinx educational technology partner.

This role can be contract or contract-to-hire.

U.S. citizens and those authorized to work in the U.S. are encouraged to apply. We are unable to sponsor visas at this time.

We are looking for an adventurous Senior DevOps Engineer who loves AWS technologies. You will be a member of an engineering team where collaboration and innovation are key focuses.

What you will be doing

·        Partner with engineering, security, and product teams to keep our services reliable, available, fast, and cost-efficient

·        Build tools and automation that eliminate repetitive tasks, minimize downtime, achieve human-free operations, and provide self-service solutions to product development teams

·        Design, build, and operate large-scale production systems hosted in our on-prem and AWS environments

·        Lead technology initiatives that drive scalability and reliability improvements

·        Advocate and implement reliable design patterns (e.g. circuit breakers, graceful degradation)

·        Share an on-call rotation with your team and respond to incidents; lead triage efforts and provide needed status updates

 

Skills and Qualifications:

·        7+ years of industry experience

·        3+ years deploying, operating, and debugging server software on Linux. Comfortable diagnosing and resolving common system issues.

·        Deep experience implementing infrastructure as code with Terraform

·        You have designed, built, and operated highly available AWS ECS, EKS, or independent K8s clusters

·        Strong knowledge of common AWS technologies like ELB, CloudFront, EC2, RDS, ElastiCache, S3, Elasticsearch, IAM, and Route 53

·        You have participated in a 24x7 on-call rotation with your team and responded to incidents

·        Proficient with APM, infrastructure, and log aggregation tooling to monitor system health and customer experience (e.g. New Relic, OpenTelemetry, CloudWatch, Sumo Logic, ELK)

·        A proven track record of diagnosing and fixing time-sensitive and critical production issues

·        Experience developing and maintaining CI/CD pipelines (e.g. Jenkins, CircleCI, Git, Gitflow, SonarQube, blue/green deployments)

 

Big Pluses

·        Ansible, CloudFormation, Packer

·        Database administration skills (AWS Aurora, MySQL, Postgres, Oracle)

·        Have leveraged deployment strategies such as blue-green and canary

·        Experience building RESTful services and/or web applications

·        Experience automating software deployments and following a continuous delivery and deployment model

·        Experience with system analysis and troubleshooting in large-scale Linux environments

 

People who have been successful in this role:

·        Passionate and adept at software development and/or system engineering

·        Love to understand how new technologies and architectures work, educate coworkers, and channel their knowledge into improving system reliability and performance

·        Continuously learning about application scalability, availability, reliability, and security

·        Intensely curious about how complex distributed systems operate and fail at scale

·        Think freely and independently, and are ready to share their views

·        Eager to learn from mistakes and socialize the lessons learned

·        Like to take ownership of infrastructure components and lead projects

 
