Cloud Site Reliability Engineer

Stefanini

📍 Dallas, Texas, United States

Full-time, Computer Occupations

Job Description

Responsibilities:

What Will Be Expected of You:
  • Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices across the system
  • Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles
  • Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of the solutions
  • Define SRE standards, best practices, and guidelines for adoption across teams; establish SRE metrics such as SLIs and SLOs
  • Participate in incident management and on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis
  • Stay current with emerging AWS services, SRE methodologies, and ...