Job Description
Kaseya is hiring a Site Reliability Engineer to keep our production systems healthy as we scale. You'll own the reliability of services that thousands of MSPs depend on every day. That means defining the SLOs we hold ourselves to, leading incidents when they happen, and building the automation that keeps things stable as we ship. The work is hands-on, the on-call rotation is real, and the environment runs heavily on AWS. If you treat reliability as a product instead of a chore, you'll fit in well here.
What You'll Do
- Set, monitor, and enforce SLOs, SLIs, and error budgets that keep our systems reliable
- Lead incident response, troubleshooting, and blameless postmortems that produce real fixes
- Build and maintain automated deployment, configuration management, and infrastructure provisioning using Infrastructure as Code
- Manage cloud and hybrid infrastructure with Terraform or CloudFormation, balancing cost...