Job Description
Opportunity
We are looking for SREs who want to define what reliability means for the next generation of industrial software. The work involves defining SLIs/SLOs, building observability platforms, and establishing incident management processes.

Responsibilities
Define and implement SLI/SLO frameworks for complex engineering systems across manufacturing and industrial clients
Design and deploy observability platforms using Prometheus, Grafana, and Datadog
Establish incident management processes and lead blameless post-mortems
Implement chaos engineering practices to proactively identify system weaknesses
Drive toil elimination through automation and platform improvements
Build reliability engineering capabilities within the practice and client organisations

Essential Skills
SLI/SLO definition and implementation at enterprise scale
Observability: Prometheus, Grafana, Datadog, New Relic
Incident management and post-mortem facilitation
Chaos engineering: Gremlin, Chaos Monkey, Litmus
Python testing for re...