Job Description
Analyze, troubleshoot, and resolve customer-impacting issues in a timely manner.
Monitor production systems, service health dashboards, alerts, and operational tools to proactively identify customer-impacting incidents.
Lead incident response activities and coordinate with Engineering, Product, Infrastructure, and Support teams to drive timely resolution.
Perform incident triage, impact assessment, and severity classification for production incidents.
Manage major incident bridges and ensure timely stakeholder communication throughout the incident lifecycle.
Debug and analyze service disruptions, platform degradation, and customer-impacting issues, escalating when required.
Collaborate with cross-functional teams to identify root causes and implement corrective and preventive actions.
Drive Root Cause Analysis (RCA) reviews and ensure action items are t...