Job Description
JOB OVERVIEW

JOB TITLE: Service Operations Architect
LOCATION: Offshore
ENGAGEMENT TYPE: Multiple concurrent enterprise engagements across healthcare, public-sector and AI/cloud platforms

GENERAL JOB DESCRIPTION
Operational Excellence and SRE lead for hybrid environments. Owns service reliability across traditional 3-tier stacks (IIS / Tomcat / MS SQL), cloud-native Kubernetes microservices, and GPU-accelerated AI inference workloads. Drives ITIL v4 + SRE practices, SLA adherence, DR readiness and observability.

DUTIES & RESPONSIBILITIES
Define and govern incident, problem, change and release management aligned to client SLAs.
Own DR plan execution and failover/failback runbooks using VMware Site Recovery Manager, Velero and multi-region cloud strategies.
Design observability across application, database, integration, microservices and GPU tiers (Grafana, Prometheus, ELK, Zabbix, Azure Monitor).
Lead post-incident reviews and continuous improvement.
Ensure compliance with security baselin...