Job Description
Role: Production Engineering
- Monitor and analyze application and system logs to identify issues and anomalies.
- Troubleshoot and resolve incidents related to system performance, application errors, and infrastructure issues.
- Work closely with cross-functional teams to diagnose and resolve complex technical problems.
- Implement proactive measures to prevent incidents and improve system reliability.
- Respond to major incidents in a timely manner and communicate effectively with stakeholders to provide updates and coordinate resolution efforts.
- Document incident details, troubleshooting steps, and resolutions for future reference.
- Continuously monitor system health and performance metrics to identify potential issues and areas for optimization.
- Participate in on-call rotation to provide 24/7 support for incident management.
- Proficiency in Unix/Linux operating systems and command-line utilities.
- Experience with troubleshooting application and system logs using tools like grep, awk, ...