Lead Site Reliability Engineer (SRE) Pune, India

Description

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a talented and motivated Lead Site Reliability Engineer (SRE) to join our organization.

The Lead SRE will play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.


Responsibilities

  • Design, build, and maintain scalable and reliable cloud infrastructure and services on platforms such as AWS, Azure, or Google Cloud
  • Automate manual work using scripting/programming languages like Python, Bash, or PowerShell, particularly within cloud environments
  • Utilize automation tools like Jenkins, GitLab, and Ansible/Chef to streamline deployment, monitoring, and management of systems and applications in the cloud
  • Monitor system performance, proactively troubleshoot issues, and ensure high availability using observability tools like Prometheus, Grafana, or the ELK stack
  • Participate in capacity planning and scalability assessments to support business growth, focusing on cloud resource optimization
  • Implement containerization and orchestration technologies such as Docker and Kubernetes, particularly in cloud-native environments
  • Ensure compliance with security best practices and standards to safeguard data and systems in the cloud
  • Continuously evaluate and recommend new technologies and practices to improve system reliability, performance, and efficiency in the cloud
  • Document processes, procedures, and configurations to maintain system integrity and facilitate knowledge sharing
  • Provide on-call support and participate in incident management and response activities as needed

Requirements

  • 8-13 years of experience in a similar role
  • Prior leadership experience or team management skills
  • Experience with cloud platforms like AWS, Azure, or Google Cloud
  • Proficiency in scripting/programming languages such as Python, Bash, or PowerShell
  • Experience with automation tools like Jenkins, GitLab, and Ansible/Chef
  • Strong communication and collaboration skills
  • Experience with observability tools such as Prometheus, Grafana, the ELK stack, or similar
  • Hands-on experience with Docker, Kubernetes, or similar technologies
  • Knowledge of security practices and standards in cloud environments
  • Experience with SLI, SLO, SLA, and Error Budget concepts
  • Strong problem-solving skills and ability to troubleshoot complex issues under pressure
  • Familiarity with Agile methodologies and DevOps practices
  • Excellent documentation skills

Nice to have

  • Certifications in cloud technologies (AWS, Azure, Google Cloud)
  • Contributions to open-source projects

We offer

  • Opportunity to work on technical challenges that may have an impact across geographies
  • Vast opportunities for self-development: an online university, global knowledge sharing, and learning through external certifications
  • Opportunity to share your ideas on international platforms
  • Sponsored Tech Talks & Hackathons
  • Unlimited access to LinkedIn Learning solutions
  • Possibility to relocate to any EPAM office for short and long-term projects
  • Focused individual development
  • Benefit package:
    • Health benefits
    • Retirement benefits
    • Paid time off
    • Flexible benefits
  • Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)
