Site Reliability Engineering Managers

Site Reliability Engineering (SRE), Application Support, Java, Cloud Technologies
Description

We are looking for Site Reliability Engineering Managers who are determined to solve the organization’s most challenging problems. Join our global workforce and unleash a pool of opportunities.

Location: Hyderabad
Role Type: Full Time
Published On: 26 May 2021
Experience: 10+ Years
Role and Responsibilities
  • Build methodologies to track reliability and performance issues to give teams insights on where to improve customer satisfaction and overall product quality.
  • Drive efficiency by automating manual processes, dive deep into incidents, and facilitate blameless postmortems.
  • Help transform the existing production support teams into SRE teams.
  • Manage and mentor teams of engineers and work with the leadership on strategic initiatives.
  • Provide hands-on technical expertise to design, deploy, secure, and optimize services.
  • Apply a “measure everything” approach through standardized telemetry to drive efficient alerting, decision making, analysis, error budgeting, and other optimization techniques.
  • Drive improvements in technical architecture and standards/processes to deliver the best customer experiences.
  • Contribute to account and practice growth by working with Client Partners to identify new opportunities and resource needs.
  • Work with the Hiring team to attract and retain the right talent and to fulfill resource needs on time.
  • Exhibit inspirational leadership and build a talented, cohesive, results-oriented, and healthy team environment.
  • Build value-proposition presentations, case studies, and accelerators to assist the Sales team during the pre-sales cycle addressing all facets of Managed Services.
Skills and Experience
  • 10+ years of experience in software development and/or technical operations and in running large-scale applications, including 3+ years of management experience.
  • 5+ years of experience in running 24x7 Professional Services or Support teams.
  • Experience in at least one programming language, preferably Java or Python.
  • Proficient in one or more cloud providers: GCP, AWS, or Azure.
  • Prior experience in logging platforms and application performance metrics: Datadog, New Relic, Dynatrace, Splunk, ELK Stack, Azure monitoring, etc.
  • Good understanding of high-volume, mission-critical applications and of container platforms such as Docker or Kubernetes.
  • Thorough knowledge of the IT Infrastructure Library (ITIL) framework and the various IT Service Management (ITSM) tools available in the marketplace.
  • Prior experience in engineering solutions for metrics gathering/publishing and event collection/correlation across distributed architectures, automation, monitoring, intelligent alerting, and self-healing.
  • Must bring top-notch consulting/relationship management skills and a deep appreciation of IT tools, techniques, systems, and solutions.
  • Excellent communication skills along with experience in managing and motivating teams and individuals to deliver better performance.
  • Must have creative problem-solving skills for team, project-deliverable, and cross-functional issues amid changing priorities.
  • Flexible and resourceful in managing constantly changing operational goals and demands.
  • Passionate about operational excellence and governance.
  • Solid experience in handling escalations and taking complete responsibility and ownership of all critical/major issues through to logical closure.

