SRE Leads - Java

Site Reliability Engineering (SRE), Java, Spring, Cassandra, Azure, Kafka, Hazelcast
Description

Being an effective Site Reliability Engineering (SRE) professional is as much about how you think as about your technical skills. The SRE role requires a mix of development and operations skills that combine software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

As part of the SRE team, you will manage the complex challenges of scale that are unique to the client, drawing on your expertise in coding, systems, the complexities of operating systems, and large-scale system design. The SRE team's culture of diversity, intellectual curiosity, problem-solving, and openness is key to its success. We bring together people with a wide variety of backgrounds, experiences, and perspectives.

Who We Are

GSPANN has been in business for over a decade, has over 1800 employees worldwide, and serves some of the largest retail, high-technology, and manufacturing clients in North America. We provide an environment that enables career growth while still letting you interact with company leadership.

Visit Why GSPANN for more information.

Location: Hyderabad
Role Type: Full Time
Published On: 21 February 2022
Experience: 6+ Years

Role and Responsibilities
  • Lead and sustain a high-performance team that supports production operations.
  • Transform the existing production support teams into SRE teams.
  • Build deep knowledge of the business and understand the end-to-end customer journey.
  • Partner with leaders across functions to improve the design, visibility, availability, scalability, and performance of services.
  • Efficiently automate manual processes, deep dive into incidents, and facilitate blameless post-mortems.
  • Improve alert management, decision making, analysis, error budgeting, and various optimization techniques by measuring data using standardized telemetry.
  • Build methodologies to track reliability and performance issues, giving teams insight into areas for improvement to ensure customer satisfaction and increase product quality.
  • Adhere to crucial company controls necessary to meet internal or external audit requirements.
  • Contribute to account and practice growth by working with client partners to identify new opportunities and resource requirements.
  • Work with the HR team to attract and retain the right talent and fulfill resource needs in a timely manner.
  • Exhibit inspirational leadership and build a talented, cohesive, result-oriented, and healthy team environment.
  • Build value-proposition presentations, case studies, and accelerators to assist the Sales team during the pre-sales cycle to address all facets of Managed Services.
  • Provide engineering solutions for data gathering, data publishing, and event collection across distributed architectures, as well as for automation, monitoring, intelligent alerting, and self-healing.
Skills and Experience
  • 6+ years of experience in software development, technical operations, and running large-scale applications.
  • 4+ years of experience in running 24x7 support teams.
  • Expertise in developing or supporting microservices using Java, Spring Boot, Kafka, SQL, NoSQL, and distributed caching solutions.
  • Good understanding of API testing and application performance management tools such as Postman and New Relic.
  • Hands-on experience working with high-volume, mission-critical applications, including building messaging and/or event-driven architectures.
  • Knowledge of the IT Infrastructure Library (ITIL) framework and various IT Service Management (ITSM) tools available in the marketplace.
  • Prior experience in transitioning traditional production support teams into SRE teams.
  • Top-notch consulting or relationship management skills and a deep appreciation of IT tools, techniques, systems, and solutions.
  • Excellent communication skills along with experience in managing and motivating high-performance teams and individuals.
  • Creative problem-solving skills to resolve issues pertaining to project deliverables and cross-functional teams amid changing priorities.
  • Flexibility and resourcefulness to swiftly manage changing operational goals and demands.
  • Passion for operational excellence and governance.
  • Expertise in managing escalations and taking complete responsibility for and ownership of all critical issues through to logical closure.
