Who Is the Company

The organization is a large multinational consumer electronics manufacturer.

The Challenge

The technology division of the marketing business unit is responsible for monitoring around 1,500 URLs used by various apps for marketing purposes. Its environment included a Jenkins server that ran a job every 15 minutes to check the applications' status. However, that job generated excessive false positives, wasting the team's time and effort.

Many of the URLs that needed monitoring were generated dynamically, which compounded the problem. The company’s site reliability engineering (SRE) team had to add every auto-generated URL to the Jenkins job by hand, and this manual step was itself error-prone.
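To illustrate how that manual step could be automated, the Python sketch below merges newly discovered URLs into the monitored set and reports which ones are new. The function name `merge_urls`, the in-memory set, and the example.com URLs are assumptions made for this example only; the case study does not describe how the real system discovers or stores URLs.

```python
from typing import Iterable, List, Set


def merge_urls(monitored: Set[str], discovered: Iterable[str]) -> List[str]:
    """Add unseen URLs to `monitored` in place and return them in discovery order.

    Hypothetical helper: the real project may track URLs in a database or a
    Jenkins job definition rather than an in-memory set.
    """
    added: List[str] = []
    for raw in discovered:
        url = raw.strip()  # tolerate stray whitespace from upstream sources
        if url and url not in monitored:
            monitored.add(url)
            added.append(url)
    return added


# Example with made-up URLs: one already monitored, one new (seen twice), one blank.
monitored = {"https://a.example.com/health"}
added = merge_urls(
    monitored,
    ["https://a.example.com/health", " https://b.example.com/promo ", "", "https://b.example.com/promo"],
)
```

Because duplicates and blank entries are skipped, the function can be run repeatedly against the same discovery source without growing the list with repeats.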

The business unit also needed a centralized dashboard that would give the team a single, accurate picture of application availability.

The tools used by the organization’s SRE team include Kubernetes, Jenkins, Amazon Web Services (AWS), Vault, Consul, vCenter, NGINX Plus, NetScaler, FreeIPA, Argo CD, Spinnaker, GitHub, Jira, Confluence, and Artifactory.

In brief, the organization was looking for the following:

  • Accurate data on application status: The existing system generated far too many false positives, making its status alerts virtually meaningless. The company needed a system that would validate alerts and provide accurate status data.
  • Immediate integration of auto-generated URLs: The company sought a solution that would capture auto-generated URLs as they were created and automatically add them to the Jenkins job.
  • Comprehensive centralized system health dashboard: The company urgently needed a single place where stakeholders could view accurate, up-to-date, and highly visible application URL status.
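
One common way to cut false positives of the kind described in the first requirement is to re-check a failing URL several times before raising an alert. The Python sketch below shows that idea under stated assumptions: the helper names `http_ok` and `confirm_down`, the retry count, and the definition of "healthy" as a 2xx or 3xx response are all illustrative choices, not the project's actual validation logic.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status (assumed health rule)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def confirm_down(check: Callable[[], bool], attempts: int = 3, delay: float = 0.0) -> bool:
    """Report a URL as down only if `attempts` consecutive checks all fail."""
    for attempt in range(attempts):
        if check():
            return False  # a single success marks the alert as a false positive
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before re-checking
    return True
```

A caller might wrap a real probe as `confirm_down(lambda: http_ok("https://example.com/health"), attempts=3, delay=10)`, where the URL is a placeholder. Passing the check in as a callable keeps the retry logic testable without network access.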

The Solution

Business Impact

Key benefits the company enjoyed after project completion:

  • Fast access to health data: The project delivered a centralized, one-stop monitoring solution that gives the company’s SRE team immediate access to the system’s health data. The team no longer wastes time searching for the right tool.
  • Enhanced visibility: The new dashboard provides high-quality analytics data on the company’s applications, tools, and infrastructure status.
  • Proactive response: The instant alert system substantially shortened the time from response to issue resolution. With newly generated URLs automatically added to the to-be-monitored list, problems are now spotted almost immediately.
  • Increased productivity: The company’s SRE team enjoys enhanced productivity as tedious monitoring tasks are handled automatically, allowing the team to focus on more critical issues.
  • Enhanced user experience: The new system includes many new features that take the mystery out of system health checks.

Technologies Used

Prometheus: An open-source monitoring system that supports a multidimensional data model and turns metrics into actionable insights
Splunk: A horizontal technology used for application management, security, and compliance, as well as business and web analytics
Kubernetes: An open-source container orchestration tool for automating computer application deployment, scaling, and management
ReactJS/NodeJS: A JavaScript-based client-server stack in which React builds the single-page application running in the browser and Node.js runs the server side
Python: A popular scripting language
PostgreSQL: A highly performant open-source database

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of the data insights buried deep within the diverse systems across your organization. We empower businesses to collect, visualize, analyze, and interpret data in support of organizational goals. Our team ensures good returns on big data technology investments through effective use of the latest data and analytics tools.
