Who is the Client

The client is a US-based manufacturer of skincare products with $1.5B in annual sales. Millions of customers across the United States, Canada, and Australia depend exclusively on this brand for their skincare and makeup routines.

The Challenge

The client distributes its products through a multi-level marketing model. It hires independent sales consultants who connect with potential preferred customers (PCs) via social media or in-person meetings. The PCs buy the products and resell them to end customers. They place orders of a certain amount every 60 days on the client’s e-commerce portal, and these orders form a major part of the client’s sales.

The client was using Splunk to monitor various e-commerce applications. However, these applications generated alerts via email that were difficult to monitor. Similarly, the Splunk dashboard had many screens that required monitoring by the Network Operations Center (NOC) L1 team.

  • The email-based alerts, generated by applications on Splunk, were not prioritized in terms of criticality.
  • The NOC L1 team was unable to determine the right course of action for these alerts because there were no documented guidelines for the response actions required of the team.
  • Monitoring multiple Splunk dashboard screens was a cumbersome and cost-intensive task as it utilized paid features of the New Relic browser.

The Solution

GSPANN’s L1 support team performed the following key tasks:

  • Eliminated manual email-based monitoring: Switched to Splunk-based systematic monitoring.
  • Streamlined alerts: Improved alert accuracy and quality. Optimized the existing alerts by converting non-critical alerts from real-time to scheduled. Provided the NOC L1 team with well-defined, documented guidelines in the form of runbooks to use as a reference for triggered alerts.
  • Created new alerts: Added new alerts that were critical for effective monitoring of the applications.
  • Optimized the existing dashboards: Merged multiple dashboards and created new, comprehensive dashboards to help the team monitor all critical components of the applications with fewer screens.
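
As an illustration of the alert streamlining described above, here is a minimal Python sketch. It is not the client's Splunk configuration: the `Alert` fields, the `to_scheduled` and `describe` helpers, the 15-minute cron default, and the runbook identifiers are all hypothetical and exist only to show the idea of moving non-critical alerts from real-time to scheduled and embedding a runbook reference in each alert.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical model of a monitoring alert (not a Splunk API object)."""
    name: str
    critical: bool
    mode: str                      # "realtime" or "scheduled"
    cron: Optional[str] = None     # schedule used when mode == "scheduled"
    runbook: Optional[str] = None  # reference the NOC L1 team can follow


def to_scheduled(alert: Alert, cron: str = "*/15 * * * *") -> Alert:
    """Move a non-critical real-time alert onto a schedule; leave others unchanged."""
    if alert.critical or alert.mode != "realtime":
        return alert
    return replace(alert, mode="scheduled", cron=cron)


def describe(alert: Alert) -> str:
    """Build the alert text, embedding the runbook reference when one exists."""
    priority = "CRITICAL" if alert.critical else "non-critical"
    runbook = alert.runbook or "no runbook documented"
    return f"[{priority}] {alert.name} ({alert.mode}) | runbook: {runbook}"
```

In this sketch, critical alerts stay real-time so that they still fire immediately, while everything else runs on a schedule and carries its runbook reference in the alert text.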

Business Impact

  • Converted 40+ email-based monitoring cases to Splunk-based systematic monitoring. Transformed 70+ real-time alerts to scheduled alerts.
  • Fine-tuned Splunk queries, which reduced the skip ratio from 45% to 5%.
  • Created runbooks for 230+ alerts. Embedded the runbook reference in the alerts, which helped the team find the relevant runbook quickly. This improved the response time for NOC alerts (MTTR) by 10 minutes.
  • Reduced Splunk dashboard screens from 48 to 10. Merged existing dashboards into new lightweight ones, reducing their number from 12 to 2.
  • The team delivered additional application support without increasing overheads.
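
For readers unfamiliar with the skip ratio metric, the following Python sketch shows one way to compute it. It assumes the skip ratio means the share of scheduled search runs that the scheduler skipped. The `skip_ratio` function and the run counts are hypothetical and chosen only to reproduce the 45% and 5% figures reported above. They are not the client's data.

```python
def skip_ratio(skipped: int, total: int) -> float:
    """Return the fraction of scheduled search runs that were skipped."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= skipped <= total:
        raise ValueError("skipped must be between 0 and total")
    return skipped / total


# Hypothetical counts: 1,000 scheduled runs before and after query tuning.
before = skip_ratio(skipped=450, total=1000)
after = skip_ratio(skipped=50, total=1000)
print(f"before: {before:.0%}, after: {after:.0%}")
```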

Technologies Used

Splunk (version 7.1.3). A horizontal technology used for application management, security, and compliance, as well as business and web analytics
New Relic. An observability platform built to help engineers create more perfect software by analyzing, troubleshooting, and optimizing the entire software stack
PagerDuty. An American cloud computing company specializing in a SaaS incident response platform for IT departments
Platform. AWS, Rackspace, Equinix

Related Capabilities

Reduce Downtime by Identifying Improvement Areas with Proactive Production Support

We have expertise in implementing a preventative approach to production support. Our network operations center (NOC) provides deep application and system monitoring to ensure that you don’t face any surprises. Our production support team can help keep your application running without interruption so that your customers stay happy and satisfied.
