After carefully analyzing company requirements, our Managed Services team integrated Splunk Cloud Platform into its solution. Splunk provides a highly scalable data stream processing engine that combines security with observability and machine learning (ML).
Leveraging Splunk’s out-of-the-box (OOTB) observability features and the powerful Splunk Search Processing Language (SPL), our team created 140 unique hourly site verification (HSV) alerts designed to call out any application discrepancies. The automated solution allows the company to respond to and correct issues within an hour.
Thresholds for the alerts are tiered according to the expected traffic on the site. During high-transaction periods, thresholds are stringent. During moderate- or low-transaction periods, thresholds are adjusted to match site user traffic.
For example, during high-transaction periods, an alert triggers if the number of orders in an hour exceeds one hundred. During low-transaction periods, by contrast, an alert triggers when the number of transactions is less than ten.
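The tiered thresholds described above can be sketched in a few lines of Python. This is only an illustration, not the team's Splunk configuration: the tier names, the `TierRule` type, and the `should_alert` function are hypothetical, while the limits and comparison directions are taken directly from the example in the text.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TierRule:
    """One alert rule for a traffic tier (hypothetical structure)."""
    compare: Callable[[int, int], bool]  # how the hourly count is compared to the limit
    limit: int


# Tier names are assumptions; limits and directions follow the example in the text.
RULES = {
    "high": TierRule(operator.gt, 100),  # alert when the hourly count exceeds 100
    "low": TierRule(operator.lt, 10),    # alert when the hourly count is below 10
}


def should_alert(tier: str, hourly_count: int) -> bool:
    """Return True if the hourly count breaches the rule for the given tier."""
    rule = RULES[tier]
    return rule.compare(hourly_count, rule.limit)


if __name__ == "__main__":
    print(should_alert("high", 120))
    print(should_alert("low", 25))
```

Keeping the comparison direction alongside the limit means each tier can express its own condition without changing the evaluation logic.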
The new system also defines intelligent alerts. These alerts differ from HSV alerts in that they detect potential performance degradation by comparing current metrics with those from the previous hour or day. Corrective action follows once the performance impact is identified. To illustrate, if 1,000 orders were placed at 4 PM on the previous day, an alert triggers if the system detects a drop of more than 20% in orders at the same time the following day.
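The day-over-day comparison in this example reduces to a simple percentage check. The sketch below is a minimal illustration under stated assumptions: the function name `is_degraded` and its parameters are hypothetical, and the 20% default comes from the example above rather than from the actual alert definitions.

```python
def is_degraded(baseline_orders: int, current_orders: int,
                max_drop_percent: int = 20) -> bool:
    """Return True if orders fell by more than max_drop_percent versus the baseline.

    baseline_orders: order count for the comparison period (e.g. 4 PM yesterday).
    current_orders:  order count for the same period today.
    """
    if baseline_orders <= 0:
        # No meaningful baseline to compare against (an assumption of this sketch).
        return False
    # Integer arithmetic avoids floating-point rounding at the exact threshold.
    drop = baseline_orders - current_orders
    return drop * 100 > max_drop_percent * baseline_orders


if __name__ == "__main__":
    # 1,000 orders yesterday at 4 PM versus 750 today is a 25% drop.
    print(is_degraded(1000, 750))
```

With a baseline of 1,000 orders, a count of exactly 800 is a 20% drop and does not trigger the alert, while 799 does.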
Here are some key aspects of our solution:
- Hourly monitoring: The new system performs automated site functionality monitoring hourly rather than daily. More frequent monitoring greatly increases the likelihood of spotting and correcting an issue promptly.
- Routine and intelligent alerts: Our team introduced over 140 new alerts into the new system. They combine routine alerts, which cover essential application health, with intelligent alerts, which detect potential issues.
- Adaptive thresholds: Alert thresholds in the new system are designed around user traffic. Thresholds are adjusted by usage tier, which increases monitoring accuracy.