Who is the Company

A leading omnichannel health and wellness retailer with more than 1,000 stores in the United States.

The Challenge

The company migrated its technology platforms and applications from on-premises infrastructure to the cloud, using OpenShift and the Google Cloud Platform (GCP), with the intention of reducing hardware costs. However, it overlooked the cost of cloud usage fees.

Shortly after taking charge of managing the company’s cloud operations, our Managed Services experts noticed a mismatch between the expected and actual use of cloud resources. After investigation, we discovered that the company relied on outdated Virtual Machine (VM) and Pod baselines on their current GCP and OpenShift infrastructure. This resulted in high cloud usage costs.

Another problem was the company’s legacy production engineering practice, in which repetitive, manual tasks consumed valuable time and resources without adding long-term value.

On average, the production engineering team spent an hour a day on these tasks, which adds up to about 365 hours of engineering time each year. The bigger issue was that this workload would only keep growing as more company platforms migrated to GCP.

In brief, the company needed to:

  • Reduce time spent on repetitive manual tasks: The company spent a lot of time on “toil” - repetitive manual tasks with no long-term value. They needed to automate these tasks to avoid wasting engineering resources.
  • Optimize infrastructure costs: It was vital for the company to optimize their cloud infrastructure usage to reduce costs.

The Solution

After deliberation, the GSPANN team devised a detailed plan to reduce costs. We introduced automated solutions for critical tasks to address the high expenditure caused by excessive resource use. These solutions made it possible to adjust resources between working and non-working hours, which led to significant savings.

Our team created and implemented automated solutions that could adapt according to the workload during the day (18 hours) and night (6 hours). We tested these solutions in a simulated crisis scenario in the East Region of our cloud platforms.
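
The day and night split described above can be illustrated with a minimal sketch. This is not the implementation used in the project: the window boundaries (06:00 and midnight), the replica counts, and the function name `target_replicas` are all hypothetical values chosen only to show how a schedule-based capacity rule might look.

```python
from datetime import time

# Hypothetical values for illustration only; the case study states an
# 18-hour day window and a 6-hour night window but not where they begin.
DAY_START = time(6, 0)    # assumed: day window runs 06:00 to 24:00 (18 hours)
DAY_REPLICAS = 6          # assumed capacity during the day window
NIGHT_REPLICAS = 2        # assumed capacity during the night window (00:00 to 06:00)


def target_replicas(now: time) -> int:
    """Return the replica count for the window that contains `now`."""
    if now >= DAY_START:
        return DAY_REPLICAS
    return NIGHT_REPLICAS


def daily_replica_hours() -> int:
    """Total replica-hours per day under the assumed schedule."""
    return 18 * DAY_REPLICAS + 6 * NIGHT_REPLICAS
```

A scheduler (for example, a recurring automation job) could call `target_replicas` at each window boundary and apply the returned value to the platform's scaling settings.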

With a strong focus on cost-effectiveness, we made this solution applicable to any project on any cloud platform. Our automated solutions were designed to be easily adapted to other projects or platforms, making the process simpler and more efficient.

By addressing these issues and streamlining its processes, the company freed engineers’ time for innovative and strategic projects that propel the business forward. Optimizing company practices and automating repetitive tasks also strengthened the company’s technical base, which contributes to long-term success and growth.

Here are some highlights of our solution:

  • Solid metrics based on actual data: Our solution was built on reliable metrics derived from real data, such as transactions per second, traffic volume, and CPU and memory usage, enabling accurate data-driven decision-making.
  • Dynamic capacity optimization: We introduced automated solutions for critical jobs that enabled capacity adjustments, reducing infrastructure costs.
  • Reusable optimization solutions: Our scaling optimization implementation was built to be reusable on both GCP and OpenShift, which helps the company reduce future development costs.
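
As a rough illustration of how metrics such as transactions per second can drive capacity decisions, the sketch below derives a replica count from an observed peak load. The formula, the 25% headroom, and the names `replicas_for_load` and `tps_per_pod` are assumptions made for this example; they do not come from the project itself.

```python
import math


def replicas_for_load(
    peak_tps: float,
    tps_per_pod: float,
    headroom: float = 0.25,   # assumed safety margin above the observed peak
    min_replicas: int = 1,    # assumed floor so the service never scales to zero
) -> int:
    """Estimate how many pods are needed to serve `peak_tps` with headroom."""
    if tps_per_pod <= 0:
        raise ValueError("tps_per_pod must be positive")
    needed = math.ceil(peak_tps * (1 + headroom) / tps_per_pod)
    return max(min_replicas, needed)
```

For example, with an observed peak of 800 transactions per second, 100 transactions per second per pod, and the default headroom, `replicas_for_load(800, 100)` returns 10.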

Business Impact

  • Substantial infrastructure cost reduction: The company achieved yearly cost savings of around 10%.
  • Freed resources: Our automated solution reduced manual effort. At the current rate of one hour daily, the company saves approximately 365 hours per year, and the savings will grow as more platforms migrate to GCP.
  • Increased reliability: Manual SRE tasks inevitably involve a certain element of human error. By first testing solutions and then automating the manual processes, our team reduced the incidence of errors, resulting in a substantial increase in system reliability.

Technologies Used

Google Cloud Platform (GCP): A suite of cloud computing services offering infrastructure, data storage, and machine learning capabilities
OpenShift: A Kubernetes-based container platform that simplifies application development, deployment, and management in hybrid cloud environments
Splunk: A data analytics tool that collects and correlates machine-generated data, including transactions-per-second
Dynatrace: An AI-driven application performance monitoring solution that provides CPU and memory metrics
Terraform: An open-source infrastructure-as-code software tool that enables streamlined management and provisioning of cloud resources
Apigee: An API platform that facilitates the creation, deployment, and management of APIs
Rundeck: An open-source automation platform that simplifies operational tasks, streamlines workflows, and enhances collaboration for IT teams
