Who is the Client

A brand with over 20,000 stores worldwide.

The Challenge

The company runs its critical applications on multiple Kubernetes clusters deployed in production and non-production environments. Maintaining these Kubernetes clusters is crucial for the proper functioning of the company’s platform. It requires backing up various objects in a Kubernetes cluster and storing them in a centralized GitHub repository.

The company used BASH (Bourne Again Shell) scripts to perform Kubernetes object backups. The company’s team would clone the corresponding GitHub repository locally to conduct a disaster recovery, manually executing the necessary ‘kubectl’ commands to restore Kubernetes cluster objects.

However, this approach had critical flaws that included:

  • Shell scripts needed modification when new Kubernetes objects were added
  • The backup and restore process was time-consuming and involved multiple manual steps

In summary, the company wanted a way to:

  • Automate the Kubernetes cluster backup and restore process: The current backup process relied upon shell scripts that had to be changed each time the Kubernetes cluster configuration changed. The company was looking for a way to automate backup and restore.
  • Perform error-free cluster restoration quickly and efficiently: The old disaster recovery process involved too many error-prone manual steps. They wanted a smooth and optimized restore process.
  • Cut cloud storage costs: Backing up Kubernetes clusters can quickly incur massive cloud storage costs. The company needed a cost-effective solution.

The Solution

GSPANN’s DevOps team identified the challenges related to the company’s existing approach toward its backup/restore process and proposed a new solution. We implemented Velero to take backups and restore the objects in case of disaster. Velero is an open-source tool that migrates Kubernetes cluster resources and persistent volumes.

We connected an Azure storage account to Velero, which uses an Azure blob container to upload backup files. A Velero server component was installed and configured on all individual Kubernetes clusters. We also installed a Velero client on the company’s local system to interact with the Velero server through the command-line interface (CLI).

Once we finished this initial setup, we scheduled automatic Velero-Kubernetes backup jobs using the Velero client CLI. These jobs’ purpose is to backup respective Kubernetes objects from a given Kubernetes cluster once a week. Backup taken by these jobs is stored in an Azure blob storage container. To minimize Azure Storage cost, we defined Velero backup jobs with a TTL (time to live) of 60 days. This configuration removes any backup files older than 60 days from the Azure storage container.

If disaster recovery is required, we restore Kubernetes objects using backup files present in Azure Storage container also through Velero client CLI.

Key benefits provided by our solution include:

  • Eliminated need for manual script modification: Backup and restore operations are now straightforward and don’t require modification of BASH scripts.
  • Automated the backup and restore process:The automated process of scheduled Velero backup jobs eliminates manual intervention and increases operational efficiency.
  • Provided precision control over backup and restore: The company now has complete control over Kubernetes objects. They can now backup the entire Kubernetes cluster-specific namespace or a specific Kubernetes object.
  • Production cluster replication promotes development and testing:We replicated the production cluster to create LLC environments that help with development and testing efforts. Developers and QA engineers can now work within an environment that mirrors the live production environment.
  • Massively improved cluster restoration time: Our solution improved cluster restoration speed by more than 80%.

Business Impact

With our solution in place, the company experienced many benefits, including:

  • Minimal backup storage costs: Our solution rotates backups after 60 days which minimizes cloud storage costs to the company.
  • Frees company resources:Previously, the time taken by the entire backup and restore process took from 2 to 3 hours. Our automated solution reduces the time to less than 30 minutes, freeing vital resources’ time to focus on more critical issues.
  • Minimal downtime: Recovery now occurs rapidly without requiring time-consuming and error-prone manual steps. Minimized recovery time improves sales volumes and increases customer satisfaction.
  • Real-time disaster recovery: Our solution can perform real-time recovery from a disaster in the production environment. The company is now confident that they’ll lose no business due to technical failures.
  • Scalable cluster management automation: Our solution manages seven clusters and is easily scaled to handle more.

Technologies Used

Velero: Open-source tool to backup and restore Kubernetes clusters
Microsoft Azure: Cloud computing service provider with a suite of cloud computing services
Kubernetes: Open-source container orchestration tool for automating computer application deployment, scaling, and management
etcd Key Store: Consistent and highly-available key value store used by Kubernetes to store cluster metadata, configuration, and state data
Platforms: Linux, UNIX, and Windows

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on the big data technology investment with effective use of the latest data and analytics tools.

Do you have a similar project in mind?

Enter your email address to start the conversation