Who is the Company

A well-known manufacturer of skincare products with $1.5B in annual sales.

The Challenge

The company’s SQL server could not handle the massive amount of data it received daily. This significantly impacted the load time of business-critical data analytics reports. The company wanted to have live or at least near-real-time access to reports and dashboards for data catering to different subject areas, like orders and accounts.

The company also needed reports that captured daily orders, sales, user accounts, and enrollment data. Further, the company wanted comparative reports from data spanning the past 13 months. Data and metrics were required for each country with its corresponding local currency and a USD ($) conversion for all reports

In short, the company needed:

  • A single comprehensive view of the data: Existing system reports had inaccurate results due to data redundancies and lacked a single, comprehensive view of data.
  • Consistency with legacy reports: Any new reports built on advanced query metrics had to be consistent with legacy reports produced from SQL Server Data.
  • A clear view of current data: The company wanted to access a set of reports and dashboards near real-time.
  • Precision graphical data representation: The dashboards needed to employ graphs with selectable high-granularity metrics, such as monthly, quarterly, and year-to-date. Information of this depth was not available in the company’s legacy system.
  • Access to historical data: The company wanted to include historical data in the new set of reports and dashboards. The company could determine trends leading to more accurate sales forecasting with this data.
  • Fast performance: One of the current issues facing the company was slow data access. The company required the new system to deliver requested reports within thirty seconds.

The Solution

After careful consideration, our Content team chose Google Cloud Platform (GCP) BigQuery to form the basis for the business solution.

Orders, sales, enrollments, product, and user accounts data are streamed through Kafka. The validated data then flows into a set of BigQuery nested tables. A data profile testing process identifies schema mismatches. Subsequently, an end-to-end test is performed that mitigates discrepancies where records were dropped in the reporting layer.

Our team used BigQuery to define several dashboards and reports, including:

  • Daily Sales Report
  • Orders Dashboard
  • Customer Dashboard
  • Product Dashboard
  • Promotion Dashboard
  • Auto-ship Dashboard

We validated performance by running datasets with a massive volume of data. We also defined event-based real-time pipelines to get transactional and continuous incremental data. The QE team then verified the reports and dashboards for near-real-time data.

The validated historical data was migrated from SQL Server to BigQuery using an automation framework. Our team then defined several BigQuery views in accordance with the company’s reporting requirements.

Key points in our solution include:

  • Comprehensive and accurate data view: The earlier system reports were getting inaccurate results due to data redundancy and the lack of a single, comprehensive view of data.
  • Highly optimized platform: Using GCP, our team has optimized performance, enabled ease of maintenance, enhanced self-service capabilities, and improved execution capabilities.
  • Zero data loss: The new and improved architecture results in zero data loss. The ingestion layer combines and consolidates data, allowing efficient data handling and ensuring that no business-critical information is wasted or lost.
  • Modern analytics platform: GCP migration turns the old system into an enterprise data warehouse (EDW) with up-to-date analytics capabilities.
  • Near real-time response: The reports are generated using BigQuery tables and views, improving performance and providing near real-time values in reports.

Business Impact

  • ROI growth over time: The new GCP architecture improves the overall scalability of the system, allowing for future development at minimal additional cost.
  • Faster response to changing market conditions: The new system vastly reduces the turnaround time of reports and dashboards compared with the old system. Quick turnaround time enables company management to make informed decisions faster.
  • Better informed business decisions: Comprehensive and accurate data is now delivered to company management in near real-time, allowing them to make informed business decisions. The new system provides high-quality graphs with orders, sales, user accounts, and enrollment data by day, month, quarter, or year-to-date.

Technologies Used

Microsoft SQL Server: Relational database management system
Apache Airflow: Workflow management platform that provides support for data pipelines
GCP BigQuery: Scalable cloud-based data warehouse that provides analytics tools and support for large-scale SQL queries
GCP Cloud Computing Services: Provides tools for collecting, processing, managing, analyzing, and visualizing data
GCP Cloud Composer: Managed workflow orchestration service built upon Apache Airflow
GCP Cloud Data Fusion: Tool that provides support for pre-defined batch or real-time data transformations and custom connections

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of the data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on the big data technology investments with the effective use of the latest data and analytics tools.

Do you have a similar project in mind?

Enter your email address to start the conversation