Who Is the Client

A US-based Fortune 500 department store chain with more than 1,000 stores across the United States, the client has been bringing stylish clothing to the entire family for decades.

The Challenge

The client runs an extensive e-commerce portal that generates a huge volume of user, transactional, and behavioral data. They had been using an in-house relational database system containing seven years of data on all online and offline marketing campaigns from different marketing platforms.

Each campaign was expected to cover at least 1 million products, each with 48K – 72K customer interactions, resulting in 48 – 72 billion data points. This data includes the responses received, types of responses, date, time, minimum/maximum temperature, product information, customer information, and more, collected via social media platforms, SMS, push notifications, and emails. The in-house database was incapable of analyzing such an enormous number of permutations and combinations.
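The scale estimate follows directly from the figures above, as a quick back-of-the-envelope check shows:

```python
# Back-of-the-envelope check of the data volume quoted above.
products_per_campaign = 1_000_000          # at least 1 million products
interactions_low = 48_000                  # 48K interactions per product
interactions_high = 72_000                 # 72K interactions per product

print(products_per_campaign * interactions_low)   # 48,000,000,000 (48 billion)
print(products_per_campaign * interactions_high)  # 72,000,000,000 (72 billion)
```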

The client wanted to extract key insights from the database, such as campaign entry/exit touchpoints and performance statistics of the various channels, to better target future campaigns. They also wanted to ensure that only the intended users receive discount coupons through the marketing campaigns.

The Solution

GSPANN’s advanced analytics team examined the requirements and developed proofs of concept (POCs) for model deployment using Google Cloud Platform (GCP) and big data tools. We deployed the model on Apache Airflow and built an automated scoring pipeline. Later, we retrained and fine-tuned the model.
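As an illustration, a minimal Airflow DAG for such a daily scoring pipeline could look like the sketch below. The DAG name, schedule, retry settings, and the score_campaigns callable are assumptions for illustration, not the client’s actual implementation.

```python
# Hypothetical Airflow DAG for a daily automated scoring pipeline.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def score_campaigns(**context):
    # Placeholder: load the day's campaign data, apply the trained model,
    # and write the scores back to the warehouse.
    print(f"Scoring campaigns for {context['ds']}")


with DAG(
    dag_id="campaign_scoring",            # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",           # score once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
):
    PythonOperator(
        task_id="score_campaigns",
        python_callable=score_campaigns,
    )
```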

Our team analyzed the results of each campaign by applying various data transformations, including calculations, logic alterations, and more, as per the client’s requirements. We rectified data discrepancies caused by duplicate entries, both in the relational database and in the incremental data received daily, and removed the data of expired campaigns.
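The sketch below shows one way such a cleanup could be expressed in BigQuery: keep the latest copy of each duplicated row and drop expired campaigns in a single pass. The dataset, table, and column names (marketing.campaign_events, campaign_id, load_ts, and so on) are assumptions for illustration.

```python
# Hypothetical deduplication and expired-campaign cleanup in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

cleanup_sql = """
CREATE OR REPLACE TABLE marketing.campaign_events_clean AS
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY campaign_id, customer_id, event_ts  -- duplicate key
      ORDER BY load_ts DESC                            -- keep the latest load
    ) AS row_num
  FROM marketing.campaign_events
  WHERE campaign_end_date >= CURRENT_DATE()            -- drop expired campaigns
)
WHERE row_num = 1
"""

client.query(cleanup_sql).result()  # blocks until the job finishes
```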

Based on the input data, the model determines the execution cost of each campaign over a given interval, defined by the campaign’s start and end dates. We used a Managed File Transfer (MFT) process to move the files into Google Cloud Storage buckets.
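A minimal sketch of the landing step, using the google-cloud-storage client to place MFT-delivered files in a Cloud Storage bucket; the bucket name and inbound directory are hypothetical.

```python
# Hypothetical upload of MFT-delivered files into a Cloud Storage bucket.
from pathlib import Path

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("campaign-landing-zone")  # hypothetical bucket name

for path in Path("/mft/inbound").glob("*.csv"):  # files dropped by the MFT process
    blob = bucket.blob(f"raw/{path.name}")
    blob.upload_from_filename(str(path))
    print(f"Uploaded {path.name} to gs://campaign-landing-zone/raw/")
```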

We handled two types of data: dimension data (full-refresh tables) and fact data (incremental tables). Using Hive and BigQuery, we processed the data into structured Hive tables, which were then used to create insight tables as per the business requirements. The client’s marketing team visualizes the data using Tableau.
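The two load patterns differ mainly in their write disposition, as the sketch below with the BigQuery Python client shows: dimension tables are truncated and fully reloaded, while fact tables receive daily appends. All table names and file URIs are assumptions.

```python
# Hypothetical load jobs for full-refresh (dimension) and incremental (fact) tables.
from google.cloud import bigquery

client = bigquery.Client()


def load(source_uri: str, table: str, disposition: str) -> None:
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=disposition,
        autodetect=True,
    )
    client.load_table_from_uri(source_uri, table, job_config=job_config).result()


# Dimension table: replace the full contents on every run.
load("gs://campaign-landing-zone/raw/dim_product.csv",
     "marketing.dim_product",
     bigquery.WriteDisposition.WRITE_TRUNCATE)

# Fact table: append only the new daily slice.
load("gs://campaign-landing-zone/raw/fact_interactions_20210101.csv",
     "marketing.fact_interactions",
     bigquery.WriteDisposition.WRITE_APPEND)
```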

Business Impact

  • We eliminated all duplicate data from the database and removed the expired campaigns’ data (10% of the total) for better data management.
  • The data correction process produced 90% more accurate data for future campaigns compared to the previous year.
  • The client can now execute targeted campaigns by sending relevant promotional coupons to improve sales and build better marketing strategies for the future.

Technologies Used

Python. An interpreted, high-level, and general-purpose programming language that enables programmers to write clear and logical code
BigQuery. A fully managed, serverless data warehouse that enables scalable, cost-effective, and fast analysis over petabytes of data
Hive. A data warehouse software project built on top of Apache Hadoop to provide data query and analysis
Sqoop. A command-line interface application for transferring data between relational databases and Hadoop
Google Cloud Platform (GCP). A suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on your big data technology investment through effective use of the latest data and analytics tools.
