Who is the Client

A US-based Fortune 500 department store chain with more than 1,000 stores across the country. It has been bringing stylish clothing for the entire family to its customers for decades.

The Challenge

The client runs a huge e-commerce portal and drives sales through promotional campaigns on their ‘new customer’ database. Differentiating a new customer from an old one is crucial for the client’s business. They use the customer analytics dashboard to analyze the new customer acquisitions and segment their promotional campaigns for the old and new customers.

However, the customer analytics dashboard was showing an unusually high number of new customers. The numbers did not match the data in the campaign tool, which classified some customers as ‘old’ while the dashboard showed them as ‘new.’

As a result, the promotional code meant for new customers was sent to old ones, wasting part of the promotional budget. The client wanted to identify and resolve the cause of this mismatch and ensure that old customers are not targeted with campaigns designed for new customers.

The Solution

Team GSPANN analyzed the client’s customer analytics dashboard and concluded that duplicate customer IDs were the reason for the high number of ‘new customers’ shown on the dashboard.

The client’s internal customer database system receives data from multiple sources, including online and offline transactions made with multiple cards, internal cash-card programs, and loyalty programs. These sources generated different profiles for the same customer in the database.

Earlier, these different profiles were not mapped to a single user ID. Hence, whenever the customers made a transaction from a different credit card or source, the system treated them as a ‘new’ customer and assigned a new customer ID.

To resolve this, we came up with a customer ID matching approach. It involves choosing the source most frequently used to store customer details, fetching information from it, and mapping that information to all available customer IDs. When multiple matches were found, we selected the preferred customer ID for each customer or profile based on rules and priorities set by the client, and refreshed the existing data with the newly created data. All IDs other than the selected one were marked as duplicates and force-matched to the same customer.
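The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: the field names `customer_id`, `match_key`, and `txn_count`, the sample records, and the "most transactions wins" rule are all assumptions made for the example. The real rules and priorities were set by the client and are not detailed here.

```python
from collections import defaultdict


def match_customer_ids(profiles, prefer):
    """Map every customer ID to one preferred ID per matched group.

    profiles: iterable of dicts with 'customer_id' and 'match_key'
              (hypothetical field names; 'match_key' stands in for
              whatever detail links profiles of the same person).
    prefer:   function that takes a list of matched profiles and
              returns the one whose ID should be kept.
    Returns a dict {customer_id: preferred_customer_id}.
    """
    # Collect all profiles that share the same matching detail.
    groups = defaultdict(list)
    for profile in profiles:
        groups[profile["match_key"]].append(profile)

    # Force-match every ID in a group to the preferred ID.
    id_map = {}
    for group in groups.values():
        preferred_id = prefer(group)["customer_id"]
        for profile in group:
            id_map[profile["customer_id"]] = preferred_id
    return id_map


def find_duplicates(id_map):
    """Return the IDs that were mapped to a different, preferred ID."""
    return sorted(cid for cid, preferred in id_map.items() if cid != preferred)


# Illustrative data only.
profiles = [
    {"customer_id": "C1", "match_key": "jane@example.com", "txn_count": 12},
    {"customer_id": "C2", "match_key": "jane@example.com", "txn_count": 3},
    {"customer_id": "C3", "match_key": "sam@example.com", "txn_count": 1},
]

# Stand-in rule (an assumption): keep the profile with the most transactions.
id_map = match_customer_ids(
    profiles, prefer=lambda group: max(group, key=lambda p: p["txn_count"])
)
duplicates = find_duplicates(id_map)
```

In this sketch, `C2` is marked as a duplicate of `C1`, so a later purchase made under `C2` would no longer count as a new customer.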

The data sources and their respective databases can be arranged in descending order of completeness of customer information: the client’s internal cash card, registered account, guest account, loyalty account, and third-party credit cards. We compared customer IDs within the third-party credit card database (which contains the least information) and against the client’s internal cash card database (which contains the most complete information). The results of the customer ID matching process are updated on the dashboards once a day, and the same data is used by other campaign tools.
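One way to picture how this ordering could drive the choice of a preferred ID is the short sketch below. The source labels and sample IDs are hypothetical names chosen for the example, and the sketch assumes that source completeness is the only rule applied, which may not reflect the client's full rule set.

```python
# Sources in descending order of completeness, as listed above.
# The string labels are hypothetical identifiers for illustration.
SOURCE_PRIORITY = [
    "internal_cash_card",
    "registered_account",
    "guest_account",
    "loyalty_account",
    "third_party_credit_card",
]
SOURCE_RANK = {source: rank for rank, source in enumerate(SOURCE_PRIORITY)}


def pick_preferred_id(candidates):
    """Return the customer ID that comes from the most complete source.

    candidates: list of (customer_id, source) tuples for one customer.
    """
    best_id, _ = min(candidates, key=lambda c: SOURCE_RANK[c[1]])
    return best_id


# Illustrative data only: three IDs that belong to the same person.
candidates = [
    ("CC-881", "third_party_credit_card"),
    ("LY-204", "loyalty_account"),
    ("CASH-17", "internal_cash_card"),
]
preferred = pick_preferred_id(candidates)
```

Here the ID from the internal cash card source is kept, and the other two would be treated as duplicates of it.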

Business Impact

  • The solution resulted in a 12% decrease in the total number of duplicate IDs.
  • The solution matches more than 100K customer IDs daily, correcting the data used by dashboards and promotional campaigns.
  • Wastage of the promotional budget was reduced because promotional coupons now go to new customers rather than to existing ones.

Technologies Used

Python. An interpreted, high-level, general-purpose programming language that enables programmers to write clear and logical code.
BigQuery. A fully managed, serverless data warehouse that enables scalable, cost-effective, and fast analysis over petabytes of data.
Apache Spark. An open-source, distributed, general-purpose cluster-computing framework that can quickly perform processing tasks on very large datasets.
Dataproc. A fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Google Cloud Services. A suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products.

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on the big data technology investment with effective use of the latest data and analytics tools.
