How can organizations modernize legacy systems to improve data governance and ensure regulatory compliance?

Organizations can modernize legacy systems by migrating to cloud-based data lakes and implementing automated ETL processes. This involves moving historical data and daily feeds to platforms like Azure, configuring scheduled data processing jobs, and establishing comprehensive validation frameworks. The modernization should include metadata testing for schema validation, data completeness testing to prevent loss or corruption, and data quality testing to ensure business rules are properly applied. Automated validation tools can process millions of records in minutes, replacing manual processes that were previously impractical due to data volume.

What are the key validation requirements when migrating millions of daily transactions from legacy systems?

When migrating high-volume transaction data, organizations must verify every row of every record, ensure business rules are applied correctly, and eliminate data discrepancies including duplicates and null entries. The validation process should encompass metadata testing to verify column names, data types, and constraints, plus data completeness testing to confirm that all expected data transfers without loss. Data quality testing detects duplicate records and validates that business logic has been applied. Automated validation tools such as Tricentis Tosca Data Integrity (Tosca DI) can process more than 8 million daily transactions, identifying functional defects that manual testing would miss at that scale.

Why do legacy AS-400 systems struggle with modern data governance requirements?

Legacy AS-400 systems with DB2 databases face significant limitations in handling modern data governance needs due to their underlying technology stack and text-based interfaces. These systems struggle to resolve conflicts between data from different upstream sources and have difficulty applying complex business rules to produce valid data for stakeholders. The traditional 'green screen' interfaces make it nearly impossible to generate user-friendly reports for daily activities. Additionally, manual validation processes become impractical when dealing with millions of daily transactions, creating bottlenecks in data processing and reducing overall system efficiency and accuracy.

What role do automated validation tools play in ensuring data integrity during system modernization?

Automated validation tools are essential for maintaining data integrity during large-scale system modernization projects. Tools like Tosca Data Integrity enable organizations to compare high volumes of data at enormous scale within reasonable timeframes, processing millions of records in minutes rather than days. These tools perform comprehensive testing including metadata validation, data completeness checks, and business rule verification through SQL-based automation scripts. They identify functional defects that manual testing would miss due to data volume constraints and become indispensable for regression testing, ensuring ongoing data quality throughout the modernization process.

How can organizations measure the business impact of legacy system modernization on data governance?

Organizations can measure modernization impact through metrics such as processing time reduction, revenue accuracy improvements, and operational efficiency gains. In the case study below, hard close processing time fell by about 95%, from up to 2 days to 2 hours, and revenue accuracy increased by up to $200,000 monthly. Enhanced user interfaces reduce business team onboarding times and improve daily processing speed. The transition from text-based legacy interfaces to modern graphical user interfaces with advanced reporting and analytics capabilities provides measurable improvements in user productivity and data accessibility for stakeholders.

What are the essential components of a comprehensive data validation framework for modernized systems?

A comprehensive data validation framework must include metadata testing to verify schema information like column names, data types, and constraints with additional validation for length and naming conventions. Data completeness testing ensures all expected data transfers from source to target without loss, including row count verification and aggregate column validation. Data quality testing checks that business rules and logic are properly applied to ETL pipelines, detecting duplicates and performing integrity validation. The framework should also include draft run capabilities for preview before final posting and fully automated processes that validate daily data per business requirements.
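As a rough illustration of the metadata testing described above, the following minimal sketch compares a table's actual schema against an expected one. It uses Python's built-in sqlite3 module purely as a stand-in for the real target platform; the table name, the column names, the expected schema, and the lowercase naming rule are all hypothetical and not taken from the project.

```python
import sqlite3

# Hypothetical expected schema: column name -> (declared type, NOT NULL flag).
EXPECTED = {
    "txn_id": ("INTEGER", True),
    "store_id": ("INTEGER", True),
    "amount": ("REAL", True),
    "posted_on": ("TEXT", False),
}

def schema_issues(conn, table, expected):
    """Return readable mismatches between a table and the expected schema."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    actual = {row[1]: (row[2].upper(), bool(row[3]))
              for row in conn.execute(f"PRAGMA table_info({table})")}
    issues = []
    for name, spec in expected.items():
        if name not in actual:
            issues.append(f"missing column: {name}")
        elif actual[name] != spec:
            issues.append(f"{name}: expected {spec}, found {actual[name]}")
    for name in sorted(actual.keys() - expected.keys()):
        issues.append(f"unexpected column: {name}")
    # Assumed naming convention: column names contain no uppercase letters.
    issues += [f"bad name: {n}" for n in actual if not n.islower()]
    return issues

conn = sqlite3.connect(":memory:")
# Illustrative target table with a wrong type and a misnamed column.
conn.execute(
    "CREATE TABLE target_txn (txn_id INTEGER NOT NULL, "
    "store_id INTEGER NOT NULL, amount TEXT NOT NULL, PostedOn TEXT)"
)
issues = schema_issues(conn, "target_txn", EXPECTED)
for issue in issues:
    print(issue)
```

A check like this would typically run before any row-level comparison, since a type or naming mismatch tends to invalidate the later tests.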

How do cloud-based data lakes improve data governance compared to traditional database systems?

Cloud-based data lakes provide superior data governance by enabling scalable storage and processing of vast amounts of structured and unstructured data from multiple upstream systems. They support automated ETL jobs that can be configured and scheduled to summarize data and generate reports through modern business intelligence tools like Power BI. Unlike traditional systems, cloud data lakes facilitate comprehensive validation processes using tools like Azure Databricks with SQL capabilities, allowing organizations to perform business validation and data sampling testing efficiently. This architecture supports better conflict resolution between different data sources and enables more sophisticated business rule application.

Modernize Legacy Systems for Improved Data Governance

About the Client

A food and beverage company with an extensive chain of stores worldwide.

The Challenge

The company used an IBM AS-400 with a DB2 Database for its revenue accounting system. After each fiscal period closure, the company was required to summarize and send accounting data to the General Ledger but faced data accuracy problems. One of the major challenges they encountered was that the data originated from different upstream systems. Their current data governance implementation had difficulty resolving conflicts between inputs.

The company also needed help applying business rules to produce valid data intended for its users and stakeholders. They wanted user-friendly reports for their day-to-day activities, which proved impossible on the old system due to its underlying technology stack.

An additional challenge was that the company needed to validate a massive amount of data, with daily data items numbering in the millions. The company sought a solution to guarantee the integrity of its data pipeline after data conversion and business rules were applied. They needed a scalable legacy system modernization solution to:

Verify each row of every data record.
Verify that each business rule is applied correctly.
Verify there are no data discrepancies, including duplicates and null entries.
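The duplicate and null checks in the list above can be expressed as simple SQL queries. The sketch below is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module, and the daily_txn table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_txn (txn_id INTEGER, store_id INTEGER, amount REAL)")
# Illustrative rows: txn 2 is duplicated, txn 3 and txn 4 have null fields.
conn.executemany(
    "INSERT INTO daily_txn VALUES (?, ?, ?)",
    [(1, 10, 4.50), (2, 10, 3.25), (2, 10, 3.25), (3, None, 7.00), (4, 11, None)],
)

# Duplicates: any txn_id that appears more than once.
duplicates = conn.execute(
    "SELECT txn_id, COUNT(*) FROM daily_txn "
    "GROUP BY txn_id HAVING COUNT(*) > 1"
).fetchall()

# Nulls: rows where a required column is empty.
null_rows = conn.execute(
    "SELECT txn_id FROM daily_txn "
    "WHERE store_id IS NULL OR amount IS NULL ORDER BY txn_id"
).fetchall()

print("duplicate keys:", duplicates)
print("rows with nulls:", null_rows)
```

Because both checks are set-based queries rather than row-by-row loops, the same pattern should scale to much larger tables on an engine built for that volume.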

In short, the company needed:

Accurately ingested financial data: The company was looking for a solution capable of accurately ingesting accounting data from different sources.
User-friendly reports: The company sought a system that could apply business rules correctly and guard against discrepancies to produce easily readable reports.
Validation of vast amounts of data: Performing manual validation was challenging due to the high volume of transactions received, totaling more than 8 million daily transactions.

Our Solution

Our team designed and executed a migration plan that moved ten years of historical data along with the latest daily feeds from different upstream systems to an Azure data lake. We configured and scheduled extract, transform, and load (ETL) jobs to summarize data and send reports to business users through Power BI and Azure SQL Reporting.

Our quality engineering team assisted the company in the validation performed during data migration and conversion. We also performed business validation on revenue accounting system data. The team validated business rules applied to the data and conducted data sampling testing for migrated and converted records in Azure Databricks via Structured Query Language (SQL). Using SQL in our solution was ideal because, at this stage, there was no UI (User Interface) or API involvement.
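To give a sense of what SQL-based business rule validation with data sampling might look like, here is a minimal sketch. It is not the project's actual script: SQLite (via Python's sqlite3 module) stands in for Azure Databricks, and the tables, the rule "net = gross - discount", the tolerance, and the helper sample_mismatches are all assumptions made for illustration.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_txn (txn_id INTEGER PRIMARY KEY, gross REAL, discount REAL);
CREATE TABLE target_txn (txn_id INTEGER PRIMARY KEY, net REAL);
""")
rows = [(i, 10.0 + i, 1.0) for i in range(1, 101)]
conn.executemany("INSERT INTO source_txn VALUES (?, ?, ?)", rows)
# Simulated conversion output; txn 42 deliberately skips the discount.
conn.executemany(
    "INSERT INTO target_txn VALUES (?, ?)",
    [(i, g if i == 42 else g - d) for i, g, d in rows],
)

def sample_mismatches(conn, sample_size, seed):
    """Check a random sample of source rows against the converted target."""
    ids = [r[0] for r in conn.execute("SELECT txn_id FROM source_txn ORDER BY txn_id")]
    sample = random.Random(seed).sample(ids, sample_size)
    placeholders = ",".join("?" * len(sample))
    # Flag sampled rows that are missing in the target or break the rule.
    query = f"""
        SELECT s.txn_id
        FROM source_txn s
        LEFT JOIN target_txn t ON t.txn_id = s.txn_id
        WHERE s.txn_id IN ({placeholders})
          AND (t.net IS NULL OR ABS(t.net - (s.gross - s.discount)) > 0.005)
        ORDER BY s.txn_id"""
    return [r[0] for r in conn.execute(query, sample)]

print(sample_mismatches(conn, 20, seed=7))   # a 20-row sample
print(sample_mismatches(conn, 100, seed=7))  # every row
```

A sample can miss a defect that a full comparison would catch, which is one reason automated full-volume validation becomes attractive at this scale.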

Image Credit: Tutorial: Extract, transform, and load data by using Azure Databricks

During our preliminary analysis, we explored and proposed two proofs of concept. One used ‘Great Expectations,’ and the other leveraged ‘Tricentis Tosca Data Integrity’ (Tosca DI).

Great Expectations is an open-source Python library that helps produce assertions for PySpark data frames. This would have been a good choice as we built our software on Azure Data Lake. However, subsequent investigation uncovered some shortcomings.

Tosca DI successfully fulfilled all our automation needs, enabling us to compare a high volume of data on an enormous scale in a reasonable timeframe. The company was already utilizing Tosca for other applications. Consequently, the company decided upon Tosca and we recommended the Tosca DI module, a tool purposefully designed to meet data integrity testing needs.

Our team developed automation scripts using SQL in Tosca DI to address the challenges we encountered. These scripts identified numerous functional defects that earlier functional testing had missed because of the overwhelming data scale. They also became indispensable for regression testing, which was impractical to perform manually at this data volume.

Image Credit: Tricentis Tosca

Below are some key results our legacy system modernization solution achieved through automation:

Metadata testing: This allows the company to obtain information about the data within the schema, for example, the column names, column data types, and constraints imposed on the data. The new system incorporates a rich set of additional validations, including checks on data type, length, constraints, and naming conventions.
Data completeness testing: This confirms that all expected data is extracted from the source and loaded into the target without loss or corruption. The company can confirm row counts, compare aggregate column values, validate primary keys, and perform a full-value comparison. In addition, they can now monitor missing setups.
Data quality testing: The new system allows us to check that the business rules and logic have been applied to the ETL pipeline. Quality tests detect duplicate data, confirm that business rules have been applied, and perform data integrity validation.
Draft run: Our solution lets the company view journal data before the final posting to the company’s Oracle accounting systems.
Fully automated auto-accept process: The new system validates all the daily store data per business requirements.
Scalability and futureproofing: Testing and validation automation using Tosca DI helped validate millions of upstream records in minutes.
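The data completeness checks listed above (row counts, aggregate column values, primary keys, and full-value comparison) can each be written as a short SQL query. The following sketch is an assumption-laden illustration rather than the Tosca DI implementation: it uses SQLite through Python's sqlite3 module, and the src and tgt tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (txn_id INTEGER PRIMARY KEY, store_id INTEGER, amount_cents INTEGER);
CREATE TABLE tgt (txn_id INTEGER PRIMARY KEY, store_id INTEGER, amount_cents INTEGER);
INSERT INTO src VALUES (1, 10, 450), (2, 10, 325), (3, 11, 700), (4, 12, 999);
-- Illustrative target: txn 4 was dropped and txn 2 was corrupted.
INSERT INTO tgt VALUES (1, 10, 450), (2, 10, 352), (3, 11, 700);
""")

def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]

report = {
    # Row count check: source vs. target.
    "row_counts": (scalar("SELECT COUNT(*) FROM src"),
                   scalar("SELECT COUNT(*) FROM tgt")),
    # Aggregate check: column totals should match.
    "amount_totals": (scalar("SELECT SUM(amount_cents) FROM src"),
                      scalar("SELECT SUM(amount_cents) FROM tgt")),
    # Primary key check: keys present in source but absent from target.
    "missing_keys": [r[0] for r in conn.execute(
        "SELECT txn_id FROM src EXCEPT SELECT txn_id FROM tgt ORDER BY 1")],
    # Full-value comparison: source rows with no identical row in target.
    "unmatched_rows": conn.execute(
        "SELECT * FROM src EXCEPT SELECT * FROM tgt ORDER BY 1").fetchall(),
}

for check, result in report.items():
    print(check, result)
```

The cheaper checks (counts and totals) signal that something is wrong, while the full-value comparison pinpoints which rows differ.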

Business Impact

One of the greatest benefits for company stakeholders was moving from traditional AS-400 “green screen” text-based apps to a modern graphical user interface (GUI).

The other areas of business impact include:

Greatly enhanced and user-friendly interfaces improve daily work: The new intuitive navigation capabilities have reduced business team onboarding times. The day-to-day processing is now faster and easier due to the enhanced sales estimate maintenance screens. The new UI dashboards also provide advanced reporting and analytics.
Increased revenue accuracy: The new system increased revenue accuracy by up to $200,000 monthly. Also, the store comp process has been dramatically streamlined and shows increased reporting accuracy.
Improved sales estimating: Due to the ready availability of accurate data, the new system improved the company’s ability to estimate sales. This substantially impacted its offline sales discount process and has benefited the company overall. The new system of alerts and notifications provides improved visualization for trending vs average sales.
Hard close time reduced by 95%: The hard close processing and validation completion now takes only 2 hours. Under the old system, this took up to 2 days to complete.
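The 95% figure can be checked with simple arithmetic. The sketch below assumes that "up to 2 days" means 48 elapsed hours, which is an interpretation rather than something the case study states; the function name percent_reduction is illustrative.

```python
def percent_reduction(before_hours, after_hours):
    """Percentage decrease from the old duration to the new one."""
    return (before_hours - after_hours) / before_hours * 100

old_hours = 2 * 24  # assumed: "up to 2 days" read as 48 elapsed hours
new_hours = 2       # hard close now takes 2 hours
reduction = percent_reduction(old_hours, new_hours)
print(f"{reduction:.1f}% reduction")  # prints "95.8% reduction"
```

Under that assumption the reduction is slightly above the stated 95%, so the headline figure is consistent with the durations given.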

Related Capabilities

Utilize Actionable Insights from Multiple Data Hubs to Gain More Customers and Boost Sales

Unlock the power of the data insights buried deep within your diverse systems across the organization. We empower businesses to effectively collect, beautifully visualize, critically analyze, and intelligently interpret data to support organizational goals. Our team ensures good returns on big data technology investments through effective use of the latest data and analytics tools.

Technologies Used

Node.js
React.js
MongoDB Cosmo
Express.js
Apache NiFi
Microsoft Azure Databricks
Oracle EBS
Tricentis Tosca DI