Leveraging TOSCA DI in Data Migration from On-Prem to Azure Databricks

Background: 

A leading American distributor of gasoline embarked on a significant data migration initiative. The project involved transitioning 36 tables, containing substantial volumes of data (approximately 800 million records), from SQL Server (On-Prem) to Azure Databricks (On-Cloud). The goal was to ensure data integrity and quality while addressing data format issues, security, and compliance challenges. 

Challenges: 

The distributor faced several challenges during the migration process: 

  • Data Integrity and Data Loss: Ensuring that no data was lost and that data remained consistent and accurate during the migration. 
  • Data Format Issues: Managing different data formats between the source and target systems. 
  • Data Security and Compliance: Ensuring that the migration met all necessary security and compliance requirements. 
  • Downtime and Business Continuity: Minimizing downtime to ensure business operations were not disrupted. 
  • Skill and Knowledge Gaps: Bridging the gaps in skills and knowledge required for the new cloud platform. 

The Solution: 

To address these challenges, Narwal implemented a comprehensive solution involving Tosca Data Integrity (DI): 

  • Pilot Testing: Conducted pilot testing to identify potential issues before a full-scale migration. 
  • Data Validation: Ensured data accuracy through schema comparison, row count comparison, and checksum/hash totals. 
  • Data Integrity Testing: Performed consistency checks and data quality assessments, including row-by-row comparisons of high-volume tables. 
  • Automation: Implemented Tosca DI to automate test cases for file-file, database-database, file-database, and file-API comparisons, facilitating a more efficient and reliable validation process. 
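The validation checks listed above (schema comparison, row count comparison, checksum/hash totals, and row-by-row comparison) can be illustrated with a short Python sketch. This is not how Tosca DI implements them. It assumes both tables fit in memory as lists of tuples whose first element is a unique primary key, and that values were already normalized so equal data renders identically on both sides (real SQL Server and Databricks types would need mapping first). The function names, column names, and sample rows are hypothetical.

```python
import hashlib
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]
Schema = List[Tuple[str, str]]  # (column name, normalized type)

NULL_MARKER = "<NULL>"  # keeps NULL distinct from an empty string


def row_hash(row: Row) -> str:
    """SHA-256 of one row; a unit separator keeps ('a', 'bc') distinct from ('ab', 'c')."""
    joined = "\x1f".join(NULL_MARKER if v is None else str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_total(rows: List[Row]) -> int:
    """Order-independent checksum: sum of the first 64 bits of each row hash, mod 2**64."""
    return sum(int(row_hash(r)[:16], 16) for r in rows) % (1 << 64)


def compare_tables(src_schema: Schema, tgt_schema: Schema,
                   src_rows: List[Row], tgt_rows: List[Row],
                   key_index: int = 0) -> Dict[str, Any]:
    """Run schema, row-count, hash-total, and keyed row-by-row checks."""
    report: Dict[str, Any] = {
        "schema_match": src_schema == tgt_schema,
        "row_count_match": len(src_rows) == len(tgt_rows),
        "hash_total_match": hash_total(src_rows) == hash_total(tgt_rows),
    }
    # Row-by-row comparison keyed on the primary key column (keys assumed unique).
    src_by_key = {r[key_index]: r for r in src_rows}
    tgt_by_key = {r[key_index]: r for r in tgt_rows}
    report["missing_in_target"] = sorted(src_by_key.keys() - tgt_by_key.keys())
    report["extra_in_target"] = sorted(tgt_by_key.keys() - src_by_key.keys())
    report["mismatched"] = sorted(
        k for k in src_by_key.keys() & tgt_by_key.keys()
        if src_by_key[k] != tgt_by_key[k]
    )
    return report


# Hypothetical sample: key 2 changed, key 3 lost, key 4 unexpected.
schema = [("id", "int"), ("station", "string"), ("amount", "decimal(10,2)")]
source = [(1, "A", "10.00"), (2, "B", "20.00"), (3, "C", "30.00")]
target = [(1, "A", "10.00"), (2, "B", "25.00"), (4, "D", "40.00")]
report = compare_tables(schema, schema, source, target)
print(report)
```

In this sample the row counts match even though the data does not, which is why a count check alone is not enough and the hash total and keyed comparison run alongside it.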

Detailed Implementation: 

  • Data Size and Composition: The pilot table contained approximately 340 million rows and 209 columns. 
  • Chunking Strategy: Because of this volume, the data was split into smaller segments to make row-by-row comparison practical. Queries divided the data into segments of 10 million records each, producing 34 distinct test cases (340 million rows / 10 million rows per segment). 
  • Execution of Test Cases: The 34 test cases were executed on three Tosca Distributed Execution (DEX) machines. 
  • Hardware Specifications: Each DEX machine was equipped with 32 GB RAM, 500 GB disk space, and a quad-core processor, sufficient for handling large data sets and intensive processing tasks. 
  • Execution Time: The total execution time for all 34 test cases amounted to 84 hours. 
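The chunking arithmetic (340 million rows in 10-million-row segments gives 34 test cases) and the spread of those test cases across three machines can be sketched as follows. The case study does not say which column was used to split the data or how test cases were assigned to machines, so this sketch assumes a hypothetical contiguous integer key `row_id` starting at 1, hypothetical table and machine names, and a simple round-robin assignment. Each test case applies the same key-range predicate to the source and the target so that each chunk compares like with like.

```python
from typing import Dict, List, Tuple

TestCase = Tuple[str, str]  # (source query, target query)


def chunk_ranges(total_rows: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split keys 1..total_rows into half-open [start, end) ranges of chunk_size keys."""
    ranges: List[Tuple[int, int]] = []
    start = 1
    while start <= total_rows:
        end = min(start + chunk_size, total_rows + 1)
        ranges.append((start, end))
        start = end
    return ranges


def chunk_test_case(src_table: str, tgt_table: str, key: str,
                    start: int, end: int) -> TestCase:
    """Build one source/target query pair sharing the same key-range predicate."""
    where = f"WHERE {key} >= {start} AND {key} < {end}"
    return (f"SELECT * FROM {src_table} {where}",
            f"SELECT * FROM {tgt_table} {where}")


def assign_round_robin(cases: List[TestCase],
                       machines: List[str]) -> Dict[str, List[TestCase]]:
    """Deal test cases out to the machines in turn."""
    plan: Dict[str, List[TestCase]] = {m: [] for m in machines}
    for i, case in enumerate(cases):
        plan[machines[i % len(machines)]].append(case)
    return plan


# Pilot-table figures from the case study; table, key, and machine names are hypothetical.
ranges = chunk_ranges(340_000_000, 10_000_000)
cases = [chunk_test_case("dbo.pilot_table", "migrated.pilot_table", "row_id", s, e)
         for s, e in ranges]
plan = assign_round_robin(cases, ["dex-1", "dex-2", "dex-3"])
print(len(cases), {m: len(c) for m, c in plan.items()})
```

Key ranges only yield exactly 10 million rows per segment when the key has no gaps; with a sparse key, a `ROW_NUMBER()`-based split would be needed to keep segments evenly sized.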

Outcomes: 

The implementation of Tosca DI and the automated data validation solution resulted in significant benefits for the distributor: 

  • Risk Reduction: Minimized the risk of data loss and integrity issues during migration. 
  • Efficiency: Reduced manual testing time significantly, achieving faster validation with full data coverage. 
  • Cost and Time Savings: Cut maintenance cost and time by 50%. 
  • Business Continuity: Ensured smooth transition and supported ongoing business operations without significant downtime. 
  • Data Quality: Improved data availability and self-service capabilities, making better use of the cloud platform. 

The successful validation and migration of 36 tables (about 800 million records) from SQL Server to Azure Databricks preserved data quality and integrity. This partnership with Narwal demonstrated the power of automation in data validation, freeing the distributor to focus on its core business goals and innovation. 

Contact us today to unlock your business’s full potential and experience the benefits of automated data validation with Narwal and Tosca DI. 