From Documents to Rows: Our Journey Migrating MongoDB to SQL Server in AWS

When I received a MongoDB to SQL Server migration requirement in 2019, I had to pause. This wasn't a typical relational-to-relational migration; it meant transforming flexible documents into rigid, normalized tables. Here's how I bridged two completely different data paradigms using Talend and what I learned about heterogeneous database migrations.

Aug 2, 2025

Loading 65 Million Records into Cosmos DB: A Weekend Data Migration Journey

Migrating 65 million records into Azure Cosmos DB seemed impossible with our 1,000 RU/s limit. Through strategic planning, temporary scaling to 10,000 RU/s, and processing in 15 batches with Azure Databricks, we completed the migration in 30 hours over a weekend, achieving 100% data integrity while maintaining security and cost efficiency.

Jul 31, 2025

Automate Data Security: Azure Logic Apps for SFTP Uploads

In the digital age, protecting data at every stage is essential, particularly for organizations handling sensitive or regulated information. One crucial aspect of data security is ensuring that files entering an organization's system are safe from malware and other threats. Automated file scanning at the point of entry is a robust strategy that can secure...

Jan 12, 2025

Optimize Delta Lake Storage with VACUUM Command

As a data engineer managing batch file processing with Databricks, I recently encountered a storage issue that many teams face: rapidly increasing storage volume. In this blog, I'll share the challenge I faced with my Delta Lake storage, how I resolved it, and the benefits I gained by implementing Databricks' VACUUM command to manage storage...

Nov 2, 2024

Optimizing Parallel Data Loads to Delta Lake: A Concurrency Issue Solution

The data lake architecture uses SFTP for data uploads from multiple customers, which requires loading files into Delta Lake in parallel. Concurrency issues arose during merge operations, primarily because of simultaneous updates. The team partitioned the table by Customer ID and added retry logic to mitigate conflicts, and plans a future upgrade to Databricks Runtime 15.4.

Sep 14, 2024
