Latest posts
09 February 2024

This post introduces SageMaker as a versatile AWS service for tasks like building data pipelines and deploying machine learning models. It addresses common confusion by explaining how to write pipeline definitions and deploy them into your SageMaker domain using AWS CDK.

16 December 2023

This series of blog posts aims to demystify the associated terminology and concepts, providing a comprehensive guide for individuals looking to comprehend and leverage these powerful models in their projects.

13 August 2023

This article explores the importance of data lineage, which tracks the flow and transformations of data from source to destination, playing a vital role in ensuring data integrity and transparency in data processes.

20 June 2023

In this blog, we explore how to ensure data quality in a Spark Scala ETL (Extract, Transform, Load) job. To achieve this, we leverage Deequ, an open-source library, to define and enforce various data quality checks.