Available to hire
I am Sourav Roy, a data engineering professional with over 7 years of experience driving data-driven transformations across financial services, retail, and technology. I specialize in architecting cloud data infrastructures, leading cross-functional teams, and delivering solutions in fast-paced Agile environments.
I have hands-on experience with Generative AI, RAG-based retrieval, vector databases, and embedding pipelines, and I excel at integrating AI capabilities into enterprise-grade cloud platforms (GCP, Azure) to boost operational efficiency and cut costs.
Skills
Language
English
Fluent
Work Experience
Senior Data Engineer at Scotiabank
September 1, 2024 - November 10, 2025
Led migration of MarTech data pipelines, including social media and Adobe Analytics data, from EDL to GCP; rebuilt them using Dataflow, Pub/Sub, GCS, Cloud Run, Cloud Composer (Airflow), dbt, and BigQuery, enabling real-time campaign performance tracking across 5 business units. Built a metadata-driven ingestion framework using PySpark, Airflow, and GCS to integrate 100+ data sources into a centralized BigQuery model; automated schema onboarding, reduced manual config by 80%, improved SLA compliance, and enabled data-driven decision making. Developed scalable solutions for processing and analyzing 13 million records on Hadoop using Spark and Azure services. Created Power BI dashboards by optimizing source queries across Hive, Trino, MS SQL, SharePoint, PostgreSQL, and Azure Synapse. Implemented embeddings for RAG-based AI applications and collaborated on LLM-powered document retrieval using embeddings and vector search with models like OpenAI GPT and Copilot.
Data Engineer at Arap Technologies
August 31, 2024 - August 31, 2024
Architected enterprise-scale data models in Snowflake, applying dimensional modeling with Star and Snowflake schemas to streamline reporting and reduce data processing time by 50%. Designed BI solutions with Power BI, orchestrating data integration across 7+ platforms and automating reporting workflows for 500+ users. Built automated data quality monitoring using dbt tests achieving 99.9% data accuracy across 50+ datasets. Implemented SCD Type 2 for 15+ data sources to preserve historical integrity.
Data Engineer at Scotiabank
April 30, 2024 - April 30, 2024
Engineered end-to-end data pipelines for Global Wealth data operations using Talend, Azure Data Factory, and Databricks, processing millions of customer interactions daily to enable real-time analytics in Synapse Analytics and improve operational insights by 60%. Optimized large-scale data processing with PySpark, applying advanced partitioning to cut cluster processing time by 50%. Migrated corporate product data pipelines to GCP (Cloud Composer, BigQuery, dbt) to optimize performance. Designed scalable pipelines across on-prem, hybrid, and cloud platforms using Python, PySpark, Airflow, Presto, and Cloud Storage.
Data Analyst at MCAP
April 1, 2022 - April 1, 2022
Developed data and reporting infrastructure using Tableau and SQL to provide real-time insights into product, marketing funnels, and KPIs. Performed exploratory data analysis with Pandas on 6.9 million mortgage markers, improving high-risk profile identification by 18%. Optimized hyperparameters with GridSearchCV, boosting model performance by 12% for SVM, Random Forest, and Logistic Regression; applied KNN, DBSCAN, and GMM for customer segmentation to enhance ad targeting.
BI Consultant at Arap Technologies
July 31, 2020 - July 31, 2020
Designed and deployed Azure Synapse Analytics environments for data warehousing and analytics; optimized query performance and data retrieval. Streamlined ETL using Microsoft SQL Server and SSIS, achieving significant time reductions. Automated report generation with SSRS, reducing manual effort by over 56% and ensuring timely data delivery.
BI Consultant at River Cree Resort & Casino
March 31, 2018 - March 31, 2018
Developed reports with SSRS in Visual Studio and SSMS; identified and optimized slow SQL queries and ETL processes to improve data processing speed. Built ETL processes to transform and load data into Synapse Analytics for seamless integration.
Education
Master of Management Analytics (Advanced Analytics and Data Science) at Queen's University
September 1, 2020 - December 1, 2021
Qualifications
Microsoft Certified: Azure Data Engineer Associate
January 11, 2030 - November 10, 2025
Microsoft Azure AI Fundamentals
January 11, 2030 - November 10, 2025
Microsoft Certified: Fabric Data Engineer Associate
January 11, 2030 - November 10, 2025
Databricks Certified Data Engineer Professional
January 11, 2030 - November 10, 2025
Industry Experience
Financial Services, Software & Internet, Professional Services, Media & Entertainment, Retail