Sai Praveen Vattem

Available to hire

I am an experienced data scientist specializing in optimizing data pipelines and deploying data-driven insights to improve decision-making processes across various industries. I have worked extensively with cloud platforms such as Azure and AWS, and technologies including Airflow, Databricks, and Redshift. I am passionate about leveraging AI and machine learning to create scalable solutions that drive business value.

I enjoy collaborating across cross-functional teams to solve complex problems and continuously upskill through certifications and hands-on projects. Outside of work, I contribute to open source projects focused on fine-tuning large language models and am deeply interested in generative AI and data visualization technologies.

Work Experience

Data Scientist at Data Semantics FZE
July 1, 2025 - August 5, 2025
Optimized data pipelines in Airflow, reducing processing time significantly to enhance real-time analytics. Created analytics reports and proposed new data insights using various BI tools. Managed business requirements and prepared complex business insights as per regulatory and management standards. Developed pipelines for big data support and integrated event-driven real-time streaming data. Created automation flows for data ingestion from external ecosystems and collaborated on machine learning model development and feature extraction for product lines.
Data Scientist at Data Semantics Private Ltd
November 1, 2024 - August 5, 2025
Collaborated with cross-functional teams to design and deploy data-driven insights, significantly improving decision-making processes. Created and optimized analytical reports, designed automation pipelines, and managed data workflows using Azure, AWS, Airflow, Redshift, and other cloud and big data platforms. Developed solutions for large data sets, including real-time streaming and multi-currency cost updates. Customized solutions with plant users in manufacturing environments to improve KPI understanding.
Machine Learning Engineer and Community Builder at Omdena (NGO)
November 1, 2019 - August 5, 2025
Led data transformation initiatives optimizing data pipelines to support real-time analytics, resulting in improved decision-making processes. Created lexicon vectors for text translation, performed data profiling and clustering with advanced NLP techniques, and implemented sentiment analysis models. Developed crop classification models using satellite and image data with machine learning tools. Contributed as a community builder and volunteer on AI projects.
Data Scientist at Data Semantics FZE
December 1, 2024 - Present
Optimized data pipelines in Airflow, significantly reducing processing time to enhance real-time analytics availability within 6 months. Collaborated on creating analytics reports and proposing new insights using Metabase and Redshift queries. Developed Power Automate flows for integrating new data points and managed business requirements ensuring compliance with regulatory standards. Maintained and optimized DAGs for supporting big data from Dynamics F&O and designed logical insights and KPIs to better understand data in Synapse and Databricks. Developed and automated Azure Data Factory pipelines and incorporated real-time streaming data solutions from SAP to Databricks using Event Hub triggers.
Data Scientist at Data Semantics Private Ltd
November 30, 2024 - August 5, 2025
Collaborated with cross-functional teams to design and deploy data-driven insights, significantly enhancing decision-making processes independent of traditional metrics. Led data transformation initiatives optimizing pipelines for real-time analytics, which improved the accuracy and timeliness of decisions. Created pipelines in Azure Data Factory and Databricks to ingest data into platforms like OneLake, supporting incremental and full load automation through CI/CD pipelines. Worked on data transformation, feature engineering, and model selection for multiple product lines at client organizations, optimizing code to handle large monthly data volumes. Delivered analytical solutions integrating SAP transactions with Power BI reporting and real-time streaming.
Machine Learning Engineer and Community Builder at Omdena (NGO)
November 30, 2019 - August 5, 2025
Led data transformation initiatives optimizing data pipelines to support real-time analytics, resulting in improved decision-making processes. Created lexicon vectors to translate tweets from Ebonics to English, and performed topic-level data profiling and clustering for threat detection. Implemented VADER sentiment analysis on network models and created geolocation-based alert notifications for enforcement. Developed a crop classification model using the Inception architecture and labeled datasets with Labelbox to train image classifiers from satellite and Google Maps data.

Education

PG-Diploma in Banking and Finance at Manipal University Bangalore
May 1, 2015 - May 1, 2015
B-Tech Electronics & Communication Engineering at SCSVM University Kancheepuram
March 1, 2009 - March 1, 2013

Qualifications

PMP
July 1, 2025 - August 29, 2025
Azure Data Scientist Associate
June 1, 2022 - June 1, 2026
Certified PySpark Developer from Databricks
October 1, 2021 - August 1, 2022

Industry Experience

Financial Services, Manufacturing, Real Estate & Construction, Software & Internet, Transportation & Logistics