Hi, I’m Soumya Baddham, a data scientist with 4+ years of experience in machine learning, NLP, time-series forecasting, and GenAI. I enjoy turning complex data into practical solutions that empower teams and drive business outcomes.

I’ve built production pipelines, NLP models, and MLOps workflows, and I love delivering insightful dashboards that make data accessible to all stakeholders. I’m open to new opportunities and to relocating as needed.

Soumya Baddham

Data Scientist, AI Engineer, Data Analyst, +2

Available to hire

Language

English

Fluent

Work Experience

Data Scientist at Morgan Stanley, USA

November 1, 2024 - Present

Led the development of large-scale financial data pipelines (PySpark, AWS Glue, and Athena), restructuring ETL flows and enforcing schema validation to improve intraday data availability by 38% and meet regulatory data quality standards. Built a production-ready Retrieval-Augmented Generation (RAG) workflow with LangChain and GPT-4, reducing analyst research time by 55% and improving the accuracy of investment insights through higher retrieval precision. Developed transformer-based NLP models (RoBERTa) for Named Entity Recognition (NER) to extract counterparty, transaction, and risk-related entities from unstructured reports, raising extraction accuracy from 82% to 94%. Implemented real-time trade-surveillance analytics using AWS Kinesis and TensorFlow, enabling proactive anomaly detection and reducing false alerts by 22%. Strengthened MLOps with model versioning, drift checks, and CI/CD pipelines, shortening release cycles by 40% while maintaining regulatory compliance. Produced intera…

Jr. Data Scientist at Zensar Technologies, India

January 1, 2020 - December 1, 2022

Built supervised ML models (Decision Trees, Random Forests, SVM, Naive Bayes, XGBoost) to predict customer churn and support-ticket patterns, achieving up to 92% accuracy and enabling early retention interventions. Developed time-series forecasting models using ARIMA and Facebook Prophet to project weekly demand, reducing forecast error by 27%. Led exploratory data analysis (EDA) and hypothesis testing on 10M+ transactional records to identify revenue leakage and process gaps, contributing to an 18% improvement in operational efficiency. Created CNN-based image classification pipelines with Keras and PyTorch that achieved 96% precision on defect-detection tasks for a manufacturing client, reducing manual quality checks by 40%. Automated weekly ETL pipelines in Python (NumPy, Pandas), cutting processing time from 3 hours to 20 minutes, and designed Power BI dashboards that reduced manual reporting time by 50%.