After completing my Master's program in Data Science, I've realized that the gap between academic AI and real-world deployment is larger than I anticipated. I've worked with models in controlled environments, but I want to understand the full lifecycle: from messy data to production systems that people actually use.

Dhruv Chaubey

Available to hire

Experience Level

Expert: 10 skills
Intermediate: 5 skills

Work Experience

Data Science Analyst at Rebecca Everlene Trust Company
July 31, 2025 - August 6, 2025
Enabled self-serve analytics for stakeholders by integrating Generative AI-powered natural language querying into Tableau dashboards, significantly improving reporting turnaround times. Developed AI-augmented autonomous agent pipelines for data ingestion, quality assurance, and real-time reporting, increasing executive decision speed by 25%. Designed RAG pipelines using LangChain and vector databases to enhance contextual search for data needs and policy alignment. Automated ETL workflows using SQL and LLM-based agents to ensure scalable, adaptive data transformation across multiple sources.
Data Scientist at Neurotech R3, Inc
May 31, 2024 - August 6, 2025
Built and deployed a robust AWS recommendation system using SageMaker for model training and Lambda for automating real-time data processing. Streamlined MLOps pipelines integrating Jenkins CI/CD with AWS SageMaker and Glue, reducing latency by 15% and ensuring production-grade reliability. Developed comprehensive ML observability dashboards using Prometheus and Grafana to monitor model drift, latency, and feature importance. Configured Lambda functions for triggering models on S3 bucket uploads and saving predictions back to S3.
Research Assistant at NJIT
May 31, 2024 - August 6, 2025
Designed BERT models to predict loan outcomes from credit data, focusing on architecture, experimentation, and extracting financial insights. Developed fine-tuned NLP systems using FinBERT and Longformer models to extract entities and relationships from financial texts, validated with statistical testing.
Data Scientist at IBM
May 31, 2022 - August 6, 2025
Defined project scope and timelines, coordinated stakeholders, and deployed solutions for data engineering projects, resulting in measurable business impact. Engineered PySpark pipelines processing 500 GB/day of risk data and integrated Kafka streams with exactly-once semantics. Designed DBSCAN and Isolation Forest anomaly detectors generating real-time labels for financial transactions, reducing fraud by 12%. Applied CUDA kernel fusion to fraud detection CNNs, improving throughput by 35%. Trained churn prediction and revenue estimation models using LightGBM and XGBoost on user data, deployed via AWS SageMaker with hyperparameter tuning. Developed deep learning models (CNN, RNN, LSTM) for credit risk forecasting using structured transaction and time-series data. Collaborated with data engineers and security teams to build ML registries using MLflow for audit-ready AI deployments.
Data Scientist at Kappium LLC
July 1, 2025 - Present
Designed and deployed production-grade RAG pipelines using LangChain, Pinecone vector stores, and Azure ML-hosted GPT-4 and Llama models; reduced prior-authorization lookup time by 50% and met strict latency SLAs. Built context-aware retrieval across structured and unstructured healthcare data, improving response relevance and downstream decision accuracy. Implemented scalable document ingestion, embedding, and semantic search optimization enabling real-time AI-driven insights for clinical and policy validation. Integrated RAG into FastAPI-based microservices with monitoring and evaluation pipelines (MLflow, Evidently AI), enhancing reliability and reducing incident frequency.
AI Data Analyst at Rebecca Everlene Trust Company
September 1, 2024 - July 1, 2025
Enabled self-serve analytics by integrating GenAI-powered natural-language querying into Tableau dashboards; cut reporting turnaround by 50%. Built AI-augmented data pipelines for ingestion, transformation, and quality validation; accelerated decision-making across risk, portfolio, and policy analytics. Conducted advanced data analysis using SQL and Python, delivering actionable insights for financial datasets.
Data Scientist - Capstone at Neurotechr3, Inc.
February 1, 2024 - May 1, 2024
Designed and implemented a recommendation system integrated with game-based therapy for neurological disorders and cerebral palsy, enabling real-time tracking of patient progress. Delivered predictive insights by automating ETL and inference pipelines for structured and unstructured data.
Teaching Assistant at New Jersey Institute of Technology
September 1, 2023 - May 1, 2024
Mentored 100+ students through personalized instruction and discussions; developed and validated NLP systems (fine-tuned FinBERT, Longformer) to extract entities and relationships from financial texts.
Software Developer at IBM
April 1, 2018 - May 1, 2022
Engineered large-scale data pipelines using PySpark and Kafka processing 500GB+ of data daily; designed ETL backends using SQL, Spark, and AWS to reduce data processing latency by up to 40%. Implemented CI/CD pipelines and MLOps practices using Docker, Kubernetes, and MLflow for scalable deployment and governance of data and ML systems.

Education

Master of Science in Data Science (MSDS) at New Jersey Institute of Technology, NJ
September 1, 2022 - May 1, 2024
Bachelor of Technology in Computer Engineering at Uttar Pradesh Technical University
August 1, 2013 - June 1, 2017

Industry Experience

Financial Services, Healthcare, Retail, Software & Internet, Education, Life Sciences, Professional Services, Media & Entertainment
