Sayali Deshmukh

AI Engineer, Developer, Data Scientist





Available to hire

I’m Sayali Deshmukh, an AI/ML Engineer with over 4 years of experience architecting and deploying end-to-end machine learning systems, large language models, and real-time AI services across enterprise and consumer platforms. I specialize in transformer-based architectures (LLaMA 2/3, GPT, CLIP), NLP, and recommender systems, with a focus on responsible AI practices and production-grade deployments across cloud, multi-cloud, and edge environments. I thrive in fast-paced Agile teams, continuously optimizing models and pipelines to deliver measurable business impact.

I have a proven track record of building production-ready solutions using PyTorch, TensorFlow, and Hugging Face Transformers, implementing CI/CD automation, and delivering scalable AI services. My work spans in-database inference acceleration, retrieval-augmented generation, feature engineering pipelines, and real-time inference, with strong emphasis on governance, data privacy, and regulatory compliance.

Skills

Experience Level

Expert

Language

English: Fluent

Hindi: Advanced

Marathi (Marāṭhī): Advanced

Work Experience

AI/ML Engineer at Snowflake

November 1, 2023 - November 12, 2025

Contributed to the development and deployment of ML models (classification, NLP, recommender systems) in Snowpark ML, improving in-database inference efficiency by 32%. Built feature engineering workflows processing 50+ TB daily using Snowflake Data Cloud. Assisted in building retrieval-augmented generation pipelines with Cortex AI embeddings, reducing hallucinations by 27%. Implemented end-to-end ETL → training → serving pipelines with Airflow and dbt. Integrated Hugging Face transformers for custom LLM fine-tuning with data governance. Developed real-time inference services on Kubernetes + Docker with sub-100ms latency. Supported ML experiment tracking and governance with MLflow and Snowflake dashboards. Built Streamlit dashboards inside Snowflake for 300+ users. Contributed to MLOps with Terraform, Argo CD, and Prometheus-Grafana. Collaborated to ingest multi-cloud data sources, reducing integration costs by 35%. Enhanced LLM summarization and chat features. Productized ML solutions via Snowflake

AI/ML Engineer at Snowflake

November 1, 2023 - Present

Contributed to development and deployment of ML models (classification, NLP, and recommender systems) in Snowpark ML, improving in-database inference efficiency by 32% and reducing reliance on external data pipelines. Built feature engineering workflows in Snowflake Data Cloud using SQL, Python, and Snowpark APIs, enabling processing of 50+ TB of structured/unstructured data daily for ML use cases. Assisted in building retrieval-augmented generation (RAG) pipelines with Cortex AI embeddings and vector search, reducing hallucination rates in enterprise AI applications by 27%. Implemented ETL → training → serving pipelines with Airflow and dbt, improving reproducibility and reducing pipeline latency by 40%. Integrated Hugging Face transformers into Snowflake ML workflows for custom LLM fine-tuning, supporting governance and data compliance. Built real-time inference services using PyTorch, TensorFlow, and Snowpark ML deployed on Kubernetes + Docker, achieving sub-100ms response times

AI/ML Engineer at Snowflake, CA, USA

November 1, 2023 - Present

Contributed to the development and deployment of ML models including classification, NLP, and recommender systems in Snowpark ML, improving inference efficiency by 32%. Developed feature engineering workflows processing 50+ TB of data daily. Built retrieval-augmented generation pipelines reducing hallucination rates by 27%. Implemented ETL, training, and serving pipelines with Airflow and dbt, reducing latency by 40%. Integrated Hugging Face transformers for custom LLM fine-tuning ensuring data governance. Developed real-time AI inference services with sub-100ms response time on Kubernetes and Docker. Supported experiment tracking and model governance with MLflow, improving deployment velocity by 25%. Created interactive AI dashboards with Streamlit for 300+ users and contributed to MLOps workflows ensuring 99.99% uptime. Collaborated to unify multi-cloud data sources, lowering integration costs by 35%. Enhanced LLM-powered summarization and chatbot features, improving customer support

AI/ML Engineer at IBM

March 31, 2022 - March 31, 2022

Developed scalable ML models for classification, forecasting, and anomaly detection using scikit-learn, XGBoost, and LightGBM, improving operational risk scoring accuracy by 31% for financial services and ERP clients. Built enterprise NLP systems using spaCy, BERT, NLTK, and Hugging Face Transformers for NER, text classification, and document parsing. Developed multilingual sentiment analysis for English, Hindi, and Marathi. Engineered ETL pipelines using Spark, PySpark, Presto, and Hive with DVC and Airflow. Deployed ML services with Docker, TorchServe, and Triton, managed on Kubernetes clusters across AWS SageMaker, Azure ML Studio, and GCP Vertex AI. Applied quantization and TorchScript to accelerate models. Implemented MLOps with MLflow, GitHub Actions, and Jenkins. Monitored model performance with Prometheus, Grafana, and dashboards. Embedded Responsible AI (SHAP, LIME, Fairlearn) and collaborated on GDPR, RBI, KYC, and AML compliance. Worked in Agile/Scrum teams with cross-functional stakeholders

AI/ML Engineer at IBM

March 1, 2022 - October 8, 2025

Developed scalable ML models for classification, forecasting, and anomaly detection using scikit-learn, XGBoost, and LightGBM, improving operational risk scoring accuracy across financial services and ERP clients. Designed enterprise NLP systems using spaCy, BERT, NLTK, and Hugging Face Transformers for NER, text classification, and document parsing to process unstructured reports, contracts, and user feedback. Built multilingual sentiment analysis models integrated into customer service platforms across English, Hindi, and Marathi. Engineered ETL pipelines using Apache Spark, PySpark, Presto, and Hive, integrated with DVC and Airflow for full ML lifecycle tracking. Deployed ML services using Docker, TorchServe, and Triton Inference Server, managed on Kubernetes clusters hosted across AWS SageMaker, Azure ML Studio, and GCP Vertex AI, with real-time inference latency <80ms. Applied quantization (INT8 and 4-bit) and TorchScript optimization to accelerate deep learning models on cloud an

AI/ML Engineer at IBM

March 1, 2022 - October 8, 2025

Developed and deployed scalable ML models for classification, forecasting, and anomaly detection using scikit-learn, XGBoost, and LightGBM. Designed enterprise-grade NLP systems using spaCy, BERT, NLTK, and Hugging Face Transformers for named entity recognition (NER), text classification, and document parsing. Built multilingual sentiment analysis models integrated into customer service platforms, enhancing automation and customer satisfaction tracking across English, Hindi, and Marathi. Engineered ETL pipelines using Apache Spark, PySpark, Presto, and Hive, integrated with DVC and Airflow for full ML lifecycle tracking, leading to a 45% improvement in time-to-deploy. Deployed ML services using Docker, TorchServe, and Triton Inference Server, managed on Kubernetes clusters hosted across AWS SageMaker, Azure ML Studio, and GCP Vertex AI, with real-time inference latency <80ms. Applied quantization (INT8 and 4-bit) and TorchScript optimization to accelerate deep learning models on cloud

AI/ML Engineer at IBM, India

March 1, 2022 - September 4, 2025

Developed scalable ML models for classification, forecasting, and anomaly detection, improving risk scoring accuracy by 31%. Designed enterprise NLP systems for NER, text classification, and document parsing. Built multilingual sentiment analysis models supporting English, Hindi, and Marathi. Engineered ETL pipelines with Apache Spark and integrated DVC and Airflow, improving deployment speed by 45%. Deployed ML services on Kubernetes clusters using Docker, TorchServe, and Triton Inference Server with under 80ms latency. Applied model quantization and TorchScript optimization, boosting performance by 40%. Implemented MLOps pipelines with MLflow and CI/CD tools. Monitored models with Prometheus, Grafana, and Power BI dashboards triggering auto-retraining. Embedded Responsible AI practices with SHAP, LIME, and Fairlearn ensuring regulatory compliance. Collaborated within Agile teams and conducted workshops upskilling 30+ engineers.