Sainagaraj Kumar Billakuthi

Available to hire

I am a data engineer with over 10 years of experience delivering enterprise-grade data engineering, AI/ML, and Generative AI solutions across financial services, healthcare, telecom, and logistics. I design scalable cloud data platforms on AWS and Azure, building high-performance data pipelines and AI-enabled analytics to empower decision-making and operational efficiency.

I specialize in GenAI use cases such as intelligent search, advisor copilots, and document intelligence, with a strong focus on retrieval-augmented generation (RAG), vector databases, embeddings, and prompt orchestration. I integrate OpenAI APIs and open-source LLMs with robust MLOps, data governance, and security practices to deliver reliable, compliant AI workloads.

Experience Level

Expert

Work Experience

Sr. Data Engineer with AI/ML at Charles Schwab
February 1, 2025 - Present
Led the design and operation of a Generative AI and ML data platform for financial advisors and internal researchers. Built LLM-powered pipelines and a Retrieval-Augmented Generation (RAG) architecture linking vector stores with AWS S3, Redshift, and Databricks to deliver contextual responses for research queries. Implemented scalable ingestion via AWS Glue, S3, and Kinesis; automated batch and incremental processing with Step Functions. Developed ML feature pipelines (Python, PySpark) and SageMaker Feature Store to support predictive analytics. Implemented semantic search with embeddings, chunking, and metadata indexing; integrated OpenAI APIs and open-source LLMs for summarization, Q&A, and document intelligence. Managed prompt orchestration with LangChain/LangGraph, and tracked experiments with MLflow. Established CI/CD for data and ML infra using AWS CodePipeline/CodeBuild; enforced data governance, encryption, IAM, and auditing. Fine-tuned transformer models with Hugging Face for
Sr. Data Engineer with AI/ML at McKesson
January 1, 2024 - November 30, 2024
Delivered an AI-enabled healthcare data platform supporting pharmaceutical supply chain analytics, document intelligence, and operational reporting across supplier and patient service datasets. Ingested structured and unstructured data from EHR, claims, and pharmacy systems; built scalable pipelines using Azure Data Factory and Azure Databricks; applied medallion architecture on Azure Data Lake Gen2. Implemented NLP processing, tokenization, embeddings, and metadata extraction for downstream document intelligence and research analytics. Built RAG retrieval pipelines for contextual search across pharmaceutical documents; prepared prompt-ready datasets and embedding stores for LLM workloads. Integrated OpenAI APIs and open-source LLMs for summarization and Q&A; maintained HIPAA-compliant security and governance. Implemented CI/CD with Azure DevOps; monitored data quality, drift, and model performance; ensured regulatory compliance.
Data Engineer at Tally Group
January 1, 2021 - August 31, 2023
Built a cloud-native Azure data platform to ingest, process, and serve financial, transactional, and operational data from multiple systems. Implemented near real-time reporting via Azure Data Factory and Databricks; applied a medallion architecture (Bronze-Silver-Gold) spanning ingestion, transformation, and provisioning for BI and regulatory reporting. Implemented incremental loading, partitioning, and file optimization for cost efficiency; secured data with Azure Key Vault, managed identities, and RBAC. Integrated Databricks with Azure Synapse for downstream analytics; built streaming ingestion using Azure Event Hubs and Stream Analytics. Developed ML pipelines and feature engineering supporting inference and analytics; monitored model performance and data drift. Created REST APIs and served outputs to reporting platforms; collaborated with data science teams on enterprise AI workloads. Applied CI/CD with Azure DevOps; maintained data lineage and runbooks.
Data Engineer at Deliver4U
May 1, 2017 - December 31, 2020
Led an AWS-based data platform for the logistics domain, ingesting order management, delivery status, driver activity, and customer interaction data from REST, RDS, SFTP, and third-party systems into S3. Built batch and incremental pipelines with AWS Glue (PySpark), optimized Parquet storage, and modeled data in Redshift for operational dashboards and analytics. Implemented CDC-like loading, partitioning, and data quality checks; automated orchestration with Step Functions and CloudWatch. Enforced security with IAM, KMS, and bucket policies; implemented metadata management with Glue Data Catalog. Partnered with analytics teams to translate requirements into scalable data models; built CI/CD pipelines using Git and maintained runbooks and documentation. Enabled audit trails and data reconciliation for compliance and regulatory reporting.
ETL Developer at Bharti Airtel
June 1, 2014 - February 29, 2016
Delivered telecom data integration for customer usage, billing, and recharge reporting. Extracted data from Oracle and flat files, transformed it with business rules, and loaded it into the data warehouse to support daily operational and management reporting. Built Informatica mappings, sessions, and workflows; implemented data validation, lookups, filtering, and aggregations. Performed unit testing with SQL queries; resolved data load issues and maintained ETL job schedules. Documented data flows and runbooks; adhered to SDLC, change management, and deployment standards.

Industry Experience

Financial Services, Healthcare, Telecommunications, Transportation & Logistics, Professional Services