I am a senior AI/ML Data Engineer who designs and delivers scalable data and ML platforms across healthcare, finance, and retail. I specialize in end-to-end LLM-based NLP and RAG systems, using models like BERT, BioBERT, Flan-T5, GPT-4, and LLaMA-2, together with LangChain, Pinecone, and FAISS to enable high-precision semantic search, summarization, and context-aware retrieval. My work spans data engineering, MLOps, and governance to produce reliable, production-grade AI solutions that inform strategic decisions. I enjoy architecting lakehouses and modern data ecosystems across AWS, Azure, and GCP, ensuring reproducible training, real-time inference, and continuous monitoring of data quality and embedding performance. I thrive collaborating with cross-functional teams to operationalize AI responsibly, maintain compliance, and deliver actionable insights through dashboards and ML-driven metrics.

Meghana Reddy Ganugapenta

Available to hire

Experience Level

Expert

Language

English
Fluent

Work Experience

Sr. AI/ML Data Engineer at Centene Corporation
September 1, 2023 - Present
Designed and built scalable ETL/ELT pipelines using Azure Data Factory, Databricks, and PySpark to ingest healthcare data from FHIR/HL7 APIs, lab systems, databases, and IoT sources. Automated training-data pipelines for LLM and RAG workflows, including chunking, embedding generation, metadata enrichment, and dataset versioning. Implemented semantic search and entity-linking platforms with Pinecone, Azure Cognitive Search, and FAISS for cross-dataset medical entity retrieval. Applied NLP and transformer models (BERT, BioBERT, ClinicalBERT, DeBERTa-v3) to extract clinical entities and relationships from unstructured text. Developed generative AI solutions with GPT-4 and AWS Bedrock (LLaMA-2, Claude) for summarization and clinical reporting. Standardized RAG orchestration and observability with LangGraph, LangSmith, and MCP, improving prompt tuning and explainability. Implemented LLM safety/governance controls including structured prompting, validation tests, and embedding-based consist
Sr. Data Engineer (ML) at State of California
June 1, 2022 - August 31, 2023
Designed and operated cloud-native data pipelines on GCP using BigQuery, Dataflow, and Cloud Composer to consolidate housing, health, and community data. Implemented event-driven ingestion with Pub/Sub and Cloud Functions for near real-time synchronization across agency platforms. Standardized and transformed complex datasets using MongoDB aggregations and PostgreSQL, flattening nested JSON. Established secure, schema-governed workflows with RBAC, ensuring HIPAA compliance. Enabled end-to-end observability and lineage via Cloud Monitoring, Data Catalog, and MLflow; tracked pipeline latency, schema changes, and health metrics. Orchestrated ETL and ML workloads with Airflow/Cloud Composer; automated dependency management, refresh cycles, and retraining triggers. Built curated analytical layers in BigQuery with partitioning and clustering for cost-effective queries. Enforced data quality validation with Great Expectations and Python checks. Developed predictive modeling pipelines with Ten
Cloud Data Engineer at Safeway
December 1, 2020 - May 31, 2022
Modernized legacy on-prem and Oracle data platforms by migrating to Azure Data Lake Gen2 and Azure Synapse, enabling elastic scalability and governance. Built high-performance lakehouse pipelines with Delta Lake, Databricks, and Azure Data Factory, leveraging Photon acceleration and Unity Catalog for optimized processing and access control. Implemented event-driven data workflows with Azure Functions and Event Grid; automated downstream refreshes and alerts. Established batch and streaming ingestion using Apache NiFi, Kafka, Azure Event Hubs, and Databricks Structured Streaming. Organized Bronze/Silver/Gold lakehouse layers with Delta Lake, applying Z-ordering, partitioning, and time travel for analytics and auditability. Tuned PySpark and SQL workloads to reduce compute costs. Developed feature-engineering pipelines for forecasting, demand planning, and churn modeling. Applied time-series modeling in Python and PySpark to improve forecast accuracy. Managed ML experiments with MLflow;
Data Engineer at Edward Jones
January 1, 2019 - November 30, 2020
Modernized Hadoop and on-premises data platforms by transitioning to a hybrid cloud across AWS and Oracle Cloud; migrated Spark/EMR workloads for improved resilience and performance. Maintained the Cloudera Hadoop stack while progressively moving workloads to Spark on EMR. Replaced Oozie with Airflow; implemented near real-time eventing via Kafka. Consolidated large Hive workloads into Spark-based pipelines and built reusable Spark SQL transformation layers with audit fields and lineage metadata for compliance. Implemented data masking and fine-grained access controls using IAM and Redshift security features. Partnered with security, infra, and audit teams to ensure governance and IAM consistency across environments. Containerized data processing services with Docker and deployed to EKS for scalable execution. Delivered Power BI reporting for fraud detection and transaction trends; produced audit-ready architecture and lineage documentation for governance.
Associate Python Developer at Dell Technologies
July 1, 2016 - November 30, 2018
Performed exploratory data analysis (EDA) to detect data quality issues and improve data integrity. Enhanced ETL workflows using SQL and Hive across Oracle and SQL Server; enabled cross-functional analytics delivery. Optimized SQL Server/Oracle queries via indexing and partitioning; reduced query times. Conducted time-series and large-scale data analysis to improve forecast accuracy. Built interactive BI dashboards in Tableau and Excel; automated Excel-based workflows with VBA and piloted Airflow. Implemented IAM and CloudTrail for security and audit traceability. Established data governance foundations with data dictionaries and metadata standards; supported end-to-end ETL and reporting releases in a Waterfall model. Containerized data processing services with Docker and deployed to Kubernetes for scalable execution.

Education

Bachelor of Technology in Electronics and Communication Engineering at SVCN, Nellore, India
January 11, 2030 - January 1, 2016

Industry Experience

Healthcare, Financial Services, Retail, Government, Education