

Ruchitha Sama

Data Scientist, Data Analyst, Full Stack Developer, +2






Available to hire

I am a data engineer focused on building scalable, governable analytics and GenAI-powered search on AWS and Databricks. I design end-to-end data pipelines and ML-driven features that deliver fast, reliable insights in regulated environments.

Across my roles at Gilead and GlobalData, I improved performance and reliability: ~50% faster document answers, 99.9% DAG reliability, and automated CI/CD with GitHub Actions, all while upholding GxP controls and audit trails.

Skills

Experience Level

Expert


Language

English

Fluent

Work Experience

Data Engineer at Gilead Sciences

July 1, 2024 - October 31, 2025

Designed and deployed scalable big data pipelines using AWS Glue, S3, and EMR to process GxP-compliant datasets for high-volume analytics. Shipped Bedrock-powered RAG (Claude 3.7 Sonnet + Titan embeddings; Cohere Re-Rank) on Databricks Vector Search, delivering 50% faster document answers with GxP audit trails. Built PySpark ingestion and indexing with idempotent merges and backfills, ensuring zero data loss and low-latency vector refresh cycles. Automated metadata extraction and implemented hybrid vector search with re-ranking to deliver top-k relevance with minimal end-to-end latency. Integrated the application with the enterprise UI using secure Okta SSO and AD-based RBAC. Standardized CI/CD with GitHub Actions: versioned artifacts, environment gates, smoke tests, and canary rollbacks; reduced deploy time and raised release success from 98.7% to 99.9%. Redesigned Airflow DAGs with SLAs, retries, and alerting; reduced MTTR by 40%, improved DAG success rate to 99.9%, and minimized on-

Data Engineer & Analyst at GlobalData

December 31, 2022 - December 31, 2022

Built and managed automated ETL pipelines to ingest, clean, and load data into Amazon S3 and Redshift. Developed and optimized Spark jobs on AWS EMR, boosting data-processing throughput and reducing pipeline latency. Orchestrated nightly AWS Glue batch workflows with retries and alerting, achieving ~99% success across production runs. Tuned Redshift performance and SQL patterns; containerized ETL services with Docker and Kubernetes; automated recurring data validation and deployment tasks; designed PySpark/SQL workflows with star/snowflake schemas and SCDs. Delivered self-service BI in Tableau, Power BI, ThoughtSpot, and QuickSight; automated cost/usage reporting by environment to inform budget allocation.