Ruchitha Sama

Available to hire

I am a data engineer focused on building scalable, governable analytics and GenAI-powered search on AWS and Databricks. I design end-to-end data pipelines and ML-driven features that empower fast, reliable insights in regulated environments.

Across my roles at Gilead and GlobalData, I improved performance and reliability: ~50% faster document answers, 99.9% DAG reliability, and automated CI/CD with GitHub Actions, all while upholding GxP controls and audit trails.

Experience Level

Expert

Language

English
Fluent

Work Experience

Data Engineer at Gilead Sciences
July 1, 2024 - October 31, 2025
Designed and deployed scalable big data pipelines using AWS Glue, S3, and EMR to process GxP-compliant datasets for high-volume analytics. Shipped Bedrock-powered RAG (Claude 3.7 Sonnet + Titan embeddings; Cohere Re-Rank) on Databricks Vector Search, delivering 50% faster document answers with GxP audit trails. Built PySpark ingestion and indexing with idempotent merges and backfills, ensuring zero data loss and low-latency vector refresh cycles. Automated metadata extraction and implemented hybrid vector search with re-ranking to deliver top-k relevance with minimal end-to-end latency. Integrated the application with the enterprise UI using secure Okta SSO and AD-based RBAC. Standardized CI/CD with GitHub Actions: versioned artifacts, environment gates, smoke tests, and canary rollbacks; reduced deploy time and raised release success from 98.7% to 99.9%. Redesigned Airflow DAGs with SLAs, retries, and alerting; reduced MTTR by 40%, improved DAG success rate to 99.9%, and minimized on-
Data Engineer & Analyst at GlobalData
December 31, 2022 - December 31, 2022
Built and managed automated ETL pipelines to ingest, clean, and load data into Amazon S3 and Redshift. Developed and optimized Spark jobs on AWS EMR, boosting data-processing throughput and reducing pipeline latency. Orchestrated nightly AWS Glue batch workflows with retries and alerting, achieving ~99% success across production runs. Tuned Redshift performance and SQL patterns. Containerized ETL services with Docker and Kubernetes, and automated recurring data validation and deployment tasks. Designed PySpark/SQL workflows with star/snowflake schemas and slowly changing dimensions (SCDs). Delivered self-service BI in Tableau, Power BI, ThoughtSpot, and QuickSight, and automated cost/usage reporting by environment to inform budget allocation.

Education

Master of Science in Business Analytics at UMass Boston
January 1, 2023 - December 31, 2024

Qualifications

AWS Certified Cloud Practitioner
January 11, 2030 - October 31, 2025
Machine Learning - Professional Certification
January 11, 2030 - October 31, 2025
SQL (Advanced) Certificate
January 11, 2030 - October 31, 2025

Industry Experience

Healthcare, Professional Services, Software & Internet