Available to hire
I am a data engineer focused on building scalable, governable analytics and GenAI-powered search on AWS and Databricks. I design end-to-end data pipelines and ML-driven features that deliver fast, reliable insights in regulated environments.
Across my roles at Gilead and GlobalData, I improved performance and reliability: ~50% faster document answers, 99.9% DAG reliability, and automated CI/CD with GitHub Actions, all while upholding GxP controls and audit trails.
Skills
Experience Level: Expert
Language
English
Fluent
Work Experience
Data Engineer at Gilead Sciences
July 1, 2024 - October 31, 2025
Designed and deployed scalable big data pipelines using AWS Glue, S3, and EMR to process GxP-compliant datasets for high-volume analytics. Shipped Bedrock-powered RAG (Claude 3.7 Sonnet + Titan embeddings; Cohere Re-Rank) on Databricks Vector Search, delivering 50% faster document answers with GxP audit trails. Built PySpark ingestion and indexing with idempotent merges and backfills, ensuring zero data loss and low-latency vector refresh cycles. Automated metadata extraction and implemented hybrid vector search with re-ranking to deliver top-k relevance with minimal end-to-end latency. Integrated the application with the enterprise UI using secure Okta SSO and AD-based RBAC. Standardized CI/CD with GitHub Actions: versioned artifacts, environment gates, smoke tests, and canary rollbacks; reduced deploy time and raised release success from 98.7% to 99.9%. Redesigned Airflow DAGs with SLAs, retries, and alerting; reduced MTTR by 40%, improved DAG success rate to 99.9%, and minimized on-
Data Engineer & Analyst at GlobalData
December 31, 2022 - December 31, 2022
Built and managed automated ETL pipelines to ingest, clean, and load data into Amazon S3 and Redshift. Developed and optimized Spark jobs on AWS EMR, boosting data-processing throughput and reducing pipeline latency. Orchestrated nightly AWS Glue batch workflows with retries and alerting, achieving ~99% success across production runs. Tuned Redshift performance and SQL patterns; containerized ETL services with Docker and Kubernetes; automated recurring data validation and deployment tasks; designed PySpark/SQL workflows with star/snowflake schemas and SCDs. Delivered self-service BI in Tableau, Power BI, ThoughtSpot, and QuickSight; automated cost/usage reporting by environment to inform budget allocation.
Education
Master of Science in Business Analytics at UMass Boston
January 1, 2023 - December 31, 2024
Qualifications
AWS Certified Cloud Practitioner
January 11, 2030 - October 31, 2025
Machine Learning - Professional Certification
January 11, 2030 - October 31, 2025
SQL (Advanced) Certificate
January 11, 2030 - October 31, 2025
Industry Experience
Healthcare, Professional Services, Software & Internet