Available to hire
I am a seasoned Data Engineer with 5+ years of experience delivering enterprise-grade data solutions across GCP, AWS, and Azure. I design, build, and optimize scalable data ecosystems to enable seamless ingestion, transformation, and analytics for fraud detection, healthcare, and financial services.
I specialize in real-time and batch data processing, ETL/ELT pipelines, and ML deployments using Vertex AI, SageMaker, and Azure ML. I enjoy collaborating with cross-functional teams to translate complex requirements into robust, cost-efficient data platforms while ensuring governance and security.
Skills
Language
Afar - Advanced
Javanese - Intermediate
Work Experience
Data Scientist/Engineer at US Bank
January 1, 2024 - Present
Designed and implemented scalable ETL pipelines on AWS using Glue and PySpark, migrating legacy workflows to cloud platforms and improving performance by over 40%. Collaborated with credit risk analysts to develop SQL transformations in Snowflake and Redshift. Built real-time ingestion frameworks using Kafka and Spark Streaming targeting fraud detection. Developed custom LLMs and retrieval-augmented generation pipelines utilizing Hugging Face Transformers and Azure ML to support biomedical and regulatory document workflows. Delivered interactive dashboards with Streamlit and Power BI for fraud signal monitoring. Managed workflows with Apache Airflow and automated infrastructure provisioning using Terraform. Tuned and orchestrated batch and streaming data pipelines using Databricks, Kafka, and Snowflake stacks. Partnered with data governance teams to align data models with MISMO financial standards and established data contracts with strong compliance and observability practices.
Data Engineer/Scientist at Kaiser Permanente
December 31, 2023 - September 4, 2025
Developed and maintained ETL pipelines for clinical, claims, and patient data using PySpark, SQL, and Informatica, integrating diverse data sources into Teradata. Implemented data ingestion pipelines with Apache NiFi and Kafka supporting near-real-time processing for healthcare operations. Created clinician-focused dashboards using Power BI and Streamlit, facilitated GDPR- and HIPAA-compliant transformations and data masking in Snowflake, and supported data governance activities. Migrated legacy SQL Server jobs to Spark for scalability. Tuned SQL and Spark transformations and contributed to pandemic analytics dashboards using Tableau. Automated data quality monitoring and participated in agile teams to deliver clinical data products while ensuring SOX compliance.
Data Engineer/Scientist at CONTUS Tech
February 28, 2022 - September 4, 2025
Built scalable ETL pipelines using Apache NiFi, PySpark, and SQL for e-commerce data ingestion and transformation. Designed Hive data warehouses with optimized partitioning and bucketing. Implemented real-time ingestion using Kafka and CDC with Sqoop and Kafka Connect. Developed Python data quality validation frameworks and automated reconciliation scripts. Supported churn prediction model preparation through feature engineering. Contributed to machine learning inference pipeline deployment using Flask on AWS EC2. Developed data lineage with Apache Atlas and participated in sprint planning and code reviews within Agile frameworks.
Data Engineer at US Bank
January 1, 2024 - November 4, 2025
Architected and deployed enterprise-scale ETL/ELT pipelines using GCP Dataflow, Pub/Sub, and BigQuery to enable near real-time ingestion and processing of fraud and transactional data. Migrated legacy Teradata workloads to BigQuery and Cloud Storage, improving scalability and reducing refresh cycles by 40%. Developed PySpark transformations and SQL data models to support high-volume analytics for fraud detection. Automated 150+ daily data workflows in Airflow/Cloud Composer with robust error handling. Optimized BigQuery performance through partitioning, clustering, and caching, cutting query costs by 25%. Implemented Vertex AI fraud models with monitoring via Cloud Logging. Enabled data quality via Great Expectations in Airflow. Provided dataset provisioning through Terraform with strict IAM controls. Implemented data lineage with Dataplex and Data Catalog, and secure key management with Secret Manager. Deployed containerized ETL components on GKE and built Looker Studio dashboards.
Data Engineer at Kaiser Permanente
December 1, 2023 - December 1, 2023
Engineered large-scale ETL pipelines with AWS Glue and PySpark processing 10+ TB daily. Implemented Change Data Capture with AWS DMS, replicating data from Oracle/SQL Server into Redshift and S3 for near real-time analytics. Deployed ML pipelines in SageMaker improving hospital readmission precision by 18%. Automated metadata discovery with Glue Crawlers and Athena. Built event-driven, serverless pipelines using Lambda and Step Functions. Enforced HIPAA-compliant data handling with IAM/KMS RBAC. Created Power BI dashboards reflecting patient readmission rates. Migrated static ETL workloads to Glue workflows; optimized Redshift with distribution keys, sort keys, and caching, reducing query times by 45%. Implemented AIOps monitoring via CloudWatch/Prometheus/Slack. Built streaming data pipelines (Kafka → Kinesis → S3 → Redshift) for population health analytics. Implemented schema evolution and lineage tracking using Python scripts with Glue Data Catalog APIs. Developed FastAPI microservices.
Data Engineer at CONTUS Tech
February 1, 2022 - February 1, 2022
Designed and developed ETL pipelines with Azure Data Factory and Databricks, integrating multi-source e-commerce, customer, and transactional data into a centralized analytics platform. Built and maintained Synapse Analytics data warehouse enabling cross-department analytics. Developed real-time streaming pipelines using Azure Event Hub → Databricks → Data Lake. Implemented CDC with Delta Live Tables. Created PySpark transformations in Databricks for data cleansing and enrichment. Deployed interactive Power BI dashboards tracking customer engagement, churn prediction, and operational KPIs. Implemented metadata-driven automation frameworks generating ADF pipelines, reducing onboarding time. Integrated Databricks, Synapse, and Power BI into unified analytics. Implemented governance with Azure Purview; secure access via AAD and Key Vault; data masking. Performance tuning in ADF/Databricks with parallel copy, partition pruning, and parquet compression, cutting runtimes by 40%.
Education
at Vidya Jyothi Institute of Technology
January 11, 2030 - September 4, 2025

Master's in Information Science at Concordia University, St. Paul
January 1, 2023 - January 1, 2024

Qualifications
Industry Experience
Financial Services, Healthcare, Life Sciences, Retail, Software & Internet, Professional Services