Available to hire
Hi, I am Subhaj Sapkota, a Data Engineer and Data Analyst with a track record of building and optimizing large-scale data pipelines, real-time streaming solutions, and cloud analytics across AWS, Azure, and GCP. I design scalable data models, improve ETL performance, and deliver actionable insights for finance, aviation, and digital payments.
I enjoy collaborating with cross-functional teams of engineers, data scientists, and business stakeholders to translate requirements into analytics workflows, deploy ML-enabled insights, and deliver dashboards that support decision-making.
Experience Level
Expert: 18 skills
Intermediate: 1 skill
Language
English
Fluent
Work Experience
Data Engineer / Data Analyst at PayPal
September 1, 2023 - Present
Collaborated with business stakeholders to translate analytical requirements into scalable data solutions that improved customer insights, operational reporting, and decision-making. Worked with cross-functional teams (data engineers, data scientists, ML engineers, analysts) to design analytics workflows for predictive modeling, customer segmentation, anomaly detection, and performance tracking. Analyzed high-volume transactional data using Spark, Azure Databricks, and Spark-SQL to deliver real-time customer behavior insights. Supported Azure Data Factory pipeline optimization, contributing to a 50% reduction in ETL processing time for financial and operational datasets. Implemented real-time insights pipelines with Spark Streaming and Kafka on Azure HDInsight to support fraud detection and network anomaly monitoring. Improved Snowflake OLAP/OLTP models, increasing query performance by 45% and accelerating churn, engagement, and revenue analytics. Automated ML retraining pipelines with
Data Engineer / Data Analyst at American Airlines
October 1, 2021 - August 31, 2023
Collaborated with stakeholders to gather requirements and design scalable data solutions for flight operations, customer analytics, and revenue management. Developed Spark applications (Spark-SQL) on AWS EMR to process high-volume transactional data, enabling real-time insights into customer behavior and flight performance. Implemented dbt and efficient data structures in Spark and Snowflake to streamline ETL processes and enhance data retrieval speeds. Built real-time data streaming pipelines using Spark Streaming and Kafka on AWS MSK, enabling event-driven architectures for flight delay prediction and baggage tracking. Optimized BigQuery data models on GCP for customer analytics dashboards, boosting query performance. Migrated legacy MapReduce programs to Spark (Scala/PySpark), reducing processing time for flight logs. Automated machine learning model retraining using Apache Airflow on GCP, streamlining the deployment of predictive models for flight demand forecasting and dynamic pri
Data Engineer at Visa
July 1, 2020 - September 30, 2021
Gathered and analyzed business requirements to design scalable data solutions for processing high-volume payment transactions in line with global security and compliance standards. Developed and maintained ETL pipelines using Spark, PySpark, and Scala, optimizing transaction data processing and overall pipeline efficiency. Built and optimized real-time fraud detection pipelines using Kafka, Spark Streaming, and Flink to support risk scoring across payment networks. Designed AWS-based data pipelines leveraging Lambda, API Gateway, S3, and DynamoDB to automate event-driven fraud monitoring. Integrated merchant, issuer, and cardholder datasets into Lambda and Databricks for real-time financial reporting. Automated AML and compliance monitoring workflows using Airflow, Snowflake, and SQL. Optimized enterprise data warehouse schemas (Star and Snowflake) to reduce query latency for analysts. Created interactive dashboards with Looker and Power BI for finance and risk teams. Migrated OBIEE re
Data Engineer/Data Analyst at PayPal
September 1, 2023 - Present
Collaborated with business stakeholders to translate analytical requirements into scalable data solutions that improved customer insights, operational reporting, and decision-making. Analyzed high-volume transactional data with Spark, Azure Databricks, and Spark-SQL to deliver real-time customer behavior insights. Optimized Azure Data Factory pipelines, contributing to a 50% reduction in ETL processing time for financial and operational datasets. Implemented real-time fraud detection pipelines using Spark Streaming and Kafka on Azure HDInsight, supported automated ML model retraining with Apache Airflow, and enhanced Snowflake OLAP/OLTP models to boost query performance by 45% for churn and revenue analytics. Migrated legacy MapReduce workloads to Spark/Scala, enabling faster network log analysis. Built real-time dashboards with Power BI and migrated OBIEE reports to Power BI to enable self-service analytics. Contributed to automated ML workflows using AWS Step Functions and containeri
Data Engineer/Data Analyst at American Airlines
October 1, 2021 - August 1, 2023
Collaborated with stakeholders to design scalable data solutions for flight operations, customer analytics, and revenue management. Developed Spark applications in AWS EMR to process high-volume booking and customer data, enabling real-time insights into flight performance and customer behavior. Implemented dbt and optimized Spark and Snowflake data structures to streamline ETL and data retrieval. Designed data pipelines using GCP Dataflow, reducing ETL execution time by 48%. Built real-time streaming pipelines with Spark Streaming and Kafka on AWS MSK for flight delay prediction and baggage tracking. Optimized BigQuery data models for customer analytics dashboards, improving query performance by 42%. Migrated legacy MapReduce programs to Spark, cutting processing time by 55%. Automated ML retraining with Airflow on GCP for demand forecasting and dynamic pricing. Deployed TensorFlow/Scikit-learn models on AWS SageMaker for predictive maintenance; created Tableau dashboards and migrated
Education
Bachelor of Arts in Econometrics at Knox College
January 11, 2030 - April 9, 2026
Qualifications
Microsoft Certified: Azure Data Engineer Associate
January 11, 2030 - April 9, 2026
Industry Experience
Financial Services, Transportation & Logistics, Travel & Hospitality, Software & Internet, Professional Services, Other