Swapna K

Available to hire

Hi, I’m Swapna, a Senior Data Engineer with over 7 years of experience designing and implementing cloud-native, scalable data solutions across the healthcare, telecom, and finance sectors. I specialize in building real-time data pipelines and machine learning workflows, and in applying GenAI technologies such as LangChain and Azure OpenAI to accelerate business and audit processes. I enjoy solving complex data challenges and delivering actionable insights that drive operational efficiency and regulatory compliance.

I have extensive hands-on expertise with tools and platforms such as Azure, GCP, AWS, PySpark, dbt, and CI/CD pipelines. I take pride in collaborating closely with teams to modernize data platforms, optimize performance, and create intuitive dashboards using Power BI and Microsoft Fabric. I’m passionate about enabling data-driven strategies that help organizations achieve their business and compliance goals faster.



Work Experience

Senior Data Engineer at AMGEN
February 1, 2023 - Present
Led organization-wide adoption of Lakehouse architecture using Microsoft Fabric’s OneLake and Direct Lake, unifying batch and real-time analytics across domains. Built secure Kafka Connect frameworks and streaming pipelines for real-time clinical and financial data intake, improving operational visibility and decision-making. Migrated large historical clinical datasets into BigQuery to enable fast, complex queries supporting regulatory audits and patient safety. Designed and optimized enterprise data warehouse layers in BigQuery and Snowflake to reduce latency and query costs. Developed PySpark workflows on Dataproc to process genomic and patient data at scale while ensuring compliance, and modernized legacy systems by optimizing Teradata SQL. Standardized data formats with Apache Iceberg and Delta Lake for analytics and streaming workloads. Automated ETL workflows with Azure Data Factory, built secure data interfaces with FastAPI to streamline audit platform integration, and enabled secure data sharing with Azure Synapse Analytics. Provisioned governed Lakehouse environments with Terraform and AKS, and implemented Azure Purview for metadata governance and RBAC enforcement alongside Unity Catalog. Built a governed data mesh for decentralized ownership and modeled enterprise ontologies in Palantir Foundry. Established monitoring frameworks and data quality checks with Soda Core, optimized Spark jobs to reduce cloud costs, deployed automated ML pipelines for clinical risk assessment, and constructed GenAI pipelines for document review using LangChain and Azure OpenAI.
Data Engineer at Verizon
December 31, 2022 - July 14, 2025
Designed and built scalable ETL pipelines using Ab Initio and Python to integrate batch and streaming data into a Hadoop data lake. Engineered ingestion pipelines from REST APIs into MongoDB and S3, and automated ingestion of large XML files using AWS Lambda and Step Functions, loading data into Snowflake and Redshift for real-time analytics. Translated Teradata logic into optimized BigQuery views, accelerating reporting and dashboard performance for customer and network teams. Built high-throughput Kafka Streams and PySpark pipelines to process telecom telemetry into Delta Lake for operational analytics. Implemented ACID-compliant Delta Lake on AWS S3 and developed Spark batch and streaming jobs transforming data from MySQL, Oracle, and Kafka. Optimized Snowflake workloads to improve query times, and implemented change data capture with Kafka Connect and AWS DMS to synchronize operational databases. Automated Kafka setup using Terraform and supported CI/CD for Kafka microservices. Created Apache Airflow DAGs to orchestrate hybrid workflows involving Spark, Kafka, and AWS Lambda, improving SLA compliance. Delivered real-time network KPIs via Power BI dashboards to improve operational decision-making.
Data Engineer at IDFC Bank
July 31, 2022 - July 14, 2025
Led data modernization initiatives by designing cloud-native ETL pipelines to enhance data accuracy, accessibility, and decision-making in retail and corporate banking. Built pipelines importing data from AWS S3 into Spark RDDs, applying transformations and loading results into Hive external tables for Tableau dashboards. Developed Spark jobs for data validation and transformation, improving data quality and fraud detection, and used Hive queries for insights. Processed large datasets with Spark, Hive, MapReduce, Sqoop, Pig, and HDFS, and designed scalable workflows on Amazon EMR. Created Apache NiFi pipelines to manage financial data ingestion with provenance tracking and back-pressure handling for resilience, and orchestrated data validation via Airflow DAGs, reducing failures by 38%. Created Python scripts and UDFs for aggregation and querying, and integrated AI-driven data enrichment workflows using AWS AI services. Performed data wrangling with Pandas, built ETL workflows with Informatica, and developed SQL queries and procedures supporting SSRS and Power BI dashboards that track loans, NPAs, and anomalies for executive decisions.
Python Developer at Cybage
May 31, 2020 - July 14, 2025
Involved in all project lifecycle stages, including design, development, deployment, testing, and support. Developed early-stage web applications using Python, XML, and JSON for backend automation, with front-end enhancements in JavaScript and HTML to support dashboards and internal tools. Worked with Python OpenStack APIs, performed numerical analysis with NumPy, and integrated third-party REST and SOAP APIs. Developed and deployed RESTful APIs using Flask and FastAPI for SQL and NoSQL database integration, and optimized ML workflows using NumPy vectorization. Managed large datasets with Pandas and relational databases, improving data cleaning and aggregation. Improved a sentiment analysis model’s F1 score by 15% by fine-tuning neural networks with TensorFlow and Keras. Performed data visualization using Tableau and Python libraries.

Education

Master’s in Information Technology at University of North Texas
January 1, 2015 - December 31, 2015


Industry Experience

Healthcare, Telecommunications, Financial Services, Life Sciences