I am a data engineer with over 3 years of experience owning production ETL/ELT pipelines on Databricks and AWS. I have designed Spark-based data pipelines processing 5M+ records/day across finance and reporting domains with a 99.9% job SLA, implemented Delta Lake schemas, partitioning strategies, and automated data quality controls to ensure reliable analytics workloads. I prioritize production reliability, observability, CI/CD, and cost-aware performance tuning. I enjoy collaborating with business stakeholders to translate reporting requirements into scalable data models and reusable pipelines, and I contribute to on-call rotations, incident RCA, and permanent fixes to reduce repeat issues.

Shaileja Kuthuru

Work Experience

Data Engineer at Citco Group Limited
January 1, 2024 - Present
Owned 6+ production ETL/ELT pipelines on Databricks (Spark + Delta Lake), processing 5M+ records/day across finance and reporting domains with a 99.9% SLA. Designed Delta Lake schemas, partitioning strategies, and incremental MERGE-based upserts to support scalable analytics workloads. Implemented an automated data quality framework (schema validation, null thresholds, reconciliation checks), reducing recurring production defects by ~85%. Tuned Spark SQL workloads (partition pruning, join strategies, caching), reducing average query latency by ~30% (8.0s to 5.6s) and lowering compute cost. Built monitoring and alerting using CloudWatch; participated in the on-call rotation, performed RCA, and delivered permanent fixes to reduce repeat incidents. Standardized pipeline templates and reusable modules, cutting manual effort by ~60% and saving ~120 engineering hours/month. Managed CI/CD using Git + Jenkins, enforcing code reviews and automated testing to improve deployment reliability across environments.
Python Data Engineer at Getida (An SIB Company)
November 1, 2022 - December 1, 2023
Built Python + SQL data pipelines processing 10M+ records across multiple data sources, delivering analytics-ready datasets for finance and operations teams. Automated recurring data preparation and reporting workflows, reducing processing time by ~70% and saving ~15 hours/week of analyst effort. Designed SQL data models and Power BI dashboards supporting 10+ KPIs used in weekly/monthly executive reviews and operational decision-making. Implemented reconciliation logic and audit checks, achieving 99.5% accuracy across multi-source reporting pipelines. Developed an NLP-based sentiment analysis of 50K+ customer reviews, driving product improvements that increased customer satisfaction by ~18%. Partnered with business stakeholders to translate reporting requirements into scalable data models and reusable pipelines.
Python Developer (Product & Insights) at NJR Infotech Private Limited
August 1, 2021 - July 1, 2022
Developed Python-based REST APIs handling 200+ automated requests/day for internal reporting and analytics applications. Built extraction and transformation pipelines using Python and SQL, producing standardized datasets consumed by analysts and business teams. Optimized SQL queries and backend processing, improving report stability and reducing long-running queries. Led code reviews and improved internal documentation, reducing onboarding time and improving long-term maintainability.
Python Intern at Defence Research and Development Organisation (DRDO)
January 1, 2021 - March 1, 2021
Wrote Python scripts to automate cleaning, analysis, and visualization of experimental datasets (500K+ data points), improving analysis efficiency by ~40%. Built KPI dashboards and reporting outputs (Power BI/Tableau) for research tracking and project reviews.
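For illustration, the kinds of checks named in the Citco data quality framework (schema validation, null thresholds, reconciliation checks) can be sketched in plain Python. This is a minimal, hypothetical sketch and not the framework itself. The row format, the column names (`trade_id`, `amount`), the thresholds, and the tolerance are all assumptions. A production Databricks pipeline would more likely run equivalent checks on Spark DataFrames.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical sketch: rows are plain dicts; column names and thresholds are assumptions.
Row = dict[str, Any]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def check_schema(rows: list[Row], expected: dict[str, type]) -> CheckResult:
    """Fail if any row has missing/extra columns or a non-null value of the wrong type."""
    for i, row in enumerate(rows):
        if set(row) != set(expected):
            return CheckResult("schema", False, f"row {i} columns {sorted(row)}")
        for col, typ in expected.items():
            value = row[col]
            if value is not None and not isinstance(value, typ):
                return CheckResult(
                    "schema", False, f"row {i} column {col!r} is {type(value).__name__}"
                )
    return CheckResult("schema", True)


def check_null_threshold(rows: list[Row], column: str, max_null_fraction: float) -> CheckResult:
    """Fail if the share of null values in `column` exceeds `max_null_fraction`."""
    if not rows:
        return CheckResult(f"nulls:{column}", True, "no rows")
    nulls = sum(1 for row in rows if row.get(column) is None)
    fraction = nulls / len(rows)
    return CheckResult(f"nulls:{column}", fraction <= max_null_fraction, f"{fraction:.1%} null")


def check_reconciliation(
    source_total: float, rows: list[Row], amount_col: str, tolerance: float = 0.01
) -> CheckResult:
    """Fail if loaded amounts (nulls counted as 0) differ from the source control total."""
    target_total = sum(row[amount_col] or 0.0 for row in rows)
    diff = abs(source_total - target_total)
    return CheckResult("reconciliation", diff <= tolerance, f"diff={diff:.2f}")


def failed_checks(results: list[CheckResult]) -> list[CheckResult]:
    """Return only the failing checks; a job could block its write or alert on these."""
    return [r for r in results if not r.passed]


# Usage with a tiny hypothetical batch: one of three amounts is null.
rows = [
    {"trade_id": 1, "amount": 100.0},
    {"trade_id": 2, "amount": None},
    {"trade_id": 3, "amount": 50.5},
]
results = [
    check_schema(rows, {"trade_id": int, "amount": float}),
    check_null_threshold(rows, "amount", max_null_fraction=0.05),
    check_reconciliation(150.5, rows, "amount"),
]
for result in failed_checks(results):
    # Only the null-threshold check fails here (1 of 3 amounts is null).
    print(result)
```

Each check is a small function that returns a pass/fail result. This design makes it easy to reuse the same checks across pipelines and to fail a job or raise an alert when any check fails.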

Education

Master of Science in Information Technology at Clark University
August 1, 2022 - December 1, 2023

Qualifications

AWS Certified Solutions Architect – Associate
January 29, 2026 - January 11, 2030
AWS Certified Cloud Practitioner
January 29, 2026 - January 11, 2030
IBM Data Science Professional Certificate
January 29, 2026 - January 11, 2030

Industry Experience

Financial Services, Professional Services, Retail