Available to hire
I’m a data and AI engineer with a passion for turning complex data into actionable insights. Over the years I have built and scaled end-to-end data pipelines, deployed LLM-powered solutions, and partnered with cross-functional teams to drive enterprise AI initiatives.
Currently, I design and deploy RAG pipelines, document-intelligence systems, and production-grade APIs, with an emphasis on data governance and scalable architecture across cloud platforms.
Skills
Experience Level
Expert (18 skills)
Intermediate (5 skills)
Language
English: Fluent
German: Intermediate
Javanese: Advanced
Bashkir: Advanced
Work Experience
Data and AI Engineer at Talonic
November 1, 2024 - Present
Designed and deployed ETL processes and end-to-end document intelligence pipelines using AWS Glue, S3, and Lambda. These pipelines use an event-driven architecture to ingest unstructured documents, normalize them into structured datasets, and integrate OCR and generative AI for enriched extraction. Fine-tuned large language and vision models to extract insights supporting R&D workflows. Engineered retrieval-augmented generation (RAG) pipelines with vector search, language agents, and prompt engineering, doubling enterprise knowledge-retrieval speed while reducing manual overhead. Deployed LLM-powered APIs to production with FastAPI, TypeScript, and Prisma, using Supabase for database management and Docker for high availability. Logging, monitoring, and structured-output validation provide quality assurance. Built and optimized PySpark pipelines on Databricks that pre-process large-scale data for fine-tuned AI models supporting business-critical enterprise applications. Partnered with technical and non-technical stakeholders to improve data architecture and governance, ensure quality and compliance, and build data-driven automation pipelines that improve operational efficiency.
Data Scientist (Intern) at Index Exchange
October 31, 2024 - July 25, 2025
Conducted exploratory data analysis and statistical modeling to optimize ad exchange operations, working with cross-functional teams to improve system efficiency through data-driven insights and machine learning algorithms. Articulated research hypotheses, decisions, and results to support automated decision-making algorithms. Trained, evaluated, and validated ML models in AWS SageMaker to improve targeting efficiency, performance, and accuracy. Processed data on AWS S3 with Spark and orchestrated complex ETL workflows with Apache Airflow. Implemented MLOps with Docker and Kubernetes and integrated Kafka into a real-time, event-driven architecture for data ingestion and inference. This streamlined the deployment of scalable ML pipelines with high reliability and traceability. Designed data pipelines, APIs, and Spark-based data flows for real-time processing on ad exchange platforms, streamlining model integration.
Data Engineer (Work-student) at Ek Robotics
March 31, 2024 - July 25, 2025
Designed and developed a scalable data warehouse on AWS S3, a data lake, and Amazon Redshift. Tuned Redshift performance by optimizing distribution and sort keys, analyzing query execution plans, and managing workloads (WLM). Automated business processes with AWS Lambda and Step Functions, improving operational efficiency. Developed ETL/ELT pipelines with Apache Spark on Databricks and AWS Glue, integrating diverse data sources into cloud data lakes and reducing latency. Implemented real-time streaming pipelines with Amazon Kinesis. Enhanced an OLAP data mart through dimensional modeling (Kimball methodology) to improve performance and accessibility for business intelligence, and built dashboards and reports in AWS QuickSight. Created conceptual, logical, and physical models for relational and NoSQL databases using Amazon DynamoDB and AWS Data Modeler, and explored AWS CDK for infrastructure as code. Performed data wrangling and quality enhancement with Python libraries in line with coding standards. Developed sentiment-analysis ML pipelines with Scikit-learn and TensorFlow. Actively monitored data pipelines and warehouse performance using AWS Clo
Master's Thesis Student at Pragmatic industries GmbH
March 1, 2024 - Present
Extracted key information from over 1,000 technical PDFs. Architected a two-stage hybrid document-intelligence model: a YOLOv8-based detector first identifies key layout components (e.g., tables, figures), then a fine-tuned LayoutLMv3 model performs high-precision extraction on those specific regions. Implemented a layout-aware OCR pipeline with PaddleOCR that uses the hybrid model's output to correct segmentation errors and generate clean markdown. Used multimodal models such as CLIP for combined visual and textual embeddings. Developed rule-based models to structure the initial extraction and annotated datasets to improve entity-recognition training. Built and fine-tuned BERT-based Named Entity Recognition models for precise classification of technical details. Built a zero-shot extraction pipeline by fine-tuning Llama3-8B with PEFT/LoRA. Aligned the model for high-fidelity schema generation with Direct Preference Optimization (DPO), using a curated preference dataset of correct vs. incorrect schema outputs to steer the model's policy directly. Used large language models through Ollama and LlamaIndex to improve extraction, interpretation, and comprehension of unstructured document data.
Data Engineer at Tata Consultancy Services
March 31, 2022 - July 25, 2025
Delivered customized ETL solutions with Informatica PowerCenter that processed millions of records efficiently. Designed and implemented complex PL/SQL database processes integrating a NoSQL database with batch and ETL workflows. Managed large datasets on HDFS and used Hive for distributed querying and storage. Translated business requirements into scalable data architectures and led projects that built modular, reusable DBT transformation models. These models were versioned with Git and tested and deployed through CI/CD. Enhanced backend integrations with TypeScript and connected diverse data sources and services through RESTful APIs. Led statistical data analysis and automated information flows with Python scripting. Performed data profiling and quality assurance with Informatica Data Quality, SQL, and Python on Teradata, MS SQL Server, and MongoDB to ensure data accuracy and integrity.
Education
MSc - Data and Knowledge Engineering at Otto Von Guericke Universität Magdeburg
January 1, 2022 - March 31, 2024
Bachelor's - Computer Science and Engineering at SRM Institute of Science and Technology
January 1, 2015 - December 31, 2019
Qualifications
Industry Experience
Software & Internet, Professional Services, Manufacturing, Financial Services, Healthcare, Computers & Electronics, Media & Entertainment, Education