I am Sahab Tariq, a data scientist and senior AI/ML engineer specializing in generative AI, computer vision, and large language models. I have led end-to-end ML and AI projects across industries, building scalable backends, deploying real-time data pipelines, and delivering measurable improvements in efficiency. I enjoy collaborating with clients to translate complex requirements into robust data products, mentoring teams, and pushing the boundaries of AI responsibly.

Sahab Tariq

Available to hire

Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Intermediate

Work Experience

Python Web Scraping Engineer at Zyte
July 1, 2023 - Present
Designed, developed, and maintained large-scale web scraping solutions using Scrapy, Splash, Playwright, and Zyte APIs, ensuring high accuracy and efficiency. Built and optimized data extraction pipelines to handle millions of records daily across diverse domains (E-commerce, FinTech, Real Estate, Social Media). Implemented anti-bot evasion techniques (rotating proxies, headless browsers, dynamic rendering, CAPTCHA solving) to ensure robust and reliable data collection. Developed and maintained ML models for data cleaning, entity extraction, deduplication, and text classification, improving data quality and usability. Collaborated with clients to gather requirements, design scraping strategies, and deliver clean structured datasets in JSON, CSV, and database formats. Automated ETL pipelines integrating with AWS (S3, Lambda, EC2), GCP, and Elasticsearch, enabling end-to-end data processing and analytics. Deployed and monitored scrapers on Zyte Cloud & distributed systems, ensuring scala
Senior Machine Learning Engineer at AppsGenii Technologies
June 1, 2022 - June 1, 2023
Designed and developed chatbots and conversational AI assistants using Python, NLP libraries, and third-party APIs. Implemented natural language understanding (NLU), intent classification, and entity extraction using frameworks such as Rasa, spaCy, and Hugging Face Transformers. Fine-tuned LLM-based chatbots (GPT, LLaMA, etc.) for domain-specific use cases to improve accuracy and reduce hallucinations. Built analytics pipelines to track user interactions, response accuracy, and engagement metrics for chatbot performance optimization. Used Scrapy and Selenium for data scraping, focusing on childcare websites. Generated text embeddings with OpenAI's text-embedding-ada-002, initially storing them in MongoDB Atlas, then migrating to Elasticsearch for search optimization, and later moving to Pinecone for efficient vector management and LangChain integration. Developed a RAG system using Google Gemini on Vertex AI, Anthropic's Claude, and OpenAI models. Created a FastAPI backend for the RAG bot, implementing
Machine Learning Engineer at Slashnext
July 1, 2020 - May 1, 2022
Trained YOLOv5 and Ultralytics models for precise product detection in the retail industry. Utilized VGG16 for image feature extraction, enhancing model accuracy and performance. Contributed to automatic training and testing pipelines for efficient ML model development. Managed datasets through dedicated pipelines, ensuring organized data. Implemented FastAPI, Docker, Pydantic, and PostgreSQL to wrap models, providing user-friendly interfaces. Utilized Celery, Redis, and RabbitMQ for background tasks and queue-based communication. Implemented Docker-based containerization for microservices deployment and system scalability. Applied NLP techniques to predict phishing content using TF-IDF features and Random Forest; three models in production, combining computer vision features with TF-IDF. Conducted data mapping to identify and document data sources, data flows, and data transformations, including data lineage visualization. Worked with cloud services and data pipelines for end-to-end M
Computer Vision Engineer at Sixlogics
September 1, 2017 - June 1, 2020
In the sports industry, contributed to a 'Given Take' club project focusing on object tracking and re-identification. Trained a Siamese network with triplet loss and later transitioned to CLIP and Vision Transformer models for re-identification, achieving improved results. Incorporated color histograms for enhanced feature analysis and developed a custom model for ball detection. In the swimming domain, annotated pose data using Roboflow and trained a model with YOLO Pose. Applied OpenCV and image processing techniques to count strokes accurately and monitor swimmer performance. Conducted a POC with the Twelve Labs API to generate descriptive insights from video content, enhancing automated video analysis. Worked on object detection and tracking algorithms and mapped world coordinates to the 2D plane using homography. Trained an Inception V2 model and a Keras image classifier. Built a real-time analytics system and raised its throughput from 5 fps to 33 fps. Managed a team of ann
Senior Machine Learning Engineer at AppsGenii Technologies
July 20, 2023 - Present
Designed and developed data extraction pipelines to handle millions of records daily across E-commerce, FinTech, Real Estate, and Social Media. Implemented anti-bot evasion techniques (rotating proxies, headless browsers, dynamic rendering, CAPTCHA solving) to ensure robust and reliable data collection. Built and maintained ML models for data cleaning, entity extraction, deduplication, and text classification, improving data quality and usability. Collaborated with clients to gather requirements, designed scraping strategies, and delivered clean structured datasets in JSON, CSV, and database formats. Automated ETL pipelines integrating AWS (S3, Lambda, EC2), GCP, and Elasticsearch, enabling end-to-end data processing and analytics. Deployed and monitored scrapers on Zyte Cloud and distributed systems, ensuring scalability, fault tolerance, and cost optimization. Improved scraping efficiency through optimization of spider architecture, middleware customization, and caching mechanisms
Senior Machine Learning Engineer at AppsGenii Technologies
June 20, 2022 - June 20, 2023
Designed and built a production-grade RAG backend for an AI bot, implementing real-time streaming via WebSockets. Developed RAG pipelines using LangChain, with LangSmith for monitoring. Analyzed facial expressions, voice, and stress levels during interviews to support decision-making.
Computer Vision Engineer at Six Logics
September 20, 2017 - June 20, 2020
Worked in the sports industry, focusing on object tracking and re-identification. Trained Siamese networks with triplet loss, then migrated to CLIP for re-identification. Implemented color histograms for enhanced feature analysis and developed ball-detection models using Roboflow and YOLO Pose. Generated text embeddings using OpenAI embedding models and worked with vector databases.

Education

Bachelor of Science in Computer Science at COMSATS University Islamabad, Lahore Campus, Pakistan
September 2013 - August 2017

Industry Experience

Computers & Electronics, Software & Internet, Professional Services, Media & Entertainment, Education, Other