Syla ETL Pipeline — Extract Phase (MongoDB → S3 → Athena → Power BI)
This project showcases the Extract phase of the Syla ETL Pipeline, built to support a full data ingestion and analytics workflow using MongoDB, Python, AWS, and Power BI. The goal of this phase is to reliably pull raw data from multiple MongoDB collections, enrich it with sentiment intelligence, store it in a cloud-ready format, and make it queryable for reporting and dashboard development.
At the core of the workflow is the syla_ingest.ipynb notebook, which connects securely to MongoDB using credentials stored in syla_cred.xlsx. Once connected, it extracts data from key collections such as users, chats, phrases, messages, and quickphrases, loading each dataset into structured Pandas DataFrames for exploration and validation.
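The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names fetch_collections and to_frames are hypothetical, and the sketch assumes a database handle that supports db[name].find(), as a pymongo Database does. How the connection details are laid out inside syla_cred.xlsx is not stated, so reading the credentials is left out.

```python
from typing import Any, Dict, Iterable, List

# Collection names taken from the description above.
COLLECTIONS = ["users", "chats", "phrases", "messages", "quickphrases"]


def fetch_collections(db: Any, names: Iterable[str]) -> Dict[str, List[dict]]:
    """Read every document of each named collection into plain dicts.

    `db` is anything that supports db[name].find(), such as a pymongo
    Database (assumption). `_id` is converted to str so the rows survive
    a later CSV export.
    """
    out: Dict[str, List[dict]] = {}
    for name in names:
        rows = []
        for doc in db[name].find():
            row = dict(doc)
            if "_id" in row:
                row["_id"] = str(row["_id"])
            rows.append(row)
        out[name] = rows
    return out


def to_frames(records: Dict[str, List[dict]]):
    """Turn each list of records into a pandas DataFrame."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    return {name: pd.DataFrame(rows) for name, rows in records.items()}
```

With pymongo installed, the handle could be obtained as MongoClient(uri)[database_name], where uri and database_name are placeholders for values read from the credentials file, and the result passed on as to_frames(fetch_collections(db, COLLECTIONS)).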
To enhance analytics value, the pipeline integrates an ML-based sentiment engine through sentiments.py. The script works as a reusable module that produces a sentiment classification and a set of extracted key phrases for each record, adding both as new columns in the extracted dataset. This enables downstream analysis such as sentiment score tracking, communication preference insights, and phrase-level trend detection.
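The interface of sentiments.py is not shown here, so the sketch below only illustrates the general pattern of turning a text column into two new columns. Everything in it is an assumption: enrich_texts is a hypothetical helper, and toy_analyze is a tiny keyword lookup standing in for the real ML model so that the example can run on its own. Labelling missing text as "neutral" is also just one possible choice.

```python
from typing import Callable, Iterable, List, Tuple

Analysis = Tuple[str, List[str]]  # (sentiment label, key phrases)

# Toy lexicon: a stand-in only, NOT the ML model behind sentiments.py.
_POSITIVE = {"good", "great", "love", "thanks"}
_NEGATIVE = {"bad", "hate", "broken", "slow"}


def toy_analyze(text: str) -> Analysis:
    """Label text by counting lexicon hits; the hits double as 'key phrases'."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [w for w in words if w in _POSITIVE or w in _NEGATIVE]
    score = sum(1 if w in _POSITIVE else -1 for w in hits)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, hits


def enrich_texts(
    texts: Iterable[object],
    analyze: Callable[[str], Analysis] = toy_analyze,
) -> Tuple[List[str], List[List[str]]]:
    """Return parallel lists of labels and key phrases, one entry per input."""
    labels: List[str] = []
    phrases: List[List[str]] = []
    for text in texts:
        if not isinstance(text, str) or not text.strip():
            # Missing or empty text (None, NaN, "") gets a neutral default.
            label, found = "neutral", []
        else:
            label, found = analyze(text)
        labels.append(label)
        phrases.append(found)
    return labels, phrases
```

On a DataFrame, the two lists can be attached as columns with df["sentiment"], df["key_phrases"] = enrich_texts(df["text"]), where "text" is a placeholder column name. Passing the real classifier as the analyze argument keeps the column logic independent of the model.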
For scalability and cloud querying, the pipeline uploads cleaned datasets directly into an Amazon S3 bucket in CSV format. From there, Amazon Athena external tables are created dynamically using schema definitions inferred from the DataFrame structures. The notebook automatically generates the column definitions, constructs the required CREATE EXTERNAL TABLE SQL statements using OpenCSVSerde, and executes them using the Boto3 Athena client, so the data is queryable as soon as each table is created.
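A rough sketch of the upload and table-creation steps is given below. It is not the notebook's code: the helper names, the dtype-to-Athena type mapping, and the SerDe properties are assumptions chosen for illustration. The S3 and Athena clients are passed in as arguments, and only the Boto3 calls put_object, start_query_execution, and get_query_execution are used. Columns with dtypes outside the small mapping fall back to STRING, and numeric columns that contain missing values may also be safer declared as STRING and cast at query time.

```python
import time
from typing import Any, Dict

# pandas dtype name -> Athena type (assumed mapping); others become STRING.
_ATHENA_TYPES = {
    "int64": "BIGINT",
    "int32": "INT",
    "float64": "DOUBLE",
    "float32": "DOUBLE",
    "bool": "BOOLEAN",
}


def build_ddl(table: str, dtypes: Dict[str, Any], s3_location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for a headered CSV folder."""
    columns = ",\n".join(
        f"  `{name}` {_ATHENA_TYPES.get(str(dtype), 'STRING')}"
        for name, dtype in dtypes.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{columns}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '\"')\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


def upload_csv(s3: Any, csv_text: str, bucket: str, key: str) -> str:
    """Upload CSV text and return the folder URI Athena should point at."""
    s3.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    prefix = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
    return f"s3://{bucket}/{prefix}"


def run_ddl(athena: Any, ddl: str, database: str, output_location: str,
            poll_seconds: float = 1.0) -> str:
    """Submit the statement and poll until Athena reports a final state."""
    query_id = athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
```

In a notebook, the clients would come from boto3.client("s3") and boto3.client("athena"), the CSV text from df.to_csv(index=False), and the dtype mapping from df.dtypes.astype(str).to_dict(). Each table needs its own S3 folder, since Athena reads every file under the LOCATION prefix.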
The final output supports Power BI reporting, enabling dashboards such as the Syla Users Dashboard, which highlights user activity, sentiment performance, communication methods, and keyword trends. Overall, this Extract Phase POC demonstrates a complete ingestion foundation for building modern, cloud-powered business intelligence solutions.
DataEngineering ETLPipeline MongoDB Python Pandas AWS AmazonS3 AmazonAthena
Boto3 DataIngestion DataAnalytics PowerBI BusinessIntelligence DashboardDesign
SentimentAnalysis MachineLearning KeyPhraseExtraction DataScience BigData
CloudComputing DataPipeline SQL DataVisualization PortfolioProject
TechPortfolio DataProfessional AnalyticsEngineering MicrosoftPowerBI…