Hi! I’m Hasnain, a passionate Data Science practitioner based in Karachi, Pakistan. I specialize in statistical analysis, predictive modeling, and developing robust data pipelines. Currently, I’m gaining valuable industry experience through a technical internship at Developer Hub Corporation, where I contribute to machine learning workflow automation and data infrastructure optimization.
I enjoy tackling real-world data problems, whether it’s building classification models, conducting multivariate analysis, or creating intuitive dashboards for better decision-making. My academic journey in Software Engineering has equipped me with the skills and practical knowledge to excel in data science, and I’m excited about continuing to grow in this dynamic field.
Data Analysis of Stanford Open Policing Project
I conducted a comprehensive data analysis of the Stanford Open Policing Project using Python libraries such as pandas, numpy, matplotlib, seaborn, and scipy.stats. Key aspects of the project included:
Data Cleaning & Preparation: Removed null values, merged date and time columns for better analysis, and converted data types for accuracy.
Exploratory Data Analysis (EDA): Visualized driver demographics, drug-related stops, and district-wise accident distributions through bar charts, pie charts, and line plots.
Gender & Violation Analysis: Investigated the impact of gender on arrests, stop outcomes for speeding violations, and calculated violation counts across different districts.
Time-Based Trends: Analyzed arrest rates by the hour and annual trends for drug-related stops and search-conducted rates.
Violation Duration: Mapped and visualized the average stop duration for different violation types.
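The cleaning and time-based steps above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic table, not the project's actual code: the column names (stop_date, stop_time, is_arrested) and the helper functions prepare_stops and hourly_arrest_rate are assumptions chosen for the example.

```python
import pandas as pd

# Tiny synthetic sample. The column names are assumptions for illustration,
# not taken from the project itself.
raw = pd.DataFrame({
    "stop_date": ["2015-01-04", "2015-01-04", "2015-02-10", None],
    "stop_time": ["09:15", "09:45", "23:30", "10:00"],
    "is_arrested": [True, False, True, False],
})


def prepare_stops(df):
    """Drop rows missing date or time, then merge both into one datetime column."""
    clean = df.dropna(subset=["stop_date", "stop_time"]).copy()
    clean["stop_datetime"] = pd.to_datetime(
        clean["stop_date"] + " " + clean["stop_time"]
    )
    # Make sure the arrest flag is a real boolean so its mean is a rate.
    clean["is_arrested"] = clean["is_arrested"].astype(bool)
    return clean


def hourly_arrest_rate(df):
    """Arrest rate per hour of day: the mean of the boolean arrest flag."""
    return df.groupby(df["stop_datetime"].dt.hour)["is_arrested"].mean()


stops = prepare_stops(raw)
rates = hourly_arrest_rate(stops)
print(rates)
```

Taking the mean of a boolean column gives the share of True values, which is why a single groupby is enough to produce an hourly rate that can then be passed to a line plot.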
In a separate project, I developed a logistic regression model to classify breast cancer as malignant or benign based on diagnostic data. I preprocessed the dataset by handling missing values, dropping irrelevant columns, and converting categorical data into numerical form.
Key Steps:
Data Preprocessing: Cleaned the dataset by removing unnecessary columns and handling missing values. Categorical variables were encoded in binary form so the model could use them.
Exploratory Data Analysis: Utilized Seaborn and Matplotlib to explore data distributions and relationships, including a correlation matrix heatmap to identify important features.
Model Training: Applied a Logistic Regression model after scaling the features using StandardScaler to improve convergence. The dataset was split into training and testing sets to evaluate model performance.
Model Evaluation: Achieved an accuracy of 98.2%. Evaluated the model using a confusion matrix, classification report, and ROC curve analysis; the AUC score indicated strong discriminative ability.
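The training and evaluation steps above might look something like the sketch below. It is an illustration under stated assumptions, not the project's code: it uses the breast cancer dataset bundled with scikit-learn as a stand-in (the project's data source is not named here), and the split ratio, random seed, and max_iter value are arbitrary choices. The scores it prints are not the project's reported results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): scikit-learn's bundled breast cancer data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; stratify keeps the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is fit on the training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print(f"accuracy: {accuracy:.3f}")
print(f"AUC: {auc:.3f}")
print(cm)
```

Wrapping StandardScaler and LogisticRegression in one pipeline is a design choice that avoids leaking test-set statistics into the scaler.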
Visualization & Insights:
Generated visualizations, including heatmaps and ROC curves, that supported the model evaluation and communicated the findings clearly.
This project demonstrates my proficiency in data preprocessing, visualization, model building, and evaluation using Python libraries such as pandas, numpy, scikit-learn, seaborn, and matplotlib, showcasing my ability to deliver actionable insights and accurate predictions in healthcare analytics.