Detail-oriented business analyst with strong data processing, cleaning, and validation skills. Experienced in using SQL, Python, Power BI, Tableau, and Excel to extract and transform data for more efficient analysis.
Summary:
Developed a Python script for web scraping and text extraction from URLs stored in an Excel file. The script uses the Pandas library to read the Excel data into a DataFrame and the Requests library to fetch web page content. BeautifulSoup handles the HTML parsing, extracting the title and text from each page.
Key Features:
• Data Handling: Utilized Pandas for efficient handling of Excel data, enabling seamless integration of URL and URL ID information into the script
• Web Scraping: Employed Requests library for making HTTP requests to URLs, ensuring robustness with a user-agent header to mimic browser behavior
• HTML Parsing: Leveraged BeautifulSoup library for parsing HTML content, facilitating extraction of titles and text from web pages through intuitive methods like find() and find_all()
• File Management: Organized extracted data into structured text files, with each file containing a title and corresponding text extracted from a web page. Implemented file creation and directory management functionalities to ensure seamless file writing operations
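The features above could be sketched roughly as follows. This is a minimal illustration, not the original script: the column names URL_ID and URL, the User-Agent string, the output folder name, and the choice of h1 and p tags are all assumptions made for the example.

```python
from pathlib import Path

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed browser-like User-Agent string; any realistic value would do.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def extract_title_and_text(html: str) -> tuple[str, str]:
    """Return the first <h1> (falling back to <title>) and the joined <p> text."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1") or soup.find("title")
    title = heading.get_text(strip=True) if heading else ""
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return title, "\n".join(p for p in paragraphs if p)


def save_article(out_dir: Path, url_id: str, title: str, text: str) -> Path:
    """Write one text file per page: title on the first line, then the text."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = out_dir / f"{url_id}.txt"
    path.write_text(f"{title}\n{text}\n", encoding="utf-8")
    return path


def scrape_from_excel(excel_path: str, out_dir: str = "articles") -> None:
    """Read URL_ID/URL pairs from an Excel file and save each page as text."""
    df = pd.read_excel(excel_path)  # assumed columns: URL_ID, URL
    for url_id, url in zip(df["URL_ID"], df["URL"]):
        try:
            response = requests.get(url, headers=HEADERS, timeout=10)
            response.raise_for_status()
        except requests.RequestException as exc:
            # Skip pages that fail to load instead of stopping the whole run.
            print(f"Skipping {url_id}: {exc}")
            continue
        title, text = extract_title_and_text(response.text)
        save_article(Path(out_dir), str(url_id), title, text)
```

Keeping the parsing step in its own function means it can be checked against a small HTML string without making any network requests.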
Technologies Used:
• Python
• Pandas
• Requests
• BeautifulSoup
Applications:
• Data Extraction
• Web Scraping
• Natural Language Processing (NLP)
• Automation
• Developed a Python script leveraging the transformers library (version 4.4.2) to perform table question-answering tasks
• Used the TAPAS model (google/tapas-base-finetuned-wtq), a table-parsing model pre-trained for question answering over tables and fine-tuned on WikiTableQuestions, to answer questions about tabular data
• Employed the pipeline function to create a streamlined workflow for table question-answering tasks
• Processed tabular data from CSV files, performing data cleaning operations such as correcting typographical errors in specific columns (e.g., ‘location’)
• Generated insightful answers to complex queries posed to the tabular data, facilitating rapid data analysis and decision-making
• Demonstrated proficiency in data manipulation techniques using pandas library for data wrangling and preprocessing tasks
• Saved the cleaned data back to CSV files so it can be reused later without repeating the cleaning steps
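A rough sketch of how the cleaning and question-answering steps above might fit together is shown below. The typo map, function names, and file paths are hypothetical placeholders, not details from the original project; only the pipeline task name and the model ID come from the description.

```python
import pandas as pd

# Hypothetical typo corrections for the 'location' column; the real map
# would depend on the errors actually present in the data.
LOCATION_FIXES = {"Mumbay": "Mumbai", "New Dehli": "New Delhi"}


def clean_locations(df: pd.DataFrame, fixes: dict = LOCATION_FIXES) -> pd.DataFrame:
    """Return a copy with whitespace trimmed and known typos corrected."""
    cleaned = df.copy()
    cleaned["location"] = cleaned["location"].str.strip().replace(fixes)
    return cleaned


def answer_question(df: pd.DataFrame, question: str) -> str:
    """Ask one natural-language question about the table using TAPAS."""
    # Imported here so the cleaning helper can be used without loading
    # transformers and the model weights.
    from transformers import pipeline

    tqa = pipeline(
        "table-question-answering",
        model="google/tapas-base-finetuned-wtq",
    )
    # TAPAS expects every cell to be a string.
    result = tqa(table=df.astype(str), query=question)
    return result["answer"]


def run(csv_in: str, csv_out: str, question: str) -> str:
    """Clean a CSV, save the cleaned copy, and answer one question about it."""
    cleaned = clean_locations(pd.read_csv(csv_in))
    cleaned.to_csv(csv_out, index=False)  # keep the cleaned data for reuse
    return answer_question(cleaned, question)
```

A call such as run("listings.csv", "listings_clean.csv", "How many rows have location Mumbai?") would download the model on first use; the file names and question here are only examples.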