
Building an ETL Pipeline to Extract and Transform Crowdfunding Campaign Data
Project Type
Extracting, Transforming, and Loading
Skills & Tools Used:
● Python
● Pandas
● NumPy
● PostgreSQL
● Data Extraction
I collaborated with three colleagues to build an ETL pipeline using Python, Pandas, and Python dictionary methods. Our aim was to extract and transform data from the crowdfunding and contacts Excel files, write the results to CSV files, and load those files into a PostgreSQL database. Here's a summary of the outcomes:
Category and Subcategory DataFrames:
● Extracted and transformed data from crowdfunding.xlsx to create category and subcategory DataFrames.
● Expertly handled data manipulations, resulting in well-structured DataFrames.
● Exported DataFrames to category.csv and subcategory.csv, ensuring data integrity.
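The category and subcategory step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the combined "category & sub-category" column name, the "cat"/"subcat" ID prefixes, and the build_lookup helper are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows standing in for crowdfunding.xlsx.
# The combined column name is an assumption for this sketch.
crowdfunding_df = pd.DataFrame({
    "category & sub-category": [
        "food/food trucks",
        "music/rock",
        "music/jazz",
        "food/food trucks",
    ],
})

# Split the combined column into separate category and subcategory columns.
crowdfunding_df[["category", "subcategory"]] = (
    crowdfunding_df["category & sub-category"].str.split("/", n=1, expand=True)
)


def build_lookup(values, id_prefix, id_column, name_column):
    """Return a lookup DataFrame of sorted unique values with sequential IDs."""
    unique_values = sorted(values.unique())
    ids = [f"{id_prefix}{i}" for i in range(1, len(unique_values) + 1)]
    return pd.DataFrame({id_column: ids, name_column: unique_values})


category_df = build_lookup(
    crowdfunding_df["category"], "cat", "category_id", "category"
)
subcategory_df = build_lookup(
    crowdfunding_df["subcategory"], "subcat", "subcategory_id", "subcategory"
)

# With no path argument, to_csv returns the CSV text; passing a path such as
# "category.csv" writes the file instead.
category_csv = category_df.to_csv(index=False)
```

Keeping the index out of the export (index=False) means the CSV columns match the table columns one to one when the file is later imported.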
Campaign DataFrame:
● Executed precise data extraction and transformation from crowdfunding.xlsx to construct a comprehensive campaign DataFrame.
● Converted data types, renamed columns, and handled UTC timestamps.
● Exported the refined DataFrame to campaign.csv, reflecting attention to detail.
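A hedged sketch of the campaign transformations (renaming columns, converting data types, and handling UTC times) might look like this. The column names, the sample values, and the assumption that timestamps arrive as epoch seconds are illustrative only and are not taken from the project files.

```python
import pandas as pd

# Hypothetical rows standing in for crowdfunding.xlsx.
# Column names and epoch-second timestamps are assumptions for this sketch.
raw_df = pd.DataFrame({
    "cf_id": [147],
    "blurb": ["Sample campaign description"],
    "goal": ["100"],
    "pledged": ["0"],
    "launched_at": [1581573600],
    "deadline": [1614578400],
})

# Rename columns to the names used in the target table.
campaign_df = raw_df.rename(columns={
    "blurb": "description",
    "launched_at": "launch_date",
    "deadline": "end_date",
})

# Convert the monetary columns from text to floating point.
campaign_df = campaign_df.astype({"goal": float, "pledged": float})

# Interpret the epoch seconds as UTC and keep only the calendar date.
for column in ["launch_date", "end_date"]:
    campaign_df[column] = (
        pd.to_datetime(campaign_df[column], unit="s", utc=True)
        .dt.strftime("%Y-%m-%d")
    )
```

Parsing with utc=True makes the time zone explicit, so the resulting dates do not depend on the local settings of the machine that runs the pipeline.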
Contacts DataFrame:
● Employed Python dictionary methods for efficient extraction and transformation from contacts.xlsx.
● Applied rigorous data cleaning techniques, resulting in a polished DataFrame.
● Exported the cleaned DataFrame to contacts.csv, showcasing proficiency in data handling.
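The dictionary-based approach for the contacts data could look something like the sketch below. It assumes each row of contacts.xlsx holds one JSON-encoded record with contact_id, name, and email keys; that layout, the sample values, and the clean_contact helper are hypothetical.

```python
import json

# Hypothetical rows: assume each spreadsheet cell holds one JSON-encoded contact.
raw_rows = [
    '{"contact_id": 4661, "name": "Cecilia Velasco", "email": "cecilia.velasco@example.com"}',
    '{"contact_id": 3765, "name": "Mariana Ellis", "email": "mariana.ellis@example.com"}',
]

OUTPUT_COLUMNS = ("contact_id", "first_name", "last_name", "email")


def clean_contact(raw_json):
    """Parse one JSON record and split the full name into first and last name."""
    contact = json.loads(raw_json)
    # pop() removes the combined name so only the split fields remain.
    first_name, last_name = contact.pop("name").split(" ", 1)
    contact.update({"first_name": first_name, "last_name": last_name})
    # Rebuild the dictionary in the column order wanted for the CSV file.
    return {key: contact.get(key) for key in OUTPUT_COLUMNS}


contacts = [clean_contact(row) for row in raw_rows]
```

The resulting list of dictionaries can be passed directly to pandas.DataFrame(contacts) and exported with to_csv("contacts.csv", index=False).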
Crowdfunding Database:
● Sketched an ERD using QuickDBD to plan the database design.
● Wrote a table schema for each CSV file.
● Created the crowdfunding_db database and verified table creation and CSV file imports.
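To illustrate how a schema for the four CSV files might fit together, here is a rough sketch. It is not the project's schema: the column lists are assumptions, and an in-memory SQLite database from Python's standard library stands in for PostgreSQL so the example runs without a server. In PostgreSQL the column types would likely differ (for example VARCHAR, NUMERIC, and DATE).

```python
import sqlite3

# Assumed tables and columns; SQLite in memory stands in for PostgreSQL here.
schema = """
CREATE TABLE category (
    category_id TEXT PRIMARY KEY,
    category    TEXT NOT NULL
);
CREATE TABLE subcategory (
    subcategory_id TEXT PRIMARY KEY,
    subcategory    TEXT NOT NULL
);
CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    email      TEXT NOT NULL
);
CREATE TABLE campaign (
    cf_id          INTEGER PRIMARY KEY,
    contact_id     INTEGER NOT NULL REFERENCES contacts (contact_id),
    description    TEXT,
    goal           REAL NOT NULL,
    pledged        REAL NOT NULL,
    launch_date    TEXT NOT NULL,
    end_date       TEXT NOT NULL,
    category_id    TEXT NOT NULL REFERENCES category (category_id),
    subcategory_id TEXT NOT NULL REFERENCES subcategory (subcategory_id)
);
"""

connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
connection.executescript(schema)

# Verify table creation by listing the tables that now exist.
table_names = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

The lookup and contacts tables are created before campaign because campaign references their primary keys; the same ordering applies when importing the CSV files.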
Documentation and Collaboration:
● Maintained consistent communication and collaboration with my teammates throughout the project.
● Demonstrated effective teamwork, ensuring the success of different project stages.
● Regularly committed and pushed changes to the GitHub repository, showcasing professionalism.
Key Achievements:
● Demonstrated proficiency in Python, Pandas, and ETL pipeline design.
● Showcased a high level of data manipulation skills, resulting in well-structured and clean DataFrames.
● Applied database design principles effectively, creating a well-defined ERD and table schema.
● Collaborated with my teammates, ensuring timely progress updates and mutual support.
This project highlights my ability to handle complex ETL tasks, work collaboratively, and deliver precise and well-documented outcomes. I look forward to applying these skills as a data analyst in a dynamic work environment.

