Extract, Transform, Load (ETL) for Apple Product Specifications

This article is based on a project we completed during an introductory data engineering course. We used Python as our programming language and scraped data from a Malaysian website (https://phone.mesramobile.com/category/apple/). We hope this article helps you learn a bit more about ETL and Python.

According to Wikipedia, extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s) or in a different context than the source(s). In this article, we walk through an ETL pipeline step by step using Python code. Let’s get started!

First, we need to import the necessary libraries. We use BeautifulSoup and the requests module for our scraper. The requests module lets us send HTTP requests and returns a Response object containing the response data.
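As a rough illustration of this setup, the sketch below imports both libraries and wraps the download in a small helper. The URL comes from this article; the helper names `fetch_soup` and `parse_html` and the 10-second timeout are our own assumptions, not part of the original project code.

```python
import requests
from bs4 import BeautifulSoup

# URL from the article; everything else in this sketch is an assumption.
CATEGORY_URL = "https://phone.mesramobile.com/category/apple/"


def parse_html(html: str) -> BeautifulSoup:
    """Parse HTML that is already in memory (useful for offline experiments)."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url: str, timeout: float = 10.0) -> BeautifulSoup:
    """Send an HTTP GET request and return the parsed page."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return parse_html(response.text)
```

Calling `fetch_soup(CATEGORY_URL)` would download the category page, while `parse_html` lets you try out the parsing logic on a saved or hand-written HTML string without touching the network.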

STEP 1: EXTRACT

Before we start scraping, we need to identify which data to collect. The following information is available on the website:

  • phone model
  • price
  • display
  • camera
  • operating system
  • system
  • memory
  • battery
  • wifi
  • charging
  • network
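To show how attributes like these could be pulled out of a page, here is a minimal sketch that parses a small hand-written HTML snippet. The markup (an `h1` with class `model` and a table with class `specs`), the function name `extract_specs`, and all sample values are hypothetical; the real site's tags and class names will likely differ and need to be inspected in the browser first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with made-up values; the real page layout may differ.
SAMPLE_HTML = """
<div class="product">
  <h1 class="model">Example Phone X</h1>
  <table class="specs">
    <tr><th>Price</th><td>RM 1,234</td></tr>
    <tr><th>Display</th><td>6.1 inches</td></tr>
    <tr><th>Camera</th><td>12 MP</td></tr>
    <tr><th>Battery</th><td>3000 mAh</td></tr>
  </table>
</div>
"""


def extract_specs(html: str) -> dict:
    """Return one record mapping each attribute label to its text value."""
    soup = BeautifulSoup(html, "html.parser")
    record = {"Phone Model": soup.find("h1", class_="model").get_text(strip=True)}
    # Each table row holds one attribute: <th> is the label, <td> the value.
    for row in soup.find("table", class_="specs").find_all("tr"):
        label = row.find("th").get_text(strip=True)
        value = row.find("td").get_text(strip=True)
        record[label] = value
    return record


specs = extract_specs(SAMPLE_HTML)
print(specs)
```

Returning one dictionary per phone keeps the extraction step simple: a list of such dictionaries can later be turned into a table in a single call.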

The scraped attributes are then displayed as a DataFrame.

This is an example of the output.
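A minimal sketch of how scraped records could be collected into a pandas DataFrame follows. The two records are made-up placeholders, not values from the website, and the column names are assumptions based on the attribute list above.

```python
import pandas as pd

# Made-up placeholder records; real rows would come from the scraper.
records = [
    {"Phone Model": "Example Phone A", "Price": "RM 1,234", "Camera": "12 MP", "WiFi": "Yes"},
    {"Phone Model": "Example Phone B", "Price": "RM 2,345", "Camera": "48 MP", "WiFi": "Yes"},
]

# Each dictionary becomes one row; the keys become the column names.
df = pd.DataFrame(records)
print(df.head())
```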

STEP 2: TRANSFORM

DATA PROCESSING

The attributes scraped from the website were already fairly clean, so little data cleaning was needed. We checked for null/missing data, stripped any remaining whitespace, removed the (RM) label from the price values and the (MP) label from the camera values, renamed Price to Price(RM), renamed Camera to Camera(MP), and dropped the WiFi attribute as it is not meaningful.
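The cleaning steps described above can be sketched roughly as follows. The sample data is made up, the function name `clean` is hypothetical, and converting the price to a number is our own addition rather than something the original project is known to have done. The camera values are left as text here, since real values may not be a single number.

```python
import pandas as pd

# Made-up sample data with stray whitespace, for illustration only.
raw = pd.DataFrame({
    "Phone Model": [" Example Phone A ", "Example Phone B"],
    "Price": ["RM 1,234", "RM 2,345 "],
    "Camera": ["12 MP", " 48 MP"],
    "WiFi": ["Yes", "Yes"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Report the number of missing values per column.
    print(out.isnull().sum())
    # Strip leading and trailing whitespace from the text columns.
    for col in ["Phone Model", "Price", "Camera"]:
        out[col] = out[col].str.strip()
    # Remove the RM label and thousands separators, then convert to numbers
    # (the numeric conversion is an assumption of this sketch).
    price = out["Price"].str.replace("RM", "", regex=False)
    price = price.str.replace(",", "", regex=False).str.strip()
    out["Price"] = pd.to_numeric(price)
    # Remove the MP label but keep the camera values as text.
    out["Camera"] = out["Camera"].str.replace("MP", "", regex=False).str.strip()
    # Move the units into the column names and drop the WiFi column.
    out = out.rename(columns={"Price": "Price(RM)", "Camera": "Camera(MP)"})
    return out.drop(columns=["WiFi"])


cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw scrape untouched, which makes it easy to compare the data before and after cleaning.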

The final output after data processing (cleaning)

DATA VISUALIZATION

The website already presents the data clearly, but we can further visualize some attributes on their own, for example:

Output before cleaning for Price attribute
Output after cleaning for Price attribute

STEP 3: LOAD

Since the check for missing data shows that every row and column is filled and no values are missing, we can proceed to save the data to a CSV file.
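One way to express this load step is sketched below: check for missing values first, and write the CSV only if there are none. The function name `save_if_complete`, the file name, and the sample data are all hypothetical.

```python
import pandas as pd


def save_if_complete(df: pd.DataFrame, path: str) -> bool:
    """Write df to a CSV file only when it contains no missing values."""
    missing = int(df.isnull().sum().sum())  # total missing cells in the table
    if missing:
        print(f"{missing} missing value(s) found; file not saved.")
        return False
    df.to_csv(path, index=False)  # index=False leaves out the row numbers
    return True


if __name__ == "__main__":
    # Made-up sample data and a hypothetical output file name.
    sample = pd.DataFrame({"Phone Model": ["Example Phone A"], "Price(RM)": [1234]})
    print(save_if_complete(sample, "apple_products.csv"))
```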

I hope you enjoyed reading the article!

You can find the full code here:
