Málaga Population Data Cleaning Project - SQL
- Ryan Deuter
- Sep 9, 2024
- 2 min read
Updated: Feb 26
In this project I downloaded a dataset from the National Institue of Statistics from Spain of Malaga's population over the last few years. As I imported into SQL Server, I saw there was some data cleaning necessary in order for me to manipulate the data how I wanted to.
Link to Code: https://github.com/ryandeuter/PortfolioProjects/blob/main/malaga_population_data_cleaning.sql
BACKGROUND: I need to clean the data in SQL Server in order to manipalate it properly.
PROCESS: Using REPLACE, UPDATE TABLE, and other SQL functions.
First let's look at the raw data!
SELECT *
FROM malaga_pop
As this is a project I am carrying out in English, I need to convert the Total column from Spanish thousands to English format.
UPDATE malaga_pop
SET Total =
REPLACE(Total, '.', '')
I also need to rename the 'Periodo' column to English.
EXEC sp_rename
'malaga_pop.Periodo',
'Period', 'COLUMN'
Let's take a look at what we have:
Also, going to fix some things in "Sex" column. Converting "Females" to the singular "Female". Also, "FeMale" to "Female".
UPDATE malaga_pop
SET Sex =
REPLACE(REPLACE(Sex,
'Males', 'Male'),
'Females', 'Female')
UPDATE malaga_pop
SET Sex =
REPLACE(Sex,
'FeMale', 'Female')
I also want to seprate Municpality and Postcode as they are in the same column. To do this I will create two new columns, seperate them, and then drop the original column.
ALTER TABLE malaga_pop
ADD Postcode VARCHAR(50), Municipality VARCHAR(100)
UPDATE malaga_pop
SET Postcode = LEFT(Municipalities,
CHARINDEX(' ', Municipalities) -1),
Municipality =
LTRIM(RIGHT(Municipalities,
LEN(Municipalities) - CHARINDEX(' ',
Municipalities)))
ALTER TABLE malaga_pop
DROP COLUMN Municipalities
Great! It's looking good. Are we there yet? Not quite! Should we have used Excel! Yes, it would have probably been a lot easier. Well, got some SQL practice in.
One final thing- in the Sex column, the dataset includes 'Total' as a value (calculation included within the Sex column). I'll drop that to include my own totals with the intention of performing streamlined aggregate Group By functions later on.
DELETE FROM malaga_pop where Sex = 'Total'
Let's have a look at the final product:
There we have it, a cleaned dataset where we can now perform our data analysis. I would of course investigate the NULL values to see if the information could be obtained.
Comments