top of page

Málaga Population Data Cleaning Project - SQL

  • Ryan Deuter
  • Sep 9, 2024
  • 2 min read

Updated: Feb 26

In this project I downloaded a dataset from the National Institue of Statistics from Spain of Malaga's population over the last few years. As I imported into SQL Server, I saw there was some data cleaning necessary in order for me to manipulate the data how I wanted to.



BACKGROUND: I need to clean the data in SQL Server in order to manipalate it properly.


PROCESS: Using REPLACE, UPDATE TABLE, and other SQL functions.


First let's look at the raw data!


SELECT *
FROM malaga_pop

ree

As this is a project I am carrying out in English, I need to convert the Total column from Spanish thousands to English format.


UPDATE malaga_pop
SET Total = 
REPLACE(Total, '.', '')

I also need to rename the 'Periodo' column to English.

EXEC sp_rename
'malaga_pop.Periodo',
'Period', 'COLUMN'

Let's take a look at what we have:


ree









Also, going to fix some things in "Sex" column. Converting "Females" to the singular "Female". Also, "FeMale" to "Female".

UPDATE malaga_pop
SET Sex = 
REPLACE(REPLACE(Sex, 
'Males', 'Male'),
'Females', 'Female')

UPDATE malaga_pop
SET Sex = 
REPLACE(Sex,
'FeMale', 'Female')
ree




I also want to seprate Municpality and Postcode as they are in the same column. To do this I will create two new columns, seperate them, and then drop the original column.

ALTER TABLE malaga_pop
ADD Postcode VARCHAR(50), Municipality VARCHAR(100)
UPDATE malaga_pop
SET Postcode = LEFT(Municipalities,
CHARINDEX(' ', Municipalities) -1),
Municipality =
LTRIM(RIGHT(Municipalities,
LEN(Municipalities) - CHARINDEX(' ',
Municipalities)))
ALTER TABLE malaga_pop
DROP COLUMN Municipalities

Great! It's looking good. Are we there yet? Not quite! Should we have used Excel! Yes, it would have probably been a lot easier. Well, got some SQL practice in.


One final thing- in the Sex column, the dataset includes 'Total' as a value (calculation included within the Sex column). I'll drop that to include my own totals with the intention of performing streamlined aggregate Group By functions later on.


DELETE FROM malaga_pop where Sex = 'Total'

Let's have a look at the final product:


ree

There we have it, a cleaned dataset where we can now perform our data analysis. I would of course investigate the NULL values to see if the information could be obtained.

 
 
 

Comments


bottom of page