
Languages Analysis Project - Power BI

  • Ryan Deuter
  • Sep 6, 2024
  • 1 min read

Updated: Oct 3, 2024

In this project, real data about language speakers (both native and second-language) was used to determine which languages are most popular. Power BI was used to clean the data and produce visualizations in a dashboard.


Link to Power BI file:


BACKGROUND: Raw data was received and imported into Power BI to find insights about the world's most prevalent languages.


PROCESS: Some data cleaning was necessary in Power Query to make the visualizations more accessible.


In the raw data, the columns do not have names. Also, instead of plain numbers, the columns contain the word "million", as well as additional characters that prevent proper aggregation and analysis.


Here I used Replace Values under the Transform tab to replace "million" with nothing, which removes the word from each value. I then changed the data type of the First-language, Second-language, and Total Speaker columns to a proper number.
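For readers who want to see the same idea outside Power Query, here is a minimal Python sketch of the Replace Values step followed by the number conversion. It is not the Power Query code from this project. The function name `to_millions`, the sample strings, and the handling of thousands separators are assumptions made for illustration, and the input is assumed to contain no other stray characters.

```python
def to_millions(raw: str) -> float:
    """Convert a raw cell such as '120 million' to the number 120.0.

    Hypothetical helper: mirrors replacing "million" with nothing and
    then changing the column type to a number.
    """
    # Same idea as Replace Values: swap the word "million" for nothing.
    without_word = raw.replace("million", "")
    # Assumption: values may carry thousands separators and extra spaces.
    cleaned = without_word.replace(",", "").strip()
    # Raises ValueError if other characters remain, which flags rows
    # that still need cleaning.
    return float(cleaned)


# Made-up sample values, not figures from the dataset.
print(to_millions("120 million"))
print(to_millions("1,500 million"))
```

Because the conversion fails loudly on leftover characters, it also works as a quick check that a column is fully cleaned.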


To get rid of the brackets in the Second-language column, I used "Split Column by Delimiter", chose the opening bracket as the delimiter, and deleted the additional column.
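The split-and-delete step can be sketched in Python as keeping only the text before the first opening bracket. This is an illustration rather than the Power Query step itself. The function name `drop_bracketed_suffix`, the default delimiter `[`, and the sample strings are assumptions, since the exact bracket character in the data is not shown here.

```python
def drop_bracketed_suffix(raw: str, delimiter: str = "[") -> str:
    """Keep the text before the first delimiter, like splitting a column
    by delimiter and deleting the second column.

    Hypothetical helper; the default "[" is an assumption about the data.
    """
    # partition splits at the first occurrence of the delimiter only.
    first_part, _sep, _discarded = raw.partition(delimiter)
    return first_part.strip()


# Made-up sample values, not figures from the dataset.
print(drop_bracketed_suffix("250 [note]"))
print(drop_bracketed_suffix("250 (note)", delimiter="("))
```

Values without the delimiter pass through unchanged, so the function is safe to apply to the whole column.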


The cleaned dataset looks like this:


Here is the final dashboard. A treemap was used to show the prevalence of languages, while bar charts broke it down by three categories: Total Speakers, First-Language Speakers, and Second-Language Speakers.
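As a rough sketch of what these visuals compute, the Python below ranks languages by each of the three measures (the bar charts) and works out each language's share of all speakers (the treemap areas). The rows, field names, and the functions `rank_by` and `share_of_total` are hypothetical placeholders, not the project's data or Power BI measures.

```python
from typing import Dict, List

# Placeholder rows in millions; names and figures are invented.
rows: List[Dict] = [
    {"language": "Language A", "first": 300.0, "second": 700.0, "total": 1000.0},
    {"language": "Language B", "first": 800.0, "second": 100.0, "total": 900.0},
    {"language": "Language C", "first": 50.0, "second": 250.0, "total": 300.0},
]


def rank_by(data: List[Dict], measure: str, top_n: int = 10) -> List[str]:
    """Return language names sorted by one measure, largest first."""
    ordered = sorted(data, key=lambda row: row[measure], reverse=True)
    return [row["language"] for row in ordered[:top_n]]


def share_of_total(data: List[Dict]) -> Dict[str, float]:
    """Return each language's fraction of the summed total speakers."""
    grand_total = sum(row["total"] for row in data)
    return {row["language"]: row["total"] / grand_total for row in data}


for measure in ("total", "first", "second"):
    print(measure, rank_by(rows, measure))
print(share_of_total(rows))
```

Sorting by each measure separately shows why the three bar charts can order the same languages differently.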


 
 
 
