Languages Analysis Project - Power BI
- Ryan Deuter
- Sep 6, 2024
Updated: Oct 3, 2024
In this project, real data about language speakers (both native and second-language) was used to determine which languages are most popular. Power BI was used to clean the data and produce visualizations in a dashboard.
Link to Power BI file:
BACKGROUND: Raw data was received and imported into Power BI to find insights about the world's most prevalent languages.
PROCESS: There was some data cleaning necessary in Power Query to make the visualizations more accessible.
Below, we can see that the columns do not have proper names. Also, instead of plain numbers, the columns contain the word "million", as well as additional characters that prevent proper aggregation and analysis.
Here I used Replace Values under the Transform tab to replace "million" with nothing, which removes the word from each value. I then changed the data type of the First-language, Second-language, and Total Speaker columns to a proper number.
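For readers who prefer code to the Power Query interface, the same replace-then-convert step can be sketched in Python. This is only an illustration, not what was done in the project: the function name `parse_millions` and the sample strings are made up, and it assumes each value looks like a number followed by the word "million".

```python
def parse_millions(raw: str) -> float:
    """Remove the word 'million' and convert what is left to a number.

    The result stays in units of millions, mirroring the Power Query step
    of replacing "million" with nothing and then changing the column type.
    """
    cleaned = raw.replace("million", "").strip()
    return float(cleaned)


# Made-up sample values for illustration, not figures from the dataset.
samples = ["123 million", "4.5 million"]
values = [parse_millions(s) for s in samples]
```

Once the values are numeric, they can be summed or sorted, which is what the text column did not allow.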
To get rid of the brackets in the Second-language column, I used "Split Column by Delimiter", chose the first bracket as the delimiter, and deleted the additional column.
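The split-and-discard step can be sketched the same way in Python. Again, this is an illustration under assumptions: the function name `split_before_delimiter` is hypothetical, the sample strings are made up, and the default bracket character `[` is a guess, since the exact bracket in the raw data is not stated here.

```python
def split_before_delimiter(raw: str, delimiter: str = "[") -> str:
    """Keep only the text before the first occurrence of the delimiter.

    This mirrors splitting a column at the first bracket and deleting
    the extra column that holds everything after it.
    The default delimiter "[" is an assumption; pass "(" if needed.
    """
    head, _sep, _tail = raw.partition(delimiter)
    return head.strip()


# Made-up sample values for illustration, not figures from the dataset.
with_bracket = split_before_delimiter("123[4]")
without_bracket = split_before_delimiter("45")
round_bracket = split_before_delimiter("67 (note)", delimiter="(")
```

Values without a bracket pass through unchanged, so the function is safe to apply to every row in the column.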
The cleaned dataset looks like this:
Here is the final dashboard. A treemap was used to show the prevalence of languages, while bar charts broke it down by three categories: Total Speakers, First-Language Speakers, and Second-Language Speakers.