Languages Analysis Project - Power BI

Ryan Deuter
Sep 6, 2024

Updated: Oct 3, 2024

In this project, real data on language speakers (both native and second-language) was used to determine which languages are most widely spoken. Power BI was used to clean the data and produce visualizations in a dashboard.

Link to Power BI file:

https://github.com/ryandeuter/PortfolioProjects/blob/main/languages_analysis.pbix

BACKGROUND: Raw data was received and imported into Power BI to find insights about the world's most prevalent languages.

PROCESS: Some data cleaning in Power Query was necessary to make the visualizations more accessible.

Below, we can see that the columns do not have proper header names. Also, instead of plain numbers, the columns contain the word "million", as well as additional characters that prevent proper aggregation and analysis.

Here I used Replace Values under the Transform tab to replace "million" with nothing, removing the word from each value. I then changed the data type of the First-language, Second-language, and Total Speaker columns to a proper number.
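For readers who want to see the logic of this step outside the Power Query interface, here is a minimal Python sketch of the same idea. It is not what was done in the project (the cleaning was done through the Power BI menus); the function name parse_speakers and the sample values are made up for illustration, and it assumes each cell looks like a number followed by the word "million".

```python
def parse_speakers(raw: str) -> float:
    """Remove the word 'million' from a cell and convert what is left to a number.

    Hypothetical helper: assumes the cell is a plain number plus the word
    'million', e.g. '380 million'. The result stays in units of millions.
    """
    cleaned = raw.replace("million", "").strip()
    return float(cleaned)


# Made-up sample values, not taken from the actual dataset
sample_cells = ["380 million", "1.5 million"]
speakers = [parse_speakers(cell) for cell in sample_cells]
```

Once the values are real numbers rather than text, they can be summed and sorted, which is what the visuals need.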

To get rid of the brackets in the Second-language column, I used "Split Column by Delimiter", chose the opening bracket as the delimiter, and deleted the additional column that the split created.
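The split-and-discard step can be sketched in Python as well, again only as an illustration of the logic rather than the method used in the project. The helper name strip_bracketed is hypothetical, and the default delimiter "(" is an assumption, since the bracket type in the raw data may differ (for example "[").

```python
def strip_bracketed(raw: str, delimiter: str = "(") -> str:
    """Keep only the text before the first opening bracket.

    Hypothetical helper mirroring 'Split Column by Delimiter': the cell is
    split once at the first delimiter and the right-hand part is discarded.
    The default delimiter '(' is an assumption about the raw data.
    """
    return raw.split(delimiter, 1)[0].strip()


# Made-up sample values, not taken from the actual dataset
print(strip_bracketed("1077 (2021)"))
print(strip_bracketed("199[5]", "["))
```

Cells without a bracket pass through unchanged, so the same function can be applied to the whole column.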

The cleaned dataset looks like this:

Here is the final dashboard. A treemap was used to show the prevalence of languages, while bar charts broke it down by three categories: Total Speakers, First-Language Speakers, and Second-Language Speakers.
