Billionaires Analysis Project - Python
- Ryan Deuter
- Sep 6, 2024
Updated: Oct 3, 2024
This is a project analyzing real data about billionaires such as their net worth and the industries they operate in.
Link to Code:
BACKGROUND: I received raw data on billionaires and the industries they operate in, and set out to find which industries produced the most billionaires.
PROCESS: Used Python (Jupyter Notebooks) with Pandas for data cleaning and manipulation, and Seaborn and Matplotlib for visualization.
First I imported the data:
import pandas as pd
df = pd.read_csv(r"C:\Users\Usuario\Desktop\Billionaires Statistics Dataset.csv")
df.head()
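Before going further, it can help to confirm that the file loaded with the columns the rest of the analysis relies on. Here is a minimal sketch of that check, assuming only the three columns used later in this post (age, country, category). The load_billionaires helper and the inline sample rows are hypothetical stand-ins, not part of the original notebook or the real dataset.

```python
import io

import pandas as pd

# Columns this post uses later on.
REQUIRED_COLUMNS = {"age", "country", "category"}


def load_billionaires(source):
    """Read the CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    return df


# Hypothetical sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "personName,age,country,category\n"
    "Person A,52,United States,Technology\n"
    "Person B,,China,Manufacturing\n"
)
sample_df = load_billionaires(sample_csv)

print(sample_df.shape)         # rows and columns
print(sample_df.isna().sum())  # missing values per column
```

With the real file, the same helper would take the CSV path in place of sample_csv.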
I then continued with exploratory data analysis, running aggregate operations on the age column to find the average and minimum age:
df["age"].mean()
df["age"].min()
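Several aggregates can also be computed in one call with Series.agg. This is a small sketch on hypothetical ages (not the real data), including one missing value to show that Pandas skips it by default.

```python
import pandas as pd

# Hypothetical ages; None stands for a missing value.
ages = pd.DataFrame({"age": [18, 45, 67, None, 90]})

# One call returns a Series indexed by the aggregate names.
summary = ages["age"].agg(["count", "mean", "min", "max"])
print(summary)
```

The count row shows how many non-missing ages went into the mean, which is worth checking before quoting an average.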
I then sorted the dataset by age in ascending order to see who the youngest billionaires were and which countries they were from:
df.sort_values(by = "age")
In this case, the youngest is an 18-year-old from Italy.
Aggregating the number of billionaires per country:
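If only the youngest few rows are needed, DataFrame.nsmallest avoids sorting and scrolling through the whole table. A minimal sketch on hypothetical rows; the personName column is an assumption about the dataset and does not appear in the code above.

```python
import pandas as pd

# Hypothetical rows, not taken from the real dataset.
people = pd.DataFrame({
    "personName": ["Person A", "Person B", "Person C", "Person D"],
    "age": [61, 18, None, 34],
    "country": ["France", "Italy", "India", "Brazil"],
})

# Keep the two youngest people and only the columns of interest.
youngest = people.nsmallest(2, "age")[["personName", "age", "country"]]
print(youngest)
```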
df["country"].value_counts()
Here we can see the top 5 countries with the most billionaires.
Aggregating the number of billionaires per industry:
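value_counts already sorts from most to least frequent, so chaining head makes the "top N" cut explicit instead of relying on how the notebook truncates the output. A small sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical country values; None stands for a missing entry.
countries = pd.Series(
    ["United States", "China", "United States", "India",
     "China", "United States", None],
    name="country",
)

# Missing values are ignored by default; head(2) keeps the two largest counts.
top_countries = countries.value_counts().head(2)
print(top_countries)
```

On the real data, head(5) would give the five countries discussed here.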
df["category"].value_counts()
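Raw counts are easier to compare across industries when paired with each industry's share of the total. One way to do that, sketched on hypothetical categories, is value_counts(normalize=True):

```python
import pandas as pd

# Hypothetical industry labels, not the real data.
categories = pd.Series(
    ["Technology", "Finance", "Technology", "Retail",
     "Technology", "Finance", "Technology", "Finance"],
    name="category",
)

counts = categories.value_counts()                # absolute counts
shares = categories.value_counts(normalize=True)  # fractions that sum to 1

# Both Series share the same index, so they line up by industry.
table = pd.DataFrame({"count": counts, "share": shares})
print(table)
```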
Now I am ready for some data visualization to put this into an easy-to-understand format.
I'm going to import two data viz libraries: Seaborn and Matplotlib. I'm also going to create a variable of the top 10 industries to assign to my bar chart.
import seaborn as sns
import matplotlib.pyplot as plt
top_10_industries = df["category"].value_counts().head(10)
top_10_industries
Condensed dataset:
Here is my code for the visual:
plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_industries.values, y=top_10_industries.index, palette="viridis")
plt.xlabel("Number of Billionaires")
plt.ylabel("Industry")
plt.title("Top 10 Industries with the Most Billionaires")
plt.show()
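As an alternative to counting first and plotting second, Seaborn's countplot can tally the rows itself when given an order. This is a sketch on hypothetical rows rather than the approach used above; a single color is used instead of a palette here.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical rows standing in for the real dataset.
sample = pd.DataFrame({
    "category": ["Technology", "Finance", "Technology", "Retail", "Technology",
                 "Finance", "Energy", "Retail", "Finance", "Technology"]
})

# The three most common industries, most frequent first.
order = sample["category"].value_counts().index[:3].tolist()

fig, ax = plt.subplots(figsize=(12, 6))
# countplot counts the rows per category and draws only those listed in order.
sns.countplot(data=sample, y="category", order=order, color="steelblue", ax=ax)
ax.set_xlabel("Number of Billionaires")
ax.set_ylabel("Industry")
ax.set_title("Top 3 Industries (hypothetical data)")
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() afterwards.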
Now for the bar chart showing the top countries. I also used enumerate to add a data label with the exact count next to each bar, adding another layer of insight.
top_10_countries = df["country"].value_counts().head(10)
top_10_countries
plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_countries.values, y=top_10_countries.index, palette="muted")
for i, v in enumerate(top_10_countries.values):
    # Place each count just to the right of its bar
    plt.text(v + 1, i, str(v), color="black", va="center")
plt.xlabel("Number of Billionaires")
plt.ylabel("Country")
plt.title("Top 10 Countries with the Most Billionaires")
plt.xticks(rotation=45)
plt.show()
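The enumerate loop works well. For comparison, Matplotlib 3.4 and later also provide Axes.bar_label, which labels every bar in one call. The sketch below swaps in plain Matplotlib bars and hypothetical counts, so it illustrates the labeling step only and is not a drop-in copy of the chart above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts, not the real figures from the dataset.
counts = pd.Series({"Country A": 12, "Country B": 7, "Country C": 3})

fig, ax = plt.subplots(figsize=(12, 6))
bars = ax.barh(counts.index.tolist(), counts.values, color="steelblue")

# Write each bar's value just past its end (padding is in points).
labels = ax.bar_label(bars, padding=3)

ax.invert_yaxis()  # put the largest bar at the top
ax.set_xlabel("Number of Billionaires")
ax.set_ylabel("Country")
ax.set_title("Countries by Billionaire Count (hypothetical data)")
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() afterwards.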