Billionaires Analysis Project - Python

  • Ryan Deuter
  • Sep 6, 2024
  • 2 min read

Updated: Oct 3, 2024

This project analyzes real data about billionaires, such as their net worth and the industries they operate in.


Link to Code:


BACKGROUND: Received raw data on billionaires and the industries they operate in, and aimed to find which industries produced the most billionaires.


PROCESS: Used Python (Jupyter Notebooks) with Pandas for data cleaning and manipulation, and Seaborn and Matplotlib for visualization.


First I imported the data:


import pandas as pd
df = pd.read_csv(r"C:\Users\Usuario\Desktop\Billionaires Statistics Dataset.csv")
df.head()

I then continued with exploratory data analysis, running aggregate operations on the data, starting with the mean and minimum age:



df["age"].mean()
df["age"].min()
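
As a minimal sketch, several aggregates like the ones above can also be computed in a single call with `Series.agg`. The small `sample` frame and its ages are invented stand-ins, not values from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the billionaires dataset; the ages are invented.
sample = pd.DataFrame({"age": [18, 45, 62, 75]})

# One call returns the requested aggregates as a labelled Series.
age_summary = sample["age"].agg(["mean", "min", "max"])
print(age_summary)
```

This keeps related statistics together in one output cell, which can be handy in a notebook where only the last expression of a cell is displayed.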

I then sorted the dataset by ascending age to see who the youngest billionaires were and in what country:


df.sort_values(by = "age")

The youngest billionaire in the dataset is an 18-year-old from Italy.
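
When only the few youngest rows matter, `DataFrame.nsmallest` is one alternative to sorting the whole table. The sketch below assumes an invented `sample` frame; the `name` column and every value in it are hypothetical, and only `age` and `country` mirror column names used in this post.

```python
import pandas as pd

# Hypothetical rows; names, ages and countries are invented for illustration.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [62, 18, 75, 45],
    "country": ["France", "Italy", "Spain", "Japan"],
})

# Keep the two youngest people and only the columns of interest.
youngest = sample.nsmallest(2, "age")[["name", "age", "country"]]
print(youngest)
```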


Aggregating number of billionaires per country:


df["country"].value_counts()

The top of the output shows the 5 countries with the most billionaires.
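
To show only the top five explicitly rather than reading them off the full output, `value_counts()` can be chained with `head(5)`. This is a sketch on an invented `countries` Series; the counts are made up and do not reflect the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for df["country"]; the counts are invented.
countries = pd.Series(
    ["United States"] * 6 + ["China"] * 5 + ["India"] * 4
    + ["Germany"] * 3 + ["United Kingdom"] * 2 + ["Italy"]
)

# value_counts sorts from most to least frequent, so head(5) is the top five.
top_5 = countries.value_counts().head(5)

# normalize=True returns each country's share of all rows instead of a raw count.
top_5_share = countries.value_counts(normalize=True).head(5)
print(top_5)
print(top_5_share.round(2))
```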








Counting the number of billionaires per industry:


df["category"].value_counts()


Now I am ready for some data visualization to put this into an easy-to-understand format.


I'm going to import two data viz libraries: Seaborn and Matplotlib. I'm also going to create a variable of the top 10 industries to assign to my bar chart.


import seaborn as sns
import matplotlib.pyplot as plt
top_10_industries = df["category"].value_counts().head(10)
top_10_industries

This returns a condensed Series holding the ten most common industries and their billionaire counts.

Here is my code for the visual:

plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_industries.values, y=top_10_industries.index, palette="viridis")
plt.xlabel("Number of Billionaires")
plt.ylabel("Industry")
plt.title("Top 10 Industries with the Most Billionaires")
plt.show()

Now for the bar chart showing top countries. I also used enumerate to add the data label of exact numbers next to each individual bar, adding another layer of insight.

top_10_countries = df["country"].value_counts().head(10)
top_10_countries
plt.figure(figsize=(12, 6))
sns.barplot(x=top_10_countries.values, y=top_10_countries.index, palette="muted")

# Write each bar's exact count just past the end of the bar
for i, v in enumerate(top_10_countries.values):
    plt.text(v + 1, i, str(v), color="black", va="center")

plt.xlabel("Number of Billionaires")
plt.ylabel("Country")
plt.title("Top 10 Countries with the Most Billionaires")
plt.xticks(rotation=45)
plt.show()
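
As an alternative to the `enumerate` loop, Matplotlib 3.4 and later provide `Axes.bar_label`, which places the value next to each bar. The sketch below swaps in plain Matplotlib `barh` rather than Seaborn to stay self-contained, and the `countries` Series with its counts is an invented stand-in for `df["country"]`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df["country"]; the counts are invented.
countries = pd.Series(["United States"] * 3 + ["China"] * 2 + ["India"])
counts = countries.value_counts()

fig, ax = plt.subplots(figsize=(8, 3))
bars = ax.barh(counts.index.tolist(), counts.to_numpy())
ax.invert_yaxis()  # put the largest count at the top

# bar_label reads each bar's width and writes it just past the bar's end.
labels = ax.bar_label(bars, padding=3)

ax.set_xlabel("Number of Billionaires")
ax.set_ylabel("Country")

label_texts = [text.get_text() for text in labels]
print(label_texts)
```

Compared with a manual loop, this avoids hard-coding the `v + 1` offset, since `padding` is given in points and does not depend on the scale of the x-axis.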