Visualizing Loans Awarded by Kiva

In this project we'll visualize insights using a dataset from Kaggle. The dataset contains information about loans awarded by the non-profit Kiva.

Using Seaborn, we'll explore the average loan amount by country using aggregated bar charts. We'll also visualize the distribution of loan amounts by project type and gender using box plots and violin plots.

Step 1: Import Necessary Python Modules

In [1]:
import pandas as pd
import matplotlib.ticker as mtick
from matplotlib import pyplot as plt
import seaborn as sns

Step 2: Ingest The Data

Load kiva_data.csv into a DataFrame

In [2]:
kiva = pd.read_csv('kiva_data.csv')

Step 3: Preview The Data

If you would like, you can examine the raw CSV file on your local machine. You can find kiva_data.csv in the project's folder.

Overview of the dataset:

Each entry (row) in the dataset represents a loan that Kiva awarded to a particular project. The loan_amount column shows the amount (in U.S. dollars) awarded to the project. The activity column has the category type that the project falls under. The country column is the country where the project is located. The gender column represents the gender of the primary person who applied for the loan.

In [3]:
kiva.head(100)
Out[3]:
loan_amount activity country gender
0 625 Food Production/Sales Pakistan female
1 250 Food Production/Sales Pakistan female
2 400 Food Production/Sales Pakistan female
3 400 Food Production/Sales Pakistan female
4 500 Food Production/Sales Pakistan female
5 500 Food Production/Sales Pakistan female
6 400 Food Production/Sales Pakistan female
7 500 Food Production/Sales Pakistan female
8 400 Food Production/Sales Pakistan female
9 450 Food Production/Sales Pakistan female
10 250 Food Production/Sales Pakistan female
11 300 Food Production/Sales Pakistan female
12 275 Food Production/Sales Pakistan female
13 425 Food Production/Sales Pakistan female
14 425 Food Production/Sales Pakistan female
15 475 Food Production/Sales Pakistan female
16 225 Food Production/Sales Pakistan female
17 475 Food Production/Sales Pakistan female
18 525 Food Production/Sales Pakistan female
19 425 Food Production/Sales Pakistan female
20 475 Food Production/Sales Pakistan female
21 550 Food Production/Sales Pakistan female
22 450 Food Production/Sales Pakistan female
23 250 Food Production/Sales Pakistan female
24 600 Food Production/Sales Pakistan female
25 500 Food Production/Sales Pakistan female
26 450 Food Production/Sales Pakistan female
27 800 Food Production/Sales Pakistan male
28 250 Food Production/Sales Pakistan female
29 300 Food Production/Sales Pakistan female
... ... ... ... ...
70 500 Food Production/Sales Pakistan female
71 300 Food Production/Sales Pakistan female
72 675 Food Production/Sales Pakistan female
73 575 Food Production/Sales Pakistan female
74 775 Food Production/Sales Pakistan female
75 400 Food Production/Sales Pakistan female
76 300 Food Production/Sales Pakistan female
77 300 Food Production/Sales Pakistan female
78 400 Food Production/Sales Pakistan female
79 775 Food Production/Sales Pakistan female
80 300 Food Production/Sales Pakistan female
81 500 Food Production/Sales Pakistan female
82 450 Food Production/Sales Pakistan female
83 350 Food Production/Sales Pakistan female
84 525 Food Production/Sales Pakistan female
85 500 Food Production/Sales Pakistan female
86 400 Food Production/Sales Pakistan female
87 250 Food Production/Sales Kenya female
88 525 Food Production/Sales Kenya female
89 700 Food Production/Sales Kenya male
90 600 Food Production/Sales Kenya female
91 350 Food Production/Sales Kenya female
92 125 Food Production/Sales Kenya female
93 250 Food Production/Sales Kenya female
94 250 Food Production/Sales Kenya female
95 250 Food Production/Sales Kenya female
96 350 Food Production/Sales Kenya female
97 375 Food Production/Sales Kenya female
98 125 Food Production/Sales Kenya female
99 75 Food Production/Sales Kenya female

100 rows × 4 columns
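
The preview above covers only the first 100 rows, so a quick summary of the categorical columns can help confirm what the full dataset contains. The following is a minimal sketch, not part of the original notebook: the `sample` DataFrame is a hypothetical stand-in with the same four columns, and in practice the same calls would be applied to `kiva`.

```python
import pandas as pd

# Hypothetical rows that mimic the four columns described above;
# in the notebook you would run the same calls on `kiva` instead.
sample = pd.DataFrame({
    'loan_amount': [625, 250, 700, 125],
    'activity': ['Food Production/Sales'] * 4,
    'country': ['Pakistan', 'Pakistan', 'Kenya', 'Kenya'],
    'gender': ['female', 'female', 'male', 'female'],
})

# Number of loans recorded for each country
loans_per_country = sample['country'].value_counts()

# Distinct values found in each categorical column
categories = {
    col: sorted(sample[col].unique())
    for col in ['activity', 'country', 'gender']
}

print(loans_per_country)
print(categories)
```

Checking the distinct values up front makes it easier to interpret the x-axis categories and hue levels in the plots that follow.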

Step 4: Bar Charts

We'll create a bar plot using Seaborn to visualize the average size of Kiva loans given to projects, by country and gender.

In [4]:
# Set color palette
sns.set_palette(sns.color_palette(['#1b9e77','#d95f02']))

# Set style
sns.set_style('whitegrid')

# Creates the figure
f, ax = plt.subplots(figsize=(9, 6))

# Add a title
ax.set_title('Mean Loans given by Kiva, by Country and Gender')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

# Plot the data
sns.barplot(data=kiva, x='country', y='loan_amount', hue='gender')
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352cf9e780>
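
The format string passed to `StrMethodFormatter` in the cell above controls how each y-axis tick is labelled. As a small illustration (the tick values here are made up for the example), calling the formatter directly shows the labels it would produce:

```python
import matplotlib.ticker as mtick

# Same format string as in the plotting cells: a dollar sign,
# thousands separators, and no decimal places.
dollar_formatter = mtick.StrMethodFormatter('${x:,.0f}')

# Illustrative tick values (not taken from the Kiva data)
tick_values = [0, 500, 1250]

# A formatter is called with the tick value and its position
labels = [dollar_formatter(value, pos) for pos, value in enumerate(tick_values)]

print(labels)
```

Because the same three formatter lines are repeated in every plotting cell, they could also be wrapped in a small helper function, but the notebook keeps them inline.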

Analysis

Across the board, men have been awarded larger loans on average than women, though in El Salvador the difference is small. Kiva should probably review its awarding procedures to eliminate any gender bias, whether conscious or not. It should also consider other possible reasons for the disparity. For example, it could check whether women consistently apply for smaller loan amounts and, if so, encourage them to apply for more.
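
The bar heights in the chart are group means, which is the default estimator of `sns.barplot()`. The same numbers can be computed directly with pandas to quantify the gap. This is a sketch under assumptions: the `toy` DataFrame below holds invented values, and in the notebook the `groupby` would be run on `kiva`.

```python
import pandas as pd

# Invented values for illustration only; replace `toy` with `kiva`
# to compute the real figures.
toy = pd.DataFrame({
    'loan_amount': [400, 600, 800, 200, 300, 700],
    'country': ['Pakistan', 'Pakistan', 'Pakistan', 'Kenya', 'Kenya', 'Kenya'],
    'gender': ['female', 'female', 'male', 'female', 'female', 'male'],
})

# Mean loan amount for every country/gender pair, one column per gender
mean_loans = (
    toy.groupby(['country', 'gender'])['loan_amount']
       .mean()
       .unstack('gender')
)

# Difference between the male and female means within each country
mean_loans['gap'] = mean_loans['male'] - mean_loans['female']

print(mean_loans)
```

A table like this makes it easy to see in which countries the gap is largest, which is harder to judge by eye from bar heights alone.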

Step 7: Box Plots

Distribution by Country

Now we will make a box plot to compare the distribution of loans by country.

In [5]:
f, ax = plt.subplots(figsize=(9, 6))

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Country')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.boxplot(data=kiva, x='country', y='loan_amount')
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352cf096a0>

Analysis

From this chart, Kenya appears to have the widest spread of loan amounts, followed closely by Cambodia and El Salvador. The chart also suggests that borrowers in Cambodia are more likely to receive a larger loan.
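
The box in a box plot spans the first to third quartile, with a line at the median, so the visual comparison of spread and typical loan size can be backed up with numbers. The sketch below uses an invented `toy` DataFrame and a small helper named `iqr` (both assumptions for illustration); the same aggregation could be applied to `kiva`.

```python
import pandas as pd

# Invented loan amounts for two countries, for illustration only
toy = pd.DataFrame({
    'loan_amount': [100, 200, 300, 400, 500, 400, 450, 500, 550, 600],
    'country': ['Kenya'] * 5 + ['Cambodia'] * 5,
})

def iqr(series):
    """Interquartile range: the height of the box in a box plot."""
    return series.quantile(0.75) - series.quantile(0.25)

# Median (line inside the box) and IQR (box height) per country
summary = toy.groupby('country')['loan_amount'].agg(median='median', iqr=iqr)

print(summary)
```

In this toy data the country with the wider box has the lower median, which shows that spread and typical size are separate questions and are worth reading off separately.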

Distribution by Activity

Instead of visualizing the loan amount by country, we'll use sns.boxplot() to plot the loan amount by activity.

In [6]:
f, ax = plt.subplots(figsize=(9, 6))

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Activity')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.boxplot(data=kiva, x='activity', y='loan_amount')
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352e6293c8>

Analysis

We see that the loans given for farming tend to be significantly larger than those for other activities. Perhaps activity is the true explanatory variable influencing loan amounts. For example, if farming projects necessitate greater loans, and Cambodia has a greater economic presence in farming relative to the other activities and countries, that would explain why projects there are awarded higher loans. Likewise, if men tend to be more interested in farming projects and women in food production, that might explain the disparity between the loans awarded to each gender.
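
One way to probe this hypothesis is to look at how each gender's loans are split across activities. The following is a rough sketch with invented rows (the `toy` DataFrame and its activity labels are assumptions for illustration); on the real data, `kiva['gender']` and `kiva['activity']` would be passed to `pd.crosstab()` instead, and `kiva['country']` could be swapped in to check the country side of the argument.

```python
import pandas as pd

# Invented rows for illustration only
toy = pd.DataFrame({
    'activity': ['Farming', 'Farming', 'Farming',
                 'Food Production/Sales', 'Food Production/Sales',
                 'Food Production/Sales'],
    'gender': ['male', 'male', 'female', 'female', 'female', 'male'],
})

# Share of each gender's loans that falls into each activity;
# normalize='index' makes every row sum to 1.
share = pd.crosstab(toy['gender'], toy['activity'], normalize='index')

print(share)
```

If the shares turned out to be similar for both genders, activity mix alone would be an unlikely explanation for the gap in loan amounts.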

Step 8: Violin Plots

Distribution by Activity

Now we'll compare the distributions by activity using violin plots.

In [7]:
f, ax = plt.subplots(figsize=(9, 6))

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Activity')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.violinplot(data=kiva, x='activity', y='loan_amount')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352e6ad208>

Distribution by Country

Now we'll return to visualizing the loan distribution by country, again using a violin plot.

In [8]:
f, ax = plt.subplots(figsize=(9, 6))

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Country')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.violinplot(data=kiva, x='country', y='loan_amount')
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352e6ad128>

Step 9: Split Violin Plots

Distribution by Country and Gender

We'll use the hue parameter to visualize the distribution of loan amounts by country and gender.

In [9]:
f, ax = plt.subplots(figsize=(9, 6))

# Change color palette
sns.set_palette("Spectral")

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Country and Gender')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.violinplot(data=kiva, x='country', y='loan_amount', hue='gender', split=True)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352e93c588>

Analysis

We can see that while the range of loan amounts is similar between genders, men tend to receive higher loan amounts.

Distribution by Activity and Gender

Earlier we hypothesized that the disparity in loan amounts between genders may have to do with their interest in specific activities. Let's draw a split violin plot to see if that is the case.

In [10]:
f, ax = plt.subplots(figsize=(9, 6))

# Change color palette
sns.set_palette("Spectral")

# Add a title
ax.set_title('Distribution of Loans given by Kiva, by Activity and Gender')

# Format y-axis ticks
fmt = '${x:,.0f}'
tick = mtick.StrMethodFormatter(fmt)
ax.yaxis.set_major_formatter(tick)

sns.violinplot(data=kiva, x='activity', y='loan_amount', hue='gender', split=True)
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x2352e9b6128>

Analysis

We see that while some of the variation in loan amounts is correlated with the activity (farming tends to receive the largest loans), there is a clear disparity between loan amounts by gender within a given activity (women receive significantly less). We conclude that Kiva should continue to seek the source of this disparity and address it.
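
The within-activity comparison can also be summarized in a table, which complements the split violin plot. This is a minimal sketch with invented values (the `toy` DataFrame is hypothetical); running the same `pivot_table` call on `kiva` would give the real medians.

```python
import pandas as pd

# Invented values for illustration only
toy = pd.DataFrame({
    'activity': ['Farming'] * 4 + ['Food Production/Sales'] * 4,
    'gender': ['female', 'female', 'male', 'male'] * 2,
    'loan_amount': [400, 600, 700, 900, 200, 300, 400, 600],
})

# Median loan amount for each activity, one column per gender
medians = toy.pivot_table(
    index='activity',
    columns='gender',
    values='loan_amount',
    aggfunc='median',
)

# Gap between genders while holding the activity fixed
medians['gap'] = medians['male'] - medians['female']

print(medians)
```

Comparing genders within each activity row, rather than across the whole dataset, separates the effect of activity from the effect of gender.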