Data Visualisation — How to Plot a Scatter Bubble Chart by Plotly

PropTech@ecyY
4 min read · Jun 26, 2021

Unlike a scatterplot, which shows the relationship between only two variables, a Scatter Bubble Chart (SBC) can depict the relationships among three or even four variables. For example, the SBC in Figure 1 plots life expectancy on the y-axis against GDP per capita on the x-axis, uses different colors to distinguish continents, and uses different bubble sizes to represent population. It therefore shows not only a roughly linear relationship between life expectancy and GDP per capita (on a log scale), but also the lower GDP per capita in Africa, the higher GDP per capita in North America, and the much larger populations of two Asian countries. An SBC is hard to produce in Excel, but Plotly can produce one in just a few lines of code, as shown below:

Figure 1 A Scatter Bubble Chart of Life Expectancy versus GDP per capita (in log scale). Source of Data: Our World in Data

1. Install plotly and import plotly.express in Google Colab.
! pip install plotly==5.0.0
import plotly.express as px

2. Import a data file containing three or four variables to show in the Scatter Bubble Chart (the x, y and bubble-size variables must be numeric). I processed the life expectancy versus GDP per capita data provided by Our World in Data at https://ourworldindata.org/life-expectancy; you may create your own CSV file from any data you like. Here I import the data from my Google Drive, which requires an authorisation code.

import pandas as pd
from google.colab import drive
drive.mount('/content/drive/')
data = pd.read_csv("drive/MyDrive/Colab Notebooks/life-expectancy-vs-gdp-per-capita.csv")
data.head(5)
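
Before plotting, it can help to confirm that the columns used in the later steps exist and that the bubble-size column has no missing values, since Plotly Express typically raises an error when a size column contains NaN. The following is a minimal sketch, assuming the column names used in this article. The helper name prepare_for_bubble_chart and the three sample rows are hypothetical placeholders with made-up values, not real figures.

```python
import pandas as pd

# Column names assumed from the plotting steps in this article.
REQUIRED = ["Country", "Continent", "Year", "GDP per capita",
            "Life expectancy", "Total population"]
NUMERIC = ["GDP per capita", "Life expectancy", "Total population"]

def prepare_for_bubble_chart(data, year):
    """Hypothetical helper: check columns, keep one year, drop incomplete rows."""
    missing = [c for c in REQUIRED if c not in data.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    one_year = data[data["Year"] == year]
    # Rows lacking x, y or size values cannot be drawn as bubbles.
    return one_year.dropna(subset=NUMERIC)

# Placeholder rows with made-up values, only to exercise the helper.
sample = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Continent": ["X", "X", "Y"],
    "Year": [2015, 2015, 2014],
    "GDP per capita": [1000.0, 20000.0, 5000.0],
    "Life expectancy": [60.0, 80.0, 70.0],
    "Total population": [1e6, None, 3e6],
})
clean = prepare_for_bubble_chart(sample, 2015)
print(clean["Country"].tolist())
```

With the real file, the same call would be prepare_for_bubble_chart(data, 2015), and the result could be used in place of df2015 in the steps that follow.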

3. For contrast, I first plot an ordinary scatterplot, shown in Figure 2. It does not tell us which dot represents which country, and the dots do not show any simple relationship.

# define df2015 as the data of Year == 2015
df2015 = data.query("Year==2015")
# plot the scatter plot of df2015 with px, with the x and y axes defined
fig2015 = px.scatter(df2015, x="GDP per capita", y="Life expectancy")
fig2015.show()
Figure 2 A scatterplot of life expectancy versus GDP per capita.

4. Simply switching the x-axis to a log scale reveals a stronger linear relationship between the two variables, as shown below and in Figure 3.

df2015 = data.query("Year==2015")
# just add log_x=True to convert the x-axis to log scale
fig2015 = px.scatter(df2015, x="GDP per capita", y="Life expectancy", log_x=True)
fig2015.show()
Figure 3 A scatterplot of life expectancy versus GDP per capita in log scale — a stronger linear relationship is shown.
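
Why the log scale helps: on a log axis, equal ratios of GDP per capita take up equal distances. A relationship in which life expectancy rises by a roughly constant amount for each tenfold increase in income therefore appears as a straight line. A small sketch in plain Python, using illustrative round numbers rather than values from the dataset:

```python
import math

# Illustrative GDP per capita values, each ten times the previous one.
gdp = [1_000, 10_000, 100_000]

# Position of each value along a log10 axis.
positions = [math.log10(v) for v in gdp]

# Distance between neighbouring points on the log axis.
gaps = [b - a for a, b in zip(positions, positions[1:])]

print(positions)
print(gaps)
```

On a linear axis the second gap would be ten times the first. On the log axis the two gaps are the same, which is what spreads out the low-income countries crowded at the left of Figure 2.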

5. A Scatter Bubble Chart can be plotted simply by adding the size, size_max, color and hover_name parameters, as shown below. size sets the size of each bubble: if it is set to the variable “Total population”, each bubble is sized in proportion to that country's population, and size_max caps the size of the largest bubble. Similarly, if color is set to the variable “Continent”, bubbles representing different continents get different colors. hover_name sets the title of the tooltip that appears when the cursor moves over a bubble, so setting it to “Country” identifies each bubble. A legend for the color codes is shown automatically at the top right-hand corner, as in Figure 4.

df2015 = data.query("Year==2015")
fig2015 = px.scatter(df2015, x="GDP per capita", y="Life expectancy", log_x=True, size="Total population", size_max=60, color="Continent", hover_name="Country")
fig2015.show()
Figure 4 A scatter bubble chart of life expectancy versus GDP per capita in log scale — a stronger linear relationship is shown and the clusters of countries in different continents and their population sizes can easily be compared.
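
A note on how bubble sizes are scaled: to my understanding, Plotly Express scales the area of each marker, not its diameter, in proportion to the size column, with the largest value drawn about size_max pixels across. The sketch below restates that rule in plain Python. The function name bubble_diameter is hypothetical and the population figure is illustrative; it approximates the behaviour and does not reproduce Plotly's internals.

```python
import math

def bubble_diameter(value, largest_value, size_max=60):
    """Approximate diameter in pixels when marker area is proportional to value."""
    return size_max * math.sqrt(value / largest_value)

largest = 1_400_000_000  # illustrative population of the largest country
for pop in (largest, largest / 4, largest / 100):
    print(round(bubble_diameter(pop, largest), 1))
```

Under this rule a country with a quarter of the largest population gets a bubble half as wide, and one with a hundredth of it is still a tenth as wide, which is why small countries remain visible next to very large ones.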

A YouTube video showing the steps one by one is available at Yiu (2021a). The code is also shared on GitHub (Yiu, 2021b).

For more details on Plotly, please refer to https://plotly.com/python/basic-charts/ for the Plotly Python Open Source Graphing Library Basic Charts. It provides 18 types of basic charts, as summarised in Figure 5 below; some of them, such as Sunburst Charts and Sankey Diagrams, are hard to produce in Excel. You can share your experience with us.

Figure 5 Plotly Python Open Source Graphing Library basic charts. Source: https://plotly.com/python/basic-charts/

Scatter Bubble Charts can be improved further. For example, the scatter bubble chart on the Our World in Data webpage provides a time bar underneath that shows how the chart changes over time, and it lets users choose different combinations of countries to plot, as shown in Figure 6 or at https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita. Please share with us if you can further improve the plotting of scatter bubble charts.

Figure 6 The Scatter Bubble Chart provided by Our World in Data webpage. https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita

References

Yiu, C.Y. (2021a) Data Visualisation — How to Plot a Scatter Bubble Chart by Plotly, Youtube, June 27. https://youtu.be/xWEPuXEKChk

Yiu, C.Y. (2021b) How to Plot a Scatter Bubble Chart by Plotly, Github, June 26. https://github.com/Chung-collab/great/blob/master/Plotly_Scatter_Bubble_Chart.ipynb
