Getting Data from API in Colab

PropTech@ecyY
6 min readJun 14, 2021

--

We have tried learning various methods to read data in Colab, including

  1. read a csv data file from an url by pandas (Yiu, 2021a);
  2. read time series data from Yahoo Finance by web.DataReader (Yiu, 2021a);
  3. read a csv data file from google drive by drive.mount (Yiu, 2021b); and
  4. read GIS data from geopandas’ datasets (Yiu, 2021c).

However, all these methods rely on the data structure prepared by the data sources. The best way is of course to download from the raw data with the data structure defined by yourself, such as getting data from an API (Application Programming Interface).

But then it is quite complicated and difficult to code. There are some Apps that can get API data without coding, such as the Parabola which I have shown in a Youtube (Yiu, 2021d). Yet, it requires licensing fee. Some webpages nowadays provide API builders, which are quite user friendly in building the required JSON URL by choosing the corresponding parameters.

API (Application Programming Interface) in JSON (JavaScript Object Notation)

Here shows you an example of API Builder provided by the Rating and Valuation Department (RVD) of the Hong Kong Government at https://data.gov.hk/en-data/dataset/hk-rvd-tsinfo_rvd-property-market-statistics. Figure 1 shows an example of generating the API query string by selecting data filter in the database of Private Retail — Rental and Price Indices (from 1978) of Hong Kong. It allows adding filtering rules (blue button at the middle right edge of Figure 1) such as the Years, Rents and Prices. After setting the filters and press the Get result button (dark green button at the middle left edge of Figure 1), it generates automatically the API query string below as well as the API query result for cross-checking.

Figure 1 an example of API Builder by Rating and Valuation Department, Hong Kong Government. Source: https://data.gov.hk/en-data/dataset/hk-rvd-tsinfo_rvd-property-market-statistics/resource/87045fd8-2fce-4c3e-9b5e-de3d29f640df

Getting Data by a Representational State Transfer (REST) API in Colab

The API query string can then be used in Colab for coding to get the specified data. Let me explain about the format of the API string.

https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.rvd.gov.hk%2Fdoc%2Fen%2Fstatistics%2Fhis_data_12.xls%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%2C%22filters%22%3A%5B%5B1%2C%22bw%22%2C%5B%221997%22%5D%5D%5D%7D

The first section https://api.data.gov.hk/v2/ refers to the host_URL and version number. All the %22% are separators. For those who are familiar with the file naming system used by RVD would recognise that replacing the %22% by / in the second section (after filter?q=…) will show the filename of the data: https://www.rvd.gov.hk/doc/en/statistics/his_data_12.xls. The third section shows that the format is in JSON and the filter is from year 1997, etc.

Let’s try using this REST API in JSON to get the Hong Kong retail data in Colab. First, we import requests library and specify the URL as the API string provided. Then the command requests.get(url) will get the data according to the JSON API instructions. These codes can be found at the API Tutorial provided by Google Colab at https://colab.research.google.com/github/nestauk/im-tutorials/blob/3-ysi-tutorial/notebooks/APIs/API_tutorial.ipynb#scrollTo=hVZl4Fr07TM5

import requests  # Import the requests library

url = ('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.rvd.gov.hk%2Fdoc%2Fen%2Fstatistics%2Fhis_data_12.xls%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%2C%22filters%22%3A%5B%5B1%2C%22bw%22%2C%5B%221997%22%5D%5D%5D%7D')
print(url)

response = requests.get(url) # Make a GET request to the URL

# Print status code (and associated text)
print(f"Request returned {response.status_code} : '{response.reason}'")

# Print data returned (parsing as JSON)
payload = response.json() # Parse `response.text` into JSON

import pprint
pp = pprint.PrettyPrinter(indent=1)
pp.pprint(payload)
print(url)

The results are in JSON format as follows:

https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.rvd.gov.hk%2Fdoc%2Fen%2Fstatistics%2Fhis_data_12.xls%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%2C%22filters%22%3A%5B%5B1%2C%22bw%22%2C%5B%221997%22%5D%5D%5D%7D
Request returned 200 : 'OK'
[{'售價 Prices': 150.8, '年 Year': '1997', '月 Month': '1', '租金 Rents': 119.3},
{'售價 Prices': 156.2, '年 Year': '1997', '月 Month': '2', '租金 Rents': 123.5},
{'售價 Prices': 168, '年 Year': '1997', '月 Month': '3', '租金 Rents': 120.5},
{'售價 Prices': 170.7, '年 Year': '1997', '月 Month': '4', '租金 Rents': 123},
{'售價 Prices': 182.9, '年 Year': '1997', '月 Month': '5', '租金 Rents': 122.9},
{'售價 Prices': 180, '年 Year': '1997', '月 Month': '6', '租金 Rents': 126.8},
{'售價 Prices': 181.7, '年 Year': '1997', '月 Month': '7', '租金 Rents': 124.2},
{'售價 Prices': 195.9, '年 Year': '1997', '月 Month': '8', '租金 Rents': 125.2},
{'售價 Prices': 202.6, '年 Year': '1997', '月 Month': '9', '租金 Rents': 124.5},
{'售價 Prices': 192.2, '年 Year': '1997', '月 Month': '10', '租金 Rents': 123.5},
{'售價 Prices': 181.1, '年 Year': '1997', '月 Month': '11', '租金 Rents': 123.7},
{'售價 Prices': 165.3, '年 Year': '1997', '月 Month': '12', '租金 Rents': 125.2}]

API Query String Structure

However, if there is no API Builder, then how can we specify a valid JSON API query string to get data? I find many websites’ explanations very confusing, no wonder the tutorial picks the Eurostat’s to illustrate, as it is very structured and provides with examples.

Eurostat also provides a very powerful Query Builder, which can facilitate users to build an API without coding. Yet, with the structure of the API designed by Eurostat, it is quite easy to specify the API without using the Query Builder.

For example, if we are going to get the data of the 28 EU countries’ GDPs from 2011 to 2020 (nama_10_gdp) at https://ec.europa.eu/eurostat/web/main/data/database,

But if we want to filter only 2019–2020 data, how can we specify the API? Their website provides a detailed explanation of the API string as follows (https://ec.europa.eu/eurostat/web/json-and-unicode-web-services/getting-started/rest-request):

API Query String structure. Source: Eurostat at https://ec.europa.eu/eurostat/web/json-and-unicode-web-services/getting-started/rest-request

It is universal that the first section is the host_URL and version, etc. The second section is specifying the JSON format and language. The third section is the datasetCode and filters. There is no %2% but clear /s.

They also provide an example to explain how to specify the filters. For example, here are the four filters specified:

  • Time: 2019 and 2020 (‘&time=2019&time=2020’)
  • Geo: Germany and the Netherlands (‘&geo=DE&geo=NL’)
  • Unit : Chain-Linked Value (2010) million euro (‘&unit=CLV10_MEUR’)
  • National Account indicator : B1GQ — Gross domestic product at market prices (‘&na_item=B1GQ’)
  • The documentation for Eurostat Annual National Accounts (nama10) is available at https://ec.europa.eu/eurostat/cache/metadata/en/nama10_esms.htm

Let’s use the Eurostat’s Query Builder to generate the API string first, as it will show you all the symbols and short-forms of the filters. The Query Builder is available at https://ec.europa.eu/eurostat/web/json-and-unicode-web-services/getting-started/query-builder, please note that it requires users to append the result of the query generator to the fixed part of the url as shown in the figure below:

eurostat’s query builder

and here is the generated dataset code (API string in JSON):

nama_10_gdp?na_item=B1GQ&sinceTimePeriod=2019&precision=1&geo=DE&geo=NL&unit=CLV10_MEUR

Appending it to the fixed part of the url, the JSON API string becomes as follows:

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_10_gdp?precision=1&geo=DE&geo=NL&&na_item=B1GQ&unit=CLV10_MEUR&time=2019&time=2020

Let’s put this JSON API into Colab to try.

import requests  # Import the requests library

# Query URL
url = ('http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/nama_10_gdp?precision=1&geo=DE&geo=NL&&na_item=B1GQ&unit=CLV10_MEUR&time=2019&time=2020')
print(url)

response = requests.get(url) # Make a GET request to the URL

# Print status code (and associated text)
print(f"Request returned {response.status_code} : '{response.reason}'")

# Print data returned (parsing as JSON)
payload = response.json() # Parse `response.text` into JSON

import pprint
pp = pprint.PrettyPrinter(indent=1)
pp.pprint(payload)

The results show that the GDPs of Germany and the Netherlands in 2019 and 2020 are as follows:

‘value’: {‘0’: 2978183.8, ‘1’: 2836259.3, ‘2’: 726188.6, ‘3’: 699019.5}

If you want to present it in a table format, please refer to google colab’s API Tutorial at https://colab.research.google.com/github/nestauk/im-tutorials/blob/3-ysi-tutorial/notebooks/APIs/API_tutorial.ipynb#scrollTo=WXa3OkKyp1TG

I have uploaded the codes at the following Github: https://github.com/Chung-collab/great/blob/master/get_api.ipynb

I also produce a Youtube at (Yiu, 2021e) to explain in more details.

References

Yiu, C.Y. (2021a) Forecasting by FB Prophet in Colab, Medium, May 31. https://ecyy.medium.com/forecasting-by-fb-prophet-in-colab-c9d4db2d4195

Yiu, C.Y. (2021b) Build My First AVM by Sklearn in Colab, Medium, Jun 7. https://ecyy.medium.com/build-my-first-avm-by-sklearin-colab-2db661c67b95

Yiu, C.Y. (2021c) Mapping by Geopandas in Colab, Medium, Jun 13. https://ecyy.medium.com/mapping-by-geopandas-in-colab-fe4b63b9ac00

Yiu, C.Y. (2021d) Big Data — Automatic Data Downloading by API using Parabola, Youtube, Apr 28.https://youtu.be/BaWZ5-5gsk0

Yiu, C.Y. (2021e) Getting Data from API in Colab, Youtube, https://youtu.be/kdqt9LngIvQ

--

--

No responses yet