Data Analytics — Coding in Colab

Having written many articles on analytics, I want to make an index article that summarises the whole picture and shows how the articles relate to one another. First, data analytics is defined as follows:

“Data analytics is the pursuit of extracting meaning from raw data using specialized computer systems. These systems transform, organize, and model the data to draw conclusions and identify patterns.” (Informatica, 2021)

It therefore involves many processes, including data acquisition, data processing, data visualisation, data storage, data analysis, data forecasting, data simulations, predictive analytics by machine learning and artificial intelligence, decision making, etc. (Figure 1)

Figure 1 Data Analytics Processes. source: by the author

Each process can involve different software and different skills. Figure 2 shows some of the common software and apps for each process. For example, ParseHub can scrape data from webpages automatically, Parabola can download data from an API automatically, and Excel is powerful in data processing, providing Power Query to handle big data. Data analysis can be divided into numerical data analysis and geographical data analysis. The former is served by many powerful statistical or econometric packages such as EViews, Stata and SPSS; the latter includes ArcGIS and QGIS, to name just a few. Data visualisation can be done with many different packages, such as Power BI and Tableau. Data storage can include Google Cloud and Dropbox, etc. AnyLogic is good at simulations, while Azure, Colab and Jupyter are development platforms for machine learning and artificial intelligence.

Figure 2 Examples of different software and apps for different data analytics processes

Having to learn and link so many different software packages and apps can cause a lot of communication and learning problems. Sometimes we have to use several packages to accomplish just one goal. I recently had a case that involved using ArcGIS to geocode a big dataset, then using Excel’s Power Query and pivot tables to tabulate and extract the required data, and then importing the result into EViews to carry out econometric analysis.

That is the reason why I started using Google Colab for data analytics tasks. First, it does not require local data storage facilities, as all processing can be done on the CPU and GPU provided by Colab and the files can be saved on Google Drive / Cloud or GitHub. It also facilitates co-creation of knowledge and sharing of experience. Better still, it becomes a common platform for using different Python libraries. I have shown how to use OpenCV for face detection, Numpy for linear algebra, Pandas for panel data analysis, Geopandas for GIS analysis, Contextily for geographical data visualisation, Matplotlib for numerical data visualisation, FB Prophet for forecasting, Tensorflow.Keras for machine learning (Yiu, 2021j), Scikit-learn for optimisation, etc. (Figure 3)

Figure 3 Tools and space accessible by Google Colab
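To illustrate the idea of a common platform, here is a minimal sketch of how the tabulation and regression steps of the multi-tool case described earlier could sit in one Colab notebook. It is not the workflow of any of the articles listed below. The dataset, the column names (`district`, `year`, `area_sqm`, `price`) and all the values are hypothetical, invented only for illustration, and a plain least-squares fit stands in for a full econometric package.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction records; all names and values are invented for illustration.
records = pd.DataFrame({
    "district": ["A", "A", "B", "B", "C", "C"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "area_sqm": [50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "price": [5.0, 6.1, 7.2, 8.0, 9.1, 10.2],
})

# Step 1: tabulate the data, much as a pivot table would in Excel
# (mean price by district and year).
pivot = records.pivot_table(
    index="district", columns="year", values="price", aggfunc="mean"
)

# Step 2: ordinary least squares of price on floor area,
# solved with NumPy's least-squares routine.
X = np.column_stack([np.ones(len(records)), records["area_sqm"].to_numpy()])
y = records["price"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(pivot)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Both steps share the same DataFrame, so nothing has to be exported from one program and imported into another.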

Data Collection:

Data Processing:

Data Visualisation:

Data Analysis:

Data Forecasting:

Machine Learning:

*refers to post-July 3 additions


Informatica (2021) What is Data Analytics? Glossary, Support & Training. Available online:

Yiu, C.Y. (2021a) Getting Data from API in Colab, Medium, June 14.

Yiu, C.Y. (2021b) Automation versus Machine Learning — Text-to-Speech versus Speech-to-Text Apps, Medium, Feb 10.

Yiu, C.Y. (2021c) Plot a Scatter Bubble Chart, Medium, June 26.

Yiu, C.Y. (2021d) Forecasting by FB Prophet in Colab, Medium, May 31.

Yiu, C.Y. (2021e) Build My First AVM by Sklearn in Colab, Medium, June 7.

Yiu, C.Y. (2021f) Mapping by Geopandas in Colab, Medium, June 12.

Yiu, C.Y. (2021g) Mapping GIS Data on a Basemap by Contextily in Colab, Medium, June 29.

Yiu, C.Y. (2021h) Learning Machine Learning - How to Code without Learning Coding, Medium, Feb 6.

Yiu, C.Y. (2021i) Write a Selfie and Face Detection App, Medium, July 2.

Yiu, C.Y. (2021j) A Simple Rent-to-Price ML Estimator by Keras in Colab, Medium, June 22.

Yiu, C.Y. (2021k) Build My First Mortgage Calculator by Numpy in Colab, Medium, July 10.

Yiu, C.Y. (2021l) Build a Home Buying Investment e-Analyser by Numpy in Colab, Medium, July 11.

Yiu, C.Y. (2021m) Measuring Areas and Distances of GIS Data by Geopandas in Colab, Medium, July 21.

ecyY — easy to understand why, easy to study why. Finding the truths scientifically is the theme.