Data Analytics — Coding in Colab

Having written many articles on data analytics, I want to write an index article that summarises the whole picture and how the pieces relate to one another. First, data analytics is defined as follows:

“Data analytics is the pursuit of extracting meaning from raw data using specialized computer systems. These systems transform, organize, and model the data to draw conclusions and identify patterns.” (Informatica, 2021)

It therefore involves many processes, including data acquisition, data processing, data visualisation, data storage, data analysis, data forecasting, data simulations, predictive analytics by machine learning and artificial intelligence, decision making, etc. (Figure 1)

Figure 1 Data Analytics Processes. Source: by the author

Each process can involve different software and different skills. Figure 2 shows some of the common software and apps for each process. For example, ParseHub can automatically scrape data from web pages, Parabola can automatically download data from APIs, and Excel is powerful in data processing, providing Power Query to handle large datasets. Data analysis can be divided into numerical data analysis and geographical data analysis: the former is served by many powerful statistical and econometric packages such as EViews, Stata and SPSS, and the latter by ArcGIS and QGIS, to name just a few. Data visualisation can be done with many different tools, such as Power BI and Tableau. Data storage options include Google Cloud and Dropbox. AnyLogic is good at running simulations, while Azure, Colab and Jupyter are development platforms for machine learning and artificial intelligence.

Figure 2 Examples of different software and apps for different data analytics processes

Having to learn and link so many different software packages and apps can cause a lot of communication and learning problems, and sometimes several of them are needed to accomplish just one goal. I recently had a case that involved using ArcGIS to geocode a big dataset, then Excel's Power Query and pivot tables to tabulate and extract the required data, and finally EViews to carry out the econometric analysis.

That is why I started using Google Colab for data analytics tasks. First, it does not require local computing or storage facilities, as all processing can be done on the CPUs and GPUs provided by Colab, and files can be saved to Google Drive / Cloud or GitHub. It also facilitates the co-creation of knowledge and the sharing of experience. Better still, it provides a common platform for using different Python libraries. In my articles I have shown how to use OpenCV for face detection, NumPy for linear algebra, Pandas for panel data analysis, GeoPandas for GIS analysis, Contextily for geographical data visualisation, Matplotlib for numerical data visualisation, FB Prophet for forecasting, TensorFlow Keras for machine learning (Yiu, 2021j), Scikit-learn for optimisation, etc. (Figure 3)

Figure 3 Tools and space accessible by Google Colab
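To illustrate the single-platform idea, the tabulation and regression steps of a multi-software workflow like the one described above (pivot table in Excel, then econometrics in EViews) can be sketched in one Colab cell with pandas and NumPy. This is a minimal sketch under stated assumptions: the dataset, its column names and the simple price-on-area model are hypothetical, geocoding is left out, and a real econometric analysis would need diagnostics this sketch omits.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction records standing in for a geocoded dataset
# (district names, prices and sizes are made up for illustration).
records = pd.DataFrame({
    "district": ["A", "A", "B", "B", "C", "C"],
    "year": [2020, 2021, 2020, 2021, 2020, 2021],
    "price": [5.0, 5.5, 7.0, 7.8, 4.0, 4.2],   # e.g. million dollars
    "area": [50, 52, 70, 75, 40, 41],          # e.g. square metres
})

# Step 1: tabulate as a pivot table would (mean price by district and year).
table = records.pivot_table(index="district", columns="year",
                            values="price", aggfunc="mean")

# Step 2: ordinary least squares of price on area, the kind of model one
# might otherwise export to EViews. np.linalg.lstsq solves X b ~= y.
X = np.column_stack([np.ones(len(records)),
                     records["area"].to_numpy(dtype=float)])
y = records["price"].to_numpy(dtype=float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

print(table)
print(f"price = {intercept:.3f} + {slope:.4f} * area")
```

Keeping both steps in one notebook means the tabulated data never has to be exported and re-imported between programs.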

Data Collection:
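A minimal sketch of collecting data from a web API in Colab, using only the standard library to download JSON and pandas to flatten it into a table. The URL you would pass to the hypothetical helper `fetch_json`, the `"records"` key and the field names in the sample payload are all assumptions; every real API has its own response structure, and the demo runs offline on a made-up payload.

```python
import json
import urllib.request

import pandas as pd


def fetch_json(url: str, timeout: float = 10.0):
    """Download and decode a JSON payload from a web API (network required)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_frame(payload: dict, key: str = "records") -> pd.DataFrame:
    """Flatten a list of (possibly nested) JSON records into a DataFrame."""
    return pd.json_normalize(payload[key])


# Offline demo with a made-up payload shaped like a typical API response.
sample = json.loads("""
{"records": [
  {"date": "2021-06-01", "value": 101.2, "region": {"code": "HK"}},
  {"date": "2021-06-02", "value": 100.8, "region": {"code": "HK"}}
]}
""")
df = records_to_frame(sample)
print(df)
```

`pd.json_normalize` turns nested objects into dotted column names (here `region.code`), which is usually the first cleaning step after an API download.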

Data Processing:
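Typical Power Query style cleaning steps (removing duplicates, converting text dates, dropping missing values and aggregating by month) can be sketched in pandas as follows. The raw data below is hypothetical and deliberately includes a duplicated row, a missing value and dates stored as text.

```python
import pandas as pd

# Hypothetical raw daily readings with typical data-quality problems.
raw = pd.DataFrame({
    "date": ["2021-01-05", "2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17"],
    "value": [10.0, 10.0, None, 14.0, 16.0],
})

clean = (
    raw.drop_duplicates()                                  # remove the exact duplicate row
       .assign(date=lambda d: pd.to_datetime(d["date"]))  # text -> datetime
       .dropna(subset=["value"])                           # drop rows with no reading
       .set_index("date")
)

# Aggregate to calendar months ("MS" = month-start labels) using the mean.
monthly = clean["value"].resample("MS").mean()
print(monthly)
```

Chaining the steps keeps the transformation readable and easy to re-run on a refreshed dataset, much like a saved Power Query.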

Data Visualisation:
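As a sketch of numerical data visualisation with Matplotlib, the example below draws a scatter bubble chart in which a third variable sets the bubble area. The data is hypothetical, and Matplotlib is swapped in here for illustration; a Plotly version would use different calls.

```python
import io

import matplotlib.pyplot as plt

# Hypothetical data: x, y and a third variable shown as bubble area.
x = [1, 2, 3, 4, 5]
y = [2.0, 3.5, 3.0, 5.0, 4.5]
size = [20, 80, 40, 160, 100]   # marker area in points squared

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(x, y, s=size, alpha=0.5, edgecolors="black")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter bubble chart (bubble area ~ size)")

# In Colab the figure is rendered inline when the cell finishes; saving to
# PNG (here an in-memory buffer, or a path such as "bubble.png") also works
# without a display.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Because `s` is a marker area rather than a radius, doubling a value doubles the bubble's area, which avoids visually exaggerating large values.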

Data Analysis:
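A minimal sketch of a NumPy mortgage calculation, assuming a fully amortising loan with a fixed annual rate compounded monthly. It uses the standard level-payment formula M = P r / (1 - (1 + r)^-n) and the closed-form outstanding balance after k payments, B_k = P(1 + r)^k - M((1 + r)^k - 1)/r. The loan amount, rate and term in the demo are hypothetical.

```python
import numpy as np


def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Level payment on a fully amortising loan: P*r / (1 - (1 + r)**-n)."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def remaining_balance(principal: float, annual_rate: float, years: int) -> np.ndarray:
    """Outstanding balance after each monthly payment, vectorised with NumPy."""
    r = annual_rate / 12
    n = years * 12
    payment = monthly_payment(principal, annual_rate, years)
    k = np.arange(1, n + 1)       # payment number 1..n
    if r == 0:
        return principal - payment * k
    growth = (1 + r) ** k
    return principal * growth - payment * (growth - 1) / r


# Hypothetical loan: 1,000,000 at 3% a year over 30 years.
payment = monthly_payment(1_000_000, 0.03, 30)
balance = remaining_balance(1_000_000, 0.03, 30)
total_interest = payment * 360 - 1_000_000
print(f"Monthly payment: {payment:,.2f}")
print(f"Total interest:  {total_interest:,.2f}")
```

The formula is written out explicitly because the old `numpy.pmt` helper was removed from NumPy 1.20 onwards in favour of the separate `numpy-financial` package.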

Data Forecasting:
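The sketch below is a deliberately simple linear-trend forecast with NumPy, swapped in for illustration; it is not FB Prophet, which additionally models seasonality and holidays. It borrows Prophet's `ds`/`y` input and `yhat` output column names only for familiarity, and the monthly series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series using Prophet-style column names (ds, y).
history = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "y": [100, 102, 104, 103, 107, 109, 110, 112, 115, 114, 118, 120],
})

t = np.arange(len(history))                        # time index 0..11
slope, intercept = np.polyfit(t, history["y"], 1)  # least-squares straight line

# Extrapolate the fitted line six months ahead.
horizon = 6
future_t = np.arange(len(history), len(history) + horizon)
forecast = pd.DataFrame({
    "ds": pd.date_range(history["ds"].iloc[-1] + pd.DateOffset(months=1),
                        periods=horizon, freq="MS"),
    "yhat": intercept + slope * future_t,
})
print(forecast)
```

A straight-line baseline like this is useful mainly as a benchmark: a more elaborate forecasting model should beat it before it is trusted.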

Machine Learning:
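A minimal sketch of a machine-learning workflow in Colab with Scikit-learn: a linear regression of price on floor area and building age, in the spirit of a simple automated valuation model. The data is synthetic, generated with a known relationship so the fitted coefficients can be checked; the variables and coefficients are assumptions, not results from any real market.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: price depends on floor area and age.
rng = np.random.default_rng(seed=0)
n = 200
area = rng.uniform(30, 120, n)   # e.g. square metres
age = rng.uniform(0, 40, n)      # e.g. building age in years
price = 2.0 + 0.08 * area - 0.05 * age + rng.normal(0, 0.3, n)

# Hold out 25% of the observations to measure out-of-sample error.
X = np.column_stack([area, age])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

print("coefficients (area, age):", model.coef_.round(3))
print("intercept:", round(model.intercept_, 3))
print("test MAE:", round(mae, 3))
```

The same split, fit, predict and score pattern applies when the linear model is replaced by a tree ensemble or a Keras network.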

* denotes additions made after July 3

References

Informatica (2021) What is Data Analytics? Glossary, Support & Training. Available online: https://www.informatica.com/nz/services-and-training/glossary-of-terms/data-analytics-definition.html

Yiu, C.Y. (2021a) Getting Data from API in Colab, Medium, June 14. https://ecyy.medium.com/getting-data-from-api-in-colab-57b9a445c791

Yiu, C.Y. (2021b) Automation versus Machine Learning — Text-to-Speech versus Speech-to-Text Apps, Medium, Feb 10. https://ecyy.medium.com/automation-versus-machine-learning-text-to-speech-versus-speech-to-text-apps-9ddb8fc26fde

Yiu, C.Y. (2021c) Plot a Scatter Bubble Chart, Medium, June 26. https://ecyy.medium.com/data-visualisation-how-to-plot-a-scatter-bubble-chart-by-plotly-d987c8cc6a3

Yiu, C.Y. (2021d) Forecasting by FB Prophet in Colab, Medium, May 31. https://ecyy.medium.com/forecasting-by-fb-prophet-in-colab-c9d4db2d4195

Yiu, C.Y. (2021e) Build My First AVM by Sklearn in Colab, Medium, June 7. https://ecyy.medium.com/build-my-first-avm-by-sklearin-colab-2db661c67b95

Yiu, C.Y. (2021f) Mapping by Geopandas in Colab, Medium, June 12. https://ecyy.medium.com/mapping-by-geopandas-in-colab-fe4b63b9ac00

Yiu, C.Y. (2021g) Mapping GIS Data on a Basemap by Contextily in Colab, Medium, June 29. https://ecyy.medium.com/mapping-gis-data-on-a-basemap-by-contextily-in-colab-dfff5837eec

Yiu, C.Y. (2021h) Learning Machine Learning - How to Code without Learning Coding, Medium, Feb 6. https://ecyy.medium.com/learning-machine-learning-how-to-code-without-learning-coding-9fc4291902b

Yiu, C.Y. (2021i) Write a Selfie and Face Detection App, Medium, July 2. https://ecyy.medium.com/write-a-selfie-and-face-detection-app-4c51e75ee9c5

Yiu, C.Y. (2021j) A Simple Rent-to-Price ML Estimator by Keras in Colab, Medium, June 22. https://ecyy.medium.com/a-simple-rent-to-price-ml-estimator-by-keras-in-colab-bd1fcfe16fe1

Yiu, C.Y. (2021k) Build My First Mortgage Calculator by Numpy in Colab, Medium, July 10. https://ecyy.medium.com/build-my-first-mortgage-calculator-by-numin-colab-d5390cbb6b9

Yiu, C.Y. (2021l) Build a Home Buying Investment e-Analyser by Numpy in Colab, Medium, July 11. https://ecyy.medium.com/build-a-home-buying-investment-e-analyser-by-numpy-in-colab-575ed7944a0a

Yiu, C.Y. (2021m) Measuring Areas and Distances of GIS Data by Geopandas in Colab, Medium, July 21. https://ecyy.medium.com/measure-area-and-distance-by-gis-data-by-geopandas-in-colab-47c1ffe5af6d

ecyY — easy to understand why, easy to study why. Finding the truths scientifically is the theme.