Data Analytics — Coding in Colab
Having written many articles on analytics, I want to write an index article that summarises the whole picture and how the pieces relate to one another. First, data analytics is defined as follows:
“Data analytics is the pursuit of extracting meaning from raw data using specialized computer systems. These systems transform, organize, and model the data to draw conclusions and identify patterns.” (Informatica, 2021)
It therefore involves many processes, including data acquisition, processing, visualisation, storage, analysis, forecasting and simulation, predictive analytics by machine learning and artificial intelligence, and decision making (Figure 1).
Each process can involve different software and different skills. Figure 2 shows some of the common software and apps for each process. For example, ParseHub can scrape data from web pages automatically, Parabola can download data from APIs automatically, and Excel is powerful for data processing, with Power Query for handling large datasets. Data analysis can be divided into numerical and geographical data analysis: the former is served by many powerful statistical and econometric packages such as EViews, Stata and SPSS, the latter by ArcGIS and QGIS, to name just a few. Data visualisation can be done with many different tools, such as Power BI and Tableau. Data storage options include Google Cloud and Dropbox, among others. AnyLogic is good at simulations, while Azure, Colab and Jupyter are development platforms for machine learning and artificial intelligence.
Having to learn and link so many different software packages and apps can cause a lot of communication and learning problems, and sometimes several packages are needed to accomplish just one goal. I recently had a case that involved using ArcGIS to geocode a big dataset, then Excel’s Power Query and a pivot table to tabulate and extract the required data, and finally importing the result into EViews to carry out econometric analysis.
That is why I started using Google Colab for data analytics tasks. First, it does not require local data storage facilities, as all processing runs on the CPUs and GPUs provided by Colab, and files can be saved to Google Drive/Cloud or GitHub. It also facilitates the co-creation of knowledge and the sharing of experience. Better still, it provides a common platform for using different Python libraries. I have shown how to use OpenCV for face detection, NumPy for linear algebra, pandas for panel data analysis, Geopandas for GIS analysis, Contextily for geographical data visualisation, Matplotlib for numerical data visualisation, FbProphet for forecasting, TensorFlow Keras for machine learning (Yiu, 2021h; 2021j), Scikit-learn for regression, etc. (Figure 3)
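To illustrate the "one platform" point, the tabulate-then-analyse part of the ArcGIS/Excel/EViews workflow described above can be sketched in a single Python cell. This is a minimal sketch, not the author's code: the records, district names and fields are hypothetical, the geocoding step is assumed to have already attached a district to each record, and plain Python stands in for what would more typically be pandas `pivot_table` plus a regression library in Colab.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical geocoded records: (district, floor_area_sqm, price_thousands)
records = [
    ("North", 40, 400), ("North", 60, 580), ("North", 80, 790),
    ("South", 50, 450), ("South", 70, 620), ("South", 90, 800),
]

def pivot_mean(rows, key_index, value_index):
    """Average one column per value of another, like a one-field pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: fmean(values) for key, values in groups.items()}

def ols_simple(x, y):
    """Intercept and slope of y = a + b*x by ordinary least squares."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Step 1: tabulate, as the Power Query / pivot table step did
print(pivot_mean(records, key_index=0, value_index=2))

# Step 2: a simple regression, standing in for the econometric step in EViews
intercept, slope = ols_simple([r[1] for r in records], [r[2] for r in records])
print(f"price = {intercept:.2f} + {slope:.2f} * area")
```

In a real notebook the same two steps would usually be a pandas `groupby` or `pivot_table` call followed by a regression, with the data read from Google Drive rather than typed in.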
Data Collection:
- by API requests — Yiu (2021a) Getting Data from API in Colab, June 14.
- by URL, Google Drive, Yahoo Finance and local drive — Yiu (2021a) Getting Data from API in Colab, June 14.
Data Processing:
- by gTTS — Yiu (2021b) An Automatic Text-to-Speech Machine, Feb 10.
- by Geopandas — Yiu (2021m) Measure Areas and Distances on Maps, July 21.*
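Measuring areas and distances with Geopandas relies on the fact that `GeoSeries.area` and `GeoSeries.distance` compute planar (Cartesian) quantities in the units of the layer's CRS, which is why latitude/longitude data are normally re-projected with `to_crs` to a metric CRS first. The plain-Python sketch below shows what those planar measures amount to for already-projected coordinates: the shoelace formula for a polygon's area and the Euclidean distance between two points. It is an illustration of the geometry, not the Geopandas implementation, and the site coordinates are hypothetical.

```python
from math import hypot

def polygon_area(coords):
    """Planar area of a simple polygon by the shoelace formula.

    coords: (x, y) vertices in a projected CRS (e.g. metres), listed in
    order around the ring; the first point does not need to be repeated.
    """
    n = len(coords)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the vertex order irrelevant

def point_distance(p, q):
    """Straight-line (Euclidean) distance between two projected points."""
    return hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical site: a 100 m x 50 m rectangle in projected coordinates
site = [(0, 0), (100, 0), (100, 50), (0, 50)]
print(polygon_area(site))                # 5000.0 (square metres)
print(point_distance((0, 0), (30, 40)))  # 50.0 (metres)
```

Running the same formulas on raw longitude/latitude would return values in "square degrees" and "degrees", which is the usual reason for projecting first.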
Data Visualisation:
- by Plotly — Yiu (2021c) Plot a Scatter Bubble Chart, June 26.
- by Matplotlib — Yiu (2021d) Plot forecasting charts, May 31.
- by Seaborn — Yiu (2021e) Plot AVM heatmap and scatterplot, June 7.
- by Geopandas — Yiu (2021f) Mapping, June 12.
- by Contextily — Yiu (2021g) Map GIS Data on a Basemap, June 29.
Data Analysis:
- by Sklearn (regression) — Yiu (2021e) An AVM, June 7.
- by Numpy-financial (formula calculation) — Yiu (2021k) A Mortgage Calculator, July 10.*
- by Numpy-financial (what-if simulation) — Yiu (2021l) An NPV-IRR e-Analyser, July 11.*
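Both Numpy-financial articles rest on standard time-value-of-money formulas. As a hedged, dependency-free sketch (not the code from those articles), the functions below compute a level mortgage payment with the annuity formula, NPV with the first cash flow at time 0 (the same convention as `numpy_financial.npv`), and IRR by simple bisection. `numpy_financial.pmt`, `npv` and `irr` provide these calculations directly; the bisection bounds, loan terms and cash flows here are illustrative assumptions.

```python
def mortgage_payment(annual_rate, years, principal, periods_per_year=12):
    """Level payment per period for a fully amortising loan.

    Annuity formula P * r / (1 - (1 + r) ** -n); numpy_financial.pmt
    returns the same amount with a negative sign (a cash outflow).
    """
    r = annual_rate / periods_per_year
    n = years * periods_per_year
    if r == 0:
        return principal / n  # interest-free loan: straight repayment
    return principal * r / (1 - (1 + r) ** -n)

def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now (t = 0), undiscounted."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, low=-0.99, high=1.0, tol=1e-10, max_iter=200):
    """Internal rate of return by bisection on npv(rate) = 0.

    Assumes NPV changes sign exactly once between low and high, which
    holds for a conventional project (outflow first, then inflows).
    """
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2 < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid  # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2

# Hypothetical loan: 100,000 at 6% a year over 30 years, paid monthly
print(round(mortgage_payment(0.06, 30, 100_000), 2))  # 599.55

# Hypothetical project: pay 1,000 now, receive 300, 400 and 500 later
project = [-1000, 300, 400, 500]
print(round(npv(0.05, project), 2))
print(round(irr(project), 4))
```

The bisection IRR assumes one sign change in the cash flows; projects with several sign changes can have more than one IRR, in which case NPV at a chosen discount rate is the safer measure.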
Data Forecasting:
- by Prophet — Yiu (2021d) Forecasting HSI, May 31.
Machine Learning:
- by NumPy, TensorFlow & Keras — Yiu (2021h) Learning Machine Learning, Feb 6.
- by OpenCV — Yiu (2021i) Selfie and Face Detection, July 2.
- by Keras — Yiu (2021j) a Rent-to-Price ML Estimator, June 22.
* Articles added after July 3.
References
Informatica (2021) What Is Data Analytics? Glossary, Support & Training. https://www.informatica.com/nz/services-and-training/glossary-of-terms/data-analytics-definition.html
Yiu, C.Y. (2021a) Getting Data from API in Colab, Medium, June 14. https://ecyy.medium.com/getting-data-from-api-in-colab-57b9a445c791
Yiu, C.Y. (2021b) Automation versus Machine Learning — Text-to-Speech versus Speech-to-Text Apps, Medium, Feb 10. https://ecyy.medium.com/automation-versus-machine-learning-text-to-speech-versus-speech-to-text-apps-9ddb8fc26fde
Yiu, C.Y. (2021c) Plot a Scatter Bubble Chart, Medium, June 26. https://ecyy.medium.com/data-visualisation-how-to-plot-a-scatter-bubble-chart-by-plotly-d987c8cc6a3
Yiu, C.Y. (2021d) Forecasting by FB Prophet in Colab, Medium, May 31. https://ecyy.medium.com/forecasting-by-fb-prophet-in-colab-c9d4db2d4195
Yiu, C.Y. (2021e) Build My First AVM by Sklearn in Colab, Medium, June 7. https://ecyy.medium.com/build-my-first-avm-by-sklearin-colab-2db661c67b95
Yiu, C.Y. (2021f) Mapping by Geopandas in Colab, Medium, June 12. https://ecyy.medium.com/mapping-by-geopandas-in-colab-fe4b63b9ac00
Yiu, C.Y. (2021g) Mapping GIS Data on a Basemap by Contextily in Colab, Medium, June 29. https://ecyy.medium.com/mapping-gis-data-on-a-basemap-by-contextily-in-colab-dfff5837eec
Yiu, C.Y. (2021h) Learning Machine Learning - How to Code without Learning Coding, Medium, Feb 6. https://ecyy.medium.com/learning-machine-learning-how-to-code-without-learning-coding-9fc4291902b
Yiu, C.Y. (2021i) Write a Selfie and Face Detection App, Medium, July 2. https://ecyy.medium.com/write-a-selfie-and-face-detection-app-4c51e75ee9c5
Yiu, C.Y. (2021j) A Simple Rent-to-Price ML Estimator by Keras in Colab, Medium, June 22. https://ecyy.medium.com/a-simple-rent-to-price-ml-estimator-by-keras-in-colab-bd1fcfe16fe1
Yiu, C.Y. (2021k) Build My First Mortgage Calculator by Numpy in Colab, Medium, July 10. https://ecyy.medium.com/build-my-first-mortgage-calculator-by-numin-colab-d5390cbb6b9
Yiu, C.Y. (2021l) Build a Home Buying Investment e-Analyser by Numpy in Colab, Medium, July 11. https://ecyy.medium.com/build-a-home-buying-investment-e-analyser-by-numpy-in-colab-575ed7944a0a
Yiu, C.Y. (2021m) Measuring Areas and Distances of GIS Data by Geopandas in Colab, Medium, July 21. https://ecyy.medium.com/measure-area-and-distance-by-gis-data-by-geopandas-in-colab-47c1ffe5af6d