Automation versus Machine Learning — Text-to-Speech versus Speech-to-Text Apps
Do you want to write an app that reads your text aloud automatically? That is a text-to-speech app. It requires just two blocks of code, because it does NOT need machine learning algorithms; it is simply an automated process.
Automation and machine learning can both be considered intelligence: the former is the intelligence of the programmer, while the latter is the intelligence of the machine. However, the two are easily confused. Since 2006, I have defined automation as the 1st generation of intelligence and learning as the 4th generation of intelligence (Yiu, 2019).
The definitions are easy to spell out. When a programmer gives a machine a rule for performing a task, that is automation. When a developer gives a machine an algorithm that learns the rule through training, that is machine learning (Fig 1).
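To make the two definitions concrete, here is a minimal Python sketch. It is a hypothetical toy example, not taken from the original article: deciding whether a temperature counts as "hot". In the automation version the programmer writes the threshold into the code; in the machine learning version the machine chooses the threshold that best fits a set of labelled training examples (a one-feature decision stump). The function names, the 30 °C rule and the training data are all illustrative assumptions.

```python
from __future__ import annotations

# Automation: the programmer writes the rule directly.
def is_hot_rule(temp_c: float) -> bool:
    return temp_c > 30.0  # threshold chosen by the programmer (hypothetical)

# Machine learning: the machine picks the threshold from labelled examples.
def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Return the cut-off that misclassifies the fewest training samples.

    Needs at least two samples so that a candidate threshold exists.
    """
    temps = sorted(t for t, _ in samples)
    # Candidate thresholds: midpoints between neighbouring temperatures.
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(th: float) -> int:
        # Count samples whose predicted label (t > th) disagrees with the answer.
        return sum((t > th) != label for t, label in samples)

    return min(candidates, key=errors)

# Hypothetical training data: (temperature, did people call it hot?)
training = [(18.0, False), (24.0, False), (27.0, False),
            (29.0, True), (33.0, True), (36.0, True)]
learned = learn_threshold(training)  # the rule the machine learned

def is_hot_learned(temp_c: float) -> bool:
    return temp_c > learned

print(learned)  # → 28.0
```

Both functions end up as a simple comparison; the difference lies in who supplied the number. With this data, 28.5 °C is "hot" under the learned rule but not under the programmer's rule, because the training examples, not the programmer, set the cut-off.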
In my previous article (Yiu, 2021a), I showed a machine learning app that learns to tell two people apart by scanning their photos. We do not tell the machine the rules for telling them apart; we simply feed it many photos along with the correct answers, and the machine learns the rules from the training data.
This article, then, shows you what an automation app is. Using just two blocks of code in Colab, I will show you how to make a text-to-speech app. It is amazing to hear the machine read out everything you type. But the intelligence basically belongs to gTTS, which works like a dictionary of words and their corresponding pronunciations. gTTS is defined as follows:
“gTTS (Google Text-to-Speech), a Python library and CLI tool to interface with Google Translate’s text-to-speech API.” (https://gtts.readthedocs.io/en/latest/)
So the code simply passes the text you type to gTTS, which returns the corresponding speech so that it can be read out. You can try it yourself or watch my YouTube video (Yiu, 2021b) to see how it works.
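The author's own two Colab blocks are shown in the video and are not reproduced here. As a rough sketch of what such a pair of cells might look like, the code below assumes the gTTS library's documented `gTTS(text=..., lang=...)` class and its `save()` method; the function name `text_to_speech`, the default file name `speech.mp3` and the empty-text check are hypothetical additions for illustration.

```python
# Block 1 (Colab cell): install the library, e.g.  !pip install gTTS

# Block 2 (Colab cell): turn typed text into an MP3 file.
def text_to_speech(text: str, filename: str = "speech.mp3", lang: str = "en") -> str:
    """Send `text` to Google's TTS service via gTTS and save the audio to `filename`.

    Hypothetical helper: the name, defaults and empty-text check are illustrative.
    """
    text = text.strip()
    if not text:
        raise ValueError("Nothing to read out: the text is empty.")
    # Imported here so that merely defining the function needs no install or network.
    from gtts import gTTS
    gTTS(text=text, lang=lang).save(filename)  # requires an internet connection
    return filename
```

In Colab you could then call `text_to_speech(input("Type something: "))` and play the result with `from IPython.display import Audio` followed by `Audio(filename="speech.mp3", autoplay=True)`. Note that there is no training step anywhere: the programmer only wires the input to an existing service, which is what makes this an automation app in the sense used above.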
In contrast, a speech-to-text app cannot be built with the automation approach, because different people can have very different accents that a fixed set of rules may not handle accurately. A better solution is to feed the machine thousands or millions of samples of your speech, each paired with the correct text as the answer, so that after training it can understand your pronunciation and transcribe it more and more accurately. A similar feature can be found on YouTube, where you can turn on auto-generated English subtitles (captions). But you will sometimes find errors in the subtitles, probably because the machine has not been trained enough on the speaker's accent.
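The accent problem can be illustrated with a toy example. The sketch below is a hypothetical illustration, not a real speech recogniser and not the method used in the notebook linked below: each recording is reduced to two made-up numbers standing in for acoustic features, and a nearest-centroid classifier "learns" each word by averaging its labelled examples. A clip from a speaker whose accent is missing from the training data is misheard; after a few labelled clips from that speaker are added, the same clip is transcribed correctly.

```python
from __future__ import annotations
from math import dist  # Euclidean distance, Python 3.8+

Features = tuple[float, float]  # hypothetical stand-in for acoustic features

def train(examples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Average the feature vectors recorded for each word (one centroid per word)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for (x, y), word in examples:
        s = sums.setdefault(word, [0.0, 0.0])
        s[0] += x
        s[1] += y
        counts[word] = counts.get(word, 0) + 1
    return {w: (s[0] / counts[w], s[1] / counts[w]) for w, s in sums.items()}

def transcribe(features: Features, model: dict[str, Features]) -> str:
    """Return the word whose learned centroid is closest to the new recording."""
    return min(model, key=lambda w: dist(features, model[w]))

# Hypothetical labelled recordings: (features, correct text)
training = [((1.0, 1.0), "ship"), ((1.2, 0.8), "ship"),
            ((3.0, 3.0), "sheep"), ((2.8, 3.2), "sheep")]
# Labelled "ship" clips from a new speaker with a different accent
new_speaker = [((2.2, 2.2), "ship"), ((2.1, 2.3), "ship")]

before = train(training)
after = train(training + new_speaker)
clip = (2.3, 2.1)  # an unseen "ship" clip from the new speaker
print(transcribe(clip, before))  # → sheep (accent not in the training data)
print(transcribe(clip, after))   # → ship
```

Real speech-to-text systems use far richer features and models, but the principle is the same: accuracy for a given accent comes from training examples of that accent, not from rules written by the programmer.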
The code for a speech-to-text app is much more complicated, because it relies on a machine learning algorithm rather than automation. If you are interested in how such code works, you may refer to an example at https://colab.research.google.com/github/scgupta/yearn2learn/blob/master/speech/asr/python_speech_recognition_notebook.ipynb
Yiu, C.Y. (2019) From Automation to Machine Learning, Medium, Jan 4. https://ecyy.medium.com/from-automation-to-machine-learning-c61fefe483f5
Yiu, C.Y. (2021a) Learning Machine Learning — Training People to Train a Machine, Medium, Feb 2. https://ecyy.medium.com/learning-machine-learning-7e273c1a4728
Yiu, C.Y. (2021b) Automation Text to Speech by Colab, YouTube, Feb 8. https://youtu.be/k2q2iT5zeqc