Using an RNN to Classify Names in Hong Kong Pinyin

Figure 1 Hui Ka Yan, 431st on the Bloomberg Billionaires Index, Bloomberg (29 January 2022) https://www.bloomberg.com/billionaires/

Hong Kong Pinyin System

Figure 2 Examples of pinyin of Chinese names in different regions. Source: Cheung, Chan, Li & Yiu (2021)

Developing a Name Classifier

What Is an RNN?

Figure 3 RNN recurrent neural network. https://morioh.com/p/1bc305d7dbd
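
To make the recurrence concrete, here is a minimal NumPy sketch of a plain (vanilla) RNN forward pass. It is an illustration only, not the model used in this article, which is loaded later as a GRU. The function name `rnn_forward`, the weight names, and the sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over one sequence and return every hidden state."""
    hidden = np.zeros(W_hh.shape[0])  # initial hidden state h_0 = 0
    states = []
    for x_t in inputs:
        # The same weights are reused at every step:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        states.append(hidden)
    return np.stack(states)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
sequence = rng.normal(size=(seq_len, input_size))

states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # (5, 3): one hidden state per time step
```

For a name classifier, each time step would typically be one character (or token) of the name, and the last hidden state would feed a final classification layer.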

Chinese Names Classifier

1. Download training materials

2. Import the required libraries

import os
import pickle

import numpy as np
import pandas as pd
import tensorflow as tf
import keras
import sklearn.model_selection
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from tensorflow.python.client import device_lib
# CuDNNLSTM and CuDNNGRU are not available from keras.layers in current
# TensorFlow 2 / Keras releases; the standard LSTM and GRU layers use the
# cuDNN kernels automatically when a GPU is available.
from keras.layers import (Activation, Dense, Dropout, Input, Embedding, LSTM, GRU,
                          GlobalMaxPooling1D, GlobalAveragePooling1D, Reshape,
                          Conv1D, MaxPooling1D)

3. Mount Google Drive and enable the GPU

from google.colab import drive
drive.mount('/content/drive')

# Load the fitted tokenizer, the trained model, and the label encoder from Drive.
with open('/content/drive/My Drive/Colab Notebooks/model/tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

GRU_model = keras.models.load_model('/content/drive/My Drive/Colab Notebooks/model/m1.h5')

with open('/content/drive/My Drive/Colab Notebooks/model/le.pkl', 'rb') as f:
    le = pickle.load(f)

4. Define the prediction functions

def proc_x(x, tokenizer, max_len=50):
    # Convert each name to a sequence of token ids, then pad at the end to max_len.
    tensor = tokenizer.texts_to_sequences(x)
    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor, padding='post', maxlen=max_len)
    return tensor


def predict_name(name, model, tokenizer, le, max_len=50):
    # Pick the most probable class for each name and map it back to its label.
    probs = model.predict(proc_x(name, tokenizer, max_len))
    fit = le.classes_[np.argmax(probs, axis=1)]
    return pd.DataFrame({'name': name, 'prediction': fit})
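
To show what these two helpers do without needing the saved tokenizer or model, here is a small stand-alone sketch in plain NumPy. It assumes a character-level vocabulary, which may not match the tokenizer actually saved in tokenizer.pkl, and it uses placeholder class names and made-up probabilities. The names `encode_and_pad`, `decode`, and `char_index` are hypothetical and exist only for this illustration.

```python
import numpy as np

def encode_and_pad(names, char_index, max_len=50):
    """Map each character to an integer id and pad with zeros at the end."""
    batch = np.zeros((len(names), max_len), dtype=np.int64)
    for row, name in enumerate(names):
        ids = [char_index[c] for c in name.lower() if c in char_index]
        ids = ids[-max_len:]  # keep the last max_len ids if the name is too long
        batch[row, :len(ids)] = ids
    return batch

def decode(probabilities, classes):
    """Return the label with the highest probability in each row."""
    return classes[np.argmax(probabilities, axis=1)]

# Assumed vocabulary: a-z plus space, with 0 reserved for padding.
char_index = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
padded = encode_and_pad(["HUI KA YAN"], char_index, max_len=12)
print(padded)  # [[ 8 21  9 27 11  1 27 25  1 14  0  0]]

# Placeholder labels and made-up probabilities, not real model output.
classes = np.array(["class_a", "class_b"])
fake_probs = np.array([[0.2, 0.8], [0.9, 0.1]])
labels = decode(fake_probs, classes)
print(labels)  # ['class_b' 'class_a']
```

The zero padding at the end mirrors padding='post' in proc_x, and decode mirrors the le.classes_[np.argmax(...)] lookup in predict_name.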

5. Test

predict_name(["HUI KA YAN", "XU JIAYIN"], GRU_model, tokenizer, le, max_len=50)

6. Results

References:

ecyY — easy to understand why, easy to study why. Finding the truths scientifically is the theme.
