ABSTRACT Ivan Fadillah
Restricted Irwan Sofiyan
» ITB
Lip reading is very important when what a speaker says cannot be heard clearly. However, lip reading is difficult even for professionals, because human visual perception is subjective and the lip movements for the phonemes a person pronounces are ambiguous and open to multiple interpretations. Because of this difficulty, a model that reads lips automatically is needed. Several automatic lip reading models have been studied, but some of the existing models can only predict English sentences, while research on Indonesian is still scarce and leaves room for improvement. This final project therefore aims to improve the accuracy of an Indonesian lip reading model.
The Indonesian lip reading model is built on an Indonesian-language video dataset, the AVID dataset from the research of Maulana and Fanany (2017). The pipeline begins with several preprocessing steps: face detection, facial landmark extraction, face alignment, and cropping of the lip region. Features are then extracted from the cropped lip region with a Spatio-Temporal Convolutional Neural Network (STCNN) and used to train a Recurrent Neural Network (RNN) with Connectionist Temporal Classification (CTC).
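To illustrate how a CTC-trained network turns per-frame predictions into text, the following is a minimal sketch of greedy CTC decoding: consecutive repeated labels are collapsed, then blank symbols are removed. The abstract does not state which decoding strategy the model uses, so greedy decoding is an assumption here, and the function name `ctc_greedy_decode`, the blank symbol `_`, and the sample frame labels are hypothetical.

```python
from itertools import groupby

# Hypothetical blank symbol; a real model reserves one class index for it.
BLANK = "_"


def ctc_greedy_decode(frame_labels):
    """Collapse a per-frame label sequence into an output string.

    Assumes `frame_labels` already holds the most likely label for each
    video frame (the argmax over the network's per-frame output).
    """
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks, which only separate repeated characters.
    return "".join(label for label in collapsed if label != BLANK)


# Made-up per-frame labels for the Indonesian word "saya".
frames = list("__ss_aa_yy_aa__")
decoded = ctc_greedy_decode(frames)
```

The blank symbol is what lets CTC emit genuinely doubled characters: the sequence `a`, `_`, `a` decodes to `aa`, whereas `a`, `a` collapses to a single `a`.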
In the experiments carried out in this final project, the LipNet architecture combined with a word correction model gives the best-performing Indonesian lip reading model, achieving a Word Error Rate (WER) of 8.26% on the AVID dataset. The evaluation shows that the resulting model outperforms direct human observation, which achieves a WER of around 52.98% (Maulana and Fanany, 2017).
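For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the predicted and reference sentences, divided by the number of reference words. The sketch below shows that standard definition. It is not taken from the project's evaluation code, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences: one substitution and one deletion over four words.
wer = word_error_rate("saya makan nasi goreng", "saya minum nasi")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.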
Perpustakaan Digital ITB