Facial emotion recognition using deep learning
Master thesis
Published version
Permanent link
https://hdl.handle.net/11250/3017218
Publication date
2022
Abstract
Rapid advancements in Machine Learning (ML) have made it possible to equip computers with the ability to analyze, recognize and understand emotions. Facial Emotion Recognition (FER) is a technology that analyzes facial expressions in images to reveal information about a person's emotional state. Researchers from a variety of sectors are becoming increasingly interested in FER because of its wide range of applications, of which human-computer interaction is especially important.
A main purpose of this thesis is to evaluate how transfer learning influences model performance, and whether traditional sequentially built models can be reduced through transfer learning while still obtaining similar accuracy. Facial expressions from the FER-2013 database, which contains a total of 35,887 images, are classified into seven emotion categories. The dataset is divided using a k-fold split with 10 folds in order to obtain independent training and test indices.
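The 10-fold split can be illustrated with a minimal sketch in plain Python. It assumes contiguous, unshuffled folds in which the first `n_samples % k` folds receive one extra sample (the same convention as scikit-learn's `KFold` with `shuffle=False`); the thesis may have shuffled or stratified the data, and the helper name `k_fold_indices` is hypothetical.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper: contiguous, unshuffled folds; the first
    n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        # Every index outside the current test fold is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


if __name__ == "__main__":
    # FER-2013 contains 35887 images in total.
    for fold, (train, test) in enumerate(k_fold_indices(35887, k=10), start=1):
        print(f"fold {fold}: train={len(train)} test={len(test)}")
```

With 35,887 images, folds 1 to 7 hold 3,589 test images and folds 8 to 10 hold 3,588, so the ten test sets are disjoint and every image is tested exactly once.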
Among the many techniques for FER, deep learning models, especially Convolutional Neural Networks (CNNs), have shown great potential for powerful automatic feature extraction and computational efficiency. This thesis investigates the performance of one sequentially built CNN model, one pre-trained ResNet50 model and one pre-trained VGG16 model. The models are compared and evaluated using chosen metrics such as the F1 score and confusion matrices. Finally, a few selected images from the Japanese Female Facial Expression (JAFFE) database are presented to the models as an experiment with unseen facial expression images.
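The evaluation metrics named above can be computed directly from true and predicted labels. The sketch below, in plain Python, builds a confusion matrix (rows are true classes, columns are predicted classes) and derives per-class and macro-averaged F1. It assumes the usual FER-2013 label order; the helper names and the toy labels are hypothetical and are not results from the thesis.

```python
from typing import List, Sequence

# FER-2013 label order: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Return an n_classes x n_classes matrix; cm[i][j] counts true class i predicted as j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def f1_scores(cm: List[List[int]]) -> List[float]:
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 if the class is never true or predicted."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                        # true c, predicted as another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


if __name__ == "__main__":
    # Toy labels for illustration only (not thesis results).
    y_true = [0, 1, 2, 3, 3, 4, 5, 6, 6, 3]
    y_pred = [0, 2, 2, 3, 3, 6, 5, 6, 4, 3]
    cm = confusion_matrix(y_true, y_pred, len(EMOTIONS))
    per_class = f1_scores(cm)
    macro_f1 = sum(per_class) / len(per_class)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / len(y_true)
    for name, score in zip(EMOTIONS, per_class):
        print(f"{name:8s} F1={score:.2f}")
    print(f"accuracy={accuracy:.3f} macro-F1={macro_f1:.3f}")  # accuracy=0.700 macro-F1=0.595
```

Macro-F1 weights every emotion equally, which is why it can differ noticeably from plain accuracy on an imbalanced dataset such as FER-2013, where some classes (for example Disgust) have far fewer images than others.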
Results show that a sequential base model with only a few convolutions, trained for 50 epochs, achieved an accuracy of 89.7%, with an average training time of 21 minutes per fold. The two pre-trained models had fewer convolutions and achieved an accuracy of 86.5% without fine-tuning after only 20 epochs, with an average training time of 12 minutes per fold. Training for slightly more epochs would have given more similar results. A pre-trained model with transfer learning is therefore a recommended choice for saving computational training time and model-building effort while still achieving similar accuracy.
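The trade-off reported above can be made concrete with simple arithmetic on the stated figures. This sketch assumes the per-fold averages apply to all 10 folds (the abstract gives only averages), so the totals are estimates derived from the reported numbers, not additional measurements.

```python
# Figures as reported in the abstract (averages per fold).
N_FOLDS = 10
baseline = {"accuracy": 89.7, "minutes_per_fold": 21}    # sequential CNN, 50 epochs
pretrained = {"accuracy": 86.5, "minutes_per_fold": 12}  # ResNet50 / VGG16, 20 epochs, no fine-tuning

# Assumption: every fold takes the average time, so total = average * number of folds.
total_baseline = baseline["minutes_per_fold"] * N_FOLDS
total_pretrained = pretrained["minutes_per_fold"] * N_FOLDS
saved = total_baseline - total_pretrained
saved_pct = 100 * saved / total_baseline
gap = baseline["accuracy"] - pretrained["accuracy"]

print(f"total training time: {total_baseline} min vs {total_pretrained} min")
print(f"time saved: {saved} min ({saved_pct:.1f}%)")
print(f"accuracy gap: {gap:.1f} percentage points")
```

Under this assumption the pre-trained models save about 90 minutes (roughly 43%) over a full 10-fold run, at the cost of a 3.2 percentage-point accuracy gap.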