Facial emotion recognition using deep learning

Kirkvik, Emilia Basioli

Master's thesis

Published version

Full text

kirkvik-acit2022.pdf (11.03 MB)

Permanent link

https://hdl.handle.net/11250/3017218

Publication date

2022

Collections

TKD - Master's in Applied Computer and Information Technology (ACIT)

Abstract

Rapid advancements in Machine Learning (ML) have made it possible to equip computers with the ability to analyze, recognize and understand emotions. Facial Emotion Recognition (FER) is a technology that analyzes facial expressions in images to reveal information about a person's emotional state. Researchers from a variety of sectors are becoming increasingly interested in FER because it has a wide range of applications, of which human-computer interaction is especially important.

The main purpose of this thesis is to evaluate how transfer learning influences model performance, and whether traditional sequentially built models can be reduced through transfer learning while still obtaining similar accuracy. Seven emotion categories are classified from the FER-2013 database, which contains a total of 35,887 images. The dataset is divided with a 10-fold (k = 10) split so that each fold has independent training and testing indices.
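A 10-fold split of this kind can be sketched as below. This is a minimal, dependency-free illustration, not the thesis's own code: the function name `kfold_indices`, the shuffling and the fixed `seed` are assumptions. Each index appears in exactly one test fold, and the training indices of a fold are all the remaining ones, so training and testing never overlap.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int = 10, shuffle: bool = True,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: every index lands in exactly one test fold, and the
    training indices of each fold are all other indices (no overlap).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        # Assumption: shuffle once with a fixed seed for reproducible folds.
        random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample:
    # the first (n_samples % k) folds get one extra index.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    n = 35887  # total number of FER-2013 images stated in the abstract
    for fold, (train_idx, test_idx) in enumerate(kfold_indices(n, k=10), start=1):
        print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

With 35,887 images and k = 10, folds 1 to 7 hold 3,589 test images and folds 8 to 10 hold 3,588, the same size rule scikit-learn's `KFold` uses. Whether the thesis shuffled before splitting is not stated here.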

Among the many techniques for FER, deep learning models, especially Convolutional Neural Networks (CNNs), have shown great potential for powerful automatic feature extraction and computational efficiency. This thesis investigates the performance of one sequentially built CNN model, one pre-trained ResNet50 model and one pre-trained VGG16 model. Each model is compared and evaluated using selected metrics such as the F1 score and confusion matrices. Finally, a few selected images from the Japanese Female Facial Expression (JAFFE) database are presented to the models as an experiment with unseen facial expression images.
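The evaluation metrics mentioned above can be illustrated with a short pure-Python sketch. It is not the thesis's evaluation code; the helper names are hypothetical, and the label order assumes the standard FER-2013 encoding (0 = angry through 6 = neutral). Rows of the confusion matrix are true classes and columns are predicted classes; per-class F1 is computed as 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
from typing import List, Sequence

# Assumed FER-2013 label order (integer labels 0-6).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int = len(EMOTIONS)) -> List[List[int]]:
    """Count predictions: cm[true_class][predicted_class]."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """F1 per class; 0.0 when a class is never true and never predicted."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                       # truly c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


def accuracy(cm: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


if __name__ == "__main__":
    cm = confusion_matrix([0, 3, 3, 6, 4], [0, 3, 6, 6, 4])
    for name, score in zip(EMOTIONS, per_class_f1(cm)):
        print(f"{name:>8}: F1 = {score:.3f}")
    print(f"accuracy = {accuracy(cm):.3f}, macro F1 = {macro_f1(cm):.3f}")
```

In the toy example one "happy" face is mistaken for "neutral", so accuracy is 0.8 while the F1 scores of both "happy" and "neutral" drop to 2/3. Because FER-2013 is class-imbalanced, per-class F1 and the confusion matrix reveal such errors better than accuracy alone.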

Results show that a sequential base model with only a few convolutional layers, trained for 50 epochs, achieved an accuracy of 89.7%, with an average training time of 21 minutes per fold. The two pre-trained models had fewer convolutions and achieved an accuracy of 86.5% without fine-tuning after only 20 epochs, with an average training time of 12 minutes per fold. Training for slightly more epochs would have given more similar results. A pre-trained model using transfer learning is therefore a recommended choice for saving computational training time and model-building effort while still achieving similar accuracy.
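The reported per-fold averages can be turned into approximate totals for the full 10-fold run. This worked arithmetic assumes every one of the 10 folds took the stated average time and counts one pre-trained model at a time; it uses only the figures given above.

```latex
\begin{align*}
T_{\text{CNN}} &= 10 \times 21\,\text{min} = 210\,\text{min} = 3.5\,\text{h}\\
T_{\text{pre-trained}} &= 10 \times 12\,\text{min} = 120\,\text{min} = 2\,\text{h}\\
\frac{T_{\text{CNN}} - T_{\text{pre-trained}}}{T_{\text{CNN}}} &= \frac{90\,\text{min}}{210\,\text{min}} \approx 42.9\,\%\\
\Delta_{\text{acc}} &= 89.7\,\% - 86.5\,\% = 3.2 \text{ percentage points}
\end{align*}
```

Under these assumptions, transfer learning saves roughly 1.5 hours of training per model at the cost of about 3.2 percentage points of accuracy.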

Publisher

OsloMet - storbyuniversitetet

Series

ACIT;2022