
dc.contributor.advisor: Zouganeli, Evi (Main supervisor)
dc.contributor.advisor: Lentzas, Nasos (Co-supervisor)
dc.contributor.author: Kirkvik, Emilia Basioli
dc.date.accessioned: 2022-09-12T10:55:51Z
dc.date.available: 2022-09-12T10:55:51Z
dc.date.issued: 2022
dc.identifier.uri: https://hdl.handle.net/11250/3017218
dc.description.abstract: Rapid advances in Machine Learning (ML) have made it possible to equip computers with the ability to analyze, recognize and understand emotions. Facial Emotion Recognition (FER) is a technology that analyzes facial expressions in images to reveal information about a person's emotional state. Researchers from a variety of sectors are becoming increasingly interested in FER because it has a wide range of applications, of which human-computer interaction is especially important. The main purpose of this thesis is to evaluate how transfer learning influences model performance, and whether traditional sequentially built models can be reduced through transfer learning while still obtaining similar accuracy. Seven categories of emotion are classified from the FER-2013 database, which contains a total of 35887 images. The dataset is divided using a k-fold split with 10 folds in order to obtain independent training and testing indices. Among the many techniques for FER, deep learning models, especially Convolutional Neural Networks (CNNs), have shown great potential for powerful automatic feature extraction and computational efficiency. This thesis investigates the performance of one sequentially built CNN model, one pre-trained ResNet50 model and one pre-trained VGG16 model. The models are compared and evaluated using chosen metrics such as F1 score and confusion matrices. Finally, a few selected images from the Japanese Female Facial Expression (JAFFE) database are presented to the models as an experiment with unseen facial expression images. Results show that a sequential base model with only a few convolutions, trained for 50 epochs, gave an accuracy of 89.7%, with an average training time of 21 minutes per fold. The two pre-trained models had fewer convolutions and gave an accuracy of 86.5% without fine-tuning after only 20 epochs, with an average training time of 12 minutes per fold. Slightly more epochs would have given more similar results.
A pre-trained model using transfer learning is therefore a recommended choice for saving computational training time and model-building effort while still achieving similar accuracy. (en_US)
dc.language.iso: eng (en_US)
dc.publisher: OsloMet - storbyuniversitetet (en_US)
dc.relation.ispartofseries: ACIT;2022
dc.subject: Machine learning (en_US)
dc.subject: Deep learning (en_US)
dc.subject: Emotion recognition (en_US)
dc.subject: Convolutional neural network (en_US)
dc.title: Facial emotion recognition using deep learning (en_US)
dc.type: Master thesis (en_US)
dc.description.version: publishedVersion (en_US)
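
The abstract mentions a 10-fold split of the 35887 FER-2013 images and evaluation with F1 and confusion matrices. The following is a minimal, illustrative sketch of those two steps in plain Python, not the thesis code. The function names, the shuffling with a fixed seed, and the choice of macro-averaged F1 are all assumptions made for illustration; the record does not say how the thesis implemented them.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split.

    Shuffling and the seed value are assumptions, not taken from the thesis.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def confusion_matrix(y_true, y_pred, n_classes):
    """Build a matrix whose rows are true classes and columns predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def macro_f1(matrix):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp
        fn = sum(matrix[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

N_IMAGES = 35887  # dataset size stated in the abstract
folds = list(k_fold_indices(N_IMAGES, k=10))
```

Each fold serves once as the test set while the remaining nine folds form the training set, so every image is tested exactly once across the ten runs.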

