dc.contributor.advisor | Mello, Gustavo | |
dc.contributor.advisor | Yazidi, Anis | |
dc.contributor.author | Aaby, Pernille | |
dc.date.accessioned | 2022-09-13T08:44:36Z | |
dc.date.available | 2022-09-13T08:44:36Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://hdl.handle.net/11250/3017423 | |
dc.description.abstract | Nowadays, contextual language models can solve a wide range of language tasks such as text classification, question answering and machine translation. These tasks often require the model to have general language understanding, such as how words relate to each other. This understanding is acquired through a pre-training stage in which the model learns features from raw text data. However, we do not fully understand all the features the model learns during this pre-training stage. Does there exist information yet to be utilized? Can we make predictions more explainable? This thesis aims to extend the knowledge of which features a language model has acquired. We have chosen the BERT model architecture and have analyzed its word representations from two feature perspectives. The first perspective investigates similarities and dissimilarities between English and Norwegian word representations by evaluating their performance on a word retrieval task and a language detection task. The second perspective analyzes how a word representation changes if the word stands in the wrong context or if the word is inferred through the model without context. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | OsloMet - storbyuniversitetet | en_US |
dc.relation.ispartofseries | ACIT;2022 | |
dc.subject | Multilingual models | en_US |
dc.subject | Word embeddings | en_US |
dc.title | Exploring multilingual and contextual properties in word representations from BERT | en_US |
dc.type | Master thesis | en_US |
dc.description.version | publishedVersion | en_US |