Multimedia Datasets: Challenges and Future Possibilities
Nguyen, Thu; Storås, Andrea; Thambawita, Vajira L B; Hicks, Steven; Halvorsen, Pål; Riegler, Michael Alexander
Chapter, Peer reviewed, Conference object, Journal article
Accepted version
Permanent link
https://hdl.handle.net/11250/3123919
Publication date
2023
Original version
https://doi.org/10.1007/978-3-031-27818-1_58
Abstract
Public multimedia datasets can enhance knowledge discovery and model development, as more researchers have the opportunity to contribute to exploring them. However, as these datasets become larger and more multimodal, efficient storage and sharing, not only analysis, can become a challenge. Furthermore, publishing any data that contains sensitive information about participants carries inherent privacy risks, especially when combining different data sources leads to unforeseen discoveries. Proposed solutions include standard anonymization methods and newer approaches that use generative models to produce fake (synthetic) data that can be used in place of real data. However, there are many open questions about whether these generative models retain information about the data used to train them and whether that information could be retrieved, which would make them less privacy-preserving than one may think. This paper reviews some important milestones the research community has reached so far in addressing key challenges in multimedia data analysis. In addition, we discuss the long-term and short-term challenges associated with publishing open multimedia datasets, including questions regarding efficient sharing, data modeling, and ensuring that the data is appropriately anonymized.