Multimedia Datasets: Challenges and Future Possibilities

Nguyen, Thu; Storås, Andrea; Thambawita, Vajira L B; Hicks, Steven; Halvorsen, Pål; Riegler, Michael Alexander

Chapter, Peer reviewed, Conference object, Journal article

Accepted version

Open

MMM2023_Multimedia_datasets__challenges_and_future_possibilities.pdf (210Kb)

Permanent link

https://hdl.handle.net/11250/3123919

Publication date

2023

Abstract

Public multimedia datasets can enhance knowledge discovery and model development because more researchers have the opportunity to explore them. However, as these datasets become larger and more multimodal, efficient storage and sharing, in addition to analysis, can become a challenge. Furthermore, publishing any data that contains sensitive information about participants carries inherent privacy risks, especially when combining different data sources leads to unanticipated discoveries. Proposed solutions include standard anonymization methods and newer approaches that use generative models to produce fake data that can be used in place of real data. However, many questions remain open about whether these generative models retain information about their training data and whether that information could be retrieved, which would make them less privacy-preserving than one might think. This paper reviews some important milestones that the research community has reached so far on key challenges in multimedia data analysis. In addition, we discuss the long-term and short-term challenges associated with publishing open multimedia datasets, including questions regarding efficient sharing, data modeling, and ensuring that the data is appropriately anonymized.

Publisher

Springer