Multimedia Datasets: Challenges and Future Possibilities
Nguyen, Thu; Storås, Andrea; Thambawita, Vajira L B; Hicks, Steven; Halvorsen, Pål; Riegler, Michael Alexander
Chapter, Peer reviewed, Conference object, Journal article
Accepted version
Date
2023
Original version
https://doi.org/10.1007/978-3-031-27818-1_58
Abstract
Public multimedia datasets can enhance knowledge discovery and model development as more researchers have the opportunity to contribute to exploring them. However, as these datasets become larger and more multimodal, efficient storage and sharing, in addition to analysis, can become a challenge. Furthermore, there are inherent privacy risks when publishing any data containing sensitive information about the participants, especially when different data sources are combined, which can lead to unknown discoveries. Proposed solutions include standard methods for anonymization and new approaches that use generative models to produce fake data that can be used in place of real data. However, there are many open questions regarding whether these generative models retain information about the data used to train them and whether this information could be retrieved, which would make them less privacy-preserving than one may think. This paper reviews some important milestones that the research community has reached so far in addressing key challenges in multimedia data analysis. In addition, we discuss the long-term and short-term challenges associated with publishing open multimedia datasets, including questions regarding efficient sharing, data modeling, and ensuring that the data is appropriately anonymized.