SoccerRAG: Multimodal Soccer Information Retrieval via Natural Queries
Abstract
The rapid growth and increasing availability of sports multimedia and metadata demand advanced information retrieval systems capable of efficiently processing vast amounts of multimodal data. This paper introduces SoccerRAG, an innovative framework that leverages Retrieval Augmented Generation (RAG), SQL agents, and Large Language Models (LLMs) to extract soccer-related information through natural language queries. Using a multimodal dataset, SoccerRAG supports database querying, automatic data validation, and enhanced user interaction, making sports archives more accessible. Our evaluations demonstrate that SoccerRAG handles complex queries effectively and improves on the accuracy of traditional SQL query systems. The results highlight the potential of RAG, agents, and LLMs in sports analytics and pave the way for future innovations in the accessibility and real-time processing of sports data.