DCASE Workshop and Freesound Day

The Detection and Classification of Acoustic Scenes and Events (DCASE) Workshop takes place at the Music Technology Group (MTG), Universitat Pompeu Fabra, Barcelona, Spain, on October 29-31, 2025, including BioDCASE, which focuses on bioacoustics, on October 29. DCASE is an annual challenge and international conference on computational audio scene analysis and environmental audio AI. Prof. Shuo Zhang participates in DCASE 2025 in Barcelona, where he serves as co-chair of industrial liaisons, and is a visiting researcher at the MTG from October 20 to October 31.

The Freesound 20th anniversary celebration also takes place in the same week, on October 28, 2025. Freesound.org has been a premier platform for sharing crowdsourced audio recordings and sound effects since 2005. In recent years, it has become an indispensable source for numerous audio datasets in deep learning and audio AI, including the FSD50K dataset. The Freesound Day will include talks by the Freesound team and members of the community. Participants will share their personal and professional experiences with the platform and highlight various projects that have emerged around it. There is also a composition competition using Freesound.

PhD thesis defense at MTG, UPF

Benno Weck (Music Technology Group, Universitat Pompeu Fabra) defends his doctoral thesis for the PhD in Information and Communication Technologies on October 27, 2025, in Barcelona, Spain. Prof. Shuo Zhang attends as a member of the thesis committee. The thesis is entitled “Content-based retrieval in large-scale audio collections with natural language as the interface”. It is a body of research at the intersection of natural language processing and audio/music AI. The thesis is supervised by Prof. Xavier Serra of MTG, UPF.

Abstract: Audio collections, ranging from music archives to environmental sound libraries, have been growing quickly. However, these vast resources remain largely underutilised due to sparse metadata and limited search capabilities. This thesis investigates content-based retrieval in large-scale audio collections using natural language as the interface, with the goal of enabling more intuitive and expressive access to audio content. We address three central challenges: system design, data availability, and evaluation. For system design, we explore two primary directions. First, in audio captioning, we compare combinations of pretrained word embedding and machine listening models within a Transformer-based architecture. Second, in language-based retrieval, we investigate fine-tuning strategies for pretrained encoder models in a bi-encoder setup, considering different loss functions and the effects of augmenting training data with noisy audio-text pairs. To address the scarcity of paired text-music data, we introduce two novel datasets: Song Describer, a crowd-sourced collection of music captions, and WikiMuTe, which pairs music audio with encyclopedic textual descriptions. These datasets provide new resources for both evaluating and training multimodal models. In our evaluation work, we identify data leakage issues in an existing benchmark and propose more realistic dataset splits. We also introduce MuChoMusic, a multiple-choice question-answering benchmark designed to assess music understanding in multimodal models. Additionally, a user study explores how system constraints shape natural language query behaviour, revealing a tendency toward short queries despite a willingness to provide more detailed input. Together, these contributions aim to advance the integration of natural language and audio understanding and lay the foundations for richer interaction with audio content.

Visionular CEO Zoe Liu's Tufts ECE Talk

Dr. Zoe Liu, CEO of Visionular

On September 26, 2025, Prof. Shuo Zhang invited Visionular CEO Dr. Zoe Liu to give a talk at the Tufts ECE Seminar (10:30 to noon, JCC170), hosted by Prof. Yingjie Lao and AIDA collaborator and ECE PhD student Rui Chu. More than 30 students and faculty from ECE, DA, and elsewhere attended the lecture, which was preceded and followed by a faculty meeting and lunch with Rui Chu, Prof. Shuchin Aeron, Prof. Shuo Zhang, and Prof. Peter Lu.