THESIS DEFENSE
Author: Sabah Shahnoor Anis
Advisor: Dr. Christian O'Reilly
Date: May 16, 2025
Time: 10:00 am
Place: Teams
Meeting Link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ODY3YTA2NWMtMjk5…
Abstract
Ultrasonic vocalizations (USVs) are critical for understanding rodents' emotional states and social behaviors. However, manual analysis of USVs is time-consuming, subjective, and prone to errors. This thesis presents an automated pipeline that addresses these challenges by performing efficient USV detection and clustering. The proposed approach significantly reduces the time and effort needed to analyze USV data while improving accuracy and reproducibility.
To detect USVs, we introduce ContourUSV, a five-step pipeline. It first generates spectrograms from audio recordings and pre-processes them to enhance the contrast between USVs and background noise. Key pre-processing steps include median filtering, global thresholding, and morphological operations, which clean the spectrograms and highlight the contours of the USVs. Contours are then detected using the OpenCV findContours function, and a bounding box is drawn around each detected USV. The bounding box coordinates are used to compute time and frequency annotations, giving precise temporal and spectral localization of each USV. The detection system is validated against manually annotated datasets, demonstrating high precision, recall, and overall reliability.
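The last detection step, turning a bounding box into time and frequency annotations, can be illustrated with a minimal sketch. It is not the thesis implementation: the function name, the `Annotation` type, and the parameters `seconds_per_col` and `hz_per_row` are hypothetical, and the sketch assumes a linear frequency axis with image row 0 at the top of the spectrogram (the highest frequency).

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start_s: float   # call onset in seconds
    end_s: float     # call offset in seconds
    low_hz: float    # lowest frequency covered by the box
    high_hz: float   # highest frequency covered by the box


def bbox_to_annotation(x, y, w, h, n_rows, seconds_per_col, hz_per_row):
    """Convert a pixel bounding box (x, y, w, h) into time/frequency bounds.

    The (x, y, w, h) layout matches what cv2.boundingRect returns.
    Assumptions (hypothetical, not taken from the thesis):
      - each spectrogram column spans `seconds_per_col` seconds,
      - each spectrogram row spans `hz_per_row` hertz (linear axis),
      - row 0 is the top of the image, i.e. the highest frequency.
    """
    start_s = x * seconds_per_col
    end_s = (x + w) * seconds_per_col
    # Flip the vertical axis: small y means high frequency.
    high_hz = (n_rows - y) * hz_per_row
    low_hz = (n_rows - (y + h)) * hz_per_row
    return Annotation(start_s, end_s, low_hz, high_hz)


if __name__ == "__main__":
    # Illustrative values only: a 256-row spectrogram, 1 ms per column,
    # 500 Hz per row.
    print(bbox_to_annotation(100, 20, 50, 30, n_rows=256,
                             seconds_per_col=0.001, hz_per_row=500.0))
```

If the spectrogram is stored with low frequencies in row 0 instead, the vertical flip is dropped and the two frequency bounds are computed directly from `y` and `y + h`.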
In the clustering phase, we introduce deep unsupervised clustering of USVs (DUCUSV), in which pre-processed USV contours are further analyzed to reveal distinct patterns in vocal behavior. Dimensionality reduction is achieved using deep autoencoders, which compress the high-dimensional spectrogram data into a latent space suitable for clustering. After testing multiple unsupervised clustering algorithms, HDBSCAN, a hierarchical density-based clustering algorithm, is applied to group the USVs based on their spectro-temporal features. Several scores (Validity Index, Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index) are used to evaluate the algorithms and determine the optimal number of clusters. Lastly, two dimensionality reduction algorithms, Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), are employed to visualize the results.
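To make one of the evaluation scores concrete, the following is a dependency-free sketch of the Silhouette Coefficient computed over latent vectors and cluster labels. It is an illustration only, not the DUCUSV code: the function name is hypothetical, Euclidean distance is assumed, and points labeled -1 (the label HDBSCAN gives to noise) are excluded from the score, which is a choice made here rather than one stated in the abstract. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
from collections import defaultdict
from math import dist


def silhouette_coefficient(points, labels, noise_label=-1):
    """Mean silhouette over all clustered points (Euclidean distance).

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Singleton clusters score 0 by convention.
    Points carrying `noise_label` are ignored (an assumption of this sketch).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise_label:
            clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for label, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # dist(p, p) is 0, so summing over all members and dividing
            # by (n - 1) gives the mean distance to the *other* members.
            a = sum(dist(p, q) for q in members) / (len(members) - 1)
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D "latent" vectors forming two tight, well-separated groups.
    latent = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_coefficient(latent, [0, 0, 1, 1]))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned clusters.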
To assess robustness and reliability, the ContourUSV detector was compared against three other state-of-the-art systems on two datasets containing various rodent USV recordings. On average across the two datasets, ContourUSV improved on the other three systems by 1.51× in precision, 1.17× in recall, 1.80× in F1 score, and 1.49× in specificity, while achieving an average speedup of 117.07×. The DUCUSV pipeline was developed by comparing different autoencoder architectures and clustering algorithms. Our benchmark results show better performance with dense autoencoders and hierarchical clustering, based on evaluation metrics such as the silhouette score and validity index. To address the scarcity of open-access datasets in this research area, we made a subset of our internal dataset open-access. This contribution will allow the research community to benchmark the reliability of USV detection and clustering tools on a common dataset.
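For readers unfamiliar with the four detection metrics reported above, the sketch below shows their standard definitions in terms of confusion-matrix counts. The function name and the example counts are hypothetical; how true and false positives and negatives are counted against the manual annotations (for example, per call or per time window) is not specified in this abstract.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts.

    tp: detections that match an annotated USV
    fp: detections with no matching annotation
    fn: annotated USVs that were missed
    tn: non-USV segments correctly left undetected
    """
    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on empty categories.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the thesis.
    for name, value in detection_metrics(tp=80, fp=20, fn=10, tn=90).items():
        print(f"{name}: {value:.3f}")
```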
This fully automated USV detection and clustering pipeline offers a scalable, objective, and accurate solution for rodent ultrasonic vocalization analysis. The integration of advanced clustering techniques enables researchers to uncover novel patterns in vocal behavior, providing deeper insights into rodent communication. The work presented in this thesis contributes to more efficient and reliable methods for USV analysis, supporting future research in behavioral neuroscience.