Show simple item record

dc.contributor.advisor: Γιαννακόπουλος, Θεόδωρος
dc.contributor.author: Μουχάκης, Βασίλειος
dc.date.accessioned: 2023-11-09T09:29:00Z
dc.date.available: 2023-11-09T09:29:00Z
dc.date.issued: 2023-10-18
dc.identifier.uri: https://amitos.library.uop.gr/xmlui/handle/123456789/7675
dc.identifier.uri: http://dx.doi.org/10.26263/amitos-1178
dc.description: Μ.Δ.Ε. 94 [el]
dc.description.abstract [el]:
This master's thesis explores the application of Deep Metric Learning (DML) in the context of audio data representations. DML is a technique that leverages deep neural networks to automatically learn hierarchical representations from raw audio waveforms, aiming to capture the intricate relationships between audio samples. The central objective of this research is to evaluate the effectiveness of two prominent loss functions, Triplet Loss and Contrastive Loss, and their impact on creating meaningful audio embeddings. These embeddings are crucial for preserving the inherent similarities and dissimilarities between audio samples.

In the investigation of Triplet Loss, eight different models were trained using Convolutional Neural Networks (CNNs), with the goal of optimizing the embeddings to position anchor points closer to their respective positive samples while maintaining a significant distance from negative samples. The research considered various distance metrics and scalers to assess their impact on the model's performance. The findings highlighted the third Triplet Loss model as the most successful, achieving remarkable results with distance metrics such as Euclidean, Minkowski, and cosine, particularly when combined with the Normalizer scaler. This demonstrates the capability of Triplet Loss to generate audio embeddings that effectively preserve the underlying similarities between songs, with the normalization step being a critical factor in enhancing the model's performance.

Conversely, the Contrastive Loss experiments involved two distinct models using the ResNet50 architecture. Contrastive Loss aims to minimize the distance between similar audio samples and maximize the distance between dissimilar ones. The research explored the influence of various distance metrics and scalers on the performance of these models. The results revealed that the second Contrastive Loss model outperformed the other variant, achieving notable scores when evaluated with the correlation distance metric, particularly when paired with the MinMaxScaler. This underlines the ability of Contrastive Loss to generate highly discriminative audio embeddings that capture the pairwise relationships in audio data representations.

In summary, this research emphasizes the importance of selecting the most appropriate loss function, depending on the nature of the audio data and the specific requirements of the task at hand. Triplet Loss, with its emphasis on relative comparisons between samples, is shown to be a potent choice, especially when normalized audio embeddings are utilized. Contrastive Loss, on the other hand, focuses on pairwise comparisons and demonstrates its effectiveness in capturing inherent pairwise similarities in audio data representations. Moreover, the evaluation of various distance metrics and scalers highlights their substantial impact on model performance. The choice of distance metric should align with the nature of the audio data and the specific similarity task, while selecting an appropriate scaler is vital for optimizing models to learn meaningful embeddings from audio data representations.

In conclusion, this research contributes valuable insights into the realm of deep metric learning applied to audio data, particularly audio representations of songs. The findings demonstrate that both Triplet Loss and Contrastive Loss can be valuable tools for capturing song similarities in their respective ways. This knowledge will serve as a foundation for further advancements in deep metric learning for audio similarity tasks, offering guidance to researchers and practitioners in selecting appropriate loss functions, network architectures, and evaluation methodologies to achieve optimal performance in various audio-related applications, such as music recommendation, audio retrieval, and content-based audio search. As the field continues to evolve, we anticipate further refinements in loss functions, network architectures, and evaluation techniques, pushing the boundaries of deep metric learning and its applications in audio data analysis.
dc.format.extent: 58 [el]
dc.language.iso: en [el]
dc.publisher: Πανεπιστήμιο Πελοποννήσου [el]
dc.rights: Αναφορά Δημιουργού-Μη Εμπορική Χρήση-Όχι Παράγωγα Έργα 3.0 Ελλάδα
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/gr/
dc.title: Deep Metric Learning for Music Information Retrieval [el]
dc.type: Μεταπτυχιακή διπλωματική εργασία (Master's thesis) [el]
dc.contributor.committee: Παλιούρας, Γεώργιος
dc.contributor.committee: Μοσχολιός, Ιωάννης
dc.contributor.department: Τμήμα Πληροφορικής και Τηλεπικοινωνιών [el]
dc.contributor.faculty: Σχολή Οικονομίας και Τεχνολογίας [el]
dc.contributor.master: Επιστήμη Δεδομένων [el]
dc.subject.keyword: deep metric learning [el]
dc.subject.keyword: deep learning [el]
dc.subject.keyword: audio representations [el]
dc.subject.keyword: triplet loss [el]
dc.subject.keyword: contrastive loss [el]
dc.subject.keyword: audio embeddings [el]
dc.subject.keyword: music information retrieval [el]
dc.subject.keyword: audio similarity [el]
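
The abstract above describes Triplet Loss as pulling an anchor embedding closer to a positive (similar) sample while pushing it away from a negative (dissimilar) one. The following is a minimal PyTorch sketch of that objective, not the thesis's actual implementation; the margin value and the use of Euclidean pairwise distances are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Distances from the anchor embedding to the positive and negative samples.
        d_ap = F.pairwise_distance(anchor, positive)
        d_an = F.pairwise_distance(anchor, negative)
        # Penalize triplets where the positive is not closer than the negative
        # by at least `margin` (the margin value here is an assumption).
        return torch.clamp(d_ap - d_an + margin, min=0.0).mean()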
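
The Contrastive Loss experiments, by contrast, operate on pairs of embeddings labeled as similar or dissimilar. A minimal sketch of the standard pairwise formulation is given below, assuming a binary label convention (1 for similar, 0 for dissimilar) and an illustrative margin; the thesis's ResNet50-based models and training details are not reproduced here.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(emb_a, emb_b, label, margin=1.0):
        # `label` is 1 for similar pairs and 0 for dissimilar pairs (assumed convention).
        d = F.pairwise_distance(emb_a, emb_b)
        # Pull similar pairs together; push dissimilar pairs at least `margin` apart.
        similar_term = label * d.pow(2)
        dissimilar_term = (1 - label) * torch.clamp(margin - d, min=0.0).pow(2)
        return 0.5 * (similar_term + dissimilar_term).mean()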
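
The abstract also reports evaluating the learned embeddings under different scalers (Normalizer, MinMaxScaler) and distance metrics (Euclidean, Minkowski, cosine, correlation). A small sketch of that kind of comparison using scikit-learn and SciPy is shown below; the embedding dimensionality and the random vectors are placeholders, not data from the thesis.

    import numpy as np
    from sklearn.preprocessing import Normalizer, MinMaxScaler
    from scipy.spatial.distance import cdist

    # Hypothetical song embeddings: one 128-dimensional vector per song.
    embeddings = np.random.rand(10, 128)

    # Normalizer rescales each embedding to unit norm before cosine comparison,
    # as in the Triplet Loss evaluation described above.
    normalized = Normalizer().fit_transform(embeddings)
    cosine_distances = cdist(normalized, normalized, metric="cosine")

    # MinMaxScaler rescales each dimension to [0, 1] before the correlation metric,
    # as in the Contrastive Loss evaluation described above.
    minmax_scaled = MinMaxScaler().fit_transform(embeddings)
    correlation_distances = cdist(minmax_scaled, minmax_scaled, metric="correlation")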


Files in this item


This item appears in the following Collection(s)


Except where otherwise noted, this item is distributed under the following license:
Attribution-NonCommercial-NoDerivs 3.0 Greece