AFTER I.M.A.Γ.E.

Intelligent Music hierArchical Genre classification framEwork






Overview Artificial Neural Networks + Musical Analysis + Statistical Classification

AFTER I.M.A.Γ.E. is a framework designed to augment and optimize existing tools for real-time musical genre classification using Artificial Neural Networks (ANNs). There is an inherently subjective component to this field of work, owing to the subjective nature of musical interpretation itself. By training these discriminatory agents to make decisions as close as possible to those of human beings, ANNs help tackle this ambiguity, essentially appealing to an anthropocentric perception of music to achieve greater classification accuracy.

AFTER I.M.A.Γ.E. is powered by MARSYAS, "an open source software framework for audio processing with specific emphasis on Music Information Retrieval applications. It has been designed and written by George Tzanetakis (gtzan@cs.uvic.ca) with help from students and researchers from around the world. Marsyas has been used for a variety of projects in both academia and industry."

AFTER I.M.A.Γ.E. leverages and optimizes MARSYAS by analyzing MARSYAS's own analyses and its rate of success on data under specific settings and configurations. This meta-analysis is then used to automatically determine the best possible configuration of MARSYAS for a specific dataset. This approach has achieved high accuracy not only on the dataset in question, but also on data outside the test parameters. AFTER I.M.A.Γ.E. is open source for Linux/Unix and is distributed under the MIT License.



Components The Inner Workings

The Driver controls the flow of the classification processes a user creates and manages each run. Depending on the options chosen by the user, the Driver will either preprocess all the data and dependencies of the process up front, or prompt the user during the run for more configuration information, based on the state of the classification operation at that time.

The Driver can also install all the dependencies of the software suite, including those specific to M.A.R.S.Y.A.S., via the '-install' option.

The variables, or knobs in essence, of the classification process are as follows:

  1. Audio Normalization

    "The application of a constant amount of gain to an audio recording to bring the average or peak amplitude to a target level (the norm). Because the same amount of gain is applied across the given range, the signal-to-noise ratio and relative dynamics are generally unchanged." - Wikipedia. This guards against erroneous classification decisions based on discrepancies in loudness between tracks.

  2. Memory Size

    The size/length of the samples accumulated within each analysis window.

  3. Accumulator Size

    The number of audio samples to be averaged for each analysis window within a song.
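The normalization knob above is the easiest to make concrete. Peak normalization can be sketched in a few lines; this is a hypothetical illustration using NumPy (`normalize_peak` is not a function from the framework):

```python
import numpy as np

def normalize_peak(signal, target_peak=0.9):
    """Apply one constant gain so the peak amplitude hits target_peak.

    Because a single gain factor scales every sample, relative dynamics
    and signal-to-noise ratio are unchanged; only overall loudness moves.
    """
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal.copy()  # silent input: nothing to scale
    return signal * (target_peak / peak)

# Two takes of the "same" track at different loudness levels...
quiet = np.array([0.1, -0.2, 0.15, -0.05])
loud = quiet * 4.0

# ...land on identical samples after normalization, so a classifier
# can no longer distinguish them by loudness alone.
print(np.allclose(normalize_peak(quiet), normalize_peak(loud)))  # True
```

This is why normalization guards against loudness-driven misclassification: after scaling, only the shape of the signal, not its level, remains.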

The Driver has two modes for running the classification process:

  • Tunnel Mode

    This mode asks the user to configure Normalization, Memory Size, Accumulator Size, & dataset choice before beginning, after which the classification process runs uninterrupted until completion.

  • Pipeline Mode

    This mode asks the user to pick the dataset to use and make initial Normalization, Memory Size & Accumulator Size configurations. At the beginning of several stages of the run, these options may be reconfigured based on preliminary results from the prior stages of the classification process. This is particularly useful in the early stages of research, where seeing partial results for a specific configuration can guide subsequent tuning.

The Butler component is designed to optimize and simplify processing and preprocessing of data files for the Marsyas suite. It handles many of the more pedestrian, but important, operations needed to leverage the full power of the Marsyas suite of audio analysis and machine learning tools. These operations include:

  • Automation of music collection creation and labeling--conforming to the structure expected by Marsyas. This is a modular operation that can be selectively used for any combination of the Dataset & the Dummy Plug
  • Feature extraction from music collections, as well as tweaking the parameters and agents of the extraction process
  • Housekeeping tasks such as maintaining logs & cleaning up after processes to ensure the program's stability.
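The collection-creation step might look roughly like the following sketch. It assumes a directory layout of one folder per genre and a plain-text collection file pairing each track's path with its label (one "path TAB label" line per track); `write_collection` is a hypothetical helper, and the real Butler script and the exact collection format Marsyas expects may differ in details:

```python
import os

def write_collection(music_root, out_path):
    """Walk a tree laid out as <root>/<genre>/<track> and emit a
    collection file with one "path<TAB>label" line per audio file.
    Hypothetical helper, not the framework's actual script."""
    with open(out_path, "w") as mf:
        for genre in sorted(os.listdir(music_root)):
            genre_dir = os.path.join(music_root, genre)
            if not os.path.isdir(genre_dir):
                continue  # skip stray files at the top level
            for track in sorted(os.listdir(genre_dir)):
                if track.lower().endswith((".wav", ".au", ".mp3")):
                    # The directory name doubles as the genre label.
                    mf.write(f"{os.path.join(genre_dir, track)}\t{genre}\n")
```

Automating this is what keeps the Dataset and the Dummy Plug interchangeable: either tree (or both merged) can be swept into a labeled collection without hand-editing.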

Listed below is a breakdown of the datasets used in the original experiment by genre:


Dataset (11)

Blues
Classical
Country
Dance
Hip-Hop/Rap
Jazz
Metal
Pop
Reggae
Rock
Techno

Dummy Plug (18)

Breakbeat
Chill
Chiptune
Deathcore
Djent
Downtempo
Drum & Bass
Dubstep
Easy Listening
Electronicore
Experimental
Folk
Hardcore
Psychedelic
Industrial
Metalcore
Samba
World



This program controls the dynamic creation and ranking of musical feature sets, as well as combinations of those feature sets, which go through several rounds of analysis, recombination & consolidation; ultimately a single combination is used to train a Feed-Forward Artificial Neural Network (FF-ANN) that makes the classification decisions. Essentially, optimal individual musical feature sets are combined in various permutations, and the optimal permutation(s) are run in conjunction with one of six classification configurations for M.A.R.S.Y.A.S. The best overall combination of musical features & configurations is what is ultimately used to train the FF-ANN. This process creates an ANN that is not only fine-tuned to the dataset used, but also trained to be quite accurate in classifying music with which the network is completely unfamiliar.
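In grossly simplified form, the search described above can be pictured as scoring every combination of candidate feature sets and keeping the best one. This is a hypothetical sketch only: the real process is staged, with recombination and consolidation between rounds rather than a single exhaustive pass, and the feature names and scores below are invented for illustration:

```python
from itertools import combinations

def best_feature_combination(feature_sets, score):
    """Score every non-empty combination of feature sets and return the
    highest-scoring one. Stand-in for the staged rank/recombine/
    consolidate passes described above."""
    best, best_score = None, float("-inf")
    for r in range(1, len(feature_sets) + 1):
        for combo in combinations(feature_sets, r):
            s = score(combo)
            if s > best_score:
                best, best_score = combo, s
    return best, best_score

# Toy scorer: pretend "timbre" and "rhythm" together classify best.
toy_scores = {("timbre",): 0.50, ("rhythm",): 0.45, ("pitch",): 0.30,
              ("timbre", "rhythm"): 0.63, ("timbre", "pitch"): 0.52,
              ("rhythm", "pitch"): 0.48, ("timbre", "rhythm", "pitch"): 0.60}
combo, acc = best_feature_combination(["timbre", "rhythm", "pitch"],
                                      lambda c: toy_scores[tuple(c)])
print(combo, acc)  # ('timbre', 'rhythm') 0.63
```

Note that adding a feature set can hurt (the three-way combination scores below the pair here), which is why ranking combinations rather than individual feature sets matters.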

α Configuration : Window size: 512 samples, Hop size: 64 samples
β Configuration : Window size: 512 samples, Hop size: 128 samples
γ Configuration : Window size: 512 samples, Hop size: 256 samples
δ Configuration : Window size: 256 samples, Hop size: 16 samples
ε Configuration : Window size: 256 samples, Hop size: 64 samples
ζ Configuration : Window size: 256 samples, Hop size: 128 samples
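Each configuration trades temporal resolution against computation: a window of W samples advancing H samples per step over a clip of N samples yields 1 + ⌊(N − W) / H⌋ full windows. A quick check, assuming for illustration a 22050 Hz sample rate (common in Marsyas genre datasets, though not stated above):

```python
def frame_count(n_samples, window, hop):
    """Number of full analysis windows obtainable from n_samples when
    the window advances by hop samples each step (no padding)."""
    if n_samples < window:
        return 0
    return 1 + (n_samples - window) // hop

# One second of audio at an assumed 22050 Hz under two configurations:
print(frame_count(22050, 512, 64))   # α configuration: 337 windows
print(frame_count(22050, 256, 16))   # δ configuration: 1363 windows
```

The δ configuration produces roughly four times as many windows per second as α, giving finer temporal detail at proportionally higher extraction cost.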

  1. GS Classifier (Naive Bayes)

    A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions.

  2. SVM Classifier

    "Supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other, making it a non-probabilistic binary linear classifier." -Wikipedia

  3. ZEROR Classifier

    ZeroR is the simplest classification method: it relies only on the target and ignores all predictors, simply predicting the majority category (class). Although ZeroR has no predictive power, it is useful for establishing a baseline performance as a benchmark for other classification methods.
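Of the three, ZeroR is simple enough to sketch outright, which makes its role as a baseline concrete (hypothetical illustration with toy labels, not framework code):

```python
from collections import Counter

def zeror_fit(labels):
    """ZeroR: ignore all features and memorize the majority class."""
    return Counter(labels).most_common(1)[0][0]

def zeror_accuracy(labels):
    """Baseline accuracy: the frequency of the majority class."""
    majority = zeror_fit(labels)
    return sum(1 for y in labels if y == majority) / len(labels)

# Toy training labels: "rock" dominates, so ZeroR always predicts it.
labels = ["rock"] * 6 + ["jazz"] * 3 + ["blues"] * 1
print(zeror_fit(labels))       # rock
print(zeror_accuracy(labels))  # 0.6
```

Any classifier worth keeping must beat this majority-class accuracy; that is exactly how ZeroR serves as a benchmark for the GS and SVM classifiers above.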



Usage Like Clockwork

#Install M.A.R.S.Y.A.S. dependencies
usr@machine> $perl driver.pl -install

#Interactive Mode
usr@machine> $perl driver.pl -pipeline

#Run a tunnel process with normalization on, a memory size of 20 samples per window, an accumulator size of 2000 samples, and output sent to standard out.
usr@machine> $perl driver.pl -tunnel 1 20 2000 1 

#Run the same tunnel process with output sent to a file.
usr@machine> $perl driver.pl -tunnel 1 20 2000 1 > outputfile.txt 

#Run real-time musical analysis using the Omega Neural Network generated by AFTER I.M.A.Γ.E. with the M.A.R.S.Y.A.S. "sfplugin" binary. The "ln" flag controls the length of playback; therefore, in this example each song will be played for 15 seconds before the framework attempts to classify the genre of the piece.
usr@machine> $../marsyas_master/bin/sfplugin -p "Omega Neural Network.mpl" -ln 15

#By default every process and subprocess run using M.A.R.S.Y.A.S. is logged.
#Log files can be found in ai/project_files/logs/bext/


Statistics Just the Figures

Experiment

The study achieved 63% classification accuracy on the stock dataset of 11 genres, while accuracy on the combination of the dataset and the "Dummy Plug" averaged 45%.


In the case of just the dataset, the genres most often classified correctly were Rock, Pop, Metal, and Dance. The genres most often classified incorrectly were Reggae, Techno, and Jazz. Falling between these extremes were Blues, Country, and Hip-Hop/Rap.

In the case of the dataset combined with the "Dummy Plug", for a grand total of 29 genres, the genres most often classified correctly were Breakbeat, Dance, Dubstep, Hardcore, Metal, Metalcore, Psychedelic, and Rock. The genres most often classified incorrectly were Country, Jazz, Reggae, Samba, and Techno. Falling between these extremes were Blues, Chill, Chiptune, Classical, Deathcore, Electronicore, and Hip-Hop/Rap.




Demonstration The Acid Test



Documentation Clarity is Paramount

Due to the simplicity of AFTER I.M.A.Γ.E.'s components and scripts, the extent of the documentation for this framework is a help file that covers sample usage and frequently asked questions. As work on the project continues, more detailed docs will become available. The thesis describing the study that produced this framework is also included, particularly because of the nuances it contains about the algorithms employed and the overall structure and logic of the approach. Additionally, the presentation given on this study in the fall of 2013 is included in PDF form. This presentation is useful because it visualizes many of the abstract concepts created and referenced by the study.




Thesis

December 2013

403 KB


"On Training Neural Networks to Intuitively Classify Music Genre"

© Alexander A. Reid 2013

Documentation

December 2013

146 KB


This document is a combination of software documentation, FAQ's, and general help.

© Alexander A. Reid 2013

Presentation

December 2013

2.4 MB


This document is the PDF of the presentation given on this study in the fall of 2013, which visualizes many of the abstract concepts created and referenced by the study.

© Alexander A. Reid 2013

Doc Catalog

December 2013

2.4 MB


This package contains the article on AFTER I.M.A.Γ.E. (I), the documentation (II), and the presentation (III).

© Alexander A. Reid 2013

AFTER I.M.A.Γ.E. Package

December 2013

32 MB

3e67d8cfc7d4ab2f0defb2201e27a316


This is the AFTER I.M.A.Γ.E. package, containing all the tools and components needed to recreate or extend this experiment, that are described above. AFTER I.M.A.Γ.E. is distributed under the MIT License.

© Alexander A. Reid 2013