Music Search’s Coming Of Age

Published on: January 27, 2007 / February 18, 2014 | Author: jiinjoo | Comments: 0

People who know me well enough will have heard me dreaming about being able to “decompile” music, i.e. take an audio file and reverse it into a piece of sheet music, or score, which is the “source code” of music. Although I derive a good deal of freelance revenue from doing just that by hand, I have always hoped it could be automated, with the fine-tuning done afterwards. Just like how we can decompile Java bytecode back into source using Mocha and the like.

Civilization is progressing though.

Some time ago I found a British company that took the first commercial step towards “recognizing” tunes hummed by people. Shazam Entertainment, the pioneering brand behind Avery Wang’s work on sound separation since 1995 (article, presentation, articles), made it possible for people to name the tune in their mind by humming into their mobile phone. Apparently this was licensed somewhere in Singapore, but I have absolutely no idea what it is used for here. In short, to quote Avery, his method is “non-symbolic”.

Today another site got publicity too. Midomi was featured on news.com for doing the same thing, but with a lot more social networking elements on its website (actually Shazam has since done the same). Its comparison engine, unlike Shazam’s, also compares your humming to other people’s submissions and lets you see what others have submitted. Are they trying to create a mini American Idol site, maybe? Long live Stanford and CCRMA (if I still have the energy at 31, I might just come and spend some years here).

Both are good examples of taking the sound of a human singing, removing the background noise, fixing the distortion, and matching it against some huge, incredibly smartly indexed database of millions of pop songs to yield a result. Yet one can easily see that matching audio samples is a humongous task, especially when the “copyright” on music, the thing that makes a piece of music unique, is based on its “mechanics”, i.e. the melody (at least for mankind in the 21st century). Wouldn’t it be great if I could hum a tune like “Ode to Joy”, and back came a classical recording of Beethoven’s 9th Symphony, in which the “theme” of the last movement is indeed this “Ode to Joy” tune? I don’t sing like a choir, nor will my voice ever equal a 100-man orchestra plus choir, but because of the mechanics of the melody, I could still find the tune that I want.
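
To make the “mechanics of the melody” idea concrete, here is a minimal sketch of how a hummed tune could be matched on melody alone. It is purely my own illustration, not how Shazam or Midomi work: it assumes the humming has already been turned into a list of MIDI note numbers (which is the hard part), and the tiny catalogue and the function names are made up. The trick is to compare the steps between notes rather than the notes themselves, so it does not matter which key I hum in.

```python
# Hypothetical illustration: match a hummed tune by its melodic intervals.

def intervals(midi_notes):
    """Semitone steps between consecutive notes; the key you sing in cancels out."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Made-up catalogue: title -> opening notes as MIDI numbers (60 = middle C).
CATALOGUE = {
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64],
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62],
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60, 64, 65, 67],
}

def best_match(hummed_notes):
    """Return the catalogue title whose interval sequence is closest to the hum."""
    query = intervals(hummed_notes)
    return min(CATALOGUE, key=lambda t: edit_distance(query, intervals(CATALOGUE[t])))

# "Ode to Joy" hummed a fourth lower, with one wrong note near the end.
hummed = [59, 59, 60, 62, 62, 60, 59, 57, 55, 55, 58, 59]
print(best_match(hummed))
```

The edit distance is there because nobody hums perfectly; a wrong or missing note costs a little instead of breaking the match. A real service would need something far smarter than a linear scan over millions of songs.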

Sibelius Music, where I tried doing some publishing for fun, started introducing metadata collection: you key in the 12 most important notes, with no rhythm, as part of the score you publish, since their focus is really on the score. Going in the forward direction, from metadata to score or even to recordings and audio samples, is a gazillion times easier than going backwards.
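
Why the forward direction is so much easier can be sketched in a few lines. This is only my guess at the general idea, not Sibelius Music’s actual scheme: I assume the 12 notes are stored as plain pitch names with rhythm and octave thrown away, and `theme_key`, `publish` and `lookup` are names I made up. Once the key exists, finding the score is a single dictionary lookup.

```python
from collections import defaultdict

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def theme_key(midi_notes, length=12):
    """Reduce a theme to its first `length` pitch names, ignoring rhythm and octave."""
    return tuple(NOTE_NAMES[n % 12] for n in midi_notes[:length])

# Hypothetical index: 12-note key -> titles of published scores.
index = defaultdict(list)

def publish(title, midi_notes):
    """Register a score under the key derived from its main theme."""
    index[theme_key(midi_notes)].append(title)

def lookup(midi_notes):
    """Forward direction: from the keyed-in notes straight to the matching scores."""
    return index.get(theme_key(midi_notes), [])

publish("Symphony No. 9, last movement", [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64])

# The same 12 notes entered an octave lower still find the score.
print(lookup([52, 52, 53, 55, 55, 53, 52, 50, 48, 48, 50, 52]))
```

Unlike matching a hummed recording, nothing here has to be guessed from audio: the person publishing the score typed the notes in.
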
Let’s continue to wait for the two to converge. It is very important to go back to the “source”, as that is what can really be preserved over the years, like books, where the writing can be copied from source to source and be performed for eternity. Getting back the source gives future generations a very deep insight into the generations of the past, and will also permanently change what “copyright protection” means.

Unfortunately this is probably the last thing that my bosses will let me work on…
