
How to visualize music using animated spectrograms with open-source everything

April 29th, 2008 | No Comments | Posted in music

A while ago I was playing with music visualizations that actually corresponded to the notes being played (in contrast to most of the default visualizations in music players, which are just meant to look pretty). This was what I came up with before:

Here the vertical axis corresponds to the frequency of the note being played and the horizontal motion is time. I’ve been poking around with this again recently, and I’ll demonstrate the new improved version when I don’t have acres of aluminum to machine and essays about nuclear RNA export to write, which should be around Friday. In the meantime, various people have asked for information about how this early example was put together, so for posterity…

First I obtained music in WAV form; I chose to make this completely open source, so I used music from Wikimedia. This wave data was then converted into a text-based data file giving the waveform as a value at each sample time on the left and right channels, using a piece of software called SoX (short for SOund eXchange):

sox filename.wav filename.dat

Which gives you a file that looks like:

0.089342404   -0.00051879883  -0.00079345703
0.089365079   -0.00030517578   -0.0009765625
0.089387755   -0.00030517578  -0.00061035156

...

In which the first column is the time in seconds and the second and third are the left and right channels. Then I used a Python script (using PIL and numpy) to load these numbers and perform a running Fourier transform to extract all of the pure frequencies present at each timepoint, producing a spectrogram. The script is pretty messy as I was simultaneously learning Python, but I’ll try and explain.

First we load the required modules (numpy for FFTs, Image for rendering frames) and read the sample rate from the data file, as well as calculating the number of samples, etc:

import numpy
from numpy.fft import fft
import Image, ImageDraw, ImageOps, sys

filename=sys.argv[1]

f=open(filename, "r")
data=f.readlines()
print str(len(data)) +" samples"
samplerate=data[0].split()[3]
print samplerate+"Hz => "+str(1/float(samplerate))+" seconds per sample"
print str(len(data)/float(samplerate))+"s"
lengthins=len(data)/float(samplerate)
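The header lookup above relies on the sample rate being the fourth word of the first line. A slightly more defensive sketch is below. It assumes the file starts with `;` comment lines of the form `; Sample Rate 44100`, which is what the indexing above implies; the helper name `read_sample_rate` is made up for illustration, and the snippet is written for a current Python 3 rather than the Python 2 used in the script.

```python
def read_sample_rate(lines):
    """Return the sample rate found in the ';' header lines of a sox .dat file."""
    for line in lines:
        if not line.lstrip().startswith(";"):
            break  # the header is over once the numeric rows start
        fields = line.split()
        # Assumed header form: "; Sample Rate 44100"
        if fields[1:3] == ["Sample", "Rate"]:
            return float(fields[3])
    raise ValueError("no '; Sample Rate' header line found")


# Hypothetical file contents, modelled on the rows shown earlier
example = [
    "; Sample Rate 44100",
    "; Channels 2",
    "0.089342404   -0.00051879883  -0.00079345703",
]
rate = read_sample_rate(example)
print(rate, "Hz =>", 1 / rate, "seconds per sample")
```

Searching for the header by name rather than by position means the script fails with a clear error if the file is not in the expected format.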

The next section sets, calculates and displays various pieces of information (for example, the last time point to process given the width of the transform and the desired length of time to process). The width indicates the amount of time to process with each transform, whereas the spacing is the separation in time between each of them:

length_to_process=100

fourierspersecond=24
fourierwidth=0.3
fourierspread=1.0/fourierspersecond

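# round() returns a float, so convert to int before using these as counts or indices
totaltransforms=int(round(length_to_process*fourierspersecond))
fourierspacing=int(round(fourierspread*float(samplerate)))

fourierwidthindex=int(round(fourierwidth*float(samplerate)))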
print "For Fourier width of "+str(fourierwidth)+" need "+str(fourierwidthindex)+" samples each FFT"
print "Doing "+str(fourierspersecond)+" Fouriers per second"
print "Total " + str(totaltransforms*fourierspread)
print "Spacing: "+str(fourierspacing)
print "Total transforms "+str(totaltransforms)

lastpoint=int(round(length_to_process*float(samplerate)+fourierwidthindex))
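To make these settings concrete, here is the same arithmetic worked through for an assumed sample rate of 44100 Hz (the script reads the real value from the file). It also shows a standard property of the Fourier transform that the rendering step depends on: a window lasting `fourierwidth` seconds gives frequency bins spaced `1/fourierwidth` Hz apart. The variable names other than the script's own are illustrative.

```python
samplerate = 44100.0      # assumed for this example
fourierspersecond = 24    # transforms (and movie frames) per second
fourierwidth = 0.3        # seconds of audio in each transform
length_to_process = 100   # seconds of music to process

samples_per_fft = int(round(fourierwidth * samplerate))
spacing = int(round(samplerate / fourierspersecond))
total = int(round(length_to_process * fourierspersecond))

# Each FFT output bin is 1/fourierwidth Hz wide
hz_per_bin = 1.0 / fourierwidth
# Fraction of each window shared with the next one
overlap = 1.0 - spacing / float(samples_per_fft)

print("samples per FFT:", samples_per_fft)
print("samples between FFTs:", spacing)
print("total transforms:", total)
print("Hz per bin: %.2f" % hz_per_bin)
print("300 bins reach about %.0f Hz" % (300 * hz_per_bin))
print("overlap between neighbouring windows: %.0f%%" % (100 * overlap))
```

So with a 0.3 second window, drawing the first 300 bins covers roughly 0 to 1000 Hz, and because the windows are much longer than the gap between them, consecutive transforms share most of their samples.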

The following initializes several arrays with zeros:

# numpy.zeros needs integer sizes
fourierarray=numpy.zeros(int(fourierwidthindex))
time=numpy.zeros(int(lastpoint))
sound=numpy.zeros(int(lastpoint))

This next bit averages the two channels and stores the result into the sound array:

# The first two lines of the file are header lines, so start at index 2
for line in range(2,int(lastpoint)):
  row=data[line].split()
  time[line]=float(row[0])
  sound[line]=(float(row[2])+float(row[1]))/2
f.close()
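For comparison, the load-and-average step can also be done in a couple of lines with numpy. This is only a sketch of an alternative, not what the script does: it assumes the header lines all start with `;` so that `numpy.loadtxt` can skip them as comments, and it uses an in-memory string in place of a real file. It is written for a current Python 3.

```python
import io

import numpy

# Stand-in for a sox .dat file, using the rows shown earlier
dat_text = u"""; Sample Rate 44100
; Channels 2
0.089342404   -0.00051879883  -0.00079345703
0.089365079   -0.00030517578   -0.0009765625
0.089387755   -0.00030517578  -0.00061035156
"""

# Lines starting with ';' are treated as comments and skipped
table = numpy.loadtxt(io.StringIO(dat_text), comments=";")

time = table[:, 0]                   # first column: time in seconds
sound = table[:, 1:3].mean(axis=1)   # average of left and right channels

print(time)
print(sound)
```

With a real file you would pass the filename to `numpy.loadtxt` instead of the `StringIO` object.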

Now the real meat of the program. This first allocates an image large enough to store every time point to be calculated. Then we iterate through the data, extracting an array of the desired size at each point and loading it into fourierarray, which then has its Fourier transform taken. Finally, the Fourier transform output array outfft is iterated through and its values are scaled to pixel values:


offset=200  # half the movie width; the movie settings below set the same value
im=Image.new("RGB",(int(totaltransforms)+offset,300))
imd=ImageDraw.Draw(im)

for position in range(0,int(totaltransforms)):
  print "FFT: ",str(position).zfill(3)
  start=int(position*fourierspacing)
  fourierarray=sound[start:start+int(fourierwidthindex)]
  outfft=fft(fourierarray)
  # Only the lowest 300 frequency bins are drawn, one pixel row per bin
  for x in range(300):
    imd.point((position+offset,x),(int(255*((outfft[x].real)**2)/160),0,0))
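To check that a running transform like this really puts notes where you expect, it helps to feed it a known tone. The sketch below is a simplified stand-in for the loop above, written for a current Python 3: the function name `running_spectrum` is made up, the 8000 Hz sample rate and 440 Hz test tone are assumptions, and it takes the magnitude of the transform with `numpy.abs` rather than the squared real part that the script draws.

```python
import numpy


def running_spectrum(sound, samplerate, width_s, spacing_s, bins):
    """Return one row per time step holding the lowest `bins` FFT magnitudes."""
    width = int(round(width_s * samplerate))
    spacing = int(round(spacing_s * samplerate))
    count = (len(sound) - width) // spacing + 1
    columns = numpy.zeros((count, bins))
    for position in range(count):
        start = position * spacing
        chunk = sound[start:start + width]
        # rfft returns only the non-negative frequencies of a real signal
        columns[position] = numpy.abs(numpy.fft.rfft(chunk))[:bins]
    return columns


samplerate = 8000                                    # assumed test value
t = numpy.arange(2 * samplerate) / float(samplerate)  # two seconds of samples
tone = numpy.sin(2 * numpy.pi * 440.0 * t)             # pure 440 Hz sine

spec = running_spectrum(tone, samplerate, 0.3, 1.0 / 24, 300)
peak_bin = int(spec[0].argmax())
print("peak bin:", peak_bin, "=> about", peak_bin / 0.3, "Hz")
```

The brightest bin lands at 440 Hz times the 0.3 second window length, which is the bin-to-frequency relationship the spectrogram's vertical axis relies on.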

This section renders movie frames by extracting parts of the complete spectrogram, recoloring them and adding a line. Each frame is output as a numbered JPG:

moviescanrange=100*24
moviewidth=400
movieheight=300
offset=int(round(moviewidth/2))

lowfrequency=30
highfrequency=1000

frame=Image.new("RGB",(moviewidth,movieheight))
# Pixel positions must be whole numbers for crop() and paste()
linepos=int(round((moviewidth/2)-fourierspersecond*fourierwidth))

for xp in range(0,moviescanrange):
  print "Rendering frame "+str(xp).zfill(4)
  leftpart=im.crop((xp,0,xp+linepos,movieheight-1)).point(lambda i:i*0.4)
  rightpart=im.crop((xp+linepos,0,xp+moviewidth-1,movieheight-1))
  frame.paste(leftpart,(0,0))
  frame.paste(rightpart,(linepos,0))
  frame=ImageOps.flip(frame)
  framed=ImageDraw.Draw(frame)
  framed.line([(linepos,0),(linepos,500)],fill=(0,255,0))
  frame.save("frame_"+str(xp).zfill(4)+".jpg")
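The frame logic is easier to see with the image library stripped away. Below is a rough numpy-only sketch of the same idea (take a window of the spectrogram, dim everything left of the marker, flip vertically, draw a green line). It is not the PIL code above: `render_frame` is a made-up name, the spectrogram is a dummy array, and the code is written for a current Python 3.

```python
import numpy


def render_frame(spectrogram, xp, width, linepos, dim=0.4):
    """Cut one movie frame out of a (height, total_width, 3) uint8 array."""
    frame = spectrogram[:, xp:xp + width].astype(float)  # copy of the window
    frame[:, :linepos] *= dim       # darken the part already played
    frame = frame[::-1]             # flip vertically so low notes sit at the bottom
    frame[:, linepos] = (0, 255, 0)  # green "now" marker
    return frame.astype(numpy.uint8)


# Dummy spectrogram: uniform red at brightness 100
spectrogram = numpy.zeros((300, 1000, 3), dtype=numpy.uint8)
spectrogram[:, :, 0] = 100

frame = render_frame(spectrogram, xp=10, width=400, linepos=193)
print(frame.shape)
print(frame[0, 0], frame[0, 193], frame[0, 300])
```

Stepping `xp` by one column per frame is what makes the spectrogram scroll past the fixed marker line.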

Finally, the JPGs are animated and combined with the original music into a movie using ffmpeg:

ffmpeg -i frame_%04d.jpg -i music.wav output.avi
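One thing to watch with this step is the frame rate. The frames were rendered at 24 per second of music, while current ffmpeg builds assume 25 frames per second for an image sequence unless told otherwise, which would let the picture drift ahead of the audio. The sketch below builds the same command with the `-framerate` input option added. The helper name `ffmpeg_command` is made up, and the file names are the ones used above.

```python
def ffmpeg_command(fps, frame_pattern, audio, output):
    """Build an ffmpeg argument list that reads the frames at a fixed rate."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # input frame rate for the image sequence
        "-i", frame_pattern,
        "-i", audio,
        output,
    ]


cmd = ffmpeg_command(24, "frame_%04d.jpg", "music.wav", "output.avi")
print(" ".join(cmd))

# To actually run it (needs ffmpeg installed and on the PATH):
# import subprocess
# subprocess.check_call(cmd)
```

Passing the arguments as a list avoids any shell quoting issues with the `%04d` pattern.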

And there you have it. You can also download all the source in one file soxtoframes.py.


The Goldberg Variations, Right Up Close and Personal

April 27th, 2008 | 1 Comment | Posted in bach, classical music, shostakovich

We just arrived back from a jaunt down to the city. The main attraction was my girlfriend’s brother’s kickarse piano recital, including a Haydn sonata (which I thought had surprising amounts of dissonance, but maybe I’m just getting accustomed to that older stuff), a Shostakovich Prelude and Fugue, and the Mendelssohn Variations Sérieuses.

The main meat of the performance was the Goldberg Variations. This is one of those must-love/must-know-really-well pieces for all serious classical music people. But I don’t know it. Well, I’ve maybe heard it through once before when a friend lent me his Glenn Gould CD, but that’s about it. In general I think theme and variations form is awesome (it can beat up rondo form any day of the week), but it’s often well hard the first few times through to put it together. In order to actually hear the variations you kind of need to know the unvaried theme well enough to hum it (and mentally fill in the harmonies), or you tend to lose track of things in the variations.

This was definitely in effect yesterday. It wasn’t aided at all by the descriptions of the variations, which are not very descriptive. Take a peek at the extensive list of them on Wikipedia. Instead of (relatively) easy to pick up on descriptors like “presto”, they are mostly labeled with either “a 1 Clav” or “a 2 Clav”. This only served to confuse me more, as I knew enough score-Italian to suspect that Clav means keyboard. Are some of the variations meant to be played on two pianos?

Well, it turns out that is actually sort of correct, but in this case it’s 2 “manuals” on a harpsichord which as my limited understanding of harpsichord design tells me are multiple keyboards on the same instrument. This means that your hands are less likely to collide. However, on a piano you of course don’t have this luxury and it becomes extra-hard to perform, but he did a totally awesome job of it.

The other descriptive element, which became a lot more descriptive after the fact, is the set of labels Canone alla Seconda, alla Terza, etc. I did manage to deduce that these were numbers, but did not manage to deduce that they meant variations based on the major second interval, the interval of a third, and so on. In fact they are specifically in the form of a canon, which is a form in which there is a leading melody that gets followed by various imitating melodies.

If I had known all that I might have had a chance at spotting them as they came at me, but as it was I just basked in the music. Next time I shall be better mentally equipped to parse them all out. The concert still rocked though, despite my ignorance. He is truly an amazing pianist.

(Oh, by the way, if anyone is looking for parking recommendations to get into Manhattan with minimum fuss I can heartily recommend taking the ferry from Port Imperial at Weehawken. It’s $10 a day to park, and there are ferries to Midtown minimally every 20 minutes which take only about 5 minutes to cross. Then there are free buses when you get to 38th street.)


Absenteeism

April 26th, 2008 | 1 Comment | Posted in classical music

We’re going to the city today for g’s brother’s recital. I’ll leave you with links to NPR’s story on the Dakah hip hop orchestra, and the news that ASIMO, Honda’s favorite robot, will be “conducting” the Detroit Symphony Orchestra with Yo-Yo Ma on May 13th. Be good while I’m gone.


Why are we this far through April already?

April 23rd, 2008 | No Comments | Posted in non music

Ah. I mean aahhh! Does that look scary and in peril? It should do, because today my PI (which for those not living in graduate labs is the terse version of Principal Investigator, the ultra big boss of the research group) casually announced that we need a bunch of results before the next grant deadline in order to get funding. Unfortunately the next grant deadline is in about five weeks; and the results are supposed to occur on a new microscope which is currently partly in pieces, and partly a bunch of diagrams on my computer.

This has led me to consider going into lab at 7:30am for the next few weeks, although it is even more likely to lead to huge amounts of guilt after hitting snooze and going in at 9:00.

What I am building is a new optical tweezers setup, which uses a laser beam to pull on microscopic stuff, and measure the forces and positions that the stuff experiences. In our lab we use it to poke and prod on the proteins which do things to your DNA. There are hundreds of tiny biological machines which unwind your DNA strands, or run along them like trains, or chop them up. For the most part we don’t really have a very good idea of how they work. If they were cars it would be as though we understood that they use an engine which needs gasoline and air, and if you rip the wheels off or replace one of them with a shoe then it won’t run, but we don’t understand how the engine actually works or what most of the other things under the hood are for. That’s what we are trying to work out.

So what appropriate music shall I listen to in order to commiserate with myself and pretend I am going to wake up early to? Shosty’s violin sonata I think, it’s a tough sucker to beat for the sparse futility earfeel effect factor.


Rhythm and Intelligence Are Correlated

April 21st, 2008 | 3 Comments | Posted in music

A recent study conducted at Karolinska Institutet and Umeå University has shown a correlation between intelligence and being able to accurately tap out a rhythm. The journal article is published in Neuroscience here. Unfortunately, since we live in an era in which most academic journals are not free (which is a damn shame), you probably can’t see it unless you are accessing it from a university.

Handily, I am accessing it from a university, and the gist is as follows:

They got a bunch of local Stockholm males, around 30 of them (yeah, I know it’s weird that it’s around thirty but they excluded a couple from certain analyses because of technical problems) to perform the rather grandiosely described “isochronous interval production (tapping).” Additionally, the volunteers all took an intelligence test in the form of Raven’s Progressive Matrices, which consist of those really annoying identify-the-pattern type problems.

The tapping-testing consisted of each volunteer being played twenty cowbell hits at a certain tempo. The volunteer then tapped out the same rhythm for twenty beats, with a metronome as a guide, and finally tapped forty-five more beats completely on their own. They tried this for seven different tempos, doing each one twice, and took MRI images of the volunteers while this task was being performed. When they compared the amounts by which the taps were off to the score on the intelligence test, they found a correlation of around 0.5 (which means it’s a pretty convincing relationship).

Unfortunately the discussion of the results is jammed way too full of neuroscience jargon to extract much information. It seems like they are hypothesizing that a certain area of the brain (the right prefrontal cortex) is important for both automatic, repetitive timing tasks and working memory. Tantalizingly, they hint that there is a major neurological difference between a repetitive timing task such as this, and ones which require more “explicit” thought, which seems particularly interesting from a musical point of view. I wonder if perhaps after practicing a rhythm it moves from being controlled by the explicit timing processes into the automatic ones.

Time to break out the ol’ MRI machine, I guess.
