How To Data Bend

I got an email asking me how to do data bending for audio. (If you want to know how to do it with images, check out Hello Catfood’s posts.) Databending means taking one kind of data and using it as another kind of data: for example, playing an image file as a sound, or processing an audio file with an image program and then returning it to its audio format. This post will focus on how to open non-audio files as if they were audio.
There are two different programs I’ve used for databending. One is SoundHack, which is free but Mac-only; the other is Audacity, which is also free and cross-platform. For Mac users, I suggest running Audacity 1.2.6 instead of the beta version.

Data Files

Opening data files with either SoundHack or Audacity is easy. A data file is a file used by an application: a text file created by Word or OpenOffice, an image file, or anything else you might find in your Documents folder. With SoundHack, under the File menu, select “Open Any” and pick a file. Then go to the Hack menu and select “Header Change.” You can try a few different headers and listen to them until you find one that you like. Once you’ve found a good one, go to the File menu and select “Save a Copy.” That will open a new dialog. At the top is the file name. Add “.aiff” (without the quotes) to the end of the file name, no matter what you decide to name it. At the bottom, make sure to set the Format to “Audio IFF” and the Encoding to “16 bit linear.” I’ve found that SoundHack does not save reliably into other formats.
To open a data file with Audacity, under the Project menu, select “Import Raw Data” and pick the file you want to open. A dialog will pop up asking what header you want. I usually go with the default values, but you can try playing around with that. You can then modify the file in Audacity, using the Effects menu or whatever. When you think you’re done, first go to Preferences and then to the File Format tab. Make sure that the format you want (AIFF, WAV, etc.) is selected. Then go to the File menu and “Export As” that file type.
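If you’d rather script the same idea than click through dialogs, here’s a rough sketch in SuperCollider of what “open any file as raw audio” boils down to: read the bytes of some arbitrary file and write them back out as if they were 16-bit mono audio. The paths here are hypothetical, and any language that can read bytes and write a sound file would work just as well.

(
var inPath = "/path/to/anyfile";                 // hypothetical: any data file at all
var outPath = "~/bent.aiff".standardizePath;     // hypothetical output path
var f, samples, sf;

f = File(inPath, "rb");
samples = FloatArray.newClear(f.length);
f.length.do { |i| samples[i] = f.getInt8 / 128.0 };  // each byte becomes a sample in -1 to 1 (slow for big files, but it's a sketch)
f.close;

sf = SoundFile.new;
sf.headerFormat = "AIFF";
sf.sampleFormat = "int16";
sf.numChannels = 1;
sf.sampleRate = 44100;
sf.openWrite(outPath);
sf.writeData(samples);
sf.close;
)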

Applications

You can also open applications as audio, but this is a bit weird on the Mac. Go to the Finder and find the application you want to open. Control-click on it (by holding down the Control key as you click). In the menu that pops up, select “Show Package Contents.” A new window should open with a folder in it called Contents. Open that folder and you should find some stuff in it, including a subfolder called MacOS. In that folder you’ll probably find a file with the same name as the program. For example, in GarageBand, under Contents/MacOS/ there’s a file called GarageBand and two other files, all of which may be interesting. Control-click on the file and select “Open With,” then select “Other . . .”. A new dialog will open. In the bottom part of the window, change the menu from “Recommended Applications” to “All Applications”. (Do NOT check the box under that!) Then find Audacity or SoundHack, select it and click the Open button. If you use SoundHack, you can try out different headers by doing a Header Change under the Hack menu. Saving these files works the same as described above.

Examples

I haven’t used this technique for years, but if you’re searching for examples of how it sounds, I’ve got some pieces. My supervisor, Scott Wilson, also uses this on his CD Muellmusik, in the track Photo Shopped Music. Other examples abound. You’ll find that if you do this a bit, you’ll not only be able to recognize other people doing it, but also sometimes be able to recognize what kind of file they’re using. Stochastic Synthesis also sounds quite a lot like data bending.

Talking about music

Aileen asks, “Is it just my limited experience, or is there really a paucity of sound-related terms in English?”
I’m intrigued by her question! Does she mean for describing a single sound event? I don’t speak any of my second languages well enough to give a comparative answer, but here are some technical terms in English:

timbre, tuning, tone, rhythm, tempo, loud, soft, dry, resonant, rich, rough, pure, metallic, high, low, nasal, tremolo, trill, vibrato, dissonant, consonant, atonal

If you are talking about a single sound event, most words would describe the sound quality, so some of the above wouldn’t apply, as they refer to multiple sound events. You would likely want to talk about duration, amplitude, timbre and pitch. Amplitude is straightforward enough, and pitch is usually described in terms of high and low. Timbre, the quality of the sound, is where you get the most words.

Scientifically, any sound can be described as a sum of sine waves: specify their amplitudes, frequencies and phases and how they change over time. Timbre is strongly linked to how these sine waves are related to each other. A pure sound is one with few, harmonically related sine tones.
Harmonically related means that the frequencies of the sines are related by whole number ratios that are relatively simple. If you add the numerator and the denominator together, the smaller the sum is, the more pure the timbre. When talking about sounds this way, the lowest sine wave is often called the fundamental, and the higher ones are called overtones. (Many musical sounds have overtones that are just simple multiples of the fundamental.)
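If you want to actually hear the difference, here’s a rough sketch in SuperCollider (the 220 Hz fundamental and the number of partials are arbitrary choices): the left channel is a “pure” tone built from two harmonically related partials, the right a “richer” one built from eight.

(
{
    var fundamental = 220;
    var pure, rich;
    // pure: the fundamental plus one partial in a simple 2:1 ratio
    pure = SinOsc.ar(fundamental * [1, 2], 0, [0.3, 0.15]).sum;
    // rich: eight partials at whole-number multiples, quieter as they get higher
    rich = Mix.fill(8, { |i| SinOsc.ar(fundamental * (i + 1), 0, 0.2 / (i + 1)) });
    [pure, rich] * 0.5   // pure on the left, rich on the right
}.play;
)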
When the component sine waves of a sound are close to each other – specifically, so close that they fall within the critical band – you get roughness. (Think of two instruments tuning to each other: the sound is first rough, then there’s a beating which gradually slows until they’re in tune.)
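That tuning effect is easy to fake. In this little SuperCollider sketch (the 440 Hz pitch and the ten-second glide are arbitrary), two sine waves start 6 Hz apart and drift into unison, so the beating slows down and disappears:

{ SinOsc.ar([440, 440 + Line.kr(6, 0, 10)], 0, 0.2).sum ! 2 }.play;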
A rich sound is one with more harmonically related overtones. If you get a whole lot of overtones (I think specifically odd ones), the sound is nasal.

Overtones that don’t have simple relationships with each other are called inharmonic (you’ll sometimes see “enharmonic”). Inharmonic sounds are often described as metallic, especially if they have a lot of low or mid range frequency content, few highs, and a bit of duration to them.
Noises, like twigs breaking, things clicking, etc., have more high frequency content, are strongly inharmonic, and are also very short. Sustained sounds with lots and lots of inharmonic content are called noise. Mathematically, noise can be described as the sum of an infinite number of sine waves – over an infinite amount of time, of course. Very short noise, as mentioned, is usually called “clicking.”
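Here’s a rough SuperCollider sketch of those two extremes (the specific frequencies and ratios are arbitrary): the first synth nudges its partials off the 1, 2, 3, ... series and puts a percussive envelope on them, which comes out metallic; the second is plain sustained noise.

(
// metallic: partials stretched away from whole-number multiples of 200 Hz
{
    var partials = Mix.fill(6, { |i| SinOsc.ar(200 * (i + 1) * (1 + (i * 0.07)), 0, 0.1) });
    partials * EnvGen.kr(Env.perc(0.01, 4), doneAction: 2) ! 2
}.play;
)

// noise: energy smeared across the whole spectrum, no harmonic relationships at all
{ WhiteNoise.ar(0.1) ! 2 }.play;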
Vibrato is where the pitch moves up and down around a central pitch, where the deviation is too small to be perceived as moving to another note. A glissando is when it goes from one pitch to another, where the source and destination are perceived as separate notes. A trill is a sound that moves quickly between two pitches which are perceived as separate notes. A tremolo is a fast variation in amplitude, again around a central amplitude. Becoming louder is crescendoing or fading in. Becoming softer is decrescendoing or fading out.
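These are also easy to sketch in SuperCollider (evaluate one line at a time; the rates and depths are arbitrary, and 440 and 494 Hz just stand in for two neighbouring notes):

{ SinOsc.ar(440 + SinOsc.kr(5, 0, 8), 0, 0.2) ! 2 }.play;        // vibrato: pitch wobbles +/- 8 Hz, five times a second
{ SinOsc.ar(440 + (LFPulse.kr(8) * 54), 0, 0.2) ! 2 }.play;      // trill: alternates between 440 and 494 Hz, eight times a second
{ SinOsc.ar(440, 0, SinOsc.kr(6).range(0.05, 0.25)) ! 2 }.play;  // tremolo: amplitude wobbles around a central level, six times a second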
Some timbral terms describe the environment (whether “real” or “electronic”) in which a sound event occurred. A cathedral has a really long decay: if you clap your hands, the echo can go on for several seconds. A sound recorded in that environment would be described as resonant. Similarly, a sound recorded in a room with no echo would be described as dry.
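A quick way to hear dry versus resonant side by side, sketched in SuperCollider with FreeVerb standing in for the cathedral (the settings are arbitrary):

(
{
    var click = Impulse.ar(1) * 0.5;                   // a dry click, once a second
    var wet = FreeVerb.ar(click, mix: 1, room: 0.95);  // the same click in a big, resonant "room"
    [click, wet]                                       // dry on the left, resonant on the right
}.play;
)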
There are a lot of other ways in which people describe sounds, but these are often metaphorical, describing the means of production, or comparing it to another sound. For instance, a “booming” sound is a low sound like, well, a boom. Vocal sounds are made with the voice. Screeching sounds. String sounds. Etc. Some words are onomatopoetic. Crackle. Crunch. Crack. Clunk. Thunk. Boom. Whoosh. Sploosh. Splat. Vroom.
We have the most terms to talk about musical sounds, but the sounds most essential to survival are the non-musical ones. A breaking twig does not have harmonically related overtones or a sustained duration, but it might mean a predator is about to get you. It might not be a coincidence that so many of our onomatopoetic words describe these kinds of noises: important sounds that communicate practical information.
Ok, a lot of the technical terms that I’ve named are actually Italian, but they’re also part of the musical Esperanto in that they’ve been adopted almost everywhere. (Crescendo in actual Esperanto would be malsilentigxo.) I haven’t done much in other languages but dabble and listen to news podcasts, but if there’s one language that seems to never suffer from a paucity of terminology, it’s English.
I will concede, though, that English is not the best language for expressing emotions, except for anger, at which it, alas, excels.
In unrelated news, tomorrow my dog goes to a kennel and in the afternoon/evening, I go to London. The day after, I fly to New York, where I still don’t know where I’ll be staying, which I’m trying not to think too much about.

Subliminals, Timbre and Convolution

Recently, on Boing Boing, there was a post about a company marketing a subliminal message to gamers. They would hear the message 10000 – 20000 times a second. That’s 10 kHz – 20 kHz. Those repetition rates are almost too high to be in the audio range! I can’t hear 20 kHz all that well. Also, what about scaling? To keep from peaking, the maximum amplitude of each message would have to be between 0.00005 and 0.0001 of the total amplitude range. That’s pretty subliminal, all right.
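That scaling figure is just one over the repetition rate, which you can check in the SuperCollider interpreter:

(1 / 10000).postln;   // 0.0001 of full scale per copy, at 10k repetitions per second
(1 / 20000).postln;   // 0.00005 (printed as 5e-05) at 20k per second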

I went to work trying to play a short aiff file over and over at that rate. My processor crapped out really fast. That’s a lot of addition. As I was falling asleep that night, I calculated that on a CD, each new message would start every 4 – 10 bytes! Why, at that rate, it’s practically convolution.
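The back-of-the-envelope version of that calculation, assuming 16-bit (2-byte) samples at the CD rate of 44100 Hz, per channel:

(44100 * 2 / 20000).postln;   // about 4.4 bytes between message starts at 20k messages per second
(44100 * 2 / 10000).postln;   // about 8.8 bytes at 10k per second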
Indeed, it is more than “practically” convolution; it is convolution, and as such it doesn’t need to be done via real-time additions, but can be done via free software like SoundHack. The first step is getting a series of impulses. To create a “subliminal” message, you need a series of positive impulses occurring at random, between 10000 and 20000 per second. I wrote a short SuperCollider program to produce such impulses.

SynthDef("subliminal-impulse", {arg out = 0;

 var white, ir;
 white = WhiteNoise.kr;
 white = white.abs;
 white = white * 10000;
 white = white + 10000;
 ir = Dust.ar(white);
 Out.ar(out, ir);
 
}).play 

The WhiteNoise.kr produces random values between -1 and 1. We take the absolute value of that to get numbers between 0 and 1. Then we multiply to make them numbers between 0 and 10000, and add to put them in the range 10k – 20k.
Dust makes impulses at random intervals, with amplitudes between 0 and 1. The argument is the average number of impulses per second, so Dust makes 10k – 20k impulses per second. Record the output of that to disk and you’ve got some noise, but it’s noise with some important characteristics – all the impulses are positive and they have zeros between them. This is what we need if we’re going to be subliminal at gamers.
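For the record-to-disk step, something like this works (a sketch, assuming the default server is booted; the path is hypothetical):

s.prepareForRecord("~/impulses.aiff".standardizePath);
s.record;           // start writing the server's output to the file
// ... let the subliminal-impulse synth run for a while ...
s.stopRecording;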
Ok, so I’m going to take that file and open it in SoundHack, and save a copy of it as a 16-bit file rather than a 32-bit file. Then I’ll split the copy into separate mono files. (This is all under the File menu.) Then, to save disk space, I’ll throw away the 32-bit file and the silent right channel. So now I have a 16-bit mono file full of impulses open in SoundHack.
Under the Hack menu, there’s an option called “Convolution.” Pick that. Check the box that says “Normalize” (that will handle the amplitude for you, so the result is neither too quiet nor too loud) and then hit the button that says “Pick Impulse.” This will be our recording of spoken text that we want made subliminal. (Fortunately, I had such a message at hand.) In actuality, it doesn’t matter which file is the one with the clicks and which is the one with the text; convolution treats both files as equal partners. Then it asks us to name the output file, then it runs, and then we’re done. Here’s my result.
If you suddenly feel like forming a militia or running in fear, then it worked. If not, well, the sonic result is still kind of interesting. The timbres are all totally present but the actual sound events are unintelligible (at least to the conscious mind). For every one of our little impulses created by Dust.ar, we’ve got a new copy of Jessica plotting revolution. (The text is actually from Lesbian Philosophy: Explorations by Jeffner Allen (Palo Alto: Institute of Lesbian Studies, 1987) and the piece I originally made with it is here.)
This is actually a lot like granular synthesis, if you think about it. Imagine that instead of convolving the whole audio file, we just did 50 ms bits of it. Every impulse would start a new copy of the 50 ms grain, but instead of doing it with additions, we do it with FFTs, which are faster – so we can have many, many more grains. And they could be smaller and still be meaningful. Heck, they could be the size of the FFT window.
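You can get a real-time approximation of that 50 ms idea in SuperCollider with the Convolution2 UGen, which convolves its input against the first framesize samples of a kernel buffer: a 2048-sample frame at 44.1 kHz is roughly a 46 ms “grain” of the text, and every Dust impulse starts a fresh copy of it. This is just a sketch, and the speech file path is hypothetical.

b = Buffer.read(s, "speech.aiff");    // hypothetical mono recording of the text

(
{
    var impulses = Dust.ar(WhiteNoise.kr.abs * 10000 + 10000);
    Convolution2.ar(impulses, b, 0, 2048) * 0.1 ! 2
}.play;
)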
The FFT version of a convolution involves taking a window of the impulse stream and another of the IR (our subliminal message – normally known as an impulse response). You add the phases together and multiply the amplitudes. The amplitude multiplication gives us the right frequency content and the phase addition gives us the right timing – almost. Some of the results will be too long for the window and wrap around to the beginning. You can avoid that by adding zero padding: you double the size of the window, but only put input in the first half. Then none of your phases will wrap around.
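The “multiply the amplitudes, add the phases” step is just complex multiplication, which you can sanity-check in the SuperCollider interpreter (the numbers here are arbitrary):

(
var a = Polar(2, 0.3).asComplex;   // magnitude 2, phase 0.3
var b = Polar(3, 0.5).asComplex;   // magnitude 3, phase 0.5
(a * b).asPolar.postln;            // roughly Polar(6, 0.8): magnitudes multiplied, phases added
)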
We can get some very granular-like processes, but with nicer sound and better efficiency. For example, time stretching: we could update the IR only half as often as the impulse stream and do window-by-window convolutions. There are other applications here. I need to spend time thinking of what to do with this. Aside from sublimating revolution.
