Article | Posted on September 2013

Loading sound files faster using Array Buffers and Web Audio API

Reading time: 10 minutes

Topics: JavaScript, Web Audio API, Typed Arrays

This is a legacy post, ported from the old system:
Images might be too low quality, code might be outdated, and some links might not work.

This idea came during the process of making Gravity more lightweight. After reducing images, minifying CSS and JS files, compacting long XML 3D assets files into binary arrays, etc. there were still a lot of separate requests for sound files. Each sound effect came in its own file.

While working in redacted I tried several approaches to compact files so downloads and loads were more efficient. One of the first solutions was using zip.js, which works really well, but has some problems with workers and memory management: depending on the size of the file to uncompress, the impact of memory thrashing is too high.

The final solution implied using Typed Arrays and Blobs to create a binary file containing several assets, in a contiguous file, with a binary-encoded JSON string that contained all the indices and metadata of the files.

Reducing requests by concatenating files

The idea is to reduce the number of different requests by unifying all of them in a single file (or several files, see below), which can act like a repository of sounds. At the same time, the goal is to make it transparent, so in the end you just load a file, get an onprogress and an onload notification, and end up with a collection of HTML5 Audio objects.

It's a similar idea to audio spritesheets, but, unlike spritesheets, it doesn't take any special skill or audio editing software to create the files, and it's really easy to include in the deployment process of any site.

There's a python script that concatenates the sound files into a single file, and a JavaScript library that loads that file and uses Web Audio API to decode the different parts of that unified file, returning usable sound objects.

Structuring the file

So the file consists of a series of appended chunks with two fields for each file: 4 bytes specify the length, and the content of the file itself.

I'm aware 32 bits to specify the length might not be enough for some files lengths, but if you're trying to download several files weighting over 4GB, then you have a different kind of problem :D.

The python script gets all the files provided as arguments, and outputs the concatenated binary file.

Concatenation script

Python - run on terminal

python concat.py file1.mp3 file2.mp3 file3.mp3 > concatenated_file.mp3

Reading a chunk of the file

So let's load the file exactly as normal, with xmlHttpRequest, specifying that we want the result as an ArrayBuffer.

Loading the concatenated file as an Array Buffer

JavaScript - index.html

/*
  Uses xmlHttpRequest to asynchronously load
  the concatenated sound file as an array buffer
*/
var request = new XMLHttpRequest();

request.open( 'GET', 'concatenated_file.mp3', true );
request.responseType = 'arraybuffer';

request.onload = function() {
  processConcatenatedFile( request.response );
}

request.send();

After the file is received, the onload listener receives the data as an ArrayBuffer. We can then start parsing it. We'll use a pointer into the file, and move it as we read data. The first value to read is the 4 bytes that define the length. Then increment the pointer 4 positions, and read as many bytes as said length. Increment the pointer by that length, and we are at the beginning of the next chunk. If the pointer is the same as the length of the downloaded file, we are at the end of the file and there's nothing else to read.

The processConcatenatedFile() function create a DataView to easily access the ArrayBuffer. offset is our pointer into the file.

Locate buffers for each concatenated sound

JavaScript - index.html

/*
  Runs through the loaded array buffer and
  extract each individual chunk that contains
  each original sound file buffer.
*/
function processConcatenatedFile( data ) {
  var bb = new DataView( data );
  var offset = 0;
  
  while( offset < bb.byteLength ) {
    var length = bb.getUint32( offset, true );
    offset += 4;
    var sound = extractBuffer( data, offset, length );
    offset += length;
    createSoundWithBuffer( sound.buffer );
  }

}

The extractBuffer() function takes the src (our data buffer), an offset (the pointer that defines the position in our file) and length (the length of the current file we're reading). The function creates a new 8-bit unsigned int ArrayBuffer and copies the data in it. That's our original sound file, like if it had been loaded individually.

Extract sound buffer from concatenated buffer

JavaScript - index.html

/*
  Create a new buffer to store the compressed sound
  buffer from the concatenated buffer.
*/
function extractBuffer( src, offset, length ) {
  var dstU8 = new Uint8Array( length );
  var srcU8 = new Uint8Array( src, offset, length );
  dstU8.set( srcU8 );
  return dstU8;
}

And finally we do something cool with the sound with the function createSoundWithBuffer.

Create Audio Source from extracted buffer

JavaScript - index.html

/*
  Uses Web Audio API decodeAudioData() to decode
  the extracted buffer.
*/
function createSoundWithBuffer( buffer ) {
  /*
  This audio context is unprefixed!
  */
  var context = new AudioContext();
  var audioSource = context.createBufferSource();
  audioSource.connect( context.destination );
  
  context.decodeAudioData( buffer, function( res ) {
    audioSource.buffer = res;
    /*
      Do something with the sound, for instance, play it.
      Watch out: all the sounds will sound at the same time!
    */
      audioSource.noteOn( 0 );
  } );

}

This is the part that deals with Web Audio API. It's currently supported by Chrome, Safari and Firefox Nightly. The code here uses the version without vendor prefixes. You'll have to adapt the code to the user agent implementation. Also, remember that Chrome can read MP3 file, and Firefox can read OGG files.

Wrapping up

There's an obvious advantage when using this method: a lot less requests, which means faster downloads, and only using on download slot for sound.

Another advantage that we perceived, but don't have a lot of numbers to back it up, it's that enabling gzip compression on the server for a single concatenated file gets a bit better compression ratios than compressing each individual file.

There's also the possibility of concatenating sound in different files, acting as sound libraries, so you could, ideally, download packs of sounds as you needed them.

One disadvantage is that, although possible, streaming would require a more sofisticated library.

Support right now is for browsers with Typed Arrays and Web Audio:

Code and demos

Here's a small demo, improving a bit what's in this post. The code is uploaded to GitHub.

Comments