PlaySound
Software Channels
In audio mixers, each sound being played is generally assigned to a software channel: a structure describing a sound source that is to be played. Here we'll just call a software channel a sound, and store its attributes in a struct sound
:
struct sound {
    unsigned int ID; /* nonzero ID assigned when the sound starts playing */
    Uint8 *data;     /* sample data; NULL means this channel is free */
    Uint32 length;   /* length of the sample data in bytes */
    Sint32 pos;      /* current position in the sample data, in samples */
    Sint32 loop;     /* extra times to play; negative means loop forever */
    float vol;       /* volume, 0 to 1 */
    float pan;       /* pan angle: 0 left, pi/2 centre, pi right */
};
Where ID is a number we can use to identify the sound while it's playing. The loop count is signed so that a negative value can mean "loop forever". The audio context structure that we pass in is
struct audioContext {
    SDL_AudioDeviceID dev;
    SDL_AudioSpec *spec;
    float *mixbuffer;     /* intermediate buffer the sounds are mixed into */
    struct sound *sounds; /* array of software channels */
    int nSounds;          /* number of software channels */
    unsigned int IDhigh;  /* highest ID assigned so far */
};
where there is now a separate buffer to mix audio into, and an array of software channels for sounds to be placed in. We also introduce IDhigh to store the highest ID assigned to a sound so far. We need to change the audio callback to run through the list of software channels mixing in the sounds, and to expand mono sound sources (which is what most sound effects are) to stereo with panning.
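As a sketch of how the new fields might be set up after the device is opened, here is a hypothetical helper (AllocMixing is my own name, not part of the tutorial; the stdint typedefs stand in for SDL's types so the snippet compiles on its own):

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the SDL types so this sketch is self-contained. */
typedef uint8_t  Uint8;
typedef uint32_t Uint32;
typedef int32_t  Sint32;
typedef uint32_t SDL_AudioDeviceID;
typedef struct SDL_AudioSpec SDL_AudioSpec; /* opaque here */

struct sound {
    unsigned int ID;
    Uint8 *data;
    Uint32 length;
    Sint32 pos;
    Sint32 loop;
    float vol;
    float pan;
};

struct audioContext {
    SDL_AudioDeviceID dev;
    SDL_AudioSpec *spec;
    float *mixbuffer;
    struct sound *sounds;
    int nSounds;
    unsigned int IDhigh;
};

/* Hypothetical helper: allocate the mixing fields for nSounds software
   channels. frames and channels would come from the obtained
   SDL_AudioSpec (spec->samples and spec->channels). Returns 0 on
   success, -1 on allocation failure. */
int AllocMixing(struct audioContext *aCtx, int frames, int channels, int nSounds) {
    aCtx->mixbuffer = calloc((size_t)frames * channels, sizeof(float));
    aCtx->sounds = calloc((size_t)nSounds, sizeof(struct sound)); /* data pointers start NULL = free */
    aCtx->nSounds = nSounds;
    aCtx->IDhigh = 0;
    return (aCtx->mixbuffer && aCtx->sounds) ? 0 : -1;
}
```

Using calloc means every channel's data pointer starts out NULL, which is exactly the "channel is free" marker the callback relies on.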
Mixing Function
At this point we will separate the mixing into its own function, called from the callback, just to keep the callback code small. We will mix from mono S16 sources into a stereo float format, so we'll call the function MixI1s16O2f32(struct sound *s, float *mixbuffer, unsigned int bSamples)
: I1s16 for mono s16 input, and O2f32 for stereo f32 output.
void MixI1s16O2f32(struct sound *s, float *mixbuffer, unsigned int bSamples) {
    if(!s || !s->data || !mixbuffer) return;
    Sint16 *audio = (Sint16*)s->data;
    unsigned int i = 0, llen = 0;
    int rBuffer, rAudio;
    float vol[2];
    do {
        rAudio = s->length/sizeof(Sint16) - s->pos; /* remaining audio samples in the source */
        rBuffer = bSamples - i; /* remaining buffer samples to be filled */
        /* each mono source sample fills two stereo buffer samples */
        llen += 2*rAudio < rBuffer ? 2*rAudio : rBuffer; /* how far into the buffer we copy data */
        /* calculate volume for each channel, normalizing samples to [-1, 1] */
        vol[0] = s->vol * ((1.1+cos(s->pan))/2.1) / 32767.0f;
        vol[1] = s->vol * ((1.1-cos(s->pan))/2.1) / 32767.0f;
        for(; i + 1 < llen; i += 2, s->pos++) {
            mixbuffer[i] += audio[s->pos] * vol[0];
            mixbuffer[i+1] += audio[s->pos] * vol[1];
        }
        if(i < bSamples) { /* if buffer is not filled yet, the source ran out */
            if(s->loop) { /* looping */
                s->pos = 0;
                if(s->loop > 0) s->loop--; /* a negative loop count loops forever */
            } else { /* release sound channel and end */
                s->data = NULL;
                i = bSamples;
            }
        }
    } while(i < bSamples); /* loop until buffer is filled */
    return;
}
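To see the mono-to-stereo bookkeeping in isolation, here is a stripped-down sketch using only standard types (the function name is mine, and the looping/wrap-around logic is left out):

```c
#include <stdint.h>

/* Expand a mono S16 source into an interleaved stereo float buffer:
   each source sample lands in two consecutive (left, right) slots,
   scaled by a per-ear gain and normalized by 32767. This mirrors the
   inner loop of MixI1s16O2f32. */
void mix_mono_to_stereo(const int16_t *src, unsigned int srcSamples,
                        float *out, unsigned int outSamples,
                        float lgain, float rgain) {
    unsigned int i, pos;
    for(i = 0, pos = 0; i + 1 < outSamples && pos < srcSamples; i += 2, pos++) {
        out[i]     += src[pos] * lgain / 32767.0f; /* left */
        out[i + 1] += src[pos] * rgain / 32767.0f; /* right */
    }
}
```

Note that the output advances two samples for every one source sample, which is why the remaining-source count has to be doubled before comparing it against the remaining buffer space.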
Callback
The callback just runs through the array of software channels, passes any that have sound data to the mixing function, then scales the mixbuffer down into the output buffer.
/* Play sounds, using float mixing */
void MyAudioCallback(void* userdata, Uint8* stream, int len) {
    Sint16 *buffer;
    int i, bSamples, sSize;
    struct sound *s;
    struct audioContext *aCtx = userdata;
    if(!aCtx || !aCtx->spec || !aCtx->mixbuffer || !aCtx->sounds) return;
    buffer = (Sint16*)stream;
    sSize = 2; /* sample size in bytes (S16 output) */
    bSamples = len / sSize; /* number of samples in the stream buffer */
    float *mixbuffer = aCtx->mixbuffer;
    SDL_memset(mixbuffer, 0, bSamples*sizeof(float));
    /* mix sounds into mixbuffer */
    for(i = 0, s = aCtx->sounds; i < aCtx->nSounds; i++, s++) {
        if(!s->data) continue; /* empty sound channel */
        MixI1s16O2f32(s, mixbuffer, bSamples);
    }
    /* copy mixbuffer to the actual sound buffer, scaling down to avoid clipping */
    for(i = 0; i < bSamples; i++) buffer[i] = mixbuffer[i] / 5.0f * 32767.0f;
    return;
}
More explanations
Looping through the software channels is done with
/* mix sounds into mixbuffer */
for(i = 0, s = aCtx->sounds; i < aCtx->nSounds; i++, s++) {
    if(!s->data) continue; /* empty sound channel */
    MixI1s16O2f32(s, mixbuffer, bSamples);
}
Where we just use NULL values for the data pointer to indicate a software channel is free. Expansion to stereo for mono sounds is
/* calculate volume for each channel, normalizing samples to [-1, 1] */
vol[0] = s->vol * ((1.1+cos(s->pan))/2.1) / 32767.0f;
vol[1] = s->vol * ((1.1-cos(s->pan))/2.1) / 32767.0f;
for(; i + 1 < llen; i += 2, s->pos++) {
    mixbuffer[i] += audio[s->pos] * vol[0];
    mixbuffer[i+1] += audio[s->pos] * vol[1];
}
The panning here is very simple, but you shouldn't really need more complexity than that for now. It uses the angle stored in s->pan
to scale the volume for each ear between 0.1/2.1 (about 0.05) and 1.0. It could scale to 0 when the sound is on the opposite side of the head, but I don't like having audio in just one ear.
When a sound reaches the end and it isn't looping, it is stopped by setting its data pointer to NULL
:
if(i < bSamples) { /* if buffer is not filled yet */
    if(s->loop) { /* looping */
        s->pos = 0;
        if(s->loop > 0) s->loop--; /* a negative loop count loops forever */
    } else { /* release sound channel and end */
        s->data = NULL;
        i = bSamples;
    }
}
The callback doesn't free the sound; that's up to whoever added it. Finally the mixbuffer is scaled down by a small factor to prevent clipping when playing back multiple sounds, and the result is scaled back up to the range of an s16 sample:
for(i = 0; i < bSamples; i++) buffer[i] = mixbuffer[i] / 5.0f * 32767.0f;
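The fixed divide-by-5 headroom is crude: a single sound on its own plays at only a fifth of full volume. A common alternative (a sketch, not part of this tutorial) is to clamp each sample instead:

```c
#include <stdint.h>

/* Convert one mixed float sample to S16 by clamping to [-1, 1] first.
   This distorts when many loud sounds overlap, but a single sound
   plays back at full volume. */
int16_t float_to_s16(float v) {
    if(v >  1.0f) v =  1.0f;
    if(v < -1.0f) v = -1.0f;
    return (int16_t)(v * 32767.0f);
}
```

Real mixers often go further with a soft limiter, but hard clamping is the simplest drop-in replacement for the fixed divisor.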
Here's the PlaySound function:
/* Play mono sound. Return ID on success, 0 on fail */
int PlaySound(struct audioContext *aCtx, struct sound *s, int loop, int pos, float vol, float pan) {
    if(!aCtx || !aCtx->dev || !aCtx->sounds || !s) return 0;
    int i, ret = 0;
    /* clamp vol to the range [0,1] */
    if(vol > 1) vol = 1;
    if(vol < 0) vol = 0;
    SDL_LockAudioDevice(aCtx->dev); /* lock so another thread doesn't add a sound in the same slot */
    for(i = 0; i < aCtx->nSounds; i++) /* search for an empty sound channel */
        if(!aCtx->sounds[i].data)
            break;
    if(i < aCtx->nSounds) {
        ret = ++aCtx->IDhigh; /* pre-increment so the first ID is 1, not the failure value 0 */
        aCtx->sounds[i].ID = ret;
        aCtx->sounds[i].data = s->data;
        aCtx->sounds[i].length = s->length;
        aCtx->sounds[i].pos = pos;
        aCtx->sounds[i].loop = loop;
        aCtx->sounds[i].pan = pan;
        aCtx->sounds[i].vol = vol;
    }
    SDL_UnlockAudioDevice(aCtx->dev);
    return ret;
}
After checking the parameters for errors, it runs through the channels of the audio context looking for an empty slot. You need to lock the audio device while examining the channels if you are calling PlaySound from multiple threads. If you were only ever going to call it from one thread you wouldn't need to lock while reading, as the callback will only set data values to NULL. An ID is assigned to each sound played, which allows a calling function to test whether the sound has finished, or to stop the sound later on, using a function similar to PlaySound.
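For example, stopping a sound by its ID might look like this (a sketch: StopSound is my own name, the stdint typedefs stand in for SDL's types so it compiles on its own, and in real code the search would be wrapped in SDL_LockAudioDevice/SDL_UnlockAudioDevice just as in PlaySound):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the SDL integer types. */
typedef uint8_t  Uint8;
typedef uint32_t Uint32;
typedef int32_t  Sint32;

struct sound {
    unsigned int ID;
    Uint8 *data;
    Uint32 length;
    Sint32 pos;
    Sint32 loop;
    float vol;
    float pan;
};

/* Stop a playing sound by ID by releasing its software channel.
   Returns 1 if the sound was found, 0 if it had already finished.
   The sample memory itself still belongs to whoever added the sound. */
int StopSound(struct sound *sounds, int nSounds, unsigned int ID) {
    int i;
    for(i = 0; i < nSounds; i++) {
        if(sounds[i].data && sounds[i].ID == ID) {
            sounds[i].data = NULL; /* mark the channel free */
            return 1;
        }
    }
    return 0;
}
```

A "has it finished?" query is the same search without the assignment: the sound is still playing if a channel with that ID and a non-NULL data pointer exists.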
Exercises
- When you move around in games, the sound effects can change both volume and direction (panning). Write another mixing function that will change these smoothly, if the current and next values for the pan and volume are stored in the sound structure. (Related to the volume and frequency exercises in the sine examples)
- How many sounds can you play simultaneously before your callback falls behind? Don't bother with real output: you might get arithmetic overflow in the mixing buffer values and it will sound bad. Instead, have the mixer write to a dummy buffer and fill the real buffer with a sine wave, so you'll hear when it falls behind.
- Open a game or something that will load your system with its own audio, physics and logic, and see how many sounds you can play now. What do you think the ratio is between time spent processing audio and other elements?
- The pos variable is a signed integer, which allows a negative pos to be passed to PlaySound. Modify the mixing function to treat a negative pos as extra silence at the beginning, skipping until pos == 0. How could you add a simple echo for the sounds using this?
- Using the returned ID values, how would you pause and resume the sounds? How would you stop them? How about stopping and then deallocating their memory? Work on a looping sound.
- Our brains use a mixture of the phase difference and loudness difference between our ears to determine the direction of noises. At lower frequencies the phase is more important, and at higher frequencies the loudness is all that really matters.
If your sounds are mostly low frequencies, say < 500 Hz, then panning alone, although effective, might not sound as good as a combination of panning and phase difference. What would implementing the phase difference involve, and do you think it's worth it?
To the extent possible under law,
Craig Hughes
has waived all copyright and related or neighboring rights to
the text and source code in this example/tutorial.