Changing the Sampling Rate of a Sampled Sound

Let's say you have a sound recorded at 44.1 kHz, and an application requires the audio to be 48 kHz, or 22.05 kHz. What you need to do, is create new sample points from the original data, but with the correct spacing in time for the required sampling frequency. These new sample points might lie on an old sample point, in which case that is the value they will take, but if the sampling calls for a sample at say, position 12.7, you have to work out what value the sound should take at that position. You could do something like fourier transform a section or the whole sound to find the underlying frequencies present and then calculate a scaled inverse transform (apparently Bluestein's algorithm can do that?), or use a windowed sinc funcion. Those are quite expensive, so instead of that, let's look at linear interpolation and cubic interpolation, which perhaps have worse sounding results, but are much faster to calculate.

Linear Interpolation

Linear interpolation is simply a weighted average of the samples on either side of a point. If the most recent sample index is \(i\) and the offset from this sample is \( \delta \) then linear interpolation would be

\[ y_j = x_i + \delta (x_{i+1} - x_i) \]

or in C


int lerp(float *in, float *out, int N, float step) {
  if(!in || !out || N <= 1 || step < 0) return -1;
  int i = 0, j = 0, k;
  float delta = 0;
  for(; i < N-1; j++) {
    out[j] = in[i] + delta * (in[i+1] - in[i]);
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  for(; i < N; j++) {
    out[j] = in[i] * (1 - delta);
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  return 0;
}

Where "lerp" is apparently the standard abbreviation for "linear interpolation". N is the number of input samples, so the caller would have to ensure that floor((M-1)*step)+1 < N where M is the number of output samples in out, or alternatively M = floor(N/step). The step size is determined by the sampling rates that we are resampling between, e.g. upsampling 44.1 kHz to 48 kHz, 44.1/48 gives a step size of 0.91875, whereas downsampling to 8 kHz would be 44.1/8 = 5.5125 samples per step. If a point landed just past the end of the input data, between indices N-1 and N, It is as if the next sample were 0.

Cubic Interpolation

Linear interpolation is almost the simplest interpolation (nearest neighbor and most recent value are the simplest), and it is ok for downsampling, where a lot of information is lost, but for upsampling it can be problematic, because it tends to make straight lines where the waveform should be curved. A Better approximation for this would be cubic interpolation, where the waveform between samples is described as a cubic polynomial

\[ f(x) = ax^3 + bx^2 + cx + d \]

The first derviative of this is

\[ f'(x) = 3ax^2 + 2bx + c \]

If we assume 2 samples we are interpolating between are placed at positions 0 and 1, and we take the values of the cubic and it's first derivative, we arrive at the relations

\[ f(0) = d = x_i \] \[ f(1) = a + b + c + d = x_{i+1} \] \[ f'(0) = c \] \[ f'(1) = 3a + 2b + c \]

Where the delta value determines the position between 0 and 1. With some more rearranging, and assuming the slopes at the 2 samples we are interpolating between are simply the slopes between the neighboring samples on either side

\[ d = x_i \] \[ c = (x_{i+1} - x_{i-1}) / 2 \] \[ a = 2d - 2x_{i+1} + c + (x_{i+2} - x_i) / 2 \] \[ b = ((x_{i+2} - x_i) / 2 - c - 3a) / 2 \]

With this we can create an example in C


int cerp(float *in, float *out, int N, float step) {
  if(!in || !out || N <= 1 || step < 0) return -1;
  int i = 0, j = 1, k;
  float a, b, c, d;
  out[0] = in[0];
  float delta = step;
  k = floor(delta);
  i += k;
  delta -= k;
  for(; !i; j++) { /* between first and second samples */
    d = in[i];
    c = (in[i+1])/2;
    a = 2*d - 2*in[i+1] + c + (in[i+2] - in[i])/2;
    b = 0.5 * ((in[i+2] - in[i])/2 - c - 3*a);
    out[j] = ((delta * a + b) * delta + c) * delta + d;
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  for(; i < N-2; j++) { /* other samples */
    d = in[i];
    c = (in[i+1] - in[i-1])/2;
    a = 2*d - 2*in[i+1] + c + (in[i+2] - in[i])/2;
    b = 0.5 * ((in[i+2] - in[i])/2 - c - 3*a);
    out[j] = ((delta * a + b) * delta + c) * delta + d;
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  for(; i < N-1; j++) { /* between last 2 samples */
    d = in[i];
    c = (in[i+1] - in[i-1])/2;
    a = 2*d - 2*in[i+1] + c + (-in[i])/2;
    b = 0.5 * ((-in[i])/2 - c - 3*a);
    out[j] = ((delta * a + b) * delta + c) * delta + d;
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  for(; i < N; j++) { /* between last sample and 0 */
    d = in[i];
    c = (-in[i-1])/2;
    a = 2*d + c + (-in[i])/2;
    b = 0.5 * ((-in[i])/2 - c - 3*a);
    out[j] = ((delta * a + b) * delta + c) * delta + d;
    delta += step;
    k = floor(delta);
    i += k;
    delta -= k;
  }
  return 0;
}

If you were mostly upsampling using this method of interpolation, you could set k=1 initially and then check if the coefficitients needed to be recalculated with if(k) each loop. This method is made more complicated because of the need to treat both the beginning and 2 end cases specially, on top of the extra calculations.

References

Uses javascript SyntaxHighlighter for syntax highlighting.
Uses javascript MathJax for LateX rendering.

To the extent possible under law, Craig Hughes has waived all copyright and related or neighboring rights to the text and source code in this example/tutorial.