Upgrading STM32Cube USB Audio Class driver – More advanced playback

I returned to the audio part of the project for this post. During playback I heard annoying pops and crackles, and after about 3-4 seconds the sound became completely distorted. I knew I had left this part unfinished.

Last time, I routed channels 1 and 2 (out of 4 channels) to the audio DAC of the Discovery board with a simple software loop that splits my audio buffer into two separate buffers after the samples come in. My audio buffer stored 16 packets, which means 8 ms latency and a buffer of about 6 KB. The original Audio Class implementation uses 80 samples, and has some other strange behavior that I mentioned earlier.
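The splitting loop mentioned above could look roughly like this. It is a minimal sketch, not my exact code: it assumes 16-bit samples interleaved as ch1, ch2, ch3, ch4 per frame (which matches the roughly 6 KB figure for 16 packets of 48 frames), and the names `split_channels` and `USB_CHANNELS` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_CHANNELS 4U   /* assumed: channels per frame sent by the host */

/* Split interleaved 4-channel frames (ch1 ch2 ch3 ch4 ...) into two
 * interleaved stereo buffers: one for ch1/ch2 and one for ch3/ch4.
 * Each output buffer must hold 2 * frames samples. */
void split_channels(const int16_t *in, size_t frames,
                    int16_t *out12, int16_t *out34)
{
    assert(in != NULL && out12 != NULL && out34 != NULL);

    for (size_t i = 0; i < frames; i++) {
        const int16_t *frame = &in[i * USB_CHANNELS];
        out12[2 * i]     = frame[0];   /* ch1 -> left  of first pair  */
        out12[2 * i + 1] = frame[1];   /* ch2 -> right of first pair  */
        out34[2 * i]     = frame[2];   /* ch3 -> left  of second pair */
        out34[2 * i + 1] = frame[3];   /* ch4 -> right of second pair */
    }
}
```

For a 1 ms packet at 48 kHz, `frames` would be 48, and `out12` is the buffer that goes to the DAC.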

The crackles are caused by USB and I2S running in different clock domains. The ideal would be a 1 ms USB clock with 48 samples/ms, and a theoretical 48 kHz clock for the DAC. But if you configure your I2S interface with STM32CubeMX, for example, you can see that the actual clock is 47991 Hz. In practice this means that USB writes my audio buffer faster than the I2S DMA reads it and pushes it to the DAC. In the original implementation, this difference was compensated after each DMA write, so every 40 ms, by using 1 sample less or 1 sample more from the audio buffer, depending on the difference. I had more or less rewritten that part, but now I need a more accurate synchronization method.

First I reduced my sample buffer to 2 packets, which means I’m going to work with 1 ms audio latency! I read a packet and immediately send it out.

A little bit of math

To compensate for the speed difference, I can only modify the I2S part. (I don’t want to implement a feedback endpoint, which affects the host.) I can send more samples to the DAC if I need to fill some extra time, or I can send fewer samples if I need to hurry.

The resolution is 1 sample (obviously I can’t send half a sample). At 48K samples/sec, 1 sample lasts 20.833 us. So I can compensate clock drift in steps of 20-21 us by sending 1 sample more or 1 sample less. In other words, once the difference has changed by more than 20 us, I need to send 1 sample less or 1 sample more to compensate.

I measured my USB Data_Out interrupts vs. DMA complete interrupts (when the data arrives in the audio buffer, and when it’s written to the DAC). The initial deltaT was 1576 us, and it had drifted by another 424 us by the end of the measurement. The initial 1576 us latency is 1000 us from buffering 1 packet, plus another 576 us, probably for the STM32 to start the DAC playback function with some I2C commands. So my actual latency is 1.576 ms.

I found that when playing audio from an iPad Air, the USB core misses the second audio packet from the host. As a result, my circular buffer is not used with the right timing, and the sound is distorted. The reason is that when we get the very first audio packet, we prepare the endpoint for receiving only after issuing the DAC start playback I2C command and the DMA command. The I2C transfer takes too long before the endpoint preparation, so we miss the second packet. I moved the preparation before the DAC start, and that solved the issue. It’s only an issue when we first send data out to the DAC, as the I2C command is only needed that one time.

This is where the magic happens, after 3700 ms. The purple signal passes the edge of the green signal. It means that USB overwrites the samples that are currently being sent to the DAC, because USB is the faster side.


I calculated that the drift is around 110 ns per millisecond. So I only need to compensate with 1 sample roughly every 180 ms, which I don’t think is noticeable. I will play 47995 samples/second instead of 48000. Of course these numbers depend on the USB host and the Discovery board, so this is for my laptop. They can be different in another configuration, but the solution is universal.
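The arithmetic above can be written down as a small sketch. The function names are made up for illustration, and the drift value is whatever you measure on your own host and board.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one sample in nanoseconds, rounded to the nearest ns. */
uint32_t sample_period_ns(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0U);
    return (1000000000U + sample_rate_hz / 2U) / sample_rate_hz;
}

/* Milliseconds of playback until the accumulated drift equals one
 * sample period, i.e. how often one sample has to be added or dropped.
 * drift_ns_per_ms is the measured drift (hypothetical parameter name). */
uint32_t compensation_interval_ms(uint32_t sample_rate_hz,
                                  uint32_t drift_ns_per_ms)
{
    assert(drift_ns_per_ms > 0U);
    return sample_period_ns(sample_rate_hz) / drift_ns_per_ms;
}
```

With 48000 Hz, `sample_period_ns` gives 20833 ns. A drift of 110 ns/ms then gives 189 ms, and 115 ns/ms gives 181 ms, so both land in the same ballpark as my rough 180 ms figure.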

The compensation method

It is very straightforward. I configured TIM6 for 1 us resolution. I start a timer when I get a packet and stop it when I have written the packet to the DAC. (Actually I only do this when the buffer is full, so at 2-packet intervals.) My reference is the first measurement. That’s the initial phase offset between filling the buffer and reading it. If the difference between the current deltaT and the initial reference value is bigger than 20 us, then I compensate. This way the phase for the next cycle will be close to the reference again.

Here is the compensation period. My calculations were good: it fluctuates around 180 ms.


Now I have the lowest latency possible, with the least compensation possible. Win-win! It sounds good now 🙂

