I’ve translated the reference C implementation of the QOA format to Jai.

The project page will be kept up to date as the project changes over time, so for this post I wanted to add some context and thoughts on how the translation went and where I’d like to take it next.

Motivation

I was inspired to translate the QOA reference implementation to Jai by Raylib and MoonWorks. Both frameworks support QOA as an audio source format, and both use the reference C implementation.

Because I’m interested in adding compressed audio support to wpak, and I’m satisfied with QOA’s quality, I thought translating the QOA reference C implementation to Jai would be an interesting and useful project, one that also keeps the wpak module and CLI tool in a single language.

Process

The simplicity and conciseness of the reference implementation were encouraging, and I’m happy it turned out to be as straightforward as I expected. Jai and C are quite different, but C has a certain universality that made much of the translation almost automatic.

The process was essentially to copy each part of qoa.h and convert it, line by line, into valid Jai.

However, this was not quite enough!

For example, during the translation of the decoder, my first attempts at decoding came out absolutely smashed, with only a narrow band of frequencies in the middle being reproduced. Well… it turns out that even though the LMS state is stored as 16-bit values in the file, almost everything in qoa.h uses int, so decoding actually works with 32-bit values. Because Jai is a lot more type-safe than C, I ended up with two structs for the LMS state: one for the 16-bit state as encoded in the file, and another with 32-bit fields for the encoding/decoding state.
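To make the widening concrete, here is a minimal sketch in C rather than Jai. The names (qoa_lms_file, qoa_lms_state, lms_widen, lms_predict, lms_update) are illustrative, not taken from either implementation; the predict and update arithmetic mirrors qoa_lms_predict and qoa_lms_update in qoa.h. With the encoder's initial weights of 0, 0, -8192, 16384, a single weight-times-history product already exceeds the 16-bit range, which is why the working state needs 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define QOA_LMS_LEN 4

/* Layout as stored in the file: 4 x s16 history, then 4 x s16 weights. */
typedef struct {
    int16_t history[QOA_LMS_LEN];
    int16_t weights[QOA_LMS_LEN];
} qoa_lms_file;

/* Working state used while encoding/decoding: the same values widened to
   32 bits, matching qoa.h, which keeps them in plain int. */
typedef struct {
    int32_t history[QOA_LMS_LEN];
    int32_t weights[QOA_LMS_LEN];
} qoa_lms_state;

/* Widen the on-disk state into the working state. */
static qoa_lms_state lms_widen(const qoa_lms_file *f) {
    qoa_lms_state s;
    for (int i = 0; i < QOA_LMS_LEN; i++) {
        s.history[i] = f->history[i];
        s.weights[i] = f->weights[i];
    }
    return s;
}

/* Weighted sum of the history. Individual products exceed 16 bits, so the
   accumulation happens in 32 bits. (>> on a negative value is an arithmetic
   shift on mainstream compilers, which qoa.h also relies on.) */
static int32_t lms_predict(const qoa_lms_state *s) {
    int32_t prediction = 0;
    for (int i = 0; i < QOA_LMS_LEN; i++) {
        prediction += s->weights[i] * s->history[i];
    }
    return prediction >> 13;
}

/* Adapt each weight by delta = residual >> 4, negated where the matching
   history value is negative, then shift the new sample into the history. */
static void lms_update(qoa_lms_state *s, int32_t sample, int32_t residual) {
    int32_t delta = residual >> 4;
    for (int i = 0; i < QOA_LMS_LEN; i++) {
        s->weights[i] += s->history[i] < 0 ? -delta : delta;
    }
    for (int i = 0; i < QOA_LMS_LEN - 1; i++) {
        s->history[i] = s->history[i + 1];
    }
    s->history[QOA_LMS_LEN - 1] = sample;
}
```

With a history of four 1000s and those initial weights, lms_predict returns 1000, even though the intermediate products (around 16 million) would never fit in an s16.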

Another example: the reference implementation defines qoa_clamp_s16 as an optimization, which I also brought over. What’s quite interesting is that Jai checks casts by default, so casting to a lower bit depth asserts if the value would be truncated. For that particular case I had to use cast,no_check(...), and then it was all good. What made it interesting was how infrequently the assert fired: it clearly depended on the actual content being encoded or decoded.
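For reference, here is a C sketch of the same clamp alongside a hypothetical checked_s16 helper that behaves like a checked narrowing cast. The unsigned cast inside the clamp deliberately wraps negative values around; that is the kind of conversion a checked cast would object to, and it only happens when a value undershoots -32768, which would be consistent with the problem only showing up on some content. This is my reading of the C trick, not a description of the exact Jai cast involved.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as qoa_clamp_s16 in qoa.h. In range, v + 32768 lies in
   [0, 65535], so one unsigned compare handles the common case. Below -32768
   the sum is negative and the cast to unsigned wraps it to a huge value;
   above 32767 it simply exceeds 65535. Either way the rarely taken branch
   does the real clamping. Assumes v is far from INT_MAX, as decoder sums are. */
static int clamp_s16(int v) {
    if ((unsigned int)(v + 32768) > 65535) {
        if (v < -32768) { return -32768; }
        if (v >  32767) { return  32767; }
    }
    return v;
}

/* Hypothetical helper emulating a checked narrowing cast: it asserts
   instead of silently truncating when v does not fit in 16 bits. */
static int16_t checked_s16(int v) {
    assert(v >= INT16_MIN && v <= INT16_MAX);
    return (int16_t)v;
}
```

Only the clamped result is ever narrowed to 16 bits, so checked_s16 never fires on it; the wraparound stays entirely inside the clamp.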

Aside from small things like this, I decided to approach the iteration over samples, channels, and slices somewhat differently. The C implementation iterates in the order "sample group" -> "channel" -> "slice", whereas my implementation goes "slice" -> "sample/residual" and keeps track of the channel per slice (current_channel = (current_channel+1) % header.channel_count). I did this mainly because of the nice type safety that Jai array views provide, whereas the C implementation largely manages iteration over one big byte buffer.
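Because slices within a frame are stored interleaved as [slice][channel], both orders visit the same slices in the same sequence. Here is a small C sketch that checks this; the function names are illustrative, and the functions only record (channel, first sample) pairs rather than decoding anything.

```c
#include <assert.h>

#define QOA_SLICE_LEN 20

/* Reference-style order: for each group of 20 samples, visit every channel. */
static int order_by_group(int samples, int channels, int ch[], int start[]) {
    int n = 0;
    for (int sample_index = 0; sample_index < samples; sample_index += QOA_SLICE_LEN) {
        for (int c = 0; c < channels; c++) {
            ch[n] = c;
            start[n] = sample_index;
            n++;
        }
    }
    return n;
}

/* Slice-major order: walk the slices in file order, rotating the channel
   after every slice. */
static int order_by_slice(int samples, int channels, int ch[], int start[]) {
    assert(channels > 0);
    int slice_count = ((samples + QOA_SLICE_LEN - 1) / QOA_SLICE_LEN) * channels;
    int current_channel = 0;
    for (int s = 0; s < slice_count; s++) {
        ch[s] = current_channel;
        start[s] = (s / channels) * QOA_SLICE_LEN;
        current_channel = (current_channel + 1) % channels;
    }
    return slice_count;
}

/* Returns 1 if both orders produce identical (channel, start) sequences. */
static int orders_match(int samples, int channels) {
    enum { MAX_SLICES = 256 * 8 };
    int ch_a[MAX_SLICES], start_a[MAX_SLICES];
    int ch_b[MAX_SLICES], start_b[MAX_SLICES];
    assert(((samples + QOA_SLICE_LEN - 1) / QOA_SLICE_LEN) * channels <= MAX_SLICES);
    int na = order_by_group(samples, channels, ch_a, start_a);
    int nb = order_by_slice(samples, channels, ch_b, start_b);
    if (na != nb) {
        return 0;
    }
    for (int i = 0; i < na; i++) {
        if (ch_a[i] != ch_b[i] || start_a[i] != start_b[i]) {
            return 0;
        }
    }
    return 1;
}
```

Both loops also handle a short final slice the same way, since each only records where a slice starts and leaves the per-slice sample count to the decoder.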

Helpful tooling

During the translation, I benefited a lot from having two files open side by side in ImHex: on the left, a QOA file generated by the reference qoaconv, and on the right, one generated by my Jai qoaconv.

[Screenshot: ImHex comparing the reference qoaconv output (left) with the Jai qoaconv output (right)]

There are many hex editors out there, but I’ve found ImHex especially nice when you take advantage of its pattern language. It definitely helped me understand some bugs as I worked through the translation.

Here is the pattern I used:

```
#pragma author Chip Collier
#pragma description Quite OK Audio Format (QOA)
#pragma MIME audio/qoa
#pragma endian big

import std.mem;

// QOA File Header
struct QOAFileHeader {
    char magic[4] [[comment("Magic bytes 'qoaf'"), name("QOA Magic")]];
    u32 samples [[comment("Total samples per channel in file"), name("Total Samples")]];
};

// QOA Frame Header
struct QOAFrameHeader {
    u8 numChannels [[comment("Number of audio channels"), name("Channels")]];
    u24 sampleRate [[comment("Sample rate in Hz"), name("Sample Rate")]];
    u16 frameSamples [[comment("Samples per channel in this frame"), name("Frame Samples")]];
    u16 frameSize [[comment("Frame size in bytes, including header"), name("Frame Size")]];
};

// LMS (Least Mean Squares) State per channel
struct QOALMSState {
    s16 history[4] [[comment("LMS history (most recent last)"), name("LMS History")]];
    s16 weights[4] [[comment("LMS weights (most recent last)"), name("LMS Weights")]];
};

// If you use this struct instead of `u64` in QOAFrame, individual slices will be colored.
struct QOASliceRaw {
    u64 slice [[comment("64-bit slice: 4-bit scalefactor + 20 3-bit residuals"), name("Slice Data")]];
};

// QOA Frame
struct QOAFrame {
    QOAFrameHeader header;

    // LMS state for each channel
    QOALMSState lmsState[header.numChannels] [[comment("LMS state per channel")]];

    // Calculate number of slices needed
    // Each slice contains 20 samples, so we need ceil(frameSamples / 20) slices per channel
    u32 slicesPerChannel = (header.frameSamples + 19) / 20;

    // Slices organized as [slice][channel]
    // Using raw u64s for simplicity - you can switch to QOASliceRaw for a per-slice view
    u64 slices[slicesPerChannel * header.numChannels] [[comment("Audio data slices")]];
};

// Main QOA File Structure
struct QOAFile {
    QOAFileHeader fileHeader;

    // Frames continue until end of file
    QOAFrame frames[while(!std::mem::eof())] [[comment("QOA audio frames")]];
};

// Parse the QOA file starting at offset 0
QOAFile qoaFile @ 0x00;
```
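The same header layout can be read in a few lines of C, which is handy for sanity-checking a file outside ImHex. This is a sketch with names of my own choosing: it only parses the 8-byte file header and a single 8-byte frame header, and the size check assumes the frame layout the pattern describes (8-byte header, 16 bytes of LMS state per channel, then 8-byte slices).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  channels;
    uint32_t sample_rate;    /* 24 bits on disk */
    uint16_t frame_samples;  /* samples per channel in this frame */
    uint16_t frame_size;     /* bytes, including this header */
} qoa_frame_header;

/* All QOA fields are big-endian. */
static uint64_t read_u64_be(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* File header: "qoaf" magic followed by a u32 sample count per channel.
   Returns 0 if the magic does not match. */
static int parse_file_header(const uint8_t *p, uint32_t *samples) {
    if (memcmp(p, "qoaf", 4) != 0) {
        return 0;
    }
    *samples = (uint32_t)(read_u64_be(p) & 0xffffffffu);
    return 1;
}

/* Frame header: u8 channels, u24 sample rate, u16 samples, u16 frame size. */
static qoa_frame_header parse_frame_header(const uint8_t *p) {
    uint64_t h = read_u64_be(p);
    qoa_frame_header fh;
    fh.channels      = (uint8_t)(h >> 56);
    fh.sample_rate   = (uint32_t)((h >> 32) & 0xffffff);
    fh.frame_samples = (uint16_t)((h >> 16) & 0xffff);
    fh.frame_size    = (uint16_t)(h & 0xffff);
    return fh;
}

/* Frame size implied by the header: 8-byte header, 16 bytes of LMS state
   per channel, and ceil(frame_samples / 20) 8-byte slices per channel. */
static uint32_t implied_frame_size(const qoa_frame_header *fh) {
    uint32_t slices_per_channel = (fh->frame_samples + 19u) / 20u;
    return 8u + 16u * fh->channels + 8u * slices_per_channel * fh->channels;
}
```

On a well-formed file, implied_frame_size should equal the stored frame_size; for a full stereo frame (5120 samples, 256 slices per channel) both come to 4136 bytes. A mismatch is a quick hint that the frame boundaries have drifted.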

Ideas for the future

Integrating with Sound_Player

The Sound_Player module that ships with Jai supports WAV (both uncompressed PCM and compressed ADPCM) and Ogg. I think it would be nice to add QOA as another supported compressed audio format. For now, you can decode QOA data into a Sound_Data struct and use it as an uncompressed source.

Fun fact: compressed audio data attached to a Sound_Stream is decoded on a separate thread, and that Sound_Stream can be initialized for a compressed audio source and given sample-based start and end points, just as with uncompressed audio.

API review

Now that I have something that works as intended, it’s the right time to dig deeper into the type-safety features Jai provides. I already use array views extensively, and they’re great, but what else might I be missing out on?

Because I’d like to add QOA to the Sound_Player module, the decoder will need to support seeking, so some planning around the API will go a long way here. It’s easy enough to determine the number of frames, and even the frame number for a given sample, but what’s missing is an API to decode a single frame.
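The frame arithmetic part is small enough to sketch in C. It assumes a non-streaming file with a constant channel count, where every frame except the last holds 5120 samples per channel (256 slices of 20), so every frame before the last has the same byte size. The function names are hypothetical, and the missing piece, decoding a single frame, is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>

#define QOA_SLICE_LEN        20
#define QOA_SLICES_PER_FRAME 256
#define QOA_FRAME_LEN        (QOA_SLICES_PER_FRAME * QOA_SLICE_LEN) /* 5120 samples per channel */
#define QOA_FILE_HEADER_SIZE 8

/* Number of frames needed for total_samples samples per channel. */
static uint32_t frame_count(uint32_t total_samples) {
    return (uint32_t)(((uint64_t)total_samples + QOA_FRAME_LEN - 1) / QOA_FRAME_LEN);
}

/* Frame that contains a given sample index (per channel). */
static uint32_t frame_for_sample(uint32_t sample) {
    return sample / QOA_FRAME_LEN;
}

/* Samples to skip after decoding that frame to land exactly on the sample. */
static uint32_t sample_within_frame(uint32_t sample) {
    return sample % QOA_FRAME_LEN;
}

/* Byte size of a full frame: header, LMS state, and 256 slices per channel. */
static uint32_t full_frame_size(uint32_t channels) {
    return 8u + 16u * channels + 8u * QOA_SLICES_PER_FRAME * channels;
}

/* Byte offset where a frame starts; valid because all earlier frames are full. */
static uint64_t frame_offset(uint32_t frame, uint32_t channels) {
    assert(channels > 0);
    return QOA_FILE_HEADER_SIZE + (uint64_t)frame * full_frame_size(channels);
}
```

Seeking to sample s would then mean jumping to frame_offset(frame_for_sample(s), channels), decoding that one frame, and discarding the first sample_within_frame(s) samples of it, which is exactly where a single-frame decode API would slot in.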

Am I sure it’s correct?

Possibly not! I haven’t set up a corpus of test data and methodically tested the implementation’s edge cases.

But I feel satisfied with the state of things at this point, because the decoder passes my ear test and files encoded by the Jai and C implementations have matching SHA256 sums.

That last bit gives me a lot of confidence but we’ll see if it lasts. :D