In my last post, I gave a brief overview of the Arduino Audio library for the Teensy and M4 microcontrollers, and we looked at some good starting points for each type of component in the library. In this post, I'll write some code to parse header files and extract the class and function information. We'll need this information in order to automatically generate code that uses the Audio library API.

The first thing we need to get working is the controller for the SGTL5000 audio codec, as this is what allows us to have audio input and output. To get started with the component, let's look at an example that is included with the library for the good old tone sweep synthesizer. This is just about as simple an example as you are likely to find:

#include <Audio.h>

AudioSynthToneSweep sweep;
AudioOutputI2S      i2s;

AudioConnection c1(sweep, 0, i2s, 0);
AudioConnection c2(sweep, 0, i2s, 1);

AudioControlSGTL5000 codec;

void setup(void)
{
  AudioMemory(2);

  codec.enable();
  codec.volume(0.5);

  sweep.play(0.8, 10, 22000, 10);
}

void loop(void)
{
}

Let's concentrate on just the SGTL5000 controller. This example calls two of its functions, enable() and volume(float), so we'll want to make sure that our header parsing code gets at least these two. We'll define that as success, and any additional API that we pull out is just a bonus.
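That success criterion is simple enough to write down as a check. Here is a minimal sketch in plain Swift. The names `requiredFunctions` and `meetsSuccessCriterion` are hypothetical and are not part of the parser shown in this post.

```swift
// The two functions the tone sweep example calls on the SGTL5000 controller.
let requiredFunctions: Set<String> = ["enable", "volume"]

// Hypothetical helper: true when every required name appears among the
// function names the parser extracted. Extra names are allowed (the "bonus").
func meetsSuccessCriterion(_ extractedNames: [String]) -> Bool
{
    return requiredFunctions.isSubset(of: Set(extractedNames))
}
```

You would pass it the names of whatever functions the parser returns. Any list containing both "enable" and "volume" passes, no matter what else is in it.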

I wrote the simplest parser I could manage that did the job. Perhaps you could simplify it even more, dear reader! Check it out:

//
//  HppParser.swift
//  
//
//  Created by Dr. Brandon Wiley on 4/13/23.
//

import Foundation

import Text

public class HppParser: Parser
{
    public required init()
    {
    }

    public func findImports(_ source: String) throws -> [String]
    {
        // Header names contain characters such as "." and "_" (for example <Audio.h>), so allow them.
        let regex = try Regex("#include [<\"][A-Za-z0-9_./]+[>\"]")
        return source.ranges(of: regex).map
        {
            range in

            let substring = source[range].split(separator: " ")[1]
            return String(substring)
        }
    }

    public func findClassName(_ sourceURL: URL, _ source: String) throws -> String
    {
        let text = Text(fromUTF8String: source)

        let classLine = try text.substringRegex(try Regex("class [A-Za-z0-9]+"))
        let (_, className) = try classLine.splitOn(" ") // Discard the "class " part
        return className.toUTF8String()
    }

    public func findFunctions(_ source: String) throws -> [Function]
    {
        let mtext = MutableText(fromUTF8String: source)

        try mtext.becomeSplitOnTail("public:") // We want only the part after "public:"
        try? mtext.becomeSplitOnHead("private:") // If there is a private: section, trim it off.
        try? mtext.becomeSplitOnHead("protected:") // If there is a protected: section, trim it off.

        let publicSource: Text = mtext.toText()

        let regex = try Regex("^[ \\t]*[A-Za-z0-9_ ]+ [A-Za-z0-9_]+\\(.+\\)[ \\t]*[;{](//.*)?$")
        let lines = publicSource.split("\n")
        let goodLines = lines.filter
        {
            line in

            line.containsRegex(regex)
        }

        let functions: [Function] = goodLines.compactMap
        {
            functionText in

            if functionText.containsSubstring("__attribute__")
            {
                // Not actually a function
                return nil
            }

            if functionText.containsSubstring("*")
            {
                // No pointers allowed
                return nil
            }

            do
            {
                let name = try self.findFunctionName(functionText)
                let parameters = try self.findParameters(functionText)
                let returnType = try self.findFunctionReturnType(functionText)
                return Function(name: name, parameters: parameters, returnType: returnType, throwing: false)
            }
            catch
            {
                return nil
            }
        }

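        // Overloaded functions share a name, so keep only the first declaration seen for each capitalized name.
        var seenEnums: Set<String> = Set<String>()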
        let uniqueFunctions: [Function] = functions.filter
        {
            function in

            let enumName = function.name.capitalized
            let seen = seenEnums.contains(enumName)
            seenEnums.insert(enumName)
            return !seen
        }

        return uniqueFunctions
    }

    func findFunctionName(_ function: Text) throws -> String
    {
        let mtext: MutableText = MutableText(fromText: function)
        try mtext.becomeSplitOnHead("(") // Left of the (
        try mtext.becomeSplitOnLastTail(" ") // Right of the space
        return mtext.toUTF8String()
    }

    func findParameters(_ function: Text) throws -> [FunctionParameter]
    {
        let mtext: MutableText = MutableText(fromText: function)
        try mtext.becomeSplitOnTail("(") // Right of the (
        try mtext.becomeSplitOnHead(")") // Left of the )

        if mtext.isEmpty()
        {
            return []
        }

        return mtext.split(", ").compactMap
        {
            part in

            do
            {
                let (type, name) = try part.splitOnLast(" ") // We must split on the last space for types such as "unsigned int"

                let mtype = MutableText(fromText: type)
                let mname = MutableText(fromText: name)

                if mname.startsWith("*")
                {
                    mtype.becomeAppended("*")
                    try mname.becomeDropFirst()
                }

                return FunctionParameter(name: mname.toUTF8String(), type: mtype.toUTF8String())
            }
            catch
            {
                return nil
            }
        }
    }

    func findFunctionReturnType(_ function: Text) throws -> String?
    {
        let mtext: MutableText = MutableText(fromText: function)
        try mtext.becomeSplitOnHead("(")

        let (type, name) = try mtext.splitOnLast(" ")
        let mtype: MutableText = MutableText(fromText: type)
        mtype.becomeTrimmed()

        if name.startsWith("*")
        {
            mtype.becomeAppended("*")
        }

        guard mtype != "void" else // A void return type in C++ means the function does not return anything, which we signify here with nil.
        {
            return nil
        }

        if mtype.containsSubstring("virtual")
        {
            throw HppParserError.noVirtualFunctionsAllowed
        }

        return mtype.toUTF8String()
    }
}

public enum HppParserError: Error
{
    case noVirtualFunctionsAllowed
}

I'm not going to walk you through the whole thing, but it's quite small: fewer than 200 lines. It extracts the class name and all of the function information: function names, parameter names and types, and return types. This code uses my Text library, which makes it much less verbose than using Swift Strings. I also use regular expressions, but with great parsimony and restraint. There is a famous quote:

💡
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

Regular expressions are a mini programming language. They are strictly less powerful than Swift code, but just as likely to cause bugs. I think they are handy for processing text, but you should use them as little as possible, and only for matching text patterns. If you try to write your whole parser as one big regular expression, well... don't. Instead, I do some text pre-processing to make my regular expressions simpler, use them to match just the lines I want, and then finish up with some more text processing. The reason to use a regular expression at all is that the equivalent Swift text processing code would be more complicated. Let's look at the hairiest, most awful one of the bunch, for finding function declarations:

^[ \\t]*[A-Za-z0-9_ ]+ [A-Za-z0-9_]+\\(.+\\)[ \\t]*[;{](//.*)?$

This is messy because C++ function declarations in header files are messy. First you might have some leading whitespace, then you have the return type, which, unlike in many languages, can contain spaces. Next you have the function name, and then the parameters in parentheses. The line ends with a semicolon, or perhaps an open curly brace (and maybe some whitespace before that). One of the functions we want also has an inline comment at the end, so we need to handle that too.
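To see the pattern at work, here is a minimal sketch that runs it over a handful of lines using Swift's built-in Regex rather than my Text library. The sample lines are made up for illustration and are not copied from the real header, and `declarationLines` is a hypothetical helper.

```swift
// The pattern from findFunctions, written as a Swift string literal.
let declarationPattern = "^[ \\t]*[A-Za-z0-9_ ]+ [A-Za-z0-9_]+\\(.+\\)[ \\t]*[;{](//.*)?$"

// Made-up sample lines in the style of a C++ header.
let sampleLines: [String] = [
    "\tbool enable(void);",                        // plain declaration
    "\tbool volume(float n);//inline comment",     // trailing comment
    "\tunsigned short status(unsigned int reg) {", // return type with a space
    "\tbool disable(void) { return false; }",      // body on the same line
    "\tbool muted;",                               // a field, not a function
    "private:"                                     // an access specifier
]

// Hypothetical helper: keep only the lines that the pattern matches.
func declarationLines(in lines: [String], pattern: String) throws -> [String]
{
    let regex = try Regex(pattern)
    return lines.filter { line in line.firstMatch(of: regex) != nil }
}
```

With these samples, the first three lines match and the last three do not. The declaration with its body on the same line is exactly the sort of case that would need another round of tweaking if we ever wanted to capture it.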

Now if you look at my beautiful, single-line regular expression, you might think about all of the possible C++ function declarations that will break it. There is no need to do this because we don't care. I checked it against the SGTL5000 controller header file and kept adjusting it until it matched all of the functions I wanted and none of the ones I didn't. It took a lot of careful tweaking to get from my first conception of what might work to something that actually worked in all of the relevant cases. So we're done! When we start to parse other header files, this is definitely going to break, and then we'll tweak it a bit more. We can do it like this, or we can write a C++ parser, but then we'd be writing a C++ parser and not making any audio play, so I would classify that as a failure.

It's worth taking some time to talk about a few other small tweaks in this code. There are some places where I split the text on a space. This didn't work at first because C++ types can have spaces in them (for instance, "unsigned int"), so I had to modify the Text library to let you split on the last instance of a substring as well as on the first. I also throw out every function whose declaration contains a "*", because those are pointers and I don't allow any pointer types in the API.
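The difference between splitting on the first and the last space is easy to see in a small sketch. These two functions are hypothetical stand-ins written against plain Swift Strings. They are not the Text library's actual implementation, which throws instead of returning nil.

```swift
// Split at the first occurrence of the separator.
// Returns nil when the separator does not occur at all.
func splitOnFirst(_ text: String, _ separator: Character) -> (head: String, tail: String)?
{
    guard let index = text.firstIndex(of: separator) else { return nil }
    return (String(text[..<index]), String(text[text.index(after: index)...]))
}

// Split at the last occurrence of the separator.
// This keeps multi-word types such as "unsigned int" together in the head.
func splitOnLast(_ text: String, _ separator: Character) -> (head: String, tail: String)?
{
    guard let index = text.lastIndex(of: separator) else { return nil }
    return (String(text[..<index]), String(text[text.index(after: index)...]))
}
```

For the parameter text "unsigned int level", splitting on the first space yields "unsigned" and "int level", which is wrong for our purposes. Splitting on the last space yields the type "unsigned int" and the name "level".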

You may be thinking, "What the heck?!? Why no pointer types?" This is a great question. Well, I don't want to foreshadow too much of where we're going with this, because it might seem unreasonable! My motivation is that I would like to enforce a style of programming that I'm working on which I call "effective programming" and which you might also name "effect-oriented programming". In order to get there, we need to provide an API that only deals with immutable types for both parameters and return types. In C++, pointer types are mutable, so they are banned. Fortunately, you don't really need them for the Audio library. It mostly deals with simple value types such as ints and floats. We'll worry about pointer types in the API when we get to them, but for the SGTL5000 controller class there is no need.

At the end of this extraction process we will have extracted the class name and all the function information. From there we can start generating code that uses this class and these functions. If you would like a fun challenge, you can try simplifying my regular expressions and testing them on the SGTL5000 controller header file to see if you can maintain correctness while improving clarity.
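To make the output of the extraction concrete, here is a sketch of the data it produces. The parser code above constructs Function and FunctionParameter values but does not show their definitions, so the struct shapes below are assumptions inferred from those initializer calls. The `signature(of:)` helper and the sample values are hypothetical as well.

```swift
// Assumed shapes, inferred from the initializer calls in the parser.
struct FunctionParameter
{
    let name: String
    let type: String
}

struct Function
{
    let name: String
    let parameters: [FunctionParameter]
    let returnType: String? // nil stands for a C++ void return
    let throwing: Bool
}

// Hypothetical helper: render a Function back into a C++-style signature.
func signature(of function: Function) -> String
{
    let parameters = function.parameters.map { "\($0.type) \($0.name)" }.joined(separator: ", ")
    return "\(function.returnType ?? "void") \(function.name)(\(parameters))"
}

// Made-up values of the kind the parser might produce.
let volume = Function(name: "volume", parameters: [FunctionParameter(name: "level", type: "float")], returnType: "bool", throwing: false)
let reset = Function(name: "reset", parameters: [], returnType: nil, throwing: false)
```

Rendering these two values gives "bool volume(float level)" and "void reset()". Code generation works from the same fields, just with a different output template.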

Code Generation for Arduino Audio - Extracting the Audio API from the Source Code