In my last post, we ran the Clockwork code generation tool to generate C++ code to wrap components of the Arduino Audio library in a Smalltalk-like dynamic message dispatch system. That was neat, but the value of this approach has yet to be generated. In this post, I'll show you how to write a minimalist parser for a tiny Smalltalk-like language that we can compile down to C++ code that uses our message dispatch system.

Smalltalk is famous for its minimal syntax, which fits on a postcard. It is like the lisp of objects. I am particularly discouraged by the activity of writing parsers. I just hate it! If I see a BNF grammar, I suddenly remember how the world outside of the computer is full of blue skies and little birds chirping in the sunshine. So I am going to do some cheats to making writing the parser so extremely easy that even I will not get the overwhelming desire to touch grass. Also, I should mention that I absolutely detest nested delimiters such as parentheses, so no, we won't just be using lisp syntax!

Here is an example program:

codec : Audio:Control:SGTL5000
tonesweep : Audio:Synth:ToneSweep 
i2s : Audio:Output:I2S

tonesweep -> i2s

setup : codec .volume 0.5 0.5 . tonesweep .play 0.8 256 512 10

There are three sections. The first is for declarations. We only declare effects. We need this because we can have multiple instance of each effect. For instance, we could have two tone sweeps going into different outputs. Since there are multiple instances, we need to name them. Please note that these are not variables, we are merely naming instances.

The second section is for declaring data flows. This is primarily what you do when you use the Teensy graphical configuration tool. I really don't like the programming interfaces where you have a 2D canvas of boxes and drag lines between them, as seen in the Teensy software, PureData, Max, and Resolume Wire. I don't like it because the 2D canvas metaphor contains no meaningful program information. Where you place things in space is purely aesthetic. All that matters is the actual graph structure, which you can specify just fine in symbolic language. So I opted to just do this in text with some ASCII arrows. We use the named declared in the previous section in order to specify which instances we want to connect.

The third section is the one most like traditional Smalltalk. Our system is going to be event-based (because we are exploring Effective Programming ideas). So in this section we specify event handlers. An event handler takes a code block, which consists of one or more complete code sentences separated by " . " and each consisting of a subject (the to which we should send the message), a verb (the message to send), and some arguments to the message. I am using different terms here to denote the difference between syntactic and semantic constructions. The subject is just the first thing in the sentence. It will always be an object of some sort. We also always need at least one verb as that is how we send a message and all work gets done in this system by sending messages. Arguments are optional, and depend on the message. The sentence-ending " . " is only a separator and not needed on the last sentence of an event-handler.

You might also have noticed that our verbs are of the form ".volume" and ".play". Why the prepended dot? This is so that we can syntactically tell verbs apart from arguments in a position-independent manner, which will allow use to chain together verbs in a sentence. Here are some examples:

event1 : subject1 .verb1 argument1 argument2
event2 : subject2 .verb1 .verb2 argument1
event3 : subject3 .verb1 argument1 .verb2 argument2

In event1, there is one message, subject1.verb1 and it takes two arguments, argument1 and argument2.

In event2, there are two messages. The first messages is subject2.verb1. The result of that message, let's call it "result1", becomes the target for the second message, result1.verb2, and it takes one argument, argument1. Finally, event three has two messages, subject3.verb1 with argument1 and the result of that is the target of the second message, result1.verb2 with argument2.

So that's the syntax! Here's the parser:

import Foundation

import Text

public class BellParser
{
    public required init()
    {
    }

    public func generateBellProgram(url: URL) throws -> BellProgram
    {
        let bellSource = try String(contentsOfFile: url.path)
        let source = Text(fromUTF8String: bellSource)
        return try self.generateBellProgram(source: source)
    }

    public func generateBellProgram(source: Text) throws -> BellProgram
    {
        let instances = try self.findModuleInstances(source)
        let flow2s = try self.findFlow2s(instances, source)
        let handlers = try self.findHandlers(instances, source)

        return BellProgram(instances: instances, flow2s: flow2s, handlers: handlers)
    }

    public func findClassName(_ sourceURL: URL, _ source: String) throws -> String
    {
        return sourceURL.deletingPathExtension().lastPathComponent
    }

    public func findModuleInstances(_ source: Text) throws -> [ModuleInstance]
    {
        let lines = source.split("\n").filter
        {
            line in

            return line.containsSubstring(" : ")
        }

        let parts = lines.map
        {
            line in

            return line.split(" : ")
        }

        let goodParts = parts.filter
        {
            parts in

            return parts.count == 2 && !parts[1].containsSubstring(" ")
        }

        let pairs = goodParts.compactMap
        {
            parts in

            let left = parts[0]
            let right = parts[1]

            return (left, right)
        }

        let results = pairs.compactMap
        {
            (instanceName, moduleName) in

            return ModuleInstance(module: moduleName, instanceName: instanceName)
        }

        return results
    }

    public func findFlow2s(_ instances: [ModuleInstance], _ source: Text) throws -> [Flow2]
    {
        let lines = source.split("\n").filter
        {
            line in

            return line.containsSubstring(" -> ")
        }

        let parts = lines.map
        {
            line in

            return line.split(" -> ")
        }

        let goodParts = parts.filter
        {
            parts in

            return parts.count == 2
        }

        return try goodParts.compactMap
        {
            parts in

            guard let source = parts.first else
            {
                throw BellParserError.noPart
            }

            guard let output = parts.last else
            {
                throw BellParserError.noPart
            }

            return Flow2(moduleA: source, moduleB: output)
        }
    }

    public func findHandlers(_ instances: [ModuleInstance], _ source: Text) throws -> [EventHandler]
    {
        let lines = source.split("\n").filter
        {
            line in

            return line.containsSubstring(" : ")
        }

        let pairs = try lines.map
        {
            line in

            return try line.splitOn(" : ")
        }

        return pairs.compactMap
        {
            (event, blockText) in

            guard let block = parseBlock(instances, blockText) else
            {
                return nil
            }

            return EventHandler(eventName: event, block: block)
        }
    }

    func parseBlock(_ instances: [ModuleInstance], _ text: Text) -> Block?
    {
        if text.containsSubstring(" . ")
        {
            let parts = text.split(" . ")
            let sentences = parts.compactMap { parseSentence(instances, $0) }
            return Block(sentences)
        }
        else
        {
            guard let sentence = parseSentence(instances, text) else
            {
                return nil
            }

            return Block([sentence])
        }
    }

    public func parseSentence(_ instances: [ModuleInstance], _ text: Text) -> Sentence?
    {
        let parts = text.split(" ")
        guard parts.count >= 2 else
        {
            return nil
        }

        let subjectText = parts[0]
        let verbText = parts[1]

        let argumentsText: [Text]
        if parts.count > 2
        {
            argumentsText = [Text](parts[2...])
        }
        else
        {
            argumentsText = []
        }

        let maybeInstance = instances.first
        {
            instance in

            return instance.instanceName == subjectText
        }

        guard let subject = maybeInstance else
        {
            return nil
        }

        let verb = Verb(name: verbText)
        let arguments: [Argument] = argumentsText.compactMap
        {
            (argumentText: Text) -> Argument? in

            if argumentText.startsWith(":")
            {
                return Argument.name(argumentText)
            }
            else
            {
                do
                {
                    let floatRegex = try Regex("^[0-9]+\\.[0-9]+$")
                    let integerRegex = try Regex("^[0-9]+$")

                    if argumentText.containsRegex(floatRegex)
                    {
                        return Argument.literal(.float(Double(string: argumentText.toUTF8String())))
                    }
                    else if argumentText.containsRegex(integerRegex)
                    {
                        guard let int = Int64(argumentText.toUTF8String()) else
                        {
                            return nil
                        }

                        return Argument.literal(.int(int))
                    }
                    else
                    {
                        return nil
                    }
                }
                catch
                {
                    return nil
                }
            }
        }

        return Sentence(subject: subject, verb: verb, arguments: arguments)
    }
}

public enum BellParserError: Error
{
    case noPart
    case wrongType
}

It's pretty small, less than 300 lines and there is no BNF or parser generator in sight. This grammar is probably a little bit too simple for all the code we're going to want to write eventually, but it's sufficient to write our tone sweep example, so we can move on to writing a code generator that compiles this little language down to C++, which is what we'll do in the next post.

Code Generation for Arduino Audio - Writing a Minimalist Smalltalk Parser