Day 5: KDLing up with some feeds (gone wrong!)

Welp, you win some, you lose some!

Today was probably the longest individual session of working on Séance since the start of my December Adventure. As expected, I spent the time implementing the parser for the config format I described yesterday. Initially, I tried do to this directly using the kdl crate. This would have worked, but the API exposed by that crate is a lot closer to the DOM than, say, a JSON parser. I have clearly been spoiled by Serde. I found myself repeatedly typing out messages like “expected node <x> to exist here”, which felt automatable.

In addition to the kdl crate, kdl.dev mentions knus, which exposes a similar derive-based API to Serde. This looked great! So I started switching to that instead.

#[derive(knus::Decode, Debug, PartialEq, Eq)]
struct Config {
    #[knus(child)]
    defaults: Defaults,

    #[knus(children(name="feed"))]
    feeds: Vec<Feed>
}

#[derive(knus::Decode, Debug, PartialEq, Eq)]
struct Defaults {
    #[knus(child)]
    keep: Keep
}

#[derive(knus::Decode, Debug, PartialEq, Eq)]
struct Feed {
    #[knus(argument)]
    url: String,

    #[knus(child)]
    keep: Option<Keep>
}

#[derive(Debug, PartialEq, Eq)]
enum Keep {
    All,
    Last(u64)
}

So far, so good. There was more cognitive overhead here than I am used to with config parsing; unlike JSON, KDL has multiple types of value hierarchies. You can store data in attributes, or positional arguments passed to the node, or as the node’s children. This means you have to specify where each value in a struct is supposed to come from (see eg. #[knus(argument)]).

The first big road block was parsing Keep. I want people to be able to use either a single argument or key-value pair for keep, like this:

keep all
keep last=3

but as far as I could tell, the derive macro wasn’t able to generate code for this automatically. No problem! I’ll just implement knus::Decode by hand, how hard can it be?

How hard it can be

impl<S> knus::Decode<S> for Keep 
    where S: ErrorSpan {
    fn decode_node(node: &knus::ast::SpannedNode<S>, ctx: &mut knus::decode::Context<S>)
        -> std::result::Result<Self, knus::errors::DecodeError<S>> {

        if let Some(arg) = node.arguments.first() {
            if let Some(arg2) = node.arguments.get(2) {
                return Err(DecodeError::unexpected(&arg2.literal, "argument", "only one argument allowed for keep"));
            }

            if let Some((prop, _value)) = node.properties.first_key_value() {
                return Err(DecodeError::unexpected(prop, "property", "keep only accepts arguments or properties, not both"));
            }

            match &*arg.literal {
                Literal::String(s) if &**s == "all" => Ok(Keep::All),
                _ => Err(DecodeError::unexpected(&arg.literal, "argument", "all is the only valid argument for keep"))
            }
        } else {
            if let Some(last) = node.properties.get("last") {
                for key in node.properties.keys() {
                    if &***key != "last" {
                        return Err(DecodeError::unexpected(&key, "property", "keep only accepts one property at a time"));
                    }
                }

                u64::decode(last, ctx).map(Keep::Last)
            } else {
                Err(DecodeError::missing(node, "either all or last=<entry-count> required for keep"))
            }
        }
    }
}

Harder than I expected, at least. I spent longer than I’d like to admit floundering at figuring out how to interact with a SpannedNode. I eventually noticed that SpannedNode<T> is a type alias for Spanned<Node<T>, ...>, and that Spanned<T, ...> implements Deref<Target=T>. This makes sense — a SpannedNode is just supposed to be a Node that also keeps track of where in the parsed source it came from (for error messages), but I always find the Deref implementations a bit sneaky in rustdoc. Because of that Deref implementation, I can just use any of the public properties on Node directly, which gave me all I needed.

This is clearly a lot more real logic than I had before. Enough that I was no longer comfortable only testing it manually. So I also created the first tests for the project.

Testing, and a twist

#[test]
fn basic_config_parsing() -> miette::Result<()> {
    let file_content = r#"
        defaults {
            keep all
        }

        feed "https://example.com" {
            keep last=3
        }
    "#;

    assert_eq!(
        parse_config("test.kdl", file_content)?,
        Config {
            defaults: Defaults {
                keep: Keep::All
            },
            feeds: vec![
                Feed {
                    url: "https://example.com".to_owned(),
                    keep: Some(Keep::Last(3))
                }
            ]
        }
    );

    Ok(())
}

With rising anticipation, I ran cargo test, and…

failures:
---- tests::basic_config_parsing stdout ----
Error:   × error parsing KDL

Error:
  × identifiers cannot be used as arguments
   ╭─[test.kdl:3:22]
 2 │             defaults {
 3 │                 keep all
   ·                      ─┬─
   ·                       ╰── unexpected identifier
 4 │             }
   ╰────
  help: consider enclosing in double quotes ".."

Huh? I could have sworn that was valid syntax according to the KDL spec. Maybe I totally imagined that? Lets check on KDL Play!

defaults
    keep
        "all"

feed
    "https://example.com"
    keep
        last=3

It parses just as I expected. Hm.

After some investigation, it turned out to be surprisingly simple (if disappointing): knus only supports KDL 1.0, and identifiers (ie. bare strings) as arguments were only added in KDL 2.0. Womp womp. If I add quotes where it suggests, then the test passes.

I think this means I’m going to have to start the parsing code over again, this time by hand. Unfortunate! But still a learning opportunity.