Day 1: Not that simple, really

RSS is the standard podcasts use for distribution. It stands for, among other things, “Really Simple Syndication”. I had a hunch going into this project that this wouldn’t be a super accurate name. An early survey backs that up.

The first thing I did was update my code to print every tag (but not its attributes, etc.) as a tree. I hoped this would help me get my bearings in the format, and it did.

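// ---8<--- snip
let mut buf = Vec::new();
let mut depth: usize = 0; // current nesting level, used for indentation
loop {
    match reader.read_event_into(&mut buf) {
        // Self-closing tag (<foo/>): print it, the depth is unchanged.
        Ok(Event::Empty(tag)) => {
            println!("{}{}", "\t".repeat(depth), String::from_utf8_lossy(tag.name().0));
        }
        // Opening tag: print it, then indent everything nested inside it.
        Ok(Event::Start(tag)) => {
            println!("{}{}", "\t".repeat(depth), String::from_utf8_lossy(tag.name().0));
            depth += 1;
        }
        // Closing tag: step back out one level.
        Ok(Event::End(_)) => depth = depth.saturating_sub(1),
        Ok(Event::Eof) => break,
        // Stop on a parse error instead of silently skipping it.
        Err(e) => {
            eprintln!("XML error: {e}");
            break;
        }
        // Text, comments, CDATA etc. are ignored for now.
        _ => (),
    };

    buf.clear();
}
// ---8<--- snip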

This gave me the following output (when run with my hard-coded feed URL):

rss
        channel
                title
                link
                pubDate
                lastBuildDate
                ttl
                language
                copyright
                webMaster
                description
                category
                category
                category
                category
                category
                category
                category
                category
                category
                category
                category
                generator
                docs
                image
                        url
                        title
                        link
                        width
                        height
                        description
                atom:link
                itunes:new-feed-url
                itunes:author
                itunes:type
                itunes:category
                itunes:image
                itunes:explicit
                itunes:owner
                        itunes:email
                        itunes:name
                itunes:subtitle
                itunes:summary
                media:copyright
                media:thumbnail
                media:category
                podcast:guid
                podcast:podping
                podcast:locked
                item
                        guid
                        title
                        pubDate
                        link
                        description
                        enclosure
                        itunes:title
                        itunes:subtitle
                        itunes:episodeType
                        itunes:duration
                        author
                        itunes:author
                        itunes:summary
                        itunes:image
                        media:content
                        podcast:transcript
                        content:encoded
                item
                        guid
---8<--- snip

So, we do have a lot of itunes-namespaced tags, as I suspected. But they mostly seem to be metadata about the podcast, rather than something important like the audio file itself. I know from previous work that the file URL is an attribute on the enclosure tag.

From what I know about XML, it is supposed to be eXtensible, with different bodies defining their own namespaces, all of which can live together in a single document. I suspect itunes is one such namespace that Apple defined when integrating podcasts into iTunes (adding things like images, descriptions, etc.).
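
To make the prefix idea a bit more concrete, here is a minimal sketch (not part of the project code) that splits tag names like the ones in the dump above into prefix and local name, then counts how many tags fall under each prefix. It only uses the standard library. The names split_qname and tally_prefixes, and the "(none)" label for unprefixed tags, are made up for illustration. One caveat: a prefix like itunes is only shorthand, and the actual namespace is whatever URI the document binds to it with an xmlns attribute, so this is a rough survey tool rather than real namespace handling.

```rust
use std::collections::BTreeMap;

/// Split a qualified tag name such as "itunes:author" into its optional
/// prefix and its local name. (Hypothetical helper, not a quick-xml API.)
fn split_qname(name: &str) -> (Option<&str>, &str) {
    match name.split_once(':') {
        Some((prefix, local)) => (Some(prefix), local),
        None => (None, name),
    }
}

/// Count how many tag names fall under each prefix. Unprefixed tags are
/// grouped under the made-up label "(none)".
fn tally_prefixes<'a>(names: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for &name in names {
        let (prefix, _local) = split_qname(name);
        *counts.entry(prefix.unwrap_or("(none)")).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // A handful of tag names copied from the per-episode part of the dump.
    let names = [
        "guid",
        "title",
        "enclosure",
        "itunes:title",
        "itunes:duration",
        "media:content",
        "podcast:transcript",
        "content:encoded",
    ];

    // BTreeMap iterates in sorted key order, so the output is stable.
    for (prefix, count) in tally_prefixes(&names) {
        println!("{prefix}: {count}");
    }
}
```

Feeding it the full list of tag names from the tree would give a quick picture of how much of the feed is plain RSS and how much comes from extensions.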

Historical matters

Now to that great bastion of knowledge: Wikipedia! The Wikipedia page for RSS has a good overview of the format, and importantly, how it has evolved. I see that there are two family trees: the 1.x “RDF” branch, which sounds like it emerged from the broader semantic-web movement (fun!), and the 2.x “UserLand” branch, a supposedly simplified version initially advocated for by the author of a popular RSS reader. The RDF/1.0 branch is older (predictably), and apparently RSS originally stood for “RDF Site Summary”.

There’s an interesting story about accidental standards-capture (if that’s the term), where UserLand was so popular that its behaviour (e.g. allowing HTML embeds) became a de facto standard. Reminds me of Mastodon, in a way. RSS and all.

Per a rather ancient survey, it sounds like the UserLand/2.x branch is more popular, but doesn’t fully dominate the landscape. I am also sympathetic to a lot of the semantic web’s goals, so I wouldn’t want to exclude the RDF branch anyway. It was nice of them to keep off each other’s toes with the version numbers.

I also happen to know that Atom is a thing, supposedly an even simpler standard. The feed I’m testing with contains exactly one tag in the atom namespace. Maybe it is just an additional schema you can add to other RSS documents?

Well, that’s all the time I have today. Tomorrow, we read the specs! Or at least get a conceptual overview.