Day 1: Not that simple, really
RSS is the standard podcasts use for distribution. It stands for, among other things, “Really Simple Syndication”. I had a hunch going into this project that this wouldn’t be a super accurate name. An early survey backs that up.
The first thing I did was update my code to print every tag (but not their attributes etc.) as an indented tree. I hoped this would help me get my bearings in the format, and it did.
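// ---8<--- snip
let mut buf = Vec::new();
let mut depth = 0;
loop {
    match reader.read_event_into(&mut buf) {
        // Self-closing tag: print it, depth stays the same.
        Ok(Event::Empty(tag)) => {
            for _ in 0..depth { print!("\t") }
            println!("{}", String::from_utf8_lossy(tag.name().0));
        },
        // Opening tag: print it, then indent everything nested inside it.
        Ok(Event::Start(tag)) => {
            for _ in 0..depth { print!("\t") }
            println!("{}", String::from_utf8_lossy(tag.name().0));
            depth += 1;
        }
        // Closing tag: step back out one level.
        Ok(Event::End(_)) => {
            depth -= 1;
        }
        Ok(Event::Eof) => break,
        // Stop on malformed XML instead of silently carrying on.
        Err(e) => {
            eprintln!("XML error: {e}");
            break;
        }
        // Text, comments, CDATA etc. are ignored for now.
        _ => (),
    };
    buf.clear();
}
// ---8<--- snip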
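The depth-tracking idea in that loop can also be sketched on its own, without the XML parser, which makes it easy to check in isolation. This is only an illustration: the `Ev` enum and `render_tree` function are hypothetical names standing in for the parser's events, and the sketch returns a string instead of printing.

```rust
/// Hypothetical stand-in for the three parser events the loop cares about.
enum Ev<'a> {
    /// Opening tag that may contain children.
    Start(&'a str),
    /// Self-closing tag.
    Empty(&'a str),
    /// Closing tag.
    End,
}

/// Render one tag name per line, indented by one tab per nesting level.
fn render_tree(events: &[Ev<'_>]) -> String {
    let mut out = String::new();
    let mut depth: usize = 0;
    for ev in events {
        match ev {
            Ev::Start(name) => {
                out.push_str(&"\t".repeat(depth));
                out.push_str(name);
                out.push('\n');
                // Children of this tag are one level deeper.
                depth += 1;
            }
            Ev::Empty(name) => {
                out.push_str(&"\t".repeat(depth));
                out.push_str(name);
                out.push('\n');
            }
            // saturating_sub keeps a stray closing tag from underflowing.
            Ev::End => depth = depth.saturating_sub(1),
        }
    }
    out
}
```

Feeding it `Start("rss")`, `Start("channel")`, `Empty("atom:link")`, `End`, `End` yields three lines at depths 0, 1 and 2. Using `"\t".repeat(depth)` is just a compact alternative to the `for` loop that prints tabs.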
Running the event loop above against my hard-coded feed URL gave me the following output:
rss
    channel
        title
        link
        pubDate
        lastBuildDate
        ttl
        language
        copyright
        webMaster
        description
        category
        category
        category
        category
        category
        category
        category
        category
        category
        category
        category
        generator
        docs
        image
            url
            title
            link
            width
            height
            description
        atom:link
        itunes:new-feed-url
        itunes:author
        itunes:type
        itunes:category
        itunes:image
        itunes:explicit
        itunes:owner
            itunes:email
            itunes:name
        itunes:subtitle
        itunes:summary
        media:copyright
        media:thumbnail
        media:category
        podcast:guid
        podcast:podping
        podcast:locked
        item
            guid
            title
            pubDate
            link
            description
            enclosure
            itunes:title
            itunes:subtitle
            itunes:episodeType
            itunes:duration
            author
            itunes:author
            itunes:summary
            itunes:image
            media:content
            podcast:transcript
            content:encoded
        item
            guid
---8<--- snip
So, we do have a lot of itunes-namespaced tags, as I suspected. But they mostly
seem to be metadata about the podcast, rather than something important like the
audio file itself. I know from previous work that the file URL is an attribute on the enclosure tag.
From what I know about XML, it is supposed to be eXtensible, with different bodies defining
their own namespaces, all of which can live together in a single document. I suspect itunes
is one such namespace that Apple defined when integrating podcasts into iTunes (adding
things like images, descriptions, etc.).
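To get a rough feel for how much of a feed lives under each prefix, the tag names from the tree could be grouped by whatever comes before the colon. Here is a minimal sketch using only the standard library. The helper names `split_qname` and `count_by_prefix` are hypothetical, and it assumes the tag names have already been converted to strings.

```rust
use std::collections::BTreeMap;

/// Split a qualified tag name such as `itunes:author` into its optional
/// prefix and its local name. Names without a colon have no prefix.
fn split_qname(name: &str) -> (Option<&str>, &str) {
    match name.split_once(':') {
        Some((prefix, local)) => (Some(prefix), local),
        None => (None, name),
    }
}

/// Count how many tags use each prefix; un-prefixed tags are grouped under "".
fn count_by_prefix<'a>(names: &[&'a str]) -> BTreeMap<&'a str, usize> {
    let mut counts = BTreeMap::new();
    for &name in names {
        let (prefix, _local) = split_qname(name);
        *counts.entry(prefix.unwrap_or("")).or_insert(0) += 1;
    }
    counts
}
```

Calling `count_by_prefix(&["title", "itunes:author", "itunes:image", "atom:link"])` would group two tags under `itunes`, one under `atom`, and one under the empty prefix. Strictly speaking, a prefix is only a document-local alias for a namespace URI declared with an `xmlns:` attribute, so this counts prefixes rather than namespaces proper.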
Historical matters
Now to that great bastion of knowledge: Wikipedia! The Wikipedia page for RSS has a good overview of the format, and importantly, how it has evolved. I see that there are two family trees: the 1.x “RDF” branch, which sounds like it emerged from the broader semantic-web movement (fun!), and the 2.x “UserLand” branch, a supposedly simplified version initially advocated for by the author of a popular RSS reader. The RDF/1.0 branch is older (predictably), and apparently RSS originally stood for “RDF Site Summary”.
There’s an interesting story about accidental standards-capture (if that’s the term), where UserLand was so popular that its behaviour (e.g. allowing HTML embeds) became a de facto standard. Reminds me of Mastodon, in a way. RSS and all.
Per a rather ancient survey, it sounds like the UserLand/2.x branch is more popular, but doesn’t fully dominate the landscape. I am also sympathetic to a lot of the semantic web’s goals, so I wouldn’t want to exclude the RDF branch anyway. It was nice of them to keep off each other’s toes with the version numbers.
I also happen to know that Atom is a thing, supposedly an even simpler standard. The feed I’m
testing with contains exactly one tag in the atom namespace. Maybe it is just an additional schema
you can add to other RSS documents?
Well, that’s all the time I have today. Tomorrow, we read the specs! Or at least get a conceptual overview.