Day 2: Dir full of cache

Skimming the spec

I gave the RSS 2.0 Specification a quick scan today, and to my relief it doesn’t seem too complicated. I was wondering if each RSS file could map onto multiple channels, but it seems that each file has exactly one top-level <rss> element containing a single <channel>. One weird thing is that it has its own version of cache control separate from HTTP, based on which days/hours it expects to have updates (the <skipDays> and <skipHours> elements). These seem to be purely advisory though, so I think I’ll stick to HTTP caching alone, at least for now.

My working mental model for an RSS feed is a single channel: some metadata about the channel itself, followed by a list of items. These items can be articles and the like, with associated titles, images and so on, but for podcasts they also carry an attached <enclosure>, which points at the audio file. I guess this means you could probably have a hybrid RSS feed serving both articles and podcasts? Interesting.
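To make that concrete, a hypothetical feed mixing both kinds of item might look like this (element names are from the spec; the titles and URLs are made up):

```xml
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <description>Channel-level metadata lives here.</description>
    <!-- One of the advisory, non-HTTP cache hints mentioned above -->
    <skipDays><day>Saturday</day></skipDays>

    <!-- A plain article item -->
    <item>
      <title>An article</title>
      <link>https://example.com/article</link>
    </item>

    <!-- A podcast item: same shape, plus an enclosure pointing at the file -->
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3"
                 length="12345678" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
```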

~~Money laundering~~ Cache cleanup

In Seance, I am using a crate called http_cache_reqwest to manage HTTP requests. This is middleware for the popular reqwest HTTP client which automatically implements standards-compliant caching behaviour. I was scared straight by the Feed Reader Behavior Project, which quite rightly asks that people respect HTTP caching headers when they’re sent. It’s only polite, anyway. I’m not really sure which specific caching behaviours are implemented; I hope that conditional requests on Last-Modified times and ETags are included.
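For reference, the revalidation behaviour I’m hoping for (as described by RFC 9110/9111, not something I’ve verified in http_cache_reqwest’s source) works like this: the client remembers the validator headers from the first response and echoes them back on the next fetch. A sketch with made-up values:

```rust
// A sketch of HTTP revalidation, assuming the server sent these validator
// headers on the first response (values invented for illustration):
//   ETag: "abc123"
//   Last-Modified: Tue, 01 Jul 2025 00:00:00 GMT
fn revalidation_request(etag: &str, last_modified: &str) -> String {
    format!(
        "GET /feed.xml HTTP/1.1\r\n\
         Host: example.com\r\n\
         If-None-Match: {etag}\r\n\
         If-Modified-Since: {last_modified}\r\n\
         \r\n"
    )
}

fn main() {
    // If the feed hasn't changed, the server can answer this with a bodyless
    // 304 Not Modified instead of re-sending the whole file.
    print!("{}", revalidation_request("\"abc123\"", "Tue, 01 Jul 2025 00:00:00 GMT"));
}
```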

I need to store the cached HTTP responses somewhere. Previously I just stored them in an “http_cache” directory wherever you ran the program, which isn’t ideal. Today, I updated the code to use the directories crate to automatically pick a sensible place to store them, depending on the OS. On Linux, this follows the XDG Base Directory specification.

let Some(project_dirs) = ProjectDirs::from("is", "geist", "seance") else {
    bail!("can't figure out where to put project files");
};

let cache_path = project_dirs.cache_dir();
let http_cache_path = cache_path.join("http_cache");

let client = ClientBuilder::new(Client::builder().redirect(Policy::limited(5)).build()?)
    .with(Cache(HttpCache {
        mode: CacheMode::Default,
        manager: CACacheManager::new(http_cache_path, true),
        options: HttpCacheOptions {
            cache_options: Some(CacheOptions {
                shared: false,
                ..Default::default()
            }),
            ..Default::default()
        },
    }))
    .build();

Choosing the crate to use here was a bit involved. I initially wanted to use cap_directories for security reasons, but this wouldn’t work because the cache manager creates/manages the files on my behalf. The idea behind cap_directories is to prevent you from accidentally concatenating paths in an unsafe way and allowing an attacker to access the rest of your computer. I don’t expect to have to build any dynamic paths anyway, so this should be fine.
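For anyone unfamiliar, the failure mode that capability-style APIs guard against looks something like this (a contrived sketch, not code from Seance):

```rust
use std::path::Path;

fn main() {
    let cache_root = Path::new("/home/me/.cache/seance");

    // Imagine this string came from attacker-controlled input,
    // e.g. a filename derived from a feed URL.
    let untrusted = "../../.ssh/id_ed25519";

    // Plain Path::join keeps the `..` components as-is, so the result
    // points outside the cache directory entirely.
    let escaped = cache_root.join(untrusted);
    println!("{}", escaped.display());
    // A capability-based directory handle (as in cap-std) would refuse to
    // open this, because the `..` components resolve outside the handle.
}
```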

What confused me for a bit is that cap_directories says that it uses directories_next internally. In turn, directories_next says that it is a still-maintained fork of the directories crate… but its last change was much longer ago than the regular directories crate!

directories was updated this year, and the recent work on it seems to be happening on Codeberg (nice!).

Tomorrow, maybe I’ll start figuring out how to store the configuration for what feeds to access etc.