Day 21: Whereof one cannot type

Today’s session completed the process of round-tripping HTTP out of, and back into, the agent. I can once again run seance sync and have the contents of my configured feeds printed to the terminal. This has been a bit of a slog, and I am glad to be through it. Excitingly, this opens the door to building out the UI next, which will be a nice break.

The most interesting part of this work was building out the design for my URL type. I ran into a few obscure corners of Rust’s type/module systems which I think are worth sharing.

A type of my own

Until now Séance’s code has contained several incompatible URL types, determined by when I wrote the code or what was required by the libraries I was using: http::Uri, reqwest::Url and url::Url. I added http::Uri and url::Url manually into some of the core types so that I wasn’t representing URLs as unstructured strings everywhere. URLs follow specific rules, and as described in Parse, don't validate, we want to represent those rules in the type system as early as possible. The http::Uri code dates from when I was thinking of basing my HTTP types on the http crate (see yesterday’s entry on request equality for why I moved away from that). Removing that duplication was easy enough.

This leaves me with a bit of a conundrum, though: I don’t want to use reqwest::Url in all my types because that would couple all my code to the specific HTTP library I am using (and hope eventually to replace). However, the only thing I actually need to use the URLs for currently is integrating with that library. I could store my own internal representation of URLs and convert on demand, but at time of writing the only way to create a reqwest::Url is by parsing one. Re-parsing (and re-allocating) the contents of a URL every time I actually use it feels wrong to me. Sure, the actual overhead would probably be tiny, but it would still feel worse to me. Imperfect. Spreading memory fragmentation through the heap. I consider minimizing dynamic/temporary allocations an artistic constraint on this project. It’s not just aesthetics, though: small allocations can become a systemic performance issue. For example, it was once the case that Chrome would allocate 25,000 strings every time you typed a letter in the URL bar.

The solution I landed on here is to use the newtype pattern again, but in this case for the purpose of decoupling the rest of the code from the internal type.

#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) struct Url(reqwest::Url);

impl FromStr for Url {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        reqwest::Url::parse(s)
            .map(Url)
            .map_err(|_| ParseError::Unknown)
    }
}
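To make the decoupling concrete, here is a minimal sketch of how the rest of the code can name only the wrapper. It assumes a stand-in backend::Url module in place of reqwest::Url (so it compiles without external crates), and the Feed type and parse_feed function are hypothetical names invented for illustration, not part of Séance.

```rust
use std::str::FromStr;

// Hypothetical stand-in for a third-party URL type such as reqwest::Url.
mod backend {
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Url(String);

    impl Url {
        // Toy "parser": accepts anything that contains a scheme separator.
        pub fn parse(s: &str) -> Result<Url, ()> {
            if s.contains("://") {
                Ok(Url(s.to_owned()))
            } else {
                Err(())
            }
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Unknown,
}

// The newtype: the only place in the crate that mentions the backend type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Url(backend::Url);

impl FromStr for Url {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        backend::Url::parse(s)
            .map(Url)
            .map_err(|_| ParseError::Unknown)
    }
}

// Hypothetical core type: it stores the wrapper, never the backend type.
pub struct Feed {
    pub url: Url,
}

// Call sites go through str::parse, so swapping the backend later
// would only touch the newtype and its FromStr impl.
pub fn parse_feed(raw: &str) -> Result<Feed, ParseError> {
    Ok(Feed { url: raw.parse()? })
}
```

The URL is parsed once, when the Feed is built, and the wrapper holds the already-parsed value from then on.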

Unspeakable types

The first issue I ran into with this strategy was in creating the ParseError type. The code itself doesn’t need to care about what specifically was wrong with the URL. If it failed to parse, the best we can do is inform the person using the software about that. Therefore, all we actually need out of this type is for it to be able to display a human-readable explanation. Because reqwest::Url::parse already returns its own ParseError which has this explanation, I was hoping to just use that one in the meantime. Something like this (using thiserror):

#[derive(Error, Debug)]
#[error(transparent)] // forward Display to the wrapped error
pub(crate) struct ParseError(#[from] reqwest::ParseError);

However, I discovered that the ParseError exposed by reqwest is actually a re-export of url::ParseError from the previously mentioned url crate, which reqwest uses internally. This means I would actually have to add a dependency on a compatible version of url and then use that in the type:

#[derive(Error, Debug)]
#[error(transparent)] // forward Display to the wrapped error
pub(crate) struct ParseError(#[from] url::ParseError);

This works, but feels weird.
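For a sense of what a wrapper like this has to provide, here is a rough hand-written analogue using only the standard library. It is a sketch, not the exact code thiserror generates: std::num::ParseIntError stands in for url::ParseError so that no extra dependency is needed, and parse_port is a hypothetical helper that exists only to exercise the conversion.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// ParseIntError is a stand-in for url::ParseError in this sketch.
#[derive(Debug)]
pub struct ParseError(ParseIntError);

// Forward the human-readable message of the wrapped error.
impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

// Error requires Debug + Display; the default methods are enough here.
impl Error for ParseError {}

// This conversion is what lets `?` wrap the inner error automatically.
impl From<ParseIntError> for ParseError {
    fn from(inner: ParseIntError) -> Self {
        ParseError(inner)
    }
}

// Hypothetical helper: the `?` below converts ParseIntError into ParseError.
pub fn parse_port(raw: &str) -> Result<u16, ParseError> {
    Ok(raw.parse::<u16>()?)
}
```

Callers only ever see ParseError and its message; the wrapped type still appears in the From impl, which is why the underlying crate has to be nameable at all.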

Even the Turbofish can’t save you now

Now that I have my URL type, how do I adapt it to be usable with reqwest? The cleanest solution would be for reqwest’s methods to be generic over some sort of trait that we could implement, such as AsRef<Url>. Unfortunately, the relevant methods (e.g. get) take an IntoUrl, which is a sealed trait: a trait that cannot be implemented outside of the crate that defines it. Also, IntoUrl is implemented for the owned reqwest::Url but not for &reqwest::Url, meaning we would need to consume whatever we pass to it, or clone it ahead of time.
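For readers who haven’t met sealed traits before, this is a minimal sketch of one common way to build one: a public trait with a supertrait that lives in a private module. The client module and every name inside it are hypothetical stand-ins, and reqwest’s real implementation may differ in detail.

```rust
// Hypothetical stand-in for an HTTP client crate.
mod client {
    // Private module: code outside `client` cannot name `Sealed`,
    // so it cannot satisfy the supertrait bound below.
    mod private {
        pub trait Sealed {}
    }

    pub struct Url(pub String);

    // Public to use as a bound, impossible to implement from outside.
    pub trait IntoUrl: private::Sealed {
        fn into_url(self) -> Url;
    }

    impl private::Sealed for Url {}
    impl IntoUrl for Url {
        fn into_url(self) -> Url {
            self
        }
    }

    impl private::Sealed for &str {}
    impl IntoUrl for &str {
        fn into_url(self) -> Url {
            Url(self.to_owned())
        }
    }

    pub fn get<U: IntoUrl>(url: U) -> String {
        format!("GET {}", url.into_url().0)
    }
}

// Our own wrapper type, defined outside `client`.
pub struct MyUrl(client::Url);

// impl client::IntoUrl for MyUrl { ... }
// ^ rejected by the compiler: `MyUrl` does not implement `Sealed`,
//   and `client::private` is not accessible from here.

// So the wrapper has to hand over the inner value itself.
pub fn fetch(url: MyUrl) -> String {
    client::get(url.0)
}
```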

I like to use the Rust standard-library traits when possible, so my first thought was to implement the Into trait, like this:

struct Url(reqwest::Url);

impl Into<reqwest::Url> for Url {
    fn into(self) -> reqwest::Url {
        self.0
    }
}

// and later

reqwest::get(url.into())

Unfortunately this doesn’t work. Because reqwest::get takes a generic type and the Into trait also contains a generic type, Rust’s type inference gives up. This is probably for the best: it would otherwise be possible for there to be multiple valid types we could infer, meaning Rust would have to choose one arbitrarily. For example, if Url also implemented Into<String> then both String and reqwest::Url would be valid inferred types.
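The ambiguity can be reproduced without reqwest. In this sketch Backend, IntoBackend and get are hypothetical stand-ins for reqwest::Url, IntoUrl and reqwest::get. The type-annotated let binding is one way to resolve the inference that this entry doesn’t otherwise cover; it is shown only to make clear what the compiler is missing.

```rust
// Hypothetical stand-in for reqwest::Url.
pub struct Backend(String);

// Hypothetical stand-in for the IntoUrl bound.
pub trait IntoBackend {
    fn into_backend(self) -> Backend;
}

impl IntoBackend for Backend {
    fn into_backend(self) -> Backend {
        self
    }
}

impl IntoBackend for String {
    fn into_backend(self) -> Backend {
        Backend(self)
    }
}

// Generic in its argument, like reqwest::get.
pub fn get<U: IntoBackend>(url: U) -> String {
    format!("GET {}", url.into_backend().0)
}

pub struct Url(Backend);

impl Into<Backend> for Url {
    fn into(self) -> Backend {
        self.0
    }
}

// A second conversion, so that two target types are plausible.
impl Into<String> for Url {
    fn into(self) -> String {
        let Url(Backend(raw)) = self;
        raw
    }
}

pub fn fetch(url: Url) -> String {
    // get(url.into())
    // ^ does not compile ("type annotations needed"): both Backend and
    //   String would satisfy `U: IntoBackend` and `Url: Into<U>`.

    // Naming the target type on a binding removes the ambiguity.
    let target: Backend = url.into();
    get(target)
}
```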

But I’ve been around the block. I know what to do when you want to specify your generic types manually: the Turbofish! Something like this:

reqwest::get(url.into::<reqwest::Url>())

But this also doesn’t work. The issue here is that the Turbofish defines the generic parameters for the method call, but we need to define the generic parameters for the trait itself. There is a syntax for this, but…

reqwest::get(Into::<reqwest::Url>::into(url))

Blech. No thanks. Maybe if I expected to be converting URLs to lots of different types I would still reach for Into, but this is likely the only conversion I will actually need. So instead, I landed on a simpler approach:

impl Url {
    fn into_reqwest(self) -> reqwest::Url {
        self.0
    }
}

// and later

reqwest::get(url.into_reqwest());
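
To compare the options side by side, here is a small self-contained sketch. As before, Backend, IntoBackend and get are hypothetical stand-ins for reqwest::Url, IntoUrl and reqwest::get, and Url::new exists only so the example can construct a value.

```rust
// Hypothetical stand-in for reqwest::Url.
pub struct Backend(String);

// Hypothetical stand-in for the IntoUrl bound.
pub trait IntoBackend {
    fn into_url(self) -> Backend;
}

impl IntoBackend for Backend {
    fn into_url(self) -> Backend {
        self
    }
}

pub fn get<U: IntoBackend>(url: U) -> String {
    format!("GET {}", url.into_url().0)
}

pub struct Url(Backend);

impl Into<Backend> for Url {
    fn into(self) -> Backend {
        self.0
    }
}

impl Url {
    // Constructor for the example only.
    pub fn new(raw: &str) -> Url {
        Url(Backend(raw.to_owned()))
    }

    // The inherent method: no generics, so nothing to infer.
    pub fn into_backend(self) -> Backend {
        self.0
    }
}

// Generic argument supplied on the trait, not on the method.
pub fn via_trait_turbofish(url: Url) -> String {
    get(Into::<Backend>::into(url))
}

// The fully qualified form, naming both Self and the trait.
pub fn via_qualified_path(url: Url) -> String {
    get(<Url as Into<Backend>>::into(url))
}

// The approach settled on above.
pub fn via_inherent_method(url: Url) -> String {
    get(url.into_backend())
}
```

All three produce the same result; the inherent method simply reads best at the call site when there is only one conversion to support.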