Day 21: Whereof one cannot type
Today’s session completed the process of round-tripping HTTP out of, and back into, the agent.
I can once again run seance sync and have the contents of my configured feeds printed to the
terminal. This has been a bit of a slog, and I am glad to be through it. Excitingly, this opens
the door to building out the UI next, which will be a nice break.
The most interesting part of this work was building out the design for my URL type. I ran into a few obscure corners of Rust’s type/module systems which I think are worth sharing.
A type of my own
Until now Séance’s code has contained multiple different (incompatible) URL types, determined by
when I wrote the code or what was required by the libraries I was using. Specifically this included
http::Uri, reqwest::Url and url::Url. I added http::Uri and url::Url manually into some
of the core types so that I wasn’t representing URLs as unstructured strings everywhere. URLs follow
specific rules, and as described in
Parse, don't validate,
we want to represent those rules in the type system as early as possible.
The http::Uri code dates from when I was thinking of
basing my HTTP types on the http crate (see
yesterday’s entry on request equality for why I moved away from that).
Removing that duplication was easy enough.
This leaves me with a bit of a conundrum, though: I don’t want to use reqwest::Url in all my types because that would
couple all my code to the specific HTTP library I am using (and hope eventually to replace). However, the only thing I
actually need to use the URLs for currently is integrating with that library. I could store my own internal representation
of URLs and convert on-demand, but at time of writing
the only way to create a reqwest::Url is by parsing one. Re-parsing
(and re-allocating) the contents of a URL every time I actually use it feels wrong to me. Sure, the actual overhead would
probably be tiny, but it would still feel worse to me. Imperfect. Spreading memory fragmentation through the heap. I consider
minimizing dynamic/temporary allocations an artistic constraint on this project. It’s not just aesthetics, though: small allocations can become
a systemic performance issue. For example, it was once the case that
Chrome would allocate 25,000 strings every time you typed a letter in the URL bar.
The solution I landed on here is to use the newtype pattern again, but in this case for the purpose of decoupling the rest of the code from the internal type.
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) struct Url(reqwest::Url);

impl FromStr for Url {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Delegate parsing to the library type, then wrap the result.
        reqwest::Url::parse(s)
            .map(Url)
            .map_err(|_| ParseError::Unknown)
    }
}
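To make the decoupling concrete, here is a minimal, dependency-free sketch of the same shape. The http_lib module and its toy parser are hypothetical stand-ins for the real library (they are not reqwest’s API), and parse_feed is an invented caller. The point is only that the newtype is the single place that names the library type.

```rust
use std::str::FromStr;

// Hypothetical stand-in for the HTTP library's URL type, so the sketch
// compiles without dependencies. This is NOT reqwest's API.
mod http_lib {
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct Url(String);

    impl Url {
        // Toy "parser": accepts any input containing "://".
        pub fn parse(s: &str) -> Result<Url, ()> {
            if s.contains("://") {
                Ok(Url(s.to_owned()))
            } else {
                Err(())
            }
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Unknown,
}

// The newtype: the only place that mentions the library type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Url(http_lib::Url);

impl FromStr for Url {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        http_lib::Url::parse(s)
            .map(Url)
            .map_err(|_| ParseError::Unknown)
    }
}

// Hypothetical application code: it only ever sees `Url` and `ParseError`.
pub fn parse_feed(input: &str) -> Result<Url, ParseError> {
    input.parse()
}
```

Swapping the HTTP library later would then mean touching the newtype and its impls, while callers like parse_feed stay unchanged.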
Unspeakable types
The first issue I ran into with this strategy was in creating the ParseError type. The code itself doesn’t need to care
about what specifically was wrong with the URL. If it failed to parse, the best we can do is inform the person using the software
about that. Therefore, all we actually need out of this type is for it to be able to display a human-readable explanation.
Because reqwest::Url already returns its own ParseError which has this explanation, I was hoping to just use that one
in the meantime. Something like this
(using thiserror):
use thiserror::Error;

#[derive(Error, Debug)]
#[error(transparent)] // forward Display to the wrapped error
pub(crate) struct ParseError(#[from] reqwest::ParseError);
However, I discovered that the ParseError exposed by reqwest is actually a re-export of url::ParseError from the
previously mentioned url crate, which reqwest uses internally. This means I would actually have to
add a dependency on a compatible version of url and then use that in the type:
#[derive(Error, Debug)]
#[error(transparent)] // forward Display to the wrapped error
pub(crate) struct ParseError(#[from] url::ParseError);
This works, but feels weird.
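For readers who have not used thiserror, a hand-written wrapper in the same spirit looks roughly like the sketch below. It uses std::num::ParseIntError as a stand-in for the library’s error type so that it compiles without dependencies, and parse_port is a hypothetical caller. It is not the code the derive macro generates, only an approximation of the idea: the wrapper borrows the inner error’s human-readable message.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// `ParseIntError` is a stand-in for the library's parse error type.
#[derive(Debug)]
pub struct ParseError(ParseIntError);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Reuse the inner error's human-readable explanation.
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for ParseError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Forward to the inner error's source, so the message is not
        // repeated when an error chain is printed.
        self.0.source()
    }
}

impl From<ParseIntError> for ParseError {
    fn from(inner: ParseIntError) -> Self {
        ParseError(inner)
    }
}

// Hypothetical caller: `?` converts the inner error through `From`.
pub fn parse_port(s: &str) -> Result<u16, ParseError> {
    Ok(s.parse::<u16>()?)
}
```

Note that the wrapper still has to name the inner error type, which is exactly why the real version needs the url crate as a direct dependency.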
Even the Turbofish can’t save you now
Now that I have my URL type, how do I adapt it to be usable with reqwest? The cleanest solution would be for reqwest’s
methods to be generic over some sort of trait that we could implement, such as AsRef<Url>. Unfortunately, the relevant
methods (e.g. get) take an IntoUrl, which is a sealed trait: a trait that cannot be implemented outside
of the crate that defines it. Also, IntoUrl is implemented for reqwest::Url but not for
&reqwest::Url, meaning we would need to consume whatever URL we pass to it, or clone it ahead of time.
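For anyone who has not met sealed traits before, one common way to build one is a public trait with a supertrait that lives in a private module. The sketch below illustrates that pattern with a hypothetical http_lib module standing in for a library crate. It is not a copy of reqwest’s source, and MyUrl is an invented downstream type.

```rust
mod http_lib {
    // Private module: code outside `http_lib` cannot name `Sealed`.
    mod sealed {
        pub trait Sealed {}
    }

    pub struct Url(pub String);

    // Public trait with a supertrait that outsiders cannot implement.
    pub trait IntoUrl: sealed::Sealed {
        fn into_url(self) -> Url;
    }

    impl sealed::Sealed for Url {}
    impl IntoUrl for Url {
        fn into_url(self) -> Url {
            self
        }
    }

    impl sealed::Sealed for &str {}
    impl IntoUrl for &str {
        fn into_url(self) -> Url {
            Url(self.to_owned())
        }
    }

    // Anyone can call `get`, but only with the types blessed above.
    pub fn get<U: IntoUrl>(url: U) -> String {
        format!("GET {}", url.into_url().0)
    }
}

// A downstream wrapper type, like the newtype in this post.
pub struct MyUrl(pub http_lib::Url);

// This would not compile, because `http_lib::sealed` is private and so
// `MyUrl` can never satisfy the `Sealed` supertrait:
// impl http_lib::IntoUrl for MyUrl { /* ... */ }
```

Sealing lets a library author add methods to the trait later without breaking downstream crates, at the cost of the extensibility I was hoping for here.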
I like to use the Rust standard-library traits when possible, so my first thought was to implement the Into trait, like
this:
struct Url(reqwest::Url);

impl Into<reqwest::Url> for Url {
    fn into(self) -> reqwest::Url {
        self.0
    }
}

// and later
reqwest::get(url.into())
Unfortunately this doesn’t work. Because reqwest::get takes a generic type and the Into trait also contains
a generic type, Rust’s type inference gives up. This is probably for the best: it would otherwise be possible for
there to be multiple valid types we could infer, meaning Rust would have to choose one arbitrarily. For example, if
Url also implemented Into<String> then both String and reqwest::Url would be valid inferred types.
But I’ve been around the block. I know what to do when you want to specify your generic types manually: the Turbofish! Something like this:
reqwest::get(url.into::<reqwest::Url>())
But this also doesn’t work. The issue here is that the Turbofish defines the generic parameters for the method call, but we need to define the generic parameters for the trait itself. There is a syntax for this, but…
reqwest::get(Into::<reqwest::Url>::into(url))
Blech. No thanks. Maybe if I expected to be converting URLs to lots of different types I would still reach for Into, but
this is likely the only conversion I will actually need. So instead, I landed on a simpler approach:
impl Url {
    fn into_reqwest(self) -> reqwest::Url {
        self.0
    }
}

// and later
reqwest::get(url.into_reqwest());
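If you want to poke at the inference behaviour without pulling in reqwest, here is a small self-contained sketch. Everything in it is a hypothetical stand-in: LibUrl plays the role of the library URL type, IntoTarget the role of the generic bound, and get the role of the generic function. It implements From rather than Into, relying on the standard library’s blanket impl to provide Into.

```rust
// Stand-in for the library's URL type.
pub struct LibUrl(pub String);

// Stand-in for the generic bound on the library's `get`.
pub trait IntoTarget {
    fn target(self) -> String;
}

impl IntoTarget for LibUrl {
    fn target(self) -> String {
        self.0
    }
}

impl IntoTarget for String {
    fn target(self) -> String {
        self
    }
}

pub fn get<U: IntoTarget>(url: U) -> String {
    format!("GET {}", url.target())
}

// The newtype, with two possible conversion targets.
pub struct Url(LibUrl);

impl From<Url> for LibUrl {
    fn from(u: Url) -> LibUrl {
        u.0
    }
}

impl From<Url> for String {
    fn from(u: Url) -> String {
        u.0 .0
    }
}

impl Url {
    pub fn new(s: &str) -> Url {
        Url(LibUrl(s.to_owned()))
    }

    // Inherent method: nothing for the compiler to infer.
    pub fn into_lib(self) -> LibUrl {
        self.0
    }
}

pub fn demo() -> (String, String) {
    // Rejected: the target of `into()` cannot be inferred.
    // get(Url::new("https://example.com").into());
    // Rejected: the `into` method itself has no generic parameters.
    // get(Url::new("https://example.com").into::<LibUrl>());

    // Accepted: the generic parameter is given on the trait.
    let a = get(Into::<LibUrl>::into(Url::new("https://example.com/a")));
    // Accepted: the inherent method has a single concrete return type.
    let b = get(Url::new("https://example.com/b").into_lib());
    (a, b)
}
```

Uncommenting either rejected line should reproduce the errors described above.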