Day 18: About time
I began my session today intending to add a timeout when attempting to load the config file. However, when I had written enough code to get that working, I realized that timeouts would actually be unhelpful behaviour.
I plan to make system status fully visible to the person using Séance. This means they would be able to see that they are waiting a long time to load the config. How much time is too long to wait is entirely up to them. It is also situational: I could expose a config setting for this timeout, but maybe they normally expect fast reads but right now they’re defragging their hard drive.
Reflecting on timeouts more, I think they only make sense as a default if the timed-out operation is optional, or is blocking other unrelated progress. I expect few such situations when synchronizing podcasts. We can always perform other I/O in parallel, so there’s no blocking at play. Similarly, there are no obvious cases where an I/O read is optional. Maybe if we time out reading from the file system for existing downloads, we could fall back to re-downloading them from the network, but that’s a niche scenario.
A whole new type
When performing an action such as reading from a file, we need a way to identify that particular request so that we can similarly identify its response. I do this by adding an ID field to the action that is sent back in the response, as follows:
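Action::FileRead(file_read_id, app_path) => {
    let path_buf = app_path_resolver.resolve(&app_path);
    let message_tx = message_tx.clone();
    tokio::task::spawn(async move {
        // Assumed read call: the original snippet does not show how `result` is produced.
        let result = tokio::fs::read(&path_buf).await;
        // Echo the ID back untouched so the receiver can match the response to its request.
        message_tx.send(Message::FileReadComplete(file_read_id, result)).await
    });
}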
Note that the actual value of the ID is never inspected. The only interface that the
I/O driver requires of file_read_id is Copy. This gives us the freedom to change it
at will without making it harder to implement new I/O drivers.
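To make the "never inspected" point concrete, here is a minimal sketch of a driver that is generic over the ID type. It is not Séance's actual driver: it assumes plain threads and std::sync::mpsc in place of tokio, and Action, Message, and run_driver are simplified, hypothetical stand-ins. The bounds here are Copy + Send + 'static only because the ID crosses a thread boundary.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical, simplified stand-ins for the post's Action and Message types.
enum Action<Id> {
    FileRead(Id, String),
}

#[derive(Debug, PartialEq)]
enum Message<Id> {
    FileReadComplete(Id, Result<String, String>),
}

// A toy I/O driver. It never looks inside the ID; it only echoes it back.
fn run_driver<Id: Copy + Send + 'static>(actions: Vec<Action<Id>>) -> Vec<Message<Id>> {
    let (message_tx, message_rx) = mpsc::channel();
    for action in actions {
        match action {
            Action::FileRead(file_read_id, path) => {
                let message_tx = message_tx.clone();
                thread::spawn(move || {
                    // Stand-in for a real file read.
                    let result = Ok(format!("contents of {}", path));
                    // The receiver may already be gone; this sketch ignores that error.
                    let _ = message_tx.send(Message::FileReadComplete(file_read_id, result));
                });
            }
        }
    }
    // Drop the original sender so the receiver's iterator ends once every worker is done.
    drop(message_tx);
    message_rx.iter().collect()
}

fn main() {
    let messages = run_driver(vec![
        Action::FileRead(1u8, "a.toml".to_string()),
        Action::FileRead(2u8, "b.toml".to_string()),
    ]);
    // Completion order is not guaranteed, which is exactly why each message carries its ID.
    for message in &messages {
        println!("{:?}", message);
    }
}
```

Swapping u8 for a newtype or an enum requires no change to run_driver, which is the freedom described above.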
Previously FileReadId was defined as a simple wrapper:
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct FileReadId(NonZeroUsize);
This is called the newtype idiom. By
creating a struct with a single anonymous field, we tag the inner type with additional compiler-verified
metadata. I used the pattern here to keep track of the various kinds of IDs that I
expect to create. Because the derived PartialEq and Eq only compare a type with itself, I can’t accidentally
compare two IDs of different types, which prevents a class of bugs.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct FileReadId(NonZeroUsize);
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct TimerId(NonZeroUsize);
timer_id == file_read_id // error: mismatched types
Compare this with a type alias, which does not create a distinct new type:
type FileReadId = NonZeroUsize;
type TimerId = NonZeroUsize;
timer_id == file_read_id // no error because they're the same type
I used NonZeroUsize here because Rust can perform some extra size optimizations on that type. Because
it knows that the value can never be zero, it can use it to reduce the size of Option types:
println!("{}", std::mem::size_of::<Option<usize>>()); // 16 (bytes)
println!("{}", std::mem::size_of::<Option<NonZeroUsize>>()); // 8 (bytes)
Rust will represent the None variant of Option<NonZeroUsize> as 0, whereas with a regular usize it would
need to store a separate discriminant to indicate whether the value is present (which takes up a full 8 bytes
on a 64-bit target, because of usize’s alignment).
Because of this, I use NonZeroUsize (and the other NonZero types) out of habit whenever I don’t actually need
the zero value.
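A small sketch to check the size claim on whatever machine you are on, rather than relying on the 64-bit numbers above. It only uses std::mem::size_of and std::num::NonZeroUsize; the FileReadId here repeats the newtype from earlier, and the ID value 1 is arbitrary.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct FileReadId(NonZeroUsize);

fn main() {
    // The standard library documents that Option<NonZeroUsize> is the same size as usize.
    assert_eq!(size_of::<Option<NonZeroUsize>>(), size_of::<usize>());
    // Option<usize> has no spare bit pattern for None, so it needs extra space.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());

    // Print the newtype's sizes rather than asserting them in this sketch.
    println!("FileReadId: {} bytes", size_of::<FileReadId>());
    println!("Option<FileReadId>: {} bytes", size_of::<Option<FileReadId>>());

    // The cost of the optimization: zero is rejected at construction time.
    assert!(NonZeroUsize::new(0).is_none());
    let id = FileReadId(NonZeroUsize::new(1).expect("1 is non-zero"));
    println!("{:?} wraps {}", id, id.0.get());
}
```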
Identify yourself
However, I realized that this approach isn’t ideal in the long term. When we actually have to read an arbitrary number of files while downloading podcast episodes, we would have to scan through every ID to find out what a completion message means. My first thought was that I could figure out who the message belongs to by mapping different types of responses to different value ranges: RSS feeds get values 100–99,999, audio downloads get 100,000–200,000, etc. I could then subtract off the starting value to get the index of some associated array, which would be constant time.
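For illustration, the range idea could be sketched roughly like this. The constant and function names are hypothetical, and the bounds are the example ranges from the paragraph above; none of this code ended up in Séance.

```rust
use std::ops::RangeInclusive;

// Hypothetical ranges, taken from the example numbers in the text.
const RSS_FEED_IDS: RangeInclusive<usize> = 100..=99_999;
const AUDIO_DOWNLOAD_IDS: RangeInclusive<usize> = 100_000..=200_000;

// Map a raw ID to an index into a hypothetical array of RSS feed reads.
fn rss_feed_index(raw_id: usize) -> Option<usize> {
    if RSS_FEED_IDS.contains(&raw_id) {
        Some(raw_id - RSS_FEED_IDS.start())
    } else {
        None
    }
}

// Map a raw ID to an index into a hypothetical array of audio downloads.
fn audio_download_index(raw_id: usize) -> Option<usize> {
    if AUDIO_DOWNLOAD_IDS.contains(&raw_id) {
        Some(raw_id - AUDIO_DOWNLOAD_IDS.start())
    } else {
        None
    }
}

fn main() {
    // Decoding is constant time, but every caller has to know the range table.
    println!("{:?}", rss_feed_index(100));
    println!("{:?}", audio_download_index(100_005));
    println!("{:?}", rss_feed_index(100_000));
}
```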
Thinking a bit longer, though, I realized this is just a worse version of enums. So now I have reworked the FileReadId
type to reflect the higher-level meaning of each read:
// Before
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct FileReadId(NonZeroUsize);
// ...
match message {
    Message::FileReadComplete(read_id, read_result) if read_id == config_read_id => {
        // ...
    }
}
// After
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileReadId {
Config,
RssFeed(u16),
// ...
}
// ...
match message {
    Message::FileReadComplete(FileReadId::Config, read_result) => {
        // ...
    }
}
Not only does this new enum take up no more space than the previous version (technically less, because I started using u16!), it also reads better: the identity of the read moves into the match pattern itself, rather than living in the guard clause that follows it.
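Here is a rough sketch of how the enum also covers the constant-time lookup that the value ranges were meant to provide. The State struct, the handle function, and the use of String as the read result are all assumptions made for illustration; only FileReadId mirrors the definition above.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileReadId {
    Config,
    RssFeed(u16),
}

// Hypothetical message type; String stands in for the real read result.
enum Message {
    FileReadComplete(FileReadId, String),
}

// Hypothetical application state: one slot per RSS feed being read.
struct State {
    config: Option<String>,
    rss_feeds: Vec<Option<String>>,
}

fn handle(state: &mut State, message: Message) {
    match message {
        Message::FileReadComplete(FileReadId::Config, read_result) => {
            state.config = Some(read_result);
        }
        Message::FileReadComplete(FileReadId::RssFeed(index), read_result) => {
            // The payload is already an index, so there is no scan and no subtraction.
            if let Some(slot) = state.rss_feeds.get_mut(usize::from(index)) {
                *slot = Some(read_result);
            }
        }
    }
}

fn main() {
    let mut state = State { config: None, rss_feeds: vec![None; 3] };
    handle(&mut state, Message::FileReadComplete(FileReadId::Config, "config".to_string()));
    handle(&mut state, Message::FileReadComplete(FileReadId::RssFeed(2), "feed 2".to_string()));
    println!("{:?} {:?}", state.config, state.rss_feeds);

    // Compare the two representations on the current target.
    println!("enum: {} bytes", size_of::<FileReadId>());
    println!("newtype: {} bytes", size_of::<NonZeroUsize>());
}
```

Because the match has no wildcard arm, adding a new FileReadId variant makes the compiler point at every handler that needs updating.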