Day 17: Are you afraid of the disk?
Today was another shorter session, but in many ways a more satisfying one. I was able to complete enough of the I/O driver to successfully load the config from the file system. This brought to mind several tricky aspects of file systems that I think are underappreciated, which I will detail here. But first, the actual code:
rt.block_on(async {
    let (message_tx, mut message_rx) = tokio::sync::mpsc::channel::<Message>(32);
    'main: loop {
        for action in world.actions.drain(..) {
            match action {
                Action::ExitSuccess => break 'main,
                Action::ExitFailure(message) => {
                    eprintln!("{message}");
                    std::process::exit(1);
                },
                Action::FileRead(file_read_id, app_path) => {
                    let path_buf = app_path_resolver.resolve(&app_path);
                    let message_tx = message_tx.clone();
                    tokio::task::spawn(async move {
                        let result = tokio::fs::read(path_buf).await.map_err(|err| err.kind());
                        message_tx.send(Message::FileReadComplete(file_read_id, result)).await
                            .expect("message channel closed");
                    });
                },
            }
        }
        match message_rx.recv().await {
            Some(message) => execution_state.receive(message, &mut world),
            None => todo!("read no message?")
        };
    }
});
As I had hoped, writing to a single shared message channel seems to work. Because Tokio tasks are poll()’d to completion automatically, I don’t need to maintain any additional state to ensure they actually make progress.
I’ve added an ExitSuccess action as well, now that successfully exiting is actually possible (currently
all the synchronization process does is read the config, print it, then exit).
I also get to use a labeled break in Rust, which I’ve only done a handful of times before. Normally I would prefer a while loop of the form while running, and assign running = false, but as currently written that would hang: after the action loop, each iteration of the main loop waits for at least one message from the channel. If exiting were the only action, no task would be left to send one, and the loop would block forever. I think the labeled break is harmless in this case, so long as you don’t ask Dijkstra.
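To make the control flow concrete, here is a minimal sketch of the same drain-then-receive shape using only the standard library (threads and std::sync::mpsc rather than Tokio). The Action::Countdown variant, the Message struct, and the run function are hypothetical stand-ins invented for this illustration, not the types from the real driver.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the post's Action and Message types.
enum Action {
    ExitSuccess,
    Countdown(u32),
}

struct Message(u32);

/// Runs a drain-then-receive loop and returns how many messages were handled.
fn run(mut actions: Vec<Action>) -> usize {
    let (tx, rx) = mpsc::channel::<Message>();
    let mut handled = 0;

    'main: loop {
        for action in actions.drain(..) {
            match action {
                // Leaves the outer loop directly, so the blocking recv()
                // below is never reached once an exit is requested.
                Action::ExitSuccess => break 'main,
                Action::Countdown(n) => {
                    let tx = tx.clone();
                    thread::spawn(move || {
                        tx.send(Message(n)).expect("message channel closed");
                    });
                }
            }
        }

        // Blocks until some worker reports back. With no pending work this
        // would wait forever, because `tx` is still alive in this function.
        let Message(n) = rx.recv().expect("all senders dropped");
        handled += 1;

        // Handling a message queues the next action, ending with an exit.
        actions.push(if n == 0 {
            Action::ExitSuccess
        } else {
            Action::Countdown(n - 1)
        });
    }

    handled
}
```

Calling run(vec![Action::Countdown(2)]) handles three messages before the exit action breaks out of 'main. A running flag checked only at the top of the loop would still fall through to recv() after the exit action, which is the hang described above.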
File systems are basically network services
Given the prevalence of SSDs with ~200 μs read-times, it can be easy to think of storage access as just sort of slow. Sure, it’s not as good as being in memory, but you could do a few of them per frame in a video game and still hit 60 fps. This is not true in general though. For starters, hard disks (you know, the actual spinning rust) can have much slower times. According to the (surprisingly specific) Wikipedia page on hard disk drive performance characteristics, seek times for modern hard drives are in the 1-15 ms range, or up to roughly 75 times slower. That’s not great, but that’s still assuming the rust is already spinning. If the hard drive has spun down to save power, say in a laptop, it could take seconds before you get a response. This would be considered slow even for a network request.
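The frame-budget claim is easy to check with a little arithmetic. The sketch below uses a hypothetical helper, reads_per_frame, and assumes reads happen strictly one after another with nothing else competing for the frame.

```rust
use std::time::Duration;

/// How many back-to-back reads of the given latency fit in one frame.
/// Assumes `read_latency` is non-zero and `fps` is non-zero.
fn reads_per_frame(read_latency: Duration, fps: u32) -> u32 {
    // At 60 fps the budget is one second divided by 60, about 16.7 ms.
    let frame_budget = Duration::from_secs(1) / fps;
    (frame_budget.as_nanos() / read_latency.as_nanos()) as u32
}
```

At 60 fps this gives 83 reads for a 200 μs SSD, 16 for a 1 ms seek, 1 for a 15 ms seek, and 0 for a drive that needs two seconds to spin up.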
More directly than hard drives though, sometimes file systems really are networks, because they are using network-attached storage. In that case, anything goes: your file system read could take an arbitrarily long time, or may never complete. It may also fail because the network went down, leaving you with an otherwise uncommon (though fun to say) EIO error.
For this reason, I plan to treat everything to do with the file system the same way I do network operations, with timeouts, retries, and an appropriate degree of caution.
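As a rough idea of what that could look like, here is a standard-library sketch of a file read with a timeout and a bounded number of retries. Everything in it is an assumption made for illustration: the function names, the 50 ms linear backoff, and the choice of which error kinds count as transient. In the Tokio-based driver above, wrapping the read in tokio::time::timeout would likely fill the same role as the helper thread here.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Reads a file on a helper thread and stops waiting after `timeout`.
/// The helper thread is not cancelled; a late result is simply discarded.
fn read_with_timeout(path: PathBuf, timeout: Duration) -> io::Result<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout, so ignore send errors.
        let _ = tx.send(std::fs::read(path));
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        Err(RecvTimeoutError::Timeout) => {
            Err(io::Error::new(io::ErrorKind::TimedOut, "file read timed out"))
        }
        Err(RecvTimeoutError::Disconnected) => {
            Err(io::Error::new(io::ErrorKind::Other, "reader thread exited without a result"))
        }
    }
}

/// Assumed policy: only timeouts and interrupted calls are worth retrying.
fn is_transient(err: &io::Error) -> bool {
    matches!(err.kind(), io::ErrorKind::TimedOut | io::ErrorKind::Interrupted)
}

/// Tries the read up to `max_attempts` times, backing off between attempts.
fn read_with_retries(path: &Path, timeout: Duration, max_attempts: u32) -> io::Result<Vec<u8>> {
    let mut attempt = 1;
    loop {
        match read_with_timeout(path.to_path_buf(), timeout) {
            Ok(bytes) => return Ok(bytes),
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                // Linear backoff: 50 ms, 100 ms, 150 ms, ...
                thread::sleep(Duration::from_millis(50 * u64::from(attempt)));
                attempt += 1;
            }
            // Permanent errors such as NotFound are returned immediately.
            Err(err) => return Err(err),
        }
    }
}
```

One caveat with this approach is that a read stuck on a dead network mount keeps its helper thread blocked indefinitely, so the timeout bounds how long the caller waits, not how long the operating system spends on the request.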