Day 16: Expecting the unexpected
Work continues at the rate that my schedule allows. Today I began on what might be the trickier half of the World architecture I previously described: the external driver. Also, in writing the execution state, I found myself having to account for “impossible” conditions, which I’ll need some strategy to handle.
Driving forward
My progress on the I/O driver is as follows:
let rt = tokio::runtime::Builder::new_multi_thread()
    .enable_io()
    .build()
    .into_diagnostic()?;
rt.block_on(async {
    execution_state.init(&mut world);
    let (mut message_tx, mut message_rx) = tokio::sync::mpsc::channel::<Message>(32);
    loop {
        for action in &world.actions {
            match action {
                Action::ExitFailure(message) => {
                    eprintln!("{message}");
                    std::process::exit(1);
                },
                Action::FileRead(file_read_id, app_path) => {
                    let path_buf = app_path_resolver.resolve(app_path);
                    tokio::task::spawn_blocking(|| {
                        todo!()
                    });
                },
            }
        }
        world.clear();
    }
});
My plan is to have a single Tokio channel containing messages, which I will read from in each iteration of the control loop. Each action that needs to access the outside world will spawn a Tokio task that writes its result to that channel. I think this will allow me to keep the integration surface between the I/O driver and its backing reactor (e.g. Tokio) as small as possible.
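As a rough sketch of that plan, here is the same shape using only the standard library: std threads and `std::sync::mpsc` stand in for Tokio tasks and the Tokio channel, so this is not the real driver. The `Action` and `Message` definitions and the `drive` function are simplified assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the post's types; the real definitions are not shown.
#[derive(Debug)]
enum Action {
    FileRead(u32, String),
}

#[derive(Debug, PartialEq)]
enum Message {
    FileReadComplete(u32, Result<String, String>),
}

/// Starts one worker per action, then drains the single shared channel.
fn drive(actions: Vec<Action>) -> Vec<Message> {
    let (message_tx, message_rx) = mpsc::channel::<Message>();

    for action in actions {
        match action {
            Action::FileRead(read_id, path) => {
                // Each worker owns a clone of the sender and reports back exactly once.
                let tx = message_tx.clone();
                thread::spawn(move || {
                    let result = std::fs::read_to_string(&path).map_err(|e| e.to_string());
                    // Sending only fails if the receiver is gone, so the error is ignored.
                    let _ = tx.send(Message::FileReadComplete(read_id, result));
                });
            }
        }
    }

    // Drop the original sender so the iterator ends once every worker has finished.
    drop(message_tx);
    message_rx.into_iter().collect()
}
```

The sketch collects every message before returning, which keeps it short. A real control loop would instead receive one message per iteration and hand it to `receive`, collecting any new actions along the way. Either way, the channel is the only place where results from the outside world enter the loop.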
What to do when doing the impossible
Séance’s World architecture is centered around a receive method which takes in Messages and performs Actions. This method is the full extent of the core logic’s interface to the outside world (through the appropriately named World argument), and so must also include all I/O. Here’s what that looks like right now:
fn init(&mut self, mut world: impl World) {
    let config_read_id = self.file_read_ids.next();
    world.act(Action::FileRead(config_read_id, AppPath::Config));
    self.config_state = ConfigState::Pending(config_read_id);
}
fn receive(&mut self, message: Message, mut world: impl World) {
    if let ConfigState::Pending(config_id) = self.config_state {
        match message {
            Message::FileReadComplete(read_id, read_result) if read_id == config_id => { /* ... */ },
            other => eprintln!("warning: unexpected message {other:?}")
        }
    }
}
The first thing we do when initializing is to request the config. While we are waiting for that message to return, we could theoretically receive any other message, in that the type system allows it. It isn’t clear what we should do in that case. Today, any other message would have to arise from a bug: either we unloaded the config file after sending out another action, or the driver accidentally sent a duplicate message.
In a distributed system, the correct behaviour here is often pretty clear: we ignore the unexpected messages, perhaps generating a warning. As several Jepsen analyses note, it is entirely possible in a distributed system to get a response to a message before you send it. Maybe you sent a message and then crashed and restarted. Because crashing Séance also naturally closes any connection to external systems, though, this doesn’t seem relevant here.
Another popular strategy here is to hard-crash the application, the logic being that we have clearly entered into a state that we didn’t expect, and so are at risk of breaking things, for example by corrupting data. This is the behaviour enforced by assertions, which are generally considered a good practice for high-reliability applications. For example, in Tiger Style, the style guide used by TigerBeetle, they say:
Use assertions: Use assertions to verify that conditions hold true at specific points in the code. Assertions work as internal checks, increase robustness, and simplify debugging.
A problem with this approach is that I want to use some sort of generative testing, such as with quickcheck, so that I can thoroughly test the application’s behaviour in different circumstances. Generated message sequences will inevitably include “impossible” ones, and a hard crash on each of them would make those tests fail. A significant benefit of the pure World architecture is that it allows me to send in arbitrary messages without worrying about breaking things, since nothing happens if we ignore any outgoing actions.
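To make that concrete, here is a minimal, dependency-free sketch of the idea. A small hand-rolled random generator stands in for quickcheck, and every type here (`RecordingWorld`, `Lcg`, the cut-down `ExecutionState`, the `ignored` counter) is an assumption made for illustration rather than Séance’s real code.

```rust
// All names below are illustrative stand-ins, not Séance's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    FileRead(u32),
}

#[derive(Debug)]
enum Message {
    FileReadComplete(u32, Result<String, String>),
}

trait World {
    fn act(&mut self, action: Action);
}

/// A World that only records actions, so nothing touches the real outside world.
#[derive(Default)]
struct RecordingWorld {
    actions: Vec<Action>,
}

impl World for RecordingWorld {
    fn act(&mut self, action: Action) {
        self.actions.push(action);
    }
}

#[derive(Debug, PartialEq)]
enum ConfigState {
    Pending(u32),
    Loaded,
}

struct ExecutionState {
    config_state: ConfigState,
    ignored: usize,
}

impl ExecutionState {
    fn init(world: &mut impl World) -> Self {
        world.act(Action::FileRead(0));
        ExecutionState { config_state: ConfigState::Pending(0), ignored: 0 }
    }

    fn receive(&mut self, message: Message, _world: &mut impl World) {
        match message {
            Message::FileReadComplete(read_id, _result)
                if self.config_state == ConfigState::Pending(read_id) =>
            {
                self.config_state = ConfigState::Loaded;
            }
            // Unexpected messages are counted and ignored rather than asserted on.
            _other => self.ignored += 1,
        }
    }
}

/// Tiny linear congruential generator; a stand-in for quickcheck's generators.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

fn arbitrary_message(rng: &mut Lcg) -> Message {
    let read_id = (rng.next_u64() % 4) as u32;
    if rng.next_u64() % 2 == 0 {
        Message::FileReadComplete(read_id, Ok(String::from("contents")))
    } else {
        Message::FileReadComplete(read_id, Err(String::from("read failed")))
    }
}

/// Feeds `count` generated messages into a fresh state and returns what happened.
fn run_random_messages(seed: u64, count: usize) -> (ExecutionState, RecordingWorld) {
    let mut rng = Lcg(seed);
    let mut world = RecordingWorld::default();
    let mut state = ExecutionState::init(&mut world);
    for _ in 0..count {
        state.receive(arbitrary_message(&mut rng), &mut world);
    }
    (state, world)
}
```

A property test can then call `run_random_messages` over many seeds and check invariants, for example that every message was either handled or counted as ignored, and that the only recorded action is the initial config read. Had `receive` asserted on unexpected messages, most seeds would panic long before any invariant could be checked.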
For now, I think I’ll continue writing out error messages, but eventually I want to modify the system so that these messages are displayed somewhere that a person using the software can see, to aid debugging.
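One way this could eventually look, sketched under assumptions: the warning becomes an ordinary action, so the driver decides where to display it and tests can inspect it. `Action::Warn` is a hypothetical variant that does not exist in Séance today, and the free-standing `receive` function and `RecordingWorld` are simplified stand-ins.

```rust
// Illustrative stand-ins; `Action::Warn` is a hypothetical variant.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    FileRead(u32),
    Warn(String),
}

#[derive(Debug)]
enum Message {
    FileReadComplete(u32, Result<String, String>),
}

trait World {
    fn act(&mut self, action: Action);
}

/// Records actions instead of performing them.
#[derive(Default)]
struct RecordingWorld {
    actions: Vec<Action>,
}

impl World for RecordingWorld {
    fn act(&mut self, action: Action) {
        self.actions.push(action);
    }
}

#[derive(Debug, PartialEq)]
enum ConfigState {
    Pending(u32),
    Loaded,
}

fn receive(config_state: &mut ConfigState, message: Message, world: &mut impl World) {
    match message {
        Message::FileReadComplete(read_id, _result)
            if *config_state == ConfigState::Pending(read_id) =>
        {
            *config_state = ConfigState::Loaded;
        }
        // Instead of printing directly, hand the warning to the World like any other action.
        other => world.act(Action::Warn(format!("unexpected message {other:?}"))),
    }
}
```

With this shape the core logic stays free of direct output, and the driver can route `Warn` to stderr, a log file, or a status area in the interface.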
A note on naming
I’m not fully satisfied with the names I currently use in the World architecture implementation. World I quite like, but the central place I hang application logic is called “ExecutionState”. Straightforward, sure, but also mechanical and indistinct. Two better names do come to mind, but they each have their downsides.
The first is “Actor”, since it is the thing doing the acting, and the model generally has a resemblance to the actor model. It isn’t the actor model, though, which complicates things. The biggest difference is that there is only one actor in the system, whereas the purpose of the actor model is to model processes as the interactions of many distinct actors. The external driver has some internal resemblance to the actor model since it is based on a series of Tokio tasks, though that feels incidental.
The second is “Agent”, which fits almost as well and has an appropriate feeling of autonomy. Until recently, it also didn’t have a single common use in software systems (that I knew of). Unfortunately, it is currently strongly associated with “AI Agents”, AKA LLMs hooked up to a control loop with tool access. Interestingly, that loop itself is in some ways similar to this architecture, with the LLM “Agent” taking the same place as the “ExecutionState”. I definitely do not want to associate my work with the current AI scene, though, which is sloppy and exploitative and uncritical in the exact opposite way to my intent. It does fit very well though, so I may choose that name anyway.