This literally came to me in a dream, so I wanted to write it down here before it escaped my brain (yes, I sometimes dream in code).
We have our classic plr demo (adapted to use an API backend, because that is what enables this kind of thing):
from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.liquid_handling.backends import STARBackendAPI
from pylabrobot.resources import Deck
auth_key = "AUTH_KEY"
deck = Deck.from_api("machine1", base_url="pylabrobot.org/api/v1/state", auth=auth_key)
backend = STARBackendAPI("machine1", base_url="pylabrobot.org/api/v1/star", auth=auth_key)
lh = LiquidHandler(backend=backend, deck=deck)
await lh.setup()
await lh.pick_up_tips(lh.deck.get_resource("tip_rack")["A1"])
await lh.aspirate(lh.deck.get_resource("plate")["A1"], vols=100)
await lh.dispense(lh.deck.get_resource("plate")["A2"], vols=100)
await lh.return_tips()
The problem is that if the program fails at any of the awaits, or the network disconnects, or there is a power outage, everything goes down. You can't just rerun the protocol; you have to figure out exactly what went wrong and modify the code accordingly. This is obviously bad, especially with very expensive protocols.
The idea of durable execution is that the protocol can be re-run any number of times, and stopped at any point, and the software will continue execution as expected. While other systems need a lot of infrastructure to do this, it can be done quite simply with the networked backends. Here is how it works:
from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.liquid_handling.backends import STARBackendAPI
from pylabrobot.resources import Deck
from pylabrobot.durable import DurableExecution
auth_key = "AUTH_KEY"
durability_key = DurableExecution.keygen("exec.key") # file that gets written to filesystem
deck = Deck.from_api("machine1", base_url="pylabrobot.org/api/v1/state", auth=auth_key, durability_key=durability_key)
backend = STARBackendAPI("machine1", base_url="pylabrobot.org/api/v1/star", auth=auth_key, durability_key=durability_key)
lh = LiquidHandler(backend=backend, deck=deck)
await lh.setup()
await lh.pick_up_tips(lh.deck.get_resource("tip_rack")["A1"])
await lh.aspirate(lh.deck.get_resource("plate")["A1"], vols=100)
await lh.dispense(lh.deck.get_resource("plate")["A2"], vols=100)
await lh.return_tips()
What happens is that the program generates a durability key, which is just a random seed that it then writes to the filesystem (or wherever). Whenever it sends a resource or liquid handling request to the API, it generates the next number from that seeded RNG (effectively a request id) and sends it along. The server checks whether it has already responded to that id. If it has, it simply returns the exact data it sent the first time, instead of running the command on the robot again.
POST pylabrobot.org/api/v1/star {"id": "2ec1a198-9300-458f-8616-d442ce95d27f", "cmd": "aspirate", "vol": 10}
# server checks if it has already generated 2ec1a198-9300-458f-8616-d442ce95d27f
# if it has, return that JSON. If it has not, go actually run it on the robot.
RETURN {"status": "complete"}
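On the client side, the durability key just seeds a deterministic RNG, so every re-run produces the exact same sequence of request ids. A minimal sketch of the idea (the names here are made up for illustration, not the actual pylabrobot API):
import os
import random
import uuid

class DurabilityKey:
    def __init__(self, path: str):
        # reuse the seed if the file already exists, otherwise generate one and persist it
        if os.path.exists(path):
            with open(path) as f:
                self.seed = int(f.read())
        else:
            self.seed = random.SystemRandom().getrandbits(64)
            with open(path, "w") as f:
                f.write(str(self.seed))
        self._rng = random.Random(self.seed)

    def next_id(self) -> str:
        # deterministic: the N-th request of every run gets the same id
        return str(uuid.UUID(int=self._rng.getrandbits(128), version=4))

key = DurabilityKey("exec.key")
print(key.next_id())  # same value on every run, as long as exec.key is kept around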
This is essentially just a key-value cache (id to JSON string), so it is fairly easy to implement if you own the backend, but extremely difficult if you don't (Temporal is an example of someone trying to solve this in a general way).
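A rough sketch of what the server side could do (also hypothetical, not real pylabrobot server code; in practice the cache would be persisted to a database rather than held in a dict):
import json

responses: dict[str, str] = {}  # request id -> JSON string already returned for it

async def run_on_robot(request: dict) -> dict:
    # placeholder for the code that actually drives the robot
    return {"status": "complete"}

async def handle_command(request: dict) -> str:
    req_id = request["id"]
    if req_id in responses:
        # already executed: replay the stored response, don't touch the robot
        return responses[req_id]
    result = await run_on_robot(request)
    responses[req_id] = json.dumps(result)
    return responses[req_id]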
Coincidentally, this also creates a traceable log of everything that has happened on the robot.
It also depends on you NOT making any decisions in the protocol based on a random number generator. Any branch taken on randomness fucks up the replay. For robot protocols that shouldn't be much of a problem.
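For example, something like this (a hypothetical snippet) would break replay, because a re-run can take the other branch, so the same request id ends up attached to a different command and the cached responses no longer match what the protocol is actually doing:
import random

if random.random() < 0.5:  # BAD: fresh randomness, different on every run
    await lh.aspirate(lh.deck.get_resource("plate")["A1"], vols=100)
else:
    await lh.aspirate(lh.deck.get_resource("plate")["B1"], vols=100)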
My original implementation of this was in Lua, because there you could actually make execution guarantees, embed the scripting ability into a larger system, and make pausing a first-class thing that always happens (Lua is just a better language than Python for this kind of thing), but eh, if it is implemented at the API level it doesn't matter whether it is Lua or Python.
function main(ctx)
  local result, cont = do_something(ctx)
  if cont then return cont end -- this handles errors and continuations
  return result
end
In this, you have explicit flow control, and protocols immediately halt and return execution every time (unlike the Python version, which just sits waiting at each await). But I don't think you can really get that with Python because of how long spawning a Python process takes.
