I want to make an argument for moving the PLR internals toward Structured Concurrency. In short, it is a relatively new programming paradigm (8 years old, so not that new anymore) that makes reasoning about concurrent tasks (such as waiting for instrument command responses) much easier, thereby greatly reducing bugs. The linked document by Nathaniel Smith is probably the origin of the term; he is also the author of the Python trio library (a mostly incompatible alternative to asyncio). He argues that unstructured concurrency primitives should be banned outright (much like goto is banned nowadays), which is why trio intentionally doesn't support them. However, for some projects, abandoning asyncio is not an option for various reasons. Luckily, the anyio library (inspired by, and modelled after, trio) re-implements structured concurrency on top of either event loop (asyncio or trio), so that would be the path to take.
What problem does it solve concretely? Timeouts, cancellation, OS resource management, and error handling. Currently, PLR tends to hang in many cases when an unexpected response comes from a machine. A failed setup call sometimes cannot be recovered by a stop call and may require a restart of the process (i.e. the interpreter or kernel). On Linux, with the STAR backend, I quickly run into "OSError: Too many open files" because of the many asyncio event loops that are being started and stopped. Sure, all of these issues can be solved one by one, but I argue that most of them would have been avoided in the first place if structured concurrency had been used.
So what will it entail? Most of the adaptation would happen within the hardware communication loops. However, structured concurrency inherently means that each such loop must live within some async context manager (which structures the lifetime of the concurrent tasks). Thus, the preferred API for enabling communication with hardware would move away from setup/stop toward context managers. I anticipate push-back on this, primarily because of interactive (notebook) use. There is a solution: setup/stop availability could be restored by a (hidden) global object-lifecycle context manager. Does that sound scary? It should; global state is a scary thing. But it is required because setup/stop already relies implicitly on global state, in the form of all those detached futures that structured concurrency would bring under management. By making this explicit in a global object-lifecycle manager, we at least gain control over it.
What does the community think? Should I attempt a PoC?