Introducing PyLabPraxis Pre-alpha

Hello everyone,

I have been seeing great discussions on the forum about building towards a shared protocol library, standardization, and integrating data handling into PyLabRobot. These are exactly the challenges I have dealt with in trying to scale automation from individual scripts to a robust, multi-user system, where people have different levels of experience and good data safety has to be guaranteed.

I’ve been actively developing a platform called PyLabPraxis on top of PyLabRobot to build a solution for this. It’s currently pre-alpha, but I’d like to share how its architecture and design patterns directly address the points raised in the linked thread.

Although I have been developing it for a while, it is still definitely at quite an early stage given the scope of the project. I need to test the backend more thoroughly, but it is in place to a degree that I felt comfortable publishing this.

Justification for a Managed Architecture

A simple folder of scripts was not enough to provide the features my use cases require: high-quality documentation, flexibility in experiment runs, and solid data tracking. In my view, if these features were placed into the PyLabRobot repository itself, it could quickly become overwhelming and unfocused.

I think PyLabRobot should primarily focus on being the open-source interface for runtime lab automation. The separation of concerns I outline below allows PyLabRobot to remain focused and incredibly useful, particularly for agile, notebook-based protocol optimization.

Someone previously raised the question “Is there space for this within the existing PLR architecture?”—my conclusion was that a truly robust library needs a managed environment built around PyLabRobot. In PyLabPraxis, this environment consists of:

  • A FastAPI backend and PostgreSQL database: This provides the foundation for a persistent, version-controlled library of protocols and a complete audit history of every run. Docker-based configuration makes it easy to set up on a server.
  • An Orchestrator service: This core component is responsible for managing the entire lifecycle of a protocol run—from fetching its definition and acquiring hardware assets to executing the code and logging the results. It interacts with all the other core components either directly or via the database and in-memory Redis storage.
  • A Flutter-based frontend: This enables users of different skill levels to submit protocols with arbitrary parameters and labware. The protocol library, parameter specification, scheduling, and resource/machine allocation and reservation are all specified by the user in a convenient, easy-to-use GUI and then parsed by the backend. The frontend also has scaffolding for inventory management. I have gone through a few iterations of it and have yet to bring it in line with the current backend, so it is not working at the moment. I want to build towards integrating the visualizer so that users can be guided through deck set-up and data visualization, but those are definitely downstream tasks. It cannot be overstated how much this kind of ease of use and built-in guidance helps less familiar users feel comfortable using these often expensive machines.

This architecture is what makes a true “protocol library” possible, moving beyond simple script execution.

A Layered Standard for Protocols

Regarding standards and what a protocol should include, PyLabPraxis uses a layered approach. This ensures that simple protocols are easy to write, while complex workflows are fully supported. Crucially, most advanced features are optional.

The Minimum Requirement:

The baseline for a protocol in PyLabPraxis is just a Python function that uses type hints for PyLabRobot objects (LiquidHandler, Plate, etc.) and standard Python types. This low barrier to entry ensures that existing scripts can be easily integrated.

A discovery service scans the code and populates the database. The ProtocolCodeManager can fetch protocol code from various sources, including Git repositories, pinning versions (down to specific commit hashes) for tracking and reproducibility. The system then uses this metadata to automatically generate a configuration menu with validated fields for each parameter. It also allows users to specify which keyword argument is the state argument. The Orchestrator uses the asset_requirements to acquire and lock the necessary hardware via the AssetManager and AssetLockManager. If a parameter named state: Dict appears in the function signature, Praxis will also infer that it can load a protocol-run-specific PraxisState object in its place, which covers the dictionary built-ins so existing code runs smoothly. A sketch of this kind of introspection follows the example below.


# A simple, valid protocol with no special decorators.
# (PyLabRobot operations are coroutines, so the protocol itself is async.)
from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.resources import Plate, TipRack


async def simple_transfer_cycle(
  pipettor: LiquidHandler,
  tip_rack: TipRack,
  source_plate: Plate,
  dest_plate: Plate,
  volume: float,
):
  # --- Standard PyLabRobot logic ---
  await pipettor.pick_up_tips(tip_rack["A1"])
  await pipettor.aspirate(source_plate["A1"], vols=[volume])
  await pipettor.dispense(dest_plate["A1"], vols=[volume])
  await pipettor.drop_tips(tip_rack["A1"])
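
To make the discovery step concrete, here is a rough sketch of the kind of introspection it can perform on a function like the one above. This is illustrative only, not the actual Praxis discovery service.

import inspect
from typing import get_type_hints


def infer_parameter_schema(func):
  """Illustrative only: derive a configuration schema from type hints."""
  hints = get_type_hints(func)
  sig = inspect.signature(func)
  schema = {}
  for name, param in sig.parameters.items():
    annotation = hints.get(name)
    schema[name] = {
      "type": getattr(annotation, "__name__", str(annotation)),
      "required": param.default is inspect.Parameter.empty,
    }
  return schema


# PyLabRobot types (LiquidHandler, TipRack, Plate) become asset requirements
# to acquire and lock; plain Python types like float become validated form
# fields in the auto-generated configuration menu.
schema = infer_parameter_schema(simple_transfer_cycle)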

Optional Enhancement 1: Metadata for Improved Discoverability and Configuration

To enable a user-friendly library and pre-run configuration, we use an optional decorator, @protocol_function.

  • What it does: This decorator attaches metadata to the function, such as a human-readable name, description, and version, and, most importantly, additional constraints on parameters or specific asset requirements. It also allows the author to specify the keyword for the state argument, so that a protocol-run-specific view into PraxisState is loaded at runtime.
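
For illustration, here is a minimal sketch of what attaching that metadata might look like. The keyword names param_constraints and state_keyword are my shorthand assumptions; the asset_requirements form appears in the full example further below.

from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.resources import Plate

from praxis.models import protocol_function


@protocol_function(
  name="Serial Dilution",
  description="Performs a serial dilution across a plate.",
  version="0.1.0",
  # Assumed keyword: extra validation on top of the type hints.
  param_constraints={"dilution_factor": {"min": 1.0, "max": 100.0}},
  # Assumed keyword: which kwarg receives the run-specific PraxisState view.
  state_keyword="state",
)
async def serial_dilution(
  pipettor: LiquidHandler,
  plate: Plate,
  dilution_factor: float,
  state: dict,
):
  ...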

Optional Enhancement 2: Integrated, Structured Data Handling

Consider a turbidostat that needs database and graphing capabilities: a protocol script should not have to manage its own database connections.

  • How PyLabPraxis solves this: I have designed Praxis to log any returned JSON-serializable data, with function- and protocol-specific accessions that allow arbitrary reconstruction of the data later. The Orchestrator captures the return value and uses a dedicated function_output_data service and related data structures to persist it to the database. The value can be the direct output of a PyLabRobot method, or a simple dictionary or list of measurements returned by a function called within the top-level protocol function. For more complex data like plate readings, the system has scaffolding to deconvolve measurements by linking them to the accessions of the resource instances used, which connects them to each resource's metadata over time. For instance, it can automatically associate a list of 96 absorbance values with the UUID of the specific Plate instance used at runtime, its layout, and the properties_json metadata of its database entry, which could include user-specified features such as a dictionary of well sample metadata. There is also a field for the Plate.serialize() output state. To keep track of history and ensure accuracy at reconstruction time, we could use database triggers to log history or some other approach; I have not implemented this yet and am open to suggestions, but for now users can ensure their outputs include the necessary metadata for their workflows. An illustrative sketch of one such record follows this list.
  • Justification: This decouples the scientific logic from the data infrastructure. The protocol author only needs to focus on generating the data, not on how it is stored, queried, or versioned. Because JSON-serializability is the only restriction, users can output data in whatever shape makes the most sense for their analysis.
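
Illustrative only: a guess at the shape of one persisted function_output_data record. The field names are my assumptions based on the description above, not the actual Praxis schema.

output_record = {
  "accession": "run_42/read_plate/0",        # function- and protocol-specific accession
  "resource_instance_uuid": "0f3c9a10-...",  # the specific Plate instance used at runtime
  "resource_state": {},                      # snapshot from Plate.serialize()
  "properties_json": {                       # user-specified metadata from the DB entry
    "A1": {"sample_id": "S-001"},
  },
  "data": [0.113, 0.254, 0.198],             # the protocol's JSON-serializable return value
}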

Optional Enhancement 3: Complex Workflow State

For multi-step protocols that need to pass information between steps (e.g., tracking the current step so a run can be stopped and resumed arbitrarily), the Orchestrator can provide an optional, run-specific object called PraxisState.

  • How it works: This is a dictionary-like object backed by Redis. A function can write a simple, JSON-serializable value to it (praxis_state.current_step_id = "wash_plates"), and a subsequent function in the same run can read it, as sketched after this list.
  • Justification: This provides a formal mechanism for state management within a distributed workflow (e.g., across different Celery tasks) without cluttering the protocol’s logic with complex parameter passing.
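
To make this concrete, here is a minimal sketch of two steps sharing state; the attribute name current_step_id and the import path are illustrative assumptions.

from pylabrobot.liquid_handling import LiquidHandler

from praxis.state import PraxisState  # assumed import path


async def wash_step(pipettor: LiquidHandler, state: PraxisState):
  ...  # washing logic
  state.current_step_id = "wash_plates"  # any JSON-serializable value


async def resume_run(pipettor: LiquidHandler, state: PraxisState):
  # A later step (possibly a separate Celery task) reads the same
  # Redis-backed state to decide where to pick up.
  if state.current_step_id == "wash_plates":
    ...  # continue from after the wash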

Examples in Practice: From Simple to Advanced

Here’s how these layers come together.

  1. Simple Protocol with Basic Data Output. This example uses no special decorators. It moves a plate into a reader and returns the direct, JSON-serializable output of a PyLabRobot method, which the system captures and logs along with data about the provided arguments, including enough information to reconstruct well metadata in relation to the output.
from pylabrobot.liquid_handling import LiquidHandler
from pylabrobot.plate_reading import PlateReader
from pylabrobot.resources import Plate


async def measure_growth(
  pipettor: LiquidHandler,
  reader: PlateReader,
  plate: Plate,
) -> list[list[float]]:
  # --- Protocol Logic ---
  await pipettor.move_plate(plate, reader)
  od = await reader.read_absorbance(wavelength=600)

  # The direct output from read_absorbance is JSON-serializable
  # and will be automatically logged by the system.
  return od
  2. Advanced Protocol with Pydantic and Decorator. This version uses the decorator for improved specification and a Pydantic model for strongly typed, structured data output.
from pydantic import BaseModel
from pylabrobot.plate_reading import PlateReader
from pylabrobot.resources import Plate

from praxis.models import protocol_function, AssetRequirement


class Readout(BaseModel):
  well_name: str
  absorbance: float
  well_sample_id: str


@protocol_function(
  name="Read Plate Absorbance",
  description="Measures and returns the absorbance for each well of a plate.",
  asset_requirements=[
    AssetRequirement(name="plate_reader", asset_type=PlateReader),
    AssetRequirement(name="assay_plate", asset_type=Plate),
  ],
)
async def read_plate(
  plate_reader: PlateReader,
  assay_plate: Plate,
  wavelength: int = 450,
) -> list[Readout]:
  """
  This protocol returns a list of Pydantic models, which the system
  will automatically log to the database with full validation. It assumes
  the user has added a 'sample_id' attribute to each well object.
  """
  absorbance_data = await plate_reader.read_absorbance(wavelength=wavelength)

  # read_absorbance returns a grid of values; flatten it and pair each value
  # with its well (assuming get_all_items() matches that ordering).
  values = [v for row in absorbance_data for v in row]
  results = [
    Readout(
      well_name=well.name,
      absorbance=value,
      well_sample_id=getattr(well, "sample_id", "UNKNOWN"),
    )
    for well, value in zip(assay_plate.get_all_items(), values)
  ]
  return results

I believe this flexible, layered approach directly addresses the need for a powerful yet accessible protocol library.

API-Driven for Frontend Accessibility

A key goal for PyLabPraxis is to make these powerful automation capabilities accessible to a broader user base via a graphical frontend. The entire architecture is built to support this. The combination of a PostgreSQL database for persistence, Pydantic models for data validation, and a FastAPI backend exposes all of this functionality through a comprehensive RESTful API. This allows a web application (or any other client; see the sketch after this list) to:

  • List and search all available protocols.
  • Fetch protocol details and dynamically render a configuration form for user parameters.
  • Initiate, monitor, and manage protocol runs.
  • Retrieve and visualize structured data outputs from completed runs.
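
As a hedged sketch of what such client usage could look like: the endpoint paths and payload shapes below are my shorthand for the capabilities listed above, not the finalized Praxis routes.

import httpx

BASE = "http://localhost:8000/api"

with httpx.Client(base_url=BASE) as client:
  # List and search available protocols (assumed to return a JSON list).
  protocols = client.get("/protocols", params={"search": "absorbance"}).json()

  # Fetch one protocol's details/parameter schema to render a config form.
  detail = client.get(f"/protocols/{protocols[0]['id']}").json()

  # Initiate a run with user-supplied parameters.
  run = client.post(
    f"/protocols/{protocols[0]['id']}/runs",
    json={"parameters": {"wavelength": 600}},
  ).json()

  # Monitor status and, once complete, retrieve structured outputs.
  status = client.get(f"/runs/{run['id']}").json()
  outputs = client.get(f"/runs/{run['id']}/outputs").json()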

Because the system also allows nested protocol structures, where functions call other functions (unless the decorator's top_level argument is set to True), I think this could also serve as a more granular toolset for constructing protocols. But that is quite far downstream.

The Broader PyLabPraxis Workflow

It’s important to note how this protocol library fits into the larger system that manages the physical lab environment. Here’s a brief overview:

  • The Workcell and WorkcellRuntime components are responsible for managing the live, operational state of all PyLabRobot objects. They are initialized from workcell configurations stored in the database and handle the real-time connections to the hardware.
  • The AssetManager sits on top of this, handling the logical allocation of assets. When the Orchestrator needs a specific liquid handler for a protocol, it’s the AssetManager’s job to request the live object from WorkcellRuntime and ensure it’s available.
  • For long-running protocols, for eventually managing parallel use of a single instrument, or for managing high-demand equipment, a ProtocolScheduler can be used. This component leverages Celery and Redis to place protocol runs into a queue for asynchronous execution. This prevents the API from being blocked and allows for scheduling, reservation of instruments, and robust execution of workflows that might take hours.
  • The Orchestrator brings it all together. It takes a protocol definition, uses the AssetManager to acquire the necessary instruments, prepares the PraxisState and execution context, and then runs the protocol function, passing in the live PyLabRobot objects as arguments. A simplified sketch of this lifecycle follows this list.
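
This illustrative sketch ties the pieces together; the function and method names are simplified stand-ins for the real Orchestrator, AssetManager, and function_output_data interfaces, not the actual Praxis API.

from praxis.state import PraxisState  # assumed import path


async def execute_run(run_id, protocol_def, user_params,
                      code_manager, asset_manager, output_service):
  # 1. Resolve the protocol function (possibly pinned to a git commit).
  func = code_manager.load(protocol_def)

  # 2. Acquire and lock the required hardware via the AssetManager,
  #    receiving live PyLabRobot objects from WorkcellRuntime.
  assets = await asset_manager.acquire(protocol_def.asset_requirements)
  try:
    # 3. Prepare the Redis-backed, run-specific state object.
    state = PraxisState(run_id=run_id)

    # 4. Call the plain protocol function with live objects as arguments.
    output = await func(**assets, **user_params, state=state)

    # 5. Persist any JSON-serializable return value.
    output_service.log(run_id, output)
  finally:
    # Always release instrument locks, even if the run fails.
    await asset_manager.release(assets)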

This separation of concerns—where protocol logic is distinct from live hardware state and asset management—is what allows the system to be robust and scalable.

I wanted to share these design concepts as a concrete example of what I have been working towards for this community. For a more complete overview, please consult the README in the GitHub repository linked at the top of this post. This is still very much at an early stage of development, and any feedback from the community would be highly appreciated!
