Reverse Engineering Tips and Example Case

Reverse Engineering the ClarioStar Plate Reader: A Guide and Case Study

Hey y’all,

​I’ve been working on reverse engineering the ClarioStar plate reader, and I’ve seen posts here asking for guidance on similar undertakings involving figuring out how to send serial commands to automatable instruments. I want to share some general principles I’ve found incredibly useful, along with a salient example from a recent breakthrough I accomplished with @Eric, where we pinpointed the exact function and encoding approach of a single byte in the serial stream.

​This particular reverse engineering case has both advantages and disadvantages. Conveniently, the instrument’s proprietary software logs all serial commands sent and received. These logs are easily accessible and can even be “cleaned up” with their built-in log viewer. This significantly simplifies the process by bypassing the need for serial sniffing and manual parsing. However, the parameter space for controlling the plate reader is incredibly vast, with dozens of parameters encoding physical dimensions, categorical actions, numeric settings, and more.

I started this project last December, though it was on hold for about six months while I moved labs and shifted priorities. After many hours of work, I’ve decoded how plate dimensions, wells to read, reading type, shaker actuation, well scan patterns, and reading modes are encoded, and I learned a lot about reverse engineering serial commands in the process.

​My Reverse Engineering Principles

​Here’s the approach that has worked for me:

​1. Define Your Bare Minimum

​Before you dive in, consider what features are absolutely essential for your needs. Do you need a complete wholesale reverse engineering, or would basic patching of a single command suffice for your specific goals? Can you get away with just a dictionary of pre-defined commands? Relatedly, are you developing this for the community, or do you really need it just for your specific use case?

Answering these questions helps you structure your approach and gives you fallbacks if a complete or more thorough reverse engineering isn’t feasible on the timescale you need.

My objective was to support the multiple different protocols used in my lab. These covered a large enough swath of the parameter space to justify a full reverse engineering effort. I kept the full commands for each protocol saved so we could patch them in if needed, but work using the plate reader is only now getting back into swing.

​2. Isolate Encodings (Iterative Process)

  • 2.1 Isolate Encodings: This is often the most monotonous, frustrating, and labor-intensive step (but also potentially automatable). You need to gather enough measurements across various parameters and meticulously record the details of each command, just to get some initial patterns you can home in on.
  • 2.2 One Thing at a Time: Once you have some leads, focus on figuring out exactly how things are encoded by changing only one variable at a time. This is usually the best starting point. However, with the ClarioStar’s extensive degrees of freedom, this approach would have taken far too long for the entire plate reader. It was also sadly impossible for the plate dimensions and well encodings as multiple values changed at once, but thankfully the techniques described below still led to me figuring out how that information is communicated.
    • Techniques for Decoding:
      • XOR/Difference Analysis: For “one-difference” commands (where only a single parameter changes), using XOR or direct difference analysis between command pairs can quickly highlight the relevant bytes or bits.
      • Direct Numeric Encoding: For numeric parameters, look for the value stored directly as an integer. Conversely, if a parameter has a small, fixed number of options, it’s likely encoded categorically, with roughly as many distinct values as there are options. For categorical parameters, I saw both zero-based integers (0-5 for 6 categories of shaker patterns) and codes that started at an offset, such as 120-124, so it may not be consistent for your machine.
      • Correlation Analysis: Analyze correlations at the bit, bit pair, nibble, byte, and byte pair levels. This is overkill, but if you set up the output right, it should be easy to filter. This is particularly useful for identifying how larger values or multiple discrete options are packed into a command and can quickly give a head start. Be wary of spurious or confounded correlations, especially with the integer value interpretation.
    • Binary Search Approach for Numeric Parameters: For numeric parameters, compare the commands for the minimum and maximum values (e.g., using XOR) and analyze the bits, bit pairs, nibbles (4-bit segments), and bytes. The range of values can guide you: a variable spanning 1 to 1000 will require at least 10 bits (2^10 = 1024 values). Often, a full 2-byte, 16-bit number (values up to 65,535) would be used… but not always! If it’s still unclear, try a value in the middle and continue the binary search in this way until you’ve nailed it down. You should also develop and test predictions as you go to validate your encodings. Not only does this rigorously assess your encoding approach, but given how frustrating and time-consuming this process can be, the psychological boost and satisfaction it provides is actually quite important in my experience.
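As a rough illustration of the techniques above, here is a minimal Python sketch of XOR/difference analysis, the bit-width estimate, and a brute-force search for directly encoded integers. The function names and the byte strings are invented for the example; they are not real ClarioStar commands, and the search only tries 1-byte and 2-byte unsigned fields.

```python
def xor_diff(cmd_a: bytes, cmd_b: bytes) -> list[tuple[int, int, int, int]]:
    """Return (offset, byte_a, byte_b, xor) for each position where two commands differ."""
    if len(cmd_a) != len(cmd_b):
        raise ValueError("commands differ in length; align them before diffing")
    return [(i, a, b, a ^ b) for i, (a, b) in enumerate(zip(cmd_a, cmd_b)) if a != b]


def bits_needed(max_value: int) -> int:
    """Smallest number of bits that can represent every integer from 0 to max_value."""
    return max(1, max_value.bit_length())


def find_direct_fields(commands: list[bytes], values: list[int]) -> list[tuple[int, int, str]]:
    """Return (offset, width, byteorder) for fields equal to the known value in every capture."""
    length = min(len(c) for c in commands)
    matches = []
    for width, byteorder in ((1, "big"), (2, "big"), (2, "little")):
        for offset in range(length - width + 1):
            if all(
                int.from_bytes(c[offset:offset + width], byteorder) == v
                for c, v in zip(commands, values)
            ):
                matches.append((offset, width, byteorder))
    return matches


# Made-up captures where only one setting changed between the two commands.
baseline = bytes.fromhex("a1000300fa10")
changed = bytes.fromhex("a1000305fa10")
for offset, a, b, x in xor_diff(baseline, changed):
    print(f"offset {offset}: 0x{a:02x} -> 0x{b:02x}, xor bits {x:08b}")
# offset 3: 0x00 -> 0x05, xor bits 00000101

print(bits_needed(1000))  # 10

# Made-up captures taken at the minimum, a midpoint, and the maximum of a numeric setting.
captures = [bytes.fromhex(h) for h in ("a100000110", "a10001f410", "a10003e810")]
print(find_direct_fields(captures, [1, 500, 1000]))  # [(2, 2, 'big')]
```

A field that matches across the minimum, midpoint, and maximum is a prediction worth testing against a fourth capture before trusting it.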

For my large-scale reverse engineering, this involved generating extensive reports on correlations at different bit levels, and then parsing these to only examine the strongest and clearest associations.
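A bit-level version of such a report could look something like the sketch below. It is only an illustration: the helper names, the 0.9 threshold, and the example commands are assumptions, and the same idea extends to nibbles, bytes, and byte pairs by swapping what gets extracted from each command.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bit_columns(commands: list[bytes]) -> dict[tuple[int, int], list[int]]:
    """Map (byte_offset, bit_index) to that bit's value in every command (bit 7 is the MSB)."""
    length = min(len(c) for c in commands)
    return {
        (offset, bit): [(c[offset] >> bit) & 1 for c in commands]
        for offset in range(length)
        for bit in range(7, -1, -1)
    }


def rank_bits(commands: list[bytes], values: list[float], threshold: float = 0.9):
    """Return [((offset, bit), r)] for bits whose |r| with the parameter meets the threshold."""
    hits = []
    for position, column in bit_columns(commands).items():
        r = pearson(column, values)
        if abs(r) >= threshold:
            hits.append((position, r))
    return sorted(hits, key=lambda hit: -abs(hit[1]))


# Made-up captures: a hypothetical on/off setting recorded alongside each command.
flags = [0, 1, 0, 1, 1]
commands = [bytes([0xA1, 0x08 if on else 0x00, 0x10]) for on in flags]
for (offset, bit), r in rank_bits(commands, flags):
    print(f"byte {offset}, bit {bit}: r = {r:+.2f}")
```

Filtering by a threshold keeps the report short, but a strong correlation from a handful of captures can still be spurious, so each hit is a lead to verify rather than a result.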

​3. Convert to an API and Contribute!

​Once you’ve deciphered the commands, the next step is to wrap them into an accessible API, making them usable. If you’re contributing to an open-source project like PyLabRobot, here’s how to proceed:

  • Post here on the forum: Share your findings and progress with the community.
  • Message one of the PyLabRobot developers: Reach out directly to discuss your contributions.
  • Refer to the documentation on developing new machines: Follow the established guidelines for integrating new instrument drivers.
  • Make a pull request: Submit your code changes for review.
  • Incorporate feedback: Be open to suggestions and refine your code based on community and maintainer input.

​Case Study: The Command Type Byte

Every message sent to or from the plate reader begins with four message start bytes before the main body of the communication. In hexadecimal, the first two are consistently 0x02 and 0x00 (00000010 00000000), and the last is always 0x0C (00001100). The third byte, however, varies. It indicates the command type: e.g., 0x0F for fetching EEPROM values, 0x0A for querying status, 0x0E for sending status, 0x0D for stop commands, etc. Most of these are discrete categories.
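The start-byte layout described above can be checked with a few lines of Python. This is a sketch under the assumptions stated in this post only: the function name and the label strings are made up, the table covers just the command types listed here, and everything after the four start bytes is returned untouched because its structure is not described.

```python
# Command type values mentioned in this post; the labels are informal descriptions.
COMMAND_TYPES = {
    0x0F: "fetch EEPROM values",
    0x0A: "query status",
    0x0E: "send status",
    0x0D: "stop",
}


def parse_start_bytes(message: bytes) -> tuple[int, bytes]:
    """Validate the four start bytes and return (command_type_byte, remaining_bytes)."""
    if len(message) < 4:
        raise ValueError("message is shorter than the four start bytes")
    if message[0] != 0x02 or message[1] != 0x00 or message[3] != 0x0C:
        raise ValueError(f"unexpected start bytes: {message[:4].hex()}")
    return message[2], message[4:]


# Example with a made-up two-byte body after the start bytes.
command_type, rest = parse_start_bytes(bytes([0x02, 0x00, 0x0A, 0x0C, 0x01, 0x02]))
print(f"0x{command_type:02X}", COMMAND_TYPES.get(command_type, "unknown"), rest.hex())
# 0x0A query status 0102
```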

​We had initially assumed all “protocol run” commands were flagged with 0x86. But yesterday, while trying to run some commands, we encountered issues. We reverted to the baseline software and tested different protocols. To our surprise, the third byte differed with each measurement type!

​To our chagrin, it was also unclear how this information was encoded. We basically performed the “isolate” step of our pipeline, changing a bunch of parameters and observing the third byte. We managed to narrow down the relevant parameters to three aspects: reading type, whether well scan is on, and the number of wavelengths/chromatics. This was achieved through hours of trial and error.

​With these parameters isolated, we systematically perturbed individual variables using the method described above. Ultimately, after about 12 measurements, we determined the encoding:

  • Fluorescence: Flagged by 112 + (num_chromatics - 1) * 20
  • Absorbance: Flagged by 120 + (num_wavelengths - 1) * 2
  • Luminescence: Encoded by a base nibble of 0x2 where the first four bits (nibble 1) encode 8 + num_chromatics - 1.

Interestingly, enabling well scan adds 5 to the integer value. Why the engineers chose this flagging method, we have no idea! These values do overlap, and the reading type (luminescence, fluorescence, or absorbance), well scan setting, and number of wavelengths/chromatics are also included in, and likely parsed from, the main run command body. As far as we know, the command won’t run properly if these aren’t flagged correctly in this message start (though we haven’t exhaustively tested this new aspect yet, so it’s always possible something else was causing the original issue).
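Written as code, the fluorescence and absorbance rules plus the well scan offset look roughly like this. It is a sketch of the formulas as stated in this post, not a tested driver: the function name and argument names are made up, it assumes the +5 for well scan applies to both reading types, and luminescence is left out because its nibble-based rule is only described in prose above.

```python
def run_command_type_byte(reading_type: str, num_wavelengths: int = 1, well_scan: bool = False) -> int:
    """Compute the third start byte for a protocol run, per the formulas in this post."""
    if num_wavelengths < 1:
        raise ValueError("num_wavelengths must be at least 1")
    if reading_type == "fluorescence":
        value = 112 + (num_wavelengths - 1) * 20  # num_wavelengths = number of chromatics
    elif reading_type == "absorbance":
        value = 120 + (num_wavelengths - 1) * 2
    else:
        raise ValueError(f"reading type not covered by this sketch: {reading_type!r}")
    if well_scan:
        value += 5  # assumption: the same offset applies to both reading types
    if value > 0xFF:
        raise ValueError("result does not fit in one byte")
    return value


print(hex(run_command_type_byte("fluorescence")))                  # 0x70
print(hex(run_command_type_byte("absorbance", num_wavelengths=2)))  # 0x7a
print(hex(run_command_type_byte("fluorescence", well_scan=True)))  # 0x75
```

Comparing the output against the third byte in the software’s logs for a few untested parameter combinations is a cheap way to check whether the formulas hold.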

We validated our predictions by changing other parameters and ensuring this byte remained unaffected.

​Future Considerations

I suspect commands for pre-plate reading steps like pump priming and gain adjustment also have their own unique command bytes that will need similar investigation. I also dream of agentic models being able to tackle this kind of work, though I have my doubts about their current capabilities for nuanced reverse engineering, particularly concerning their tendency to need babysitting, drift off task, or confidently assert incorrect information.

All that said, I hope this helps others starting this kind of process. Comments, questions, and additional advice are welcome and encouraged. And may your intuitions about which bytes encode which features be correct!

amazing post!!

:100:

my biggest hesitation is that tokenizers actually use multiple characters per token, meaning LLMs don’t read letter by letter or byte by byte, but instead in groups of bytes. (see this interactive tool to build intuition for that: https://tiktokenizer.vercel.app). this makes text generation much more efficient, but comes at the cost of models not being able to find patterns on a smaller scale than words/sentences. since they can’t pass the famous “how many r’s in strawberry” test, how would you expect them to find complicated patterns? hopefully in the future we will have byte-level tokenizers; they will be very powerful for reverse engineering.

kind of hidden in the amount of tips / background on reverse engineering is this part:

fantastic!

when will you make the PR? :slight_smile:

Hopefully soon! I’m working with @ericguan04 right now on this and another summer student next month. We have to work out some more of the API logic and need to find a validated safe placeholder for the unknown bytes before it will make sense to PR. Since @CamilloMoschner is looking to integrate the ClarioStar Plus, we will collaborate which will hopefully help us get to full coverage of the plate reader operations for both of our machines.

how about the bytes that are currently in PLR? those are the bytes we used in the esvelt lab method for prance. should be a sensible starting point? we know they work.

i will say it’s not required for a machine to be 100% fully understood and integrated before we merge it. as long as the intermediary is useful and not obviously unsafe

THANK YOU SO MUCHH!!! was just asking about thiss :)) can really agree and relate while working on Tecan plate reader Infinite series months ago (pausing for now)

I also feel like it is really great if we can have an agent or maybe a software that can tackle this faster, maybe not end-to-end but helps on gathering and translating the send packets (sent by the instrument).

do you have a program for it :eyes: :smiley:

export wireshark pcap to text, then use vscode multiline editor, works like a charm

was mainly working on absorbance only (since our lab mostly uses absorbance), and got the functions to command the instrument working (like init, plate in/out, read), but still struggling to perfectly interpret the packets sent by the instrument compared to what is shown in their software (still off by 0.01 in my test plate, but I’m afraid this may amplify if not really understood).

will steal time later (unfortunately not this or next month :melting_face:) and thinking of dumping the data I can gather and ask for help here?

that’s fantastic!

please do share it! that’s how every backend in PLR starts. someone implements the specific functionality they need. since it works for them, it is a useful thing in the library. it might not work for others, so they use the existing backend as a starting point (really drastically speeds up development time compared to starting from scratch), and they extend it with what they need. eventually it becomes something useful out of the box for many people.

Interesting timing here, I have been working on the same thing. I have fl endpoint, abs working with pretty much all of the knobs that BMG exposes in the gui working (no pumps though, I don’t have that instrument configuration).

I will be putting it in a public repo later today/tomorrow

edit: done, GitHub - hoxbio/bmg-clariostar: Go API for the BMG Clariostar over USB serial

This is great! You should make a PR to plr

Awesome work! I’m working with another summer student on getting the fluorescence commands worked out (which it looks like you did here, at least for the monochromatic case), some of the other settings like well scanning approach, and reverse engineering some of the aspects for kinetics. I’ll DM you so we can work together or share notes
