Specification
Background
Motivation
There is a crisis of productivity and reproducibility in the life sciences today. Projects that should take weeks end up taking months, and independent labs struggle to replicate the vast majority of published literature.
Experimental protocols written in natural language are often ambiguous. For example, the phrases "spin down briefly" and "mix gently" appear in many common protocols and convey much less information than operators need to reproduce each other's work.
Design Goals
- Flexible: Autoprotocol allows for a plethora of possible protocols built from a small set of instructions. No biological knowledge is included in the specification. Adding new instructions is straightforward.
- Composable: High levels of complexity are enabled by building up from smaller pieces. It should be possible to start from simple, rock-solid modules and compose them into cutting-edge science.
- Synthesizable: Autoprotocol is mappable directly to hardware commands for robotic automation. Human interpretation must not be necessary.
- Platform Independent: Autoprotocol should be able to be generated and consumed by software written in any language on any platform.
- Just Data: Encoded protocols are a linear series of instructions to execute and contain no branching logic or looping constructs evaluatable at runtime.
- Learnable: A central design goal of Autoprotocol is that users can extrapolate from the parts they already know to how functionality they haven't yet used might work, and frequently guess correctly.
Conventions In This Document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.
Syntax
In the code excerpts and examples here, there are a few conventions to know. An unquoted string as a value is a type (for example, the Volume and Time type designations in the "volume" and "duration" fields in the accompanying example). A quoted string is a literal, as in the "op" value in the accompanying example. Square brackets denote an array, as in the array of objects.
Dimensioned values ("quantities")
Both the volume and duration are quantities, which are strings of the format "magnitude:unit". Duration strings might be 50:second, 12:minute, 50:millisecond, and so on. Similarly, volume strings might be 25:microliter or 5:milliliter. Magnitudes may contain decimals, as in 25.2:microliter. Units are always written in the singular.
Refs and Datarefs
A ref is an alphanumeric string, that is, a string that contains only letters and numbers and no special characters. Refs are simple identifier strings used to refer to a container defined in an access instruction. Similarly, datarefs are alphanumeric strings used to later identify any data generated by the given instruction. Refs and datarefs must be unique within each protocol.
Containers and Wells
Per the Protocol section, containers are referenced using their ref string. Wells are referenced using a slash syntax :ref/:index, like my_plate/A1.
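As a rough illustration of the :ref/:index syntax, a well reference can be split back into its two parts. The function name `parse_aliquot` is hypothetical, and the sketch assumes the index is kept as an opaque string (for example "A1" or "0") rather than interpreted.

```python
def parse_aliquot(path):
    """Split a "ref/index" well reference into (ref, index)."""
    ref, sep, index = path.partition("/")
    if not sep or not ref or not index:
        raise ValueError(f"expected 'ref/index', got {path!r}")
    # The index is returned unchanged; no plate geometry is assumed here.
    return ref, index
```

With this helper, `parse_aliquot("my_plate/A1")` returns the ref `"my_plate"` and the index `"A1"`.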
Serialization
Autoprotocol protocols are serialized using JavaScript Object Notation (JSON). This choice is not intrinsically semantic, but it is mandatory for consistency and compatibility. Alternative serializations such as XML, Protocol Buffers, or custom formats shall not be used.
Protocols
Structure
A protocol is defined by three segments:
- refs: the set of containers that will be used in the protocol
- instructions: the list of instructions to be performed
- constraints: constraints on how the instructions should be performed
A ref is a short alphanumeric name given to a container to identify it in later instructions. Every container referenced in a protocol must also be given a destiny: either discarded at the end of the protocol, or stored.
A protocol shall not contain any segments not defined here as mandatory.
Once you have references to all the objects you want to work with, you can use them in other instructions by referring to the container itself by its ref or to aliquots within the container with the syntax :ref/:index.
In the accompanying protocol snippet there are three instructions performing the following operations:
- Distribute 40 μl from well water/0 into each of test/A1, test/A2, and test/A3.
- Distribute 5 μl from well dye/0 into each of test/A1, test/A2, and test/A3.
- Centrifuge the plate test for 30 seconds at 2000 g.
- Take a 600 nm absorbance reading through wells test/A1, test/A2, and test/A3.
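To make the overall shape concrete, here is a hedged sketch of how a program might build and serialize a small protocol containing only the centrifugation step. The "refs" and "instructions" segment names, the "op" field, and the "new" and "discard" ref fields come from this document. Everything else is an assumption for illustration only: the container type "96-flat", the field names "object" and "acceleration", and the unit "g". The constraints segment is omitted for brevity.

```python
import json

protocol = {
    "refs": {
        # "new" names a container type and "discard" sets the destiny;
        # "96-flat" is a made-up container type used only for illustration.
        "test": {"new": "96-flat", "discard": True},
    },
    "instructions": [
        # Only "op" and the quantity format are taken from the text;
        # "object", "acceleration", and the unit "g" are assumed names.
        {
            "op": "spin",
            "object": "test",
            "acceleration": "2000:g",
            "duration": "30:second",
        },
    ],
}

# Protocols are plain data, so they round-trip through JSON unchanged.
serialized = json.dumps(protocol, indent=2)
restored = json.loads(serialized)
```

Because the protocol is just data, any language with a JSON library can produce or consume the same document.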
Aliquot Paths
While a protocol is just data and does not contain logic (e.g., if/then statements), it is common to use a program that does contain logic to dynamically generate a protocol. For example, the layout of wells on a variable number of plates may change depending on the number of samples being operated on, though the series of operations for each sample is the same (it is "scale invariant"). On the surface, this can make it appear complex to compare protocols over time or across different conditions.
The concept of aliquot paths captures the common scientifically relevant structure across generated protocols that differ in their overall content due to scale. Two protocols are homomorphic if for every ref in one protocol there are one or more similar refs in the second protocol with the same path. Protocol homomorphism is directional: if one protocol contains additional refs not seen in the other whose paths are independent from the paths of the isomorphic refs (the refs do not interact and constitute completely separate "subroutines" within the protocol), the protocols may still be said to be homomorphic in the context of the refs with common paths.
Put more simply, if there are two protocols that perform the same set of conceptual operations on a different number of samples, adding additional operations and samples that have nothing to do with the existing samples doesn't break the idea that the protocols are "similar, just scaled" for the original samples.
Aliquot paths are important because they allow us to compare logical blocks of operations irrespective of how they're physically configured.
Definitions
Types
Instruction and ref specification use the following common types.
Primitive Types
Type | Definition |
---|---|
Boolean | true or false |
Float | a floating point numeric value |
Int | an integer numeric value |
String | any sequence of utf-encoded characters bounded with " |
Derived Types
Type | Example Value | Definition |
---|---|---|
Aliquot | "growth_plate/A1" | an Autoprotocol container and a well index delimited with a / represented as a String |
Container | "growth_plate" | an Autoprotocol container referenced in the refs section of the protocol represented as a String |
Quantity e.g. Volume | "5:microliter" | a magnitude and a unit delimited with a : represented as a String |
Compound | {"format": "InChI", "value": "InChI=1S/CH4/h1H4"} | a chemical compound defined by the String ‘value’ written following the pattern described by the ‘format’ |
Type Wrappers
Syntax | Example Specification | Definition |
---|---|---|
Enum(..) | Enum("one", "two") | any one of the enclosed values |
Option<Type> | Option<String> | either the enclosed Type or null |
Units
Instruction and ref specification use the following units to represent quantities.
Quantity Type | Example Units |
---|---|
Acceleration | meter/second^2 , millimeter/second^2 |
Amount | mole , millimole , micromole , nanomole |
AmountConcentration | mole/liter , millimole/liter , molar , millimolar |
Area | meter^2 |
Capacitance | farad , picofarad |
ElectricPotential | volt , millivolt , microvolt , nanovolt |
Frequency | hertz , kilohertz , rpm |
Length | meter , millimeter , micrometer , nanometer |
Mass | gram , milligram , microgram , nanogram |
MassConcentration | milligram/milliliter , nanogram/microliter |
Power | watt , milliwatt , microwatt |
Pressure | pascal , bar , torr |
Temperature | celsius , kelvin |
Time | day , hour , minute , second , millisecond |
Velocity | meter/second , millimeter/second |
Volume | liter , milliliter , microliter , nanoliter |
VolumeAcceleration | milliliter/second^2 , microliter/second^2 |
VolumeConcentration | milliliter/milliliter , microliter/microliter |
VolumeFlow | milliliter/second , microliter/second |
Fields
Some common fields that are shared across instructions are defined below.
{
  "shake_path": Enum(
    "cw_orbital",
    "ccw_orbital",
    "portrait_linear",
    "landscape_linear",
    "cw_diamond",
    "ccw_diamond",
    "portrait_down_double_orbital",
    "landscape_down_double_orbital",
    "portrait_up_double_orbital",
    "landscape_up_double_orbital"
  )
}
Refs
container_refs
Names: The refs field aliases Containers to descriptive Strings called refs.
Origins: The id field is used to specify the unique identifier of an existing Container. The new field is used to specify that this ref does not yet exist and what type of Container it should be. These two fields are mutually exclusive.
Destinies: The discard field indicates whether the ref should be discarded or not. The store field indicates how a ref should be stored. These two fields are mutually exclusive.
Covers: The cover field indicates the type of cover a container is initially covered with. If no cover is specified, the container is assumed to be uncovered.
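The origin and destiny rules lend themselves to a mechanical check. The sketch below is one possible reading, not a normative validator: the function name `validate_ref` is hypothetical, and it assumes that exactly one origin field and exactly one destiny field must be present (the text states only that each pair is mutually exclusive and that every container must be given a destiny).

```python
def validate_ref(name, ref):
    """Return a list of problems found in one entry of the refs segment."""
    problems = []
    # Origins: "id" (existing container) and "new" (container type) are
    # mutually exclusive; this sketch assumes one of them is required.
    if ("id" in ref) == ("new" in ref):
        problems.append(f"{name}: specify exactly one of 'id' or 'new'")
    # Destinies: "discard" and "store" are mutually exclusive, and every
    # container must be given a destiny.
    if ("discard" in ref) == ("store" in ref):
        problems.append(f"{name}: specify exactly one of 'discard' or 'store'")
    return problems
```

An entry such as `{"new": "96-flat", "discard": True}` (with a made-up container type) passes, while an entry carrying both "id" and "new" is reported.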
Instructions
The instructions field of a Protocol is made up of a list of Instructions.
Instructions are encoded as an op, which is the instruction name, and optionally a series of additional top-level fields that encode how it should be executed.
Following is the set of instructions currently in the Autoprotocol standard.
acoustic_transfer
Acoustic liquid handling uses acoustics to fly individual droplets from a source container to a destination one. Most acoustic liquid handlers support only a discrete set of droplet_size values, and the volume field of each transfer must be a multiple of the droplet size. prevalidate_sources is used to ensure that the source wells contain enough volume to successfully complete the transfer. source_volume_limits are used to override vendor-specified defaults for what volumes should pass prevalidation.
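The multiple-of-droplet-size rule can be checked before a protocol is submitted. This is a minimal sketch under stated assumptions: the helper name `is_droplet_multiple` is hypothetical, both quantities must use the same unit (no unit conversion is attempted), and `Decimal` is used so that values such as 2.5 are compared exactly.

```python
from decimal import Decimal


def is_droplet_multiple(volume, droplet_size):
    """Check that a transfer volume is a whole multiple of the droplet size."""
    v_mag, v_unit = volume.split(":")
    d_mag, d_unit = droplet_size.split(":")
    if v_unit != d_unit:
        raise ValueError("this sketch does not convert between units")
    # Decimal arithmetic avoids binary floating point remainders.
    return Decimal(v_mag) % Decimal(d_mag) == 0
```

For example, a 25:nanoliter transfer is a multiple of a 2.5:nanoliter droplet, while a 26:nanoliter transfer is not.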
cover
Containers must be covered or sealed for storage, incubation, and centrifugation operations (among others). Many instructions, including liquid handling operations, require that a container be uncovered before use. retrieve_lid indicates that a lid previously saved by an uncover operation with store_lid should be used.
flow_cytometry
Flow cytometry optically detects and characterizes particles suspended in a fluid. For each flow_cytometry instruction, the channel information will be collected for each sample given the collection conditions specified. stop_criteria are combined based on the condition specified in trigger_logic; if left unset, the aquisition_volume will be used.
incubate
The incubate instruction stores a sample in an incubator with the appropriate settings for a given duration.
liquid_handle
The liquid_handle instruction acts as a framework to allow precise control over liquid handling parameters and express a broad range of liquid handling operations.
The liquid_handle operation is based around transporting volumes of liquid in and out of locations. Each operation is a locations sequence in which each location has a transports sequence specifying the list of volumes. The position of device components may reset between elements of locations. Transports within the same locations use the same consumables (i.e. tips in the case of air_displacement liquid handling).
measure_mass
The measure_mass instruction can be used to determine the mass of a sample (container). The execution is vendor specific and may or may not consume a fraction of the sample. The accuracy of results is vendor specific.
measure_volume
The measure_volume instruction can be used to determine the volume of a sample. The execution is vendor specific and may or may not consume a fraction of the sample. The accuracy of results is vendor specific.
provision
The provision instruction encodes adding some amount of an external resource to an aliquot or series of aliquots.
seal
Containers must be covered or sealed for storage, incubation, and centrifugation operations (among others). Seal types have useful properties ranging from optical clarity to gas permeability. Seals can be applied by either thermal or adhesive sealers, which result in different seal integrity. thermal seals can be applied with a range of temperatures and durations that can be optimized for different plate types. Many instructions, including liquid handling operations, require that a container be uncovered before use.
spectrophotometry
The spectrophotometry instruction encodes one or a series of plate reading steps executed on a single container with the same device. This could be executed once, or at a defined interval, across some total duration. There are four valid modes (absorbance, fluorescence, luminescence, and shake) that each accept a different set of mode_params.
spin
The spin instruction is used to represent a series of centrifugation steps. The inward and outward flow_direction values encode spinning the contents into or out of a container, respectively. The operation is repeated with the appropriate direction for each element in spin_directions.
uncover
Containers must be covered or sealed for storage, incubation, and centrifugation operations (among others). Many instructions, including liquid handling operations, require that a container be uncovered before use. store_lid indicates that the lid should be saved for some subsequent cover instruction with retrieve_lid.
unseal
Containers must be covered or sealed for storage, incubation, and centrifugation operations (among others). Many instructions including liquid handling operations require that a container be uncovered before use.
Constraints
time_constraints
Time constraints encode a temporal relationship between two time points, from and to.
Each of the time points must specify exactly one of its optional fields. ref_start and ref_end encode the points at which a Container leaves its origin and enters its destiny, respectively. instruction_start and instruction_end encode the points at the beginning and end of an instruction’s execution; the instruction is represented by its 0-indexed position within the instructions list.
Each time constraint may include any combination of the less_than, more_than, and ideal fields. less_than and more_than constraints encode, respectively, the maximum and minimum amount of time that is allowable between the two time points. ideal constraints encode the intended timing between the two time points as well as the optimization_cost by which these fields should be weighted.
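The rules for time points and bounds can be expressed as two small checks. The following is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the bounds are treated as strict inequalities (the text does not say whether equality is allowed), and the ideal field is ignored because its structure is not described here. The unit names are taken from the Time row of the Units table.

```python
TIME_POINT_FIELDS = ("ref_start", "ref_end", "instruction_start", "instruction_end")

SECONDS_PER_UNIT = {
    "millisecond": 0.001,
    "second": 1,
    "minute": 60,
    "hour": 3600,
    "day": 86400,
}


def time_point_is_valid(point):
    """A time point must specify exactly one of its optional fields."""
    return sum(field in point for field in TIME_POINT_FIELDS) == 1


def to_seconds(quantity):
    """Convert a Time quantity such as "5:minute" to seconds."""
    magnitude, unit = quantity.split(":")
    return float(magnitude) * SECONDS_PER_UNIT[unit]


def elapsed_is_allowed(constraint, elapsed_seconds):
    """Check an observed elapsed time against less_than and more_than."""
    # less_than is an upper bound; strictness is an assumption of this sketch.
    if "less_than" in constraint and elapsed_seconds >= to_seconds(constraint["less_than"]):
        return False
    # more_than is a lower bound.
    if "more_than" in constraint and elapsed_seconds <= to_seconds(constraint["more_than"]):
        return False
    return True
```

A constraint such as `{"from": {"instruction_end": 0}, "to": {"instruction_start": 1}, "less_than": "5:minute"}` would then accept an elapsed time of 120 seconds and reject one of 600 seconds.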