# Specification

## Background

### Motivation

There is a crisis of productivity and reproducibility in the life sciences today. Projects that should take weeks end up taking months, and the vast majority of published results prove difficult for independent labs to replicate.

Experimental protocols written in natural language are often ambiguous. For example, phrases such as "spin down briefly" and "mix gently" appear in many common protocols but convey far less information than operators need to reproduce each other's work.

### Design Goals

Flexible
Autoprotocol allows for a plethora of possible protocols built from a small set of instructions. No biological knowledge is included in the specification. Adding new instructions is straightforward.
Composable
High levels of complexity are enabled by building up from smaller pieces. It should be possible to start from simple, rock-solid modules and compose them into cutting edge science.
Synthesizable
Autoprotocol is mappable directly to hardware commands for robotic automation. Human interpretation must not be necessary.
Platform Independent
Autoprotocol should be able to be generated and consumed by software written in any language on any platform.
Just Data
Encoded protocols are a linear series of instructions to execute and contain no branching logic or looping constructs evaluatable at runtime.
Learnable
A central design goal of Autoprotocol is that users can extrapolate from the parts they already know to how functionality they haven't yet used might work, and frequently guess correctly.

### Conventions In This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.

#### Syntax

In the code excerpts and examples here, there are a few conventions to know. An unquoted string as a value denotes a type (for example, volume and duration as the values of "volume" and "duration" fields). A quoted string is a literal, as in the value of an "op" field. Square brackets denote an array, as in an array of objects.

#### Dimensioned values ("measures")

Both volume and duration are measures, which are strings of the format "value:unit". Duration strings might be 50:second, 12:minute, 50:millisecond, and so on. Similarly, volume strings might be 25:microliter or 5:milliliter. Measures may contain decimals, as in 25.2:microliter. Units are always written in the singular.
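As a minimal sketch of how a consumer might read a measure, the following Python splits a "value:unit" string into a number and a unit name. The function name parse_measure is hypothetical and not part of the specification, and the sketch does not check the unit against the list of recognized units.

```python
from decimal import Decimal, InvalidOperation


def parse_measure(measure: str) -> tuple[Decimal, str]:
    """Split a "value:unit" string into a numeric value and a unit name."""
    value_text, separator, unit = measure.partition(":")
    if not separator or not value_text or not unit:
        raise ValueError(f"not a value:unit measure: {measure!r}")
    try:
        # Decimal keeps values such as 25.2 exact, unlike binary floats.
        value = Decimal(value_text)
    except InvalidOperation:
        raise ValueError(f"non-numeric value in measure: {measure!r}") from None
    return value, unit


print(parse_measure("25.2:microliter"))
```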

#### Refs and Datarefs

A ref is an alphanumeric string: a string that contains only letters and numbers and no special characters. Refs are short identifier strings used to refer to a container defined in an access instruction. Similarly, datarefs are alphanumeric strings used to later identify any data generated by a given instruction. Refs and datarefs must be unique within each protocol.

#### Containers and Wells

Per the Protocol section, containers are referenced using their ref string. Wells are referenced using a slash syntax :ref/:index, like my_plate/A1.
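The slash syntax can be taken apart with a single split. The Python below is a sketch only; parse_well is a hypothetical helper name, and the sketch deliberately does not validate the ref or the index beyond requiring both to be present.

```python
def parse_well(well: str) -> tuple[str, str]:
    """Split a ":ref/:index" well reference into its ref and index parts."""
    ref, separator, index = well.partition("/")
    if not separator or not ref or not index:
        raise ValueError(f"not a ref/index well reference: {well!r}")
    return ref, index


print(parse_well("my_plate/A1"))
```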

#### Serialization

Autoprotocol protocols are serialized using JavaScript Object Notation (JSON). This choice is not intrinsically semantic, but it is mandatory for consistency and compatibility. Alternative serializations such as XML, Protocol Buffers, or custom formats shall not be used.

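The units available for each dimension of a measure are:

| Dimension | Units |
| --- | --- |
| Duration | millisecond, second, minute, hour |
| Volume | nanoliter, microliter, milliliter |
| Speed | rpm |
| Length | nanometer |
| Temperature | celsius |
| Matter | nanomole, micromole |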

## Protocols

### Structure

A protocol is defined by two segments:

refs
the set of containers that will be used in the protocol
instructions
the list of instructions to be performed

A ref is a short alphanumeric name given to a container to identify it in later instructions. Every container referenced in a protocol must also be given a destiny: either discarded at the end of the protocol, or stored.

A protocol shall not contain any segments not defined here as mandatory.

Once you have references to all the objects you want to work with, you can use them in other instructions by referring to the container itself by its ref or to aliquots within the container with the syntax :ref/:index.

In the protocol snippet at right there are three instructions performing the operations:

• Distribute 40 μl from well water/0 into each of test/A1, test/A2, and test/A3.
• Distribute 5 μl from well dye/0 into each of test/A1, test/A2, and test/A3.
• Centrifuge the plate test for 30 seconds at 2000 g.
• Take a 600 nm absorbance reading through wells test/A1, test/A2, and test/A3.
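The snippet itself is not reproduced in this text. As a hedged reconstruction, the Python below builds a protocol that performs the operations listed above and serializes it to JSON. It assumes the two distribute operations are groups within one pipette instruction, which would account for three instructions. Only refs, instructions, op, store, discard, new, id, wells, and dataref are named in this document; the op names "spin" and "absorbance", the remaining field names, the container type "96-flat", the storage value "ambient", and the container IDs are all placeholders or assumptions.

```python
import json

TARGETS = ["test/A1", "test/A2", "test/A3"]


def distribute(source: str, volume: str) -> dict:
    """Build one distribute group that dispenses `volume` into each target well."""
    return {
        "distribute": {
            "from": source,
            "to": [{"well": well, "volume": volume} for well in TARGETS],
        }
    }


protocol = {
    "refs": {
        # Container IDs are placeholders, not real identifiers.
        "water": {"id": "WATER_CONTAINER_ID", "store": {"where": "ambient"}},
        "dye": {"id": "DYE_CONTAINER_ID", "store": {"where": "ambient"}},
        "test": {"new": "96-flat", "discard": True},
    },
    "instructions": [
        {
            "op": "pipette",
            "groups": [
                distribute("water/0", "40:microliter"),
                distribute("dye/0", "5:microliter"),
            ],
        },
        # Op and field names below are assumptions for illustration.
        {"op": "spin", "object": "test",
         "acceleration": "2000:g", "duration": "30:second"},
        {"op": "absorbance", "object": "test", "wells": ["A1", "A2", "A3"],
         "wavelength": "600:nanometer", "dataref": "testreading"},
    ],
}

serialized = json.dumps(protocol, indent=2)
print(serialized)
```

Because a protocol is just data, the round trip through json.dumps and json.loads loses nothing.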

### Aliquot Paths

While a protocol is just data and does not contain logic (e.g., if/then statements), it is common to use a program that does contain logic to dynamically generate a protocol. For example, the layout of wells on a variable number of plates may change depending on the number of samples being operated on, though the series of operations for each sample is the same (it is "scale invariant"). On the surface, this can make it appear complex to compare protocols over time or across different conditions.

The concept of aliquot paths captures the common scientifically-relevant structure across generated protocols that differ in their overall content due to scale. Two protocols are homomorphic if for every ref in one protocol there is one or more similar ref(s) in the second protocol with the same path. Protocol homomorphism is directional: if one protocol contains additional refs not seen in the other whose paths are independent from the paths of the isomorphic refs (the refs do not interact and constitute completely separate "subroutines" within the protocol), the protocols may still be said to be homomorphic in the context of the refs with common paths.

Put more simply, if there are two protocols that perform the same set of conceptual operations on a different number of samples, adding additional operations and samples that have nothing to do with the existing samples doesn't break the idea that the protocols are "similar, just scaled" for the original samples.

Aliquot paths are important because they allow us to compare logical blocks of operations irrespective of how they're physically configured.

## Instructions

Instructions are the unit operations of a protocol.

### Container Access

#### The refs stanza

The refs section in a protocol binds containers to short, descriptive strings called refs. An existing container can be referenced by specifying its id. New containers can be instantiated by specifying the container type in the new field.

In the example to the right, we bind the container with ID ct13aba8geam to the ref "cells", a new 96-well PCR plate to the ref "pcr", and the container with ID ct1a72ae74ja to the ref "primer".

#### Container destinies

Once your run has been executed, the containers you used in the run have to go somewhere while you analyze the data and make a decision about what to do next. Every referenced container must have either a discard or store key, specifying what to do with that container when the run is complete.

In the example, the cells container will be stored at −20 °C, the pcr plate will be stored at ambient conditions, and the primer container will be discarded.
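The refs stanza described in this section might look like the following when built in Python. This is a sketch under stated assumptions: the container type string "96-pcr" and the storage values "cold_20" and "ambient" are illustrative, the check_refs helper is hypothetical, and treating id and new as mutually exclusive is an inference rather than a stated rule.

```python
refs = {
    "cells": {"id": "ct13aba8geam", "store": {"where": "cold_20"}},
    "pcr": {"new": "96-pcr", "store": {"where": "ambient"}},
    "primer": {"id": "ct1a72ae74ja", "discard": True},
}


def check_refs(refs: dict) -> None:
    """Raise ValueError if a ref lacks a container source or a destiny."""
    for name, spec in refs.items():
        # Assumed: a ref names an existing container or a new one, not both.
        if ("id" in spec) == ("new" in spec):
            raise ValueError(f"{name}: specify exactly one of 'id' or 'new'")
        # Every referenced container needs either a discard or a store key.
        if ("store" in spec) == ("discard" in spec):
            raise ValueError(f"{name}: specify exactly one of 'store' or 'discard'")


check_refs(refs)
```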

### One-Channel Liquid Handling

Liquid handling forms the backbone of any biological protocol. Liquid handling may be done by any suitable platform (syringe pump, manual, acoustic, etc.) with the necessary accuracy and precision characteristics for the specified operation.

#### Pipette groups

A pipette instruction is constructed as a list of groups, executed in order, where each group is a transfer, distribute, consolidate or mix group. One disposable tip is used for each group, which can have implications for contamination (discussed below).

There are four different types of pipette group:

transfer
For each element in the transfer list, in order, aspirates the specified volume from the source well and dispenses the same volume into the target well.
distribute
Aspirates sufficient volume from the source well, then dispenses into each target well the volume requested, in the order specified.

If the total volume to be dispensed exceeds the maximum tip volume (1000 µL), you must either specify allow_carryover to allow the pipette to return to the source and aspirate another load, or break your group up into multiple distributes each of less than the maximum tip volume. Specifying allow_carryover means that the source well could become contaminated with material from the target wells, so take care to use it only when you're sure that contamination won't be an issue—for example, if the target plate is empty.

consolidate
Aspirates from each source well, in order, the volume specified, then dispenses the sum volume into the target well. This is known as pipeline pipetting, and can sometimes be used to obtain greater accuracy when pipetting small volumes into empty wells, if a larger volume of diluent is aspirated first. Be aware that the same tip will be used to aspirate from all the source wells, so if you want to avoid contaminating any of them you should use a separate transfer group.

Like the distribute group, consolidate is limited by the maximum volume of the disposable tip. If the total volume you want to dispense into the target well exceeds the volume that will fit in one tip, you must either specify allow_carryover to allow the tip to carry on pipetting from the source wells after it has touched the target well, or break up your operation into multiple groups with separate tips.

mix
Mixes the specified wells, in order, by repeated aspiration and dispensing of the specified volume. The default mixing speed is 50 µL/second, but you may specify a slower or faster speed.

Well positions are given in the transfers field using the format :ref/:index.
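One way to stay within the tip capacity without allow_carryover is to break a long distribute into several smaller ones. The sketch below illustrates that arithmetic in Python; split_distribute is a hypothetical helper, volumes are plain numbers in microliters, and the 1000 µL limit is the maximum tip volume quoted above.

```python
def split_distribute(volumes_ul: list, max_tip_ul: float = 1000) -> list:
    """Group consecutive dispense volumes so each group fits in one tip load."""
    chunks, current, loaded = [], [], 0
    for volume in volumes_ul:
        if volume > max_tip_ul:
            raise ValueError(f"{volume} uL cannot fit in a single tip")
        if loaded + volume > max_tip_ul:
            # Close the current group; the next group starts with a fresh tip.
            chunks.append(current)
            current, loaded = [], 0
        current.append(volume)
        loaded += volume
    if current:
        chunks.append(current)
    return chunks


print(split_distribute([400, 400, 400]))
```

Each inner list would become its own distribute group, so the source well is only ever touched by a clean tip.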

#### Pipetting speeds

Aspirate and dispense speeds are normally determined automatically, but can also be specified explicitly. In a transfer group, each transfer can be given explicit aspirate_speed and dispense_speed parameters. For a distribute group, the aspirate speed is given once for the source well and the dispense speed once for each destination well (and conversely for a consolidate group).

#### Pre- and post-mixing

distribute, consolidate and transfer pipette groups optionally accept a mix_before or mix_after parameter (or in the case of transfer, both). Specifying a pre-mix will mix the source aliquot before aspirating, and is only valid on distribute and transfer groups. Specifying a post-mix will mix the destination aliquot after dispensing into it, and is only valid on consolidate and transfer groups.

The mix_before and mix_after parameters have the same format as a single mix group, but without the well parameter.
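The validity rules for pre- and post-mixing can be captured in a small lookup table. The following Python is a sketch; the function name and the representation of a group as a type name plus a collection of parameter names are assumptions made for illustration.

```python
# Which mixing parameters each pipette group type accepts, per the rules above.
ALLOWED_MIX = {
    "transfer": {"mix_before", "mix_after"},
    "distribute": {"mix_before"},
    "consolidate": {"mix_after"},
    "mix": set(),
}


def invalid_mix_params(group_type: str, params) -> list:
    """Return the mixing parameters that are not valid for this group type."""
    requested = {"mix_before", "mix_after"} & set(params)
    return sorted(requested - ALLOWED_MIX[group_type])


print(invalid_mix_params("distribute", ["from", "to", "mix_after"]))
```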


### Multichannel Liquid Handling

A stamp instruction consists of a list of groups of transfers, each of which specifies from and to well references (ref/well_index) representing the top-left well, or origin, of a specified shape. Currently, the shape field may only be a rectangle object defined by rows and columns attributes representing the number of contiguous tip rows and columns to transfer. The shape parameter is optional and defaults to a full 8 rows by 12 columns. The tip_layout field refers to the SBS-compliant layout of tips, is optional, and defaults to the layout of a 96-tip box. The volume field defines the volume of liquid that will be aspirated from every well of the shape specified starting at the from field and dispensed into the corresponding wells starting at the to field. Similar to the pipette instruction, each object within the same transfer list indicates that the same set of tips is used.
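To make the origin and shape semantics concrete, the sketch below lists the wells covered by a rectangle anchored at a given origin. It assumes a 96-well plate with letter-and-number indices such as "B2"; the function name stamp_wells and the plate size parameters are illustrative, not part of the specification.

```python
import string


def stamp_wells(origin: str, rows: int = 8, columns: int = 12,
                plate_rows: int = 8, plate_columns: int = 12) -> list:
    """List the well indices covered by a rows x columns rectangle at origin."""
    first_row = string.ascii_uppercase.index(origin[0])
    first_column = int(origin[1:]) - 1
    if first_row + rows > plate_rows or first_column + columns > plate_columns:
        raise ValueError("shape extends beyond the edge of the plate")
    return [
        f"{string.ascii_uppercase[row]}{column + 1}"
        for row in range(first_row, first_row + rows)
        for column in range(first_column, first_column + columns)
    ]


print(stamp_wells("B2", rows=2, columns=3))
```

Applying the same function to the from and to origins yields the corresponding source and destination wells in matching order.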

Vendor restrictions may apply to shape and tip_layout fields. Stamp instructions with multiple transfers or protocols with multiple stamp instructions within them may have different, vendor-determined behavior.

### Acoustic Liquid Handling

Acoustic liquid handling specifies a droplet_size, and the volume field of each transfer must be an integer multiple of droplet_size. Most instruments are only capable of transferring a single droplet_size, so the default should be set according to the vendor's available instrumentation.
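The integer-multiple rule is easy to check exactly with decimal arithmetic. In the Python sketch below, the function name is hypothetical, both measures are assumed to use the same unit, and the 2.5 nanoliter droplet size is an illustrative value only.

```python
from decimal import Decimal


def is_droplet_multiple(volume: str, droplet_size: str) -> bool:
    """Check that a volume measure is an integer multiple of droplet_size."""
    volume_value, volume_unit = volume.split(":")
    droplet_value, droplet_unit = droplet_size.split(":")
    if volume_unit != droplet_unit:
        raise ValueError("this sketch only compares measures in the same unit")
    # Decimal avoids binary floating-point error in the remainder.
    return Decimal(volume_value) % Decimal(droplet_value) == 0


print(is_droplet_multiple("27.5:nanoliter", "2.5:nanoliter"))
print(is_droplet_multiple("26:nanoliter", "2.5:nanoliter"))
```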

### Magnetic Separation

Separation of liquid samples using magnetic particles involves the binding of samples to magnetic particles, followed by cycles of immobilization and washing of those particles, and finally elution of samples from the particles. The magnetic_transfer instruction specifies a method for processing magnetic particles that transfers the particles from solution to solution.

The five sub-operations described below are used to specify a magnetic_transfer instruction.

collect
Used to collect beads from an object. Protection tips are magnetized, then repeatedly lowered and raised, pausing at the bottom_position for pause_duration on each pass. The lowering, pausing, and raising is repeated cycles times. During this instruction, the object can optionally be heated to temperature.
mix
Used to oscillate the tips vertically in an object. The oscillation will last for duration and move with a frequency and amplitude around center. During this operation, the object can optionally be heated to temperature and the tips can optionally be magnetized by setting magnetize.
release
Used to release beads into an object. Non-magnetized tips are oscillated vertically. The oscillation will last for duration and move with a frequency and amplitude around center. During this instruction, the object can be optionally heated to temperature.
dry
Used to dry beads with tips above and outside an object for a set duration.
incubate
Used to incubate an object for a set duration with the tips moved to tip_position. During this instruction, the object can optionally be heated to temperature and the tips can optionally be magnetized by setting magnetize.

#### Structure and Execution

The magnetic_transfer instruction takes a list of lists of dicts. Each dict is a sub-operation. Each list of dicts represents a collection of sub-operations which will be performed in order using the same tip. The list of lists (groups) represents the collection of sub-operations, performed in order, with tip changes between each group.

The bottom_position, tip_position, center, and amplitude parameters are expressed as floats relative to the well height. For example, 0 represents the well bottom, 1 represents the well top (one well height above the bottom), and 2 represents twice the well height from the well bottom (one well height above the well top).
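The nesting and the relative positions can be illustrated with a short Python sketch. The sub-operation parameters shown are abbreviated, the refs "beads" and "wash" and the 10 mm well depth are made-up values, and both helper functions are hypothetical.

```python
def height_above_bottom(relative_position: float, well_depth_mm: float) -> float:
    """Convert a relative position (0 = well bottom, 1 = well top) to millimeters."""
    return relative_position * well_depth_mm


# A list of groups; each group is a list of sub-operations sharing one tip.
groups = [
    [
        {"collect": {"object": "beads", "cycles": 2,
                     "pause_duration": "30:second", "bottom_position": 0.0}},
        {"release": {"object": "wash", "duration": "15:second",
                     "center": 0.5, "amplitude": 0.5}},
    ],
    [
        {"dry": {"duration": "2:minute"}},
    ],
]


def execution_plan(groups: list) -> list:
    """Flatten groups into (tip number, sub-operation name) pairs, in order."""
    plan = []
    for tip_number, group in enumerate(groups, start=1):
        for sub_operation in group:
            name = next(iter(sub_operation))  # each dict holds one sub-operation
            plan.append((tip_number, name))
    return plan


print(execution_plan(groups))
print(height_above_bottom(2, 10.0))
```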

### Reagent Dispensing

Reagent dispensers rapidly and accurately dispense between 0.5 and 2500 microliters of a reagent to all wells of specified columns of a plate using a multi-channel dispensing cassette. The columns parameter maps the column number (indexed from 0) to the volume to be dispensed to that column. Columns not specified will receive no liquid. Reagent dispensing is often used for abundant and commonly used reagents such as water or LB broth.
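The column mapping can be expanded into a per-column volume list, with unspecified columns receiving nothing. The sketch below represents the mapping as a Python dict purely for illustration (the serialized form may differ), assumes a 12-column plate, and uses a hypothetical helper name.

```python
from decimal import Decimal


def expand_columns(columns: dict, plate_columns: int = 12) -> list:
    """Return the microliter volume dispensed to every column, 0 if unspecified."""
    volumes = [Decimal(0)] * plate_columns
    for column, measure in columns.items():
        value, unit = measure.split(":")
        if unit != "microliter":
            raise ValueError("this sketch only accepts microliter volumes")
        if not 0 <= column < plate_columns:
            raise ValueError(f"column {column} is not on the plate")
        # Range quoted in the text: 0.5 to 2500 microliters.
        if not Decimal("0.5") <= Decimal(value) <= Decimal(2500):
            raise ValueError(f"{measure} is outside the dispensable range")
        volumes[column] = Decimal(value)
    return volumes


print(expand_columns({0: "10:microliter", 3: "2.5:microliter"}))
```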

### Covers and Seals

Containers must be sealed or covered before storage or before performing various operations such as thermocycling, incubating, or centrifuging. The seal instruction requires a type of seal to be specified, while a cover instruction must specify a lid type.

For the most part, if a container has a seal or a cover, it must be unsealed or uncovered before performing any action on the aliquots within. However, some instructions may be run on covered or sealed containers.

### Sanger Sequencing

The sanger_sequence instruction details wells of a container to sequence using the Sanger chemistry. Wells specified should already contain the appropriate mix for sequencing as required by the vendor. If the type RCA is specified, the primer field is mandatory and should refer to a location containing sufficient primer for all sequencing reactions specified in the instruction. RCA also requires that the wells specified contain at least 15 microliters of bacteria suspended in media.

### Centrifugation

The centrifugation instruction spins a container at a given acceleration for a given duration. The available units of acceleration are g (a multiple of the standard gravitational acceleration at Earth's surface) and meter/second^2.
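The two acceleration units are related by standard gravity, 9.80665 m/s². As a small sketch, the hypothetical helper below normalizes either form of acceleration measure to meters per second squared.

```python
STANDARD_GRAVITY = 9.80665  # meters per second squared, by definition


def acceleration_in_m_per_s2(measure: str) -> float:
    """Convert an acceleration measure in g or meter/second^2 to m/s^2."""
    value, separator, unit = measure.partition(":")
    if not separator:
        raise ValueError(f"not a value:unit measure: {measure!r}")
    if unit == "g":
        return float(value) * STANDARD_GRAVITY
    if unit == "meter/second^2":
        return float(value)
    raise ValueError(f"unsupported acceleration unit: {unit!r}")


print(acceleration_in_m_per_s2("2000:g"))  # roughly 19613.3
```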

### Thermocycling

Thermocycling is typically associated with PCR, but more generally it refers to the process of taking a sample through a series of temperature steps.

Ordinary thermocyclers have an option to hold a final temperature, often around 4°C, "forever". Since protocols should automatically progress as soon as they are ready, a "forever" hold is unnecessary, and thus not available. However, it is usually a good idea to include a final step between 4°C and 10°C to cool down your sample after the final extension.

A step can either be at a constant temperature or have a gradient applied row-wise down the plate. Constant temperatures can range from 0°C to 100°C, and gradient temperatures can range from 30°C to 100°C. Additionally, the difference between the top and bottom temperatures in a gradient step must be between 1°C and 24°C, with the top parameter being the greater temperature. The resolution of temperatures and durations is 0.1°C and 1 second, respectively.
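The temperature limits above translate directly into a validation routine. The Python below is a sketch: the step layout (a temperature key for constant steps, a gradient key holding top and bottom for gradient steps) is an assumed representation, and check_step is a hypothetical name.

```python
from decimal import Decimal


def _celsius(measure: str) -> Decimal:
    """Read the numeric part of a celsius measure such as "95:celsius"."""
    value, unit = measure.split(":")
    if unit != "celsius":
        raise ValueError(f"expected a celsius measure, got {measure!r}")
    return Decimal(value)


def check_step(step: dict) -> None:
    """Raise ValueError if a thermocycle step violates the stated limits."""
    if "gradient" in step:
        top = _celsius(step["gradient"]["top"])
        bottom = _celsius(step["gradient"]["bottom"])
        if bottom < 30 or top > 100:
            raise ValueError("gradient temperatures must lie within 30-100 celsius")
        # Top must exceed bottom by at least 1 and at most 24 degrees.
        if not 1 <= top - bottom <= 24:
            raise ValueError("gradient span must be between 1 and 24 celsius")
    else:
        temperature = _celsius(step["temperature"])
        if not 0 <= temperature <= 100:
            raise ValueError("constant temperature must lie within 0-100 celsius")


check_step({"temperature": "95:celsius"})
check_step({"gradient": {"top": "54.1:celsius", "bottom": "30.1:celsius"}})
```

Decimal arithmetic keeps the 0.1°C resolution exact, so a span of exactly 24°C is not rejected because of rounding.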

The dyes parameter is optional. For steps without optical measurement, just omit the dyes parameter or include an empty array. If no step specifies dyes to use, the reaction may be run on a traditional (non-qPCR) thermocycler. If one or more steps includes a non-empty dyes parameter, the reaction will be run on a thermocycler with qPCR capabilities. The dyes object is a map from a selected fluorophore to the list of wells that contain it. Any of the steps of a thermocycle group may enable an optional boolean read parameter to signal a qPCR reading after its corresponding step. The default value for read is false.

An optional melting object may be used to specify a quantitative melting curve step at the end of the protocol. It will take its dye configuration from the top-level object. Its temperature increment parameter may take values from 0.1°C - 9.9°C. Note that a melt has an implicit 30 second hold at its beginning.

The volume parameter denotes the sample volume, which is used to estimate the sample temperature. This helps to maintain consistency between the intended and actual behavior of thermocycle instructions. The value of volume must be between 0 µL and 50 µL for 96-well plates and between 0 µL and 30 µL for 384-well plates, and it will be rounded to the nearest microliter.

Plates must be sealed using the seal instruction before any thermocycler instructions can be executed. Submitting a protocol that thermocycles an unsealed plate will yield an error message.

The dataref parameter is optional and only necessary if one or more of the specified groups contains a non-empty dyes parameter.

### Incubation

The incubate instruction stores a sample in an incubator with the appropriate settings for a given duration. This instruction is logically related to the store instruction for other storage locations (ambient, refrigerators, freezers), but is broken out on its own because of how much more configuration there is. While there's only one way to store a sample at -80°C, for incubation the temperature, the percentage of CO2, and whether to orbitally shake the sample are all configurable.

### Measure Property

To confirm or determine the properties of a sample, such as mass or volume, a set of measure instructions can be used, where the operation type determines the type of analysis.

#### Measure Mass

The measure_mass instruction can be used to determine the mass of a sample (container). The execution is vendor-specific and may or may not consume a fraction of the sample. The accuracy of results is vendor-specific.

#### Measure Volume

The measure_volume instruction can be used to determine the volume of a sample. The execution is vendor-specific and may or may not consume a fraction of the sample. The accuracy of results is vendor-specific.

#### Measure Concentration

The measure_concentration instruction can be used to determine the concentration of a sample. A vendor-specific minimum amount of the sample will be consumed. The volume parameter within the instruction indicates the amount of sample that will be used for quantification. The accuracy of results is vendor-specific.

### Spectrophotometry

Spectrophotometry instructions can optionally contain an incubate_before parameter, which allows for incubation for a duration, with the option of shaking, prior to plate measurements. They can also set a temperature to hold throughout the measurement.

#### Absorbance

Absorbance readings can generally be measured for wavelengths between ~200 nm and 1000 nm depending on the plate reader type.

The wells parameter is specified using short-form syntax, since a container object is already given. Each member of the wells array is simply the index, such as "A4" or "B7".

#### Fluorescence

Fluorescence excitation wavelengths can generally be measured between ~200 nm and 1000 nm, and emission can be detected between 250 nm and 900 nm (plate readers vary).

The wells parameter is specified using short-form syntax, since a container object is already given. Each member of the wells array is simply the index, such as "A4" or "B7".

#### Luminescence

The luminescence mode has no wavelength parameter and measures all photon emissions in the detectable wavelength range, generally 380 nm - 600 nm.

The wells parameter is specified using short-form syntax, since a container object is already given. Each member of the wells array is simply the index, such as "A4" or "B7".

### Gel Electrophoresis

The gel_separate instruction specifies wells for which to perform gel electrophoresis. Matrix (gel type) and ladder are specified in their respective fields, as well as a dataref or name for the resulting band size results. The volume parameter specifies the volume of liquid to be pipetted into each well of the matrix (gel).

### Spreading

The spread instruction details a volume of bacteria to be spread from the well in the from field to the well in the to field, which should contain agar. The instruction is intended to be used with a 1- or 6-well SBS microplate pre-filled with agar.

### Colony Picking

The autopick instruction tells an automated colony picker to pick at least min_colony_count colonies from the well specified in from to the well(s) specified in to. The wells in to are filled in the order specified until no more colonies are available or all of the wells specified are filled. If fewer than min_colony_count colonies are detected in the from well, the instruction fails. Optionally, a vendor can specify a dictionary of criteria for picking colonies according to the specification of the equipment they are using, for example {"color": "white"}.
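The fill-and-fail behavior can be sketched as a short simulation. The function below is hypothetical and takes the number of detected colonies as an input, since detection itself is up to the vendor's equipment.

```python
def assign_colonies(detected: int, min_colony_count: int, to_wells: list) -> list:
    """Return the destination wells that receive a colony, in the order given."""
    if detected < min_colony_count:
        # Too few colonies in the source well: the instruction fails.
        raise RuntimeError(
            f"only {detected} colonies detected, need at least {min_colony_count}"
        )
    # Fill wells in order until colonies or wells run out.
    return to_wells[:detected]


print(assign_colonies(2, 1, ["picks/A1", "picks/A2", "picks/A3"]))
```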

### Flow Cytometry

The flow_analyze instruction allows the specification of parameters used by a flow cytometry device. Parameters such as the excitation and emission spectra available and other instruction requirements such as captured_events are specified by the vendor.

### Flash Freeze

The flash_freeze instruction specifies a container and a duration to submerge that container in liquid nitrogen in order to flash freeze its contents.

### Oligosynthesize

The oligosynthesize instruction specifies a list of oligonucleotides to synthesize, the final destination of each synthesized oligo, and the scale and purification method of synthesis. Vendors determine what constitutes a valid sequence based on their method of synthesis. Typically, at least the IUPAC nucleotide codes are accepted: http://www.bioinformatics.org/sms/iupac.html
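A client could pre-check sequences against the IUPAC nucleotide alphabet before submitting them. The sketch below is illustrative only: the function name is hypothetical, gap characters are not accepted, and a vendor may accept a narrower or wider set of symbols.

```python
# Single-letter IUPAC nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDE_CODES = frozenset("ACGTURYSWKMBDHVN")


def is_iupac_sequence(sequence: str) -> bool:
    """Check that a non-empty sequence uses only IUPAC nucleotide codes."""
    return bool(sequence) and set(sequence.upper()) <= IUPAC_NUCLEOTIDE_CODES


print(is_iupac_sequence("ATGCRYN"))
print(is_iupac_sequence("ATGX"))
```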