Tutorial

Load some example data using ititer and look at the first 10 rows:

import ititer as it
df = it.load_example_data()
df.head(10)

Sample

OD

Dilution

21-P0004-v001sr01

1.371

40

21-P0004-v001sr01

0.981

160

21-P0004-v001sr01

0.535

640

21-P0004-v001sr01

0.182

2560

21-P0004-v001sr01

0.064

10240

21-P0004-v001sr01

0.027

40960

21-P0004-v001sr01

0.015

163840

21-P0004-v001sr01

0.010

655360

21-P0034-v001sr01

0.948

40

21-P0034-v001sr01

0.452

160

Each row contains information about a single OD measurement; the sample, OD value and the dilution factor. The measurements for the first sample, 21-P0004-v001sr01 are in the first 8 rows, followed by those for 21-P0034-v001sr01.

This is known as long format data.

Wide format data

If your data is in wide format, for instance perhaps each row contains OD values at different dilutions for a single sample , then use pandas.DataFrame.melt() to generate a long format DataFrame.

Example wide format data:

Sample

40

160

640

2560

10240

40960

163840

655360

21-P0004-v001sr01

1.371

0.981

0.535

0.182

0.064

0.027

0.015

0.01

21-P0034-v001sr01

0.948

0.452

0.185

0.043

0.016

0.004

0.002

-0.001

21-P0050-v001sr01

1.418

1.253

0.972

0.393

0.152

0.049

0.018

0.011

For a DataFrame exactly like this the call to melt would be:

df.reset_index().melt(id_vars="Sample", var_name="Dilution", value_name="OD")

Log transform dilutions

Next, we need to transform the dilutions (which increase logarithmically) to values that increase linearly. This will mean that when the data are plotted, the values will be evenly spaced on the x-axis.

There is a helper function for doing this called titer_to_index(). The titer argument is the dilutions, start is the first dilution in the dilution series, and fold sets the fold change in concentration at each step in the dilution series. We can call the function and store the returned value as a new column in our DataFrame:

df["Log Dilution"] = it.titer_to_index(titer=df["Dilution"], start=40, fold=4)

Fit curves

We are now ready to fit sigmoid curves for each sample.

We want partial pooling between samples for inference of the a parameter to give us one estimate of a for each sample. We know a priori that the response will be 0 at a theoretical infinite dilution, so we can set c=0. We will use full pooling for b and d because we expect the gradient of the sigmoid curve (b) and its height above baseline (d) to be the same across samples.

We make a new Sigmoid object that has these properties:

sigmoid = it.Sigmoid(a="partial", b="full", c=0, d="full")

Now we call the fit() method to infer the posterior distributions of the model parameters, and supply the data from our long format DataFrame:

sigmoid = sigmoid.fit(
    log_dilution=df["Log Dilution"], response=df["OD"], sample_labels=df["Sample"]
)

Various messages will print displaying information about the sampling of the posterior distribution.

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, d, b, a, sigma_a, mu_a]
Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 25 seconds.]

Visualize curves

It is generally a good idea to visualize the model fits. To inspect an individual sample of interest use the plot_sample() method, and pass it the sample name you want to see. By default this method shows a selection of sigmoid curves from the posterior distribution. Above, fit() took 10,000 samples from the posterior distribution. Here, step=1000 means that every 1,000th sample will be shown, resulting in 10,000 / 1000 = 10 lines in total.

sigmoid.plot_sample("21-P0004-v001sr01", step=1000)
_images/1-sample.png

Looking at samples from the posterior distribution tells you how confident the model is in the model fit. Sparser data or data that aren’t well arranged in a sigmoid curve will yield more dispersed lines.

You can take the mean value of each parameter from the posterior distribution and plot the resulting sigmoid curve by passing mean=True:

sigmoid.plot_sample("21-P0004-v001sr01", mean=True)
_images/1-sample-mean.png

To visualize multiple samples at once, pass a list of sample names to plot_samples():

sigmoid.plot_samples(["21-P0833-v001sr01", "21-P0834-v001sr01"])
_images/2-samples.png

Or, to show all samples use plot_all_samples():

sigmoid.plot_all_samples()
_images/all-samples.png

See the matplotlib documentation for help on customizing and saving figures.

Inflection titers

The degree to which a sigmoid curve is shifted left or right on the x-axis is often the point of interest to compare between samples. This is described by the inflection point of the curve, calculated by inflections():

df_inflections = sigmoid.inflections(hdi_prob=0.95)
df_inflections.head().round(2)

sample

mean

median

hdi low

hdi high

21-P0425-v001sr01

0.91

0.91

0.78

1.04

21-P0917-v001sr01

1.82

1.82

1.7

1.96

21-P0796-v001sr01

2.51

2.51

2.39

2.64

21-P0680-v001sr01

2.04

2.04

1.91

2.17

21-P0800-v001sr01

4.47

4.47

4.35

4.6

hdi low and hdi high refer to the low and high boundary of the Highest Density Interval (HDI). An HDI is the narrowest set of parameter values that contain a certain mass of the posterior probability density - it is a type of confidence interval for a parameter. Here, we specified an HDI probability of 0.95 (which is also the default value for the inflections() method). Note, there is nothing particularly special about a value of 0.95

Values in this DataFrame are on the log dilution scale; i.e. they tell you the position in the dilution series of the inflection point. To get values on the dilution scale use index_to_titer():

df_inflection_titers = it.index_to_titer(df_inflections, start=40, fold=4)
df_inflection_titers.head().round(2)

sample

mean

median

hdi low

hdi high

21-P0425-v001sr01

141.43

141.58

117.89

169.98

21-P0917-v001sr01

501.36

501.69

422.53

601.65

21-P0796-v001sr01

1294.1

1294.03

1102.35

1544.14

21-P0680-v001sr01

676.47

676.82

563.92

807.78

21-P0800-v001sr01

19699.43

19744.58

16530.67

23644.44

Endpoint titers

Endpoint titers can also be computed. An endpoint titer is the dilution at which the response drops below a certain value, known as the cut-off. Choice of cut-off is somewhat arbitrary, but is usually some low absolute value, or a low proportion of the maximal response. Use endpoints() to compute endpoints:

df_endpoints = sigmoid.endpoints(cutoff_proportion=0.1, hdi_prob=0.95)
df_endpoints.head().round(2)

sample

mean

median

hdi low

hdi high

21-P0425-v001sr01

2.94

2.94

2.80

3.08

21-P0917-v001sr01

3.85

3.85

3.72

3.99

21-P0796-v001sr01

4.54

4.54

4.41

4.66

21-P0680-v001sr01

4.07

4.07

3.93

4.20

21-P0800-v001sr01

6.50

6.50

6.37

6.63

Like inflection points, the values in this DataFrame are on the log dilution scale. Use index_to_titer() to put them on the dilution scale.