Tutorial

Load some example data using ititer and look at the first 10 rows:

import ititer as it
df = it.load_example_data()
df.head(10)

Sample	OD	Dilution
21-P0004-v001sr01	1.371	40
21-P0004-v001sr01	0.981	160
21-P0004-v001sr01	0.535	640
21-P0004-v001sr01	0.182	2560
21-P0004-v001sr01	0.064	10240
21-P0004-v001sr01	0.027	40960
21-P0004-v001sr01	0.015	163840
21-P0004-v001sr01	0.010	655360
21-P0034-v001sr01	0.948	40
21-P0034-v001sr01	0.452	160

Each row contains information about a single OD measurement: the sample, the OD value and the dilution factor. The measurements for the first sample, 21-P0004-v001sr01, are in the first 8 rows, followed by those for 21-P0034-v001sr01.

This is known as long format data.

Wide format data

If your data is in wide format, for instance where each row contains OD values at different dilutions for a single sample, then use pandas.DataFrame.melt() to generate a long format DataFrame.

Example wide format data:

Sample	40	160	640	2560	10240	40960	163840	655360
21-P0004-v001sr01	1.371	0.981	0.535	0.182	0.064	0.027	0.015	0.01
21-P0034-v001sr01	0.948	0.452	0.185	0.043	0.016	0.004	0.002	-0.001
21-P0050-v001sr01	1.418	1.253	0.972	0.393	0.152	0.049	0.018	0.011

For a DataFrame exactly like this the call to melt would be:

df.reset_index().melt(id_vars="Sample", var_name="Dilution", value_name="OD")
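As a self-contained illustration, the sketch below rebuilds the wide table above by hand and melts it. It assumes that Sample is the index of the wide DataFrame (which is why reset_index() is called first); the names df_wide and df_long are hypothetical. Column labels read from a file are often strings, so the sketch also converts the melted Dilution column to numbers before sorting.

```python
import pandas as pd

# Wide format: one row per sample, one column per dilution (values from the table above).
dilutions = [40, 160, 640, 2560, 10240, 40960, 163840, 655360]
df_wide = pd.DataFrame(
    [
        [1.371, 0.981, 0.535, 0.182, 0.064, 0.027, 0.015, 0.010],
        [0.948, 0.452, 0.185, 0.043, 0.016, 0.004, 0.002, -0.001],
        [1.418, 1.253, 0.972, 0.393, 0.152, 0.049, 0.018, 0.011],
    ],
    index=pd.Index(
        ["21-P0004-v001sr01", "21-P0034-v001sr01", "21-P0050-v001sr01"],
        name="Sample",
    ),
    columns=dilutions,
)

# Move Sample out of the index, then stack the dilution columns into rows.
df_long = df_wide.reset_index().melt(
    id_vars="Sample", var_name="Dilution", value_name="OD"
)

# Make sure dilutions are numeric, then order rows by sample and dilution.
df_long["Dilution"] = pd.to_numeric(df_long["Dilution"])
df_long = df_long.sort_values(["Sample", "Dilution"]).reset_index(drop=True)
```

The result has one row per measurement (3 samples x 8 dilutions), in the same layout as the example data loaded at the start of the tutorial.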

Log transform dilutions

Next, we need to transform the dilutions (which increase geometrically, by a constant fold change at each step) to values that increase linearly. This means that when the data are plotted, the values are evenly spaced on the x-axis.

There is a helper function for doing this called titer_to_index(). The titer argument is the dilutions, start is the first dilution in the dilution series, and fold sets the fold change in concentration at each step in the dilution series. We can call the function and store the returned value as a new column in our DataFrame:

df["Log Dilution"] = it.titer_to_index(titer=df["Dilution"], start=40, fold=4)
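To make the transformation concrete, here is a minimal sketch of the arithmetic it is assumed to perform: the position of a dilution in the series is the base-fold logarithm of the dilution divided by the starting dilution, so that the first dilution maps to 0. The function titer_to_index_sketch is hypothetical and is not the library's implementation, although the formula is consistent with the inflection values shown later in this tutorial.

```python
import math


def titer_to_index_sketch(titer, start, fold):
    """Position of `titer` in a dilution series that begins at `start`
    and is diluted `fold` times at each step (assumed formula)."""
    return math.log(titer / start) / math.log(fold)


dilutions = [40, 160, 640, 2560, 10240, 40960, 163840, 655360]
indices = [titer_to_index_sketch(t, start=40, fold=4) for t in dilutions]
# Under this assumption the eight dilutions map to evenly spaced
# positions 0, 1, ..., 7 (up to floating point error).
```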

Fit curves

We are now ready to fit sigmoid curves for each sample.

We want partial pooling between samples for inference of the a parameter to give us one estimate of a for each sample. We know a priori that the response will be 0 at a theoretical infinite dilution, so we can set c=0. We will use full pooling for b and d because we expect the gradient of the sigmoid curve (b) and its height above baseline (d) to be the same across samples.

We make a new Sigmoid object that has these properties:

sigmoid = it.Sigmoid(a="partial", b="full", c=0, d="full")
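For orientation, the sketch below shows one common four-parameter logistic curve that matches the description of a, b, c and d given above. The exact functional form and sign convention used by ititer are not stated in this tutorial, so treat sigmoid_sketch and the parameter values as assumptions for illustration only.

```python
import math


def sigmoid_sketch(x, a, b, c, d):
    """Assumed four-parameter logistic curve.

    a: inflection point on the log dilution axis
    b: gradient (steepness) of the curve
    c: response at a theoretical infinite dilution (baseline)
    d: height of the curve above the baseline
    """
    return c + d / (1.0 + math.exp(b * (x - a)))


# Hypothetical parameter values, with c fixed at 0 as in the model above.
a, b, c, d = 2.0, 1.0, 0.0, 1.5

at_inflection = sigmoid_sketch(a, a, b, c, d)   # halfway up: c + d / 2
far_right = sigmoid_sketch(50.0, a, b, c, d)    # approaches the baseline c
far_left = sigmoid_sketch(-50.0, a, b, c, d)    # approaches c + d
```

In this form, changing a slides the whole curve left or right without altering its shape, which is why a is the parameter estimated separately for each sample.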

Now we call the fit() method to infer the posterior distributions of the model parameters, and supply the data from our long format DataFrame:

sigmoid = sigmoid.fit(
    log_dilution=df["Log Dilution"], response=df["OD"], sample_labels=df["Sample"]
)

Messages describing the sampling of the posterior distribution will be printed:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, d, b, a, sigma_a, mu_a]
Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 25 seconds.

Visualize curves

It is generally a good idea to visualize the model fits. To inspect an individual sample of interest, use the plot_sample() method and pass it the name of the sample you want to see. By default this method shows a selection of sigmoid curves from the posterior distribution. Above, fit() took 10,000 samples from the posterior distribution. Here, step=1000 means that every 1,000th sample will be shown, resulting in 10,000 / 1,000 = 10 lines in total.

sigmoid.plot_sample("21-P0004-v001sr01", step=1000)
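The line count follows from simple stride arithmetic. The sketch below assumes the draws are thinned with a regular stride starting from the first draw; how ititer selects draws internally is not shown here, so this only illustrates the arithmetic.

```python
n_draws = 10_000  # posterior draws, as stated above
step = 1000       # keep every 1,000th draw

# Indices of the draws that would be plotted under a regular stride.
shown = list(range(0, n_draws, step))
n_lines = len(shown)  # 10
```

A smaller step shows more curves and gives a denser picture of the posterior, at the cost of a slower and busier plot.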

Looking at samples from the posterior distribution tells you how confident the model is in the model fit. Sparser data or data that aren’t well arranged in a sigmoid curve will yield more dispersed lines.

You can take the mean value of each parameter from the posterior distribution and plot the resulting sigmoid curve by passing mean=True:

sigmoid.plot_sample("21-P0004-v001sr01", mean=True)

To visualize multiple samples at once, pass a list of sample names to plot_samples():

sigmoid.plot_samples(["21-P0833-v001sr01", "21-P0834-v001sr01"])

Or, to show all samples use plot_all_samples():

sigmoid.plot_all_samples()

See the matplotlib documentation for help on customizing and saving figures.

Inflection titers

The degree to which a sigmoid curve is shifted left or right on the x-axis is often the point of interest to compare between samples. This is described by the inflection point of the curve, calculated by inflections():

df_inflections = sigmoid.inflections(hdi_prob=0.95)
df_inflections.head().round(2)

sample	mean	median	hdi low	hdi high
21-P0425-v001sr01	0.91	0.91	0.78	1.04
21-P0917-v001sr01	1.82	1.82	1.7	1.96
21-P0796-v001sr01	2.51	2.51	2.39	2.64
21-P0680-v001sr01	2.04	2.04	1.91	2.17
21-P0800-v001sr01	4.47	4.47	4.35	4.6

hdi low and hdi high refer to the lower and upper boundaries of the Highest Density Interval (HDI). An HDI is the narrowest set of parameter values that contains a given mass of the posterior probability density; it is a type of credible interval for a parameter. Here, we specified an HDI probability of 0.95 (which is also the default value for the inflections() method). Note that there is nothing particularly special about a value of 0.95.
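To illustrate the definition, the sketch below computes an HDI from a list of posterior draws by sorting them and sliding a window that holds the required fraction of draws, keeping the narrowest window. This is a standard approach for unimodal posteriors and is not necessarily how ititer computes its intervals. The function hdi_sketch and the simulated draws are hypothetical.

```python
import math
import random


def hdi_sketch(samples, prob=0.95):
    """Narrowest interval containing at least `prob` of the samples.

    Assumes a unimodal distribution, so the HDI is a single interval.
    """
    ordered = sorted(samples)
    n = len(ordered)
    window = math.ceil(prob * n)  # number of draws the interval must contain
    # Compare every run of `window` consecutive sorted draws; keep the narrowest.
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]


# Simulated draws standing in for the posterior of one sample's inflection point.
random.seed(0)
draws = [random.gauss(0.91, 0.065) for _ in range(10_000)]
low, high = hdi_sketch(draws, prob=0.95)
```

Raising prob widens the interval and lowering it narrows the interval, which is why the choice of 0.95 is a convention rather than a requirement.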

Values in this DataFrame are on the log dilution scale; i.e. they tell you the position in the dilution series of the inflection point. To get values on the dilution scale use index_to_titer():

df_inflection_titers = it.index_to_titer(df_inflections, start=40, fold=4)
df_inflection_titers.head().round(2)

sample	mean	median	hdi low	hdi high
21-P0425-v001sr01	141.43	141.58	117.89	169.98
21-P0917-v001sr01	501.36	501.69	422.53	601.65
21-P0796-v001sr01	1294.1	1294.03	1102.35	1544.14
21-P0680-v001sr01	676.47	676.82	563.92	807.78
21-P0800-v001sr01	19699.43	19744.58	16530.67	23644.44
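The back-transformation is assumed to be the inverse of the log transform used earlier: the starting dilution multiplied by the fold change raised to the index. The sketch below applies that assumed formula to the rounded values in the first row of the inflection table. The function index_to_titer_sketch is hypothetical and is not the library's implementation.

```python
def index_to_titer_sketch(index, start, fold):
    """Dilution at position `index` in a series that begins at `start`
    and is diluted `fold` times at each step (assumed formula)."""
    return start * fold ** index


# Rounded log dilution values for 21-P0425-v001sr01 from the inflection table.
row = {"mean": 0.91, "hdi low": 0.78, "hdi high": 1.04}
titers = {
    name: index_to_titer_sketch(value, start=40, fold=4)
    for name, value in row.items()
}
```

The results land close to the first row of the titer table. Small differences are expected because the inputs here were rounded to two decimal places before transforming.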

Endpoint titers

Endpoint titers can also be computed. An endpoint titer is the dilution at which the response drops below a certain value, known as the cut-off. Choice of cut-off is somewhat arbitrary, but is usually some low absolute value, or a low proportion of the maximal response. Use endpoints() to compute endpoints:

df_endpoints = sigmoid.endpoints(cutoff_proportion=0.1, hdi_prob=0.95)
df_endpoints.head().round(2)

sample	mean	median	hdi low	hdi high
21-P0425-v001sr01	2.94	2.94	2.80	3.08
21-P0917-v001sr01	3.85	3.85	3.72	3.99
21-P0796-v001sr01	4.54	4.54	4.41	4.66
21-P0680-v001sr01	4.07	4.07	3.93	4.20
21-P0800-v001sr01	6.50	6.50	6.37	6.63

Like inflection points, the values in this DataFrame are on the log dilution scale. Use index_to_titer() to put them on the dilution scale.
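As a worked example of where an endpoint sits relative to the inflection point, the sketch below assumes a logistic curve of the form y = c + d / (1 + exp(b(x - a))) with c = 0, and a cut-off defined as a proportion p of the maximal response d. Solving p·d = d / (1 + exp(b(x - a))) for x gives x = a + ln(1/p - 1) / b. The functional form is an assumption, and the value of b below is back-calculated from the rounded tables in this tutorial rather than taken from a fitted model.

```python
import math


def endpoint_sketch(a, b, cutoff_proportion):
    """Log dilution at which an assumed c=0 logistic curve falls to
    cutoff_proportion * d, given inflection point a and gradient b."""
    return a + math.log(1.0 / cutoff_proportion - 1.0) / b


# In the tables above, each endpoint is about 2.03 log dilution units to the
# right of the matching inflection point (for example 2.94 - 0.91).
offset = 2.03
b = math.log(1.0 / 0.1 - 1.0) / offset  # gradient implied by that offset

inflections = [0.91, 1.82, 2.51, 2.04, 4.47]  # means from the inflection table
endpoints = [endpoint_sketch(a, b, cutoff_proportion=0.1) for a in inflections]
```

Because b is fully pooled in this model, the offset between inflection point and endpoint is the same for every sample under this assumed form, so ranking samples by endpoint gives the same order as ranking them by inflection point.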

ititer 0.1.4 documentation