Tutorial¶
Load some example data using ititer
and look at the first 10 rows:
import ititer as it
df = it.load_example_data()
df.head(10)
Sample |
OD |
Dilution |
---|---|---|
21-P0004-v001sr01 |
1.371 |
40 |
21-P0004-v001sr01 |
0.981 |
160 |
21-P0004-v001sr01 |
0.535 |
640 |
21-P0004-v001sr01 |
0.182 |
2560 |
21-P0004-v001sr01 |
0.064 |
10240 |
21-P0004-v001sr01 |
0.027 |
40960 |
21-P0004-v001sr01 |
0.015 |
163840 |
21-P0004-v001sr01 |
0.010 |
655360 |
21-P0034-v001sr01 |
0.948 |
40 |
21-P0034-v001sr01 |
0.452 |
160 |
Each row contains information about a single OD measurement; the sample, OD value and the dilution factor.
The measurements for the first sample, 21-P0004-v001sr01
are in the first 8 rows, followed by those for 21-P0034-v001sr01
.
This is known as long format data.
Wide format data¶
If your data is in wide format,
for instance perhaps each row contains OD values at different dilutions for a single sample ,
then use pandas.DataFrame.melt()
to generate a long format DataFrame.
Example wide format data:
Sample |
40 |
160 |
640 |
2560 |
10240 |
40960 |
163840 |
655360 |
---|---|---|---|---|---|---|---|---|
21-P0004-v001sr01 |
1.371 |
0.981 |
0.535 |
0.182 |
0.064 |
0.027 |
0.015 |
0.01 |
21-P0034-v001sr01 |
0.948 |
0.452 |
0.185 |
0.043 |
0.016 |
0.004 |
0.002 |
-0.001 |
21-P0050-v001sr01 |
1.418 |
1.253 |
0.972 |
0.393 |
0.152 |
0.049 |
0.018 |
0.011 |
For a DataFrame exactly like this the call to melt
would be:
df.reset_index().melt(id_vars="Sample", var_name="Dilution", value_name="OD")
Log transform dilutions¶
Next, we need to transform the dilutions (which increase logarithmically) to values that increase linearly. This will mean that when the data are plotted, the values will be evenly spaced on the x-axis.
There is a helper function for doing this called titer_to_index()
.
The titer
argument is the dilutions, start
is the first dilution in the dilution series, and fold
sets the fold change in concentration at each step in the dilution series.
We can call the function and store the returned value as a new column in our DataFrame:
df["Log Dilution"] = it.titer_to_index(titer=df["Dilution"], start=40, fold=4)
Fit curves¶
We are now ready to fit sigmoid curves for each sample.
We want partial pooling between samples for inference of the a
parameter to give us one estimate of a
for each sample.
We know a priori that the response will be 0 at a theoretical infinite dilution, so we can set c=0
.
We will use full pooling for b
and d
because we expect the gradient of the sigmoid curve (b
) and its height above baseline (d
) to be the same across samples.
We make a new Sigmoid
object that has these properties:
sigmoid = it.Sigmoid(a="partial", b="full", c=0, d="full")
Now we call the fit()
method to infer the posterior distributions of the model parameters, and supply the data from our long format DataFrame:
sigmoid = sigmoid.fit(
log_dilution=df["Log Dilution"], response=df["OD"], sample_labels=df["Sample"]
)
Various messages will print displaying information about the sampling of the posterior distribution.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, d, b, a, sigma_a, mu_a]
Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 25 seconds.]
Visualize curves¶
It is generally a good idea to visualize the model fits.
To inspect an individual sample of interest use the plot_sample()
method, and pass it the sample name you want to see.
By default this method shows a selection of sigmoid curves from the posterior distribution.
Above, fit()
took 10,000 samples from the posterior distribution.
Here, step=1000
means that every 1,000th sample will be shown, resulting in 10,000 / 1000 = 10 lines in total.
sigmoid.plot_sample("21-P0004-v001sr01", step=1000)
Looking at samples from the posterior distribution tells you how confident the model is in the model fit. Sparser data or data that aren’t well arranged in a sigmoid curve will yield more dispersed lines.
You can take the mean value of each parameter from the posterior distribution and plot the resulting sigmoid curve by passing mean=True
:
sigmoid.plot_sample("21-P0004-v001sr01", mean=True)
To visualize multiple samples at once, pass a list of sample names to plot_samples()
:
sigmoid.plot_samples(["21-P0833-v001sr01", "21-P0834-v001sr01"])
Or, to show all samples use plot_all_samples()
:
sigmoid.plot_all_samples()
See the matplotlib
documentation for help on customizing and saving figures.
Inflection titers¶
The degree to which a sigmoid curve is shifted left or right on the x-axis is often the point of interest to compare between samples.
This is described by the inflection point of the curve, calculated by inflections()
:
df_inflections = sigmoid.inflections(hdi_prob=0.95)
df_inflections.head().round(2)
sample |
mean |
median |
hdi low |
hdi high |
---|---|---|---|---|
21-P0425-v001sr01 |
0.91 |
0.91 |
0.78 |
1.04 |
21-P0917-v001sr01 |
1.82 |
1.82 |
1.7 |
1.96 |
21-P0796-v001sr01 |
2.51 |
2.51 |
2.39 |
2.64 |
21-P0680-v001sr01 |
2.04 |
2.04 |
1.91 |
2.17 |
21-P0800-v001sr01 |
4.47 |
4.47 |
4.35 |
4.6 |
hdi low
and hdi high
refer to the low and high boundary of the Highest Density Interval (HDI).
An HDI is the narrowest set of parameter values that contain a certain mass of the posterior probability density - it is a type of confidence interval for a parameter.
Here, we specified an HDI probability of 0.95 (which is also the default value for the inflections()
method).
Note, there is nothing particularly special about a value of 0.95
Values in this DataFrame are on the log dilution scale; i.e. they tell you the position in the dilution series of the inflection point.
To get values on the dilution scale use index_to_titer()
:
df_inflection_titers = it.index_to_titer(df_inflections, start=40, fold=4)
df_inflection_titers.head().round(2)
sample |
mean |
median |
hdi low |
hdi high |
---|---|---|---|---|
21-P0425-v001sr01 |
141.43 |
141.58 |
117.89 |
169.98 |
21-P0917-v001sr01 |
501.36 |
501.69 |
422.53 |
601.65 |
21-P0796-v001sr01 |
1294.1 |
1294.03 |
1102.35 |
1544.14 |
21-P0680-v001sr01 |
676.47 |
676.82 |
563.92 |
807.78 |
21-P0800-v001sr01 |
19699.43 |
19744.58 |
16530.67 |
23644.44 |
Endpoint titers¶
Endpoint titers can also be computed.
An endpoint titer is the dilution at which the response drops below a certain value, known as the cut-off.
Choice of cut-off is somewhat arbitrary, but is usually some low absolute value, or a low proportion of the maximal response.
Use endpoints()
to compute endpoints:
df_endpoints = sigmoid.endpoints(cutoff_proportion=0.1, hdi_prob=0.95)
df_endpoints.head().round(2)
sample |
mean |
median |
hdi low |
hdi high |
---|---|---|---|---|
21-P0425-v001sr01 |
2.94 |
2.94 |
2.80 |
3.08 |
21-P0917-v001sr01 |
3.85 |
3.85 |
3.72 |
3.99 |
21-P0796-v001sr01 |
4.54 |
4.54 |
4.41 |
4.66 |
21-P0680-v001sr01 |
4.07 |
4.07 |
3.93 |
4.20 |
21-P0800-v001sr01 |
6.50 |
6.50 |
6.37 |
6.63 |
Like inflection points, the values in this DataFrame are on the log dilution scale.
Use index_to_titer()
to put them on the dilution scale.