Tutorial
========

Load some example data using :py:mod:`ititer` and look at the first 10 rows:

.. code-block:: python

    import ititer as it
    df = it.load_example_data()
    df.head(10)


.. csv-table::
   :header: Sample,OD,Dilution
   :widths: 20, 10, 10

   21-P0004-v001sr01,1.371,40
   21-P0004-v001sr01,0.981,160
   21-P0004-v001sr01,0.535,640
   21-P0004-v001sr01,0.182,2560
   21-P0004-v001sr01,0.064,10240
   21-P0004-v001sr01,0.027,40960
   21-P0004-v001sr01,0.015,163840
   21-P0004-v001sr01,0.010,655360
   21-P0034-v001sr01,0.948,40
   21-P0034-v001sr01,0.452,160

Each row contains information about a single OD measurement; the sample, OD value and the dilution factor.
The measurements for the first sample, ``21-P0004-v001sr01`` are in the first 8 rows, followed by those for ``21-P0034-v001sr01``.

This is known as *long format* data.

Wide format data
----------------

If your data is in *wide format*,
for instance perhaps each row contains OD values at different dilutions for a single sample ,
then use :py:meth:`pandas.DataFrame.melt` to generate a long format DataFrame.

Example wide format data:

.. csv-table::
   :header: Sample,40,160,640,2560,10240,40960,163840,655360
   :widths: 20, 10, 10, 10, 10, 10, 10, 10, 10

   21-P0004-v001sr01,1.371,0.981,0.535,0.182,0.064,0.027,0.015,0.01
   21-P0034-v001sr01,0.948,0.452,0.185,0.043,0.016,0.004,0.002,-0.001
   21-P0050-v001sr01,1.418,1.253,0.972,0.393,0.152,0.049,0.018,0.011

For a DataFrame exactly like this the  call to ``melt`` would be:

.. code-block:: python

    df.reset_index().melt(id_vars="Sample", var_name="Dilution", value_name="OD")

Log transform dilutions
-----------------------

Next, we need to transform the dilutions (which increase logarithmically) to values that increase linearly.
This will mean that when the data are plotted, the values will be evenly spaced on the x-axis.

There is a helper function for doing this called :py:func:`~.ititer.titer_to_index`.
The ``titer`` argument is the dilutions, ``start`` is the first dilution in the dilution series, and ``fold`` sets the fold change in concentration at each step in the dilution series.
We can call the function and store the returned value as a new column in our DataFrame:

.. code-block:: python

    df["Log Dilution"] = it.titer_to_index(titer=df["Dilution"], start=40, fold=4)


Fit curves
----------

We are now ready to fit sigmoid curves for each sample.

We want partial pooling between samples for inference of the ``a`` parameter to give us one estimate of ``a`` for each sample.
We know *a priori* that the response will be 0 at a theoretical infinite dilution, so we can set ``c=0``.
We will use full pooling for ``b`` and ``d`` because we expect the gradient of the sigmoid curve (``b``) and its height above baseline (``d``) to be the same across samples.

We make a new :py:class:`~.ititer.Sigmoid` object that has these properties:

.. code-block:: python

    sigmoid = it.Sigmoid(a="partial", b="full", c=0, d="full")


Now we call the :py:meth:`~.ititer.Sigmoid.fit` method to infer the posterior distributions of the model parameters, and supply the data from our long format DataFrame:

.. code-block:: python

    sigmoid = sigmoid.fit(
        log_dilution=df["Log Dilution"], response=df["OD"], sample_labels=df["Sample"]
    )

Various messages will print displaying information about the sampling of the posterior distribution.

.. code-block:: text

    Auto-assigning NUTS sampler...
    Initializing NUTS using jitter+adapt_diag...
    Multiprocess sampling (4 chains in 4 jobs)
    NUTS: [sigma, d, b, a, sigma_a, mu_a]
    Sampling 4 chains for 1_000 tune and 10_000 draw iterations (4_000 + 40_000 draws total) took 25 seconds.]

Visualize curves
----------------

It is generally a good idea to visualize the model fits.
To inspect an individual sample of interest use the :py:meth:`~.ititer.Sigmoid.plot_sample` method, and pass it the sample name you want to see.
By default this method shows a selection of sigmoid curves from the posterior distribution.
Above, :py:meth:`~.ititer.Sigmoid.fit` took 10,000 samples from the posterior distribution.
Here, ``step=1000`` means that every 1,000\ :sup:`th` sample will be shown, resulting in 10,000 / 1000 = 10 lines in total.

.. code-block:: python

    sigmoid.plot_sample("21-P0004-v001sr01", step=1000)

.. image:: 1-sample.png

Looking at samples from the posterior distribution tells you how confident the model is in the model fit.
Sparser data or data that aren't well arranged in a sigmoid curve will yield more dispersed lines.

You can take the mean value of each parameter from the posterior distribution and plot the resulting sigmoid curve by passing ``mean=True``:

.. code-block:: python

    sigmoid.plot_sample("21-P0004-v001sr01", mean=True)

.. image:: 1-sample-mean.png

To visualize multiple samples at once, pass a list of sample names to :py:meth:`~.ititer.Sigmoid.plot_samples`:

.. code-block:: python

    sigmoid.plot_samples(["21-P0833-v001sr01", "21-P0834-v001sr01"])

.. image:: 2-samples.png

Or, to show all samples use :py:meth:`~.ititer.Sigmoid.plot_all_samples`:

.. code-block:: python

    sigmoid.plot_all_samples()

.. image:: all-samples.png

See the :py:mod:`matplotlib` documentation for help on customizing and saving figures.

Inflection titers
-----------------

The degree to which a sigmoid curve is shifted left or right on the x-axis is often the point of interest to compare between samples.
This is described by the inflection point of the curve, calculated by :py:meth:`~.ititer.Sigmoid.inflections`:

.. code-block:: python

    df_inflections = sigmoid.inflections(hdi_prob=0.95)
    df_inflections.head().round(2)

.. csv-table::
    :header: sample,mean,median,hdi low,hdi high
    :widths: 20, 12, 12, 12, 12

    21-P0425-v001sr01,0.91,0.91,0.78,1.04
    21-P0917-v001sr01,1.82,1.82,1.7,1.96
    21-P0796-v001sr01,2.51,2.51,2.39,2.64
    21-P0680-v001sr01,2.04,2.04,1.91,2.17
    21-P0800-v001sr01,4.47,4.47,4.35,4.6

``hdi low`` and ``hdi high`` refer to the low and high boundary of the Highest Density Interval (HDI).
An HDI is the narrowest set of parameter values that contain a certain mass of the posterior probability density - it is a type of confidence interval for a parameter.
Here, we specified an HDI probability of 0.95 (which is also the default value for the :py:meth:`~.ititer.Sigmoid.inflections` method).
Note, there is nothing particularly special about a value of 0.95

Values in this DataFrame are on the log dilution scale; i.e. they tell you the position in the dilution series of the inflection point.
To get values on the dilution scale use :py:func:`~.ititer.index_to_titer`:

.. code-block:: python

    df_inflection_titers = it.index_to_titer(df_inflections, start=40, fold=4)
    df_inflection_titers.head().round(2)

.. csv-table::
    :header: sample,mean,median,hdi low,hdi high
    :widths: 20, 12, 12, 12, 12

    21-P0425-v001sr01,141.43,141.58,117.89,169.98
    21-P0917-v001sr01,501.36,501.69,422.53,601.65
    21-P0796-v001sr01,1294.1,1294.03,1102.35,1544.14
    21-P0680-v001sr01,676.47,676.82,563.92,807.78
    21-P0800-v001sr01,19699.43,19744.58,16530.67,23644.44

Endpoint titers
---------------

Endpoint titers can also be computed.
An endpoint titer is the dilution at which the response drops below a certain value, known as the cut-off.
Choice of cut-off is somewhat arbitrary, but is usually some low absolute value, or a low proportion of the maximal response.
Use :py:meth:`~.ititer.Sigmoid.endpoints` to compute endpoints:

.. code-block:: python

    df_endpoints = sigmoid.endpoints(cutoff_proportion=0.1, hdi_prob=0.95)
    df_endpoints.head().round(2)

.. csv-table::
    :header: sample,mean,median,hdi low,hdi high
    :widths: 20, 12, 12, 12, 12

    21-P0425-v001sr01,2.94,2.94,2.80,3.08
    21-P0917-v001sr01,3.85,3.85,3.72,3.99
    21-P0796-v001sr01,4.54,4.54,4.41,4.66
    21-P0680-v001sr01,4.07,4.07,3.93,4.20
    21-P0800-v001sr01,6.50,6.50,6.37,6.63

Like inflection points, the values in this DataFrame are on the log dilution scale.
Use :py:func:`~.ititer.index_to_titer` to put them on the dilution scale.