Pymob in minutes - the basics#

This guide provides a streamlined introduction to the basic Pymob workflow and its key functionalities.
We will explore a simple linear regression model that we want to fit to a noisy dataset.
Pymob supports the modeling process by providing several tools for data structuring, parameter estimation and visualization of results.

If you are looking for a more detailed introduction, click here.
If you want to learn how to work with ODE models, check out this tutorial.

Pymob components 🧩#

Before starting the modeling process, let’s take a look at the main steps and modules of pymob:

  1. Simulation:
    First, we need to initialize a Simulation object by creating an instance of the pymob.simulation.SimulationBase class from the simulation module.
    Optionally, we can configure the simulation with sim.config.case_study.name = "linear-regression", sim.config.case_study.scenario = "test" and many other options.

  2. Model:
    Our model will be defined as a standard python function.
    We will then assign it to the Simulation object by accessing the .model attribute.

  3. Observations:
    Our observation data must be structured as an xarray.Dataset.
    We assign it to the observations attribute of our Simulation object.
    Calling sim.config.data_structure will give us further information about the layout of our data.

  4. Solver:
    A solver (solvers) is required to solve the model.
    In our simple case, we will use the solve_analytic_1d solver from the analytic module.
    We assign it to our Simulation object using the solver attribute.
    Since our model already provides an analytical solution, this solver basically does nothing. It is still needed to fulfill Pymob’s requirement for a solver component.
    For more complex models (e.g. ODEs), the JaxSolver from the diffrax module is a more powerful option.
    Users can also implement custom solvers as a subclass of pymob.solvers.base.SolverBase.

  5. Inferer:
    The inferer handels the parameter estimation.
    Pymob supports various backends. In this example, we will work with NumPyro.
    We assign the inferer to our Simulation object via the inferer attribute and configure the desired kernel (e.g. nuts).
    But before inference, we need to parameterize our model using the Param class.
    Each parameter can be marked either as free or fixed, depending on whether it should be variable during the optimization procedure.
    The parameters are stored in the model_parameters dictionary, which holds model input values. By default, it takes the keys: parameters, y0 and x_in.

  6. Evaluator:
    The Evaluator is an instance to manage model evaluations. It sets up tasks, coordinates parallel runs of the simulation and keeps track of the results from each simulation or parameter inference process.
    Evaluators store the raw output from a simulation and can generate an xarray object from it that corresponds to the data-structure of the observations with the results property. This automatically aligns the simulations results with the observations, for simple computation of loss functions.

  7. Config:
    The simulation settings will be saved in a .cfg configuration file.
    The config file contains information about our simulation in various sections. Learn more here.
    We can further use it to create new simulations by loading settings from a config file.

framework-overview

Getting started 🛫#

# First, import the necessary python packages
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

# Import the pymob modules
from pymob.simulation import SimulationBase
from pymob.sim.solvetools import solve_analytic_1d
from pymob.sim.config import Param

Since no measured data is provided, we will generate an artificial dataset.
\(y_{obs}\) represents the observed data over the time \(t\) [0, 10].
To use this data later in the simulation, we need to convert it into an xarray dataset.
In your own application, you would replace this with your measured experimental data.

# Parameter for the artificial data generation
rng = np.random.default_rng(seed=1)  # for reproducibility
slope = rng.uniform(2,4)
intercept = 1.0
num_points = 101
noise_level = 1.7

# generating time values
t = np.linspace(0, 10, num_points)

# generating y-values with noise
noise = rng.normal(0, noise_level, num_points)
y_obs = slope * t + intercept + noise

# visualizing our data
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(t, y_obs, label='Datapoints')
ax.set(xlabel='t [-]', ylabel='y_obs [-]', title ='Artificial Data')
plt.tight_layout()

# convert the data to an xr-Dataset
data_obs = xr.DataArray(y_obs, coords={"t": t}).to_dataset(name="y")
data_obs
<xarray.Dataset>
Dimensions:  (t: 101)
Coordinates:

  • t (t) float64 0.0 0.1 0.2 0.3 0.4 0.5 … 9.5 9.6 9.7 9.8 9.9 10.0 Data variables: y (t) float64 2.397 1.864 -0.6106 3.446 … 27.91 31.2 29.83 32.7

png

Initialize a simulation ✨#

In pymob, a simulation object is initialized by creating an instance of the SimulationBase class from the simulation module.
We will choose a linear regression model, as it provides a good approximation of the data: \( y = a + b*x \)

x-dimension

The x_dimension of our simulation can have any name, for example t as often used for time series data. You can specify it via sim.config.simulation.x_dimension.

# Initialize the Simulation object
sim = SimulationBase()

# configurate the case study
sim.config.case_study.name = "superquickstart"
sim.config.case_study.scenario = "linreg"

# Define the linear regression model
def linreg(x, a, b):
    return a + b * x

# Add the model to the simulation
sim.model = linreg

# Adding our dataset to the simulation
sim.observations = data_obs

# Defining a solver
sim.solver = solve_analytic_1d

# Take a look at the layut of the data
sim.config.data_structure
MinMaxScaler(variable=y, min=-0.6106386438473108, max=32.702588647741905)


/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/sim/config/base.py:397: UserWarning: Case study 'unnamed_case_study' could not be imported. Install the case study with `pip install unnamed_case_study`.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:366: UserWarning: `sim.config.data_structure.y = Datavariable(dimensions=['t'] unit=None min=-0.6106386438473108 max=32.702588647741905 observed=True dimensions_evaluator=None)` has been assumed from `sim.observations`. If the order of the dimensions should be different, specify `sim.config.data_structure.y = DataVariable(dimensions=[...], ...)` manually.
  warnings.warn(





Datastructure(indices=[], y=DataVariable(dimensions=['t'], unit=None, min=-0.6106386438473108, max=32.702588647741905, observed=True, dimensions_evaluator=None))

Scalers

We notice a mysterious Scaler message. This tells us that our data variable has been identified and a scaler was constructed, which transforms the variable between [0, 1].
This has no effect at the moment, but it can be used later. Scaling can be powerful to help parameter estimation in more complex models.

Parameterizing and running the model 🏃#

Next, we define the model parameters \(a\) and \(b\).
Parameter \(a\) is set as fixed (free = False), meaning its value is known and will not be estimated during optimization.
Parameter \(b\) is marked as free (free = True), allowing it to be optimized to fit the data. As an initial guess, we assume \(b = 3\).

# Parameterizing the model
sim.config.model_parameters.a = Param(value=1.0, free=False)
sim.config.model_parameters.b = Param(value=3.0, free=True)
# this makes sure the model parameters are available to the model.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict

sim.model_parameters["parameters"] 
{'a': 1.0, 'b': 3.0}

Our model is now prepared with a defined parameter set.
To initialize the Evaluator, we call dispatch_constructor().
This step is essential and must be executed every time changes are made to the model.

The returned dataset (evaluator.results) has the exact same shape as the observation data.

# put everything in place for running the simulation
sim.dispatch_constructor()

# run
evaluator = sim.dispatch(theta={"b":3})
evaluator()
evaluator.results
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:723: UserWarning: The number of ODE states was not specified in the config file [simulation] > 'n_ode_states = <n>'. Extracted the return arguments ['a+b*x'] from the source code. Setting 'n_ode_states=1.
  warnings.warn(
<xarray.Dataset>
Dimensions:  (t: 101)
Coordinates:

  • t (t) float64 0.0 0.1 0.2 0.3 0.4 0.5 … 9.5 9.6 9.7 9.8 9.9 10.0 Data variables: y (t) float64 1.0 1.3 1.6 1.9 2.2 2.5 … 29.8 30.1 30.4 30.7 31.0

What does the dispatch constructor do?

Behind the scenes, the dispatch constructor assembles a lightweight Evaluator object from the Simulation object, that takes the least necessary amount of information, runs it through some dimension checks, and also connects it to the specified solver and initializes it.

Let’s take a look at the results.

You can vary the parameter \(b\) in the previous step to investigate its influence on the model fit.
In the Introduction, you can try out the manual parameter estimation, which is a feature provided by Pymob.

fig, ax = plt.subplots(figsize=(5, 4))
data_res = evaluator.results
ax.plot(data_obs.t, data_obs.y, ls="", marker="o", color="tab:blue", alpha=.5, label ="observation data")
ax.plot(data_res.t, data_res.y, color="black", label ="result")
ax.legend()
<matplotlib.legend.Legend at 0x77fc11c5fc90>

png

Estimating parameters and uncertainty with MCMC 🤔#

Of course this example is very simple. In fact, we could optimize the parameters perfectly by hand.
But just for fun, let’s use Markov Chain Monte Carlo (MCMC) to estimate the parameters, their uncertainty and the uncertainty in the data.
We’ll run the parameter estimation with our inferer, using the NumPyro backend with a NUTS kernel. This completes the job in a few seconds.

We are almost ready to infer the model parameters. To also estimate the uncertainty of the parameters, we add another parameter representing the error and assume that it follows a lognormal distribution.
Additionally, we specify an error model for the data distribution. This will be: $\(y_{obs} \sim Normal (y, \sigma_y)\)$

Since \(\sigma_y\) is not a fixed parameter, it doesn’t need to be passed to the simulation class.

sim.config.model_parameters.sigma_y = Param(free=True , prior="lognorm(scale=1,s=1)", min=0, max=1)
sim.config.model_parameters.b.prior = "lognorm(scale=1,s=1)"

sim.config.error_model.y = "normal(loc=y,scale=sigma_y)"


sim.set_inferer("numpyro")
sim.inferer.config.inference_numpyro.kernel = "nuts"
sim.inferer.run()

# you can access the posterior distrubution by:
sim.inferer.idata.posterior

# Plot the results
sim.config.simulation.x_dimension = "t"
sim.posterior_predictive_checks(pred_hdi_style={"alpha": 0.1})
Jax 64 bit mode: False
Absolute tolerance: 1e-07


/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/inference/numpyro_backend.py:565: UserWarning: Model is not rendered, because the graphviz executable is not found. Try search for 'graphviz executables not found' and the used OS. This should be an easy fix :-)
  warnings.warn(


Trace Shapes:      
 Param Sites:      
Sample Sites:      
       b dist     |
        value     |
 sigma_y dist     |
        value     |
   y_obs dist 101 |
        value 101 |

0%| | 0/3000 [00:00<?, ?it/s]

warmup: 0%| | 1/3000 [00:01<1:09:30, 1.39s/it, 1 steps of size 1.87e+00. acc. prob=0.00]

warmup: 14%|█▍ | 435/3000 [00:01<00:06, 404.91it/s, 3 steps of size 1.15e+00. acc. prob=0.79]

warmup: 29%|██▉ | 874/3000 [00:01<00:02, 872.91it/s, 3 steps of size 1.05e+00. acc. prob=0.79]

sample: 44%|████▍ | 1313/3000 [00:01<00:01, 1378.17it/s, 3 steps of size 8.42e-01. acc. prob=0.93]

sample: 59%|█████▊ | 1757/3000 [00:01<00:00, 1899.16it/s, 3 steps of size 8.42e-01. acc. prob=0.92]

sample: 73%|███████▎ | 2200/3000 [00:01<00:00, 2393.88it/s, 7 steps of size 8.42e-01. acc. prob=0.92]

sample: 88%|████████▊ | 2641/3000 [00:01<00:00, 2833.03it/s, 7 steps of size 8.42e-01. acc. prob=0.92]

sample: 100%|██████████| 3000/3000 [00:02<00:00, 1447.78it/s, 1 steps of size 8.42e-01. acc. prob=0.92]

                mean       std    median      5.0%     95.0%     n_eff     r_hat
         b      3.00      0.03      3.00      2.96      3.04   1642.48      1.00
   sigma_y      1.47      0.10      1.46      1.29      1.63   1231.36      1.01

Number of divergences: 0

png

numpyro distributions

Currently only few distributions are implemented in the numpyro backend. This API will soon change, so that basically any distribution can be used to specifcy parameters.

We can inspect our estimates and see that the model provides a good fit for the parameters.
Note that we only get an estimate for \(b\). Previously, we set the parameter \(a\) with the flag free = False.
This effectively excludes it from the estimation and uses its default value, which was set to the true value a = 0.

Customize the posterior predictive checks

You can explore the API of pymob.sim.plot.SimulationPlot to find out how you can work on the default predictions. Of course you can always make your own plot, by accessing pymob.simulation.inferer.idata and pymob.simulation.observations

Report the results 🗒️#

Pymob provides the option to generate an automated report of the parameter distribution for a simulation.
The report can be configured by modifying the options in report().

# report the results
sim.report()
[WARNING] Could not fetch resource 'probability_model.png': replacing image with description
[WARNING] This document format requires a nonempty <title> element.
  Defaulting to 'report' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".

posterior_trace.png

posterior_pairs.png

Exporting the simulation and running it via the case study API 📤#

After constructing the simulation, all settings - custom and default - can be exported to a comprehensive configuration file.
The simulation will be saved to the default path (CASE_STUDY/scenarios/SCENARIO/settings.cfg) or to a custom path, specified with the file path keyword fp.
Setting force=True will overwrite any existing config file, which is a reasonable choice in most cases. From this point on, the simulation is (almost) ready to be executed from the command-line.

import os
sim.config.create_directory("scenario", force=True)
sim.config.create_directory("results", force=True)

# usually we expect to have a data directory in the case
os.makedirs(sim.data_path, exist_ok=True)
sim.save_observations(force=True)
sim.config.save(force=True)
Scenario directory created at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/superquickstart/scenarios/linreg'.
Results directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/superquickstart/results/linreg'.

Commandline API#

The command-line API runs a series of commands that load the case study, execute the initialize() method and perform some more initialization tasks before running the required job.

  • pymob-infer runs an inference job, for example:

    pymob-infer --case_study=quickstart --scenario=test --inference_backend=numpyro.
    While there are more command-line options, these two (–case_study and –scenario) are required.