Pymob Introduction#

Overview#

Pymob is a Python-based platform for parameter estimation across a wide range of models. It abstracts repetitive tasks in the modeling process so that you can focus on building models, asking questions of the real world, and learning from observations.
The idea of pymob originated from the frustration with fitting complex models to complicated datasets (missing observations, non-uniform data structure, non-linear models, ODE models). In such scenarios a lot of time is spent matching observations with model results.
One of Pymob’s key strengths is its streamlined model definition workflow. This not only simplifies the process of building models but also lets you apply a host of advanced optimization and inference algorithms, giving you the flexibility to iterate and discover solutions more effectively.

What’s the focus of this introduction?#

This introduction gives you an overview of the pymob package and an easy example of how to use it. Afterwards, you can explore the more advanced tutorials and deepen your pymob knowledge.
First, the general structure of the pymob package is explained and you get to know the function of its components. Subsequently, you are guided through your first parameter estimation with pymob, using a simple example.

How pymob is structured:#

Here you can see the structure of the pymob package:
[Figure: Structure of the pymob package]
The Pymob package consists of several elements:

Simulation
First, we need to initialize a Simulation object by instantiating the pymob.simulation.SimulationBase class from the simulation module.
Optionally, we can configure the simulation object with pymob.simulation.SimulationBase.config.case_study.name = "linear-regression", pymob.simulation.SimulationBase.config.case_study.scenario = "test" and many more options.
Model
The model is a Python function that you define. With the model you try to describe the data you observed. A classical example is the Lotka-Volterra model, which describes the interactions of predators and prey. In this tutorial, the model will be a simple linear function.
The model is added to the simulation by assigning it to pymob.simulation.SimulationBase.model.
Observations
The observations are the data points to which we want to fit our model. The observation data needs to be an xarray.Dataset (learn more here).
We assign it to our Simulation object by pymob.simulation.SimulationBase.observations.
pymob.simulation.SimulationBase.config.data_structure will give us some information about the layout of our data.
Solver
A solver is required for many models, e.g. models that contain differential equations. Solvers in pymob are callables that need to return a dictionary of results mapped to the data variables.
The solver is assigned to the Simulation object by pymob.simulation.SimulationBase.solver.
These solvers are currently implemented in pymob:
- analytic module
  - solve_analytic_1d
- base module
  - curve_jumps
  - jump_interpolation
  - mappar
  - radius_interpolation
  - rect_interpolation
  - smoothed_interpolation
- diffrax module
  - JaxSolver
- scipy module
  - solve_ivp_1d

The documentation can be found here

Inferer
The inferer serves as the parameter estimator. Pymob provides various backends. You can find detailed information here.
Currently, supported inference backends are:
- interactive (interactive backend in Jupyter notebooks with parameter sliders)
- numpyro (Bayesian inference and stochastic variational inference)
- pyabc (approximate Bayesian inference)
- pymoo (experimental multi-objective optimization)
Evaluator
The Evaluator manages model evaluations. It sets up tasks, coordinates parallel runs of the simulation and keeps track of the results from each simulation or parameter inference process.
Config
Pymob uses pydantic models to validate configuration files, with the configuration organized into separate sections. You can modify these configurations either by editing the files before initializing a simulation from a config file, or directly within the script. During parameter estimation setup, all configuration settings are stored in a config object, which can later be exported as a .cfg file.

Let’s get started 🎉#

You will need several packages during this introduction:

# imports from pymob
from pymob.simulation import SimulationBase
from pymob.sim.solvetools import solve_analytic_1d
from pymob.sim.config import Param

# other imports
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt
import os
from numpy import random

In the following tutorial, you’ll notice some import statements included as comments. These are provided to indicate which package is required for each step.

Initialize a simulation#

First, we initialize an object of the class simulation. This is the center of the whole package and will manage all processes from now on.
In pymob a Simulation object is initialized by calling the pymob.simulation.SimulationBase class from the simulation module.

#from pymob.simulation import SimulationBase

sim = SimulationBase()

/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/sim/config/base.py:397: UserWarning: Case study 'unnamed_case_study' could not be imported. Install the case study with `pip install unnamed_case_study`.
  warnings.warn(

Configuring the simulation

Optionally, we can configure the simulation at this stage with sim.config.case_study.name = "linear-regression", sim.config.case_study.scenario = "test", and many more options.

Case studies are a principled approach to the modelling process. In essence, a case study is a simple template that contains the building blocks of a model, and names and stores them in an intuitive and reproducible way. Here you’ll find some additional information on case studies.

At the moment, it is sufficient to only create a simulation object without making any further configurations.

Define a model#

Now the model needs to be defined. In Pymob, every model is represented as a Python function. Here, you’ll specify the model whose parameters will be estimated.

In this tutorial, we’ll use linear regression as our example, since it’s the simplest form of modeling.

# definition of the model: 
def linreg(t, a, b):
    return a + t * b

We assume that this model describes our data well, so we add it to the simulation:

sim.model = linreg
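
Because the model is an ordinary Python function, it can be checked on its own before pymob is involved. The following sketch evaluates `linreg` on a small NumPy array; the parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np

def linreg(t, a, b):
    # same linear model as above: intercept a, slope b
    return a + t * b

# arbitrary illustration values, not taken from the tutorial data
t = np.array([0.0, 1.0, 2.0, 3.0])
y = linreg(t, a=1.0, b=2.0)
print(y)  # [1. 3. 5. 7.]
```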

Defining a solver#

As described above, a solver is required for many models. We assign one via pymob.simulation.SimulationBase.solver.
In our case, the model function already is the exact solution. Therefore, we choose solve_analytic_1d. An overview of the solvers currently implemented in pymob can be found at the beginning of this tutorial here.

# from pymob.sim.solvetools import solve_analytic_1d
sim.solver = solve_analytic_1d
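
To make the idea of a solver more concrete, here is a rough sketch of a callable that evaluates a closed-form model on a coordinate and returns a dictionary keyed by data variable. The name `toy_analytic_solver` and its signature are hypothetical and chosen for illustration; they are not the signature pymob expects from its solvers, for which the solver documentation should be consulted.

```python
import numpy as np

def linreg(t, a, b):
    return a + t * b

def toy_analytic_solver(model, parameters, coordinates, data_variable="data"):
    """Hypothetical solver: evaluate a closed-form model and map the
    result to the name of the data variable it describes."""
    t = np.asarray(coordinates["t"], dtype=float)
    result = model(t, **parameters)
    return {data_variable: result}

results = toy_analytic_solver(
    model=linreg,
    parameters={"a": 0.0, "b": 3.0},
    coordinates={"t": [0.0, 1.0, 2.0]},
)
print(results["data"])  # [0. 3. 6.]
```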

The pymob magic#

So far we have not done anything special. Pymob exists because wrangling the dimensions of input and output data, nested data structures, and missing data is painful.

Now we add our data, which has already been transformed into an xarray Dataset, by assigning it to pymob.simulation.SimulationBase.observations.
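
The dataset `obs_data` is created in the data-generation step of the tutorial, which is not reproduced in this section. As a rough sketch, such a dataset could be built as follows; the slope, intercept, noise level and random seed are assumptions made for illustration and will not reproduce the exact values shown in the output further down. The data variable is named `data` and the coordinate `t`, matching the names that appear in that output.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(seed=1)  # assumed seed

# assumed "true" parameters and noise level
intercept, slope, noise_sd = 0.0, 2.5, 1.0

t = np.linspace(0, 10, 50)
y = intercept + slope * t + rng.normal(0.0, noise_sd, size=t.size)

# one data variable "data" indexed by the coordinate "t"
obs_data = xr.Dataset(
    data_vars={"data": ("t", y)},
    coords={"t": t},
)
print(obs_data)
```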

# import xarray as xr

sim.observations = obs_data

MinMaxScaler(variable=data, min=-0.7031521676464498, max=26.6643243203019)


/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:366: UserWarning: `sim.config.data_structure.data = Datavariable(dimensions=['t'] unit=None min=-0.7031521676464498 max=26.6643243203019 observed=True dimensions_evaluator=None)` has been assumed from `sim.observations`. If the order of the dimensions should be different, specify `sim.config.data_structure.data = DataVariable(dimensions=[...], ...)` manually.
  warnings.warn(

This worked 🎉. pymob.simulation.SimulationBase.config.data_structure now gives us some information about the layout of our data. Pymob uses this information to handle the data transformations in the background.

sim.config.data_structure

Datastructure(indices=[], data=DataVariable(dimensions=['t'], unit=None, min=-0.7031521676464498, max=26.6643243203019, observed=True, dimensions_evaluator=None))

What happens when we assign a Dataset to the observations attribute?

Debug into the function and discover what happens!

We can give pymob additional information about the data structure of our observations and of intermediate (unobserved) variables that are simulated. This can be done with sim.config.data_structure.y = DataVariable(dimensions=["x"]). This information can be used to switch the dimensional order of the observations or to provide data variables whose dimensions differ from those of the observations, if needed. But if the dataset is ordinary, simply setting the pymob.simulation.SimulationBase.observations property with an xr.Dataset is sufficient.

Scalers

We also notice a mysterious Scaler message. This tells us that our data variable has been identified and that a scaler was constructed, which transforms the variable to the interval [0, 1]. This has no effect at the moment, but it can be used later. Scaling can be a powerful aid to parameter estimation in more complex models.
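
The idea behind such a scaler can be sketched in a few lines of NumPy. This is a generic min-max transformation written for illustration, not pymob's own scaler implementation; the function names are hypothetical.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    # map values from [x_min, x_max] onto [0, 1]
    return (x - x_min) / (x_max - x_min)

def minmax_unscale(x_scaled, x_min, x_max):
    # inverse transformation back to the original scale
    return x_scaled * (x_max - x_min) + x_min

# arbitrary illustration values
x = np.array([-1.0, 4.0, 9.0])
x_scaled = minmax_scale(x, x.min(), x.max())
x_back = minmax_unscale(x_scaled, x.min(), x.max())
print(x_scaled)
print(x_back)
```

The smallest value is mapped to 0 and the largest to 1, and the inverse transformation recovers the original values.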

Parameterizing a model#

Parameters are specified via the FloatParam or ArrayParam class. Parameters can be marked free or fixed depending on whether they should be variable during an optimization procedure.

In this tutorial we want to fit the parameter \(b\) and assume that we know parameter \(a\):

The parameter \(a\) is set as fixed (free = False), meaning its value is known and will not be estimated during optimization.
The parameter \(b\) is marked as free (free = True), allowing it to be optimized to fit our data. As an initial guess, we assume \(b = 3\).

#from pymob.sim.config import Param
sim.config.model_parameters.a = Param(value=0, free=False)
sim.config.model_parameters.b = Param(value=3, free=True)

# this makes sure the model parameters are available to the model.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict

To make the parameters available to the simulation one has to use sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict. This step is particularly important for all fixed parameters.
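
What it means to fix \(a\) and leave \(b\) free can be illustrated outside of pymob. With the intercept held at a known value, minimizing the sum of squared residuals \(\sum_i (y_i - a - b\,t_i)^2\) over \(b\) alone gives the closed form \(\hat{b} = \sum_i t_i (y_i - a) / \sum_i t_i^2\). The sketch below applies this formula to noise-free synthetic data; it is a plain NumPy illustration under these assumptions, not the estimation procedure pymob uses.

```python
import numpy as np

a_fixed = 0.0   # known intercept, not estimated
b_true = 2.5    # assumed slope used to create the synthetic data

t = np.linspace(0, 10, 20)
y = a_fixed + b_true * t   # noise-free synthetic observations

# least-squares slope for a fixed intercept
b_hat = np.sum(t * (y - a_fixed)) / np.sum(t**2)
print(round(float(b_hat), 6))  # 2.5
```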

pymob.simulation.SimulationBase.model_parameters is a dictionary that stores the input data for the model. By default, it includes the keys parameters, y0, and x_in. For our analytic model, we only need the parameters key. In situations where initial values for variables are required, you can provide them using pymob.simulation.SimulationBase.model_parameters["y0"] = … .

For example, when working with a Lotka-Volterra model, you would specify the initial conditions for the predator and prey populations with y0. For more details on such use cases, please refer to the advanced tutorial.

generating input for solvers

A helpful function to generate y0 or x_in from observations is SimulationBase.parse_input, combined with the settings of config.simulation.y0.

sim.model_parameters['parameters']

{'a': array(0), 'b': array(3)}

Exporting the simulation and running it via the case study API#

After constructing the simulation, all of its settings can be exported to a comprehensive configuration file, along with all the default settings. This is as simple as:

import os
sim.config.case_study.name = "quickstart"
sim.config.case_study.scenario = "test"
sim.config.create_directory("scenario", force=True)
sim.config.create_directory("results", force=True)

# usually we expect to have a data directory in the case study
os.makedirs(sim.data_path, exist_ok=True)
sim.save_observations(force=True)
sim.config.save(force=True)

Scenario directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/scenarios/test'.
Results directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/results/test'.

The simulation will be saved to the default path (CASE_STUDY/scenarios/SCENARIO/settings.cfg) or to a custom file path specified with the fp keyword. force=True will overwrite any existing config file, which is the reasonable choice in most cases.
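
Because the settings are stored in a plain-text .cfg file, they can also be inspected without pymob. The sketch below parses a small snippet with Python's standard `configparser`, assuming the file follows the common INI layout; the section and option names shown are assumptions for illustration and may differ from what pymob actually writes.

```python
import configparser

# hypothetical excerpt of a settings.cfg file (names are assumptions)
cfg_text = """
[case-study]
name = quickstart
scenario = test
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)

name = config["case-study"]["name"]
scenario = config["case-study"]["scenario"]
print(name, scenario)  # quickstart test
```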

From there on, the simulation is (almost) ready to be executed from the commandline.

Commandline API#

The commandline API runs a series of commands that load the case study, execute the pymob.simulation.SimulationBase.initialize() method and perform some more initialization tasks, before running the required job.

pymob-infer: Runs an inference job, e.g. pymob-infer --case_study=quickstart --scenario=test --inference_backend=numpyro. While there are more commandline options, these are the two required.
