Pymob Introduction#
Overview#
Pymob is a Python-based platform for parameter estimation across a wide range of models. It abstracts repetitive tasks in the modeling process so that you can focus on building models, asking questions to the real world and learn from observations.
The idea of pymob originated from the frustration with fitting complex models to complicated datasets (missing observations, non-uniform data structure, non-linear models, ODE models). In such scenarios a lot of time is spent matching observations with model results.
One of Pymob’s key strengths is its streamlined model definition workflow. This not only simplifies the process of building models but also lets you apply a host of advanced optimization and inference algorithms, giving you the flexibility to iterate and discover solutions more effectively.
What’s the focus of this introduction?#
This introduction will give you an overview of the pymob package and an easy example on how to use it. After, you can explore more advanced tutorials and deepen your pymob kowledge.
First the general structure of the pymob package will be explained. You will get to know the function of the components. Subsequentenly you will get instructions to use pymob for your first parameter estimation with a simple example.
How pymob is structured:#
Here you can see the structure of the structure of pymob package:
The Pymob package consists of several elements:
Simulation
First, we need to initialize a Simulation object by calling thepymob.simulation.SimulationBaseclass from the simulation module.
Optionally, we can configure the simulation object withpymob.simulation.SimulationBase.config.case_study.name= “linear-regression”,pymob.simulation.SimulationBase.config.case_study.scenario= “test” and many more options.Model
The model is a python function you define. With the model you try to describe the data you observed. A classical model is, for example, the Lotka-Volterra model to describe the interactions of predators and prey. In the tutorial today, the model will be a simple linear function.
The model will be added to the simualtion by usingpymob.simulation.SimulationBase.modelObservations
The obseravtions are the data points, to which we want to fit our model. The observation data needs to be anxarray.Dataset(learn more here).
We assign it to our Simulation object bypymob.simulation.SimulationBase.observations.
pymob.simulation.SimulationBase.config.data_structurewill give us some information about the layout of our data.Solver
A solver is required for many models e.g. models that contain differential equations. Solvers in pymob are callables that need to return a dictionary of results mapped to the data variables.
The solver is assigned to the Simulation object bypymob.simulation.SimulationBase.solver.
These solvers are currently implemented in pymob:analytic module
solve_analytic_1d
base module
curve_jumps
jump_interpolation
mappar
radius_interpolation
rect_interpolation
smoothed_interpolation
diffrax module
JaxSolver
scipy module
solve_ivp_1d
The documentation can be found here
Inferer
The inferer serves as the parameter estimator. Pymob provides various backends. You can find detailed information here.
Currently, supported inference backends are:interactive (interactive backend in jupyter notebookswith parameter sliders)
numpyro (bayesian inference and stochastic variational inference)
pyabc (approximate bayesian inference)
pymoo (experimental multi-objective optimization)
Evaluator
The Evaluator is an instance to manage model evaluations. It sets up tasks, coordinates parallel runs of the simulation and keeps track of the results from each simulation or parameter inference process.Config
Pymob usespydanticmodels to validate configuration files, with the configuration organized into separate sections. You can modify these configurations either by editing the files before initializing a simulation from a config file, or directly within the script. During parameter estimation setup, all configuration settings are stored in a config object, which can later be exported as a.cfgfile.
Let’s get started 🎉#
You will need several packages during this introduction:
# imports from pymob
from pymob.simulation import SimulationBase
from pymob.sim.solvetools import solve_analytic_1d
from pymob.sim.config import Param
# other imports
import numpy as np
import xarray as xr
from matplotlib import pyplot as plt
import os
from numpy import random
In the following tutorial, you’ll notice some import statements included as comments. These are provided to indicate which package is required for each step.
Generate artificial data#
In the real world, you will have measured a dataset. For demonstration, we generate some artifical data. Later we will fit the model to our artifical data.
\(y_{obs}\) represents the observation data over the time \(t\) [0, 10].
# Parameter for the artificial data generation
rng = np.random.default_rng(seed=1) # for reproducibility
slope = rng.uniform(1,4)
intercept = 1.0
num_points = 100
noise_level = 1.7
# generating x-values
x = np.linspace(0, 10, num_points)
# generating y-values with noise
noise = rng.normal(0, noise_level, num_points)
y_obs = slope * x + intercept + noise
data = np.array(y_obs)
# visualising our data
plt.scatter(x, y_obs, label='Datapoints')
plt.xlabel('t [-]')
plt.ylabel('y_obs [-]')
plt.title('Artificial Data')
plt.legend()
plt.show()

Above you can see you’re generated artificial data. At the moment it’s stored in a normal array as you can see below:
# our artificial data is now in the variable data
print(data)
[ 2.39675084 1.81785059 -0.70315217 3.30742766 2.78326703 1.36771732
3.52454616 3.41252601 3.54888575 3.35328588 4.49048771 2.56521125
3.79634384 3.50979549 5.60354444 4.90914103 4.60054453 4.02458419
5.17270933 5.8798854 5.65362632 8.57816731 8.34579772 2.28149774
3.93525899 7.10557652 6.94107294 8.2780973 8.54045905 12.02744521
6.79279159 8.29740594 12.66815375 10.55094467 10.83486488 9.08995387
7.41814448 10.7606699 10.91741134 8.90169647 10.0828172 11.37793583
10.15043989 11.84556627 12.43105392 12.58533694 11.92025208 14.04642718
14.80814685 14.09471271 12.41438677 15.3052946 13.46514525 16.06827389
13.0077698 16.64051021 15.30791566 13.47525798 15.32060955 16.20232009
16.83019906 14.95284153 14.99613473 17.47407018 16.59740969 18.04735114
19.19428235 15.3562682 18.84777408 20.75332169 18.42173378 17.80525218
20.71855905 20.12671118 21.47496089 19.62120052 17.94508373 20.53326405
20.21848206 22.55054798 21.81778089 18.97226891 19.96904293 23.75936909
23.66863583 21.68072914 23.02346747 24.03883303 24.33375292 25.28318484
24.48570624 24.14458006 24.12185409 26.61276612 21.24765866 25.09450444
25.64242623 23.41934038 26.66432432 25.24747102]
The pymob package operates with xarray.Dataset. We avoid most of the mess by using xarray as a common input/output format. So we have to transform our data into a xarray.Dataset.
obs_data = xr.DataArray(data, dims = ("t"), coords={"t": x}).to_dataset(name="data")
Note: If you want to rename your data-dimension you have to change every sim.config.data_structure.data to the new name!
It can be helpful to look at the data befor going forward, especially if you never worked with xarray Datasets. At the section ‘Data variables’ you’ll find the data you just generated.
obs_data
<xarray.Dataset> Dimensions: (t: 100) Coordinates:
t (t) float64 0.0 0.101 0.202 0.303 0.404 … 9.697 9.798 9.899 10.0 Data variables: data (t) float64 2.397 1.818 -0.7032 3.307 … 25.64 23.42 26.66 25.25
Initialize a simulation#
First, we initialize an object of the class simulation. This is the center of the whole package and will manage all processes from now on.
In pymob a Simulation object is initialized by calling the pymob.simulation.SimulationBase class from the simulation module.
#from pymob.simulation import SimulationBase
sim = SimulationBase()
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/sim/config/base.py:397: UserWarning: Case study 'unnamed_case_study' could not be imported. Install the case study with `pip install unnamed_case_study`.
warnings.warn(
Configuring the simulation
Optionally, we can configure the simulation at this stage with
sim.config.case_study.name = "linear-regression", sim.config.case_study.scenario = "test", and many more options.
Case studies are a principled approach to the modelling process. In essence, they are a simple template that contains building blocks for model and names and stores them in an intuitive and reproducible way. Here you’ll find some additional information on case studies.
At the moment, it is sufficient to only create a simulation object without making any further configurations.
Define a model#
Now the model needs to be defined. In Pymob, every model is represented as a Python function. Here, you’ll specify the model whose parameters will be estimated.
In this tutorial, we’ll use linear regression as our example, since it’s the simplest form of modeling.
# definition of the model:
def linreg(t, a, b):
return a + t * b
So we assume that this model describes our data well. So we add it to the simulation by
sim.model = linreg
Defining a solver#
As described above: A solver is required for many models. So we define a solver by pymob.simulation.SimulationBase.solver.
In our case the model gives the exact solution of the model. Therefore, we choose solve_analytic_1d. An overwiev of the solvers currently implemented in pymob can be found at the beginning of this tutorial here.
# from pymob.sim.solvetools import solve_analytic_1d
sim.solver = solve_analytic_1d
The pymob magic#
So far we have not done anything special. Pymob exists, because wrangling dimensions of input and output data, nested data-structures, missing data is painful.
Now we add our data, which is already transformed into a xarray Dataset, by using pymob.simulation.SimulationBase.observations.
# import xarray as xr
sim.observations = obs_data
MinMaxScaler(variable=data, min=-0.7031521676464498, max=26.6643243203019)
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:366: UserWarning: `sim.config.data_structure.data = Datavariable(dimensions=['t'] unit=None min=-0.7031521676464498 max=26.6643243203019 observed=True dimensions_evaluator=None)` has been assumed from `sim.observations`. If the order of the dimensions should be different, specify `sim.config.data_structure.data = DataVariable(dimensions=[...], ...)` manually.
warnings.warn(
This worked 🎉 pymob.simulation.SimulationBase.config.data_structure will now give us some information about the layout of our data, which will handle the data transformations in the background.
sim.config.data_structure
Datastructure(indices=[], data=DataVariable(dimensions=['t'], unit=None, min=-0.7031521676464498, max=26.6643243203019, observed=True, dimensions_evaluator=None))
What happens when we assign a Dataset to the observations attribute?
Debug into the function and discover what happens!
We can give pymob additional information about the data structure of our observations and intermediate (unobserved) variables that are simulated. This can be done with sim.config.data_structure.y = DataVariable(dimensions=["x"]).
These information can be used to switch the dimensional order of the observations or provide data variables that have differing dimensions from the observations, if needed. But if the dataset is ordinary, simply setting pymob.simulation.SimulationBase.observations property with a xr.Dataset will be sufficient.
Scalers
We also notice a mysterious Scaler message. This tells us that our data variable has been identified and a scaler was constructed, which transforms the variable between [0, 1]. This has no effect at the moment, but it can be used later. Scaling can be powerful to help parameter estimation in more complex models.
Parameterizing a model#
Parameters are specified via the FloatParam or ArrayParam class. Parameters can be marked free or fixed depending on whether they should be variable during an optimization procedure.
In this tutorial we want to fit the parameter \(b\) and assume that we know parameter \(a\):
The parameter \(a\) is set as fixed (
free = False), meaning its value is known and will not be estimated during optimization.The parameter \(b\) is marked as free (
free = True), allowing it to be optimized to fit our data. As an initial guess, we assume \(b = 3\).
#from pymob.sim.config import Param
sim.config.model_parameters.a = Param(value=0, free=False)
sim.config.model_parameters.b = Param(value=3, free=True)
# this makes sure the model parameters are available to the model.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict
To make the parameters available to the simulation one has to use sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict. This step is particularly important for all fixed parameters.
pymob.simulation.SimulationBase.model_parameters is a dictionary that stores the input data for the model. By default, it includes the keys parameters, y0, and x_in. For our analytic model, we only need the parameters key. In situations where initial values for variables are required, you can provide them using pymob.simulation.SimulationBase.model_parameters["y0"] = … .
For example, when working with a Lotka-Volterra model, you would specify the initial conditions for the predator and prey populations with y0. For more details on such use cases, please refer to the advanced tutorial.
generating input for solvers
A helpful function to generate y0 or x_in from observations is SimulationBase.parse_input, combined with settings of config.simulation.y0
sim.model_parameters['parameters']
{'a': array(0), 'b': array(3)}
Running the model 🏃#
The model is prepared with a parameter set and ready to be executed. With pymob.simulation.SimulationBase.dispatch_constructor(), everything is prepared for the run of the model. It initiaizes an evaluator, makes preliminary calculations and checks.
ℹ️ What does the dispatch constructor do?:
Behind the scenes, the dispatch constructor assembles a lightweight pymob.simulation.SimulationBase.evaluator object from the Simulation object, that takes the least necessary amount of information, runs it through some dimension checks, and also connects it to the specified solver and initializes it. The purpose of the dispatch constructor is manyfold:
By executing the entire overhead of a model evaluation and packing it into a new pymob.simulation.SimulationBase.evaluator instance pymob.simulation.SimulationBase.dispatch_constructor() to make single model evaluations as fast as possible and allow parallel evaluations, because each evaluator created by pymob.simulation.SimulationBase.dispatch() is it’s a fully independent model instance with a separate set of parameters that can be solved.
Evaluators store the raw output from a simulation and can generate an xarray object from it that corresponds to the data-structure of the observations with the pymob.simulation.SimulationBase.evaluator.results property. This automatically aligns simulations results with observations, for simple computation of loss functions.
For the parameter estimation it is not necessary to run the model, but it can be helpfull. By using pymob.simulation.SimulationBase.dispatch() all the parameters with the setting free=True get fixed. Therefore, we have to fix parameter \(b\).
Try changing the value of \(b\) and see what effect it has on the next steps?
pymob.simulation.SimulationBase.dispatch_constructor() should be executed every time you change something in your simulation settings, even if you don’t run the model.
# put everything in place for running the simulation
sim.dispatch_constructor()
# run
evaluator = sim.dispatch(theta={"b":3}) # makes sure that the parameter b is set to 3
evaluator()
evaluator.results
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:723: UserWarning: The number of ODE states was not specified in the config file [simulation] > 'n_ode_states = <n>'. Extracted the return arguments ['a+t*b'] from the source code. Setting 'n_ode_states=1.
warnings.warn(
<xarray.Dataset> Dimensions: (t: 100) Coordinates:
t (t) float64 0.0 0.101 0.202 0.303 0.404 … 9.697 9.798 9.899 10.0 Data variables: data (t) float64 0.0 0.303 0.6061 0.9091 1.212 … 29.09 29.39 29.7 30.0
This returns a dataset which is of the exact same shape as the observation dataset, plus intermediate variables that were created during the simulation, if they are tracked by the solver.
Although this API seems to be a bit clunky, it is necessary, to make sure that simulations that are executed in parallel are isolated from each other.
Estimating parameters#
We are almost set to infer the parameters of the model. We add another parameter to also estimate the error of the parameters, We use a lognormal distribution for it. We also specify an error model for the distribution. This will be
Further we also have to make some assumptions for the parameter \(b\) which we want to fit. First, we have to define the prior function from which we draw the parameter values during the parameter estimation. Additionally, we set the min and max values for our parameters. This can also be done in one step, as can be seen for the error-model parameter sigma_y.
sim.config.model_parameters.b.prior = "lognorm(scale=1,s=1)"
sim.config.model_parameters.b.min = -5
sim.config.model_parameters.b.max = 5
#construction the error model
sim.config.model_parameters.sigma_y = Param(free=True , prior="lognorm(scale=1,s=1)", min=0, max=1)
sim.config.error_model.data = "normal(loc=data,scale=sigma_y)"
As sigma_y is not a fixed parameter, the new parameter does not have to be passed to the simulation class.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict
sim.model_parameters['parameters']
{'a': array(0), 'b': array(3), 'sigma_y': 0.0}
Manual estimation#
First, we try estimating the parameters by hand. For this we have a simple interactive backend.
Note that changing sigma_y has no effect on the model fit because sigma_y is only used for the error model.
from matplotlib import pyplot as plt
def plot(results: xr.Dataset):
obs = sim.observations
SSE = ((results.data - obs.data) ** 2).sum(dim="t") #calculating the sum of squared errors
fig, ax = plt.subplots(1,1)
ax.plot(results.t, results.data, lw=2, color="black")
ax.plot(obs.t, obs.data, ls="", marker="o", color="tab:blue", alpha=.5)
ax.set_xlim(-1,12)
ax.set_ylim(-1,30)
ax.text(0.05, 0.95, f"SSE={np.round(SSE.values, 2)}", transform=ax.transAxes, ha="left", va="top")
sim.plot = plot
sim.interactive()
HBox(children=(VBox(children=(FloatSlider(value=3.0, description='b', max=5.0, min=-5.0, step=None), FloatSlid…
Estimating parameters and uncertainty with MCMC#
Of course this example is very simple, we can in fact optimize the parameters perfectly by hand. But just for the fun of it, let’s use Markov Chain Monte Carlo (MCMC) to estimate the parameters, their uncertainty and the uncertainty in the data.
The inferer serves as the parameter estimator. Different inferer are implemented in numpy and can be found at the beginning of the tuorial and in the API. The method for the parameter estimation is defined by using pymob.simulation.SimulationBase.set_inferer(). This automatically translates the pymob data in the format of the selected inferer. Numpyro additionally needs a kernel. To start the estimation you use pymob.simulation.SimulationBase.inferer.run().
Note that other methods often don’t need a kernel.
numpyro distributions
Currently only few distributions are implemented in the numpyro backend. This API will soon change, so that basically any distribution can be used to specifcy parameters.
Finally, we let our inferer run the paramter estimation procedure with the numpyro backend and a NUTS kernel. This does the job in a few seconds.
sim.dispatch_constructor() # important to call this before running the inferer
sim.set_inferer("numpyro")
sim.inferer.config.inference_numpyro.kernel = "nuts"
sim.inferer.run()
sim.inferer.idata.posterior
Jax 64 bit mode: False
Absolute tolerance: 1e-07
/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/inference/numpyro_backend.py:565: UserWarning: Model is not rendered, because the graphviz executable is not found. Try search for 'graphviz executables not found' and the used OS. This should be an easy fix :-)
warnings.warn(
Trace Shapes:
Param Sites:
Sample Sites:
b dist |
value |
sigma_y dist |
value |
data_obs dist 100 |
value 100 |
0%| | 0/3000 [00:00<?, ?it/s]
warmup: 0%| | 1/3000 [00:01<1:10:14, 1.41s/it, 1 steps of size 1.87e+00. acc. prob=0.00]
warmup: 15%|█▍ | 443/3000 [00:01<00:06, 408.44it/s, 7 steps of size 1.05e+00. acc. prob=0.79]
warmup: 30%|██▉ | 894/3000 [00:01<00:02, 885.99it/s, 3 steps of size 1.12e+00. acc. prob=0.79]
sample: 45%|████▍ | 1348/3000 [00:01<00:01, 1406.87it/s, 3 steps of size 8.14e-01. acc. prob=0.94]
sample: 60%|██████ | 1801/3000 [00:01<00:00, 1935.07it/s, 3 steps of size 8.14e-01. acc. prob=0.93]
sample: 75%|███████▍ | 2249/3000 [00:01<00:00, 2430.07it/s, 3 steps of size 8.14e-01. acc. prob=0.93]
sample: 90%|████████▉ | 2699/3000 [00:02<00:00, 2880.02it/s, 3 steps of size 8.14e-01. acc. prob=0.93]
sample: 100%|██████████| 3000/3000 [00:02<00:00, 1446.88it/s, 3 steps of size 8.14e-01. acc. prob=0.93]
mean std median 5.0% 95.0% n_eff r_hat
b 2.65 0.03 2.65 2.61 2.70 1510.47 1.00
sigma_y 1.56 0.11 1.55 1.36 1.73 1549.92 1.00
Number of divergences: 0
<xarray.Dataset> Dimensions: (chain: 1, draw: 2000) Coordinates:
chain (chain) int64 0
draw (draw) int64 0 1 2 3 4 5 6 7 … 1993 1994 1995 1996 1997 1998 1999 cluster (chain) int64 0 Data variables: b (chain, draw) float32 2.614 2.701 2.686 2.663 … 2.631 2.626 2.639 sigma_y (chain, draw) float32 1.68 1.447 1.514 1.4 … 1.417 1.656 1.532 Attributes: created_at: 2026-04-10T05:48:06.011655+00:00 arviz_version: 0.21.0
We can inspect our estimates and see that the parameters are well esimtated by the model. Note that we only get an estimate for b. This is because earlier we set the parameter a with the flag free=False this effectively excludes it from estimation and uses the default value, which was set to the true value a=0.
The meanof b is the value of the estimated parameter. It should be the same or close to estimation you did manually. The sigma_y is the mean error of this estimation.
Plot the results#
Pymob provides a very basic utility for plotting posterior predictions. We see that the mean is a perfect fit and also that the uncertainty in the data is correctly displayed. Fantstic 🎉
sim.config.simulation.x_dimension = "t"
sim.posterior_predictive_checks(pred_hdi_style={"alpha": 0.1})

Customize the posterior predictive checks
You can explore the API of pymob.sim.plot.SimulationPlot to find out how you can work on the default predictions. Of course you can always make your own plot, by accessing pymob.simulation.inferer.idata and pymob.simulation.observations
Report the results#
The command pymob.simulation.SimulationBase.report() can be used to generate an automated report. The report can be configured with options in pymob.simulation.SimulationBase.config.report().
sim.report()
[WARNING] Could not fetch resource 'probability_model.png': replacing image with description
[WARNING] This document format requires a nonempty <title> element.
Defaulting to 'report' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
Exporting the simulation and running it via the case study API#
After constructing the simulation, all settings of the simulation can be exported to a comprehensive configuration file, along with all the default settings. This is as simple as
import os
sim.config.case_study.name = "quickstart"
sim.config.case_study.scenario = "test"
sim.config.create_directory("scenario", force=True)
sim.config.create_directory("results", force=True)
# usually we expect to have a data directory in the case
os.makedirs(sim.data_path, exist_ok=True)
sim.save_observations(force=True)
sim.config.save(force=True)
Scenario directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/scenarios/test'.
Results directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/results/test'.
The simulation will be saved to the default path (CASE_STUDY/scenarios/SCENARIO/settings.cfg) or to a custom file path specified with the fp keyword. force=True will overwrite any existing config file, which is the reasonable choice in most cases.
From there on, the simulation is (almost) ready to be executable from the commandline.
Commandline API#
The commandline API runs a series of commands that load the case study, execute the pymob.simulation.SimulationBase.initialize() method and perform some more initialization tasks, before running the required job.
pymob-infer: Runs an inference job e.g.pymob-infer --case_study=quickstart --scenario=test --inference_backend=numpyro. While there are more commandline options, these are the two required