Pymob quickstart#

Initialize a simulation#

In pymob, a simulation object is created by instantiating the pymob.simulation.SimulationBase class from the simulation module.

from pymob.simulation import SimulationBase

sim = SimulationBase()

/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/sim/config/base.py:397: UserWarning: Case study 'unnamed_case_study' could not be imported. Install the case study with `pip install unnamed_case_study`.
  warnings.warn(

Configuring the simulation

Optionally, we can configure the simulation at this stage with sim.config.case_study.name = "linear-regression", sim.config.case_study.scenario = "test", and many more options.

Define a model#

Let’s investigate a linear regression as the simplest possible task.

def linreg(x, a, b):
    return a + x * b

We assume that this model describes our data well, so we add it to the simulation:

sim.model = linreg
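As a quick sanity check, the model can be evaluated on its own before it is handed to the simulation. The sketch below uses only NumPy; the input values are made up for illustration.

```python
import numpy as np

def linreg(x, a, b):
    # a is the intercept, b is the slope
    return a + x * b

# scalar input: 1 + 2 * 3
y_scalar = linreg(x=2.0, a=1.0, b=3.0)

# array input: NumPy broadcasting evaluates the model for every x at once
x_check = np.array([-1.0, 0.0, 1.0])
y_array = linreg(x=x_check, a=1.0, b=3.0)
print(y_scalar, y_array)
```

Because the function only uses arithmetic operators, it works unchanged for scalars and arrays.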

Defining a solver#

In our case, evaluating the model directly gives the exact solution. Solvers in pymob are callables that need to return a dictionary of results mapped to the data variables.

from pymob.sim.solvetools import solve_analytic_1d
sim.solver = solve_analytic_1d
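To illustrate what "a callable that returns a dictionary of results mapped to the data variables" can look like, here is a minimal sketch. The function `toy_analytic_solver` and its signature are hypothetical and are not pymob's `solve_analytic_1d`; the sketch only shows the shape of the return value, assuming a single data variable named `y`.

```python
import numpy as np

def linreg(x, a, b):
    return a + x * b

def toy_analytic_solver(model, parameters, x, data_variable="y"):
    """Hypothetical solver: evaluate an analytic model and key the result by data variable."""
    result = model(x, **parameters)
    return {data_variable: np.asarray(result)}

x_demo = np.linspace(-5, 5, 5)
solution = toy_analytic_solver(linreg, {"a": 0.0, "b": 1.0}, x_demo)
print(solution)
```

Returning a dictionary rather than a bare array lets downstream code look up each result by the name of the data variable it belongs to.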

Generate artificial data#

In the real world, you will have measured a dataset. For demonstration, we define parameters \(\theta\) that we assume describe the true data-generating process. We then generate data for \(x\) on [-5, 5], compute the observations \(y\), and add random noise with a standard deviation of \(\sigma_y = 1\).

import numpy as np
rng = np.random.default_rng(1)

# define the coordinates of the x-dimension to generate data for
x = np.linspace(-5, 5, 50)

# define a set of parameters θ
theta = dict(a=0, b=1, sigma_y=1)

# then simulate some data and add some noise
y = linreg(x=x, a=theta["a"], b=theta["b"])
y_noise = rng.normal(loc=y, scale=theta["sigma_y"])
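Before moving on, it can be reassuring to check that the noisy data are consistent with the parameters that generated them. The following sketch repeats the data generation and fits a straight line with ordinary least squares via `np.polyfit`. This check is not part of pymob and is only a plausibility test; the names `a_hat`, `b_hat` and `residual_sd` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 50)
theta = dict(a=0, b=1, sigma_y=1)

# same data-generating process as above
y = theta["a"] + x * theta["b"]
y_noise = rng.normal(loc=y, scale=theta["sigma_y"])

# ordinary least squares fit of a straight line; polyfit returns [slope, intercept]
b_hat, a_hat = np.polyfit(x, y_noise, deg=1)

# residual standard deviation, with two degrees of freedom used by the fit
residual_sd = np.std(y_noise - (a_hat + b_hat * x), ddof=2)
print(f"a_hat={a_hat:.2f}, b_hat={b_hat:.2f}, residual_sd={residual_sd:.2f}")
```

The estimates should land close to, but not exactly on, the values in `theta`, because the fit is based on 50 noisy observations.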

The pymob magic 🪄#

So far we have not done anything special. Pymob exists because wrangling the dimensions of input and output data, nested data structures, and missing data is painful. We avoid most of the mess by using xarray as a common input/output format. So we have to transform our data into an xarray.Dataset and add it to the simulation.

import xarray as xr

sim.observations = xr.DataArray(y_noise, coords={"x": x}).to_dataset(name="y")

MinMaxScaler(variable=y, min=-5.690912333645177, max=5.891166954282328)


/home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/latest/lib/python3.11/site-packages/pymob/simulation.py:366: UserWarning: `sim.config.data_structure.y = Datavariable(dimensions=['x'] unit=None min=-5.690912333645177 max=5.891166954282328 observed=True dimensions_evaluator=None)` has been assumed from `sim.observations`. If the order of the dimensions should be different, specify `sim.config.data_structure.y = DataVariable(dimensions=[...], ...)` manually.
  warnings.warn(

This worked 🎉. sim.config.data_structure now gives us some information about the layout of our data, which pymob uses to handle the data transformations in the background.

What happens when we assign a Dataset to the observations attribute?

Debug into the function and discover what happens!

We can give pymob additional information about the data structure of our observations and of intermediate (unobserved) variables that are simulated. This can be done with sim.config.data_structure.y = DataVariable(dimensions=["x"]). This information can be used to switch the dimensional order of the observations or to provide data variables whose dimensions differ from those of the observations, if needed. But if the dataset is ordinary, simply setting the sim.observations property with an xr.Dataset will be sufficient.

Scalers

We also notice a mysterious Scaler message. This tells us that our data variable has been identified and a scaler was constructed, which transforms the variable to the interval [0, 1]. This has no effect at the moment, but it can be used later. Scaling can be a powerful aid to parameter estimation in more complex models.
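The idea behind min-max scaling can be sketched in a few lines of NumPy. The class `ToyMinMaxScaler` below is a hypothetical stand-in written for this illustration and is not pymob's scaler implementation; it only shows the transformation to [0, 1] and its inverse.

```python
import numpy as np

class ToyMinMaxScaler:
    """Hypothetical min-max scaler: maps [lower, upper] to [0, 1] and back."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

    def transform(self, values):
        return (np.asarray(values) - self.lower) / (self.upper - self.lower)

    def inverse_transform(self, scaled):
        return np.asarray(scaled) * (self.upper - self.lower) + self.lower

# illustrative values, roughly spanning the range reported in the message above
y_demo = np.array([-5.69, 0.0, 5.89])
scaler = ToyMinMaxScaler(lower=y_demo.min(), upper=y_demo.max())

y_scaled = scaler.transform(y_demo)          # smallest value -> 0, largest -> 1
y_back = scaler.inverse_transform(y_scaled)  # recovers the original values
print(y_scaled, y_back)
```

Bringing variables with very different magnitudes onto a common scale is one reason such a transformation can help when fitting more complex models.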

Parameterizing a model#

Parameters are specified via the FloatParam or ArrayParam class. Parameters can be marked free or fixed depending on whether they should be variable during an optimization procedure.

from pymob.sim.config import Param
sim.config.model_parameters.a = Param(value=0, free=False)
sim.config.model_parameters.b = Param(value=3, free=True)
# this makes sure the model parameters are available to the model.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict

sim.model_parameters is a dictionary that holds the model input data. The keys it takes by default are parameters, y0 and x_in. In our case, we have an analytic model and need only parameters. In situations where initial values for variables are needed, they can be provided with sim.model_parameters["y0"] = ....

Generating input for solvers

A helpful function to generate y0 or x_in from observations is SimulationBase.parse_input, combined with the settings in config.simulation.y0.
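The relationship between parameter definitions, the value dictionary, and the model input can be sketched with plain Python objects. `ToyParam` below is a hypothetical stand-in for illustration, not pymob's `Param` class, and the dictionaries only mimic the structure described above.

```python
from dataclasses import dataclass

@dataclass
class ToyParam:
    """Hypothetical stand-in: a parameter with a value and a free/fixed flag."""
    value: float
    free: bool = True

config_parameters = {
    "a": ToyParam(value=0, free=False),  # fixed during optimization
    "b": ToyParam(value=3, free=True),   # varied during optimization
}

# analogous to a value dict: parameter name -> current value
value_dict = {name: p.value for name, p in config_parameters.items()}

# only the free parameters would be varied by an optimizer
free_names = [name for name, p in config_parameters.items() if p.free]

# model input container; an analytic model needs only the "parameters" key
model_parameters = {"parameters": value_dict}
print(model_parameters, free_names)
```

Keeping the free/fixed flag next to the value means the same definitions can feed both a single forward run and an optimizer.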

Exporting the simulation and running it via the case study API#

After constructing the simulation, all settings of the simulation can be exported to a comprehensive configuration file, along with all the default settings. This is as simple as

import os
sim.config.case_study.name = "quickstart"
sim.config.case_study.scenario = "test"
sim.config.create_directory("scenario", force=True)
sim.config.create_directory("results", force=True)

# usually we expect to have a data directory in the case study
os.makedirs(sim.data_path, exist_ok=True)
sim.save_observations(force=True)
sim.config.save(force=True)

Scenario directory created at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/scenarios/test'.
Results directory created at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/latest/docs/source/user_guide/case_studies/quickstart/results/test'.

The simulation will be saved to the default path (CASE_STUDY/scenarios/SCENARIO/settings.cfg) or to a custom path specified with the fp keyword. force=True will overwrite any existing config file, which is the reasonable choice in most cases.
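The default location follows directly from the case study name and the scenario. The helper `default_settings_path` below is hypothetical and only mirrors the CASE_STUDY/scenarios/SCENARIO/settings.cfg pattern; the root directory `case_studies` is an assumption taken from the output shown above.

```python
from pathlib import Path

def default_settings_path(root, case_study, scenario):
    """Hypothetical helper mirroring CASE_STUDY/scenarios/SCENARIO/settings.cfg."""
    return Path(root) / case_study / "scenarios" / scenario / "settings.cfg"

settings_path = default_settings_path("case_studies", "quickstart", "test")
print(settings_path.as_posix())
```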

From there on, the simulation is (almost) ready to be executed from the commandline.

Commandline API#

The commandline API runs a series of commands that load the case study, execute the pymob.simulation.SimulationBase.initialize() method and perform some more initialization tasks, before running the required job.

pymob-infer: Runs an inference job, e.g. pymob-infer --case_study=quickstart --scenario=test --inference_backend=numpyro. While there are more commandline options, these are the two required.
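When the command is launched from a script, it can be convenient to assemble it from a dictionary of options. The sketch below only builds and prints the command line, using the option names from the example above; actually running it would require pymob and the case study to be installed.

```python
import shlex

options = {
    "case_study": "quickstart",
    "scenario": "test",
    "inference_backend": "numpyro",
}

# one "--key=value" argument per option, in insertion order
command = ["pymob-infer"] + [f"--{key}={value}" for key, value in options.items()]
command_line = shlex.join(command)
print(command_line)

# to execute it (assumes pymob-infer is on the PATH):
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting problems if an option value ever contains spaces.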
