Pymob quickstart#
Initialize a simulation#
In pymob a Simulation object is initialized by calling the SimulationBase class from the simulation module.
from pymob.simulation import SimulationBase
sim = SimulationBase()
Configuring the simulation
Optionally, we can configure the simulation at this stage with
sim.config.case_study.name = "linear-regression", sim.config.case_study.scenario = "test", and many more options.
Define a model#
Letβs investigate a linear regression as the most simple task.
def linreg(x, a, b):
return a + x * b
So we assume that this model describes our data well. So we add it to the simulation
sim.model = linreg
Defining a solver#
In our case the model gives the exact solution of the model. Solvers in pymob are callables that need to return a dictionary of results mapped to the data variables
from pymob.sim.solvetools import solve_analytic_1d
sim.solver = solve_analytic_1d
Generate artificial data#
In the real world, you will have measured a dataset. For demonstration, we define parameters \(theta\), that we assume describe the true data generating process and generate observations \(y\). Then we generate data for \(x\) on [-5, 5] and add random noise with a standard deviation of \(\sigma_y = 1\).
import numpy as np
rng = np.random.default_rng(1)
# define the coordinates of the x-dimension to generate data for
x = np.linspace(-5, 5, 50)
# define a set of parameters ΞΈ
theta = dict(a=0, b=1, sigma_y=1)
# then simulate some data and add some noise
y = linreg(x=x, a=theta["a"], b=theta["b"])
y_noise = rng.normal(loc=y, scale=parameters["sigma_y"])
The pymob magic πͺ#
So far we have not done anythin special. Pymob exists, because wrangling dimensions of input and output data, nested data-structures, missing data is painful. We avoid most of the mess by using xarray as a common input/output format. So we have to transform our data into a xarray.Dataset and add it to the simulation.
import xarray as xr
sim.observations = xr.DataArray(y_noise, coords={"x": x}).to_dataset(name="y")
This worked π sim.config.data_structure will now give us some information about the layout of our data, which will handle the data transformations in the background.
We can give pymob additional information about the data structure of our observations and intermediate (unobserved) variables that are simulated. This can be done with sim.config.data_structure.y = DataVariable(dimensions=["x"]).
These information can be used to switch the dimensional order of the observations or provide data variables that have differing dimensions from the observations, if needed. But if the dataset is ordinary, simply setting sim.observations property with a xr.Dataset will be sufficient.
Scalers
We also notice a mysterious Scaler message. This tells us that our data variable has been identified and a scaler was constructed, which transforms the variable between [0, 1]. This has no effect at the moment, but it can be used later. Scaling can be powerful to help parameter estimation in more complex models.
Parameterizing a model#
Parameters are specified via the FloatParam or ArrayParam class. Parameters can be marked free or fixed depending on whether they should be variable during an optimization procedure.
from pymob.sim.config import FloatParam
sim.config.model_parameters.a = FloatParam(value=10, free=False)
sim.config.model_parameters.b = FloatParam(value=3, free=True)
# this makes sure the model parameters are available to the model.
sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict
sim.model_parameters is a dictionary that holds the model input data. The keys it takes by default are parameters, y0 and x_in. In our case, we have a analytic model and need only parameters. In situations, where initial values for variables are needed, they can be provided with sim.model_parameters["y0"] = ....
generating input for solvers
A helpful function to generate y0 or x_in from observations is SimulationBase.parse_input, combined with settings of config.simulation.y0
Running the model π#
The model is prepared with a parameter set and executed with
evaluator = sim.dispatch(theta={"b":3})
evaluator()
evaluator.results
This returns a dataset which is of the exact same shape as the observation dataset, plus intermediate variables that were created during the simulation, if they are tracked by the solver.
Although this API seems to be a bit clunky, it is necessary, to make sure that simulations that are executed in parallel are isolated from each other.
Estimating parameters#
We are almost set infer the parameters of the model. We add another parameter to also estimate the error of the parameters, We use a lognormal distribution for it. We also specify an error model for the distribution. This will be
sim.config.model_parameters.sigma_y = FloatParam(free=True , prior="lognorm(scale=1,s=1)")
sim.config.error_model.y = "normal(loc=y,scale=sigma_y)"
numpyro distributions
Currently only few distributions are implemented in the numpyro backend. This API will soon change, so that basically any distribution can be used to specifcy parameters.
Finally, we let our inferer run the paramter estimation procedure with the numpyro backend and a NUTS kernel. This does the job in a few seconds
sim.set_inferer("numpyro")
sim.inferer.config.inference_numpyro.kernel = "nuts"
sim.inferer.run()
sim.inferer.idata.posterior
We can inspect our estimates and see that the parameters are well esimtated by the model.