# Pymob in minutes - the basics This guide provides a streamlined introduction to the basic Pymob workflow and its key functionalities. We will explore a simple linear regression model that we want to fit to a noisy dataset. Pymob supports the modeling process by providing several tools for *data structuring*, *parameter estimation* and *visualization of results*. If you are looking for a more detailed introduction, [click here](user_guide/Introduction). If you want to learn how to work with ODE models, check out [this tutorial](user_guide/advanced_tutorial_ODE_system). ## Pymob components 🧩 Before starting the modeling process, let's take a look at the main steps and modules of pymob: 1. __Simulation:__ First, we need to initialize a Simulation object by creating an instance of the {class}`pymob.simulation.SimulationBase` class from the simulation module. Optionally, we can configure the simulation with `sim.config.case_study.name = "linear-regression"`, `sim.config.case_study.scenario = "test"` and many other options. 2. __Model:__ Our model will be defined as a standard python function. We will then assign it to the Simulation object by accessing the `.model` attribute. 3. __Observations:__ Our observation data must be structured as an [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html). We assign it to the {attr}`~pymob.simulation.SimulationBase.observations` attribute of our Simulation object. Calling `sim.config.data_structure` will give us further information about the layout of our data. 4. __Solver:__ A solver ({mod}`~pymob.solvers`) is required to solve the model. In our simple case, we will use the `solve_analytic_1d` solver from the {mod}`~pymob.solvers.analytic` module. We assign it to our Simulation object using the {attr}`~pymob.simulation.SimulationBase.solver` attribute. Since our model already provides an analytical solution, this solver basically does nothing. It is still needed to fulfill Pymob's requirement for a solver component. For more complex models (e.g. ODEs), the `JaxSolver` from the {mod}`~pymob.solvers.diffrax` module is a more powerful option. Users can also implement custom solvers as a subclass of {class}`pymob.solvers.base.SolverBase`. 5. __Inferer:__ The inferer handels the parameter estimation. Pymob supports [various backends](https://pymob.readthedocs.io/en/stable/user_guide/framework_overview.html). In this example, we will work with *NumPyro*. We assign the inferer to our Simulation object via the {attr}`~pymob.simulation.SimulationBase.inferer` attribute and configure the desired kernel (e.g. *nuts*). But before inference, we need to parameterize our model using the {class}`~pymob.sim.parameters.Param` class. Each parameter can be marked either as free or fixed, depending on whether it should be variable during the optimization procedure. The parameters are stored in the {attr}`~pymob.simulation.SimulationBase.model_parameters` dictionary, which holds model input values. By default, it takes the keys: `parameters`, `y0` and `x_in`. 6. __Evaluator:__ The Evaluator is an instance to manage model evaluations. It sets up tasks, coordinates parallel runs of the simulation and keeps track of the results from each simulation or parameter inference process. Evaluators store the raw output from a simulation and can generate an xarray object from it that corresponds to the data-structure of the observations with the {attr}`~pymob.sim.evaluator.Evaluator.results` property. This automatically aligns the simulations results with the observations, for simple computation of loss functions. 7. __Config:__ The simulation settings will be saved in a `.cfg` configuration file. The config file contains information about our simulation in various sections. [Learn more here](case_studies.md#configuration). We can further use it to create new simulations by loading settings from a config file. ![framework-overview](./figures/pymob_overview.png) ## Getting started 🛫 ```python # First, import the necessary python packages import numpy as np import matplotlib.pyplot as plt import xarray as xr # Import the pymob modules from pymob.simulation import SimulationBase from pymob.sim.solvetools import solve_analytic_1d from pymob.sim.config import Param ``` /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_class" has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`. warnings.warn( Since no measured data is provided, we will generate an artificial dataset. $y_{obs}$ represents the **observed data** over the time $t$ [0, 10]. To use this data later in the simulation, we need to convert it into an **xarray dataset**. In your own application, you would replace this with your measured experimental data. ```python # Parameter for the artificial data generation rng = np.random.default_rng(seed=1) # for reproducibility slope = rng.uniform(2,4) intercept = 1.0 num_points = 101 noise_level = 1.7 # generating time values t = np.linspace(0, 10, num_points) # generating y-values with noise noise = rng.normal(0, noise_level, num_points) y_obs = slope * t + intercept + noise # visualizing our data fig, ax = plt.subplots(figsize=(5, 4)) ax.scatter(t, y_obs, label='Datapoints') ax.set(xlabel='t [-]', ylabel='y_obs [-]', title ='Artificial Data') plt.tight_layout() # convert the data to an xr-Dataset data_obs = xr.DataArray(y_obs, coords={"t": t}).to_dataset(name="y") data_obs ```
<xarray.Dataset>
Dimensions:  (t: 101)
Coordinates:
  * t        (t) float64 0.0 0.1 0.2 0.3 0.4 0.5 ... 9.5 9.6 9.7 9.8 9.9 10.0
Data variables:
    y        (t) float64 2.397 1.864 -0.6106 3.446 ... 27.91 31.2 29.83 32.7
![png](superquickstart_files/superquickstart_7_1.png) ## Initialize a simulation ✨ In pymob, a **simulation object** is initialized by creating an instance of the {class}`~pymob.simulation.SimulationBase` class from the simulation module. We will choose a linear regression model, as it provides a good approximation of the data: $ y = a + b*x $ ```{admonition} x-dimension :class: note The x_dimension of our simulation can have any name, for example t as often used for time series data. You can specify it via `sim.config.simulation.x_dimension`. ``` ```python # Initialize the Simulation object sim = SimulationBase() # configurate the case study sim.config.case_study.name = "superquickstart" sim.config.case_study.scenario = "linreg" # Define the linear regression model def linreg(x, a, b): return a + b * x # Add the model to the simulation sim.model = linreg # Adding our dataset to the simulation sim.observations = data_obs # Defining a solver sim.solver = solve_analytic_1d # Take a look at the layut of the data sim.config.data_structure ``` Inserted './case_studies' in PATH at index=0 Inserted './case_studies/unnamed_case_study/unnamed_case_study' in PATH at index=0 MinMaxScaler(variable=y, min=-0.6106386438473108, max=32.702588647741905) /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/sim/config/base.py:387: UserWarning: Module sim.py not found in ./case_studies/unnamed_case_study/unnamed_case_study.Missing modules can lead to unexpected behavior.If a module is not imported, you can specify it in the Config 'config.case_study.modules = [...]' warnings.warn( /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/sim/config/base.py:387: UserWarning: Module mod.py not found in ./case_studies/unnamed_case_study/unnamed_case_study.Missing modules can lead to unexpected behavior.If a module is not imported, you can specify it in the Config 'config.case_study.modules = [...]' warnings.warn( /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/sim/config/base.py:387: UserWarning: Module prob.py not found in ./case_studies/unnamed_case_study/unnamed_case_study.Missing modules can lead to unexpected behavior.If a module is not imported, you can specify it in the Config 'config.case_study.modules = [...]' warnings.warn( /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/sim/config/base.py:387: UserWarning: Module data.py not found in ./case_studies/unnamed_case_study/unnamed_case_study.Missing modules can lead to unexpected behavior.If a module is not imported, you can specify it in the Config 'config.case_study.modules = [...]' warnings.warn( /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/sim/config/base.py:387: UserWarning: Module plot.py not found in ./case_studies/unnamed_case_study/unnamed_case_study.Missing modules can lead to unexpected behavior.If a module is not imported, you can specify it in the Config 'config.case_study.modules = [...]' warnings.warn( /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/simulation.py:364: UserWarning: `sim.config.data_structure.y = Datavariable(dimensions=['t'] unit=None min=-0.6106386438473108 max=32.702588647741905 observed=True dimensions_evaluator=None)` has been assumed from `sim.observations`. If the order of the dimensions should be different, specify `sim.config.data_structure.y = DataVariable(dimensions=[...], ...)` manually. warnings.warn( Datastructure(y=DataVariable(dimensions=['t'], unit=None, min=-0.6106386438473108, max=32.702588647741905, observed=True, dimensions_evaluator=None)) ```{admonition} Scalers :class: note We notice a mysterious Scaler message. This tells us that our data variable has been identified and a scaler was constructed, which transforms the variable between [0, 1]. This has no effect at the moment, but it can be used later. Scaling can be powerful to help parameter estimation in more complex models. ``` ## Parameterizing and running the model 🏃 Next, we define the **model parameters** $a$ and $b$. Parameter $a$ is set as fixed (`free = False`), meaning its value is known and will not be estimated during optimization. Parameter $b$ is marked as free (`free = True`), allowing it to be optimized to fit the data. As an initial guess, we assume $b = 3$. ```python # Parameterizing the model sim.config.model_parameters.a = Param(value=1.0, free=False) sim.config.model_parameters.b = Param(value=3.0, free=True) # this makes sure the model parameters are available to the model. sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict sim.model_parameters["parameters"] ``` {'a': 1.0, 'b': 3.0} Our model is now prepared with a defined parameter set. To initialize the **Evaluator**, we call {meth}`~pymob.simulation.SimulationBase.dispatch_constructor()`. This step is essential and must be executed every time changes are made to the model. The returned dataset (`evaluator.results`) has the exact same shape as the observation data. ```python # put everything in place for running the simulation sim.dispatch_constructor() # run evaluator = sim.dispatch(theta={"b":3}) evaluator() evaluator.results ``` /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/simulation.py:719: UserWarning: The number of ODE states was not specified in the config file [simulation] > 'n_ode_states = '. Extracted the return arguments ['a+b*x'] from the source code. Setting 'n_ode_states=1. warnings.warn(
<xarray.Dataset>
Dimensions:  (t: 101)
Coordinates:
  * t        (t) float64 0.0 0.1 0.2 0.3 0.4 0.5 ... 9.5 9.6 9.7 9.8 9.9 10.0
Data variables:
    y        (t) float64 1.0 1.3 1.6 1.9 2.2 2.5 ... 29.8 30.1 30.4 30.7 31.0
```{admonition} What does the dispatch constructor do? :class: hint Behind the scenes, the dispatch constructor assembles a lightweight Evaluator object from the Simulation object, that takes the least necessary amount of information, runs it through some dimension checks, and also connects it to the specified solver and initializes it. ``` Let's take a look at the **results**. You can vary the parameter $b$ in the previous step to investigate its influence on the model fit. In the [Introduction](https://pymob.readthedocs.io/en/stable/user_guide/introduction.html), you can try out the *manual parameter estimation*, which is a feature provided by Pymob. ```python fig, ax = plt.subplots(figsize=(5, 4)) data_res = evaluator.results ax.plot(data_obs.t, data_obs.y, ls="", marker="o", color="tab:blue", alpha=.5, label ="observation data") ax.plot(data_res.t, data_res.y, color="black", label ="result") ax.legend() ``` ![png](superquickstart_files/superquickstart_18_1.png) ## Estimating parameters and uncertainty with MCMC 🤔 Of course this example is very simple. In fact, we could optimize the parameters perfectly by hand. But just for fun, let's use *Markov Chain Monte Carlo (MCMC)* to estimate the parameters, their uncertainty and the uncertainty in the data. We’ll run the parameter estimation with our **{attr}`~pymob.simulation.inferer`**, using the NumPyro backend with a NUTS kernel. This completes the job in a few seconds. We are almost ready to infer the model parameters. To also estimate the uncertainty of the parameters, we add another parameter representing the error and assume that it follows a lognormal distribution. Additionally, we specify an error model for the data distribution. This will be: $$y_{obs} \sim Normal (y, \sigma_y)$$ Since $\sigma_y$ is not a fixed parameter, it doesn't need to be passed to the simulation class. ```python sim.config.model_parameters.sigma_y = Param(free=True , prior="lognorm(scale=1,s=1)", min=0, max=1) sim.config.model_parameters.b.prior = "lognorm(scale=1,s=1)" sim.config.error_model.y = "normal(loc=y,scale=sigma_y)" sim.set_inferer("numpyro") sim.inferer.config.inference_numpyro.kernel = "nuts" sim.inferer.run() # you can access the posterior distrubution by: sim.inferer.idata.posterior # Plot the results sim.config.simulation.x_dimension = "t" sim.posterior_predictive_checks(pred_hdi_style={"alpha": 0.1}) ``` Jax 64 bit mode: False Absolute tolerance: 1e-07 /home/docs/checkouts/readthedocs.org/user_builds/pymob/envs/0.5.27/lib/python3.11/site-packages/pymob/inference/numpyro_backend.py:565: UserWarning: Model is not rendered, because the graphviz executable is not found. Try search for 'graphviz executables not found' and the used OS. This should be an easy fix :-) warnings.warn( Trace Shapes: Param Sites: Sample Sites: b dist | value | sigma_y dist | value | y_obs dist 101 | value 101 | 0%| | 0/3000 [00:00 element. Defaulting to 'report' as the title. To specify a title, use 'title' in metadata or --metadata title="...". ![posterior_trace.png](superquickstart_files/posterior_trace.png) ![posterior_pairs.png](superquickstart_files/posterior_pairs.png) ## Exporting the simulation and running it via the case study API 📤 After constructing the simulation, all settings - custom and default - can be exported to a comprehensive configuration file. The simulation will be saved to the default path (`CASE_STUDY/scenarios/SCENARIO/settings.cfg`) or to a custom path, specified with the file path keyword `fp`. Setting `force=True` will overwrite any existing config file, which is a reasonable choice in most cases. From this point on, the simulation is (almost) ready to be executed from the command-line. ```python import os sim.config.create_directory("scenario", force=True) sim.config.create_directory("results", force=True) # usually we expect to have a data directory in the case os.makedirs(sim.data_path, exist_ok=True) sim.save_observations(force=True) sim.config.save(force=True) ``` Scenario directory created at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/0.5.27/docs/source/user_guide/case_studies/superquickstart/scenarios/linreg'. Results directory exists at '/home/docs/checkouts/readthedocs.org/user_builds/pymob/checkouts/0.5.27/docs/source/user_guide/case_studies/superquickstart/results/linreg'. ### Commandline API The command-line API runs a series of commands that load the case study, execute the {meth}`~pymob.simulation.SimulationBase.initialize` method and perform some more initialization tasks before running the required job. + `pymob-infer` runs an inference job, for example: `pymob-infer --case_study=quickstart --scenario=test --inference_backend=numpyro`. While there are more command-line options, these two (--case_study and --scenario) are required.