Simulation#
The modelling toolkit offers a variety of algorithms that can be used on a large number of models. Recognizing that simulation, optimization/calibration, parameter inference, sensitivity analysis, validation, etc. require similar workflows and have common input and output datastreams allows the abstraction of shared processes into a generic Simulation class, which can be reused for the listed purposes.
Outdated documentation
The description of the simulation components is outdated. Especially the components parameterize and set_coordinates are now handled automatically, when parameters are defined, but they can still be used without problems.
Simulation components#
Any simulation has recurring components. Facilitating the transfer of information between those components is key to analyze a described model with different tools. In this framework, simulations are defined as classes which require the definition of methods that define the simulation.
# sim.py
from timepath import SimulationBase
from mod import lotka_volterra, solve_jax
import plot # imports the plot module which contains plotting functions
import prob # imports custom probability models for bayesian inference
class Simulation(SimulationBase):
solver: Callable = solve_jax # a function f(t, x, θ) = y
model: Callable = lotka_volterra # an ODE term df/dt = ... to be used with the solver
# import extra modules with outsourced code to keep the Simulation class
# compact
prob = prob # probability models that go beyond automatically generated models
mplot = plot # plot methods can now be referenced with
# self.mplot.my_beautiful_plot(self, ...)
def initialize(self, input: List):
# define the model parameters (can of course be also loaded from an
# input file)
self.model_parameters["parameters"] = dict(
alpha = 0.1, # Prey birth rate
beta = 0.02, # Rate at which predators decrease prey population
gamma = 0.3, # Predator reproduction rate
delta = 0.01, # Predator death rate
)
# define the initial populations
self.model_parameters["y0"] = np.array([40, 9]) # initial population of prey and predator
# used to define observations and any other information needed by the solver
self.observations = xr.load_dataset(input[1])
self.observations["wolves"] = self.observations["wolves"] + 1e-8
self.observations["rabbits"] = self.observations["rabbits"] + 1e-8
@staticmethod
def parameterize(free_parameters: Dict, model_parameters: Dict):
# receives a dictionary of free_parameters from a simulation evaluation
# and uses the free parameters to update the model_parameters dictionary,
# which can contain additional information passed to the solver
# can be used to modify parameters, before they are passed to the
# solver
# Initial conditions and parameters
y0 = model_parameters["y0"]
parameters = model_parameters["parameters"]
# mapping of parameters *theta* to the model parameters accessed by
# the solver. This task is necessary for any model
parameters.update(free_parameters)
return dict(y0=y0, parameters=parameters)
def set_coordinates(self, input):
# the model is integrated over the dimension time
# e.g.
t_start = 0
t_end = 200
t_step = 0.1
time = np.arange(t_start, t_end, t_step)
return time
def plot(self, results):
# while plot methods are not required to simply run a simulation they
# are very helpful for using the interactive backend. These plots
# are always updated when parameters of the plots are changed.
# If a JAX based solver is used (compiled to highly efficient code)
# then the updates can be in real time depending on the model complexity
fig = self.mplot.plot_trajectory(results)
fig.savefig(f"{self.output_path}/trajectory.png")
The Simulation class is not restricted to these methods, any number of methods can be added that customize the behavior, add plot methods, or any other function.
Before the entire architecture and the different methods of the Simulation are explained one by one in greater detail, let’s take a look at how to run a simulation and get started!
from pymob.utils.store_file import prepare_casestudy
# load the configuration file of the case study by providing the name of the
# case study directory and the name of the scenario directory. This assumes that
# all case studies are contained in a directory "case_studies".
#
# In this example we have the following directory tree:
# project_directory (working directory)
# ├─ case_studies
# │ ├─ lotka_volterra_case_study
# │ │ ├─ scenarios
# │ │ │ ├─ test_scenario
# │ │ │ │ └─ settings.cfg
# │ | | └─ test_scenario_2
# │ │ │ │ └─ settings.cfg
# | | ├─ sim.py
#
config = prepare_casestudy(
case_study=("lotka_volterra_case_study", "test_scenario"),
config_file="settings.cfg",
pkg_dir="case_studies"
)
# loads the defined Simulation class from sim.py at the root of the case-study
# directory
from sim import Simulation
# create a Simulation object by passing the configuration file to the Simulation
# class defined in sim.py
sim = Simulation(config)
# execute a simulation with a
func = sim.dispatch(theta={"alpha": 0.5}) # initiate the simulation
func() # run the simulation (takes more time for the first run)
func.results # obtain the results
# plot the results with the plotting function defined in the Simulation class
sim.plot(results=f.results)
Model and the solver. Information flow in the Simulation class when using the dispatch API#
The moment a simulation is dispatched, an instance of the Evaluator class is assembled with the specified parameters \(\theta\). Next to the model, solver, and parameters, the Evaluator obtains a number additional attributes from the Simulation instance. These are:
dimensions: A list of the dimensions of the simulationsn_ode_states: The number of ODE states trackedvar_dim_mapper: A list of variables and their associated dimensions. This is relevant for simulations, where not all data variables have the same dimensional structuredata_structure: Similar to the var_dim_mapper, but additionally contains the coordinatescoordinates: The coordinates of each dimension in a dictdata_variables: The data variables of the simulationstochastic: Whether the model is a stochastic or a deterministic modelindices: Indices, which should be used to map potentially nested parameters to a flat array for batch processing the simulations, by default {}post_processing: A function that takes a dictionary of simulation results and parameters as an input and adds new variables to the results, by default None, meaning that no post processing of the ODE solution is performed
These additional input can be magically picked up by the solver if the terms are used as function arguments of the solver function.
def solver(model, parameters, coordinates, indices):
...
if extra arguments are provided to the solve function, then they must be passed to the dispatch call of the Evaluator.
def solver(model, parameters, my_extra_argument):
...
...
evaluator = sim.dispatch(theta=..., my_extra_argument="foo")
Note that theoretically no ODE model is needed. If you have a deterministic solution to your problem, or you have an entirely different model (e.g. a numeric individual based model), than you can ignore the model and specify all you need for solving a problem in the solver
def solver(parameters, coordinates):
x = coordinates["x"]
a = parameters["a"]
b = parameters["b"]
y = a * x + b
return {"y": y}
Back to the Evaluator instance. It has a __call__ method, which is invoked, when the evaluator instance is called evaluator(). This runs the solve method with all specified arguments and stores the raw results to an attribute Y. When evaluator.results is accessed, the results dictionary returned by the solver is reshaped to an xarray.Dataset, which maps 1:1 to the observations with respect to the dimensionality of the output.
Note that the coordinates may differ, unless the coordinates of the observations are used as simulation coordinates.
Model parameters#
Model parameters are a critical part of the model. Changing model parameters changes the output of a model. Parameters are involved in many typical tasks of model development:
parameter calibration (or optimization)
parameter inference (slightly different operation)
senstivity analysis
In addition to the model parameters, the parameterize method also returns
the initial conditions of the simulation.
@staticmethod
def parameterize(free_parameters, model_parameters):
# Initial conditions and parameters
y0 = model_parameters["y0"]
parameters = model_parameters["parameters"]
# mapping of parameters *theta* to the model parameters accessed by
# the solver. This task is necessary for any model
parameters.update(free_parameters)
return dict(y0=y0, parameters=parameters)
When the simulation is initialized, model_parameters may be pre-specified.
like this
def initialize(self, input):
...
self.model_paramaters["y0"] = ...
self.model_parameters["parameters"] = dict(fixed_param=2.34)
...
This attribute model_parameters is then locked to the parameterize method by
using the python functools function partial
def __init__(self, ...):
...
self.parameterize = partial(self.parameterize, model_parameters=self.model_parameters)
This means in the initialize function of your simulation you can set fixed
parameters to the model. Free parameters specified in the dispatch call
`sim.dispatch(theta=dict(…)) will then arrive as free_parameters in the
parameterize call. If this API seems a bit complicated, it is probably true.
At the time of writing this, it seemed like a good and easy option. This will
be re-evaluated in the future and the API might change.
Simulation coordinates#
The model coordinates describe the dimensions over which the model is solved. In a simple model only one coordinate is present. This dimension can be the time dimension.
A very simple example:
def set_coordinates(self, input)
time = np.linspace(0, 100, 50)
return time
Models can also have a batch dimension, if it is part of the model that multiple replicates are simulated as one trial of an experiment.
def set_coordinates(self, input)
time = np.linspace(0, 100, 50)
sample = [1,2,3]
return time, sample
Any number of dimension can be specified as long as those dimensions are also
named in the same order in the settings.cfg file
[simulation]
dimensions = time sample
Observations#
Observations must have the same form as the model output
Input / Output#
As a uniform exchange format netcdf is enforced by the package. In python netcdf files are handled by the xarray package. For convenience, data_variables of interest can be saved as a number of output formats. However, higher dimesnional datasets with more coordinates than e.g. time, must be aggregated over the remaining dimensions, before they can be processed to .csv files