pymob package#

Submodules#

pymob.simulation module#

class pymob.simulation.SimulationBase(config: str | ConfigParser | Config | None = None)#

Bases: object

Instantiate SimulationBase directly to build a new simulation, instantiate it with a config file to modify or explore an existing simulation, or subclass it.

Components#

model (Callable): A Python function that returns one or multiple numeric values or arrays. The number and dimensionality of the output must be specified in the pymob.sim.config.Datastructure, which takes pymob.sim.config.DataVariable as input.
model_parameters (Dict[‘parameters’: Dict[str, float|Array], ‘y0’: xarray.Dataset, ‘x_in’: xarray.Dataset]): A dictionary containing 3 keys: ‘parameters’ (model parameters, referred to as theta in dispatch), ‘y0’ (initial values), and ‘x_in’ (input that can be interpolated). Only ‘parameters’ (theta) is mandatory.
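As a minimal sketch of these two components, the snippet below defines a plain Python model function and a matching model_parameters dictionary. The function name linear_model_sketch and its arguments are hypothetical, and empty dictionaries stand in for the xarray.Dataset objects that pymob expects for ‘y0’ and ‘x_in’.

```python
from typing import Dict, List


def linear_model_sketch(x: List[float], a: float, b: float) -> List[float]:
    """Hypothetical model: returns one array-like output, y = a + b * x."""
    return [a + b * xi for xi in x]


# Illustrative structure only: in pymob, 'y0' and 'x_in' are xarray.Dataset
# objects; empty dicts stand in for them here.
model_parameters: Dict[str, dict] = {
    "parameters": {"a": 0.0, "b": 1.0},
    "y0": {},
    "x_in": {},
}

y = linear_model_sketch([0.0, 1.0, 2.0], **model_parameters["parameters"])
print(y)  # [0.0, 1.0, 2.0]
```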

Direct use#

In direct use, pymob.simulation.SimulationBase is instantiated and the relevant model attributes are set. Each simulation needs these parameters:

>>> import xarray as xr
>>> from pymob import SimulationBase
>>> from pymob.examples import linear_model
>>> from pymob.sim.solvetools import solve_analytic_1d
>>> from pymob.sim.config import Param  # assumed import path for Param, which is used below

Instantiate the model and assign the data. Although assigning data is not mandatory, it makes setting up a model easier, because the coordinates and dimensions are simply taken from the observations dataset.

>>> sim = SimulationBase()
>>> linreg, x, y, y_noise, parameters = linear_model(n=5)
>>> obs = xr.DataArray(y_noise, coords={"x": x}).to_dataset(name="y")
>>> sim.observations = obs
MinMaxScaler(variable=y, min=-4.654415807935214, max=5.905355866673117)

Parameterize the model

>>> sim.model = linreg
>>> sim.solver = solve_analytic_1d
>>> sim.config.model_parameters.a = Param(value=10, free=False)
>>> sim.config.model_parameters.b = Param(value=3, free=True, prior="normal(loc=0,scale=10)") # type:ignore
>>> sim.model_parameters["parameters"] = sim.config.model_parameters.value_dict

Run the model

>>> sim.dispatch_constructor()
>>> evaluator = sim.dispatch(theta={"b":3})
>>> evaluator()
>>> evaluator.results
<xarray.Dataset>
Dimensions:  (x: 5)
Coordinates:
  * x        (x) float64 -5.0 -2.5 0.0 2.5 5.0
Data variables:
    y        (x) float64 -5.0 2.5 10.0 17.5 25.0
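The printed results are consistent with a straight line y = a + b * x evaluated at a = 10 and b = 3. The check below reproduces them in plain Python; the functional form is an assumption inferred from the output above, not taken from the linear_model source.

```python
# Assumed functional form of the example model (inferred from the output).
a, b = 10.0, 3.0
x = [-5.0, -2.5, 0.0, 2.5, 5.0]

y = [a + b * xi for xi in x]
print(y)  # [-5.0, 2.5, 10.0, 17.5, 25.0]
```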

Subclassing use#

Subclassing SimulationBase makes sense if the simulation is intended to be used with configuration files.

>>> import os
>>> class LotkaVolterraSimulation(SimulationBase):
...     def initialize(self, input):
...         self.observations = xr.load_dataset(os.path.join(self.data_path, self.config.case_study.observations))
...         y0 = self.parse_input("y0", drop_dims=["time"])
...         self.model_parameters["y0"] = y0
...         self.model_parameters["parameters"] = self.config.model_parameters.value_dict

setup() calls initialize and a couple of other functions to set up the simulation. Afterwards, methods like dispatch() can be used. The idea is to automate the regular setup steps for a simulation in the initialize method. setup() is called explicitly, rather than implicitly by the __init__ method, to give the user the opportunity to change the configuration before initializing, such as the name of the scenario (sim.config.case_study.scenario), the results directory (sim.config.case_study.output) or any other configuration of the simulation.
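The explicit-setup pattern can be sketched with a toy class. ToySimulation and its attributes are hypothetical stand-ins and do not reproduce the pymob API; the point is only that the configuration can still be edited between construction and setup().

```python
class ToySimulation:
    """Hypothetical stand-in for a SimulationBase subclass (not the pymob API)."""

    def __init__(self, scenario: str = "default"):
        # __init__ only stores configuration; nothing is loaded yet.
        self.config = {"scenario": scenario}
        self.initialized_with = None

    def initialize(self):
        # Regular setup steps live here and read the *current* configuration.
        self.initialized_with = self.config["scenario"]

    def setup(self):
        # Called explicitly by the user once the configuration is final.
        self.initialize()


sim = ToySimulation()
sim.config["scenario"] = "high_dose"  # change the config before setup
sim.setup()
print(sim.initialized_with)  # high_dose
```

Had setup() run inside __init__, the simulation would already have been initialized with the "default" scenario before the user could change it.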

class Report(config: Config, backend: type)#

Bases: object

Creates a configurable report. To select which items to report and to fine-tune the report settings, modify the settings in config.report.

posterior(posterior)#: Much of this is already included in the parameter estimates and may add to the confusion.

check_dimensions(dataarray)#: Check if dataset dimensions match the specified dimensions. TODO: Name datasets for referencing them in error messages.

check_scaled_results_feasibility(scaled_results)#

Parameter inference or optimization over many variables can only succeed in reasonable time if the results that should be compared are on approximately equal scales. The Simulation class automatically estimates the scales of result variables when observations are provided.

Problems can occur when observations cover a very narrow range, but the simulation results can take much larger or smaller values for that variable. As a result, the inference procedure will focus almost exclusively on the optimization of this variable, because it provides the maximal return.

The function warns the user if simulation results deviate strongly from the scaled minima or maxima of the observations. In this case, manual minima and maxima should be given.
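The following sketch illustrates the idea with min-max scaling, as suggested by the MinMaxScaler shown in the direct-use example. The helper check_scaled_range and its tolerance are hypothetical and are not pymob's actual criterion.

```python
import warnings
from typing import Sequence


def check_scaled_range(scaled: Sequence[float], tolerance: float = 1.0) -> bool:
    """Warn if min-max scaled results leave [0 - tolerance, 1 + tolerance].

    Hypothetical helper; the tolerance value is an arbitrary choice.
    """
    lo, hi = min(scaled), max(scaled)
    feasible = lo >= -tolerance and hi <= 1.0 + tolerance
    if not feasible:
        warnings.warn(
            f"Scaled results span [{lo:.2f}, {hi:.2f}]; "
            "consider setting manual minima and maxima."
        )
    return feasible


# Observations cover a narrow range, simulation results a much wider one.
obs_min, obs_max = 0.9, 1.1
results = [0.5, 1.0, 6.0]
scaled = [(r - obs_min) / (obs_max - obs_min) for r in results]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = check_scaled_range(scaled)

print(ok, len(caught))  # False 1
```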

compute()#: A wrapper around run that catches errors, logs, and does post-processing.

create_data_scaler()#: Creates a scaler for the data variables of the dataset over all remaining dimensions. In addition, produces a scaled copy of the observations.

create_interpolated_coordinates(dim)#: Combines coordinates from observations and from interpolation.

property dimension_coords: Dict[str, Tuple[int | str, ...]]#: Goes through dimensions of data structure and adds coordinates, then goes through dimensions of parameters and searches in coordinates and indices to

dispatch(theta: Mapping[str, float | NDArray | Sequence[float]] = {}, y0: Mapping[str, float | NDArray | Sequence[float]] = {}, x_in: Mapping[str, float | NDArray | Sequence[float]] = {})#

Here NDArray denotes an array of any shape with a floating-point or integer dtype.

Dispatch an evaluator, which will compute the model for the parameters (theta), starting values (y0) and model input (x_in).

Evaluators are advantageous because they are easier to serialize than the whole simulation object. Comparison can then happen back in the simulation.

In addition, evaluators can be dispatched, seeded, and evaluated in parallel, because they are decoupled from the simulation object.

Parameters:

theta (Dict[str, float|Sequence[float]]) – Dictionary of model parameters that should be changed for dispatch. Unspecified model parameters assume the default values specified under config.model_parameters.NAME.value.
y0 (Dict[str, float|Sequence[float]]) – Dictionary of initial values that should be changed for dispatch.
x_in (Dict[str, float|Sequence[float]]) – Dictionary of model input values that should be changed for dispatch.
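The dispatch pattern can be sketched without pymob. ToyEvaluator and the standalone dispatch function below are hypothetical simplifications: they only show how overrides in theta are overlaid on default values and how each evaluator carries its own state, independent of the object that created it.

```python
from typing import Dict, List, Optional


class ToyEvaluator:
    """Hypothetical lightweight object holding what a single model run needs."""

    def __init__(self, parameters: Dict[str, float], x: List[float]):
        self.parameters = parameters
        self.x = x
        self.results: Optional[List[float]] = None

    def __call__(self) -> None:
        a, b = self.parameters["a"], self.parameters["b"]
        self.results = [a + b * xi for xi in self.x]


def dispatch(defaults: Dict[str, float], x: List[float],
             theta: Optional[Dict[str, float]] = None) -> ToyEvaluator:
    """Overlay theta on the defaults; unspecified parameters keep their default."""
    merged = {**defaults, **(theta or {})}
    return ToyEvaluator(parameters=merged, x=list(x))


defaults = {"a": 10.0, "b": 3.0}
first = dispatch(defaults, x=[0.0, 1.0], theta={"b": 2.0})
second = dispatch(defaults, x=[0.0, 1.0])  # no overrides: defaults apply

first()
second()
print(first.results)   # [10.0, 12.0]
print(second.results)  # [10.0, 13.0]
```

Because each evaluator holds a merged copy of the parameters, the defaults stay untouched and several evaluators can be run side by side.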

dispatch_constructor(**evaluator_kwargs)#: Construct the dispatcher and pass everything to the evaluator that is static.

evaluate(theta)#: Wrapper around run to modify parameters of the model.

initialize(input)#

Initializes the simulation. Performs any extra work not done in parameterize or set_coordinates.

Overwrite in a case study simulation if special tasks are necessary.

static parameterize(free_parameters: Dict[str, float | str | int], model_parameters: Dict) → Dict#

Optional. Set parameters and initial values of the model. Must return a dictionary with the keys ‘y0’ and ‘parameters’.

Can be used to define parameters directly in the script or from a parameter file.

Parameters:

input (List[str]) – file paths of parameter/input files
theta (List[Param]) – a list of Parameters. By default the parameters specified in the settings.cfg are used in this list.

Returns:

tuple of parameters, can have any length.

Return type:

tuple

parse_input(input: Literal['y0', 'x_in'], reference_data: Dataset | None = None, drop_dims: List[str] = []) → Dataset#

Parses a config string, e.g. y=Array([0]) or a=b, to a numpy array and looks up symbols in the elements of data, where data items are key:value pairs of a dictionary, xarray items or anything of this form.

The values are broadcast along the remaining dimensions in the observations that have not been dropped. Input refers to the argument in the config file.

This method is useful to prepare y0s or x_in from observations or to broadcast starting values along batch dimensions.

Parameters:

input (Literal["y0", "x_in"]) – The key in config.simulation that contains the input mapping. The key must be contained in the data structure, otherwise an error will be raised. This is done to make sure there is no ambiguity in the applied dimensional broadcasting.

Example:
sim.config.simulation.y0 = [‘A=Array([0])’, ‘B=C’]
reference_data = xr.Dataset()

reference_data (Optional[xr.Dataset]) –
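To illustrate the two string forms from the example above, here is a toy parser. It is a hypothetical simplification that handles only literal arrays and bare symbol lookups, with plain dictionaries in place of xarray data and no dimensional broadcasting; it does not reproduce the behaviour of parse_input.

```python
import ast
import re
from typing import Dict, List, Mapping

ARRAY_PATTERN = re.compile(r"^Array\((.*)\)$")


def parse_input_sketch(entries: List[str],
                       reference_data: Mapping[str, List[float]]) -> Dict[str, List[float]]:
    """Toy parser for strings such as 'A=Array([0])' or 'B=C'.

    Hypothetical simplification: no broadcasting, no xarray.
    """
    parsed = {}
    for entry in entries:
        key, expression = (part.strip() for part in entry.split("=", 1))
        match = ARRAY_PATTERN.match(expression)
        if match:
            # Literal values, e.g. Array([0]) -> [0.0]
            parsed[key] = [float(v) for v in ast.literal_eval(match.group(1))]
        else:
            # A bare symbol is looked up in the reference data.
            parsed[key] = list(reference_data[expression])
    return parsed


reference = {"C": [1.5, 2.5]}
y0 = parse_input_sketch(["A=Array([0])", "B=C"], reference)
print(y0)  # {'A': [0.0], 'B': [1.5, 2.5]}
```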

posterior_predictive_checks(**plot_kwargs)#: OVERWRITE IF NEEDED. Placeholder method. Minimally plots the posterior predictions of a simulation.

prior_predictive_checks(**plot_kwargs)#: OVERWRITE IF NEEDED. Placeholder method. Minimally plots the prior predictions of a simulation.

report()#: Creates a configurable report. To select which items to report and to fine-tune the report settings, modify the options in config.report.

reshape_observations(observations, reduce_dim)#

This method reduces the dimensionality of the observations. Compiling xarray datasets from multiple experiments with different IDs and different endpoints leads to blown-up datasets in which all combinations (even those that were not tested) are filled with NaNs. The aim of this method is to reduce such artificial dimensions by flattening the arrays.

TODO: There should be tests, whether the method is applicable (this may be already caught with the assertion)

TODO: The method should be generally applicable
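The flattening idea can be shown on a small hand-made grid. The layout, names and values below are invented for illustration, and nested lists replace the xarray dataset that the method actually operates on.

```python
import math

nan = float("nan")

# Hypothetical 2-D layout: rows are experiment IDs, columns are endpoints.
# Each experiment measured only one endpoint, so most cells are NaN.
ids = ["exp1", "exp2", "exp3"]
endpoints = ["survival", "growth"]
grid = [
    [0.9, nan],
    [nan, 1.2],
    [0.7, nan],
]

# Flatten to one record per tested combination, dropping untested cells.
flat = [
    (ids[i], endpoints[j], value)
    for i, row in enumerate(grid)
    for j, value in enumerate(row)
    if not math.isnan(value)
]
print(flat)
# [('exp1', 'survival', 0.9), ('exp2', 'growth', 1.2), ('exp3', 'survival', 0.7)]
```

Six grid cells collapse to three records, one per combination that was actually tested.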

run()#

Implementation of the forward simulation of the model. Needs to return X and Y

Returns:

X (np.ndarray | xr.DataArray)
Y (np.ndarray | xr.DataArray)

setup(**evaluator_kwargs)#

Simulation setup routine, when the following methods have been defined:

coords = self.set_coordinates(input=self.input_file_paths)
self.coordinates = self.create_coordinates()
self.var_dim_mapper = self.create_dim_index()

init-methods:

self.initialize --> may be replaced by self.set_observations

subset_by_batch_dimension(data)#: FIXME: Subsetting by batch dimension seems to be inappropriate for dispatch and better suited to the dispatch constructor. This feature of pymob was not used and is currently deactivated in sim.dispatch(). A better use of the method would be during the call to dispatch_constructor.

total_average(results)#: Objective function returning the total MSE of the entire dataset.
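One possible reading of "total MSE" is the mean squared error pooled over every value of every data variable. The sketch below implements that reading with plain dictionaries; the pooling choice and the function name total_mse are assumptions, not the library's definition.

```python
from typing import Dict, List


def total_mse(results: Dict[str, List[float]],
              observations: Dict[str, List[float]]) -> float:
    """Mean squared error pooled over every value of every variable."""
    squared_errors = [
        (r - o) ** 2
        for name in observations
        for r, o in zip(results[name], observations[name])
    ]
    return sum(squared_errors) / len(squared_errors)


obs = {"y": [1.0, 2.0], "z": [0.0, 4.0]}
res = {"y": [1.0, 4.0], "z": [1.0, 1.0]}
print(total_mse(res, obs))  # 3.5
```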

static validate_model_input(model_input) → OrderedDict[str, Sequence[float]]#: Returns a copy of the model input. This means, the original model input will not be overwritten by any action.
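The copy semantics can be sketched as follows. The helper copy_model_input is hypothetical and only demonstrates that changes to the returned copy leave the original input untouched.

```python
from collections import OrderedDict
from copy import deepcopy
from typing import Mapping, Sequence


def copy_model_input(model_input: Mapping[str, Sequence[float]]) -> "OrderedDict[str, list]":
    """Hypothetical helper: return an independent, ordered copy of the input."""
    return OrderedDict(
        (key, deepcopy(list(values))) for key, values in model_input.items()
    )


original = {"a": [1.0, 2.0]}
copied = copy_model_input(original)
copied["a"][0] = 99.0  # mutate the copy only

print(original["a"])  # [1.0, 2.0]
```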
