# Case studies

Case studies are a principled approach to the *modelling* process. In essence, they are a simple template that contains building blocks for model and names and stores them in an intuitive and reproducible way.

Each case study consists of the following components:

```
test_case_study
  sim.py               # sets up the simulation
  mod.py               # (opt) outsources solver and model definitions
  data.py              # (opt) outsources data input
  plot.py              # (opt) outsources vizualizations

  scenarios
    scenario_A         # the scenario of the "test_case_study" 
      settings.cfg     # configuration for the case study and scenario
      simulation.cfg   # parameters of the simulation

  results
    scenario_A         # the results of "scenario_A"

  data                 # (optional) datafiles for data.py

  scripts              # (optional) evaluation scripts

  docs                 # (optional) documentation of the case study
```

While it is recommended to keep data, docs, results, scripts directories in each case study to keep a comprehensive and compact structure of the project, these can reside anywhere else. 

## Configuration


Settings files are created as conf `.cfg` files. These files are organized into the following sections.

```{admonition} Scripting API
:class: attention
Since `pymob-0.4.0` configurations can be specified in the scripting API and exported to config files from the Simulation instance. This makes configuration considerably more user friendly as the possible options are directly availble through type hints.
```

```conf
[case-study]
name = test_case_study
scenario = test_scenario
package = case_studies
modules = sim mod prob data plot
simulation = Simulation
observations = simulated_data.nc
logging = DEBUG

[simulation]
seed = 1

[data-structure]
rabbits = dimensions=[time] min=0 max=nan
wolves = dimensions=[time] min=0 max=nan

[inference]
objective_function = total_average
n_objectives = 1
EPS = 1e-8

[model-parameters]
alpha = value=0.5 min=0.1 max=5.0 prior=lognorm(s=0.1,scale=0.50) free=True
beta = value=0.02 min=0.005 max=0.2 prior=lognorm(s=0.1,scale=0.02) free=True

[error-model]
wolves = lognorm(scale=wolves+EPS,s=0.1)
rabbits = lognorm(scale=rabbits+EPS,s=0.1)

[multiprocessing]
cores = 1

[inference.pyabc]

[inference.pymoo]

[inference.numpyro]
gaussian_base_distribution = False
kernel = nuts
init_strategy = init_to_uniform
chains = 1
draws = 2000
warmup = 1000
```

### case-study configuration

Contains the information about the Simulation class to be used and the data and output directory. The paths are relative to the root directory (where the case study is launched)

### Simulation configuration

Contains the details about the simulation. If an ode model is used, it should be specified here. In that case also different solvers can be provided to the simulation. 

It is also possible to provide the solver directly which then executes the entire simulation. In this case the solver should be a wrapper around a possibly external simulation which returns the results as a dictionary with keys, corresponding to the data variables.

**Solver post processing** can be used if the results returned from the solver need some additional post processing before they can be compared with the observations. The method takes a dictionary as an input and potentially other arguments specified e.g. in the free-model-parameters or fixed-model-parameters. 

**y0** Initial values of the ODE model. Can be any list separated by whitespace that follows `sympy` syntax. E.g. `wolves=Array([2]) rabbits=rabbits`. This can be processed with `Simulation.parse_input(data=[observations], input="y0", drop_dims="time")`. This will create an array of the keys (wolves, rabbits) and broadcast the values along all coordinates of the observations, but retaining only the first value of the dropped dimensions if a variable (rabbits) was provided. This is very useful if some data_variables of the ODE model were observed and the initial value should be taken from the data.

**seed**. The seed is used to initate random processes for reproducibility. The behavior is still experimental.

### data-structure

This discribes the dimensions and dimensional order of the data. It optionally provides an interface for setting minima and maxima for data scaling for use in optimizers that perform better when working on scaled data.
In addition, also different dimensional order between observation and simulation results can be specified  with `dimensions_evaluator=[...]`

### model-parameters

Model parameters that are subject to parameter inference and can be varied in the parameter estimation process or in the interactive simulation. 
Model parmameters that remain fixed throughout the simulation.

### error-model

Error functions for comparing the simulation results to the data. These functions will also be parsed with `sympy` parsers. Still experimental

### inference

Options for adjusting the inference algorithms