### The purpose of modeling

Since a reliable quantitative model is hard to obtain, experimental knowledge is often first described with a rough model, which is then improved based on new experiments. In this way, model development and experimental design advance together. Frequently, both the model and the objective of the study are iteratively refined (e.g. [14, 15]). Although no standard for the development of pathway models has been established so far, the majority of approaches (e.g. in [16–18]) follow the same general strategy: an initial model is constructed from pre-existing data such as concentrations, kinetic parameters, flux measurements, and microscopic images, which may stem from diverse sources (literature, databases, own measurements). The first problems arise already at this stage: links can be missing, or model outcome and biological observation can disagree. Resolving them requires new experimental data. Points that are not sufficiently defined in the network can also be analyzed computationally to clarify which of several alternative model assumptions may explain the observations. The initial model is then used to make predictions. These predictions are tested in new experiments or against new data for different mutant organisms or for different pathway stimulation scenarios. The results allow for a model update and a second round of iteration.

This point is nicely illustrated by the development of cell cycle models. The first simple models were made to study the emergence of oscillations in cascades of post-translational modifications with feedback [19]. Since that time, dynamic models of the cell cycle have been developed and iteratively improved [20–22 and others], now comprising dozens of variables.

### Definition of the system

The most basic decision in model building concerns the model components: which molecules and interactions play a role and which of them will be left out? Omitting certain processes from the models is based on the assumption that they have only a minor influence on the event under study, that their values remain constant in the experimental setup, or that they simply cannot be described with the currently available means. For example, the effect of regulated gene expression is usually neglected in the modeling of metabolic networks although modelers are certainly aware of production and degradation of enzymes. But the different time scales of protein turnover and metabolic reactions justify this simplification in many cases.

In contrast, the actual architecture of signaling pathways often depends on the cell state (developmental state, cell cycle, previous events). For example, the pheromone receptor in yeast is degraded after stimulation, so the pathway then lacks its first element. Variables other than substance concentrations may also play a role: upon osmotic stress, the HOG pathway (see below) eventually leads to the production of glycerol and, thereby, to a regulation of cell volume and turgor pressure. The pathway thus changes the state of the cell and, through the volume changes, the concentrations of all involved components. This is a strong argument for including the volume changes in the model in order to understand the relevant regulatory interactions.
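The effect of volume changes on concentrations can be made explicit with a short derivation. As a sketch, assume a species with molecule amount $n$ in a compartment of time-dependent volume $V$ (both symbols are introduced here for illustration only):

```latex
c = \frac{n}{V}
\quad\Longrightarrow\quad
\frac{dc}{dt} = \frac{1}{V}\,\frac{dn}{dt} - \frac{c}{V}\,\frac{dV}{dt}
```

The first term is the usual contribution of reactions and transport; the second is a dilution (or concentration) term that vanishes only when the volume is constant.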

### Coupling of pathways

Different groups of researchers who are experts in their field have developed specific pathway models that appropriately describe the studied phenomena. It is desirable to integrate those different models into larger networks that can reflect more, or more complex, biological phenomena, as will be exemplified below by Bhalla's models of neuronal signaling pathways.

Tools for a sensible integration of pathway models into larger models are therefore clearly on the agenda. Important ingredients for the coupling of models are (1) the development of standardized model exchange formats such as SBML [23], (2) the emergence of model databases such as JWS online http://jjj.biochem.sun.ac.za/database [24], Biomodels http://www.ebi.ac.uk/biomodels or the Database of Quantitative Cellular Signaling (DOQCS, http://doqcs.ncbs.res.in/), and (3) quality standards for model description in publications, such as MIRIAM [25]. As an alternative, it has been proposed to construct signaling networks more or less automatically from data stored in databases, e.g. from interaction maps [26, 27]. However, this approach is likely to miss important biological details: even experts on specific pathways sometimes stumble when asked for the correct sequence of events in the interaction of proteins, although it is well known that the proteins do interact and which protein domains are involved.

### Mathematical structure of biochemical network models

Structural models describe the molecules that are present, their interactions, and possibly the molecule numbers. If the dynamics is also to be considered, these numbers will change in time. To describe these changes, modelers can choose from different types of mathematical models. Models used for signaling pathways can be loosely grouped as follows: they can be (i) deterministic (with defined states in the future) or probabilistic (stochastic processes), (ii) discrete or continuous with respect to time and to component abundance (i.e. molecule numbers or concentrations), and they (iii) may or may not describe the processes in space. The choice of a model will depend on the system, the available information, and the specific questions to be studied.

In most models, biochemical reaction systems are described in a deterministic, continuous manner by rate equations for the concentrations of substances and complexes. The mathematical representation is a set of ordinary differential equations (ODEs)

$$\frac{dc_i}{dt} = \sum_{j=1}^{r} n_{ij} \, v_j, \qquad i = 1, \ldots, m,$$

where $m$ is the number of biochemical species with the concentrations $c_i$, $r$ is the number of reactions with the rates $v_j$, and the quantities $n_{ij}$ denote the stoichiometric coefficients. Depending on experimental information, the individual reaction rates can be described by very sophisticated kinetic laws. But often, mass action kinetics is used, where the rate for the reaction $A + B \rightleftharpoons C$ reads

$$v = A \cdot B \cdot k_f - C \cdot k_b.$$

The parameters $k_f$, $k_b$ are the rate constants. Especially in metabolic networks, the traditional Michaelis-Menten kinetics is used, where the rate for the enzyme-catalyzed reaction $S \rightarrow P$ is expressed as

$$v = \frac{V_{max} \cdot S}{K_M + S}.$$

The quantity $V_{max}$ is the maximal rate and $K_M$ denotes the substrate concentration ensuring a half-maximal rate.

In the deterministic framework, the spatial distribution of compounds can be described by distinguishing different compartments or by describing dynamics in a continuous space with partial differential equations. Examples of systems that are discrete with respect to time and values of variables are Boolean networks [28], Petri nets [29], or cellular automata [30]. In systems with small molecule numbers, stochastic effects tend to become relevant, and individual reaction events have to be simulated, e.g., by the different algorithms put forward by Gillespie [31–33]. When the particle numbers are high, the results of stochastic simulations are often well approximated by deterministic rate equation models. There is, however, no easy way to decide in advance whether or not a deterministic description is justified.
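As an illustration of simulating individual reaction events, the sketch below applies a direct-method stochastic simulation in the spirit of Gillespie's algorithms to the binding reaction A + B ⇌ C. The molecule numbers and the stochastic rate constants `kf` and `kb` are hypothetical; in particular, stochastic rate constants are assumed to be given per molecule pair and per molecule, so any volume-dependent conversion from deterministic constants is left out.

```python
import random

def gillespie(a, b, c, kf, kb, t_end, seed=0):
    """Simulate A + B <-> C event by event; returns a list of (t, A, B, C)."""
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, a, b, c)]
    while True:
        prop_forward = kf * a * b   # propensity of binding
        prop_backward = kb * c      # propensity of dissociation
        total = prop_forward + prop_backward
        if total == 0.0:
            break                   # no reaction can fire any more
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * total < prop_forward:
            a, b, c = a - 1, b - 1, c + 1
        else:
            a, b, c = a + 1, b + 1, c - 1
        trajectory.append((t, a, b, c))
    return trajectory

# Hypothetical small system: 20 A, 30 B, no complex at the start.
traj = gillespie(20, 30, 0, kf=0.01, kb=0.1, t_end=10.0)
```

Repeating the run with different seeds gives different trajectories; averaging many of them is one informal way to judge how well a deterministic description would represent a system of this size.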

### Signaling networks and metabolic networks

Modeling of biochemical reaction networks has been very successful in the field of metabolic pathways, and many techniques have been developed for studying metabolic systems (steady state analysis, MCA, stoichiometric analysis, independent fluxes, conservation relations, flux balance analysis etc.). Therefore, we may ask about the similarities and the differences between metabolic and signaling pathways, and whether techniques developed for metabolism also apply to signaling systems. Both metabolism and signaling are modeled by a set of biochemical reactions including binding, dissociation, complex formation, and transfer of molecule groups. In particular, phosphorylation and dephosphorylation occur in both cases (e.g. phosphofructokinase in metabolism or MAP kinases in signaling). Nevertheless, we also encounter differences, such as the following:

(i) In metabolism, the amounts of enzyme and substrate often differ by several orders of magnitude (concentrations in the order of nM compared to mM). This is a precondition for the application of Michaelis-Menten kinetics, which is only justified if the enzyme concentration is much lower than the substrate concentration (quasi-steady state assumption suggested by Briggs and Haldane, 1925 [34]). In signaling pathways, the numbers of catalyst and substrate molecules are usually in the same order of magnitude. For example, molecule numbers of the proteins involved in typical yeast signaling pathways vary between several hundred and several thousand. This is a strong argument for not applying the Michaelis-Menten approximation, but using mass action kinetics, at least as long as the detailed kinetics of the specific reaction is not known. Michaelis-Menten kinetics underestimates the reaction rate compared to (a) mass action kinetics for the whole reaction or (b) mass action kinetics for the individual steps including reversible binding and product release (which give about the same results). This may lead to qualitatively different behavior, such as the occurrence or absence of oscillations in a MAP kinase cascade with feedback.

(ii) While metabolic pathways are characterized by a flow of matter (an atom entering glycolysis at the upper end/hexokinase may leave it at the lower end/pyruvate kinase), signaling pathways comprise many closed loops in which matter flows, e.g., within the G protein cycle or between the different phosphorylation states of a protein. The essential function of signaling pathways is the flow of information, although this statement does not exclude that the flow of matter in metabolism is also connected to a flow of information.

(iii) Phosphorylation under consumption of nucleotide triphosphates (ATP) has a different function in the two cases. In metabolism, it serves as a fuel (it essentially increases the difference in free energy) and speeds up the reactions; in signaling, it just marks proteins as different (changes their activity or binding behavior).

(iv) Although metabolism is able to respond to environmental changes, especially nutrition, it is mainly a homeostatic process and has a strong constant component. This is the basis for the consideration of steady states in metabolism models. Metabolic control analysis [35, 36], elementary flux modes [37], flux balance analysis [38] and other common approaches are based on the assumption of a steady state. Signaling pathways, in contrast, operate essentially in a non-static manner: the pathway acts by state changes. Therefore, the analysis of their steady states cannot be central, although it may provide information on the contribution of different components to how such pathways are switched on or off, i.e. how they are shifted away from their resting state.
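The condition on enzyme and substrate amounts in point (i) can be probed numerically. The sketch below compares the product formed by the Michaelis-Menten rate law with that of the elementary mass action steps E + S ⇌ ES → E + P, once with enzyme far below substrate and once with both at the same level. All rate constants, concentrations, the time horizon, and the function names are assumptions made for this illustration. It only shows how far the two descriptions drift apart; the direction and size of the deviation depend on the chosen parameters and on which quantities are compared.

```python
def rk4(f, y, dt, steps):
    """Integrate dy/dt = f(y) with the classical Runge-Kutta scheme."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f([yi + 0.5 * dt * k for yi, k in zip(y, k1)])
        k3 = f([yi + 0.5 * dt * k for yi, k in zip(y, k2)])
        k4 = f([yi + dt * k for yi, k in zip(y, k3)])
        y = [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

def product_full(e_tot, s0, k1, k_1, k2, t_end, steps=20000):
    """Product from E + S <-> ES -> E + P with mass action for each step."""
    def f(y):
        s, es = y
        bind = k1 * (e_tot - es) * s - k_1 * es   # net binding rate
        return [-bind, bind - k2 * es]
    s, es = rk4(f, [s0, 0.0], t_end / steps, steps)
    return s0 - s - es

def product_mm(e_tot, s0, k1, k_1, k2, t_end, steps=20000):
    """Product from the Michaelis-Menten rate law with the same constants."""
    v_max = k2 * e_tot
    k_m = (k_1 + k2) / k1
    (s,) = rk4(lambda y: [-v_max * y[0] / (k_m + y[0])], [s0],
               t_end / steps, steps)
    return s0 - s

def relative_gap(e_tot, s0=10.0, k1=1.0, k_1=1.0, k2=1.0):
    """Relative difference in product at a time scaled with 1/e_tot."""
    t_end = 5.0 / e_tot
    full = product_full(e_tot, s0, k1, k_1, k2, t_end)
    mm = product_mm(e_tot, s0, k1, k_1, k2, t_end)
    return abs(mm - full) / full

gap_low_enzyme = relative_gap(0.01)   # enzyme three orders below substrate
gap_high_enzyme = relative_gap(10.0)  # enzyme equal to substrate
```

With these hypothetical numbers the gap is small when the enzyme is scarce and becomes substantial when enzyme and substrate are comparable, which is the regime described for signaling pathways.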

### Metabolic control analysis

Metabolic control analysis (MCA, [35, 36, 39], for an introduction see [40–42]) studies the effect of small parameter changes on concentrations and fluxes in the steady states of metabolic systems. The (linearized) influence of a certain parameter on a certain variable is quantified by response coefficients or control coefficients, which are also known as sensitivities. Metabolic control theory has developed the so-called summation and connectivity theorems, which relate the control coefficients to the stoichiometry and the linearized reaction kinetics of the system. Control coefficients can be used to detect important system parameters. This has been exemplified for a model of the WNT pathway [18], which we will discuss below. To quantify the robustness of a variable against general parameter changes, Lee et al. [18] introduced a measure for robustness, ρ = 1/(1 + σ), where σ is the standard deviation of the concentration control coefficients. A high value of ρ implies robustness or, in other words, a low control of parameters over that variable. In recent years, MCA has been extended to cover time-dependent phenomena: Ingalls and Sauro [43] have proposed time-dependent response coefficients that describe the influence of parameter changes on time series. In addition, control coefficients for signal characteristics such as maximal amplitude or mean signal time of output signals have been defined [44], and spatial processes such as diffusion can also be incorporated [45].
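The robustness measure can be illustrated on a toy system. The sketch below estimates scaled sensitivities of a steady-state concentration with respect to rate constants by finite differences in logarithmic space and then evaluates ρ = 1/(1 + σ). The three-reaction chain, the parameter values, and the use of the population standard deviation for σ are assumptions made here; the original definition may differ in such details.

```python
import math
import statistics

def steady_state_x2(p):
    """Steady state of X2 in the chain -> X1 -> X2 -> with rates
    v1 = k1, v2 = k2 * X1, v3 = k3 * X2 (a hypothetical toy pathway)."""
    x1 = p["k1"] / p["k2"]
    return p["k2"] * x1 / p["k3"]

def scaled_sensitivities(steady_state, params, h=1e-4):
    """Approximate d ln(X) / d ln(p) for each parameter p by a central
    difference in logarithmic space."""
    coeffs = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1 + h)})
        down = dict(params, **{name: value * (1 - h)})
        d_ln_x = math.log(steady_state(up)) - math.log(steady_state(down))
        d_ln_p = math.log(1 + h) - math.log(1 - h)
        coeffs[name] = d_ln_x / d_ln_p
    return coeffs

def robustness(coeffs):
    """rho = 1 / (1 + sigma); sigma is taken as the population standard
    deviation of the coefficients (an assumption of this sketch)."""
    sigma = statistics.pstdev(list(coeffs.values()))
    return 1.0 / (1.0 + sigma)

params = {"k1": 2.0, "k2": 0.5, "k3": 4.0}
coeffs = scaled_sensitivities(steady_state_x2, params)
rho = robustness(coeffs)
```

In this toy chain the coefficients of X2 with respect to the three rate constants add up to zero, in line with the summation theorem for concentrations, and the intermediate rate constant k2 has no influence on the steady-state level of X2.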