Metabolic control theory and parameter variability
Dynamic response to parameter fluctuations
Metabolic response coefficients describe how variables in metabolic systems, like steady state concentrations, respond to small changes of the kinetic parameters. This concept can be extended to temporal parameter fluctuations: spectral response coefficients relate Fourier components of concentrations and fluxes to Fourier components of the underlying parameters. The first-order response coefficients describe forced oscillations caused by small harmonic oscillations of single parameters: they depend on the driving frequency and describe the phases and amplitudes of the concentrations and fluxes.
In [Liebermeister 2005], we have extended this concept in different directions: (i) second-order response coefficients describe how perturbations of different frequencies interact by mode coupling, yielding higher harmonics in the metabolic response. (ii) Also other concepts from metabolic control theory, such as control coefficients with their summation and connectivity theorems can be directly generalised to periodic perturbations. (iii) Close to a Hopf bifurcation, resonance can occur: as an example, I studied the spectral densities of concentration fluctuations arising from the stochastic nature of chemical reactions. (iv) The temporal response to small parameter fluctuations can be computed by Fourier synthesis. For a model of glycolysis, this approximation remains fairly accurate even for large relative fluctuations of the parameters.
Biochemical models with uncertain parameters
The modelling of biochemical networks becomes delicate if kinetic parameters are varying, uncertain or unknown. But uncertain knowledge or beliefs about parameters can still be described by probability distributions. Uncertain (or variable) parameters will lead to uncertain (or variable) system behaviour. A simple and efficient way to assess the effects of parameter uncertainty is stochastic simulation with noisy parameters. We tried this approach in [Klipp et al. 2004].
In [Liebermeister and Klipp 2005], I have studied in more detail the relationship between parameter distributions and the resulting uncertainty of dynamic network properties, such as steady-state fluxes, concentrations, or control coefficients. Intuitively, if a system variable is sensitive to a parameter, an uncertainty in this parameter will contribute strongly to the uncertainty in the variable. This become clear in an approximation: If a parameter distribution is narrow, the resulting distribution of the variables can be computed by expanding them around a set of mean parameter values. The resulting fomulae show clearly how the variability of biological systems is related to the metabolic response coefficients.The probabilistic framework allows the study of metabolic correlations, and it provides simple measures of variability and stochastic sensitivity.
Methods for metabolic modelling
Biochemical models are developed in many places, and many of them are available in SBML format. I am developing a suite of tools - called SemanticSBML - to facilitate merging of SBML models. SemanticSBML consists of three tools: SBMLannotate helps you to insert annotations into your SBML models. SBMLcheck checks annotated models for completeness and consistency. SBMLmerge merges several annotated models.
Biochemical modelling usually focuses on certain pathways, while the pathway's environment, that is, the rest of the cell, is assumed to be "fixed". To achieve a more realistic, dynamic description that is still numerically efficient, I have developed a model-reduction approach [Liebermeister et al 2005]: we start with a full dynamical model of the environment, but then we replace it by a simplified black-box model; to do so, we linearise the environment model around a steady state, and use balanced truncation to reduce its dimensionality. The reduced variables describe the dynamic modes in the environment that are most important for its interaction with the subsystem.
In his Bachelor thesis, Marvin Schulz has extended this approach to other algorithms for complexity reduction.
Model construction / data integration
High-throughput experiments are producing data on metabolite concentrations, metabolic fluxes, protein levels, and expression values. Integrating these different data into a coherent picture would be interesting: for instance, how can differential gene expression be used to redirect fluxes and adjust concentrations in a metabolic network. However, data integration by statistical correlation - as it is usually done now - has its limitations; a final aim is to place the different kinds of data into a mechanistic, kinetic model of the metabolic pathway. I am working on methods to obtain kinetic models from bringing together a metabolic network structure with various experimental data; the data are combined and weighted using Bayesian parameter estimation.
For the data integration, I needed a versatile kinetic law that fits any stoichiometry, is relatively simple, and is still comparable to realistic saturable laws such as the Michaelis-Menten kinetics. Together with E. Klipp, I developed a simple and general rate law called "convenience kinetics" [Liebermeister and Klipp 2006 (i)]. Thermodynamic laws can impose dependencies on the kinetic parameters. Hence, to facilitate model fitting and parameter optimisation for large networks, we introduce thermodynamically independent system parameters: their values can be varied independently, without violating thermodynamical constraints. We achieve this by expressing the equilibrium constants either by Gibbs free energies of formation or by a set of independent equilibrium constants.
In Bayesian parameter estimation, model parameters are described by a posterior probability distribution, which scores the potential parameter sets, showing how well each of them agrees with the data and with the prior assumptions made. I proposed an estimation framework [Liebermeister and Klipp 2006 (ii)] in which the parameter posterior is computed in two separate steps: a first posterior summarises the available data on enzyme kinetic parameters; an improved second posterior is obtained by integrating metabolic fluxes, concentrations, and enzyme concentrations for one or more steady states. The resulting posterior parameter distribution describes a statistical ensemble of parameter sets; the parameter variances and correlations can account for missing knowledge, measurement uncertainties, or biological variability. The posterior distribution can be used to sample model instances and to obtain probabilistic statements about the model's dynamic behaviour.
Parameter priors / statistical learning
Bayesian parameter estimation can deal with missing or uncertain knowledge of kinetic parameters - provided we know at least realistic and accurate priors. But how do we obtain them? To establish a prior for KM values, we could simply choose the empirical distribution of all known KM values (or a log-normal distribution with the same mean and width). But we may also use an individual prior for each single KM value in a model. I have tried to such individual priors from machine learning, based on two kinds of statistical relationships: (i) correlations between values for same metabolite, enzyme, organism [Borger et al. 2006]; (ii) quantitative structure-property-relationship between molecule structure and physiological concentrations [Liebermeister 2005].
Optimal regulation in cells
An optimality principle for differential gene expression
I have developed a model of optimal regulation to describe large-scale differential gene expression [Liebermeister et al. 2004]. My starting point was a simple question: why do genes that share a certain function often show similar expression patterns? In the model, relationships between the optimal expression patterns and the function of genes are deduced from an optimality principle: the regulators have to maximise a fitness function which they influence directly via a cost term, and indirectly via their control on important cell variables, such as metabolic fluxes. According to the model, the optimal linear response to small perturbations reflects the regulators' functions, namely their linear influences on the cell variables. Where the optimality assumption is valid, the model results justify the use of expression data for functional annotation and for pathway reconstruction and suggest the use of linear factor models for the analysis of gene expression data. More details about this model are buried in my thesis,
Gene expression and metabolic network capacities
Expression of metabolic enzymes increases the metabolic capabilities of the cell, but it also consumes resources, and the gene regulatory system has to handle this tradeoff. Together with Oliver Ebenhöh, I studied whether and how gene expression patterns reflect the metabolic needs of the cell [Ebenhöh and Liebermeister 2006]: to do so, we translated gene expression profiles into "expressed metabolic subnetworks" and analysed their performance in terms of network capacities (sets of substances that can be produced from certain carbon sources). Our findings for the diauxic shift in yeast indicate that gene regulation maximises the range of essential compounds that can be obtained from the available nutrients, while minimising the number of expressed enzymes and therefore the burden of protein synthesis.
Analysis of gene expression by independent component analysis
The expression of genes is controlled by many intracellular variables. In [Liebermeister 2002], I applied Independent Component Analysis (ICA) to gene expression data, deriving a linear model based on hidden variables, which I termed "expression modes". The expression of each gene is a linear function of the expression modes, where, according to the ICA model, the linear influences of different modes show a minimal statistical dependence, and their distributions deviate sharply from the normal distribution. Studying cell cycle-related gene expression in yeast, I found that the dominant expression modes could be related to distinct biological functions, such as phases of the cell cycle or the mating response. The linear influences of the dominant modes showed distributions with large tails, indicating the existence of specifically up- and downregulated target genes. The expression modes and their influences can be used to visualise the samples and genes in low-dimensional spaces.