In search of simplicity: Coarse-graining cellular information processing networks

Everything should be kept as simple as possible, but no simpler. -- A. Einstein

The vast amount of experimental systems biology data has demonstrated clearly that cellular regulatory, or information processing, networks have a degree of complexity far greater than what is normally encountered in the physical world. The rush to make sense of these data has so far largely produced models that are as complicated as the data themselves. When phenomenological models are sought, degrees of freedom are often eliminated in an ad hoc manner, based on data availability or the preferences of the modeler. Such approaches fail to predict the functions of biological systems quantitatively, just as they have failed for many complex phenomena in the physical world. The sine qua non of complex systems is the interdependence of their components, and a naïve attempt to eliminate certain degrees of freedom is tantamount to throwing the baby out with the bath water.

The central theme of my thinking is the development of coarse-graining theoretical approaches that, in contrast to existing ones, do not eliminate essential degrees of freedom and functional properties of biological systems. I am convinced that, the structural complexity of cellular information processing networks notwithstanding, some of their large-scale functional properties can be modeled well by simpler, coarse-grained, phenomenological descriptions. These descriptions may take longer to write down than a simple Hooke's law, but there is no a priori reason why it must remain true that "the best ... model of a cat is another ... cat."

There is a dearth of theoretical and computational frameworks that one can use for this task. And yet the theory of statistical inference argues that optimal understanding, prediction, and generalization are achieved by models that eliminate some details. The main difference from inanimate matter is not the complexity per se, but our inability to identify the relevant degrees of freedom to preserve while coarse-graining. The question is not "Do details matter?" but rather "Which details matter?" and "How do we build models based only on them?"
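To make the statistical-inference argument concrete, here is a minimal sketch in Python (a hypothetical fitting problem with an assumed noise level and model family, not data or methods from our work): polynomials of increasing degree are fit to noisy samples of a smooth law, and the prediction error on held-out data is typically minimized at an intermediate degree, that is, by a model that has eliminated some details.

    # Sketch: models that eliminate some details generalize best.
    # Hypothetical example -- fit polynomials of increasing degree to
    # noisy samples of a smooth "law" and score them on held-out data.
    import numpy as np

    rng = np.random.default_rng(0)
    law = lambda x: np.sin(2 * np.pi * x)   # the "true" system

    # Noisy training data and a separate held-out test set.
    x_train = rng.uniform(0, 1, 20)
    y_train = law(x_train) + 0.2 * rng.normal(size=x_train.size)
    x_test = np.linspace(0, 1, 200)
    y_test = law(x_test)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)   # fit the model
        mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: held-out MSE = {mse:.3f}")
    # Typically the intermediate degree wins: too few details underfit,
    # too many details fit the noise rather than the law.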

My research group has been addressing this problem. The key difference from earlier attempts has been the focus on objectively selecting the relevant features for the models. We have used tools from the theory of stochastic processes, statistical mechanics, and information theory to build an analytical framework for this task. We have applied the framework to biological systems that provided the best opportunities for testing the ideas, including basic enzymatic reactions, the complex biochemical networks of kinetic proofreading, and data-driven empirical models of tumor necrosis factor signaling. We have only scratched the surface, and more theoretical work and more biological examples are needed to determine whether our hypothesis is correct. To this end, we have recruited experimental collaborators who study systems as diverse as bacteria, yeasts, amoebae, and mammalian cellular signaling.
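As one illustration of the kind of coarse-graining we pursue, consider the simplest of the examples above: a basic enzymatic reaction E + S ⇌ ES → E + P. The classic Michaelis-Menten rate law is itself a coarse-graining that keeps a single relevant degree of freedom, the substrate concentration, and eliminates the fast enzyme-complex dynamics. The sketch below (with assumed, illustrative rate constants rather than parameters from our studies) compares the full mass-action model with the reduced one; when total enzyme is scarce, the coarse-grained model reproduces the full dynamics.

    # Coarse-graining a basic enzymatic reaction E + S <-> ES -> E + P.
    # Illustrative (assumed) rate constants; not parameters from our work.
    import numpy as np
    from scipy.integrate import solve_ivp

    k1, km1, k2 = 10.0, 1.0, 1.0   # binding, unbinding, catalysis rates
    E0, S0 = 0.05, 1.0             # total enzyme << substrate

    def full(t, y):
        """Mass-action kinetics for all four species [E, S, ES, P]."""
        E, S, ES, P = y
        bind, unbind, cat = k1 * E * S, km1 * ES, k2 * ES
        return [-bind + unbind + cat,   # dE/dt
                -bind + unbind,         # dS/dt
                bind - unbind - cat,    # dES/dt
                cat]                    # dP/dt

    def reduced(t, y):
        """Michaelis-Menten law: one relevant degree of freedom, S."""
        Vmax, Km = k2 * E0, (km1 + k2) / k1
        return [-Vmax * y[0] / (Km + y[0])]

    t = np.linspace(0.0, 50.0, 200)
    y_full = solve_ivp(full, (0, 50), [E0, S0, 0.0, 0.0], t_eval=t).y
    y_red = solve_ivp(reduced, (0, 50), [S0], t_eval=t).y
    print(f"max |S_full - S_MM| = {np.max(np.abs(y_full[1] - y_red[0])):.4f}")
    # Small when E0 << Km + S0: the eliminated degrees of freedom were
    # not the relevant ones.

The hard problem, of course, is that for cellular information processing networks the analogue of such a quasi-steady-state assumption is not known in advance; identifying it objectively is precisely the point of our framework.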

Working together, we will answer the question:

Are there phenomenological, coarse-grained, and yet functionally accurate representations of cellular information processing networks, or are we forever doomed to every detail mattering?