Statistical inference and machine learning for complex networks
Over the last decade, the quantitative study of networks has emerged as a fundamental tool for the study of complex systems, in part for its ability to provide a rigorous foundation for the study of biological, technological, and social complexity. The wide applicability of network models and methods, along with their considerable success in providing insight into the structure and function of real-world systems such as the Internet and World Wide Web, metabolic and protein interaction networks, and social networks both on- and off-line, has generated a remarkable degree of excitement across the sciences.
Much of this insight has been achieved using simple mathematical models of network growth and structure, and comparing simulations or analytic solutions of these models with measurements of simple structural statistics, such as degree distributions and clustering coefficients, in empirically observed networks. However, it is widely believed that real-world networks exhibit a far more complex range of structures than these models are able to capture or characterize. With the steady increase in the volume of empirical data becoming available in genomics, computer science, the social sciences, and other fields, there is an increasing need for automated tools and algorithms that can detect and classify these structures. More fundamentally, a deeper understanding of network structures could produce paradigm-shifting insights into the organization and behavior of many systems.
The goals of the research proposed here are two-fold. First, we aim to develop automated computational tools for detecting and characterizing complex structure in large-scale network data. Second, we aim to apply these tools to discover fundamental principles of network organization. Our approach combines techniques from machine learning, statistics, and statistical physics, including generative models, maximum likelihood estimation, Markov chain Monte Carlo, expectation-maximization, variational Bayes, saddle-point methods, and others. In particular, we propose to develop methods that can, given observed network data as input:
- Automatically detect different types and scales of network organization, both static and dynamic;
- Generate ensembles of networks statistically similar to observed ones, as null models for domain-specific hypotheses, or as substrates for the simulation of network-based processes;
- Predict missing links, nodes, and attributes from partially-observed networks;
- Annotate nodes and edges with their likely functional roles.
To be useful in the real world, these methods must succeed even when network data are noisy or incomplete. They should make as few model- or domain-specific assumptions as possible, but they should also be flexible enough to take domain-specific data into account when it is available.
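As one concrete illustration of the second capability listed above (ensembles of networks statistically similar to an observed one), the following is a minimal sketch, not the proposed method. It assumes that "statistically similar" means "same degree sequence" and randomizes an undirected simple graph by degree-preserving double edge swaps. The function names `degree_preserving_null` and `degrees`, the toy edge list, and the number of swaps are all hypothetical choices made for illustration; how many swaps are needed for adequate mixing is left as an assumption.

```python
import random
from collections import Counter


def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_null(edges, n_swaps, seed=None):
    """Randomize an undirected simple graph while keeping every node's degree.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting any proposal that would create a
    self-loop or a duplicate edge.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list  # nothing to swap
    edge_set = {frozenset(e) for e in edge_list}

    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # give up eventually on graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 == new2 or new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy "observed" network (hypothetical data) and a small null ensemble.
observed = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]
ensemble = [degree_preserving_null(observed, n_swaps=20, seed=s) for s in range(5)]
```

Each member of `ensemble` has the same degree sequence as `observed`, so any structural statistic measured on the observed network can be compared against its distribution over the ensemble. Richer null models (for example, ones that also preserve community structure) would require a different generative model than this sketch.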