Grantee: University of Queensland, Brisbane, Queensland, Australia
Researcher: Mark A. Ragan, Ph.D.
Grant Title: Constructing bacterial genetic exchange communities
https://doi.org/10.37717/220020272
Program Area: Studying Complex Systems
Grant Type: Research Award
Amount: $404,370
Year Awarded: 2011
Duration: 3 years
For more than 150 years it has been appreciated that the biodiversity we see around us – animals, plants and other forms of macroscopic life – has arisen and diversified via a treelike process of genealogical descent. Along the trunk, branches and twigs of this tree, genetic information has passed “vertically” from parent to offspring, generation after generation down to the present day. Successive discoveries – first of chromosomes, then genes, then the DNA double helix and genetic code – have revealed in ever-greater molecular detail how this tree is perpetuated and new branches arise. Present-day genes and proteins are documents of this treelike history. Thanks to new technologies that decode these molecular documents and reconstruct the underlying branching patterns, many macroscopic organisms have now been assigned their place in this great Tree of Life.
Bacteria, however, continue to fit uncomfortably into this Tree. One hundred years ago Schmitt discovered that agglutination properties of the human paratyphoid bacillus could be heritably changed by exposure to other, unrelated types of bacteria. In 1928 Griffith showed that non-virulent pneumococci could be rendered virulent by a heat-stable substance extracted from a virulent strain, and in 1944 Avery reported DNA to be the transforming substance. We now know that many bacteria can take up foreign DNA, make it their own and pass it along to successive generations. This alternative mode of transmission, lateral genetic transfer (LGT), is orthogonal to the treelike vertical mode of inheritance described above and now underpins experimental molecular genetics and biotechnology. LGT is at least as significant outside the laboratory, most notably in spreading antibiotic-resistance genes to previously sensitive strains in hospitals, in the community and along the commercial food chain, with worrisome implications for public health in rich and poor societies alike.
Only since the late 1990s, however, with the explosion of large-scale genome sequencing, have we begun to appreciate how central LGT is to the bacterial way of life. In some species up to 40% of gene families bear evidence of lateral transfer. Not only antibiotic resistance but core physiology (carbon and nitrogen metabolism, photosynthesis, ion transport), surface adhesion, host range and antigenic properties owe their distribution among bacteria, at least in part, to lateral processes: LGT is not a sideshow, but shares centre stage with vertical genetics. Importantly, LGT is neither uniform nor random. Upon entering a bacterium from the environment, foreign DNA encounters a succession of molecular systems (host defences) that it may or may not be able to evade or exploit. Some defence systems preferentially ignore DNA from close relatives but destroy DNA from unrelated species, others the opposite; some types of events transfer large multi-gene regions of DNA into the new host, others only small regions that may partly overlap an existing gene, or be contained entirely within it. Some transfer events target and overwrite older ones, others preferentially avoid doing so.
Complex systems research offers unique perspectives and tools with which we can examine the consequences of this non-traditional genetics. Like other biological systems, the microbial biosphere can be abstracted as a gigantic network made up of nodes (vertices) connected pairwise by lines (edges). Here, the vertices represent entities that carry DNA (genomes, viruses and plasmids), and the edges represent LGT between them. Of course these networks only approximate reality: many genomes remain unsequenced and thus lie outside our analysis (we accommodate them by assuming they lie along known edges). Rigorous statistical approaches tell us whether an edge should be drawn between any two vertices, and how confident we can be in this decision. By connecting vertices we generate a network map of LGT across the microbial biosphere – a map that will grow ever more detailed as thousands upon thousands more genomes are sequenced, for example in large projects such as GEBA (Genomic Encyclopedia of Bacteria and Archaea) and Microbial Earth.
Just as LGT is non-random, these network maps are non-uniform. Some vertices (hubs) anchor numerous edges, others only one or two; other vertices form groups interconnected more or less densely. Earlier this year, in an invited review on the lateral spread of antibiotic resistance, we defined a Genetic Exchange Community (GEC) as a densely connected region within an LGT network. That is, GECs are sets of genomes that have, over time, donated genetic material to and received genetic material from each other via a path of lateral transfer. Using concepts from graph theory and powerful computers, we can identify and precisely enumerate GECs even in immense networks. These non-traditional genetic structures in the biosphere can differ widely in geospatial extent, taxonomic and habitat diversity, density of interconnection, and involvement of plasmids or phage; genetic determinants that are benign in one part of a GEC may be pathogenic in another. As we wrote in the review, GECs are “actively fashioned (and continually refashioned) by the complex ongoing interplay among habitats, donors, vectors, recipients, mechanisms, sequences, population structures and selection. In this sense, GECs are analogous to ecological niches: except perhaps in the broadest sense, niches do not exist a priori in the physical world, but are constructed dynamically by organisms through diverse physical, chemical and biological interactions with their environment and with each other. Microorganisms similarly construct GECs, in the process altering the genomes and physiologies of their interaction partners and reciprocally being altered by them (including the ability to differentially accommodate or resist LGT). The recombinant microorganisms may then alter their physical environment or spread to a new one.”
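As a rough illustration of the idea that GECs are sets of genomes linked to one another in both directions by paths of lateral transfer, the sketch below treats the LGT network as a directed graph and reports its strongly connected components of at least two genomes. Equating GECs with strongly connected components is an assumption made here for simplicity, not necessarily the definition used in the review; the genome labels, the edge list and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """Return the strongly connected components of a directed graph.

    edges: iterable of (donor, recipient) pairs, one per inferred LGT edge.
    Uses Kosaraju's two-pass algorithm with explicit stacks.
    """
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for donor, recipient in edges:
        graph[donor].append(recipient)
        reverse[recipient].append(donor)
        nodes.update((donor, recipient))

    # Pass 1: record vertices in order of depth-first completion.
    seen, order = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = set(), [start]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components

def candidate_gecs(edges, min_size=2):
    """Keep only components in which genomes exchange with at least one partner."""
    return [c for c in strongly_connected_components(edges) if len(c) >= min_size]

# Hypothetical directed LGT edges (donor, recipient).
EDGES = [("A", "B"), ("B", "C"), ("C", "A"),
         ("C", "D"), ("D", "E"), ("E", "D"), ("E", "F")]

for gec in candidate_gecs(EDGES):
    print(sorted(gec))
```

In this toy network, genome F only receives material and so joins no candidate community, while A, B and C reach one another by directed paths and are grouped together. Denser notions of community (cliques, modularity-based clusters) would give different answers on the same edges.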
In this way the known microbial biosphere, together with its GECs, constitutes a complex evolving system. Our graph-based approach offers the promise that the GECs can be “identified, enumerated, analyzed and perhaps situated within a more global map of LGT that might depict the complete spectrum of exchange relationships, from active mutual exchange communities to the underlying gossamer of one-off transformations by environmental DNA.” But the complexity of this system is deeper than this, and more subtle.
As described above, LGT is a highly contingent process that ignores gene boundaries and, over time, partially overwrites its own history. The first step in our analysis must be to detect and delineate unitary regions of LGT amongst thousands of genome sequences, each millions of bases in length. Depending on the optimization decisions we take along the way – whether and when to merge nearby regions, what thresholds of statistical evidence to require, what proportions of false matches we are prepared to accept – we will identify somewhat different sets of LGT units. The consequences propagate: from different sets of units we draw different edges, generate different networks, compute different GECs, and in the case of antibiotic resistance perhaps adopt different public-health strategies.
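To make one of these decisions concrete, the following minimal sketch merges candidate LGT regions on a single genome whenever the next region starts within a chosen distance of the previous one, and shows how the resulting set of units changes with that distance. The coordinates, the `max_gap` parameter and the merging rule itself are hypothetical and stand in for whatever criteria an actual detection pipeline would apply.

```python
def merge_regions(regions, max_gap):
    """Merge candidate regions whose start lies within max_gap of the previous end.

    regions: list of (start, end) coordinates on one genome.
    Overlapping regions are always merged, whatever the value of max_gap.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the most recent unit to cover this region as well.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical candidate regions detected on a single genome.
CANDIDATES = [(100, 400), (450, 900), (2000, 2600), (2650, 3000), (8000, 8200)]

for gap in (0, 100, 2000):
    units = merge_regions(CANDIDATES, gap)
    print(f"max_gap={gap}: {len(units)} units -> {units}")
```

With these made-up inputs the same five candidate regions yield five, three or two LGT units depending on the gap allowed, and each choice would lead to a different set of edges in the downstream network.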
Here we propose a systematic examination of how the decisions we take in detecting and delineating units of LGT affect the systems properties that we consequently infer for the microbial biosphere and the bacterial communities therein. How robust are these systems properties to our optimization decisions? Which graphical structures represent GECs most robustly? How does our expectation about the propagation of antibiotic resistance in different regions of the microbial biosphere depend on these optimization decisions? Which specific GECs are robust to which specific decisions, and why?
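One simple way to pose the robustness question is to recompute communities across a range of evidence thresholds and ask how often each pair of genomes ends up in the same community. The sketch below does this under strong simplifying assumptions: edges are undirected, each carries a single hypothetical support score, and plain connected components stand in for GECs. The genome labels, scores and thresholds are invented for illustration.

```python
from itertools import combinations

def communities(scored_edges, threshold):
    """Connected components of the edges whose support meets the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, support in scored_edges:
        root_u, root_v = find(u), find(v)  # registers both vertices
        if support >= threshold:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

def co_membership(scored_edges, thresholds):
    """Fraction of thresholds at which each pair of genomes shares a community."""
    counts = {}
    for threshold in thresholds:
        for group in communities(scored_edges, threshold):
            for pair in combinations(sorted(group), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(thresholds) for pair, n in counts.items()}

# Hypothetical LGT edges with a support score for each inference.
SCORED_EDGES = [("A", "B", 0.99), ("B", "C", 0.97),
                ("C", "D", 0.80), ("D", "E", 0.96)]
THRESHOLDS = (0.75, 0.90, 0.95)

frequency = co_membership(SCORED_EDGES, THRESHOLDS)
robust_pairs = sorted(pair for pair, f in frequency.items() if f == 1.0)
print(robust_pairs)
```

Pairs that stay together at every threshold belong to structures that do not depend on this particular decision, whereas pairs joined only through the weakly supported C-D edge separate as soon as the threshold rises above its score.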
Our approach, tools and outcomes will be of significance not only in microbial ecology and public health, but much more broadly. Complex networks in nature, technology and society contain communities whose structure is not fixed a priori but must be discovered, delineated and optimized. A better understanding of the interplay between the unit of analysis and the properties of networks and communities will lead to a deeper, more integrated and more nuanced appreciation of many complex systems.