Microorganisms, notably Bacteria and Archaea, are the most ancient and most widespread forms of life on Earth. Their metabolism drives biogeochemical cycles in virtually every ecosystem and has shaped Earth's surface chemistry over billions of years. Today, prokaryotes (Bacteria and Archaea) utilize a myriad of metabolic pathways to gain energy, thereby occupying extraordinary niches that most other organisms cannot. Our lab is interested in how prokaryotes interact with their environment through their metabolism to drive biogeochemical fluxes and, reciprocally, how this interaction affects microbial diversity at ecological and geological time scales. To answer these questions we use experiments, field surveys, DNA sequencing, mathematical modeling and statistics. We also develop novel computational tools for analyzing massive biological data sets.
Ongoing research topics and some potential graduate theses are summarized below. Future graduate students and postdocs are encouraged to also develop their own research ideas!
The statistical properties of prokaryotic genomic diversity
In any given environment, pathways driving redox reactions for energy acquisition can often be shared by numerous coexisting genomes, and sequential pathways are often split across genomes in various alternative combinations. The processes determining pathway distribution patterns across genomes are largely unknown. Can prokaryotic genomic diversity, either within a single ecosystem or at global scales, be described by simple stochastic models of gene content shuffling? Can prokaryotic genomes be described as metacommunities of cooperating but selfishly replicating genes? Do environmental conditions influence the degree to which metabolic pathways are split across coexisting organisms? These questions may be investigated using the myriad of sequenced genomes available in public databases, or using genomes recovered from metagenomes from a single environment.
Gene-level and genome-level processes of prokaryotic macroevolution
What can gene-centric and genome-centric paradigms tell us about prokaryotic macroevolution?
How does the invention or acquisition of new metabolic capabilities by a clade (e.g., via horizontal gene transfer) affect its overall diversification over geological time scales? For example, do Oxyphotobacteria (the only known bacterial clade capable of oxygenic photosynthesis) exhibit different speciation/extinction rates than other bacterial clades?
Development and validation of pathway-centric ecological models
Most metabolic pathways are found in a wide range of microbial taxa, each of which could potentially fill the same metabolic niche. This functional redundancy leads to a partial decoupling between a community's taxonomic composition and bulk metabolic activity. Pathway-centric models, whereby the distribution and activity of pathways are modeled regardless of the species that host them, could yield great insight into microbial metabolic network dynamics at ecosystem scales. Many questions remain open. What is the proper format of pathway-centric models, and what are the limits of their applicability? Can pathway-centric models help us understand patterns of DNA, mRNA and protein distributions in nature? Reciprocally, how can metagenomic, metatranscriptomic and metaproteomic data be used to calibrate or validate pathway-centric models? The advent of environmental meta'omic sequencing data makes these questions more relevant than ever.
The role of genomic structure in microbial metabolic networks
What is the role of microbial genomes in microbial metabolic networks at ecosystem scales? How does the co-occurrence and interaction of different genes in genomes influence their metabolic activity and generate deviations from purely gene-centric predictions? These questions can be investigated using mechanistic models, time series monitoring of natural systems, and microcosm experiments in the lab. Philosophically, this work is central to understanding the role of the various layers at which Life is organized, including individual genes, groups of genes (genomes) and groups of genomes (microbial communities). Practically, this work will help design more accurate biogeochemical models for natural as well as engineered ecosystems.
Development of multi-layered (gene-level + genome-level) geobiological models
This project involves the construction and evaluation of multi-layered (i.e., gene-level and genome-level) geobiological models, in which information on gene co-occurrences and pathway fragmentation is incorporated into gene-centric models. Such adjustments could substantially improve model accuracy, for example for industrial processes or marine ecosystems, where previous studies revealed that gene co-occurrences within genomes lead to deviations from purely gene-centric models. Mathematical frameworks derived from this work may be evaluated using sequencing and chemical data retrieved from microcosms in our lab or in collaboration with other labs.
Microbial evolution in hot springs
Hot springs are modern analogs of what might have been the earliest type of environments to harbor life on Earth, and are similar to environments likely supporting or having supported life on other planets. The importance of hot springs to astrobiology and early Earth biology has fueled intense research on the microorganisms inhabiting these environments, yielding insight into their diversity, physiology and metabolism. To date, however, little is known about how microorganisms evolve within these environments. Our lab is conducting molecular sequencing surveys of microbial mats in hot springs across Oregon, Nevada and Yellowstone National Park, to answer fundamental questions about microbial evolution in these extreme and ancient environments.
Experimental and mathematical characterization of microbial system kinetics
Are there universal high-level principles governing the dynamics of microbial metabolic networks, for example in response to nutrient pulses or changing boundary conditions? Microcosm experiments in the lab may help unravel these principles. Metagenomic sequencing and stable isotope probing can be used to characterize the structural and functional responses of microbial systems. System identification techniques from engineering could be used to describe the response of entire microbial systems, just as standard "Monod" response curves are used for single strains. This work could mark the beginning of a transition in the field towards understanding complex microbial system kinetics, rather than the kinetics of single strains.
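As a point of reference for the single-strain case mentioned above, a Monod response curve relates the specific growth rate of a strain to the concentration of a limiting substrate. The following is a minimal Python sketch; the function name and the parameter values are illustrative assumptions, not measurements from any particular organism.

```python
def monod_growth_rate(substrate, mu_max, half_saturation):
    """Specific growth rate under the Monod model: mu_max * S / (K_s + S)."""
    if substrate < 0 or half_saturation <= 0:
        raise ValueError("substrate must be >= 0 and half_saturation > 0")
    return mu_max * substrate / (half_saturation + substrate)


# Hypothetical parameters, chosen only for illustration
MU_MAX = 1.0  # maximum specific growth rate (1/h)
K_S = 2.0     # half-saturation constant (same units as the substrate)

for s in (0.0, 2.0, 20.0):
    # The rate rises from zero and saturates towards MU_MAX at high substrate
    print(s, monod_growth_rate(s, MU_MAX, K_S))
```

At a substrate concentration equal to the half-saturation constant, the growth rate is half of its maximum. A system-level analog would describe the response of an entire community rather than a single strain.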
Equilibrium-reaction-transport models for geobiology
The vast majority of geobiological models, such as for nitrogen and sulfur cycling in oxygen minimum zones and sediments, take the form of reaction-transport differential equations in which biologically driven reaction rates are kinetically limited. Such models require extensive kinetic parameterization, which severely limits the applicability of these models to natural systems. Rapid microbial population growth and selection for efficient metabolism, however, can lead to reaction kinetics that are much faster than physical transport processes, thus essentially leading to local thermodynamic equilibrium conditions. Equilibrium-reaction-transport models exploit such local equilibrium conditions to eliminate kinetic parameters, and hold great potential for predicting biogeochemical processes solely based on thermodynamic principles and physics. This project will explore the applicability of equilibrium-reaction-transport models to geobiology, including diagenetic processes in sediments and elemental cycling in lakes and the ocean. The project will be strongly modeling-based, but may also include experimental work.
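For orientation, a generic one-dimensional reaction-transport equation of the kind referred to above can be written as follows. The notation is assumed here for illustration and is not taken from a specific model: C_i is the concentration of chemical species i, z is depth, D_i is a diffusivity, v is an advection velocity, and R_i is the net biologically driven reaction rate.

```latex
\frac{\partial C_i}{\partial t}
  = \frac{\partial}{\partial z}\left( D_i \frac{\partial C_i}{\partial z} \right)
  - \frac{\partial (v\, C_i)}{\partial z}
  + R_i(C_1, \dots, C_n)
```

In kinetically limited formulations, each R_i requires rate laws with parameters such as maximum rates and half-saturation constants. In the equilibrium approach described above, these rate laws would instead be replaced by local thermodynamic equilibrium constraints on the concentrations.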
Linking thermodynamics to microbial growth in natural systems
A powerful thermodynamic concept for understanding microbial systems is the Gibbs free energy (ΔG), which quantifies the energy available from specific metabolic reactions depending on chemical environmental conditions. While the sign of ΔG (negative or positive) is a strong indicator of the presence or absence of a metabolic reaction, to date a quantitative link between the magnitude of ΔG and microbial growth is restricted to a few isolates studied under laboratory conditions. This study will use novel mathematical modeling techniques to quantify the free energy flux through natural microbial systems and relate those fluxes to measured microbial productivity rates and biomass concentrations.
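To illustrate how ΔG depends on chemical conditions, the standard relation ΔG = ΔG° + RT ln Q (with Q the reaction quotient) can be sketched in Python as below. The function names and all numerical values are hypothetical, and ΔG° must refer to the same reference conditions as the activities supplied.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def reaction_quotient(products, reactants):
    """Q = product of activity**coefficient over products, divided by reactants."""
    q = 1.0
    for activity, coeff in products:
        q *= activity ** coeff
    for activity, coeff in reactants:
        q /= activity ** coeff
    return q


def gibbs_free_energy(delta_g0, temperature_k, products, reactants):
    """Delta G (kJ/mol) = Delta G0 + R * T * ln(Q)."""
    q = reaction_quotient(products, reactants)
    return delta_g0 + R * temperature_k * math.log(q)


# Hypothetical reaction A -> B with an assumed Delta G0 of -100 kJ/mol
dg = gibbs_free_energy(
    delta_g0=-100.0,
    temperature_k=298.15,
    products=[(1e-3, 1)],   # (activity, stoichiometric coefficient)
    reactants=[(1e-1, 1)],
)
print(dg)  # more negative than Delta G0, because products are scarce
```

A negative result indicates that the reaction is energetically favorable under the given conditions. How the magnitude relates to growth is the open question described above.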
Development of scalable computational tools
The ongoing explosion of available microbial sequence data presents exceptional opportunities for reconstructing microbial diversification and extinction dynamics over geological time scales, and for unraveling mechanisms that shaped today's extant genomes. These massive datasets, for example including millions of 16S rRNA gene sequences, also present substantial computational challenges, because the majority of existing analytical tools were developed for much smaller datasets. In the Louca lab we develop novel efficient algorithms for handling millions of gene sequences or genomes, for example for constructing and dating trees, detecting horizontal gene transfer events, predicting phenotypes of uncultured organisms and reconstructing diversification dynamics from phylogenies. If you enjoy the intellectual challenge of developing algorithms for computational biology, then this could be the right topic for you!
Charting global microbial diversity
The bulk of microorganisms has never been, and probably will never be, cultured, and culturing is strongly biased towards organisms with specific metabolic traits. Hence our assessment of global microbial diversity is extremely limited and biased. Modern large-scale sequencing surveys can help fill this gap. A goal of ours is to combine existing and future datasets to chart the phenotypic, phylogenetic and geographical distribution of extant microbial diversity, to quantify discovery biases, and to identify relationships between phenotypic and phylogenetic diversity at global scales. This requires the development of new pipelines that can efficiently handle unprecedentedly large datasets. The insight gained from this work will be essential to reconstructing the processes shaping global microbial diversity over geological time scales, for example using Binary State Speciation and Extinction models.
Independent research opportunities for undergraduates
Evaluating model selection techniques in ecology
Model selection methods such as the Akaike information criterion (AIC) and the likelihood-ratio test (LRT) are commonly used in ecology to select among competing models and separate "important" from "unimportant" predictor variables. Despite their widespread use, most of our understanding of the properties and performance of these selection methods is based on linear models and asymptotic considerations (i.e., in the limit of infinitely large datasets). It is not clear, for example, how well the approximate formula commonly used for the statistical significance of an LRT (based on a chi-square distribution) works for finite datasets and complex dynamical models. This project will use simulations of ecological models to systematically investigate the performance of common model selection methods in the presence of finite (realistic) datasets.
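For concreteness, the two quantities discussed above can be computed as in the following Python sketch. It assumes two nested models that differ by exactly one free parameter, so that the asymptotic chi-square approximation has one degree of freedom; the function names and log-likelihood values are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def lrt_pvalue_1df(loglik_null, loglik_alt):
    """Asymptotic LRT p-value for nested models differing by one parameter.

    Uses the chi-square approximation with 1 degree of freedom, whose
    survival function is P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (loglik_alt - loglik_null), 0.0)
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical maximized log-likelihoods of two fitted models
LOGLIK_NULL, K_NULL = -105.0, 2
LOGLIK_ALT, K_ALT = -102.0, 3

print("AIC null:", aic(LOGLIK_NULL, K_NULL))
print("AIC alt: ", aic(LOGLIK_ALT, K_ALT))
print("LRT p:   ", lrt_pvalue_1df(LOGLIK_NULL, LOGLIK_ALT))
```

A simulation study of the kind described above could repeatedly generate finite datasets under the null model, fit both models, and compare the distribution of these p-values to the uniform distribution expected if the approximation held.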
Predicting enzyme kinetics using phylogenetic hidden state prediction
Kinetic parameters for the majority of enzymes (e.g., affinity and half-saturation constant) are unknown, but are needed for modeling the metabolism of microorganisms. Physicochemical predictions of kinetics are computationally extremely challenging when only the amino acid sequence is known. Phylogenetic prediction methods, whereby the evolutionary relationships between enzymes are used to estimate an enzyme's properties based on those of close relatives, hold great potential for solving this problem, but remain largely unexplored. This project would use publicly available enzyme sequences and kinetic parameters (e.g., from the BRENDA database) to examine the performance of phylogenetic prediction methods for enzyme kinetics.
Evaluation of tree construction tools at large scales
Massive sequencing datasets covering hundreds of thousands of strains present great opportunities for understanding microbial macroevolution. Unfortunately, existing methods for building phylogenetic trees scale very poorly to such large datasets, and so trade-offs between speed and accuracy become very important. To make things worse, the accuracy of computational tools and the involved trade-offs are poorly understood at these scales. This project's goal is to perform a systematic evaluation of various state-of-the-art alignment, tree-building and tree-dating tools for large datasets common in modern microbiology.
Fundamental trade-offs in enzyme kinetics
It is commonly assumed that fundamental trade-offs exist between various enzyme kinetic parameters, such as affinity and half-saturation constant. However, the ubiquity of such trade-offs across enzyme families has never been systematically examined. This project will use enzyme kinetics data across the tree of life from a variety of public databases to explore and quantify trade-offs between kinetic parameters.
Gene trees vs. species trees
Phylogenetic trees of genes have until recently been used as direct estimates of the underlying species phylogeny, i.e. the evolutionary relationship of the various species from which a gene was sampled. For example, phylogenies constructed using the 16S ribosomal RNA gene are commonly used as a representation of the evolutionary relationships between bacterial species. It is becoming increasingly recognized, however, that gene phylogenies may not fully reflect the underlying speciation history. This project will investigate how much gene trees are expected to differ from the underlying species trees in the case of bacteria, and how much these differences might influence our analyses of bacterial speciation/extinction dynamics over time.
Hypothesis testing in macroevolution
For many taxa the fossil record is too sparse to make meaningful inferences about past speciation and extinction rates, and hence molecular phylogenies of extant-only species are a popular alternative source of information. Recent findings on the identifiability of speciation-extinction models in phylogenetics ("birth-death" models) have called into question the common practice of fitting such models to phylogenies of extant species in order to estimate past speciation and extinction rates. Yet, a debate remains as to whether extant species trees could still be used for testing specific macroevolutionary hypotheses, such as whether speciation has been increasing over time or whether it was influenced by temperature. This project will systematically examine the usefulness of extant species trees for macroevolutionary hypothesis testing.
Phylogenetic relationships between hot spring-associated microbes
Microbial dispersal between hot springs is slower than between most other environments, and hence a geographic restriction of microbial taxa to specific hot springs ("endemism") is relatively likely. However, the extent and geographic structure of such endemism, especially at global scales, has not yet been systematically examined. This project will build phylogenies for thousands of hot spring-associated bacteria and archaea and investigate the typical taxonomic resolution at which microbes appear endemic to specific hot springs.
Examining the use of Sloan's neutral model in microbial ecology
Understanding the mechanisms that shape microbial community composition remains a core endeavor of modern ecological research. The widely used "Sloan neutral model" assumes that local species composition is driven by two neutral mechanisms, namely random dispersal from a common pool and ecological drift due to stochastic birth/death events. A comparison of the model's predictions to real communities is often used to quantify the extent to which communities assemble neutrally, or to identify individual species subject to non-neutral processes (e.g., selection). However, the validity and informative ability of these approaches, especially when neutrality is violated, have never been rigorously evaluated. This project will use simulations of microbial community assembly under various non-neutral conditions to examine how well these can be statistically distinguished from Sloan's neutral model.
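The two neutral mechanisms named above (dispersal from a common pool and drift through stochastic birth/death events) can be illustrated with a small individual-based simulation. This is a simplified Python sketch and not the analytical Sloan model itself; the function names, community size, migration probability and source-pool abundances are all hypothetical choices.

```python
import random


def simulate_neutral_community(source_abundances, n_individuals,
                               migration_prob, n_steps, rng):
    """Neutral drift with immigration in a community of fixed size (>= 2).

    At each step one random individual dies and is replaced either by an
    immigrant drawn from the source pool (with probability migration_prob)
    or by the offspring of another randomly chosen local individual.
    """
    species = list(range(len(source_abundances)))
    community = rng.choices(species, weights=source_abundances, k=n_individuals)
    for _ in range(n_steps):
        dead = rng.randrange(n_individuals)
        if rng.random() < migration_prob:
            community[dead] = rng.choices(species, weights=source_abundances)[0]
        else:
            # Choose a parent among the other n_individuals - 1 individuals
            parent = rng.randrange(n_individuals - 1)
            if parent >= dead:
                parent += 1
            community[dead] = community[parent]
    return community


def occurrence_frequencies(communities, n_species):
    """Fraction of local communities in which each species is present."""
    return [sum(1 for c in communities if sp in c) / len(communities)
            for sp in range(n_species)]


rng = random.Random(1)             # fixed seed for reproducibility
SOURCE = [0.5, 0.3, 0.15, 0.05]    # hypothetical source-pool abundances
communities = [simulate_neutral_community(SOURCE, 100, 0.05, 2000, rng)
               for _ in range(50)]
print(occurrence_frequencies(communities, len(SOURCE)))
```

Non-neutral variants could be obtained, for example, by giving species unequal replacement probabilities. Occurrence frequencies from such simulations could then be compared with the neutral expectation.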