Time-calibrated phylogenies of extant species ("extant timetrees") are widely used to estimate historical speciation and extinction rates by fitting stochastic birth-death models.
These approaches have long been controversial as many phylogenetic studies report zero extinction in many taxa, contradicting the high extinction rates seen in the fossil record and the fact that the majority of species ever to have existed are now extinct.
To date, the causes of this discrepancy remain unresolved. Here we provide a novel explanation for these "zero-inflated" extinction rate estimates, based on the recent discovery that there exist many alternative "congruent" diversification scenarios that cannot be distinguished based solely on extant timetrees.
Due to such congruencies, estimation methods tend to converge to some scenario congruent to (i.e., statistically indistinguishable from) the true diversification scenario, but not necessarily to the true diversification scenario itself.
This congruent scenario may exhibit negative extinction rates, a biologically meaningless but mathematically feasible situation, in which case estimators will tend to stick to the boundary estimate of zero extinction.
Based on this explanation, we make multiple testable predictions, which we confirm using analyses of simulated trees and 121 empirical trees.
In contrast to other proposed mechanisms for erroneous extinction rate estimates, our proposed mechanism specifically explains the zero-inflation of previous extinction rate estimates in the absence of detectable model violations, even for large trees.
Not only do our results likely resolve a long-standing mystery in phylogenetics, but they also demonstrate that model congruencies can have severe consequences in practice.
Data and code overview
R code performing the main analyses described in the paper can be downloaded below.
The code performs the following major tasks in sequence:
- Fitting ELC birth-death models to timetrees simulated under time-dependent speciation and extinction rates, while either constraining the extinction rate to non-negative values (BDELC) or allowing for negative values (BDELCNeg).
- Fitting BDELC and BDELCNeg models to a collection of empirical timetrees.
See the cited manuscript for detailed definitions and interpretations. The code has been tested on R v4.0.2, MacOS 10.13.6.
The code requires the R package castor v1.6.7, and will not work with older versions.
For ease of reproducibility, all required inputs (empirical timetrees and metadata) are included with the code.
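The difference between the constrained (BDELC) and unconstrained (BDELCNeg) fits mentioned above can be illustrated with a toy example in base R. This is only a sketch and not the code distributed here: it does not use castor or any birth-death likelihood. The function `toy_neg_loglik` and its optimum at an extinction rate of -0.3 are made-up assumptions, chosen only to show how a bounded optimizer ends up at the zero boundary when the unconstrained optimum is negative.

```r
# Toy stand-in for a negative log-likelihood as a function of the extinction
# rate mu. ASSUMPTION: the quadratic shape and the optimum at mu = -0.3 are
# invented for illustration; they come from no model or dataset.
toy_neg_loglik <- function(mu, mu_opt = -0.3) {
  (mu - mu_opt)^2
}

# Unconstrained fit: mu may take any real value (negative values allowed).
fit_free <- optim(par = 0.5, fn = toy_neg_loglik, method = "BFGS")

# Constrained fit: mu is restricted to non-negative values.
fit_bounded <- optim(par = 0.5, fn = toy_neg_loglik,
                     method = "L-BFGS-B", lower = 0, upper = 5)

cat(sprintf("unconstrained estimate: %.3f\n", fit_free$par))
cat(sprintf("constrained estimate:   %.3f\n", fit_bounded$par))
```

In this toy setting the unconstrained estimate lands near the negative optimum, while the constrained estimate stops at the lower bound of zero, which is the behaviour the abstract describes for zero-inflated extinction rate estimates.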
Please read the license agreement included in the code prior to using it.
If you use any of the empirical timetrees provided below, please cite their respective publications!
Citation info for each timetree can be found in the included file tree_descriptions.tsv.
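As a minimal sketch, the citation table can be loaded into R for inspection as shown below. The helper `read_tree_descriptions` is hypothetical, and the code assumes that the file is plain tab-separated text with a single header row and that it sits in the current working directory; the column names are not documented here, so the example only lists them.

```r
# Read a tab-separated metadata table with a header row.
# ASSUMPTION: plain TSV with one header line; the default path is hypothetical
# and should point to wherever the downloaded archive was unpacked.
read_tree_descriptions <- function(path = "tree_descriptions.tsv") {
  read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
}

# Only attempt to read the file if it is actually present.
if (file.exists("tree_descriptions.tsv")) {
  descriptions <- read_tree_descriptions()
  print(names(descriptions))  # list the available columns
  print(nrow(descriptions))   # number of rows in the table
}
```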
Complete R code (includes required input trees).