One of the major reasons for the utilization of point age estimates instead of accounting for fossil age uncertainty in tip-dating analyses of entirely extinct lineages has been the limitation by most Bayesian inference software on handling paleontological datasets in clock analyses. All tree models available for clock analyses assume a diversification process that start at some point in the past (either the time of origin or the time of the most recent common ancestor of all sampled taxa) and stop at the present time (t = 0 yr). In such cases, the birth-death process stops at t = 0 before the present, and the height of the youngest nodes in the tree are also = 0. In the special case where only fossils are included, the birth-death process will stop at some time t > 0yr before the present, at the age of the youngest fossil (e.g., age = 3 Ma). However, the height of the youngest node will still be treated as = 0, which is the default tree format by most phylogenetic inference software. This difference creates the necessity of rescaling the resulting posterior fossil trees with an offset equal to the difference between the age youngest node and the present time (in the example above, offset = 3 myr).
This latter process is relatively simple if we attribute a fixed point age value to all fossils in the tree (so the age of the youngest fossil tip will be known a priori). However, using fixed point age value to all fossils creates important estimate biases, and age uncertainty at the tips should be accounted by allowing them to be estimated within a time range, as discuss in Barido‐Sottani et al. (2020). This presents issues when analyzing fully extinct clades, because all tip ages are variable and are allowed to be estimated within a time range, and each posterior tree will likely have a distinct age for the youngest taxon and thus no common fixed point in the past to establish the rescaling among all posterior trees. To address this issue, a few strategies are available:
The SA package for BEAST2
includes an alternate tree with offset format, which can be set up as
shown in the FBD
tutorial. This uses a a “treeWoffset” operator that will sample the
offset of every tree from the posterior as an additional parameter
during clock analysis. However, posterior trees produced by BEAST2 with offset ages cannot be
directly used by TreeAnnotator to calculate the maximum clade
credibility tree (MCC), as the latter still assumes that the youngest
tips in the dataset represent extant tips. To correct for this, a
“dummy” extant taxon must be added to the posterior trees to be
interpreted by TreeAnnotator, and the “dummy” extant must be pruned from
the resulting MCC tree to produce the final MCC tree. These can be done
by performing by following steps and functions from
EvoPhylo
:
Load the EvoPhylo package
First we import the posterior tree distribution and log file produced by BEAST2. The log file is needed because it contains the offset for each sample. In order to store the correct age in the tree, we are going to add a “dummy” tip to each phylogeny. This dummy tip will have age 0 for each sample, and so the length of the age between this tip and the root of the tree will encode the age of the tree including the offset.
trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo")
log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo")
## Transform the trees to add a dummy tip
converted_trees <- offset.to.dummy.metadata(trees_file, log_file)
## [1] 364.0323
Here we output the transformed trees to a variable, but in practice it is often more useful to write them directly to a new trees file.
Using the transformed trees, we can then perform our post-processing as usual, for instance using TreeAnnotator to produce an MCC tree.
Our summary tree will include the dummy tip that we added earlier, so the last step of the process is to remove it, and get back the proper extinct tree with an offset.
## Find the example MCC tree from the package
tree_file <- system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo")
## Then remove the dummy tip
final_tree <- drop.dummy.beast(tree_file)
## The output contains the final tree and the offset
final_tree$offset
## [1] 366.4744
## 'treedata' S4 object that stored information
## of
## '/tmp/Rtmp0412x0/Rbuild185568163538/EvoPhylo/inst/extdata/ex_offset.MCC.tre'.
##
## ...@ phylo:
##
## Phylogenetic tree with 19 tips and 18 internal nodes.
##
## Tip labels:
## Floweria_anomala, Floweria_arctostriata, Floweria_becraftensis,
## Floweria_chemungensis1, Floweria_chemungensis2, Floweria_chemungensis3, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
##
## # The associated data tibble abstraction: 38 × 16
## # The 'node', 'label' and 'isTip' are from the phylo tree.
## node label isTip height height_0.95_HPD height_median height_range length
## <int> <chr> <lgl> <name> <list> <named list> <list> <dbl>
## 1 1 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 2.44
## 2 2 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 3.50
## 3 3 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 2.65
## 4 4 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 0.804
## 5 5 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 2.46
## 6 6 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 11.4
## 7 7 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 0.360
## 8 8 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 11.6
## 9 9 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 0.597
## 10 10 Floweri… TRUE <dbl> <dbl [2]> <dbl [1]> <dbl [2]> 1.70
## # ℹ 28 more rows
## # ℹ 8 more variables: length_0.95_HPD <list>, length_median <dbl>,
## # length_range <list>, posterior <dbl>, rate <dbl>, rate_0.95_HPD <list>,
## # rate_median <dbl>, rate_range <list>
As before, we can also save the tree directly to a file.
Mr.Bayes does not
have an equivalent procedure to treeWoffset from BEAST2 for handling tip-dating
analyses of entirely extinct lineages. In this case, it is necessary to
manually add a “dummy” extant taxon to the dataset from the beginning of
the analyses (as done in Simões and Pierce (2021)). This “dummy” tip must be
scored only with missing data (“?”) and constrained as the sister taxon
to one other taxon in the data, so it does not fluctuate around the tree
as a wildcard. As with the MCC trees from BEAST2, the final summary trees from
Mr.Bayes [50% majority rule consensus tree (MRC) and the maximum
compatible tree (MCT)] will have the extant “dummy’ taxon in then. It
can be pruned using drop.dummy.mb
: