Title: | Pre- And Postprocessing of Morphological Data from Relaxed Clock Bayesian Phylogenetics |
---|---|
Description: | Performs automated morphological character partitioning for phylogenetic analyses and analyze macroevolutionary parameter outputs from clock (time-calibrated) Bayesian inference analyses, following concepts introduced by Simões and Pierce (2021) <doi:10.1038/s41559-021-01532-x>. |
Authors: | Tiago Simoes [cre, aut] , Noah Greifer [aut] , Joelle Barido-Sottani [aut] , Stephanie Pierce [aut] |
Maintainer: | Tiago Simoes <[email protected]> |
License: | GPL (>=2) |
Version: | 0.3.3 |
Built: | 2024-11-12 06:14:11 UTC |
Source: | https://github.com/tiago-simoes/evophylo |
An example dataset of morphological characters for early tetrapodomorphs from Simões & Pierce (2021). This type of data would be used as input to get_gower_dist
.
data("characters")
data("characters")
A data frame with 178 observations (characters) on 43 columns (taxa).
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
Converts clock rate tables, such as those produced by clockrate_summary
and imported back after including clade names, from wide to long format.
clock_reshape(rate_table)
clock_reshape(rate_table)
rate_table |
A data frame of clock rates, such as from the output of |
This function will convert clock rate tables from wide to long format, with a new column "clock" containing the clock partition from where each rate estimate was obtained as a factor. The long format is necessary for downstream analyses of selection strength (mode), as similarly done by FBD_reshape
for posterior parameter log files.
A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating to which clock partition each rate values refers to)
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
get_clockrate_table_MrBayes
, summary
, clockrate_summary
, FBD_reshape
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## The example dataset rate_table_clades_means3 ## has clades and 3 clock rate columns: data("rate_table_clades_means3") ## Reshape a clock rate table with clade names to long format ## Not run: rates_by_clade <- clock_reshape(rate_table_clades_means3) ## End(Not run)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## The example dataset rate_table_clades_means3 ## has clades and 3 clock rate columns: data("rate_table_clades_means3") ## Reshape a clock rate table with clade names to long format ## Not run: rates_by_clade <- clock_reshape(rate_table_clades_means3) ## End(Not run)
Plots the distribution density of clock rates by clock and clade. The input must have a "clade" column.
clockrate_dens_plot(rate_table, clock = NULL, stack = FALSE, nrow = 1, scales = "fixed")
clockrate_dens_plot(rate_table, clock = NULL, stack = FALSE, nrow = 1, scales = "fixed")
rate_table |
A data frame of clock rates, such as from the output of |
clock |
Which clock rates will be plotted. If unspecified, all clocks are plotted. |
stack |
Whether to display stacked density plots ( |
nrow |
When plotting rates for more than one clock, how many rows should be filled by the plots. This is passed to |
scales |
When plotting rates for more than one clock, whether the axis scales should be "fixed" (default) across clocks or allowed to vary ("free", "free_x", or "free_y"). This is passed to |
The user must manually add clades to the rate table produced by get_clockrate_table_MrBayes
before it can be used with this function. This can be doen manually with in R, such as by using a graphical user interface for editing data like the DataEditR package, or by writing the rate table to a spreadsheet and reading it back in after adding the clades. The example below uses a table that has had the clades added.
A ggplot
object, which can be modified using ggplot2 functions.
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
get_clockrate_table_MrBayes
, geom_density
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") # Overlapping plots clockrate_dens_plot(RateTable_Means_3p_Clades, stack = FALSE, nrow = 1, scales = "fixed") # Stacked density for all three clocks, changing the color # palette to viridis using ggplot2 functions clockrate_dens_plot(RateTable_Means_3p_Clades, clock = 1:3, nrow = 1, stack = TRUE, scales = "fixed") + ggplot2::scale_color_viridis_d() + ggplot2::scale_fill_viridis_d()
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") # Overlapping plots clockrate_dens_plot(RateTable_Means_3p_Clades, stack = FALSE, nrow = 1, scales = "fixed") # Stacked density for all three clocks, changing the color # palette to viridis using ggplot2 functions clockrate_dens_plot(RateTable_Means_3p_Clades, clock = 1:3, nrow = 1, stack = TRUE, scales = "fixed") + ggplot2::scale_color_viridis_d() + ggplot2::scale_fill_viridis_d()
Displays a scatterplot and fits regression line of one set of clock rates against another, optionally displaying their Pearson correlation coefficient (r) and R-squared values (R^2).
clockrate_reg_plot(rate_table, clock_x, clock_y, method = "lm", show_lm = TRUE, ...)
clockrate_reg_plot(rate_table, clock_x, clock_y, method = "lm", show_lm = TRUE, ...)
rate_table |
A table of clock rates, such as from the output of |
clock_x , clock_y
|
The clock rates that should go on the x- and y-axes, respectively. |
method |
The method (function) used fit the regression of one clock on the other. Check the |
show_lm |
Whether to display the Pearson correlation coefficient (r) and R-squared values (R^2) between two sets of clock rates. |
... |
Other arguments passed to |
clockrate_reg_plot()
can only be used when multiple clocks are present in the clock rate table. Unlike clockrate_summary
and clockrate_dens_plot
, no "clade" column is required.
A ggplot
object, which can be modified using ggplot2 functions.
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") #Plot correlations between clocks 1 and 3 clockrate_reg_plot(RateTable_Means_3p_Clades, clock_x = 1, clock_y = 3) #Use arguments supplied to geom_smooth(): clockrate_reg_plot(RateTable_Means_3p_Clades, clock_x = 1, clock_y = 3, color = "red", se = FALSE)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") #Plot correlations between clocks 1 and 3 clockrate_reg_plot(RateTable_Means_3p_Clades, clock_x = 1, clock_y = 3) #Use arguments supplied to geom_smooth(): clockrate_reg_plot(RateTable_Means_3p_Clades, clock_x = 1, clock_y = 3, color = "red", se = FALSE)
Computes summary statistics for each clade and/or each clock partition. The input must have a "clade" column.
clockrate_summary(rate_table, file = NULL, digits = 3)
clockrate_summary(rate_table, file = NULL, digits = 3)
rate_table |
A data frame of clock rates, such as from the output of |
file |
An optional file path where the resulting table will be stored using |
digits |
The number of digits to round the summary results to. Default is 3. See |
The user must manually add clades to the rate table produced by get_clockrate_table_MrBayes
before it can be used with this function. This can be doen manually within R, such as by using a graphical user interface for editing data like the DataEditR package, or by writing the rate table to a spreadsheet and reading it back in after adding the clades. The example below uses a table that has had the clades added.
A data frame containing a row for each clade and each clock with summary statistics (n, mean, standard deviation, minimum, 1st quartile, median, third quartile, maximum).
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
get_clockrate_table_MrBayes
, summary
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") clockrate_summary(RateTable_Means_3p_Clades)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline data("RateTable_Means_3p_Clades") clockrate_summary(RateTable_Means_3p_Clades)
Creates and exports a Nexus file with a list of characters and their respective partitions as inferred by the make_clusters
function. The contents can be copied and pasted directly into a Mr. Bayes commands block for a partitioned clock Bayesian inference analysis.
cluster_to_nexus(cluster_df, file = NULL)
cluster_to_nexus(cluster_df, file = NULL)
cluster_df |
A |
file |
The path of the text file to be created containing the partitioning information in Nexus format. If |
The text as a string, returned invisibly if file
is not NULL
. Use cat
on the resulting output to format it correctly (i.e., to turn "\n"
into line breaks).
vignette("char-part")
for the use of this function as part of an analysis pipeline.
# See vignette("char-part") for how to use this # function as part of an analysis pipeline # Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Find optimal partitioning scheme using PAM under k=3 # partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Write to Nexus file and export to .txt file: file <- tempfile(fileext = ".txt") # You would set, e.g., # file <- "path/to/file.txt" cluster_to_nexus(cluster_df, file = file)
# See vignette("char-part") for how to use this # function as part of an analysis pipeline # Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Find optimal partitioning scheme using PAM under k=3 # partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Write to Nexus file and export to .txt file: file <- tempfile(fileext = ".txt") # You would set, e.g., # file <- "path/to/file.txt" cluster_to_nexus(cluster_df, file = file)
Imports parameter (.p) log files from Mr. Bayes and combines them into a single data frame. Samples can be dropped from the start of each log file (i.e., discarded as burn-in) and/or downsampled to reduce the size of the output object.
combine_log(path = ".", burnin = 0.25, downsample = 10000)
combine_log(path = ".", burnin = 0.25, downsample = 10000)
path |
The path to a folder containing (.p) log files or a character vector of log files to be read. |
burnin |
Either the number or a proportion of generations to drop from the beginning of each log file. |
downsample |
Either the number or the proportion of generations the user wants to keep after downsampling for the final (combined) log file. Generations will be dropped in approximately equally-spaced intervals. |
combine_log()
imports log files produced by Mr.Bayes, ignoring the first row of the file (which contains an ID number). The files are appended together, optionally after removing burn-in generations from the beginning and/or by further filtering throughout the rest of each file. When burnin
is greater than 0, the number or propotion of generations corresponding to the supplied value will be dropped from the beginning of each file as it is read in. For example, setting burnin = .25
(the default) will drop the first 25% of generations from each file. When downsample
is greater than 0, the file will be downsampled until the number or proportion of generations corresponding to the supplied value is reached. For example, if downsample = 10000
generations (the default) for log files from 4 independent runs (i.e., 4 (.p) files), each log file will be downsampled to 2500 generations, and the final combined data frame will contain 10000 samples, selected in approximately equally spaced intervals from the original data.
The output can be supplied to get_pwt_rates_MrBayes
and to FBD_reshape
. The latter will convert the log data frame from my wide to long format, which is necessary to be used as input for downstream analyses using FBD_summary
, FBD_dens_plot
, FBD_normality_plot
, FBD_tests1
, or FBD_tests2
.
A data frame with columns corresponding to the columns in the supplied log files and rows containing the sampled parameter values. Examples of the kind of output produced can be accessed using data("posterior1p")
and data("posterior3p")
.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
FBD_reshape
, which reshapes a combined parameter log file for use in some other package functions.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline ## Not run: posterior <- combine_log("path/to/folder", burnin = .25, downsample = 10000) ## End(Not run)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline ## Not run: posterior <- combine_log("path/to/folder", burnin = .25, downsample = 10000) ## End(Not run)
This method is designed to remove the dummy tip added on offset trees once postprocessing is complete (for instance once the summary tree has been built using TreeAnnotator).
drop.dummy.beast( tree.file, output.file = NULL, dummy.name = "dummy", convert.heights = TRUE )
drop.dummy.beast( tree.file, output.file = NULL, dummy.name = "dummy", convert.heights = TRUE )
tree.file |
path to file containing the tree with dummy tip |
output.file |
path to file to write converted tree. If |
dummy.name |
name of the added dummy tip, default |
convert.heights |
whether height metadata should be converted to height - offset (required to plot e.g. HPD intervals correctly). Default TRUE. |
list of tree
converted tree (as treedata) ; and offset
age of the youngest tip in the final tree
drop.dummy.mb()
for the same function using summary trees with a "dummy" extant from Mr. Bayes
# Analyze the trees with dummy tips - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
# Analyze the trees with dummy tips - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
This method is designed to remove the dummy tip added to a dataset before running with Mr. Bayes.
drop.dummy.mb( tree.file, output.file = NULL, dummy.name = "dummy", convert.ages = TRUE )
drop.dummy.mb( tree.file, output.file = NULL, dummy.name = "dummy", convert.ages = TRUE )
tree.file |
path to file containing the tree with dummy tip |
output.file |
path to file to write converted tree. If |
dummy.name |
name of the added dummy tip, default |
convert.ages |
whether height metadata should be converted to height - offset (required to plot e.g. HPD intervals correctly). Default TRUE. |
list of tree
converted tree (as treedata) ; and offset
age of the youngest tip in the final tree
drop.dummy.beast()
for the same function using summary trees with a "dummy" extant from BEAST2
# Remove the dummy tip from the summary tree final_tree <- drop.dummy.mb(system.file("extdata", "tree_mb_dummy.tre", package = "EvoPhylo"))
# Remove the dummy tip from the summary tree final_tree <- drop.dummy.mb(system.file("extdata", "tree_mb_dummy.tre", package = "EvoPhylo"))
Produces a density or violin plot displaying the distribution of FBD parameter samples by time bin.
FBD_dens_plot(posterior, parameter, type = "density", stack = FALSE, color = "red")
FBD_dens_plot(posterior, parameter, type = "density", stack = FALSE, color = "red")
posterior |
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such data frame can be imported using |
parameter |
A string containing the name of an FBD parameter in the data frame; abbreviations allowed. |
type |
The type of plot; either |
stack |
When |
color |
When |
Density plots are produced using ggplot2::stat_density
, and violin plots are produced using ggplot2::geom_violin
. On violin plots, a horizontal line indicates the median (of the density), and the black dot indicates the mean.
A ggplot
object, which can be modified using ggplot2 functions.
When setting type = "violin"
, a warning may appear saying something like "In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) : collapsing to unique 'x' values". This warning can be ignored.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
ggplot2::stat_density
, ggplot2::geom_violin
for the underlying functions to produce the plots.
combine_log
for producing a single data frame of FBD parameter posterior samples from multiple log files.
FBD_reshape
for converting a single data frame of FBD parameter estimates, such as those imported using combine_log
, from wide to long format.
FBD_summary
, FBD_normality_plot
, FBD_tests1
, and FBD_tests2
for other functions used to summarize and display the distributions of the parameters.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "density", stack = FALSE) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "density", stack = TRUE) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "violin", color = "red")
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "density", stack = FALSE) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "density", stack = TRUE) FBD_dens_plot(posterior3p_long, parameter = "net_speciation", type = "violin", color = "red")
Produces plots of the distributions of fossilized birth–death process (FBD) parameters to facilitate the assessment of the assumptions of normality within time bins and homogeneity of variance across time bins.
FBD_normality_plot(posterior)
FBD_normality_plot(posterior)
posterior |
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such data frame can be imported using |
The plots produced include density plots for each parameter within each time bin (residualized to have a mean of zero), scaled so that the top of the density is at a value of one (in black). Superimposed onto these densitys are the densities of a normal distribution with the same mean and variance (and scaled by the same amount) (in red). Deviations between the normal density in red and the density of the parameters in black indiciate deviations from normality. The standard deviation of each parameter is also displayed for each time bin to facilitate assessing homogenity of variance.
A ggplot
object, which can be modified using ggplot2 functions.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
combine_log
for producing a single data set of parameter posterior samples from individual parameter log files.
FBD_reshape
for converting posterior parameter table from wide to long format.
FBD_tests1
for statistical tests of normality and homogeneity of variance.
FBD_tests2
for tests of differences in parameter means.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_normality_plot(posterior3p_long)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_normality_plot(posterior3p_long)
Converts FBD posterior parameter table, such as those imported using combine_log
, from wide to long format.
FBD_reshape(posterior, variables = NULL, log.type = c("MrBayes", "BEAST2"))
FBD_reshape(posterior, variables = NULL, log.type = c("MrBayes", "BEAST2"))
posterior |
Single posterior parameter sample dataset with skyline FBD parameters produced with |
variables |
Names of FBD rate variables in the log. If NULL (default), will attempt to auto-detect the names and log type. |
log.type |
Name of the software which produced the log (currently supported: MrBayes or BEAST2). Has to be set if |
The posterior parameters log files produced by Bayesian evolutionary analyses using skyline birth-death tree models, including the skyline FBD model, result into two or more estimates for each FBD parameter, one for each time bin. This function will convert a table of parameters with skyline FBD parameters from wide to long format, with one row per generation per time bin and a new column "Time_bin" containing the respective time bins as a factor. The long format is necessary for downstream analyses using FBD_summary
, FBD_dens_plot
, FBD_normality_plot
, FBD_tests1
, or FBD_tests2
, as similarly done by clock_reshape
for clock rate tables.
The format of the log files can either be specified using the variables
and log.type
or auto-detected by the function.
The "posterior" data frame can be obtained by reading in a log file directly (e.g. using the read.table
function) or by combining several output log files from Mr. Bayes using combine_log
.
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") head(posterior3p) ## Reshape FBD table to long format posterior3p_long <- FBD_reshape(posterior3p) head(posterior3p_long)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") head(posterior3p) ## Reshape FBD table to long format posterior3p_long <- FBD_reshape(posterior3p) head(posterior3p_long)
Produces numerical summaries of each fossilized birth–death process (FBD) posterior parameter by time bin.
FBD_summary(posterior, file = NULL, digits = 3)
FBD_summary(posterior, file = NULL, digits = 3)
posterior |
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such data frame can be imported using |
file |
An optional file path where the resulting table will be stored using |
digits |
The number of digitis to round the summary results to. Default is 3. See |
A data frame with a row for each paramater and time bin, and columns for different summary statistics. These include the number of data points (n
) and the mean, standard deviation (sd
), minimum value (min
), first quartile (Q1
), median, third quartile (Q3
), and maximum value (max
). When file
is not NULL
, a .csv file containing this data frame will be saved to the filepath specified in file
and the output will be returned invisibly.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
combine_log
for producing a single data set of parameter posterior samples from individual parameter log files.
FBD_reshape
for converting posterior parameter table from wide to long format.
FBD_dens_plot
, FBD_normality_plot
, FBD_tests1
, and FBD_tests2
for other functions used to summarize and display the distributions of the parameters.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_summary(posterior3p_long)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_summary(posterior3p_long)
Produces tests of normality (within time bin, ignoring time bin, and pooling within-time bin values) and homoscedasticity (homogeneity of variances) for each fossilized birth–death process (FBD) parameter in the posterior parameter log file.
FBD_tests1(posterior, downsample = TRUE)
FBD_tests1(posterior, downsample = TRUE)
posterior |
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such data frame can be imported using |
downsample |
Whether to downsample the observations to ensure Shapiro-Wilk normality tests can be run. If |
FBD_tests1()
performs several tests on the posterior distributions of parameter values within and across time bins. It produces the Shapiro-Wilk test for normality using shapiro.test
and the Bartlett and Fligner tests for homogeneity of variance using bartlett.test
and fligner.test
, respectively. Note that these tests are likely to be significant even if the observations are approximately normally distributed or have approximately equal variance; therefore, they should be supplemented with visual inspection using FBD_normality_plot
.
A list containing the results of the three tests with the following elements:
shapiro |
A list with an element for each parameter. Each element is a data frame with a row for each time bin and the test statistic and p-value for the Shapiro-Wilk test for normality. In addition, there will be a row for an overall test, combining all observations ignoring time bin, and a test of the residuals, which combines the group-mean-centered observations (equivalent to the residuals in a regression of the parameter on time bin). |
bartlett |
A data frame of the Bartlett test for homogeneity of variance across time bins with a row for each parameter and the test statistic and p-value for the test. |
fligner |
A data frame of the Fligner test for homogeneity of variance across time bins with a row for each parameter and the test statistic and p-value for the test. |
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
combine_log
for producing a single data set of parameter posterior samples from individual parameter log files.
FBD_reshape
for converting posterior parameter table from wide to long format.
FBD_normality_plot
for visual assessments.
FBD_tests2
for tests of differences between parameter means.
shapiro.test
, bartlett.test
, and fligner.test
for the statistical tests used.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_tests1(posterior3p_long)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_tests1(posterior3p_long)
FBD_tests2()
performs t-tests and Mann-Whitney U-tests to compare the average value of fossilized birth–death process (FBD) parameters between time bins.
FBD_tests2(posterior, p.adjust.method = "fdr")
FBD_tests2(posterior, p.adjust.method = "fdr")
posterior |
A data frame of posterior parameter estimates containing a single "Time_bin" column and one column for each FBD parameter value. Such data frame can be imported using |
p.adjust.method |
The method use to adjust the p-values for multiple testing. See |
pairwise.t.test
and pairwise.wilcox.test
are used to calculate, respectively, the t-test and Mann-Whitney U-tests statistics and p-values. Because the power of these tests depends on the number of posterior samples, it can be helpful to examine the distributions of FBD parameter posteriors using FBD_dens_plot
instead of relying heavily on the tests.
A list with an element for each test, each of which contains a list of test results for each parameter. The results are in the form of a data frame containing the sample sizes and unadjusted and adjusted p-values for each comparison.
vignette("fbd-params")
for the use of this function as part of an analysis pipeline.
combine_log
for producing a single data set of parameter posterior samples from individual parameter log files.
FBD_reshape
for converting posterior parameter table from wide to long format.
FBD_dens_plot
, FBD_normality_plot
, FBD_tests1
, and FBD_tests2
for other functions used to summarize and display the distributions of the parameter posteriors.
pairwise.t.test
and pairwise.wilcox.test
for the tests used.
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_tests2(posterior3p_long)
# See vignette("fbd-params") for how to use this # function as part of an analysis pipeline data("posterior3p") posterior3p_long <- FBD_reshape(posterior3p) FBD_tests2(posterior3p_long)
BEAST2 stores the rates for each clock in a separate file. All trees need to be loaded using treeio::read.beast
.
get_clockrate_table_BEAST2(..., summary = "median", drop_dummy = NULL)
get_clockrate_table_BEAST2(..., summary = "median", drop_dummy = NULL)
... |
|
summary |
summary metric used for the rates. Currently supported: |
drop_dummy |
if not |
A data frame with a column containing the node identifier (node
) and one column containing the clock rates for each tree provided, in the same order as the trees.
get_clockrate_table_MrBayes()
for the equivalent function for MrBayes output files.
clockrate_summary()
for summarizing and examining properties of the resulting rate table. Note that clade membership for each node must be customized (manually added) before these functions can be used, since this is tree and dataset dependent.
#Import all clock summary trees produced by BEAST2 from your local directory ## Not run: tree_clock1 <- treeio::read.beast("tree_file_clock1.tre") tree_clock2 <- treeio::read.beast("tree_file_clock2.tre") ## End(Not run) #Or use the example BEAST2 multiple clock trees that accompany EvoPhylo. data(tree_clock1) data(tree_clock2) # obtain the rate table from BEAST2 trees rate_table <- get_clockrate_table_BEAST2(tree_clock1, tree_clock2, summary = "mean")
#Import all clock summary trees produced by BEAST2 from your local directory ## Not run: tree_clock1 <- treeio::read.beast("tree_file_clock1.tre") tree_clock2 <- treeio::read.beast("tree_file_clock2.tre") ## End(Not run) #Or use the example BEAST2 multiple clock trees that accompany EvoPhylo. data(tree_clock1) data(tree_clock2) # obtain the rate table from BEAST2 trees rate_table <- get_clockrate_table_BEAST2(tree_clock1, tree_clock2, summary = "mean")
Extract evolutionary rate summary statistics for each node from a Bayesian clock summary tree produced by Mr. Bayes and stores them in a data frame.
get_clockrate_table_MrBayes(tree, summary = "median", drop_dummy = NULL)
get_clockrate_table_MrBayes(tree, summary = "median", drop_dummy = NULL)
tree |
An S4 class object of type |
summary |
The name of the rate summary. Should be one of |
drop_dummy |
if not |
A data frame with a column containing the node identifier (node
) and one column for each relaxed clock partition in the tree object containing clock rates.
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
get_clockrate_table_BEAST2
for the equivalent function for BEAST2 output files.
clockrate_summary
for summarizing and examining properties of the resulting rate table. Note that clade membership for each node must be customized (manually added) before these functions can be used, since this is tree and dataset dependent.
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## Import summary tree with three clock partitions produced by ## Mr. Bayes (.t or .tre files) from your local directory ## Not run: tree3p <- treeio::read.mrbayes("Tree3p.t") ## End(Not run) #Or use the example Mr.Bayes multi-clock tree file (\code{tree3p}) data("tree3p") # obtain the rate table from MrBayes tree rate_table <- get_clockrate_table_MrBayes(tree3p) head(rate_table)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## Import summary tree with three clock partitions produced by ## Mr. Bayes (.t or .tre files) from your local directory ## Not run: tree3p <- treeio::read.mrbayes("Tree3p.t") ## End(Not run) #Or use the example Mr.Bayes multi-clock tree file (\code{tree3p}) data("tree3p") # obtain the rate table from MrBayes tree rate_table <- get_clockrate_table_MrBayes(tree3p) head(rate_table)
Computes Gower distance between characters from a phylogenetic data matrix.
get_gower_dist(x, numeric = FALSE)
get_gower_dist(x, numeric = FALSE)
x |
A phylogenetic data matrix in Nexus (.nex) format, or in any other data frame or matrix format with a column for each character and terminal taxa as rows, which will be read using |
numeric |
Whether to treat the values contained in the |
The Gower distance matrix.
This function uses code adapted from StatMatch::gower.dist()
written by Marcello D'Orazio.
vignette("char-part")
for the use of this function as part of an analysis pipeline.
# See vignette("char-part") for how to use this # function as part of an analysis pipeline # Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Reading data matrix as numeric data Dmatrix <- get_gower_dist(characters, numeric = TRUE)
# See vignette("char-part") for how to use this # function as part of an analysis pipeline # Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Reading data matrix as numeric data Dmatrix <- get_gower_dist(characters, numeric = TRUE)
Produces a data frame containing the results of 1-sample t-tests for the mean of posterior clock rates against each node's absolute clock rate.
get_pwt_rates_BEAST2(rate_table, posterior)
get_pwt_rates_BEAST2(rate_table, posterior)
rate_table |
A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating to which clock partition each rate values refers to), such as from the output of |
posterior |
A data frame of posterior parameter estimates including a "clockrate" column indicating the base of the clock rate estimate for each generation that will be used for pairwise t-tests. Such data frame can be imported using |
get_pwt_rates_BEAST2()
first transforms relative clock rates to absolute rate values for each node and each clock, by multiplying these by the mean posterior clock rate base value. Then, for each node and clock, a one-sample t-test is performed with the null hypothesis that the mean of the posterior clockrates is equal to that node and clock's absolute clock rate.
A long data frame with one row per node per clock and the following columns:
clade |
The name of the clade, taken from the "clade" column of |
nodes |
The node number, taken from the "node" column of |
clock |
The clock partition number |
background.rate(mean) |
The absolute background clock rate (mean clock rate for the whole tree) sampled from the posterior log file |
relative.rate(mean) |
The relative mean clock rate per branch, taken from the "rates" columns of |
absolute.rate(mean) |
The absolute mean clock rate per branch; the relative clock rate multiplied by the mean of the posterior clock rates |
p.value |
The p-value of the test comparing the mean ofthe posterior clockrates to each absolute clockrate |
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
## Not run: # See vignette("rates-selection") for how to use this # function as part of an analysis pipeline # Load example rate table and posterior data sets RateTable_Means_Clades <- system.file("extdata", "RateTable_Means_Clades.csv", package = "EvoPhylo") RateTable_Means_Clades <- read.csv(RateTable_Means_Clades, header = TRUE) posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo") posterior <- read.table(posterior, header = TRUE) get_pwt_rates_BEAST2(RateTable_Means_Clades, posterior) ## End(Not run)
## Not run: # See vignette("rates-selection") for how to use this # function as part of an analysis pipeline # Load example rate table and posterior data sets RateTable_Means_Clades <- system.file("extdata", "RateTable_Means_Clades.csv", package = "EvoPhylo") RateTable_Means_Clades <- read.csv(RateTable_Means_Clades, header = TRUE) posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo") posterior <- read.table(posterior, header = TRUE) get_pwt_rates_BEAST2(RateTable_Means_Clades, posterior) ## End(Not run)
Produces a data frame containing the results of 1-sample t-tests for the mean of posterior clock rates against each node's absolute clock rate.
get_pwt_rates_MrBayes(rate_table, posterior)
get_pwt_rates_MrBayes(rate_table, posterior)
rate_table |
A data frame containing a single "value" column (for all rate values) and one column for the "clock" variable (indicating to which clock partition each rate values refers to), such as from the output of |
posterior |
A data frame of posterior parameter estimates including a "clockrate" column indicating the base of the clock rate estimate for each generation that will be used for pairwise t-tests. Such data frame can be imported using |
get_pwt_rates_MrBayes()
first transforms relative clock rates to absolute rate values for each node and each clock, by multiplying these by the mean posterior clock rate base value. Then, for each node and clock, a one-sample t-test is performed with the null hypothesis that the mean of the posterior clockrates is equal to that node and clock's absolute clock rate.
A long data frame with one row per node per clock and the following columns:
clade |
The name of the clade, taken from the "clade" column of |
nodes |
The node number, taken from the "node" column of |
clock |
The clock partition number |
relative.rate |
The relative mean clock rate per node, taken from the "rates" columns of |
absolute.rate(mean) |
The absolute mean clock rate per node; the relative clock rate multiplied by the mean of the posterior clock rates |
null |
The absolute clock rate used as the null value in the t-test |
p.value |
The p-value of the test comparing the mean ofthe posterior clockrates to each absolute clockrate |
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline # Load example rate table and posterior data sets data("RateTable_Means_3p_Clades") data("posterior3p") get_pwt_rates_MrBayes(RateTable_Means_3p_Clades, posterior3p)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline # Load example rate table and posterior data sets data("RateTable_Means_3p_Clades") data("posterior3p") get_pwt_rates_MrBayes(RateTable_Means_3p_Clades, posterior3p)
Computes silhouette widths index for several possible numbers of clusters(partitions) k
, which determines how well an object falls within their cluster compared to other clusters. The best number of clusters k
is the one with the highest silhouette width.
get_sil_widths(dist_mat, max.k = 10) ## S3 method for class 'sil_width_df' plot(x, ...)
get_sil_widths(dist_mat, max.k = 10) ## S3 method for class 'sil_width_df' plot(x, ...)
dist_mat |
A Gower distance matrix, the output of a call to |
max.k |
The maximum number of clusters(partitions) to search across. |
x |
A |
... |
Further arguments passed to |
get_sil_widths
calls cluster::pam
on the supplied Gower distance matrix with each number of clusters (partitions) up to max.k
and stores the average silhouette widths across the clustered characters. When plot = TRUE
, a plot of the sillhouette widths against the number of clusters is produced, though this can also be produced seperately on the resulting data frame using plot.sil_width_df()
. The number of clusters with the greatest silhouette width should be selected for use in the final clustering specification.
For get_sil_widths()
, it produces a data frame, inheriting from class "sil_width_df"
, with two columns: k
is the number of clusters, and sil_width
is the silhouette widths for each number of clusters. If plot = TRUE
, the output is returned invisibly.
For plot()
on a get_sil_widths()
object, it produces a ggplot
object that can be manipulated using ggplot2 syntax (e.g., to change the theme
or labels).
vignette("char-part")
for the use of this function as part of an analysis pipeline.
# See vignette("char-part") for how to use this # function as part of an analysis pipeline data("characters") #Reading example file as categorical data Dmatrix <- get_gower_dist(characters) #Get silhouette widths for k=7 sw <- get_sil_widths(Dmatrix, max.k = 7) sw plot(sw, color = "red", size =2)
# See vignette("char-part") for how to use this # function as part of an analysis pipeline data("characters") #Reading example file as categorical data Dmatrix <- get_gower_dist(characters) #Get silhouette widths for k=7 sw <- get_sil_widths(Dmatrix, max.k = 7) sw plot(sw, color = "red", size =2)
Determines cluster (partition) membership for phylogenetic morphological characters from the supplied Gower distance matrix and requested number of clusters using partitioning around medoids (PAM, or K-medoids). For further and independently testing the quality of the chosen partitioning scheme, users may also poduce graphic clustering (tSNEs), coloring data points according to PAM clusters, to verify PAM clustering results.
make_clusters(dist_mat, k, tsne = FALSE, tsne_dim = 2, tsne_theta = 0, ...) ## S3 method for class 'cluster_df' plot(x, seed = NA, nrow = 1, ...)
make_clusters(dist_mat, k, tsne = FALSE, tsne_dim = 2, tsne_theta = 0, ...) ## S3 method for class 'cluster_df' plot(x, seed = NA, nrow = 1, ...)
dist_mat |
A Gower distance matrix, the output of a call to |
k |
The desired number of clusters (or character partitions), the output from |
tsne |
Whether to perform Barnes-Hut t-distributed stochastic neighbor embedding (tSNE) to produce a multi-dimensional representation of the distance matrix using |
tsne_dim |
When |
tsne_theta |
When |
... |
For For |
x |
For |
seed |
For |
nrow |
For |
make_clusters
calls cluster::pam
on the supplied Gower distance matrix with the specified number of clusters to determine cluster membership for each character. PAM is analogous to K-means, but it has its clusters centered around medoids instead of centered around centroids, which are less prone to the impact from outliers and heterogeneous cluster sizes. PAM also has the advantage over k-means of utilizing Gower distance matrices instead of Euclidean distance matrices only.
When tsne = TRUE
, a Barnes-Hut t-distributed stochastic neighbor embedding is used to compute a multi-dimensional embedding of the distance matrix, coloring data points according to the PAM-defined clusters, as estimated by the function make_clusters
. This graphic clustering allows users to independently test the quality of the chosen partitioning scheme from PAM, and can help in visualizing the resulting clusters. Rtsne::Rtsne
is used to do this. The resulting dimensions will be included in the output; see Value below.
plot()
plots all morphological characters in a scatterplot with points colored based on cluster membership. When tsne = TRUE
in the call to make_clusters()
, the x- and y-axes will correspond to requested tSNE dimensions. With more than 2 dimensions, several plots will be produced, one for each pair of tSNE dimensions. These are displayed together using patchwork::plot_layout
. When tsne = FALSE
, the points will be arrange horizontally by cluster membership and randomly placed vertically.
A data frame, inheriting from class "cluster_df"
, with a row for each character with its number (character_number
) and cluster membership (cluster
). When tsne = TRUE
, additional columns will be included, one for each requested tSNE dimension, labeled tSNE_Dim1
, tSNE_Dim2
, etc., containing the values on the dimensions computed using Rtsne()
.
The pam
fit resulting from cluster::pam
is returned in the "pam.fit"
attribute of the outut object.
When using plot()
on a cluster_df
object, warnings may appear from ggrepel
saying something along the lines of "unlabeled data points (too many overlaps). Consider increasing max.overlaps". See ggrepel::geom_text_repel
for details; the max.overlaps
argument can be supplied to plot()
to increase the maximum number of element overlap in the plot. Alternatively, users can increase the size of the plot when exporting it, as it will increase the plot area and reduce the number of elements overlap. This warning can generally be ignored, though.
vignette("char-part")
for the use of this function as part of an analysis pipeline.
get_gower_dist
, get_sil_widths
, cluster_to_nexus
# See vignette("char-part") for how to use this # function as part of an analysis pipeline data("characters") # Reading example file as categorical data Dmatrix <- get_gower_dist(characters) sil_widths <- get_sil_widths(Dmatrix, max.k = 7) sil_widths # 3 clusters yields the highest silhouette width # Create clusters with PAM under k=3 partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Simple plot of clusters plot(cluster_df, seed = 12345) # Create clusters with PAM under k=3 partitions and perform # tSNE (3 dimensions; default is 2) cluster_df_tsne <- make_clusters(Dmatrix, k = 3, tsne = TRUE, tsne_dim = 2) # Plot clusters, plots divided into 2 rows, and increasing # overlap of text labels (default = 10) plot(cluster_df_tsne, nrow = 2, max.overlaps = 20)
# See vignette("char-part") for how to use this # function as part of an analysis pipeline data("characters") # Reading example file as categorical data Dmatrix <- get_gower_dist(characters) sil_widths <- get_sil_widths(Dmatrix, max.k = 7) sil_widths # 3 clusters yields the highest silhouette width # Create clusters with PAM under k=3 partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Simple plot of clusters plot(cluster_df, seed = 12345) # Create clusters with PAM under k=3 partitions and perform # tSNE (3 dimensions; default is 2) cluster_df_tsne <- make_clusters(Dmatrix, k = 3, tsne = TRUE, tsne_dim = 2) # Plot clusters, plots divided into 2 rows, and increasing # overlap of text labels (default = 10) plot(cluster_df_tsne, nrow = 2, max.overlaps = 20)
This method adds a dummy tip at the present (t = 0) to fully extinct trees with offsets, in order to have correct ages (otherwise the most recent tip is assumed to be at 0). This is a workaround to get the proper ages of the trees into other tools such as TreeAnnotator.
offset.to.dummy(trees.file, log.file, output.file = NULL, dummy.name = "dummy")
offset.to.dummy(trees.file, log.file, output.file = NULL, dummy.name = "dummy")
trees.file |
path to BEAST2 output file containing posterior trees |
log.file |
path to BEAST2 trace log file containing offset values |
output.file |
path to file to write converted trees. If |
dummy.name |
name of the added dummy tip, default |
NB: Any metadata present on the tips will be discarded. If you want to keep metadata (such as clock rate values),
use offset.to.dummy.metadata
instead.
list of converted trees (as treedata)
offset.to.dummy.metadata()
(slower version, keeping metadata)
# Convert trees with offset to trees with dummy tip trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo") log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo") converted_trees <- offset.to.dummy.metadata(trees_file, log_file) # Do something with the converted trees - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
# Convert trees with offset to trees with dummy tip trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo") log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo") converted_trees <- offset.to.dummy.metadata(trees_file, log_file) # Do something with the converted trees - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
This method adds a dummy tip at the present (t = 0) to fully extinct trees with offsets, in order to have correct ages (otherwise the most recent tip is assumed to be at 0). This is a workaround to get the proper ages of the trees into other tools such as TreeAnnotator.
offset.to.dummy.metadata( trees.file, log.file, output.file = NULL, dummy.name = "dummy" )
offset.to.dummy.metadata( trees.file, log.file, output.file = NULL, dummy.name = "dummy" )
trees.file |
path to BEAST2 output file containing posterior trees |
log.file |
path to BEAST2 trace log file containing offset values |
output.file |
path to file to write converted trees. If |
dummy.name |
name of the added dummy tip, default |
list of converted trees (as treedata)
offset.to.dummy()
(faster version discarding metadata)
# Convert trees with offset to trees with dummy tip trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo") log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo") converted_trees <- offset.to.dummy.metadata(trees_file, log_file) # Do something with the converted trees - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
# Convert trees with offset to trees with dummy tip trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo") log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo") converted_trees <- offset.to.dummy.metadata(trees_file, log_file) # Do something with the converted trees - for instance, calculate the MCC summary tree # Then remove the dummy tip from the MCC tree final_tree <- drop.dummy.beast(system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo"))
Plots The distribution and mean of background rates extracted from the posterior log files from Mr. Bayes or BEAST2, as well as the distribution of background rates if log transformed to test for normality of data distribution.
plot_back_rates(type = c("MrBayes", "BEAST2"), posterior, clock = 1, trans = c("none", "log", "log10"), size = 12, quantile = 0.95)
plot_back_rates(type = c("MrBayes", "BEAST2"), posterior, clock = 1, trans = c("none", "log", "log10"), size = 12, quantile = 0.95)
type |
Whether to use data output from "Mr.Bayes" or "BEAST2". |
posterior |
A data frame of posterior parameter estimates (log file). From Mr.Bayes, it includes a "clockrate" column indicating the mean (background) clock rate estimate for each generation that will be used for pairwise t-tests. Such data frame can be imported using |
clock |
The clock partition number to calculate selection mode. Ignored if only one clock is available. |
trans |
Type of data transformation to perform on background rates extracted from the posterior log file from Mr. Bayes or BEAST2. Options include "none" (if rates are normally distributed), natural log transformation "log", and log of base 10 transformation "log10". The necessity of using data transformation can be tested using the function |
size |
Font size for title of plot |
quantile |
Upper limit for X axis (passed on to 'xlim') to remove outliers from histogram. The quantile can be any value between "0" and "1", but values equal or above "0.95" provide good results in most cases in which the data distribution is right skewed. |
Plots The distribution and mean (red dotted line) of background rates extracted from the posterior log files from Mr. Bayes or BEAST2, as well as the distribution of background rates if log transformed. Background rates should be normally distributed for meeting the assumptions of t-tests and other tests passed on by downstream functions, including get_pwt_rates_MrBayes
, get_pwt_rates_BEAST2
, and plot_treerates_sgn
.
It produces a ggplot
object that can be manipulated using ggplot2 syntax (e.g., to change the theme
or labels).
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## MrBayes example # Load example tree and posterior data("posterior3p") P <- plot_back_rates (type = "MrBayes", posterior3p, clock = 1, trans = "log10", size = 10, quantile = 0.95) P
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## MrBayes example # Load example tree and posterior data("posterior3p") P <- plot_back_rates (type = "MrBayes", posterior3p, clock = 1, trans = "log10", size = 10, quantile = 0.95) P
Plots the summary Bayesian evolutionary tree with branches, according to user-defined thresholds (in units of standard deviations) used to infer the strength and mode of selection.
plot_treerates_sgn(type = c("MrBayes", "BEAST2"), tree, posterior, trans = c("none", "log", "log10"), summary = "mean", drop.dummyextant = TRUE, clock = 1, threshold = c("1 SD", "2 SD"), low = "blue", mid = "gray90", high = "red", branch_size = 2, tip_size = 2, xlim = NULL, nbreaks = 10, geo_size = list(2, 3), geo_skip = c("Quaternary", "Holocene", "Late Pleistocene"))
plot_treerates_sgn(type = c("MrBayes", "BEAST2"), tree, posterior, trans = c("none", "log", "log10"), summary = "mean", drop.dummyextant = TRUE, clock = 1, threshold = c("1 SD", "2 SD"), low = "blue", mid = "gray90", high = "red", branch_size = 2, tip_size = 2, xlim = NULL, nbreaks = 10, geo_size = list(2, 3), geo_skip = c("Quaternary", "Holocene", "Late Pleistocene"))
type |
Whether to use data output from "Mr.Bayes" or "BEAST2". |
tree |
A |
posterior |
A data frame of posterior parameter estimates (log file). From Mr.Bayes, it includes a "clockrate" column indicating the mean (background) clock rate estimate for each generation that will be used for pairwise t-tests. Such data frame can be imported using |
trans |
Type of data transformation to perform on background rates extracted from the posterior log file from Mr. Bayes or BEAST2. Options include "none" (if rates are normally distributed), natural log transformation "log", and log of base 10 transformation "log10". The necessity of using data transformation can be tested using the function |
summary |
Only when using Mr. Bayes trees. The rate summary stats chosen to calculate selection mode. Only rates "mean" and "median" are allowed. Default is "mean". |
drop.dummyextant |
|
clock |
The clock partition number to calculate selection mode. Ignored if only one clock is available. |
threshold |
A vector of threshold values. Default is to display thresholds of ±1 relative standard deviation (SD) of the relative posterior clock rates. Should be specified as a number of standard deviations (e.g., |
low , mid , high
|
Colors passed to |
branch_size |
The thickness of the lines that form the tree. |
tip_size |
The font size for the tips of the tree. |
xlim |
The x-axis limits. Should be two negative numbers (though the axis labels will be in absolute value, i.e., Ma). |
nbreaks |
The number of interval breaks in the geological timescale. |
geo_size |
The font size for the labels in the geological scale. The first value in |
geo_skip |
A vector of interval names indicating which intervals should not be labeled. Passed directly to the |
Plots the phylogentic tree contained in tree
using ggtree::ggtree
. Branches undergoing accelerating evolutionary rates (e.g., >"1 SD"
, "3 SD"
, or "5 SD"
relative to the background rate) for each morphological clock partition suggest directional (or positive) selection for that morphological partition in that branch of the tree. Branches undergoing decelerating evolutionary rates (e.g., <"1 SD"
, "3 SD"
, or "5 SD"
relative to the background rate) for each morphological clock partition suggest stabilizing selection for that morphological partition in that branch of the tree. For details on rationale, see Simões & Pierce (2021).
Please double check that the distribution of background rates (mean rates for the tree) sampled from the posterior follow the assumptions of a normal distribution (e.g., check for normality of distribution in Tracer). Otherwise, displayed results may not have a valid interpretation.
A ggtree
object, which inherits from ggplot
.
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
vignette("rates-selection")
for the use of this function as part of an analysis pipeline.
ggtree::ggtree
, deeptime::coord_geo
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## MrBayes example # Load example tree and posterior data("tree3p") data("posterior3p") plot_treerates_sgn( type = "MrBayes", tree3p, posterior3p, #MrBayes tree file with data for all partitions trans = "none", summary = "mean", #MrBayes specific argument drop.dummyextant = TRUE, #MrBayes specific argument clock = 1, #Show rates for clock partition 1 threshold = c("1 SD", "3 SD"), #sets background rate threshold for selection mode branch_size = 1.5, tip_size = 3, #sets size for tree elements xlim = c(-450, -260), nbreaks = 8, geo_size = list(3, 3)) #sets limits and breaks for geoscale ## Not run: ## BEAST2 example tree_clock1 <- system.file("extdata", "Penguins_MCC_morpho_part1", package = "EvoPhylo") tree_clock1 <- treeio::read.beast(tree_clock1) posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo") posterior <- read.table(posterior, header = TRUE) plot_treerates_sgn( type = "BEAST2", tree_clock1, posterior, #BEAST2 tree file with data for partition 1 trans = "log10", clock = 1, #Show rates for clock partition 1 threshold = c("1 SD", "3 SD"), #sets background rate threshold for selection mode branch_size = 1.5, tip_size = 3, #sets size for tree elements xlim = c(-70, 30), nbreaks = 8, geo_size = list(3, 3)) #sets limits and breaks for geoscale ## End(Not run)
# See vignette("rates-selection") for how to use this # function as part of an analysis pipeline ## MrBayes example # Load example tree and posterior data("tree3p") data("posterior3p") plot_treerates_sgn( type = "MrBayes", tree3p, posterior3p, #MrBayes tree file with data for all partitions trans = "none", summary = "mean", #MrBayes specific argument drop.dummyextant = TRUE, #MrBayes specific argument clock = 1, #Show rates for clock partition 1 threshold = c("1 SD", "3 SD"), #sets background rate threshold for selection mode branch_size = 1.5, tip_size = 3, #sets size for tree elements xlim = c(-450, -260), nbreaks = 8, geo_size = list(3, 3)) #sets limits and breaks for geoscale ## Not run: ## BEAST2 example tree_clock1 <- system.file("extdata", "Penguins_MCC_morpho_part1", package = "EvoPhylo") tree_clock1 <- treeio::read.beast(tree_clock1) posterior <- system.file("extdata", "Penguins_log.log", package = "EvoPhylo") posterior <- read.table(posterior, header = TRUE) plot_treerates_sgn( type = "BEAST2", tree_clock1, posterior, #BEAST2 tree file with data for partition 1 trans = "log10", clock = 1, #Show rates for clock partition 1 threshold = c("1 SD", "3 SD"), #sets background rate threshold for selection mode branch_size = 1.5, tip_size = 3, #sets size for tree elements xlim = c(-70, 30), nbreaks = 8, geo_size = list(3, 3)) #sets limits and breaks for geoscale ## End(Not run)
Multiple clock Bayesian phylogenetic tree, imported as an S4 class object using treeio::read.beast()
.
data("post_trees")
data("post_trees")
A tidytree
object.
Example tree file for function write.beast.treedata
.
write.beast.treedata
for using this file in context.
An example dataset of posterior parameter samples resulting from a clock-based Bayesian inference analysis using the skyline fossilized birth–death process (FBD) tree model with Mr. Bayes after combining all parameter (.p) files into a single data frame with the combine_log
function. This particular example was produced by analyzing the data set with a single morphological partition from Simões & Pierce (2021).
data("posterior1p")
data("posterior1p")
A data frame with 4000 observations on several variables estimated for each generation during analysis:
A numeric vector for the generation number
LnL
A numeric vector for the natural log likelihood of the cold chain
LnPr
A numeric vector for the natural log likelihood of the priors
TH
A numeric vector for the total tree height (sum of all branch durations, as chronological units)
TL
A numeric vector for total tree length (sum of all branch lengths, as accumulated substitutions/changes)
prop_ancfossil
A numeric vector indicating the proportion of fossils recovered as ancestors
sigma
A numeric vector for the standard deviation of the lognormal distribution governing how much rates vary across characters.
net_speciation_1
, net_speciation_2
, net_speciation_3
, net_speciation_4
A numeric vector for net speciation estimates for each time bin
relative_extinction_1
, relative_extinction_2
, relative_extinction_3
, relative_extinction_4
A numeric vector for relative extinction estimates for each time bin
relative_fossilization_1
, relative_fossilization_2
, relative_fossilization_3
, relative_fossilization_4
A numeric vector for relative fossilization estimates for each time bin
tk02var
A numeric vector for the variance on the base of the clock rate
clockrate
A numeric vector for the base of the clock rate
Datasets like this one can be produced from parameter log (.p) files using combine_log
. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the ones above, possibly also including alpha
, which contains the shape of the gamma distribution governing how much rates vary across characters. When using the traditional FBD model rather than the skyline FBD model used to produce this dataset, there will be only one column for each of net_speciation
, relative_extinction
and relative_fossilization
. When using more than one morphological partition, different columns may be present; see posterior3p
for an example with 3 partitions.
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
posterior3p
for an example dataset of posterior parameter samples resulting from an analysis with 3 partitions rather than 1.
An example dataset of posterior parameter samples resulting from a clock-based Bayesian inference analysis using the skyline fossilized birth–death process (FBD) tree model with Mr. Bayes after combining all parameter (.p) files into a single data frame with the combine_log
function. This particular example was produced by analyzing the data set with three morphological partitions from Simões & Pierce (2021).
data("posterior3p")
data("posterior3p")
A data frame with 4000 observations on several variables estimated for each generation during analysis. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the following:
Gen
A numeric vector for the generation number
LnL
A numeric vector for the natural log likelihood of the cold chain
LnPr
A numeric vector for the natural log likelihood of the priors
TH.all.
A numeric vector for the total tree height (sum of all branch durations, as chronological units)
TL.all.
A numeric vector for total tree length (sum of all branch lengths, as accumulated substitutions/changes)
prop_ancfossil.all.
A numeric vector indicating the proportion of fossils recovered as ancestors
sigma.1.
, sigma.2.
, sigma.3.
A numeric vector for the standard deviation of the lognormal distribution governing how much rates vary across characters for each data partition
m.1.
, m.2.
, m.3.
A numeric vector for the rate multiplier parameter for each data partition
net_speciation_1.all.
, net_speciation_2.all.
, net_speciation_3.all.
, net_speciation_4.all.
A numeric vector for net speciation estimates for each time bin
relative_extinction_1.all.
, relative_extinction_2.all.
, relative_extinction_3.all.
, relative_extinction_4.all.
A numeric vector for relative extinction estimates for each time bin
relative_fossilization_1.all.
, relative_fossilization_2.all.
, relative_fossilization_3.all.
, relative_fossilization_4.all.
A numeric vector for relative fossilization estimates for each time bin
tk02var.1.
, tk02var.2.
, tk02var.3.
A numeric vector for the variance on the base of the clock rate for each clock partition
clockrate.all.
A numeric vector for the base of the clock rate
Datasets like this one can be produced from parameter log (.p) files using combine_log
. The number of variables depends on parameter set up, but for clock analyses with Mr. Bayes, will typically include the ones above, possibly also including an alpha
for each partition, which contains the shape of the gamma distribution governing how much rates vary across characters (when shape of the distribution is unlinked across partitions). When using the traditional FBD model rather than the skyline FBD model used to produce this dataset, there will be only one column for each of net_speciation
, relative_extinction
and relative_fossilization
. When using a single morphological partition, different columns may be present; see posterior1p
for an example with just one partition.
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
posterior1p
for an example dataset of posterior parameter samples resulting from an analysis with 1 partition rather than 3.
A data set containing the mean clock rates for a tree with 1 clock partition, such as the output of get_clockrate_table_MrBayes
but with an additional "clade" column added, which is required for use in clockrate_summary
and clockrate_dens_plot
.
data("RateTable_Means_1p_Clades")
data("RateTable_Means_1p_Clades")
A data frame with 79 observations on the following 3 variables.
clade
A character vector containing the clade names for each corresponding node
nodes
A numeric vector for the node numbers in the summary tree
rates
A numeric vector containing the mean posterior clock rate for each node
RateTable_Means_1p_Clades
was created by running get_clockrate_table_MrBayes(tree1p)
and then adding a "clade" column. It can be produced by using the following procedure:
1) Import tree file:
data("tree1p")
2) Produce clock rate table with, for instance, mean rate values from each branch in the tree:
rate_table <- get_clockrate_table_MrBayes(tree1p, summary = "mean")
write.csv(rate_table, file = "rate_table.csv", row.names = FALSE)
3) Now, manually add clades using, e.g., Excel:
3.1) Manually edit rate_table.csv, adding a "clade" column. This introduces customized clade names to individual nodes in the tree.
3.2) Save the edited rate table with a different name to differentiate from the original output (e.g., rate_table_clades_means.csv).
4) Read the file back in:
RateTable_Means_1p_Clades <- read.csv("rate_table_clades_means.csv")
head(RateTable_Means_1p_Clades)
tree1p
for the tree from which the clock rates were extracted.
get_clockrate_table_MrBayes
for extracting a clock rate table from a tree.
clockrate_summary
, clockrate_dens_plot
, and clockrate_reg_plot
for examples of using a clockrate table.
A data set containing the mean clock rates for a tree with 3 clock partitions, such as the output of get_clockrate_table_MrBayes
but with an additional "clade" column added, which is required for use in clockrate_summary
and clockrate_dens_plot
.
data("RateTable_Means_3p_Clades")
data("RateTable_Means_3p_Clades")
A data frame with 79 observations on the following 5 variables.
clade
A character vector containing the clade names for each corresponding node
nodes
A numeric vector for the node numbers in the summary tree
rates1
A numeric vector containing the mean posterior clock rate for each node for the first partition
rates2
A numeric vector containing the mean posterior clock rate for each node for the second partition
rates3
A numeric vector containing the mean posterior clock rate for each node for the third partition
RateTable_Means_3p_Clades
was created by running get_clockrate_table_MrBayes(tree3p)
and then adding a "clade" column. It can be produced by using the following procedure:
1) Import tree file:
data("tree3p")
2) Produce clock rate table with, for instance, mean rate values from each branch in the tree:
rate_table <- get_clockrate_table_MrBayes(tree3p, summary = "mean")
write.csv(rate_table, file = "rate_table.csv", row.names = FALSE)
3) Now, manually add clades using, e.g., Excel:
3.1) Manually edit rate_table.csv, adding a "clade" column. This introduces customized clade names to individual nodes in the tree.
3.2) Save the edited rate table with a different name to differentiate from the original output (e.g., rate_table_clades_means.csv).
4) Read the file back in:
RateTable_Means_3p_Clades <- read.csv("rate_table_clades_means.csv")
head(RateTable_Means_3p_Clades)
tree3p
for the tree from which the clock rates were extracted.
get_clockrate_table_MrBayes
for extracting a clock rate table from a tree.
clockrate_summary
, clockrate_dens_plot
, and clockrate_reg_plot
for examples of using a clockrate table.
A clock Bayesian phylogenetic tree with clock rates from a single clock partition (partition 1 here), imported as an S4 class object using treeio::read.beast()
.
data("tree_clock1")
data("tree_clock1")
A tidytree
object.
This example tree file was produced by analyzing the data set with a single morphological partition from
tree_clock2
for another BEAST2 tree object with clock rates from partition 2 for this same dataset.
tree3p
for another tree object with 3 clock partitions from Mr.Bayes.
tree1p
for another tree object with a single clock from Mr.Bayes.
get_clockrate_table_BEAST2
for extratcing the poserior clock rates from BEAST2 tree objects.
A clock Bayesian phylogenetic tree with clock rates from a single clock partition (partition 2 here), imported as an S4 class object using treeio::read.beast()
.
data("tree_clock2")
data("tree_clock2")
A tidytree
object.
This example tree file was produced by analyzing the data set with a single morphological partition from
tree_clock1
for another BEAST2 tree object with clock rates from partition 1 for this same dataset.
tree3p
for another tree object with 3 clock partitions from Mr.Bayes.
tree1p
for another tree object with a single clock from Mr.Bayes.
get_clockrate_table_BEAST2
for extratcing the poserior clock rates from BEAST2 tree objects.
A clock Bayesian phylogenetic tree, imported as an S4 class object using treeio::read.mrbayes()
.
data("tree1p")
data("tree1p")
A tidytree
object.
This example tree file was produced by analyzing the data set with a single morphological partition from Simões & Pierce (2021).
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
tree3p
for another tree object with 3 clock partitions.
get_clockrate_table_MrBayes
for extratcing the poserior clockrates from a tree object.
A clock Bayesian phylogenetic tree, imported as an S4 class object using treeio::read.mrbayes()
.
data("tree3p")
data("tree3p")
A tidytree
object.
This example tree file was produced by analyzing the data set with 3 morphological clock partitions from Simões & Pierce (2021).
Simões, T. R. and S. E. Pierce (2021). Sustained High Rates of Morphological Evolution During the Rise of Tetrapods. Nature Ecology & Evolution 5: 1403–1414.
tree1p
for another tree object with a single clock partition.
get_clockrate_table_MrBayes
for extratcing the poserior clockrates from a tree object.
Write character partitions as separate Nexus files (for use in BEAUti)
write_partitioned_alignments(x, cluster_df, file)
write_partitioned_alignments(x, cluster_df, file)
x |
character data matrix as Nexus file (.nex) or data frame (with taxa as rows and characters as columns) read directly from local directory |
cluster_df |
cluster partitions as outputted by |
file |
path to save the alignments. If |
no return value
# Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Find optimal partitioning scheme using PAM under k=3 partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Write to Nexus files ## Not run: write_partitioned_alignments(characters, cluster_df, "example.nex")
# Load example phylogenetic data matrix data("characters") # Create distance matrix Dmatrix <- get_gower_dist(characters) # Find optimal partitioning scheme using PAM under k=3 partitions cluster_df <- make_clusters(Dmatrix, k = 3) # Write to Nexus files ## Not run: write_partitioned_alignments(characters, cluster_df, "example.nex")
This function was adopted and modified from treeio::write.beast to export a list of trees instead of a single tree.
write.beast.treedata(treedata, file = "", translate = TRUE, tree.name = "STATE")
write.beast.treedata(treedata, file = "", translate = TRUE, tree.name = "STATE")
treedata |
An S4 class object of type |
file |
Output file. If |
translate |
Whether to translate taxa labels. |
tree.name |
Name of the trees, default |
Writes object type treedata
containing multiple trees to a file or file content on screen
#Load file with multiple trees ## Not run: trees_file = system.file("extdata", "ex_offset.trees", package = "EvoPhylo") posterior_trees_offset = treeio::read.beast(trees_file) #Write multiple trees to screen write.beast.treedata(posterior_trees_offset) ## End(Not run)
#Load file with multiple trees ## Not run: trees_file = system.file("extdata", "ex_offset.trees", package = "EvoPhylo") posterior_trees_offset = treeio::read.beast(trees_file) #Write multiple trees to screen write.beast.treedata(posterior_trees_offset) ## End(Not run)