Parks, Matthew; Cronn, Richard; Liston, Aaron.
Separating the Wheat from the Chaff: Mitigating the Effect of Noisy Data in Plastome Phylogenomic Analyses.
Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating "noisy" data with misleading phylogenetic signal has also increased, and such data may disproportionately influence the topology of weakly supported nodes and lineages with rapid radiations and/or elevated rates of evolution.
We investigated the influence of phylogenetic noise on large data sets using plastome sequences from 102 species of Pinus and six Pinaceae outgroups. Nucleotide sites in our 142 kbp alignment were ranked by variability and serially removed in 100 bp partitions. Maximum likelihood topologies were determined for each alignment partition (minus removed variable sites) and for the corresponding cumulative partition of removed variable sites. Topologies were compared using the Robinson-Foulds metric to determine the point of topological consistency, predicted to be the point at which "noisy" data have largely been removed from the alignment. In our alignment, this coincided with the removal of the most variable 3.8% of sites (5.5 kbp). Nonetheless, tree-wide bootstrap support remained high (median value >99%) until removal of the most variable 6% (8.7 kbp) of sites, suggesting that phylogenetic noise did not impact overall nodal support.
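The site-removal procedure described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how variability was measured, so the pairwise-mismatch score used here is an assumption, and the names `site_variability`, `rank_sites`, `serial_partitions`, and `robinson_foulds` are hypothetical. Maximum likelihood tree inference is omitted; the Robinson-Foulds helper simply assumes each tree has already been reduced to a set of bipartitions.

```python
from collections import Counter


def site_variability(column):
    """Fraction of sequence pairs that differ at one site.

    Assumed measure for illustration only; gaps and ambiguity codes are ignored.
    """
    counts = Counter(c for c in column.upper() if c in "ACGT")
    n = sum(counts.values())
    if n < 2:
        return 0.0
    same_pairs = sum(k * (k - 1) // 2 for k in counts.values())
    total_pairs = n * (n - 1) // 2
    return 1.0 - same_pairs / total_pairs


def rank_sites(alignment):
    """Return column indices ordered from most to least variable."""
    length = len(alignment[0])
    scores = [
        site_variability("".join(seq[i] for seq in alignment))
        for i in range(length)
    ]
    # Ties are broken by column position so the ranking is deterministic.
    return sorted(range(length), key=lambda i: (-scores[i], i))


def serial_partitions(alignment, step=100):
    """Yield (retained, removed) alignments for each cumulative removal step.

    At each step the `step` next most variable sites move from the retained
    alignment into the cumulative alignment of removed sites.
    """
    order = rank_sites(alignment)
    for cut in range(step, len(order), step):
        removed = sorted(order[:cut])
        kept = sorted(order[cut:])
        yield (
            ["".join(seq[i] for i in kept) for seq in alignment],
            ["".join(seq[i] for i in removed) for seq in alignment],
        )


def robinson_foulds(splits_a, splits_b):
    """Robinson-Foulds distance: bipartitions found in exactly one tree."""
    return len(set(splits_a) ^ set(splits_b))
```

In a full analysis, each retained and removed alignment would be passed to a maximum likelihood program, and the resulting pairs of topologies compared with the Robinson-Foulds distance to locate the removal step at which they stabilize.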
However, closer investigation of two taxa with historically unresolved phylogenetic positions (the four species of subsection Contortae and the morphologically distinctive, flat-needled Pinus krempfii) revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for Pinus krempfii peaked as noisy sites were removed, subsection Contortae resolved most strongly when all sites were included. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in our plastome analysis are consistent for Pinus krempfii but inconsistent for subsection Contortae. This indicates that removal of noisy sites can not only increase resolution for poorly supported nodes but also serve as a tool for identifying highly supported but likely incorrect topologies.
1 - Oregon State University, Department of Botany & Plant Pathology, Cordley Hall, Corvallis, OR, 97331-2092, USA
2 - USDA Forest Service, 3200 SW Jefferson Way, Corvallis, OR, 97330, USA
3 - Oregon State University, Department of Botany & Plant Pathology, 2082 Cordley Hall, Corvallis, OR, 97331-2902, USA
Presentation Type: Oral Paper: Papers for Sections
Location: Lindell C/Chase Park Plaza
Date: Wednesday, July 13th, 2011
Time: 10:45 AM