Detailed, accurate evolutionary trees that reveal the relatedness of living things can now be determined much faster and for thousands of species with a computing method developed by computer scientists and a biologist at The University of Texas at Austin.
They report their new method in the journal Science.
Since Charles Darwin, biologists have constructed evolutionary trees to explain the relatedness of plants, animals and other organisms. The science of figuring out these trees, known as systematics, has progressed significantly in the last two decades largely due to advances in computation, genetics and molecular biology.
However, many of the relationships among the world's 1.5 million described species (the true number could be 10 million or more) have yet to be worked out, and surprises are still possible. Working out these relationships requires analyzing large amounts of molecular data, such as DNA and protein sequences.
Computer scientist Tandy Warnow, biologist Randy Linder and their graduate students have created an automated computing method, called SATé, that can analyze these molecular data from thousands of organisms, simultaneously figuring out how the sequences should be organized and computing their evolutionary relatedness in as little as 24 hours.
Previous methods that, like Warnow and Linder's, align sequences and build trees simultaneously have been limited to analyzing 20 species or fewer and have taken months to complete.
"SATé could completely change the practice of making evolutionary trees and revolutionize our understanding of evolution," says Warnow, professor of computer science and lead author of the study.
In addition, SATé can accurately analyze DNA sequences that are rapidly evolving. Researchers have previously avoided such sequences out of concern that the resulting trees would be inaccurate.
Before a tree, or phylogeny, can be determined, DNA and protein sequence data must be organized. This process is called alignment. Key to Warnow and Linder's program is its ability to quickly and accurately align these data.
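To make the idea of alignment concrete, here is a minimal Python sketch that aligns just two sequences with the classic Needleman-Wunsch dynamic-programming algorithm. It is a textbook illustration only, not SATé's method, which aligns thousands of sequences at once. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned_a, aligned_b, score).

    The scoring values are illustrative assumptions, not taken from SATé.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell holds the best score for the two prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


if __name__ == "__main__":
    top, bottom, total = needleman_wunsch("GATTACA", "GATACA")
    print(top)
    print(bottom)
    print("score:", total)
```

The dashes in the output mark positions where one sequence is assumed to have gained or lost a letter relative to the other. Several alignments can tie for the best score, which hints at why the choice of alignment matters for the tree that follows.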
"Our process is novel because it rapidly and simultaneously aligns sequences and looks for the best phylogenies," says Linder, associate professor of integrative biology. "The old way of doing this for a large number of sequences was basically to align the data once, but we can look at many arrangements to find better ones."
This is important because different alignments can lead to significantly different phylogenies, and scientists must find the phylogeny that best represents the evolutionary relationships among the species in question.
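A toy Python sketch shows how two alignments of the same three sequences can point to different relationships. It uses a simple proportion-of-differences distance and asks which pair of sequences looks most similar, which is the pair a basic distance-based tree method would group first. The sequences, the function names and the distance measure are all assumptions made up for illustration; the sketch makes no claim about how SATé scores alignments or trees.

```python
from itertools import combinations


def p_distance(x, y):
    """Fraction of differing letters among columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("the two rows share no ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def closest_pair(alignment):
    """Return the pair of names whose aligned rows have the smallest distance."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


# The same three made-up sequences (AACG, ACG, AATG), aligned two ways.
# Only the position of the gap in row B differs.
alignment_1 = {"A": "AACG", "B": "-ACG", "C": "AATG"}
alignment_2 = {"A": "AACG", "B": "ACG-", "C": "AATG"}

if __name__ == "__main__":
    print(closest_pair(alignment_1))
    print(closest_pair(alignment_2))
```

With the gap at the start of row B, sequences A and B look identical wherever they overlap, so they pair up. With the gap at the end, A and B disagree in two of three shared columns, and A pairs with C instead. Moving a single gap is enough to change the grouping.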
For their paper, Warnow, Linder and their students tested SATé using computer-generated data and real biological data. The biological data had been previously aligned manually by other experts.
The new phylogenies closely match the existing ones, validating both the method's potential and, in some cases, the existing evolutionary trees themselves.
"Instead of doing things by hand, evolutionary biologists can now trust our automated program," says Warnow. "It will enable the creation of much more accurate trees, especially for the Tree of Life, which deals with hundreds of thousands of gene sequences from the millions of species on Earth."
"Warnow and Linder have created a method that speeds up the process and removes any subjectivity," says Michael Braun, an evolutionary biologist at the Smithsonian Institution not associated with this project. "This is a major step forward for evolutionary biology."
Computer science graduate student Kevin Liu is first author on the paper. Students Sindhu Raghavan and Serita Nelesen also contributed to the project and co-authored the paper.