This is the last in a series of posts [1,2,3] on the plant systematist Vicky Funk and her recent review article on collections-based research. Since Funk is a research scientist and curator in the National Museum of Natural History’s (NMNH) Botany Department, it isn’t surprising that she begins a section on the future use of collections with stats on herbaria. The NMNH, part of the Smithsonian Institution, is home to the U.S. National Herbarium, with a collection of over five million specimens. The goal there and at many herbaria is to digitize the data for all specimens and in some cases to also image them. If this could be done at every herbarium, the data would serve as a potent research tool not only for taxonomists but for ecologists, conservationists, and researchers in other fields who never before considered using the information about plants available in herbaria.
One burgeoning field based on the availability of digital specimen images is the use of computer vision and machine learning techniques that make automated plant identification possible. It is a sort of face recognition for plants and is developing to the point that herbarium specimens can be sorted rather well, though the processes are hardly at the point where identification is as good as that done by taxonomists. However, machine sorting could be employed as a way to narrow down the number of specimens a researcher would have to look at in hunting for new species. In one recent report, the computer was able to distinguish between moss groups better than the human eye could.
Funk cites several successful digitization projects, noting that the Atlas of Living Australia is a particularly comprehensive one that has resulted in online access to all records of Australian plant specimens held in the country’s national herbaria. Australia is also at the forefront in developing software tools to help researchers extract as much information as possible in the most effective ways. However, Funk sees the future as going beyond national or even regional databases: “A Central Portal so all resources are available to everyone is critical. It is particularly important that these efforts are making the data and images available to researchers in the countries where the specimens were collected, thereby supporting research in those countries” (p. 185). She is referring to the fact that the bulk of specimens collected in developing countries, particularly during their colonial pasts, are held in European and North American herbaria. A first attempt to make these specimens broadly available was the Andrew W. Mellon Foundation’s funding of type specimen digitization, with the results now accessible through JSTOR Global Plants along with a great deal of supporting botanical literature.
But what Funk visualizes is something more comprehensive, and as an example, she describes a project funded by the Powell Center of the US Geological Survey. It focuses on the approximately 2500 species of North American Compositae (Asteraceae) and the location data on hundreds of thousands of specimens aggregated from GBIF (which includes information from institutions outside the US), BISON (from US government institutions) and iDigBio (US private institutions). Funk notes that this data is not only aggregated but “cleaned” to make sure it is of high quality, an issue that critics of aggregation emphasize. The data is then integrated with environmental and geophysical data on geochemistry, climate, topography, etc., as well as phylogenetics, including gene sequences from GenBank. Think of the power of this: linking specimens with sequence and environmental data. This is truly a harbinger of a new age in collections-based research. It is amazing that ten years ago, just digitizing data and imaging specimens was considered a feat, with the Paris Herbarium’s plan to digitize most of its specimens considered daring. Now the assembly line method they used has become relatively common, and other large herbaria have substantial percentages of their collections digitized and imaged.
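To give a feel for what aggregating and “cleaning” records can involve, here is a minimal Python sketch. It is not the Powell Center project’s actual procedure, which Funk does not detail. The `Occurrence` fields, the `has_valid_coordinates` and `clean_and_merge` helpers, the two cleaning rules (discard unusable coordinates, drop the same specimen when two aggregators both supply it), and the sample records are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Occurrence:
    """One specimen record; the field names are hypothetical."""
    source: str               # aggregator that supplied the record
    institution: str          # herbarium code
    catalog_number: str
    species: str
    lat: Optional[float]
    lon: Optional[float]


def has_valid_coordinates(rec: Occurrence) -> bool:
    """Reject missing, out-of-range, or placeholder (0, 0) coordinates."""
    if rec.lat is None or rec.lon is None:
        return False
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return False
    return not (rec.lat == 0 and rec.lon == 0)


def clean_and_merge(*sources: Iterable[Occurrence]) -> List[Occurrence]:
    """Merge records from several aggregators, keeping the first copy of each specimen."""
    seen = set()
    merged = []
    for records in sources:
        for rec in records:
            if not has_valid_coordinates(rec):
                continue
            # Assumed identity rule: same herbarium code + same catalog number.
            key = (rec.institution.strip().upper(), rec.catalog_number.strip())
            if key in seen:
                continue
            seen.add(key)
            merged.append(rec)
    return merged


# Made-up sample records, for illustration only.
gbif = [
    Occurrence("GBIF", "US", "123", "Solidago altissima", 38.9, -77.0),
    Occurrence("GBIF", "P", "456", "Solidago canadensis", 0.0, 0.0),
]
idigbio = [
    Occurrence("iDigBio", "us", "123", "Solidago altissima", 38.9, -77.0),
    Occurrence("iDigBio", "NY", "789", "Solidago rugosa", None, None),
    Occurrence("iDigBio", "NY", "790", "Solidago rugosa", 40.9, -73.9),
]

cleaned = clean_and_merge(gbif, idigbio)
for rec in cleaned:
    print(rec.source, rec.institution, rec.catalog_number, rec.species)
```

Of the five sample records, two survive: the duplicate of specimen 123 and the two records without usable coordinates are dropped. Real cleaning would also have to reconcile taxonomic names and check localities against known ranges, which this sketch does not attempt.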
Linking natural history collections to genetic data banks means uniting the two great arms of bioinformatics. It is a biologist’s dream come true, and this connection will become even more powerful when environmental data is brought into the mix, a much more complex process. But Funk has seen the digital world burgeon and has been one of the forces behind making it applicable to systematics. She has also helped make systematics valuable to other fields such as phylogenetics and the growing discipline of phylogenomics, which involves sequencing and comparing entire genomes. This is the result of new sequencing techniques that utilize fragmented DNA, just the type available in herbarium specimens. Drawing on an example from the Asteraceae, Funk cites a study in which the entire genomes of 93 of 95 Solidago (goldenrod) herbarium specimens were sequenced, with the specimens ranging in age from 5 to 45 years (Beck & Semple, 2015).
In closing, Funk notes: “One exciting trend is the developing field of Integrative Systematics where collections-based systematics is combined with extensive field studies, phylogenetics, phylogenomics, detailed morphological studies, biogeographic inferences and diversification analysis to present a more comprehensive global” (p. 187). She also argues for the maintenance of collections in educational institutions to ensure the instruction of future generations of systematists; the digitization of cleared leaf slides, anatomy slides, pollen images, chromosome count images, and illustrations to fill out the information available to researchers; and finally a series of symposia on the Tree of Life where systematists can map out a research agenda for the rest of the 21st century.
References
Beck, J. B., & Semple, J. C. (2015). Next-Generation Sampling: Pairing Genomics with Herbarium Specimens Provides Species-Level Signal in Solidago (Asteraceae). Applications in Plant Sciences, 3(6), 1500014.