
iDigBio Portal
I recently went north, to Yale University, for the third annual Digital Data Biodiversity Research Conference, sponsored by iDigBio, the NSF-funded project to digitize natural history specimens. I attended the first of these conferences two years ago at the University of Michigan (see earlier post). Both were fascinating and informative, but they differed from each other in that the field's focus has moved beyond digitizing collections to using digitized collections. This seems a healthy trend, but as Katherine LeVan of the National Ecological Observatory Network (NEON) mentioned, only 6% of insect collections have been even partially digitized, and Anna Monfils of Central Michigan University noted that iDigBio has information from only 624 of the 1,600 natural history collections in the United States. Admittedly, it's mostly small collections that aren't represented, but Monfils went on to show that smaller collections hold larger than expected numbers of local specimens, providing finer-grained information on biodiversity.
Despite the caveat about coverage, the results of the NSF funding are impressive and are leading to an explosion in the use of these data. It is difficult to keep up with the number of publications employing herbarium specimens as sources of information for studies on phenological change, tracking invasive species, and monitoring herbivore damage. While the earlier conference included sessions on using data for niche modeling, the meeting at Yale also had presentations on how to integrate such data with other kinds of information. Integration was definitely a major theme, and two large-scale projects are front and center in this work. Nico Franz of Arizona State University is a principal investigator in NEON, a massive NSF-funded project that includes 22 observatories collecting ecological data, including specimens, and then using those data in studies on environmental change. Franz noted that while other projects might collect data over short periods of time, NEON plans for the long term and for building strong communities that share and use its data.
Another large-scale project, headed by Yale professor Walter Jetz, is called Map of Life (MOL). Here again, integration is central to this endeavor, which invites researchers to upload their biodiversity data and also to take advantage of the wealth of data and tools available through its portal. As the name implies, biogeography is an important focus: users can search for species distribution maps and create species lists for particular areas. As with many digital projects, this one still has a long way to go in living up to its name, which implies a much broader species representation than is now available. In a session led by MOL developers, it became clear that the question of how different kinds of data can be integrated is still extremely fraught. Even databases for different groups of organisms, vertebrates versus invertebrates for example, are difficult to integrate because important data fields are not consistent: what is essential in one field might not be noteworthy at all in another, or might be handled in a different way. Progress is being made, but as Roderick Page of the University of Glasgow notes, even linking to the scientific literature is hardly a trivial task, to say nothing of more sophisticated linking.
While this may seem discouraging, there were also many bright points in the presentations. The massive Global Biodiversity Information Facility (GBIF) has, as I write, 1,330,535,865 occurrence records, that is, data on specimens and observations. Last year, GBIF launched an impressive new website, and it often adds new features. While the tools available through GBIF are not as sophisticated as those of some other portals, it is still an incredible resource, since iDigBio data are fed into GBIF along with data from projects around the world. For example, data from the A.C. Moore Herbarium at the University of South Carolina, Columbia, where I volunteer, which were fed into SERNEC and iDigBio, are now also available in GBIF, so researchers worldwide can access data on this collection, which is particularly rich in South Carolina plants. This was not an easy undertaking—nothing in the digital world is—and it's important to always keep that in mind as developers have flights of fancy about what could be possible in the future.
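For readers curious what "access" looks like in practice, here is a minimal Python sketch of querying GBIF's public occurrence-search API, which requires no login. The endpoint and parameter names (`scientificName`, `country`, `basisOfRecord`) are GBIF's, but the species query and the sample JSON fragment below are purely illustrative, not real GBIF output:

```python
import json
from urllib.parse import urlencode

# GBIF's public occurrence-search endpoint (no API key required).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def build_query(scientific_name, country=None, limit=20):
    """Build a GBIF occurrence-search URL for a species name,
    optionally restricted to a two-letter ISO country code."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country:
        params["country"] = country
    return GBIF_SEARCH + "?" + urlencode(params)

url = build_query("Gelsemium sempervirens", country="US")

# An illustrative fragment of the JSON shape the API returns:
# a total "count" plus a page of "results". Records carry a
# basisOfRecord field distinguishing herbarium specimens from
# field observations.
sample_response = json.loads("""
{
  "count": 2,
  "results": [
    {"scientificName": "Gelsemium sempervirens (L.) J.St.-Hil.",
     "country": "United States of America",
     "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"scientificName": "Gelsemium sempervirens (L.) J.St.-Hil.",
     "country": "United States of America",
     "basisOfRecord": "HUMAN_OBSERVATION"}
  ]
}
""")

# Keep only records backed by physical specimens, e.g. herbarium sheets.
specimens = [r for r in sample_response["results"]
             if r["basisOfRecord"] == "PRESERVED_SPECIMEN"]
```

Filtering on `basisOfRecord` is how a researcher would separate digitized specimen records, like those flowing in from SERNEC and iDigBio, from citizen-science observations aggregated into the same index.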
Another conference highlight for me involved the use of sophisticated neural network software, such as that coming out of the Center for Brain Science at Harvard University. James Hanken, Professor of Zoology and Director of the Museum of Comparative Zoology at Harvard, reported on a project to scan slides of embryological sections and then use the neural network software to create 3-D reconstructions of the embryos. Caroline Strömberg of the University of Washington discussed a project to build a 3-D index of shapes for phytoliths, microfossils from grass leaves that can be more accurate for identifying species than pollen grains. Her lab has studied 200 species and has quantified their 3-D shapes, even printing them in 3-D to literally get a feel for them. They used this information in a study of phytoliths from a dinosaur digestive tract, suggesting that grasses are older than previously thought. Others have questioned these results, so Strömberg's group is now refining the identification process, measuring more points on the phytolith surface. Reporting on another paleontological study, Rose Aubery of the University of Illinois described image analysis done with Surangi W. Punyasena on plant fossil cuticle specimens to obtain taxonomic information about ancient ecosystems. What all these presentations had in common was the use of massive computational power to analyze 3-D images. At the first conference, reports of 3-D imaging were impressive, but now it is the analysis that has taken center stage. This is a good sign: all that data is proving valuable.