This series of posts is looking at articles that have appeared in Taxon and deal with broader issues than the journal’s main fare of taxonomic treatments. A timely article appeared a few months ago measuring how effective digital specimen images are in taxonomic research (Phang et al., 2022). This study grew out of the COVID pandemic, when access to collections was almost nonexistent in many parts of the world. The authors were working on the genus Madhuca (Sapotaceae) for the Flora of Singapore. Two were based in Scotland and one in Singapore, but all had the same access problem. In this report they evaluated images of Madhuca collections from both Singapore and the adjacent Malaysian state of Johore. The images were found in JSTOR Global Plants and a number of herbarium databases. Another major source was the Royal Botanic Garden Edinburgh’s Sapotaceae Resource Centre (SRC) database, which also stores specimen and field images taken by researchers, often of material not otherwise available online.
The overall result of the study was that while specimen images were valuable research tools, they could not provide all the information needed for a thorough taxonomic analysis. In many cases, micro-morphological characters could not be seen clearly in digital images, even at high resolution, and these are precisely the characters often needed in defining the boundaries among species. To provide quantitative results, the researchers rated each image as having high, medium, or low utility. For the high ranking, an image needed to capture at least 5 qualitative and 3 quantitative macromorphological characters. Medium needed to meet the first criterion but not the second, and low had to have 4 qualitative characters.
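For readers who think in code, the rating scheme can be sketched in a few lines of Python. This is only my reading of the criteria as summarized here, not the authors’ own procedure. The function name is hypothetical, and lumping every image that falls short of 5 qualitative characters into “low” is my assumption.

```python
def rate_image(qualitative: int, quantitative: int) -> str:
    """Rate a specimen image by the number of characters it captures.

    Thresholds follow the summary in this post. Treating every image
    that misses the qualitative threshold as "low" is an assumption.
    """
    meets_qualitative = qualitative >= 5    # first criterion
    meets_quantitative = quantitative >= 3  # second criterion
    if meets_qualitative and meets_quantitative:
        return "high"
    if meets_qualitative:
        return "medium"
    return "low"
```

Under this reading, an image showing 6 qualitative characters but only 1 measurable one would count as medium, which is the situation a missing scale bar tends to produce.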
The report provides an in-depth analysis of the results, which I’ll just briefly recap here. Not surprisingly, the specimen images found in herbarium databases ranked more highly than those in the SRC that were taken by researchers. The problem wasn’t always image quality; often it was the absence of a ruler tool, like the one found in JSTOR Global Plants, or at least a measurement bar as a standard. The authors also reported: “Of the 219 specimen images examined, 125 (comprising 103 researcher images and 22 institutional images) had macromorphological characteristics hidden from view due to the low resolution of the image, the way the specimen had been mounted onto the herbarium sheet or had portions placed in an unopened capsule on the sheet” (p. 1068). Herbarium databases varied in terms of the image resolution available. Better quality images could probably be obtained by contacting the institution, but this often wasn’t possible during COVID and, in any case, would add steps to the taxonomist’s work.
Other findings were that fruit and seed measurements were difficult for all images, with very few fruiting specimens available. This was in part because there were usually only a few specimens for each species under study, a reminder of the crying need for continued collection, particularly in biodiverse areas with many species having either small populations or limited ranges. Overall, the taxonomists were able to identify only 22% of the Madhuca species from researcher images. That number rose to 34% with institutional ones, and to 94% with physical examination of the specimens when the Singapore herbarium was again accessible. This last figure resulted not only from microscopic examination of specimens, but also from being able to closely examine flowers and fruits and open fragment packets. The major message of the study is that online resources are very valuable for taxonomic investigations, but don’t come close to replacing specimens themselves.
It’s important to remember that there are many uses for online collections that don’t require such close study. Virtual access is sufficient for many purposes, especially when the access is through an information-rich database that’s easy to use. Usability was the focus of a post on the Natural Sciences Collections Association website written by Teagan Reinert and Karen Bacon of the National University of Ireland, Galway. It is a brief but valuable recap of what determines a database’s rating anywhere from “very easy” to “usable but frustrating.” It articulates what many of us experience subliminally as we search for specimens.
To take the frustrating end of the spectrum first, there are sites that have long loading times or low-quality images, return many irrelevant results, or “just don’t work.” Sometimes a keyword search is handy, but the advanced search should be easy to find, and it’s great if searches by date range or cultivated species are easy to do. Databases like those of New York Botanical Garden or the Royal Botanic Garden Edinburgh are given high marks because all the basic information on a specimen is shown without having to click further or open several screens. Having to click through is particularly cumbersome if many specimens need to be accessed. As for images, good quality is definitely a plus; also useful is an easy way to tell if there are differently sized images available. For each image it should be clear what the license status is, such as public domain or a Creative Commons license. I find this very helpful, as is the last suggestion in the post: “How the image or specimen data should be cited should be stated very clearly on the website either on its own easily accessed or clear labelled page, or on the specimen’s landing page. . . . But that information can sometimes be hidden in Frequently Asked Questions or on the bottom of a page that isn’t entirely relevant.” Amen.
Phang, A., Atkins, H., & Wilkie, P. (2022). The effectiveness and limitations of digital images for taxonomic research. TAXON, 71(5), 1063–1076. https://doi.org/10.1002/tax.12767