Taxon and Digitization

Madhuca longifolia from Singapore, Royal Botanic Garden Edinburgh

This series of posts is looking at articles that have appeared in Taxon and deal with broader issues than the journal’s main fare of taxonomic treatments.  A timely article appeared a few months ago measuring how effective digital specimen images are in taxonomic research (Phang et al., 2022).  This study grew out of the COVID pandemic when access to collections was almost nonexistent in many parts of the world.  The authors were working on the genus Madhuca (Sapotaceae) for the Flora of Singapore.  Two were based in Scotland and one in Singapore, but all had the same access problem.  In this report they evaluated images of Madhuca collections from both Singapore and the adjacent Malaysian state of Johore.  The images were found in JSTOR Global Plants and a number of herbarium databases.  Another major source was the Royal Botanic Garden Edinburgh’s Sapotaceae Resource Centre (SRC) database, which also stores specimen and field images taken by researchers, often of material not otherwise available online.

The overall result of the study was that while specimen images were valuable research tools, they could not provide all the information needed for a thorough taxonomic analysis.  In many cases, micro-morphological characters could not be seen clearly in digital images, even at high resolution, and these are precisely the characters often needed in defining the boundaries among species.  To provide quantitative results, the researchers rated the images as of high, medium, or low utility.  For the high ranking, an image needed to capture at least 5 qualitative and 3 quantitative macromorphological characters.  Medium needed to meet the first criterion but not the second, and low had to have 4 qualitative characters.

The report provides in-depth analysis of the results that I’ll just briefly recap here.  Not surprisingly, the specimen images found in herbarium databases ranked more highly than those in the SRC that were taken by researchers.  The problem wasn’t always image quality; often it was the absence of a ruler tool, like the one found in JSTOR Global Plants, or at least a measurement bar as a standard.  The authors also reported:  “Of the 219 specimen images examined, 125 (comprising 103 researcher images and 22 institutional images) had macromorphological characteristics hidden from view due to the low resolution of the image, the way the specimen had been mounted onto the herbarium sheet or had portions placed in an unopened capsule on the sheet” (p. 1068).  Herbarium databases varied in terms of the image resolution available.  Better quality images could probably be obtained by contacting the institution, but this often wasn’t possible during COVID, and in any case, would add steps to the taxonomist’s work.

Other findings were that fruit and seed measurements were difficult for all images, with very few fruiting specimens available.  This was in part because there were usually only a few specimens for each species under study, a reminder of the crying need for continued collection, particularly in biodiverse areas with many species having either small populations or limited ranges.  Overall, the taxonomists were only able to identify 22% of the Madhuca species from researcher images.  That number rose to 34% with institutional ones, and to 94% with physical examination of the specimens when the Singapore herbarium was again accessible.  This last figure resulted not only from microscopic examination of specimens, but from being able to closely examine flowers and fruits and open fragment packets.  The major message of the study is that online resources are very valuable for taxonomic investigations, but don’t come close to replacing specimens themselves.

It’s important to remember that there are many uses for online collections that don’t necessarily require such close study.  Virtual access is sufficient for many uses, especially when the access is through an information-rich database that’s easy to use.  Usability was the focus of a post on the Natural Sciences Collections Association website written by Teagan Reinert and Karen Bacon of the National University of Ireland, Galway.  It is a brief but valuable recap of what determines a database’s rating anywhere from “very easy” to “usable but frustrating.”  It articulates what many of us experience subliminally as we search for specimens.

To take the frustrating end of the spectrum first, there are sites that may have long loading times or low-quality images, return many irrelevant results, or “just don’t work.”  Sometimes a keyword search is handy, but the advanced search should be easy to find, and it’s great if searches by date range or cultivated species are easy to do.  Databases like those of New York Botanical Garden or the Royal Botanic Garden Edinburgh are given high marks because all the basic information on a specimen is shown without having to click further or open several screens.  The latter is particularly cumbersome if many specimens need to be accessed.  As for images, good quality is definitely a plus; also useful is an easy way to tell if there are differently sized images available.  For each image it should be clear what the license status is, such as public domain or a Creative Commons license.  I find this very helpful, as is the last suggestion in the post:  “How the image or specimen data should be cited should be stated very clearly on the website either on its own easily accessed or clear labelled page, or on the specimen’s landing page. . . . But that information can sometimes be hidden in Frequently Asked Questions or on the bottom of a page that isn’t entirely relevant.”  Amen.

Reference

Phang, A., Atkins, H., & Wilkie, P. (2022). The effectiveness and limitations of digital images for taxonomic research. TAXON, 71(5), 1063–1076. https://doi.org/10.1002/tax.12767

Taxon and the Flora of Brazil

Title page of first part of Flora Brasiliensis (1840-1906), Biodiversity Heritage Library

I belong to the International Association for Plant Taxonomy (IAPT) not because I am a plant taxonomist, but because I want to learn about the field.  Its journal Taxon is particularly helpful in this regard, though I can’t say that I read it cover to cover.  The articles I find most interesting take a broad view of the field, delve into its history, or deal with nomenclatural issues.  In this series of posts, I’ll highlight a few recent items I found particularly informative, beginning with one having a hefty 980 contributors, the Brazil Flora Group (2022).  The author list is shorter, but still lengthy, and suggests the massive collaboration underlying the creation of a Brazilian flora.

The impetus for the project began in response to the Global Strategy for Plant Conservation (GSPC), adopted in 2002 by the parties to the Convention on Biological Diversity.  The plan’s first target was to publish a list of the world’s plants by 2010, with plants broadly defined as including algae and fungi.  In 2010 Brazil published an online “List of the Species of the Brazilian Flora” and a “Catalogue of Brazilian Plants and Fungi,” which documented 40,989 species of algae, fungi, and plants.  By that time the second target of an online World Flora by 2020 was looming.  Since Brazil is a large country with great biodiversity, these tasks were themselves correspondingly massive, especially since the last such work, Flora Brasiliensis, was published from 1840 to 1906 and ran to 15 volumes, documenting 19,629 species in Brazil.

An online information system was created for the Brazilian flora species list, and it was further developed for the task of constructing an online flora.  Between 2010 and 2015, 430 specialists were involved in adding new species to the list, updating determinations, and contributing descriptive data.  In the following five years, 554 more taxonomists joined the project, then called Brazilian Flora 2020.  The Taxon paper is essentially a review of the results of this work, including what it revealed about future needs in discovering and protecting Brazil’s biodiversity and supporting the taxonomic work necessary to accomplish these goals.  Meanwhile there was another project called Reflora, in full “Plants of Brazil: Historic Rescue and a Virtual Herbarium for Knowledge and Conservation of the Brazilian Flora.”  Its aim was to develop a virtual herbarium that included specimens not only from the large collection at the Rio de Janeiro Botanic Garden, but also from the Royal Botanic Gardens, Kew and the National Museum of Natural History, Paris.  These are among the many European institutions with significant collections of tropical plant specimens because of their former colonial enterprises.  The Reflora infrastructure made it possible to upload images, curate specimen records with updated identifications, and add geographic and distribution data.  As this work progressed, specimens from many more collections were added so that researchers now have access to millions of specimens through the Reflora Virtual Herbarium.

As a result of this work, the Brazil Flora Group was able to report that by December 31, 2020 there were 46,975 known species of algae, fungi, and plants in Brazil, with 19,669 endemics.  These include 6,320 fungi, 4,993 algae, 1,610 bryophytes, 1,403 ferns and lycophytes, 114 gymnosperms, and 35,549 angiosperms.  This is hardly a complete count; some areas are under-collected.  The most substantial collections have come from the Cerrado and also the Atlantic Rainforest, an area that has suffered from overdevelopment with loss of native vegetation.  Regions like the Caatinga and Pantanal are less well sampled.  There was also great disparity in the rates of increase in different types of species.  Amazingly, there was a 75% rise in the number of known fungal species between 2010 and 2020, an indication of the fungal richness yet to be discovered.  Not coincidentally the largest mycological collections are in the three states where the greatest number of mycologists are located.  Angiosperm numbers, on the other hand, only increased by 7%.  Interestingly, the number of known species in the heavily sampled Brazilian Atlantic Rainforest and Cerrado actually decreased between 2015 and 2020.  Yes, new species were named, but identification of synonymies and deletion of erroneous records more than offset this increase.

The article, as befits the massive size of the project it describes, is filled with data and insights.  The Brazil Flora Group focused on a number of areas that need attention if future GSPC targets are to be met.  One major issue is the need to build a stronger taxonomic infrastructure in the country, commensurate with its biodiversity.  With almost 1,000 taxonomists involved in the flora, expertise from around the world has been marshalled and will continue to support Brazil’s efforts, but it is no substitute for expertise within the country.  What is called the “taxonomic impediment,” lack of facilities and taxonomists, is a worldwide problem, as is the second area of concern: georeferencing.  Only about half the occurrence records in the Global Biodiversity Information Facility (GBIF) have coordinates and only a third of these have uncertainty information, which is essential for spatial analyses.  Geographic data are particularly important in conservation efforts.  As was mentioned earlier, also of concern is the issue of under-sampled areas, and along with this, species and families that have been neglected taxonomically.  So there is much work to do, but still, this report is also a celebration of wonderful accomplishments.
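The georeferencing figures mentioned above (the share of records with coordinates, and the share of those with an uncertainty value) are simple to compute for any set of occurrence records.  What follows is only a minimal sketch, not the Brazil Flora Group’s or GBIF’s method.  It assumes each record is a plain dictionary keyed by the Darwin Core terms decimalLatitude, decimalLongitude, and coordinateUncertaintyInMeters; the function name and the four sample records are invented for illustration.

```python
from typing import Iterable, Mapping


def _has_value(record: Mapping, field: str) -> bool:
    """True if the field is present and not blank."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def georeferencing_summary(records: Iterable[Mapping]) -> dict:
    """Count records with coordinates, and those that also give uncertainty."""
    records = list(records)
    with_coords = [
        r for r in records
        if _has_value(r, "decimalLatitude") and _has_value(r, "decimalLongitude")
    ]
    with_uncertainty = [
        r for r in with_coords
        if _has_value(r, "coordinateUncertaintyInMeters")
    ]
    total = len(records)
    return {
        "total": total,
        "with_coordinates": len(with_coords),
        "with_uncertainty": len(with_uncertainty),
        # Share of all records that carry coordinates.
        "share_with_coordinates": len(with_coords) / total if total else 0.0,
        # Share of georeferenced records that also state an uncertainty.
        "share_of_georeferenced_with_uncertainty": (
            len(with_uncertainty) / len(with_coords) if with_coords else 0.0
        ),
    }


# Hypothetical records, invented for illustration only.
sample = [
    {"catalogNumber": "A1", "decimalLatitude": -22.97, "decimalLongitude": -43.22,
     "coordinateUncertaintyInMeters": 500},
    {"catalogNumber": "A2", "decimalLatitude": -15.78, "decimalLongitude": -47.93},
    {"catalogNumber": "A3", "decimalLatitude": "", "decimalLongitude": ""},
    {"catalogNumber": "A4"},
]

summary = georeferencing_summary(sample)
print(summary)
```

A herbarium could run a tally like this on its own export before sending data to an aggregator, to see how many records still need georeferencing.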

Reference

The Brazil Flora Group, Gomes-da-Silva, J., Filardi, F. L. R., Barbosa, M. R. V., Baumgratz, J. F. A., Bicudo, C. E. M., Cavalcanti, T. B., Coelho, M. A. N., Costa, A. F., Costa, D. P., Dalcin, E. C., Labiak, P., Lima, H. C., Lohmann, L. G., Maia, L. C., Mansano, V. F., Menezes, M., Morim, M. P., Moura, C. W. N., … Zuntini, A. R. (2022). Brazilian Flora 2020: Leveraging the power of a collaborative scientific network. TAXON, 71(1), 178–198. https://doi.org/10.1002/tax.12640

Plant Specimens in the Future

A sample of herbarium images used for training an AI model for recognizing leaf shape (Hussein et al., 2021)

In the first post in this series, I described ideas Mason Heberling (2022) presents in his paper on the role of herbaria in plant trait studies, including an outline of why specimens have been almost ignored by ecologists and evolutionary biologists in studies of genetic and environmental influences on plant characteristics.  After this survey and a convincing argument for why specimens would be valuable in this research, he discusses how herbaria could become centers for such work.  He begins this topic with a great quote from the corn systematist Edgar Anderson (1952):  “Making a good herbarium record . . . is something like trying to stable a camel in a dog kennel” (p. 47).  I imagine Anderson attempting to wrestle a corn plant, or parts thereof, onto a herbarium sheet.  But Heberling is also thinking about how plant trait studies might need not one specimen, but a number representing different parts of a plant’s life cycle or the variations found within a population.  He is realistic in considering how much more work this would mean for herbarium staff and how much more space would be needed to store all these specimens.  That’s why he argues for a reframing of the work of herbaria, which might seem like overreaching for an article on plant traits, but he makes clear that this type of research ties in nicely with the herbarium community’s present interest in the extended specimen network (ESN):  digitally tying together many types of genetic, ecological, and morphological data with specimen data (Lendemer et al., 2020).

Heberling deals with what information should be on a herbarium sheet for trait research beyond the basics of plant name and collector as well as date and location.  Phenological data (presence of flower or fruit) are becoming more standard, but what if leaf areas have been measured or chemical analysis done?  This information is usually fed into trait databases such as MorphoBank, but is not at present often linked to a specimen.  This is why Heberling calls for the participation of functional trait researchers in building the ESN.  Their involvement would help convince this community of the importance of vouchers to substantiate trait data.  Vouchering might not always be feasible, but at least photographic evidence could be linked.  In the other direction, it’s important for herbarium curators to be involved in developing the Open Traits Network, which is attempting to standardize and integrate trait data.

Heberling contends that rather than declaring specimens as too imperfect a form of evidence to use in trait studies, researchers should seek to change collection practices:  “We must ask how herbaria can better address the needs of new and unanticipated specimen uses.  What information do we wish that collectors a century ago had provided with their specimens?”  Then he gets more daring:  “I propose an open reevaluation of the very collection event” (p. 108).  Decisions have to be made in the digital age about what information is on the specimen itself and what is linked to it.  As one example, he cites work that he and his colleague Bonnie Isaac (2018) have done in linking online specimen data to information including photographs they input into iNaturalist at the time of a collection event. 

As to what information is actually recorded on the specimen, Heberling notes that research shows that data fields in taxonomic software are well-standardized, but the information in those fields may not be.  Anyone who compares label data to the digital record can attest to this.  Sometimes the problem may be just a random input error, but there is also the problem of fields without controlled vocabularies, or OCR difficulties, or a particular individual’s own take on what goes where.  These problems are being resolved as best practices become more widely standardized and employed.

Then there is also the issue of intensive collecting for life history or extent of variation studies.  Heberling admits that this cannot be done in all circumstances and requires budgeting for increased curatorial work and storage that might not be possible for all institutions.  But these issues definitely need to be part of conversations on the future of herbaria.  He ends by enumerating several moves that will lead to increased effectiveness and use of plant collections including archiving population-level and ontogenetic or developmental variation.  Also there needs to be more environmental context on labels.  This has become more common with habitat descriptions and associated species often listed, but available light and other abiotic conditions should be noted, and to make this information optimally useful, a standardized vocabulary should be adopted.

Also, the ESN should be built into specimen collection itself, as in the iNaturalist case; collectors should leverage the ability to create “born digital” specimens as much as possible.  The accession should also include storage of material such as silica-dried leaves in fragment packets for future research requiring destructive testing.  Finally, and perhaps most importantly, collection should be planned well into the future in order to track traits at a time of climate and habitat change.  This outline for the future is a great way for Heberling to end his article, which is rich both in data and in good ideas about why herbaria are important and how they can become even more significant in the future.

References

Anderson, E. (1952). Plants, Man and Life. University of California Press.

Heberling, J. M. (2022). Herbaria as Big Data Sources of Plant Traits. International Journal of Plant Sciences, 183(2), 87–118. https://doi.org/10.1086/717623

Heberling, J. M., & Isaac, B. L. (2018). iNaturalist as a tool to expand the research value of museum specimens. Applications in Plant Sciences, 6(11), e01193. https://doi.org/10.1002/aps3.1193

Hussein, B. R., Malik, O. A., Ong, W.-H., & Slik, J. W. F. (2021). Automated Extraction of Phenotypic Leaf Traits of Individual Intact Herbarium Leaves from Herbarium Specimen Images Using Deep Learning Based Semantic Segmentation. Sensors, 21(13), 4549. https://doi.org/10.3390/s21134549

Lendemer, J., Thiers, B., Monfils, A. K., Zaspel, J., Ellwood, E. R., Bentley, A., LeVan, K., Bates, J., Jennings, D., Contreras, D., Lagomarsino, L., Mabee, P., Ford, L. S., Guralnick, R., Gropp, R. E., Revelez, M., Cobb, N., Seltmann, K., & Aime, M. C. (2020). The Extended Specimen Network: A Strategy to Enhance US Biodiversity Collections, Promote Research and Education. BioScience, 70(1), 23–30. https://doi.org/10.1093/biosci/biz140

Digital Circulation: A Different Experience

A reminder that specimens have depth: Pine folders at the A.C. Moore Herbarium, University of South Carolina, Columbia

In the last post, I discussed the digitization of specimen data to make it more available to researchers.  I think it’s important to state the obvious here:  digital examination of specimens is not the same as studying the specimen itself.  To begin with, it is a different phenomenological experience.  Sitting at a table or standing at a counter strewn with specimens gives a sense of being in a particular kind of environment, one with metal cases filled with plants and with the faint odor of plant material.  Then there’s the physical experience of a specimen:  touching it if necessary, smelling it, viewing it from all different angles, using a hand lens or dissecting microscope.  These actions enrich observational practice and provide more information about the plant.

Though there are similarities in making an image of a book page and a specimen sheet, printed material is much flatter than a specimen.  Even though the plant material is pressed, it still has depth.  Pressed leaves aren’t completely flat, to say nothing of stems, flowers, and fruits.  Leaf surfaces slope away from veins; spines and hairs stick out from stems; flowers refuse to completely cede their dimensionality; and stems are not lines but columns that can have ridges.  There are complex textures everywhere in plant material, and some sense of that is lost in even the best photograph (Flannery, 2012). 

The argument could be made that some textural information has already been lost in pressing the specimen, and this is definitely true.  However, digitization compounds the problem.  There are new imaging techniques including reflectance transformation imaging (RTI) that give a greater sense of the depth in a specimen by integrating a large number of images.  The equipment and related software are complex, the amount of data generated massive, and the process time-consuming—all translating into unmanageable expense.  This system is now mostly employed on works of art; using it to image millions of specimens is a dream. 

Still, the images now available digitally are of high quality, and while the experience is not the same as examining a specimen in real time, it can often provide the information a researcher needs.  Particularly helpful is being able to study a number of specimens from different sources at the same time, and software is being developed to make this easier.  The International Image Interoperability Framework (IIIF) community was originally composed of those in art museums and libraries with the aim of creating better software for accessing and working with images from multiple institutions.  Those involved in natural history collections are now joining this group.  Not only can IIIF improve the way images are accessed and used, but cooperation between art and science institutions could lead to interesting new collaborations.
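For readers curious about what IIIF makes possible in practice, here is a hedged sketch of how an image request is put together under the IIIF Image API, in which a URL spells out the region, size, rotation, quality, and format wanted.  The server address and specimen identifiers are hypothetical, and the helper function is my own illustration rather than part of any IIIF library.

```python
from urllib.parse import quote


def iiif_image_url(base_url: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build a IIIF Image API request of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be percent-encoded, including any slashes.
    safe_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{safe_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical image server and specimen identifiers.
base = "https://iiif.example.org/image"

# The whole sheet at the largest size the server offers.
whole_sheet = iiif_image_url(base, "E00012345")

# A crop (x, y, width, height in pixels) scaled to 400 pixels wide,
# e.g. to compare flowers on two sheets held by different institutions.
detail_a = iiif_image_url(base, "E00012345", region="1000,2000,800,600", size="400,")
detail_b = iiif_image_url(base, "P00098765", region="500,1500,800,600", size="400,")

print(whole_sheet)
print(detail_a)
print(detail_b)
```

Because every participating institution answers the same kind of request, a viewer can place details from several herbaria side by side without downloading entire sheets.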

Each herbarium uploads its own data and continues to be responsible for it.  In order to contribute to an aggregator like iDigBio or GBIF and have specimens circulate more broadly, data have to be in a particular format.  Curators are now aiming to make their data FAIR:  findable in a variety of ways, accessible to a large audience, interoperable in platforms changing over time, and reusable into the future.  Each of these elements hides a host of problems, and to solve them will require continued investments.  Digital assets are wonderful but fragile things; they require as much curation as physical assets and in some cases more.  They have to be protected from damage and deterioration if they are to continue to circulate.  Some web interfaces are so user-friendly that it’s easy to forget the complexity of creating and maintaining them.

There are huge costs involved in digital collections and in facilitating new ways to make them useful with software to make it quicker and easier to query data.  This digital sophistication might seem counterintuitive to those who see natural history as an old-fashioned, outdated area of science.  Also counterintuitive is the idea that simple observation, looking at a specimen, can involve sophisticated technology and issues of dimensionality and phenomenology.  Observation is placed relatively low in the hierarchy of cognitive skills, yet has been recognized as a sophisticated research tool since the early modern period when botanists realized that careful observation was essential for learning about plants.  It was the only way forward in obtaining secure knowledge about a species.  What digital access allows is an entirely new level of observation:  the ability to view a specimen without causing it any physical damage, to access many specimens of one species instantaneously, and to have colleagues in different institutions look at the same specimens in real time.  This communal aspect of digital collections is extremely important; it opens up a new form of image circulation.

It is a paradox that in order to continue to share earth with such a diversity of organisms, we have to create an in-silico world where we experience nature not even second hand as we would in a herbarium, but removed even further onto a screen where the contact is only through the visual.  This digital world can be as fragile and easy to disrupt as an ecosystem, perhaps even more so.  It is a product of human ingenuity and must be sustained by that ingenuity if it is to survive, flourish, and circulate equitably and usefully.

Reference

Flannery, M. C. (2012). Flatter than a pancake: Why scanning herbarium sheets shouldn’t make them disappear. Spontaneous Generations: A Journal of the History and Philosophy of Science, 6(1), 225–232.