Digitizing Collections

2 iDigBio

The Digital Data in Biodiversity Research Conference at Ann Arbor, Michigan, was cosponsored by the University of Michigan and the iDigBio project, which deals with the digitization of natural history collections at non-government institutions in the United States. iDigBio is a 10-year project now in its sixth year. As Larry Page, its director, noted, it is designed to provide the infrastructure necessary to store and distribute the results of natural history specimen digitization efforts, and also to offer training and tools to support these projects. In addition, it aims to encourage the development of a community to further this work and to ensure that these electronic resources are maintained and upgraded in the future. That is obviously a tall order, and just how tall became clearer during the two-day conference.

The first general session set the stage, with Maureen Kearney of the Smithsonian arguing for the importance of "liberating" data from the paper silos where they have been kept, and also for including paleobiological information to provide a longer view. Pam Soltis of the Florida Museum of Natural History at the University of Florida discussed the difficulties of linking heterogeneous data, for example, information on specimens, genomics, and phylogeny. Yes, there are data sets dealing with each of these for many species, but the challenge is to make them all available through one portal. Issues include locating disparate data, dealing with their patchiness, and reconciling format differences. There are also the vagaries of taxonomic names and the problem of finding ways to get these systems to talk to each other. Progress is being made, particularly in automating some phases, such as recording label data with optical recognition systems, but this work takes a great deal of time and money, and it's never finished, since maintenance is a key issue.

Next came Donald Hobern, executive secretary of GBIF, the Global Biodiversity Information Facility, to which the US contributes data not only on specimens but also on species occurrences. From the GBIF portal, researchers can create species checklists for particular areas and also access data on particular taxa. The GBIF network has over 700 million georeferenced occurrence records, making it a massive resource. Organizationally, it is divided into geographic nodes, with each node responsible for inputting and maintaining its data. In the afternoon, I attended the session on the North American node, which includes contributions from Canada and the United States. There, Hobern spoke again, outlining the network's three main goals. The first is to remove obstacles to collaboration in the sharing and use of biodiversity data, in other words, to provide tools that allow for uploading and maintaining data in a usable form. The second is to organize evidence of the recorded occurrence of any species in time and space; that is, users should be able to access data on species occurrences worldwide or within a particular geographic area and timeframe. Finally, GBIF aims to support the development of a global virtual natural history collection. In one sense, this goal has already been met because there is so much data in GBIF from so many areas, but the collection is hardly complete in terms of extent or data richness. In order to function at such a large scale, GBIF can only provide limited information on each occurrence. However, the infrastructure that GBIF has created and continues to develop is a firm foundation for a richer and more robust information system in the future. One indication of this is GBIF's Science Review 2017, its annual review of the scientific articles published over the past year using GBIF data. It is accompanied by a bibliography of these 438 peer-reviewed articles.

The next speaker presented still another acronym, or really two. Gerald "Stinger" Guala of the US Geological Survey is director of both BISON (Biodiversity Information Serving Our Nation) and ITIS (Integrated Taxonomic Information System). BISON provides access to 375 million US occurrence records, including 275 million that are also in GBIF. However, for some US records, BISON holds more data than what is available in GBIF. Essentially, BISON is a clearinghouse for US government information on natural history collections. It cleans the data, formats them, applies quality control measures, and allows for data discovery. One of its major services is providing checklists at the local, state, and national levels; a user can draw a map around an area and get a species checklist for it. Datasets on particular areas or species can also be downloaded. ITIS is more limited in scope; its aim is to provide stable nomenclature. It is linked to the Catalogue of Life, a worldwide database that publishes an annual checklist with over 1.7 million species. The biggest difficulty for the latter, as discussed by its director Tom Orwell of the Smithsonian, is how to deal with synonyms. This is a tough problem for all of taxonomy and for all biodiversity projects, as noted by Stephen Garnett and Les Christidis (2017) in a recent Nature article on how "taxonomic anarchy" impedes conservation efforts. To put it simply: it's difficult to enforce regulations on an endangered species if its name changes.

These presentations were followed by two about Canadian projects: James Macklin spoke on CBIF, Canada's GBIF node, and Anne Bruneau on Canadensys, which aims to provide richer information on species than is available in GBIF. Jon Coddington of the Global Genome Biodiversity Network (GGBN) then brought up a whole different set of issues, namely those involved in storing genetic information, both sequences and specimen data. And Martin Kalfatovic, the program director of the Biodiversity Heritage Library (BHL), discussed its role in providing links to relevant literature. In all, this was a mind-bending session that helped me see the differences among the many portals I have come across as I try to educate myself botanically and technologically. In the next post, I'll discuss some even more ambitious projects that move into the 3D realm.


Garnett, S. T., & Christidis, L. (2017). Taxonomy anarchy hampers conservation. Nature, 546, 25.
