Specimen Labels: Digitization

The two specimens of Xanthisma spinulosum on the right are from the 1874 Custer Expedition to the Black Hills [NYBG Herbarium]

I’ve become somewhat familiar with herbarium sheets through my visits to herbaria in several countries. However, I’ve seen the most sheets at the New York Botanical Garden (NYBG), where I volunteer in the digitization effort. I’ve photographed specimens, transcribed labels, and created skeleton records for newly mounted specimens before they are photographed. Each process has taught me something different. I’ve photographed thousands of Asteraceae specimens, most of them collected in the western United States over the past 150 years. As I put each sheet into the lightbox, I glanced at the label and got to know the names of collectors such as John Merle Coulter, Bassett Maguire, and Per Axel Rydberg. Some of these collectors were associated with NYBG; others were tied to Western collections, but trades brought their plants East, as did NYBG’s acquisition of “orphan collections” when other institutions decided to rid themselves of space-taking herbaria. That’s how three Wabash College specimens of Xanthisma spinulosum (on the same sheet) ended up at NYBG, two from a General Custer expedition to the Black Hills and another from the California collectors Sara and John Lemmon.

When I entered the information from 3,700 Arnica labels at the garden, I got to do more than glance at labels. They introduced me to people whom I then looked up on the web: Connie and John Taylor of Oklahoma State University, whose labels included the names of their three children as collectors; Aven Nelson, one of the important early botanists of Wyoming; Marcus Jones, a controversial plant collector and mining engineer; and Rupert Barneby, a long-time botanist at NYBG. Each had their own style of labeling, determined in part by the age in which they lived and what was then expected in terms of essential information. Transcription is labor intensive, and to get as many labels as possible into the database where researchers can access them, not all of the information is recorded. For example, the location is recorded, but data on the habitat, such as nearby plants, are not. The rationale is that searches are usually by locale, so the habitat information is less likely to be investigated and can be added later.

There is a way to transcribe labels without human input, namely with optical character recognition (OCR) software, but this doesn’t work well, to say the least, with cursive handwriting. Even for typed labels, OCR cannot be relied upon to place all the data into correct fields. Inputting label information doesn’t just mean typing it out; it must be typed into the proper areas of the software program, the correct “fields,” so it can be accessed: collector name in one field, location in another, date of collection in a third, with each in proper format. OCR software, while it can “learn” to identify certain types of information, is hardly infallible, and data entries created this way need to be checked by a human. I’ve found that the OCR input needs so much editing that it is easier to type the information directly from the label. As more data becomes available electronically, and therefore is relied upon more by researchers, accuracy becomes more and more crucial. It is nice to be able to find out what specimens were collected in a particular area at a particular time without sifting through thousands of sheets, but only if the data being searched is reliable.
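To make the idea of "fields" and "proper format" a little more concrete, here is a minimal sketch in Python. It is not the software used at NYBG; the field names, the list of accepted date styles, and the sample values are all my own assumptions, chosen only to show how a typed-in label might be split into separate fields, with the date put into one consistent format and the habitat left optional so it can be added later.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical date styles a transcriber might meet on labels (an assumption,
# not a list taken from any real herbarium database).
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d")


def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if no style matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next date style
    raise ValueError(f"unrecognized date: {text!r}")


@dataclass
class LabelRecord:
    """One transcribed label; the field names are illustrative only."""
    scientific_name: str
    collector: str
    locality: str
    collection_date: str           # always stored as YYYY-MM-DD
    habitat: Optional[str] = None  # may be skipped at first and added later


def make_record(scientific_name: str, collector: str, locality: str,
                date_text: str, habitat: Optional[str] = None) -> LabelRecord:
    """Build a record, trimming stray spaces and normalizing the date."""
    return LabelRecord(
        scientific_name=scientific_name.strip(),
        collector=collector.strip(),
        locality=locality.strip(),
        collection_date=normalize_date(date_text),
        habitat=habitat,
    )


# Made-up sample values, not a real specimen.
record = make_record("Arnica cordifolia", "Example Collector",
                     "Example County, Wyoming", "July 4, 1900")
print(record)
```

A date that fits none of the listed styles raises an error instead of being guessed at, which mirrors the point above: an entry that cannot be placed confidently should go to a human to check.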

Now at NYBG I am creating records for newly mounted specimens and seeing a much wider variety of label styles from all over the world, with labels written in Spanish, Portuguese, and French, yet largely decipherable because I know what is expected to be on a label and because the Latin name is always an anchor. Going forward, the aim is that all specimens in the NYBG collection will be photographed and the label information digitized—the ultimate goal for all collections in the US and in other countries as well. This will make the information in herbaria and other natural history collections more widely available and accessible for uses that were not even anticipated when many of these specimens were collected.
