When I entered label information from 3,700 Arnica labels at the garden, I got to do more than glance at labels. They introduced me to people whom I then looked up on the web: Connie and John Taylor of Oklahoma State University, whose labels included the names of their three children as collectors; Aven Nelson, one of the important early botanists of Wyoming; Marcus Jones, a controversial plant collector and mining engineer; and Rupert Barneby, a long-time botanist at NYBG. Each had their own style of labeling, determined in part by the age in which they lived and what was then expected as essential information. Transcription is labor intensive, so to get as many labels as possible into the database, where researchers can access them, not all information is recorded. For example, the location is recorded, but data on the habitat, such as nearby plants, are not. The rationale is that searches are usually by locale, so the habitat information is less likely to be searched and can be added later.
There is a way to transcribe labels without human input, namely with optical character recognition (OCR) software, but this doesn’t work well, to say the least, with cursive handwriting. Even for typed labels, OCR cannot be relied upon to place all the data into correct fields. Inputting label information doesn’t just mean typing it out; it must be typed into the proper areas of the software program, the correct “fields,” so it can be accessed: collector name in one field, location in another, date of collection in a third, with each in proper format. OCR software, while it can “learn” to identify certain types of information, is hardly infallible, and data entries created this way need to be checked by a human. I’ve found that the OCR input needs so much editing that it is easier to type the information directly from the label. As more data becomes available electronically, and therefore is relied upon more by researchers, accuracy becomes more and more crucial. It is nice to be able to find out what specimens were collected in a particular area at a particular time without sifting through thousands of sheets, but only if the data being searched is reliable.
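To make the idea of "fields" a little more concrete, here is a minimal Python sketch of what a label record with separate, formatted fields might look like. It is only an illustration: the `LabelRecord` class, the `make_record` helper, the field names, and the year-month-day date format are all assumptions of mine, not the layout of the software used at the garden, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class LabelRecord:
    """One transcribed specimen label (hypothetical field layout)."""
    scientific_name: str
    collector: str
    location: str
    collection_date: date   # stored as a real date, not free text
    habitat: str = ""       # optional; can be added later


def make_record(scientific_name, collector, location, date_text, habitat=""):
    """Build a record, rejecting empty required fields and malformed dates."""
    required = {
        "scientific_name": scientific_name,
        "collector": collector,
        "location": location,
    }
    for field_name, value in required.items():
        if not value.strip():
            raise ValueError(f"{field_name} must not be empty")
    # Assumed format: year-month-day, e.g. "1899-07-04".
    # strptime raises ValueError for text such as "July 4, 1899".
    collected = datetime.strptime(date_text.strip(), "%Y-%m-%d").date()
    return LabelRecord(
        scientific_name.strip(),
        collector.strip(),
        location.strip(),
        collected,
        habitat.strip(),
    )


# Invented sample values, for illustration only.
record = make_record("Arnica sp.", "Example Collector", "Example County", "1899-07-04")
print(record)
```

Because the date is held in its own field in a single format, a search for everything collected in a given place and year becomes a simple comparison rather than a hunt through free text, which is also why a date that OCR drops into the wrong field has to be caught by a person.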
Now at NYBG I am creating records for newly mounted specimens and seeing a much wider variety of label styles from all over the world. Some labels are written in Spanish, Portuguese, or French, yet they are largely decipherable because I know what is expected to be on a label and because the Latin name is always an anchor. Going forward, the aim is for all specimens in the NYBG collection to be photographed and their label information digitized, which is the ultimate goal for all collections in the US and in other countries as well. This will make the information in herbaria and other natural history collections more widely available and accessible for uses that were not even anticipated when many of these specimens were collected.