BHL and Social Media

I have a Facebook account that I ignore. I go into it about once every six months with the intention of using it, but I can never figure out its attractions, so I abandon it yet again. However, I use Twitter a lot, not to communicate so much as to keep up on the doings at institutions that interest me such as botanical gardens, herbaria, and natural history museums. Along the way, I’ve found several people and institutions posting notable items and I follow them too. For example, Donna Young (@HerbariumDonna) of the World Museum of Liverpool tweets and retweets great material, as does the herbarium at St. Andrews University, Scotland (@STA_herbarium). Needless to say, in light of my last post, I also follow the Biodiversity Heritage Library, BHL (@BioDivLibrary). This is how I can keep up with its blog and all its latest endeavors. Because it’s trying to engage with as large an audience as possible, BHL communicates through a variety of social media outlets, since, like me, people have different tastes in their favorite apps. In 2016 it added Instagram and Tumblr to its internet presence along with its more longstanding Twitter and Facebook accounts. In total, it had a 76% increase in followers between 2015 and 2016, suggesting that these efforts have been successful. Perhaps its most fruitful outreach has been through Flickr, where it has posted over 100,000 images from its resources, but I’ll get back to that later. I also want to note that there was a 54% increase in the number of visits to BHL from other social media sites, almost 100,000 in all, indicating users are coming to BHL from a variety of platforms. The most notable is Pinterest; posts on Pinterest accounted for more than half of this traffic. Obviously many Pinterest users posted images sourced from BHL directly or from its Flickr account. These numbers suggest the general expansion of the social media universe and particularly of BHL’s participation in it. They also indicate its sophisticated approach to outreach.

At the moment BHL’s efforts in this area are being substantially assisted through the work of five one-year interns in the National Digital Stewardship Residency (NDSR) developed by the Library of Congress in conjunction with the Institute of Museum and Library Services (IMLS). The five residents, now at the halfway point in their work, are at five different BHL member institutions. Pamela McClanahan at the Smithsonian Library has posted a user survey and will analyze the results, which are important to planning BHL’s future direction and where it will focus its resources. Ariadne Rehbein at the Missouri Botanical Garden has joined a Codergirl cohort in St. Louis and is also interviewing Flickr and BHL volunteer taggers about their work and how the workflows can be improved. These volunteers took part in a two-year project, funded by a grant from the NEH, to develop a system for volunteers to identify and tag images in BHL volumes. This is a great example of a citizen science project where a pool of interested and committed individuals can help to enhance BHL.

At the Natural History Museum of Los Angeles County, Marissa Kings, along with several summer interns, is creating and editing metadata for the museum’s Contributions in Science publications in preparation for uploading these and other in-house publications to BHL. She is also exploring how recently digitized museum entomology specimens and related data can be linked to the relevant literature in BHL. I have very limited experience in this area, but I know enough to realize that none of this is trivial. Having well-defined workflows and metadata can make all the difference when it comes to linking different types of data. Another intern, Alicia Esquivel at the Chicago Botanic Garden, is doing statistical analyses to estimate the total size of the biodiversity literature, a difficult task to say the least. But even a rough estimate would give some idea of what percentage of that literature is now in BHL, in other words, how big its impact could be on the biodiversity research community. At Harvard’s Museum of Comparative Zoology, the fifth NDSR resident, Katie Mika, is learning about adding structured bibliographic metadata in Wikidata to improve the quality of references in the Wikimedia universe and to reconcile messy data. Adding BHL IDs to Wikidata makes it a more robust knowledge base and improves the discoverability of BHL’s content. As you can see from these brief synopses, the NDSR program is providing BHL with expertise in several key areas and allowing it to both strengthen its foundations and move in new directions.

Before I close this post on BHL and social media, I want to get back to Flickr. BHL’s Flickr site is quite literally a joy to behold. There are now over 100,000 images from BHL content in Flickr and that number continues to rise. The contributions are arranged in albums, with each album representing one publication. For example, the album for Curtis’s Botanical Magazine, Volume 136 from 1910 has 60 images. Searching for this item in BHL will provide all these images as well as the related text, but to just enjoy the beautiful illustrations, BHL at Flickr is the way to go. All these images are copyright free and downloadable. I should note that while I gravitate to the botanical literature, Audubon’s birds are here, as are Gessner’s animals. Needless to say, many people stumble upon this treasure trove when they are surfing in Flickr and don’t investigate further or go into BHL at all. However, some do, and that is the point of social media outreach: the more the right outlets are used, the larger the payoff.

Flickr has turned out to be an effective tool for BHL. It is also a wonderful place for a biologist to spend time on one of those days when spreadsheets and graphs make no sense and it’s easy to forget what makes biology so wonderful. Another fun way to join in is with Color Our Collections. Users can download black and white illustrations contributed by member institutions and then satisfy their urge to color them in any way they want. This project, which has become popular on the web and is continuing, grew out of a social media exchange between a librarian from the New York Academy of Medicine and a committed citizen scientist/BHL tagger from Australia—a beautiful example of BHL’s global scope (Garner, Goldberg & Pou, 2016).

Reference

Garner, A., Goldberg, J., & Pou, R. (2016). Collaborative social media campaigns and special collections: A case study on #ColorOurCollections. RBM: A Journal of Rare Books, Manuscripts, and Cultural Heritage, 17(2), 100–117. https://doi.org/10.5860/rbm.17.2.9663

Biodiversity Heritage Library (BHL): An Introduction

I began studying biology in the 1960s and went to graduate school when a literature review meant wrestling with huge volumes of Biological Abstracts. Not only were they physically difficult to deal with, but if my topic had a long history, I had to comb tediously through many volumes. After a few hours of this research, I often suffered from a syndrome I called “library malaise,” an overwhelming urge to take a nap. It was reading the Biodiversity Heritage Library’s (BHL) annual report that brought these not-so-good old days to mind. I hadn’t thought about them in a long time, because at this point they’ve faded into oblivion. No self-respecting scientist runs to the library to search for references. Now the big problem is sifting through too many citations to find the most valuable. One way to home in on what’s needed is to use the right database or portal, and for me this is often BHL. That’s because my interests are in botany and the history of botany, areas in which BHL is strong. With this series of posts I’ll explore this amazing resource and why, since its founding in 2006, it has become so valuable.

BHL’s strong points are that it’s massive, well-organized, and committed to expanding its user base. The recently published BHL 2016 annual report gives collection statistics such as: 51,460,159 pages from 196,801 volumes digitized; over 175 million taxonomic names indexed; 1,162,346 unique users, up 10% from 2015. Two new members joined this year, BHL Australia and the Natural History Museum in Paris, bringing the total to 17. There were ten original members including the Smithsonian, Missouri Botanical Garden, and the Natural History Museum, London, all with sizeable digital collections and digitization expertise to get the enterprise going. The Smithsonian still plays a pivotal role, with the BHL project director, Martin Kalfatovic, being a Smithsonian librarian. From the list of original members, it’s obvious that the focus is on English-language literature, though with institutions in France, Brazil, Mexico, and the Netherlands having joined, this is changing, and of course, some of the older literature is in Latin. Since all the text in BHL is available as optical character recognition (OCR) text, it is at least somewhat translatable using Google Translate (another amazing tool for someone of my vintage).

What makes BHL particularly powerful is that it’s linked to several other rich portals, making its holdings available to a broad audience. One of its new affiliates this year is the Internet Archive (IA), with which it has been collaborating since BHL’s inception. Much of what’s available through BHL is also available in IA, which is a much broader storehouse. This is also becoming true for the newer Digital Public Library of America (DPLA). While a biologist might go directly to BHL to find a resource like Linnaeus’s Species Plantarum, a student doing a project on Linnaeus might not be aware of BHL, but instead use DPLA or IA. In all three cases, they will find what they need. But portal hopping can be a nuisance. Each interface is different, and it helps to become familiar with one. I’ve used BHL enough that I’m comfortable with its search functions and other tools. It provides an easy way to create a PDF of an entire document or of selected pages from it. Downloading PDFs or JPGs of images is also easy, though admittedly PDFs are easier, at least for the moment. BHL is promising updates on image processing, and since it has improved its interface substantially over the years, this will in all likelihood happen.

Besides working to broaden its user base, BHL has not forgotten those for whom it was originally designed: the biodiversity research community. The pages in BHL are tagged with the taxon names they contain, which means that the entire library is searchable if a user is looking for a particular genus or species. The word “miraculous” comes to mind when I consider this, and I’ve had fun testing it out with my favorite species, Darlingtonia californica. It’s good to keep in mind that because everything in BHL is open access, much of its collection dates to before 1923 and thus is out of copyright. However, since taxonomy is very much a historical science, particularly in botany, it is important to be able to trace new names back to old ones, and BHL is crucial in doing this. Also, over the past several years it has been increasing its in-copyright holdings through agreements with a number of organizations such as Arnold Arboretum, the Field Museum, and the California Academy of Sciences to host digital copies of some of their in-copyright publications. BHL is expanding in other ways as well. It partnered with the Smithsonian’s Field Book Project, which had been digitizing the field notes of Smithsonian researchers. These are absolutely fascinating and contain valuable information on where and when organisms were sighted and specimens collected. BHL is now continuing this effort as the BHL Field Notes Project by not only hosting the already digitized materials, but also getting 450,000 more pages online through a Digitizing Hidden Special Collections grant from the Council on Library and Information Resources.

If all these connections that BHL has made are impressive, there are still more, including major efforts in using social media to get the word out about the riches it holds. This aspect of the portal will be the subject of my next post.

Digitizing Collections

2 iDigBio

The Digital Data in Biodiversity Research Conference at Ann Arbor, Michigan was cosponsored by the University of Michigan and the iDigBio project, which deals with the digitization of natural history collections at non-government institutions in the United States. iDigBio is a 10-year project now in its sixth year. As its director, Larry Page, noted, it is designed to provide the infrastructure necessary to store and distribute the results of natural history specimen digitization efforts and also offer training and tools to support these projects. In addition, it aims to encourage development of a community to further this work and to ensure that these electronic resources are maintained and upgraded in the future. That is obviously a tall order, and just how tall became clearer during the two-day conference.

The first general sessions set the stage with Maureen Kearney of the Smithsonian arguing for the importance of “liberating” data from the paper silos where they have been kept and also for including paleobiological information to provide a longer view. Pam Soltis of the Florida Museum of Natural History at the University of Florida discussed the difficulties of linking heterogeneous data, for example, information on specimens, genomics, and phylogeny. Yes, there are data sets dealing with each for many species, but the challenge is to make it all available through one portal. Issues include locating disparate data and dealing with its patchiness and with format differences. There are also vagaries of taxonomic names and of finding ways to get these systems to talk to each other. Progress is being made, particularly in the automation of some phases, such as recording label data using optical character recognition systems, but this work takes a great deal of time and money, and it’s never finished, as maintenance is a key issue.

Next came Donald Hobern, executive secretary of GBIF, the Global Biodiversity Information Facility, to which the US contributes data in the form of information not only on specimens but on species occurrences. From the GBIF portal, researchers can create species checklists for particular areas and also access data on particular taxa. The GBIF network has over 700 million georeferenced occurrence records, making it a massive resource. Organizationally, it is divided into geographic nodes, with each node responsible for inputting and maintaining its data. In the afternoon, I attended the session on the North American node, which includes contributions from Canada and the United States. There Hobern spoke again, outlining the network’s three main goals. The first is to remove obstacles to collaboration in the sharing and use of biodiversity data, in other words, to provide tools that allow for uploading and maintaining data in a usable form. Second is to organize evidence of recorded occurrence of any species in time and space, that is, users should be able to access data on species occurrences worldwide or within a particular geographic area and timeframe. Finally, GBIF aims to support the development of a global virtual natural history collection. In one sense, this goal has already been met because there is so much data in GBIF from so many areas, but it is hardly complete in terms of extent or data richness. In order to function at such a large scale, GBIF can only provide limited information on each occurrence. However, the infrastructure that GBIF has created and is continuing to develop is a firm foundation for a richer and more robust information system in the future. An indication of this is in Science Review 2017, its annual review of the scientific articles published over the past year using GBIF data. Along with this is a bibliography of these 438 peer-reviewed articles.

The next speaker presented still another acronym, or really two. Gerald “Stinger” Guala of the US Geological Survey is director of both BISON (Biodiversity Information Serving Our Nation) and ITIS (Integrated Taxonomic Information System). BISON provides access to 375 million US occurrence records, including 275 million in GBIF. However, for US records, more data on some records are available than just what’s in GBIF. Essentially, BISON is a clearinghouse for US government information on natural history collections. It cleans the data, formats it, takes quality control measures, and allows for data discovery. One of its major services is providing checklists at the local, state, and national levels; a user can draw a map around an area and get a species checklist for it. Datasets on particular areas or species are also downloadable. ITIS is more limited in scope; its aim is to provide stable nomenclature. It is linked to the Catalogue of Life, a worldwide database that publishes an annual checklist with over 1.7 million species. The biggest difficulty for the latter, as discussed by its director Tom Orwell of the Smithsonian, is how to deal with synonyms. This is a tough problem for all taxonomy and for all biodiversity projects, as noted by Stephen Garnett and Les Christidis (2017) in a recent Nature article on how “taxonomic anarchy” impedes conservation efforts. To put it simply: it’s difficult to enforce regulations on an endangered species if its name changes.

These presentations were followed by two about Canadian projects: James Macklin spoke on CBIF, Canada’s GBIF node, and Anne Bruneau on Canadensys, which aims to provide richer information on species than that available in GBIF. Jon Coddington of the Global Genome Biodiversity Network (GGBN) then brought up a whole different set of issues, namely those involved in storing genetic information, both sequences and specimen data. And Martin Kalfatovic, the program director of the Biodiversity Heritage Library (BHL), discussed its role in providing links to relevant literature. In all, this was a mind-bending session that helped me see the differences among the many portals I have come across as I try to educate myself botanically and technologically. In the next post, I’ll discuss some even more ambitious projects that move into the 3D realm.

Reference

Garnett, S. T., & Christidis, L. (2017). Taxonomy anarchy hampers conservation. Nature, 546, 25.