By Kristen St.John and Athanasios Velios
On June 6-7, 2019, the Linked Conservation Data Consortium held its first in-person meeting at Stanford Libraries’ Conservation Lab in Redwood City, California. Twenty-two participants met to learn how terminology is used in Linked Data, to evaluate the current state of vocabularies used in conservation, and to identify ways of sharing conservation records through common terminology. The workshop was arranged in three parts.
The first afternoon served as an introduction to basic technical concepts. Jon Ward of the Getty Vocabulary Program discussed the structure of Knowledge Organization Systems (KOS), which include controlled lists, glossaries, and thesauri. Athanasios Velios of the University of the Arts London then walked participants through the key concepts behind Linked Data.
Linked Data relies on the Resource Description Framework (RDF), which structures records as triples: each triple links two entities through a relationship (subject, property, object). Athanasios used an example from a conservation survey in which a conservator noted that “this book has this spine lining.” The spine lining is the subject, the spine of the book is the object, and the relationship between them can be formally expressed as “forms part of.” Such relationships are defined by the CIDOC Conceptual Reference Model (CIDOC-CRM), a standard for cultural heritage information developed by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). Other relationships include “used general technique,” which links the production of an object to the techniques used, and “consists of,” which links an object to its materials. A further relationship, “has type,” connects objects to vocabulary terms such as “book spine” and “spine lining.”
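A minimal sketch of the survey example as triples may help. It is plain Python with no RDF library. All example.org URIs are hypothetical placeholders, and the CRM property names (P46i_forms_part_of, P2_has_type, P45_consists_of) are written in the style of the CIDOC-CRM RDFS encoding but should be checked against the release in use. With “forms part of,” the part is the subject and the whole is the object.

```python
# Sketch only: URIs under example.org are hypothetical placeholders, and the
# CRM property names should be checked against the CIDOC-CRM RDFS release in use.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/"

# Each record is a (subject, property, object) triple.
triples = [
    # "this book has this spine lining": the part is the subject, the whole the object
    (EX + "lining/1", CRM + "P46i_forms_part_of", EX + "book/1/spine"),
    (EX + "book/1/spine", CRM + "P46i_forms_part_of", EX + "book/1"),
    # "has type" connects each physical thing to a vocabulary term
    (EX + "book/1/spine", CRM + "P2_has_type", EX + "term/book-spine"),
    (EX + "lining/1", CRM + "P2_has_type", EX + "term/spine-lining"),
    # "consists of" connects a thing to its material (hypothetical term)
    (EX + "lining/1", CRM + "P45_consists_of", EX + "term/paper"),
]


def objects(subject, prop):
    """Return every object linked to `subject` through property `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def subjects(prop, obj):
    """Return every subject linked to `obj` through property `prop`."""
    return [s for s, p, o in triples if p == prop and o == obj]
```

Asking `objects(EX + "lining/1", CRM + "P2_has_type")` returns the spine-lining term. A production system would normally hold such triples in an RDF store and query them with SPARQL rather than scanning a Python list.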
Professor Marcia Zeng of Kent State University then elaborated on how vocabularies can be structured for sharing as Linked Data. She introduced the SKOS (Simple Knowledge Organization System) standard, which distinguishes between a term, the word used in a given language, and the concept it represents. Her example came from agriculture: the English word “rice” and the French word “riz” both refer to the same concept, the seed of the plant Oryza sativa, and that concept may have relationships to other concepts. Rice is a type of cereal, so it is a “narrower concept” than cereal. Rice is also “related to” corn, as both are cereals. This structure allows communities to use different words for the same concept while preserving these relationships: one could use either “rice” or “riz,” and the relationships to “cereal” and “corn” are maintained regardless. The same principle can be applied to conservation terms and concepts.
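The rice example can be sketched as a tiny SKOS-like structure. The field names mirror skos:prefLabel, skos:broader and skos:related, but this is a hypothetical in-memory model, not a SKOS library; the ex: identifiers and the French labels for cereal and corn are added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict[str, str]  # language tag -> preferred label
    broader: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


# Hypothetical concept scheme; the concepts, not the words, carry the relationships.
scheme = {
    "ex:cereal": Concept("ex:cereal", {"en": "cereal", "fr": "céréale"}),
    "ex:rice": Concept("ex:rice", {"en": "rice", "fr": "riz"},
                       broader=["ex:cereal"], related=["ex:corn"]),
    "ex:corn": Concept("ex:corn", {"en": "corn", "fr": "maïs"},
                       broader=["ex:cereal"], related=["ex:rice"]),
}


def find_concept(label):
    """Resolve a label in any language to its concept, or None if unknown."""
    for concept in scheme.values():
        if label in concept.pref_labels.values():
            return concept
    return None


def narrower(uri):
    """Concepts whose broader link points at `uri` (the inverse of broader)."""
    return [c.uri for c in scheme.values() if uri in c.broader]
```

Looking up either “rice” or “riz” returns the same concept, so `find_concept("riz").broader` still leads to cereal whichever word a community prefers.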
We wrapped up the afternoon with Jon Ward discussing the Getty Vocabulary Program, with particular reference to conservation terms in the Art & Architecture Thesaurus (AAT). The AAT is a rich source of conservation terminology and is already available for use in Linked Data applications. The Getty has been streamlining the process for contributing new terms and encourages the growth of conservation terminology in its resources.
On Friday morning, the goal was to move beyond introductory primers to the current state of terminology in conservation and the options for making greater use of it through Linked Data. The day began with Kristen St.John of Stanford Libraries summarizing the results of a short questionnaire on the current use of conservation glossaries and thesauri, which was compiled as part of the LCD project (see also the article on p. 22 of NiC Issue 71, April 2019). Marcia Zeng then discussed ways to combine and align terms from multiple thesauri. Specifying which terms refer to the same concept, even when they come from different conservation vocabularies, allows records from multiple datasets to be shared even if they were produced using different vocabularies. Conservators can thus keep using their preferred vocabularies, while colleagues searching with terms from other vocabularies can still retrieve the resulting records.
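The following sketch shows why alignment helps retrieval. Two hypothetical vocabularies index records with their own identifiers; a list of pairs read as skos:exactMatch lets a search on a label from either vocabulary also find records indexed with the equivalent term from the other. All identifiers, labels, and mappings are invented for illustration.

```python
# Hypothetical vocabularies: identifier -> label.
vocab_a = {"a:101": "spine lining", "a:102": "endband"}
vocab_b = {"b:7": "back lining", "b:8": "headband"}

# Hypothetical alignment, read as skos:exactMatch.
exact_match = [("a:101", "b:7"), ("a:102", "b:8")]

# Records from two collections, each indexed with its own vocabulary.
records = [
    {"id": "rec-1", "terms": ["a:101"]},
    {"id": "rec-2", "terms": ["b:7"]},
    {"id": "rec-3", "terms": ["b:8"]},
]


def equivalents(uri):
    """Return `uri` plus every identifier reachable through exactMatch links."""
    found, stack = {uri}, [uri]
    while stack:
        current = stack.pop()
        for x, y in exact_match:
            # exactMatch is symmetric and transitive, so follow pairs both ways
            # and keep walking until no new identifiers turn up.
            for source, target in ((x, y), (y, x)):
                if source == current and target not in found:
                    found.add(target)
                    stack.append(target)
    return found


def search(label):
    """Find records indexed with any term equivalent to `label`."""
    labels = {**vocab_a, **vocab_b}
    hits = [uri for uri, text in labels.items() if text == label]
    expanded = set().union(*(equivalents(uri) for uri in hits))
    return [r["id"] for r in records if expanded & set(r["terms"])]
```

A search for “spine lining” returns both `rec-1` and `rec-2`, although the second record was indexed with a different vocabulary.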
Athanasios then discussed specific relationship definitions from the CIDOC-CRM ontology in more depth, in preparation for Eleni Tsouhoula’s (Foundation for Research and Technology-Hellas) presentation on the Backbone Thesaurus project. The Backbone Thesaurus enables multiple thesauri to be aligned with the CIDOC-CRM ontology. John Graybeal talked about BioPortal, a resource based at Stanford’s School of Medicine and developed over several years with funding from the U.S. National Institutes of Health. BioPortal allows users to search across multiple vocabulary sources to see how terms are used and defined in the biomedical field. The tool could be adapted for cultural heritage organizations that have multiple Linked Data-ready thesauri or glossaries.
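As a rough illustration of aligning several thesauri to a shared backbone, the sketch below attaches the top terms of two hypothetical local thesauri to backbone facets and finds the facet of any term by walking up its broader links. The facet and term identifiers are placeholders, not the actual Backbone Thesaurus facets, and the project’s own alignment method may differ.

```python
# Hypothetical local thesauri: term -> broader term (None marks a top term).
local_broader = {
    "bind:spine-lining": "bind:linings",
    "bind:linings": None,
    "paper:tissue": "paper:papers",
    "paper:papers": None,
}

# Hypothetical alignment of each local top term to a shared backbone facet.
top_term_to_facet = {
    "bind:linings": "backbone:physical-components",
    "paper:papers": "backbone:materials",
}


def facet_of(term):
    """Walk up broader links until a top term aligned to the backbone is found."""
    while term is not None:
        if term in top_term_to_facet:
            return top_term_to_facet[term]
        term = local_broader.get(term)
    return None
```

Only the top terms need an explicit alignment; every narrower term inherits its place in the shared structure through its own thesaurus hierarchy.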
In the afternoon, we turned to discussions. Athanasios recapped how controlled vocabularies and the CIDOC-CRM are used together. Combining terms linked to concepts in a SKOS thesaurus (to search for types of things) with the defined relationships of the CIDOC-CRM (to search for how things are related) allows complex queries across large bodies of information or multiple datasets.
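Combining the two kinds of search might look like the sketch below: the SKOS hierarchy answers “is this a kind of lining?” and the CRM “forms part of” relationship answers “what is it part of?”. The data is hypothetical and held in plain Python structures; in practice such a query would usually be written in SPARQL against an RDF store.

```python
# Hypothetical SKOS hierarchy: narrower term -> broader term.
broader = {
    "term:spine-lining": "term:lining",
    "term:board-lining": "term:lining",
}

# Hypothetical CRM-style triples: (subject, property, object).
triples = {
    ("obj:lining-1", "P46i_forms_part_of", "obj:book-1"),
    ("obj:lining-1", "P2_has_type", "term:spine-lining"),
    ("obj:lining-2", "P46i_forms_part_of", "obj:book-2"),
    ("obj:lining-2", "P2_has_type", "term:board-lining"),
    ("obj:clasp-1", "P46i_forms_part_of", "obj:book-3"),
    ("obj:clasp-1", "P2_has_type", "term:clasp"),
}


def is_kind_of(term, ancestor):
    """True if `term` is `ancestor` or lies below it through broader links."""
    while term is not None:
        if term == ancestor:
            return True
        term = broader.get(term)
    return False


def wholes_with_part_of_type(ancestor):
    """Things that have a part typed with `ancestor` or any narrower term."""
    # Step 1 (vocabulary): which parts are some kind of `ancestor`?
    parts = {s for s, p, o in triples
             if p == "P2_has_type" and is_kind_of(o, ancestor)}
    # Step 2 (CRM relationship): what are those parts part of?
    return sorted({o for s, p, o in triples
                   if p == "P46i_forms_part_of" and s in parts})
```

Asking for books with any kind of lining returns `obj:book-1` and `obj:book-2`, even though neither record uses the broader term “lining” directly.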
Athanasios then led a discussion of the alignment options Marcia had outlined in her morning presentation. There are several structural ways to relate different vocabularies to each other. Given the breadth of the Getty’s AAT and the Getty’s willingness to accept conservation additions to its vocabularies, the relationship between the AAT and other vocabularies was one discussion point; the possibility of other structures and relationships was another. These questions were taken up further in the technical break-out session that followed.
The break-out sessions were divided into the aforementioned technical group, a group looking at glossaries and thesauri used in the book and paper field (the largest group of conservators present at the workshop), and a group looking at general conservation thesauri.
The technical group considered different tools for aligning terms from different thesauri and for publishing conservation vocabularies as Linked Data. One option it defined was to designate a “superhero” thesaurus to which new terms could be added or subordinate terminologies aligned. Another set of options involves different degrees of centralization versus independence in how thesauri and glossaries relate to each other or to a reference vocabulary.
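One practical difference between a central “superhero” thesaurus and fully independent vocabularies is the number of mappings to maintain. If each of n vocabularies is aligned to every other one, there are n(n-1)/2 mapping sets; if one vocabulary acts as the hub, the other n-1 are each aligned only to it, and equivalences between any two of them can be derived through the hub. The sketch below assumes this simple counting model and uses hypothetical v1:, v2:, v3: identifiers.

```python
def pairwise_mapping_sets(n):
    """Mapping sets needed when every vocabulary is aligned to every other one."""
    return n * (n - 1) // 2


def hub_mapping_sets(n):
    """Mapping sets needed when one of the n vocabularies serves as the hub."""
    return n - 1


# Hypothetical spoke-to-hub alignments: local term -> hub concept.
to_hub = {
    "v1:spine-lining": "hub:spine-lining",
    "v2:back-lining": "hub:spine-lining",
    "v3:spine-lining": "hub:spine-lining",
    "v1:endband": "hub:endband",
}


def equivalents_via_hub(term):
    """Terms in other vocabularies that share `term`'s hub concept."""
    hub = to_hub.get(term)
    if hub is None:
        return []
    return sorted(t for t, h in to_hub.items() if h == hub and t != term)
```

For ten vocabularies that is 45 pairwise mapping sets against 9 with a hub, which is one argument for a central reference vocabulary; the trade-off is dependence on whoever maintains the hub.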
The groups reviewing vocabularies and glossaries found a mixed landscape. Some are already available as Linked Open Data (AAT, Language of Bindings Thesaurus) or soon will be (RBMS Binding Terms). Others are either small enough (such as the Dictionary of Book and Paper Conservation) or close enough to Linked Data already (such as CAMEO, where each term has its own URL) that publishing them as Linked Data would not be a giant effort. We also noted that many of us use terminology that is not reflected in these glossaries but that we have picked up from other resources and experience.
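For a small glossary, publishing as Linked Data can amount to giving each term a stable URI and serialising it as SKOS. The sketch below writes (slug, label, definition) entries as Turtle; the base URI and the sample entry are hypothetical, and a real publication would also need persistent hosting for those URIs.

```python
def escape(text):
    """Escape characters that would break a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def glossary_to_turtle(base, entries):
    """Serialise (slug, label, definition) entries as a SKOS concept scheme."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{base}scheme> a skos:ConceptScheme .",
    ]
    for slug, label, definition in entries:
        lines += [
            "",
            f"<{base}{slug}> a skos:Concept ;",
            f"    skos:inScheme <{base}scheme> ;",
            f'    skos:prefLabel "{escape(label)}"@en ;',
            f'    skos:definition "{escape(definition)}"@en .',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical base URI and entry, for illustration only.
ttl = glossary_to_turtle(
    "https://example.org/glossary/",
    [("spine-lining", "spine lining", "Material adhered to the spine of a text block.")],
)
```

The resulting text can be loaded by any RDF toolkit that reads Turtle, and each term’s URI can then be cited from conservation records or aligned to other vocabularies.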
In closing the workshop, we found that while we had covered considerable ground by introducing concepts and defining options, no clear path forward has yet emerged for the complex task of aligning and publishing existing conservation vocabularies as Linked Data. Since the workshop, LCD Consortium members have continued to explore terminology options and strategies. We are currently considering a simple, low-maintenance method of sharing aligned vocabularies in a distributed fashion, so that no single organization is burdened with maintenance in the long term. We are also looking at producing a flowchart to guide decisions about methods for aligning terms from different vocabularies.
Recordings of many talks are available on the Linked Conservation Data website: https://www.ligatus.org.uk/lcd/meeting/terminology
We will be meeting in London on September 12-13, 2019 to discuss Modeling for Conservation Data. For more information visit https://www.ligatus.org.uk/lcd/meeting/modelling. For registration visit forms.gle/eh96yn5ogxQ6VNdF8
Dr Athanasios Velios is reader in documentation at the University of the Arts London as part of Ligatus, working on the documentation of conservation practice and modelling data for heritage conservation. He was trained as a conservator and has a PhD in computer applications to conservation. He was the webmaster of the International Institute for Conservation from 2009 until 2017.
Kristen St.John is head of conservation services for the Stanford Libraries. She was previously collections conservator at UCLA and special collections conservator for Rutgers. She has an MLIS and an advanced certificate in conservation from the University of Texas at Austin. Her interests include preservation education, the preservation and dissemination of conservation documentation, and historic bookbinding materials.