Competency G – Joshua D Simpson ePortfolio

Demonstrate understanding of basic principles and standards involved in organizing information such as classification and controlled vocabulary systems, cataloging systems, metadata schemas or other systems for making information accessible to a particular clientele.

A fundamental mandate for information organizations is to provide their patrons with logical and flexible ways to find the resources they’re looking for. Information professionals are constantly called upon to show users how to employ these tools and methods to locate the specific items they wish to access, as well as others in the subject areas that they’re interested in. As Linda Smith points out, however, it’s important to remember that “a seemingly simple [reference] question may turn into a puzzle requiring the use of multiple sources, including catalogs and bibliographies” (2016). Whether or not users are able to successfully execute their own searches, connecting them with the particular documents and materials that satisfy their information needs is a central duty of librarianship. At a minimum, librarians must know how to look up and interpret human-readable catalog and other records in several different systems and interfaces. This responsibility requires a way to find those documents, which in turn depends on implementing organizational frameworks such as classification call numbers based on subject categories or a metadata schema that uses a controlled vocabulary.

The ALA Glossary of Library and Information Science offers this definition for a classification standard: “a particular series or system of classes arranged in some order according to some principles or concepts, purpose or interest or some combination of such” (2013). Classes are first ordered systematically to accommodate known works as well as future subjects of literature and other materials. Whether in a library’s stacks or in its catalog, a classification’s conceptual schedule is meant to group items first by broad subject relatedness and then, step by step, into closer subject proximity; this bringing together of related items is called collocation. The aim is to impose a “permanent and helpful order” (Librarianship Studies, 2015) on the collection, one that makes library materials discoverable on the shelves through catalog records listing a topic and its respective call number. The order inherent in a classification standard is expressed in a sequential notation, such as a call number, which makes it possible to arrange items logically on shelves and to determine the exact location of a given item. The notation is usually composed of a class number, which provides a class (subject) designation; a book number, which represents the item’s author; and a collection designation, which indicates the collection the item belongs to. The use of widely adopted classification standards like the Dewey Decimal Classification (DDC) and the Library of Congress Classification (LCC) makes bibliographic records understandable and accessible across institutions and to other parties who use these systems.
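
To make the idea of notation-driven shelf order concrete, here is a minimal Python sketch that sorts call numbers by their parts. It assumes a simplified, hypothetical DDC-style format (an optional alphabetic collection code such as "REF", a class number, and a Cutter-style book number); the function name `shelf_key` and the sample call numbers are invented for illustration, and real DDC and LCC filing rules cover many more cases (dates, volume numbers, LCC's letter-based classes).

```python
from decimal import Decimal

def shelf_key(call_number: str) -> tuple:
    """Sort key for a simplified, hypothetical DDC-style call number,
    e.g. "REF 025.431 S56" or "025.5 A12"."""
    parts = call_number.split()
    # An optional leading alphabetic token is treated as the collection code.
    collection = parts.pop(0) if parts[0].isalpha() else ""
    class_number, book_number = parts[0], parts[1]
    # Class numbers file as decimals, so 025.431 comes before 025.5.
    class_key = Decimal(class_number)
    # Cutter-style book numbers file by letter, then by digits read as a
    # decimal fraction, so S56 (S .56) comes before S6 (S .6).
    letter, digits = book_number[0], book_number[1:]
    book_key = Decimal("0." + digits) if digits else Decimal(0)
    # Group by collection first, then arrange by class and book number.
    return (collection, class_key, letter, book_key)

shelf = ["025.5 A12", "025.431 S6", "REF 020 B45", "025.431 S56", "020 Z9"]
for call_number in sorted(shelf, key=shelf_key):
    print(call_number)
```

Sorting on a tuple mirrors the structure of the notation: collection first, then class (subject), then book number, which is what lets items on related subjects stand together on the shelf.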

Until recently, classification schemes have most often been organized by way of a hierarchical, or enumerative, tree-type structure that contains a “full set of entries for all concepts” (Chan, 2005). In this approach, classifying an item begins with identifying its broad category from the set of concepts and then analyzing whether further class subdivisions are needed to describe the subject more precisely. The last step is to assign a “mutually exclusive and exhaustive” subject category. This final class corresponds to one call number; an item cannot be assigned to another class or have multiple call numbers. Library classification can also be based on a faceted approach, in which an item can belong to multiple categories simultaneously (usually simpler, discrete category terms), and its final subject representation is a post-coordinate compound of relevant categories chosen by a searcher. Among the advantages faceted classification has over its hierarchical sibling is more versatile information retrieval: multi-category faceted classes and additional access points increase the chance that a given search query will match terms in a faceted record. Users can combine multiple terms and fields in a search, adding alternative attributes to broaden the results (improving recall) or filtering on attributes to narrow them (improving precision). Established hierarchical classification standards like DDC and LCC are still widely used (most public and school libraries organize their collections according to DDC, while most academic libraries use LCC) and appear to have long-term staying power. Nonetheless, faceted classification systems and controlled vocabulary metadata approaches have been gaining ground, either augmenting or entirely replacing older hierarchical classification systems.
A macro trend in classification has been towards combining fixed conceptual taxonomies with facets that allow scope control and add further specificity and semantic depth to the topic.
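
The post-coordinate, faceted retrieval described above can be sketched in a few lines of Python. This is a simplified illustration, assuming records that carry sets of values per facet; the `Record` class, the `facet_search` function, and the sample records are hypothetical. Requiring every listed facet value (AND) narrows results toward precision, while accepting any of several values (OR) broadens them toward recall.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    title: str
    # Each facet (e.g. "subject", "place", "format") may hold several values.
    facets: dict = field(default_factory=dict)

def facet_search(records, all_of=None, any_of=None):
    """Post-coordinate retrieval: the searcher, not the classifier, combines facets.

    all_of: {facet: value}; every pair must match (AND narrows, favoring precision).
    any_of: {facet: {values}}; at least one value per facet must match
            (OR broadens, favoring recall).
    """
    all_of = all_of or {}
    any_of = any_of or {}
    hits = []
    for record in records:
        if not all(value in record.facets.get(name, set())
                   for name, value in all_of.items()):
            continue
        if not all(record.facets.get(name, set()) & values
                   for name, values in any_of.items()):
            continue
        hits.append(record.title)
    return hits

# Hypothetical sample records for illustration.
catalog = [
    Record("Coastal Birds of California",
           {"subject": {"Birds"}, "place": {"California"}, "format": {"Book"}}),
    Record("Birdsong Recordings", {"subject": {"Birds"}, "format": {"Audio"}}),
    Record("California Wildflowers",
           {"subject": {"Plants"}, "place": {"California"}, "format": {"Book"}}),
]

print(facet_search(catalog, all_of={"subject": "Birds", "place": "California"}))
print(facet_search(catalog, all_of={"format": "Book"},
                   any_of={"subject": {"Birds", "Plants"}}))
```

Unlike a single enumerative class, no record here has one "final" subject; each query assembles the compound on the fly.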

Sequential classification and metadata-powered online cataloging coexist. Even though they’re based on different concepts and technologies, there’s no reason why a metadata catalog record can’t have fields for classification call numbers, and it’s commonplace for libraries to use different classification and bibliographic control systems simultaneously. This speaks in part to a continued need for a tried-and-true way to find an item and its kindred subject items on a library shelf via classification, and a concurrent need to create (from the cataloger’s point of view) and read (from the user’s perspective) more descriptive surrogate records in online catalogs and databases. Within one university library there may be catalogers using metadata schemas such as MARC, Dublin Core or BIBFRAME to create bibliographic records for local or shared network catalogs, processors applying LCC call numbers based on the catalog subject(s), technical services staff adding PREMIS and MODS metadata to newly digitized materials, and archivists using Encoded Archival Description (EAD). Discovery tools are, in a sense, layered over the products of this work, offering integrated access to and search of library catalogs and other bibliographic indexes, electronic scholarly content from serials vendors, and in some cases archival / special collections holdings. These tools have lessened the need for discovery to be built on and within a library’s ILS / LMS. A primary goal of both classification and controlled vocabulary systems is to make it easier for users (including librarians) to search and retrieve records by one of several access points (metadata fields such as author, subject and publisher) in a catalog, database, search engine or finding aid. A secondary goal for records created using metadata and controlled vocabularies is sharing these bibliographic entries in cooperative and union catalogs.

The world of library metadata and cataloging began a paradigmatic shift towards computer-mediated and networked systems, and away from card cataloging, with the advent of the MARC standard in the late 1960s. This opened the door to bibliographic indexing based on the submission of machine-readable records that used controlled vocabularies and metadata schemas, an area that continues to evolve over 50 years later. Metadata design and controlled vocabularies are frequently planned and implemented in an integrated manner, though the process is complex enough that it often requires the work of several people, including a “taxonomist” (an expert in controlled vocabulary) and a metadata “architect.” Catalogers choose taxonomic or other types of subject representation for each item, and the metadata and the assigned vocabulary are then fused in a bibliographic record in an index or database. Developers of OPACs must have an intimate understanding of the metadata format, which fields will be used as access points and facets, search interface specifications, when to use a vocabulary-based input or a free text search, how records will be displayed, and other details. The ultimate beneficiaries of these efforts are anyone who wishes to search the library catalog and, to a lesser extent, participants in shared network catalogs and their users.

The earliest use of metadata arguably goes back to ancient libraries in which the names of texts were written on stone tablets. In the 21st century, metadata has somewhat different connotations; namely, it is used to describe digital files and resources like photos and websites. There are three main types of metadata used in a library context: administrative, structural and descriptive. These types of metadata help librarians organize electronic resources, facilitate interoperability between software versions and operating systems, and support archiving and preservation, among other tasks. Descriptive metadata is the most relevant for bibliographic cataloging and “resource discovery.” NISO furnishes a short list of resource discovery objectives that metadata can help accomplish: “allowing resources to be found by relevant criteria; identifying resources; bringing similar resources together; distinguishing dissimilar resources; and giving location information” (2004). These functions fulfill the same core purpose as classification, but in more flexible and user-friendly ways.

A metadata standard provides a format for creating bibliographic records, acting like a container or wrapper for describing the most important attributes of an item to be cataloged / indexed. The most important element is a resource identifier, which uniquely identifies the object. In theory, a metadata “architect” can design a system solely on the basis of the structure of the given database and the requirements of end users. But beyond having a list of all data and access points and their requisite formats, it’s an advantage for metadata creators to know how the taxonomy maps to particular fields of the metadata-defined record. The inverse is also true: taxonomists should know what the metadata schema includes and what demands or limitations it might place on their controlled vocabularies. It is their subject vocabulary, based on an internal logic and structure (hierarchical, ontological, etc.) they created, that will end up supplying the values for many key bibliographic features in the metadata record. Heather Hedden observes that there are often additional metadata fields that are “beyond the scope and definition of ‘taxonomy’ that are nevertheless made available to the end-user to filter/refine results alongside the other, taxonomy facets,” including important fields like author/creator, date, title keyword, text keyword, file format, etc. (2017). This highlights the need for metadata architects and taxonomists to work in tandem with UX designers creating the search interface, all developing their respective aspects of the cataloging system in an integrated way. Unfortunately, most information organizations do not have all of these specialists on staff; selecting a vendor to complete the work may be the best option.

MARC (Machine-Readable Cataloging) records require a good deal of specialization for human catalogers to work with directly, though there are many tools for editing MARC records, converting them to and from other metadata formats, and so on. For example, OCLC offers an application, the WorldShare Record Manager, that “allows [users to] create new and enrich existing items in WorldCat with efficient, record-at-a-time metadata management for your physical and electronic materials using either a MARC 21 editor or a Text View editor” (2020). For several decades it has also been easy to use catalog copy from a WorldCat bibliographic record, which was likely first created using MARC. Dublin Core greatly reduced the complexity of “resource description” by defining a minimal, much easier-to-read set of metadata elements, and the resulting XML schema has propagated widely. The Library of Congress is promoting a relatively recent bibliographic metadata schema called BIBFRAME, which has seen some adoption by prominent academic libraries.
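
To show how compact a Dublin Core description is compared with a full MARC record, here is a minimal Python sketch that serializes a record in the simple `oai_dc` XML form using only the standard library. The namespaces and the 15 element names are those of the Dublin Core Metadata Element Set and the OAI-PMH `oai_dc` format; the function name `dublin_core_xml` and the identifier value are hypothetical, the bibliographic values are taken from this page's reference list, and real `oai_dc` records usually also carry an `xsi:schemaLocation` attribute that is omitted here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

def dublin_core_xml(elements):
    """Serialize (element, value) pairs as a simple oai_dc record.
    Every element is optional and repeatable (e.g. two creators)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in elements:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record_xml = dublin_core_xml([
    ("identifier", "urn:example:record-0001"),  # hypothetical identifier
    ("title", "Reference and information services: An introduction"),
    ("creator", "Smith, L.C."),
    ("creator", "Wong, M.A."),
    ("publisher", "Libraries Unlimited"),
    ("date", "2016"),
    ("type", "Text"),
])
print(record_xml)
```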

“Crosswalking” and “metadata harvesting” refer, respectively, to mapping the elements of one metadata schema to those of another to improve interoperability between different XML schemas, and to automatically aggregating metadata descriptions from multiple sources. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) facilitates data exchange on the internet and has thereby expanded access to library collections. OAI-PMH has established a strong foothold in digital archives, institutional repositories and digital libraries, among other domains. RDA (Resource Description and Access), released in 2010 as the successor to AACR2, has, with the backing of the Library of Congress, rapidly become a very popular standard for descriptive library cataloging. By adopting “a single, streamlined, and logically consistent model that covers all aspects of bibliographic data and that at the same time brings the modeling up-to-date with current conceptual modeling practices” (2021) and by supporting linked data, RDA makes it possible to represent its entities, elements, relationship designators, and controlled terms (value vocabularies) in RDF. RDA has since been restructured to conform to IFLA’s Library Reference Model, which itself “seeks to reveal the commonalities and underlying structure of bibliographic resources” (LRM 2.1, 2017).
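
Both ideas can be sketched in Python. The crosswalk below is a small, simplified subset loosely modeled on the kind of MARC-to-Dublin Core mapping the Library of Congress publishes; it represents MARC fields as plain `(tag, [(code, value), ...])` tuples rather than any particular library's API. The `list_records_url` helper builds an OAI-PMH `ListRecords` request using the protocol's `verb`, `metadataPrefix`, `from`, `set`, and `resumptionToken` arguments. Function names, the sample data, and the repository base URL are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Simplified subset of a MARC 21 -> Dublin Core crosswalk, keyed by
# (tag, subfield code). Illustrative only: a complete crosswalk covers many
# more fields and also cleans ISBD punctuation out of the values.
MARC_TO_DC = {
    ("020", "a"): "identifier",
    ("100", "a"): "creator",
    ("700", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}

def crosswalk(marc_fields):
    """Map [(tag, [(code, value), ...]), ...] to {dc_element: [values]}."""
    dc = {}
    for tag, subfields in marc_fields:
        for code, value in subfields:
            element = MARC_TO_DC.get((tag, code))
            if element is not None:  # unmapped data is dropped, a typical crosswalk loss
                dc.setdefault(element, []).append(value)
    return dc

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a harvester."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent on its own.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

marc = [
    ("700", [("a", "Smith, L.C.")]),
    ("700", [("a", "Wong, M.A.")]),
    ("245", [("a", "Reference and information services")]),
    ("260", [("a", "Santa Barbara, California"),
             ("b", "Libraries Unlimited"), ("c", "2016")]),
]
print(crosswalk(marc))
print(list_records_url("https://repository.example.edu/oai", from_date="2021-01-01"))
```

Note that the place of publication (260 $a) has no target in this reduced mapping and is silently lost, which is why crosswalks are usually described as lossy when moving from a richer schema to a simpler one.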

There are many kinds of controlled vocabularies (thesauri, subject headings, taxonomies, ontologies and more), and I have to admit that I sometimes lose sight of the forest for all the trees. I look forward to getting hands-on experience in this area of librarianship so that I can master the basics, clear up some of the lingering conceptual fog, and find out whether I’m drawn to this discipline. The Library of Congress Subject Headings (LCSH) is a very robust controlled vocabulary that the majority of US libraries use to catalog books. FAST (Faceted Application of Subject Terminology) combines the rich LCSH vocabulary (300,000+ terms) with a simplified syntax. This gives catalogers a general subject vocabulary that is easy to learn and apply, and gives users navigation-friendly, faceted records. Many other vocabularies serve contextually specific settings, such as MeSH (Medical Subject Headings) and the Art & Architecture Thesaurus, where “more nuanced access to the literature of a particular field” (Smith & Wong, 2016) is needed. The core feature of all controlled vocabularies is the use of authorized terms chosen by the vocabulary’s developers. These can be stored in a list, an index or a database, and referred to as a thesaurus, taxonomy, ontology, authority file, etc. The main practical effect is that a cataloger or other record creator can use only the exact terms from the authorized list to describe a resource, although a record can be “tagged” with multiple subject headings. Such controlled fields stand in contrast to fields, also found in many metadata records, that hold natural-language “free text” or keywords, such as abstracts or the full text used in full-document searching.
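
The rule that only authorized terms may be assigned can be sketched in Python as a tiny authority file with "used for" variants. The headings and variants below are hypothetical and are not taken from LCSH; the names `AUTHORITY`, `LOOKUP`, and `assign_subjects` are likewise invented for illustration. Variant terms are redirected to their authorized heading, unknown terms are rejected, and a record may still receive several distinct headings.

```python
# Hypothetical mini authority file (not drawn from LCSH): each authorized
# heading lists the variant "used for" terms that should point to it.
AUTHORITY = {
    "Academic libraries": {"University libraries", "College libraries"},
    "Information literacy": {"Library literacy"},
    "Cataloging": {"Cataloguing"},
}

# Case-insensitive lookup from every authorized or variant term to its heading.
LOOKUP = {}
for heading, variants in AUTHORITY.items():
    LOOKUP[heading.casefold()] = heading
    for variant in variants:
        LOOKUP[variant.casefold()] = heading

def assign_subjects(candidate_terms):
    """Return (authorized headings, rejected terms) for one record.

    Variants are redirected to their authorized heading; terms with no
    entry are rejected, because only authorized terms may be assigned.
    A record may carry several headings, but each only once.
    """
    assigned, rejected = [], []
    for term in candidate_terms:
        heading = LOOKUP.get(term.casefold())
        if heading is None:
            rejected.append(term)
        elif heading not in assigned:
            assigned.append(heading)
    return assigned, rejected

print(assign_subjects(["University libraries", "cataloguing",
                       "Academic libraries", "Folksonomies"]))
```

A free-text field such as an abstract would bypass this check entirely, which is exactly the contrast between controlled and natural-language fields.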
The domain of classification and cataloging is dynamic, spurred by continuing innovations in online catalog design and the proliferation of new and evolving XML metadata schemas. Smith and Wong discuss “next generation” catalogs, which are becoming more user-centered by offering a “single search box, state-of-the-art Web interface, enriched content (book cover images, user tags, reviews), faceted navigation (refining results based on dates, languages, formats, locations, etc.), relevancy ranking of search results, did you mean …? and recommendations of related materials” (2016).

Underscoring the critical role that catalogers play within these systems, the ethical dimensions of their work, and the repercussions of their decisions, Sheila Bair describes catalogers as the “foundation of all library service, as they are the ones who organize information in such a way as to make it easily accessible” (2005). A number of writers have weighed in on the power wielded (often unconsciously) by metadata system designers, vocabulary developers and catalogers, and on the potential for resulting biases in the outputs of these systems. Ferris points out that catalogers (and, I would add, metadata and vocabulary creators) have an ethical responsibility to uphold the “integrity” of the catalog. Their highest purpose is to make sure that “the library’s catalog is a reliable source of current, coherent, and objective information that is appropriate to the needs of the catalog’s users,” and this mission adds value to the bibliographic control process, ultimately improving resource findability (Ferris, 2008). Bair argues that this power can be used to deny access to information or to misrepresent people and groups with devaluing language, whether intentionally or not (2005). Recognition of the need for discussion of, and consensus around, a code of ethics for cataloging has been growing over the past couple of decades. An ALA-sponsored e-Forum in 2017 entitled “Power that is Moral: Cataloging and Ethics” summarized one of the core problems: “instructions when applying subject headings are framed in the arguably unattainable prescription to strive for neutrality” (2017).
The e-Forum explored a number of related topics about cataloging, ethics and user communities and participants agreed that if a document laying out guidelines on these issues were to be written, it should “be flexible, written in natural, non-jargony, language; the purpose of such a document would be to help catalogers weigh options rather than prescribe cataloging practice” (2017). Patricia Kennedy offers a related but different tack on these questions, agreeing that designing and applying cataloging systems is not intrinsically neutral, but arguing for the recognition of subjective decision-making in metadata design: “Acknowledging the importance of perspective is at the heart of effective facet analysis for metadata application and inclusion” (2008).

References

Chan, L.M. (2005). Library of Congress Subject Headings: Principles and application. Westport, Conn: Libraries Unlimited.

Haider, S. (2021). Glossary of Library & Information Science. In Librarianship Studies and Information Technology. Retrieved April 10, 2021 from https://www.librarianshipstudies.com/2015/04/glossary-of-library-information-science.html

Hedden, H. (2017). Metadata and Taxonomies. In Hedden Information Management. Retrieved April 6, 2021 from http://www.hedden-information.com/metadata-and-taxonomies

IFLA Library Reference Model (LRM). (2021). Retrieved April 6, 2021 from https://www.ifla.org/publications/node/11412

Implementation of the LRM in RDA. (2017). In RDA Steering Committee. Retrieved April 2, 2021 from http://www.rda-rsc.org/ImplementationLRMinRDA

Kennedy, P. (2008). Manifestations of metadata: From Alexandria to the Web – old is new again. In The Australian Library Journal. https://doi.org/10.1080/00049670.2008.10722461

Levine-Clark, M., & Dean, T. (2013). ALA Glossary of Library and Information Science (4th ed.). Chicago: ALA Editions.

Power that is Moral: e-Forum Summary. (2017). In ALCTS News. Retrieved April 10, 2021.

Riley, J. (2017). Understanding Metadata: What is Metadata, and What is it for? Retrieved April 6, 2021 from https://groups.niso.org/apps/group_public/download.php/17446/Understanding%20Metadata.pdf

Smith, L.C. & Wong, M.A. (2016). Reference and information services: An introduction. Santa Barbara, California: Libraries Unlimited.

WorldShare Record Manager. (2020). In OCLC.org. Retrieved April 5, 2021 from https://help.oclc.org/Metadata_Services/WorldShare_Record_Manager

Evidence for Competency G

Evidence 1

For INFO 202 (“Information Retrieval System Design”), a team project introduced us in a hands-on fashion to the core concepts and practical aspects of creating a controlled vocabulary for a target audience. It was the largest assignment in the course, and it unfolded in stages over most of the semester. We began by choosing as our target audience a specific kind of MLIS student, based on one of the career pathways defined by the SJSU iSchool faculty: those who wished to work in academic libraries. Among a host of presumptions we made about this group were that they had strong interests in trends in scholarly research, in teaching information literacy in an academic setting, in understanding how patrons of academic libraries want to access and use collections, databases, and archived materials, and in the research and study habits of particular user groups. Creating a profile of this group and their academic interests was necessary to select nine scholarly articles relevant to these MLIS students, in addition to one that had already been chosen for us. I chose two articles and wrote summaries of them, which the group analyzed to generate lists of the concepts the articles addressed. These concepts, some broad and some narrow, were used to create a list of “draft terms,” i.e., the strongest terms and phrases in the concept list. We grouped these draft terms by theme / conceptual proximity, scrubbing synonyms and unnecessary variations, and from these groups we selected the best “descriptors” to comprise our controlled vocabulary. After finalizing the controlled vocabulary, we went back through our list of 10 articles and assigned each one the most appropriate terms from the vocabulary. All members of the team took part in each phase of generating and filtering terms, and we progressed by consensus over the long duration of the project.
It would have been interesting to see what the next steps might look like in using our vocabulary as an authority index in an information retrieval system, but I think that would entail a fairly large step up in technical know-how.

INFO-202-Project-2-1

Evidence 2

In a course on Archives and Manuscripts (INFO 256), we spent a significant amount of time discussing finding aids and MARC records, both of which are relevant to this competency. While we glanced at the use of MARC 21 files to create finding aids in online archives and at the provision in MARC for specifying the URI / location of a finding aid, for the most part we looked at the two subjects separately. The assignment I present here was an exercise in creating a MARC record for a letter from the John Swett Papers held at UC Berkeley’s Bancroft Library. We were provided with scans of a letter and postcard from an “A. Harris” to “Mrs. John Swett” and asked to create values for a MARC record based on our best effort to read and interpret the letter. We were given a Single Letter Cataloging Form containing notes on specific MARC fields, which ended up being a helpful point of reference. The assignment was an exercise both in patient close reading of a manuscript written in difficult handwriting and in identifying the relevant metadata to encode with MARC tags. While the General Note (500) and Summary, Etc. (520) fields made up the bulk of the MARC record, many other fields were used, and collectively they provided a helpful crash course in MARC tags and indicators.
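
Because the assignment centered on MARC tags and indicators, a minimal Python sketch of how a variable data field is structured may be useful. It models a field as a three-digit tag, two one-character indicators, and subfield code/value pairs, and prints it in a line-per-field text view loosely following the mnemonic style used by editors such as MarcEdit (treat that display convention as an assumption, not a formal standard). The `DataField` class is hypothetical, and the field values are illustrative placeholders, not the record created for the assignment.

```python
from dataclasses import dataclass, field

@dataclass
class DataField:
    """A MARC 21 variable data field: a three-digit tag, two one-character
    indicators, and a list of (subfield code, value) pairs."""
    tag: str
    ind1: str = " "  # a blank indicator is stored as a space
    ind2: str = " "
    subfields: list = field(default_factory=list)

    def mnemonic(self) -> str:
        # Line-per-field text view; blank indicators are shown as a backslash.
        indicators = (self.ind1 + self.ind2).replace(" ", "\\")
        data = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"={self.tag}  {indicators}{data}"

# Illustrative placeholder values only.
record = [
    # 100: personal name main entry; first indicator 1 = surname first.
    DataField("100", "1", " ", [("a", "Harris, A.")]),
    # 245: title; ind1 1 = title added entry, ind2 0 = no nonfiling characters.
    DataField("245", "1", "0", [("a", "[Letter to Mrs. John Swett]")]),
    DataField("500", subfields=[("a", "General note text.")]),
    DataField("520", subfields=[("a", "Summary of the letter's contents.")]),
]
for data_field in record:
    print(data_field.mnemonic())
```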

INFO-256-Cataloging-Assignment-Josh-Simpson