An assignment discussing the use of Controlled Vocabulary against the idea of social tagging in metadata (Folksonomy). This assignment was part of the requirements for the class Classification and Subject Indexing for the Diploma in Library and Information Science.
Should libraries discontinue using and maintaining controlled
subject vocabularies?
By Ryan Scicluna
Library Assistant
Outreach Department
University of Malta Library
Tel: 2340 2541
e-mail: ryan.scicluna@um.edu.mt
http://www.um.edu.mt/library
When it comes to information retrieval, one can use multiple search strategies to find
relevant material. However, the most common way to search is to type keywords into a
search engine and hope to find something relevant. Most researchers do not know about
controlled vocabulary (CV), a tool that information professionals use to group items
dealing with a similar topic under one self-describing heading. Keyword searching relies on
natural language and tries to find as much as possible using one specific term. CV, by
contrast, is a specialized, technical way of searching that uses a combination of preferred
terms to retrieve more precise results. Gerard Salton (1989) anticipated the problem with
keyword searching over ten years ago in his research on machine indexing, when he noted
that the output produced by high-recall, low-precision terms tends to burden the user with
unmanageably large piles of retrieved material. Moreover, the choice between keyword and
CV searching is usually presented as an either/or decision, both for system designers and
for users of systems (Peters and Kurth, 1991). In practice, keyword searching should be
used in conjunction with CV subject searching.
Keyword searching looks for words anywhere in a document or bibliographic record, and
this usually retrieves a large number of results, which can be exhausting or even
misleading. It is best to begin a search using keywords, especially when the
researcher is not familiar with the topic being investigated. After exploring several results,
one can identify the subject headings (CV) used for the topic and perform a
search using them. This retrieves fewer items, but a higher proportion of them are likely
to be relevant.
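The trade-off between recall and precision mentioned above can be made concrete with a small calculation. The following Python sketch is illustrative only: the record numbers and both result sets are invented, and the function name precision_recall is an assumption of this example rather than anything taken from the cited studies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: records 1-4 are the ones the researcher needs.
relevant = {1, 2, 3, 4}

# A broad keyword search returns many records, most of them off-topic.
keyword_results = set(range(1, 21))  # records 1..20
# A subject-heading search returns fewer records, most of them on-topic.
subject_results = {1, 2, 3, 9}

kw_precision, kw_recall = precision_recall(keyword_results, relevant)
cv_precision, cv_recall = precision_recall(subject_results, relevant)
print(kw_precision, kw_recall)
print(cv_precision, cv_recall)
```

In this invented case the keyword search finds everything but buries it among sixteen irrelevant records, while the subject search misses one item but returns a far more manageable list.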
Considering all this, the debate still rages on whether subject headings and CV should be
scrapped in favour of keyword searching. There are voices saying that subject headings are
too restrictive. They contend that more flexible ways of finding, such as semantic search,
bring fresh relationships to light, relationships that could never be found using locked-in
controlled vocabularies. The creation of subject headings is expensive, and fewer users are
taking advantage of them (Badke, 2012). Because the creation and use of controlled
vocabularies is very labour-intensive, it has been claimed that thesauri are not cost
effective (Svenonius, 1986). Another argument for scrapping them is that CV has never
caught on outside the library community (Iglesias and Stringer, 2008).
Controlled Vocabulary (subject searching) vs Keyword searching
Harpring (2010) defines CV as an information tool that contains standardized words and
phrases used to refer to ideas, physical characteristics, people, places, events, subject
matter, and many other concepts. Furthermore, in a world where access to information is
increasing exponentially, CVs are helpful since they allow the categorization, indexing and
retrieval of information (Harpring, 2010). Harpring (2010) adds that CV
is encouraged because it is designed to create the greatest possible consistency among
cataloguers by limiting choices of terminology according to the scope of the collection and
the focus of the field being indexed.
CV defines lists of words in which certain terms are chosen as preferred terms, and their
synonyms act as pointers to the preferred terms. It improves technical communication and
provides consistency. Searching for a phrase in a database is more efficient and precise when
working with CV, since it brings all the different terms related to one object together under a
single word or phrase and saves the time otherwise spent searching under each synonym for
that term.
Another serious problem, but a much harder one to document, is the question of what is
being missed. Consider, for example, the many words we use to describe footwear, such as
shoes, sneakers, loafers, heels and sandals. In a large periodical database, if one
searches using just one of these terms, the search will probably yield a few hits, and the user
will have no hint that other related material is available under other keywords/terms
(Micco and Popp, 1994).
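The footwear example can be sketched in a few lines of Python. Everything here is hypothetical: the five records, the single heading "Footwear", the preferred_term table and the two search functions are invented for illustration and do not reproduce any real database or vocabulary.

```python
# Hypothetical mini-database: each record has a title and one assigned heading.
records = [
    {"title": "Caring for leather shoes", "heading": "Footwear"},
    {"title": "A history of sneakers", "heading": "Footwear"},
    {"title": "Sandals in the ancient world", "heading": "Footwear"},
    {"title": "High heels and posture", "heading": "Footwear"},
    {"title": "Knitting winter scarves", "heading": "Knitting"},
]

# Entry vocabulary: variant terms point to the preferred term.
preferred_term = {
    "shoes": "Footwear",
    "sneakers": "Footwear",
    "loafers": "Footwear",
    "heels": "Footwear",
    "sandals": "Footwear",
    "footwear": "Footwear",
}


def keyword_search(term):
    """Match the term against the words of the title only."""
    term = term.lower()
    return [r["title"] for r in records if term in r["title"].lower().split()]


def subject_search(term):
    """Translate the term to its preferred heading, then match on the heading."""
    heading = preferred_term.get(term.lower())
    return [r["title"] for r in records if r["heading"] == heading]


keyword_hits = keyword_search("sneakers")
subject_hits = subject_search("sneakers")
print(keyword_hits)
print(subject_hits)
```

The keyword search returns only the one title that happens to contain the word typed, and gives no sign that three more records on the same subject exist. The subject search returns all four because the synonym is first resolved to the shared heading.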
Authors, creators and end users who are familiar with the terminology used within a
particular field prefer using specialized CV because it makes it easier both to assign
subject headings and to search by subject headings.
Different types of Controlled Vocabularies
As one can imagine, there are a number of CVs used in different libraries, databases, etc.
These CVs are all specialized to particular priorities. One example of such a CV system is
the Library of Congress Subject Headings (LCSH), maintained by the Library of Congress,
which provides a more precise subject description than others. The LCSH system has been
criticized because it lacks instructions, which leads to inconsistencies within the system.
However, most libraries use the LCSH because, like most CVs, it has a hierarchical
structure, with broader, narrower and related terms indicated for most headings (Rolla,
2009). This helps to classify a number of varied items into similar or related subjects.
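A hierarchical structure of broader, narrower and related terms can be pictured as a small data structure. The Python sketch below is an assumption-laden illustration: the Heading class, the broader_chain function and the sample headings are invented here and are not copied from LCSH.

```python
from dataclasses import dataclass, field


@dataclass
class Heading:
    """One heading with its broader (BT), narrower (NT) and related (RT) terms."""
    label: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# Invented sample data, not copied from LCSH.
vocabulary = {
    "Clothing": Heading("Clothing", narrower=["Footwear"]),
    "Footwear": Heading("Footwear", broader=["Clothing"],
                        narrower=["Boots", "Sandals"], related=["Shoemakers"]),
    "Boots": Heading("Boots", broader=["Footwear"]),
    "Sandals": Heading("Sandals", broader=["Footwear"]),
    "Shoemakers": Heading("Shoemakers", related=["Footwear"]),
}


def broader_chain(label):
    """Follow the first broader term upwards until the top of the hierarchy."""
    chain = []
    current = vocabulary[label]
    while current.broader:
        parent = current.broader[0]
        chain.append(parent)
        current = vocabulary[parent]
    return chain


chain = broader_chain("Sandals")
print(chain)
```

Because each heading records its neighbours, a catalogue can lead a user from a specific heading up to more general ones, or sideways to related ones, without the user having to guess the wording.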
Another type of CV is the subject-specific one. Unlike the LCSH, a specialized vocabulary such
as the Art & Architecture Thesaurus (AAT) is used for various disciplines specializing in
art, architecture, and cultural material in museums (Harpring, 2010). Furthermore, the AAT
is defined by Reitz (2004) as a structured vocabulary for describing and indexing works of
visual art and architecture.
The vast increase in new terminology in the medical literature has been the major factor
behind the expansion of the Medical Subject Headings (MeSH) vocabulary. The MeSH subject
headings were introduced in the 1960s by the United States National Library of Medicine to
organize the medical literature and facilitate retrieval in this specialized field (Nelson,
2009). The National Library of Medicine (2015) maintains the MeSH thesaurus, which is
well suited to indexing, cataloguing and searching for biomedical
and health-related information and documents.
Similarly, the CINAHL database includes an authoritative index of nursing and allied health
subjects and is designed for nurses, allied health professionals, researchers, nurse educators
and students. It covers a wide range of topics from nursing to consumer health and
alternative medicine.
Likewise, in 2007, the American Dental Association (ADA) developed the Systematized
Nomenclature of Dentistry (SNODENT), a vocabulary designed for use in electronic
health and dental records. This system provides
standardized terms for describing dental disease whilst capturing patients’ clinical details
and characteristics.
All these different CVs aid researchers who are studying or researching the specific topic,
but what about researchers who are not familiar with the terminology, or researchers from
a different field trying to find interdisciplinary works? For example, a person searching for
heart diseases is more likely to search using the keywords “heart disease”
than any of the terms used by MeSH, such as “Atrial fibrillation”, “Coronary heart
disease”, “Myocardial infarction” or “Ventricular tachycardia”. Again, this relates to
the problem of missed items, content and information.
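One way a system could bridge this gap is to expand the lay term into the specific headings before searching. The Python sketch below assumes a hand-made narrower_terms table and a handful of invented records. The grouping is for illustration only and is not taken from the actual MeSH tree, and expand_query and search are hypothetical helper names.

```python
# Hypothetical narrower-term table; the grouping is for illustration only
# and is not copied from the MeSH tree.
narrower_terms = {
    "heart disease": [
        "atrial fibrillation",
        "coronary heart disease",
        "myocardial infarction",
        "ventricular tachycardia",
    ],
}

# Hypothetical records, each indexed under one specific heading.
indexed_records = {
    "Managing atrial fibrillation in older adults": "atrial fibrillation",
    "Recovery after myocardial infarction": "myocardial infarction",
    "Diet and coronary heart disease": "coronary heart disease",
    "Asthma in childhood": "asthma",
}


def expand_query(term):
    """Return the term itself plus any narrower headings listed for it."""
    term = term.lower()
    return [term] + narrower_terms.get(term, [])


def search(term, expand=False):
    """Find titles whose heading matches the term, optionally with expansion."""
    wanted = set(expand_query(term)) if expand else {term.lower()}
    return sorted(t for t, heading in indexed_records.items() if heading in wanted)


plain_hits = search("heart disease")
expanded_hits = search("heart disease", expand=True)
print(plain_hits)
print(expanded_hits)
```

Without expansion, the lay term matches no heading and the user sees nothing. With expansion, the three cardiac records are retrieved even though none of them is indexed under the words the user typed.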
Folksonomy – an aid for Keyword searching?
If CV is so important, why is there a huge academic debate about using keyword searching
instead? The answer lies in the search behaviour of researchers. Users have learned to use
keyword in multiple indexes (KWMI) searching instead of first finding the relevant subject
headings. The principle of least effort may in fact partly explain why most users do not
know that the library catalogue has a controlled vocabulary (Drabenstott, 1991). KWMI is a
broad search strategy that often returns long lists of results, creating an overload (Van
Pulis and Ludy, 1988). KWMI is a natural-language, uncontrolled-vocabulary search, which
fails to group related materials together, and as a result much valuable material [to the
user] may be missed (Bates, 2003).
This has led some institutions to use social tagging, or crowdsourced metadata
tagging, for their subject/keyword descriptor fields. This practice is known academically as
folksonomy. The structure of a folksonomy emerges from users tagging information and
objects with personally meaningful terms, rather than terms selected from a controlled
vocabulary (Porter, 2011). Social tagging has given rise to many debates on the use and
effectiveness of CV. Folksonomy, or tagging, allows individuals to create value through
organization for themselves and for others (White, 2011).
The results of a study conducted by the Dryad Repository to examine the vocabularies
used in journal publications in evolutionary biology indicated that no single vocabulary
was sufficient to describe the interdisciplinary field of evolutionary biology (White, 2013).
Conclusion
Librarians, researchers and system developers need to stop seeing consumers and patients
as passive recipients of terminologies and ask, instead, for help in developing the
terminologies (Smith, 2011). Libraries can no longer cling to decrepit, arcane,
inward-focused standards such as MARC (Machine Readable Cataloguing), not if the ultimate
goal is to be part of a great global sea of data (White, 2013). Social tagging allows
different perspectives on a particular subject to be linked with other records of similar or
relevant material. This also provides multiple access points and thus increases the visibility
of particular records. Instead of displacing CV, social tagging can be used in conjunction
with it, with cataloguers providing the subject headings while researchers provide their
own keywords for the particular material. The combination of text words and descriptors
(Fidel, 1992) allows links to be created for a more effective search strategy.
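The idea of multiple access points can be sketched as a single index built from both sources of terms. This Python example is a minimal illustration under stated assumptions: the two catalogue records, their headings and their user tags are invented, and build_index is a hypothetical function rather than a feature of any particular library system.

```python
from collections import defaultdict

# Hypothetical catalogue records: headings come from cataloguers, tags from users.
catalogue = [
    {"title": "Recovery after myocardial infarction",
     "headings": ["Myocardial infarction"],
     "tags": ["heart attack", "cardiac rehab"]},
    {"title": "A history of sneakers",
     "headings": ["Footwear"],
     "tags": ["sneakers", "trainers", "streetwear"]},
]


def build_index(records):
    """Map every heading and every tag (lower-cased) to the titles it describes."""
    index = defaultdict(set)
    for record in records:
        for term in record["headings"] + record["tags"]:
            index[term.lower()].add(record["title"])
    return index


access_points = build_index(catalogue)
print(sorted(access_points["heart attack"]))
print(sorted(access_points["footwear"]))
```

The same record can now be reached through the cataloguer's heading or through the everyday wording supplied by users, so neither group's vocabulary has to replace the other's.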
A catalogue without a cross-reference structure for variant forms of names and subject
headings will give its users inferior service. Keyword searching is a powerful retrieval
technique, but it cannot compensate for the lack of database structure (Jamieson, Dolan
and Declerck, 1986). A combination of user terminology and CV will benefit all. For
example, though most public libraries use CV, they would benefit more readily from user
tags, since their collections are often primarily popular materials (Rolla, 2009).
References
Badke, W. (2012). Save the subject heading. Online, 36(6), 48-50.
Bates, M. J. (1986). Subject access in online catalogs: A design model. Journal of the American
Society for Information Science, 37(6), 357-376.
doi:10.1002/(SICI)1097-4571(198611)37:6<357::AID-ASI1>3.0.CO;2-H
Bates, M. J. (2003). Research and design review: Improving user access to library catalog and portal
information. Library of Congress Bicentennial Conference on Bibliographic Control for the New
Millennium.
Drabenstott, K. M. (1991). Online catalog user needs and behavior. Paper presented at the Proc.
of Think Tank on the Present and Future of the Online Catalog. RASD Occasional Papers, No. 9.
Chicago: American Library Association.
Gross, T., Taylor, A. G., & Joudrey, D. N. (2015). Still a lot to lose: The role of controlled vocabulary
in keyword searching. Cataloging & Classification Quarterly, 53(1), 1-39.
doi:10.1080/01639374.2014.917447
Harpring, P. (2010). Introduction to controlled vocabularies: Terminology for art, architecture, and
other cultural works. Los Angeles: Getty Research Institute.
Iglesias, E., & Stringer Hum, S. (2008). Topic maps and the ILS: An undelivered promise. Library Hi
Tech, 26(1), 12-18. doi:10.1108/07378830810857753
Jamieson, A. J., Dolan, E., & Declerck, L. (1986). Keyword searching vs. authority control in an online
catalog. Journal of Academic Librarianship, 12(5), 277.
Micco, M., & Popp, R. (1994). Improving library subject access (ILSA): A theory of clustering based in
classification. Library Hi Tech, 12(1), 55-66. doi:10.1108/eb047911
National Library of Medicine. (2015). Retrieved from
http://www.nlm.nih.gov/mesh/intro_preface.html#pref_rem
Nelson, S. J. (2009). Medical terminologies that work: The example of MeSH. Pervasive Systems,
Algorithms, and Networks (ISPAN), 2009 10th International Symposium on, 380-384.
doi:10.1109/I-SPAN.2009.84
Peters, T. A., & Kurth, M. (1991). Controlled and uncontrolled vocabulary subject searching in an
academic library online catalog. Information Technology and Libraries, 10(3), 201.
Porter, J. (2011). Folksonomies in the library: Their impact on user experience, and their implications
for the work of librarians. Australian Library Journal, 60(3), 248-255.
Reitz, J. M. (2004). Dictionary for library and information science. Libraries Unlimited.
Rolla, P. J. (2009). User tags versus subject headings: Can user-supplied data improve subject access
to library collections? Library Resources & Technical Services, 53(3), 174-184.
Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of
information by computer. Addison-Wesley Longman Publishing Co., Inc.
Smith, C. A. (2011). Consumer language, patient language, and thesauri: A review of the literature.
Journal of the Medical Library Association, 99(2), 135-144. doi:10.3163/1536-5050.99.2.005
Svenonius, E. (1986). Unanswered questions in the design of controlled vocabularies. Journal of the
American Society for Information Science, 37(5), 331-340.
doi:10.1002/(SICI)1097-4571(198609)37:5<331::AID-ASI8>3.0.CO;2-E
Van Pulis, N., & Ludy, L. E. (1988). Subject searching in an online catalog with authority control.
College and Research Libraries, 49(6), 523-533.
White, H. (2013). Examining scientific vocabulary: Mapping controlled vocabularies with free text
keywords. Cataloging & Classification Quarterly, 51(6), 655-674.
doi:10.1080/01639374.2013.777004
Wiberley Jr, S., & Daugherty, R. A. (1988). Users' persistence in scanning lists of references. College
and Research Libraries, 49(2), 149-156.