Glossary of terms relating to thesauri and other forms of structured vocabulary for information retrieval



Metadata:

  • Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval.

Controlled vocabularies:

  • A controlled vocabulary is any defined subset of natural language, which may take the form of an authority file containing a list of preferred terms, or a synonym ring containing a list of equivalent terms.

Synonym rings:

  • A synonym ring connects a set of words that are defined as equivalent for the purposes of retrieval.
  • In practice, a set of words may not be true synonyms, and might include related product names and misspellings.
  • When a user enters a word into a search engine, that word is checked against the text file that holds the synonym rings. If the word is found, the query is “exploded” to include all the equivalent words.
  • However, if query term expansion happens behind the scenes, users can be confused by results that don’t actually include their keywords.
  • Synonym rings may also result in less relevant results.
    • Balance may be achieved by:
      • Using synonym rings by default but ordering exact keyword matches at the top of the results list.
      • Ignoring synonym rings for initial searches but including the option to “expand search to include related terms” if there were few or no results.
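
The query expansion and the first balancing option above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular search engine: the rings, the documents, and the function names `expand_query` and `search` are all hypothetical, and matching is done by naive substring tests.

```python
# Hypothetical synonym rings: each set lists words treated as equivalent.
SYNONYM_RINGS = [
    {"laptop", "notebook", "portable computer"},
    {"television", "telly", "tv set"},
]

def expand_query(word, rings=SYNONYM_RINGS):
    """Return the word plus every word that shares a ring with it."""
    word = word.lower()
    expanded = {word}
    for ring in rings:
        if word in ring:
            expanded |= ring
    return expanded

def search(word, documents, rings=SYNONYM_RINGS):
    """Match documents on the expanded query, exact keyword matches first."""
    word = word.lower()
    terms = expand_query(word, rings)
    hits = [d for d in documents if any(t in d.lower() for t in terms)]
    # sorted() is stable, so documents keep their original order within each group.
    return sorted(hits, key=lambda d: word not in d.lower())

docs = ["Notebook stands on sale", "Laptop buying guide", "Garden furniture"]
results = search("laptop", docs)
```

Here `results` lists the document containing the user's own keyword ahead of the one found only through the ring, which addresses the confusion described above.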

Authority files:

  • Strictly defined, an authority file is a list of preferred terms or acceptable values. It does not include variants or synonyms.
  • However, in practice, authority files are commonly inclusive of preferred and variant terms.
  • An authority file allows content authors and indexers to use approved terms efficiently and consistently.
  • A preferred term can be used as a unique identifier for each collection of equivalent terms, allowing more efficient addition, deletion, and modification of variant terms.
  • Showing preferred terms in search results can help educate users:
    • Helps to correct spellings.
    • Helps explain industry terminology.
    • Helps build brand recognition.
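
As a rough sketch of an authority file that includes variants, the preferred term can serve as the key for each collection of equivalent terms. The entries and the names `AUTHORITY_FILE`, `build_lookup` and `preferred_term` below are invented for illustration.

```python
# Hypothetical authority file: preferred term -> variant terms.
AUTHORITY_FILE = {
    "Connecticut": ["CT", "Conn", "Conneticut"],
    "Massachusetts": ["MA", "Mass"],
}

def build_lookup(authority_file):
    """Index every preferred and variant term by its lower-cased form."""
    lookup = {}
    for preferred, variants in authority_file.items():
        for term in [preferred, *variants]:
            lookup[term.lower()] = preferred
    return lookup

LOOKUP = build_lookup(AUTHORITY_FILE)

def preferred_term(term):
    """Return the preferred term, or None when the term is not in the file."""
    return LOOKUP.get(term.strip().lower())
```

Because variants hang off the preferred term, adding or removing a variant is a change to one list, and a search interface can show the preferred spelling back to a user who typed a misspelling.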

Classification schemes:

  • Also known as a taxonomy.
  • In Information Architecture, it is used to mean a hierarchical arrangement of preferred terms.
  • These hierarchies can take different shapes and serve multiple purposes, including:
    • A front end, browsable Yahoo-like hierarchy that’s a visible and fundamental part of the user interface.
    • A back end tool used by information architects, authors, and indexers for organising and tagging documents.
  • Example: The Dewey Decimal Classification (DDC).


Thesauri:

  • In Information Architecture, a thesaurus takes the form of an online database, tightly integrated with the user interface of a website.
  • It is a semantic network of concepts, connecting words to their synonyms, homonyms, antonyms, broader and narrower terms, and related items.
  • A traditional thesaurus helps people go from one word to many words, whereas an IA thesaurus does the opposite.
  • Its most important goal is synonym management – the mapping of many synonyms or word variants onto one preferred term or concept – so the ambiguities of language don’t prevent people from finding what they need.
  • Textbook definition: A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for the purposes of improved retrieval.

Technical lingo:

  • Preferred Term (PT)
  • Variant Term (VT)
  • Broader Term (BT)
  • Narrower Term (NT)
  • Related Term (RT)
  • Use (U)
  • Used for (UF)
  • Scope Note (SN)

Types of thesauri:

  • 3 types:
    • Classic thesaurus
    • Indexing thesaurus
    • Searching thesaurus

Semantic relationships:

  • A thesaurus is set apart from simpler controlled vocabularies by its rich array of semantic relationships.
    • Equivalence (A = B)
      • The equivalence relationship is employed to connect preferred terms and their variants.
      • The goal is to group terms defined as equivalent for the purposes of retrieval, such as:
        • Synonyms
        • Near-synonyms
        • Acronyms
        • Abbreviations
        • Lexical variants
        • Common misspellings
        • Retired products
        • Competitors’ products
    • Hierarchical (A(B))
      • The hierarchical relationship divides up the information space into categories and subcategories, relating broader and narrower concepts through the familiar parent-child relationship.
      • There are 3 subtypes of hierarchical relationship:
        • Generic: B is a member of class A and inherits the characteristics of its parent. (E.g., Bird NT Magpie)
        • Whole-part: B is part of A. (E.g., Foot NT Big Toe)
        • Instance: B is an instance or example of A (E.g., Seas NT Mediterranean Sea)
    • Associative (A overlaps B)
      • Associative relationships involve strongly implied semantic connections that aren’t captured within the equivalence or hierarchical relationships. (E.g., Hammer RT Nail)
      • Defining these relationships is a highly subjective process.
      • Used in e-commerce to connect customers to related products and services.
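
One way to picture the three relationship types is as a small data structure. The sketch below is an assumption about how such a structure might look, not a standard API: the class `Thesaurus` and its method names are hypothetical, while the sample terms are taken from the examples in these notes.

```python
from collections import defaultdict

class Thesaurus:
    """Toy store for equivalence, hierarchical and associative links."""

    def __init__(self):
        self.use = {}                      # variant term -> preferred term (U)
        self.used_for = defaultdict(set)   # preferred term -> variants (UF)
        self.broader = defaultdict(set)    # term -> broader terms (BT)
        self.narrower = defaultdict(set)   # term -> narrower terms (NT)
        self.related = defaultdict(set)    # term -> related terms (RT)

    def add_equivalence(self, preferred, variant):
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        # BT and NT are reciprocal, so both directions are recorded.
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_association(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def resolve(self, term):
        """Map a variant to its preferred term; other terms pass through."""
        return self.use.get(term, term)

thesaurus = Thesaurus()
thesaurus.add_equivalence("dogs", "hounds")      # hounds USE dogs
thesaurus.add_hierarchy("Bird", "Magpie")        # Bird NT Magpie
thesaurus.add_hierarchy("Foot", "Big Toe")       # Foot NT Big Toe
thesaurus.add_association("Hammer", "Nail")      # Hammer RT Nail
```

Storing each relationship in both directions keeps lookups cheap from either term, at the cost of having to update two entries whenever a link changes.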

Preferred terms:

  • Term form guidelines:
    • Grammatical form: Use nouns – users are better at remembering these than verbs or adjectives. However, task-orientated words and some adjectives are OK.
    • Spelling: Select a “defined authority” – a dictionary, glossary, or your own house style. Might also consider the most common spelling used by your users. Consistency is the most important point about spelling.
    • Singular and plural form: Use the plural form of count nouns (e.g., cars, roads, maps). Use the singular form of conceptual nouns (e.g., math, biology). This is less important nowadays with the advent of new search technologies.
    • Abbreviations and acronyms: Default to proper use – for the most part an organisation’s preferred terms should be the full words.

Term selection:

  • Need to consider the balance between literary warrant (occurrence of terms in documents) and user warrant (terms selected to serve the needs of the majority of users).
    • Selecting the most appropriate terms requires a review of organisation goals and consideration of how the thesaurus will be integrated with the website.
      • Consider whether preferred terms should educate users about industry vocabulary.
      • Consider whether preferred terms will be the source of entry vocabulary.

Term definition:

  • Tools for managing ambiguity, aside from the selection of distinctive preferred terms:
    • Parenthetical term qualifiers
      • Provide a way to manage homographs.
      • E.g., Cells (biology), Cells (electric), Cells (prison).
    • Scope notes
      • Are another way to increase specificity.
      • Can appear similar to definitions, but are intended to restrict meaning to one concept, whereas definitions often suggest multiple meanings.

Term specificity:

  • E.g., should “knowledge management software” be represented as 1 term, 2 terms, or 3 terms?
  • A balance must be struck based on context, especially given the size of the website at hand.
  • As content volume increases, it becomes more important to increase precision via the use of compound terms, thus preventing a situation in which there are too many results for every search and preferred term.
  • Context is also relevant in terms of the kind of website a term will be used on – e.g., “knowledge management software” might be a good term for the Knowledge Management Magazine website, but for a more generalised IT website like CNET, it would be better to use “knowledge management” and “software” as independent terms.

Polyhierarchy: Terms are allowed to be cross-listed in multiple categories.

Pure hierarchy: Each term appears in one and only one place.

Faceted classification:

  • Uses the concept of multiple taxonomies that focus on different dimensions of the content, rather than a one-taxonomy-fits-all approach.
  • Common facets in the business world include:
    • Topic
    • Product
    • Document type
    • Audience
    • Geography
    • Price
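
A faceted approach can be sketched as filtering on several independent dimensions at once. The documents, facet values and function names below are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical documents, each tagged under several independent facets.
DOCUMENTS = [
    {"title": "Router setup guide", "topic": "networking",
     "document_type": "manual", "audience": "customers"},
    {"title": "Router price list", "topic": "networking",
     "document_type": "price list", "audience": "resellers"},
    {"title": "Printer setup guide", "topic": "printing",
     "document_type": "manual", "audience": "customers"},
]

def facet_search(documents, **selected):
    """Keep documents whose value matches every selected facet."""
    return [d for d in documents
            if all(d.get(facet) == value for facet, value in selected.items())]

def facet_counts(documents, facet):
    """Count documents per value of one facet, e.g. for a browse menu."""
    return dict(Counter(d[facet] for d in documents if facet in d))

manuals_on_networking = facet_search(
    DOCUMENTS, topic="networking", document_type="manual")
```

Each facet narrows the result set independently, so the same document is reachable by topic, by document type, or by audience without being forced into a single hierarchy.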

From the Willpower Information website:

a posteriori relationship

use syntagmatic relationship

a priori relationship

use paradigmatic relationship


array

(a) a set of sibling terms or (b) a subset of sibling terms grouped under a node label which specifies a characteristic of division
e.g. in this extract
(vehicles by number of wheels)
– monocycles
– bicycles
– – motor bicycles
– – pedal bicycles
– tricycles
– four-wheeled vehicles
(vehicles by motive power)
– mechanically powered vehicles
– – motor bicycles
– – motor cars
– human powered vehicles
– – pedal bicycles
– hybrid human/mechanically powered vehicles
– – mopeds
the complete array of sibling terms under vehicles consists of
monocycles
bicycles
tricycles
four-wheeled vehicles
mechanically powered vehicles
human powered vehicles
hybrid human/mechanically powered vehicles
This may be subdivided into subsets by grouping under the node labels, forming two smaller arrays:
monocycles
bicycles
tricycles
four-wheeled vehicles
and
mechanically powered vehicles
human powered vehicles
hybrid human/mechanically powered vehicles.
Another array, at a lower level, under the broader term bicycles, is composed of the two sibling terms
motor bicycles
pedal bicycles
See the note under characteristic of division on the options for dealing with hybrids such as mopeds.

authority file

use authority list

authority list

use for authority file
controlled vocabulary, generally of proper names, for use in naming particular entities consistently
Separate authority lists may be maintained for different types of entity; for example there may be separate lists for personal names, organization names and geographical names. The format of names used in an authority file should be documented and preferably accord with recognised standards. An example of a personal and corporate name authority file might look like this:
British Tabulating Machine Company
merged into International Computers and Tabulators
Gates, Bill
use for Gates, William Henry
Gates, William Henry
use Gates, Bill
ICL
use International Computers Limited
ICT
use International Computers and Tabulators
International Computers and Tabulators
created by merger of British Tabulating Machine Company and Powers-Samas
subsequently International Computers Limited
International Computers Limited
formerly International Computers and Tabulators
Powers-Samas
merged into International Computers and Tabulators
Prime Computer Inc.
Science Museum (London). Library
Victory (ship)


browsing

finding information by examining lists or sequences of items, typically starting with general items and, on the basis of what has been found there, moving to more specific items


caption

statement of the subjects represented by a notation in a classification scheme
A caption may have to be read in conjunction with its hierarchical context. It need not be as complete or as self-contained as a scope note or even a preferred term in a thesaurus.

chain index

an index to a classification scheme, in which entries are generated by successive left truncation of strings of terms representing compound concepts
e.g. in the example of citation order, a compound concept is represented by the pre-coordinated string
bicycles – tyres – punctured – repairing – instruction books
In a classification scheme arranged in this way, everything on bicycles will be grouped together, but material on tyres or instruction books will be scattered. To provide index entries to allow these scattered topics to be found, we write the string in the reverse order, and successively truncate it from the left, making an index entry for each resulting substring:
instruction books – repairing – punctured tyres – bicycles
repairing – punctured tyres – bicycles
punctured tyres – bicycles
tyres – bicycles
These entries are then arranged in alphabetical order:
instruction books – repairing – punctured tyres – bicycles
punctured tyres – bicycles
repairing – punctured tyres – bicycles
tyres – bicycles
Each of these index entries would be followed by the appropriate notation to link it to its place in the classification scheme. As the citation order of this classification determines that everything about tyres for bicycles will be grouped together in the classified sequence, it is not necessary for the index to have entries such as tyres – bicycles – punctured, or other combinations and permutations of the terms in the string. A chain index is thus more economical than a fully permuted index, in which a string of five terms would generate 120 index entries.
The mechanical method of generating a chain index described here may be modified by editorial intervention to suppress entries which are likely to be unsought, and to combine terms grammatically to make the index entries more readable; this has been done in the above example where punctured tyres has been used rather than punctured – tyres.
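
The purely mechanical method can be sketched as follows. This is an illustration only: the function name `chain_index` is hypothetical, no editorial intervention is applied (so punctured and tyres stay separate), and the entry for the lead term on its own is kept rather than suppressed.

```python
from math import factorial

def chain_index(string, separator=" – "):
    """Reverse the string of terms, truncate from the left, sort the entries."""
    terms = string.split(separator)
    reversed_terms = terms[::-1]
    entries = [separator.join(reversed_terms[i:])
               for i in range(len(reversed_terms))]
    return sorted(entries)

entries = chain_index(
    "bicycles – tyres – punctured – repairing – instruction books")
for entry in entries:
    print(entry)

# A fully permuted index of the same five terms needs 5! entries.
permuted_entries = factorial(5)
```

A string of n terms yields n chain entries, against n! for a full permutation, which is where the figure of 120 for five terms comes from.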

characteristic of division

an attribute by which a concept can be subdivided into an array of narrower concepts each having a distinct value of that attribute
e.g. In the following, “number of wheels” and “motive power” are the characteristics by which the concept of “vehicles” is divided. These are shown in the node labels (vehicles by number of wheels) and (vehicles by motive power).
The concepts in an array should be mutually exclusive, having distinct values of the characteristic of division, though lower-level concepts can occur under more than one. For example, hybrids, such as mopeds (mechanically-assisted pedal cycles) are by definition both mechanically powered and human powered. They can therefore be listed as narrower terms of both concepts, as shown below. In some cases it may be desirable to provide explicitly for such hybrids, as shown in the examples under array. The scope note should clarify whether a term such as human powered vehicles is to be used for vehicles that are exclusively or partially human-powered.
(vehicles by number of wheels)
– vehicles without wheels
– monocycles
– bicycles
– – motor bicycles
– – pedal bicycles
– tricycles
– four-wheeled vehicles
– vehicles with more than 4 wheels
(vehicles by motive power)
– mechanically powered vehicles
– – mopeds
– – motor bicycles
– – motor cars
– human powered vehicles
– – mopeds
– – pedal bicycles

citation order

order in which preferred terms or notations are combined in a pre-coordinate indexing system or a classification scheme to form strings representing compound concepts
The choice of citation order determines which concepts are the most important to be grouped together in a catalogue or list, and increases consistency in the construction of strings for similar subjects.
Citation order is usually specified in terms of the facets to which concepts belong or the roles that they play in relation to other concepts in the string. A sequence that is often appropriate, especially for technical subjects, is:
thing – kind – part – property – material – process – operation – system operated on – product – by-product – agent – space – time – form
bicycles – tyres – punctured – repairing – instruction books
(thing) – (part) – (property) – (operation) – (form)
When concepts from two different arrays within a single facet are to be combined, the citation order is normally such that the array listed later in the schedule is cited first (takes priority in grouping). Thus if wines were grouped into two arrays: first (wines by colour) and second (wines by country), the combined concepts would be listed under the second of these, thus:
(wines by colour)
– red wines
– white wines
(wines by country)
– Australian wines
– – (Australian wines by colour)
– – red Australian wines
– – white Australian wines
– French wines
– – (French wines by colour)
– – red French wines
– – white French wines
It is possible to make additional entries under permutations of these citation orders of facets and arrays, but this not only increases the size of a catalogue but also leads to inconsistency as there is a risk that some permutations will be omitted. Some resources may be assigned one version of the complex concept and some another, so that there is not a complete list under either.
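
Applying a fixed citation order amounts to sorting the concepts of a compound subject by the role each one plays. The sketch below uses the role sequence given above; the function name `build_string` and the dictionary input format are assumptions made for illustration.

```python
# Role sequence quoted in the entry above.
CITATION_ORDER = [
    "thing", "kind", "part", "property", "material", "process", "operation",
    "system operated on", "product", "by-product", "agent", "space", "time",
    "form",
]

def build_string(concepts, order=CITATION_ORDER, separator=" – "):
    """Combine role -> term pairs into one string in citation order."""
    ranked = sorted(concepts.items(), key=lambda item: order.index(item[0]))
    return separator.join(term for _, term in ranked)

# The roles can be supplied in any order; the output order is fixed.
subject = build_string({
    "form": "instruction books",
    "operation": "repairing",
    "thing": "bicycles",
    "property": "punctured",
    "part": "tyres",
})
```

Because every indexer's input is normalised to the same sequence, similar subjects produce similar strings, which is the consistency benefit described above.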


classification

grouping together of similar or related things and the separation of dissimilar or unrelated things and the arrangement of the resulting groups in a logical and helpful sequence

classification scheme

schedule of concepts, arranged by classification
A classification scheme may also include an index.

While a thesaurus inherently contains a classification of terms in its hierarchical relationships, it is intended for specific retrieval, and it is often useful to have another way of grouping objects. This may relate to administrative distribution of responsibility for “collections” within a museum, or to subdivisions of these collections into groups which depend on local emphasis. It is also often necessary to be able to print a list of objects arranged by subject in a way which differs from the alphabetical order of thesaurus terms. Each subject group may be expressed as a compound phrase, and given a classification number or code to make sorting possible.

classified display

display of a thesaurus structure in which terms representing concepts are brought together because of the subjects to which they relate
Such a display may contain sections of hierarchical display, but may also bring together related terms from different facets, such as the people, activities and objects relating to a subject. Classified displays may include node labels containing facet names as well as node labels specifying characteristics of division. An example of a classified display is shown under node label.

coined term

a new term created to express a concept for which no suitable term exists in the required language.


collection

the set of documents that may be accessed by a structured vocabulary, whether the items in it are collected in one place or distributed over a network

complex concept

concept that combines two or more simpler concepts
e.g. human resource management combines the idea of people with their usefulness as resources requiring management
Complex concepts are sometimes expressed in a single word, but are more often conveyed by a multi-word term.


concept

unit of thought
The semantic content of a concept can be re-expressed by a combination of other and different concepts, which may vary from one language or culture to another. Concepts exist in the mind as abstract entities which are independent of the terms used to label them.

concept scheme

set of concepts, optionally including statements about semantic relationships between those concepts.
Thesauri, classification schemes, subject heading lists, taxonomies, terminologies, glossaries and other types of controlled vocabularies are all examples of concept schemes.

controlled vocabulary

prescribed list of terms or headings each one having an assigned meaning
Controlled vocabularies are designed for use in classifying or indexing documents and for searching them. They normally contain a unique preferred term for each concept or entity with links to that term from non-preferred terms. They may also show relationships between terms.


descriptor

use preferred term


document

use for information resource
item that can be classified or indexed in order that it may be retrieved
This definition refers not only to written and printed materials in paper or microform versions (for example, books, journals, diagrams, maps), but also to non-printed media, machine-readable and digitized records, Internet and intranet resources, films, sound recordings, people and organizations as knowledge resources, buildings, sites, monuments, three-dimensional objects or realia; and to collections of such items or parts of such items.

entry term

use non-preferred term

enumerative classification scheme

classification scheme in which all the concepts available for use are listed in the schedules
Compare with synthetic classification scheme.

equivalence relationship

relationship between two terms that both represent the same concept
When two or more such terms are in the same monolingual thesaurus, one of them is designated a preferred term and the other(s) non-preferred term(s); the relationship is known as intra-vocabulary equivalence. When both terms are preferred terms in different thesauri, the relationship is known as cross-vocabulary equivalence.


facet

use for fundamental facet
grouping of concepts of the same inherent category
Examples of categories that may be used for grouping concepts into facets are: activities, disciplines, people, materials, living organisms, objects, places and times. e.g.
(1) animals, mice, daffodils and bacteria could all be members of a living organisms facet;
(2) digging, writing and cooking could all be members of an activities facet;
(3) Paris, the United Kingdom and the Alps could all be members of a places facet.
Categories are normally chosen so that facets are mutually exclusive; a concept cannot then occur in more than one facet. In a classification scheme, facets may be restricted to a single discipline, such as a diseases facet in medicine, or may be common facets such as people, time, place and form, which apply across all disciplines. Facets may be subdivided into mutually exclusive subfacets.
Some writers use the term “facet” to specify the role that a concept plays in a complex concept, as well as the category to which it belongs. For example, they may say that materials can belong to “raw materials” or “products” facets, and people may be in “agents” or “patients” facets. For clarity, it is better to avoid this usage, keeping the term “facet” for fundamental categories such as “materials” or “people” and specifying roles separately. Both facets and roles are used in setting up rules for citation order.
Other writers use the term “facet” to mean “attributes” or “properties”, confusing them with characteristics of division. There may be multiple characteristics of division of concepts within a single facet, e.g. within a materials facet there may be a concept of wines, subdivided into several arrays, not mutually exclusive, each headed by a node label. Any specific wine can be listed in several of these arrays. Searching by these is better called searching by parameters or characteristics rather than by facets.

facet analysis

analysis of subject areas into constituent concepts grouped into facets

facet indicator

notational device that indicates the start of a new facet within a synthesized compound classmark
Examples of facet indicators are the 0 in the Dewey Decimal Classification, and parentheses and quotation symbols in the Universal Decimal Classification. In the past the term facet indicator has been used as synonymous with node label but that usage should be avoided, to avoid confusion.

faceted classification scheme

classification scheme in which subjects are analysed into their constituent facets
Schedules are compiled for each facet, and terms or notations from these may be combined according to prescribed rules to express a complex concept.


folksonomy

a collection of terms allocated to resources by users in order to categorise or index them in a way that the users consider useful
Terms in folksonomies, often called tags, are typically added in an uncontrolled manner, without any underlying structure or principles. They may be idiosyncratic, but may also use current terminology more quickly than it can be incorporated into a controlled and structured scheme.

Free text

It is highly desirable to be able to search for specific words or phrases which occur in object descriptions. These may identify individual items by unique words such as trade names which do not occur often enough to justify inclusion in the thesaurus. A computer system may “invert” some or all fields of the record, i.e. make all the words in them available for searching through a free-text index, or it may be possible to scan records by reading them sequentially while looking for particular words. The latter process is fairly slow, but is a useful way of refining a search once an initial group has been selected by using thesaurus terms.
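
The two approaches described here, an inverted free-text index and a sequential scan used to refine an earlier selection, can be sketched as below. The records and the function names `build_inverted_index` and `sequential_scan` are invented for illustration and do not describe any particular collections system.

```python
import re
from collections import defaultdict

# Hypothetical object descriptions keyed by record number.
RECORDS = {
    1: "Brass telescope by Dollond",
    2: "Kodak Brownie box camera",
    3: "Brass box sextant",
}

def words(text):
    """Split a description into lower-cased words."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(records):
    """Map every word to the set of records whose description contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in words(text):
            index[word].add(record_id)
    return index

def sequential_scan(records, candidate_ids, word):
    """Read the selected records one by one, keeping those with the word."""
    word = word.lower()
    return [i for i in candidate_ids if word in words(records[i])]

index = build_inverted_index(RECORDS)
brass_items = sorted(index["brass"])
brass_boxes = sequential_scan(RECORDS, brass_items, "box")
```

The index answers a word lookup directly, while the scan has to read every candidate record, which is why it is better suited to refining a group that has already been narrowed down.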

fundamental facet

use facet


glossary

a list of terms, together with definitions, specific to a given field of knowledge, usually presented in alphabetical order
Such terms are usually of a technical, abstruse or archaic nature. A glossary is often related to a specific document and appears as an appendix to it.

hierarchical display

display of a thesaurus structure based on broader/narrower concept relationships
In such a display narrower terms are commonly shown indented under the broader term which is their parent. Each hierarchy, starting from a “top term” contains terms from only a single facet or subfacet, so node labels containing facet names do not occur within hierarchies, though they may be shown at the top of each. A hierarchical display may contain node labels specifying characteristics of division. An example of a hierarchical display is shown under characteristic of division.


homograph

one of two or more words that have the same spelling, but different meanings
e.g. The term bank could refer to a financial institution or the side of a river.


identifier

A specific term that is not included in a controlled vocabulary, but which may be assigned to a document because it is considered useful for retrieval
Identifiers are often proper names, trade names, codes, jargon and specialised terms. They should be distinguished from controlled vocabulary terms by being recorded in a separate field of a catalogue record or by being flagged in some way. Some computer systems assign a unique number or code to each concept or term for purposes of managing the vocabulary, and it may be known as a ‘concept identifier’, ‘term identifier’ or simply ‘term ID’. This type of identifier should not be confused with the usage defined here.


indexing

intellectual analysis of the subject matter of a document to identify the concepts represented in it, and allocation of the corresponding preferred terms to allow the information to be retrieved
The term “subject indexing” is often used for this concept, but within a context that does not deal with other elements such as authors or dates, “indexing” is sufficient.

information resource

use document

information retrieval

all the techniques and processes used to provide for identifying items relevant to an information need, from a collection or network of documents
Selection and inclusion of items in the collection are included in this definition; likewise browsing and other forms of information seeking.


interoperability

ability of two or more systems or components to exchange information and to use the information that has been exchanged
Vocabularies can support interoperability by including relations to other semantic structures, by presenting data in standard formats and by using systems that support common computer protocols.


keyword

A word or phrase occurring in the natural language of a document that is considered significant for retrieval.
In addition to the above preferred meaning this word is also used loosely with the following two other possible meanings, which are often confused. The use of “keyword” with these meanings should be avoided.
A preferred term from a controlled vocabulary, assigned to a document
An identifier.

lead-in term

use non-preferred term

loan term

term borrowed from another language that has become accepted in the borrowing language
e.g. glasnost, gourmets

many-to-one mapping

mapping where two or more terms, notations or concepts in one vocabulary are represented by a single term, notation or concept in another vocabulary

mapping (process)

the process of establishing relationships between the terms, notations or concepts of one vocabulary and those of another

mapping (product of mapping process)

statements of the relationships between the terms, notations or concepts of one vocabulary and those of another


metadata

data that describes characteristics of a document
Metadata is essentially a catalogue record, providing (a) access points by which records of documents can be sorted or retrieved and (b) descriptive information, by which the relevance of a document can be assessed without consulting it in full. Preferred terms or notations selected during the indexing process are commonly applied as metadata elements to describe the subject of a document.


microthesaurus

subset of a thesaurus, usually containing terms from a subject area narrower than the scope of the whole thesaurus
The UNESCO thesaurus is subdivided into seven microthesauri; the UK Archival Thesaurus, based on the UNESCO thesaurus, is also subdivided into microthesauri, which it calls “fields of knowledge”.

monohierarchical structure

hierarchical arrangement of concepts, in a thesaurus or classification scheme, in which each concept can have only one broader concept
Compare with polyhierarchical structure. In a monohierarchical structure, each concept can occur at only one place in the hierarchy and other broader term relationships have to be shown as related term relationships.
e.g. in a monohierarchical structure, the concept pianos cannot be listed as a narrower term of both keyboard instruments and stringed instruments; a choice has to be made of one of these concepts to determine its placing.
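
A simple check for whether a vocabulary is monohierarchical is to look for concepts with more than one broader concept. The sketch below assumes the broader-term links are held in a dictionary; the data and function names are hypothetical, apart from the pianos example taken from the entry above.

```python
# Hypothetical broader-term links: concept -> set of broader concepts.
BROADER = {
    "pianos": {"keyboard instruments", "stringed instruments"},
    "violins": {"stringed instruments"},
    "organs": {"keyboard instruments"},
}

def polyhierarchical_terms(broader_terms):
    """Return the concepts that have more than one broader concept."""
    return {term: sorted(parents)
            for term, parents in broader_terms.items()
            if len(parents) > 1}

def is_monohierarchical(broader_terms):
    """True when no concept has more than one broader concept."""
    return not polyhierarchical_terms(broader_terms)

conflicts = polyhierarchical_terms(BROADER)
```

To make this structure monohierarchical, one of the two links for pianos would have to be dropped as a broader term and recorded as a related term instead.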

multilingual thesaurus

thesaurus using more than one language, in which each concept is represented by a preferred term in each of the languages, and there is a single structure of hierarchical and associative relationships between concepts which is independent of language.

multi-word term

term consisting of more than one word
e.g. human resource management.
Multi-word terms typically label complex concepts and are admissible in a thesaurus as preferred terms.

node label

label inserted into a hierarchical or classified display to show how the terms have been arranged
A node label contains one of two different types of information: either (1) the name of a facet or subfacet to which following terms belong (this type would be better called a “facet label”, but unfortunately this usage is not established in the literature or standards); or (2) the attribute or characteristic of division by which an array of sibling terms has been sorted or grouped.
e.g. the following classified display starts with the facet “disciplines” and changes of facet are shown by node labels of type 1, shown in parentheses. A node label of type 2 is shown in angle brackets:
– (people)
– photographic models
– – female photographic models
– – male photographic models
– photographers
– (operations)
– taking photographs
– developing
– printing
– (objects)
– cameras
– photographs
– – black and white photographs
– – colour photographs


non-descriptor

use non-preferred term

non-preferred term

use for entry term; lead-in term; non-descriptor
term that is not assigned to documents but is provided as an entry point in a thesaurus or alphabetical index
A non-preferred term is followed by a reference to the appropriate preferred term or terms, e.g. hounds USE dogs


notation

symbol or group of symbols representing a simple or compound concept
Notation may be used to sort and/or locate concepts in a pre-determined systematic order, and optionally to display how concepts have been structured and grouped. A notation can provide the link between alphabetical and systematic lists in a thesaurus and between the alphabetical index and the classified sequence of a classification scheme.
e.g., partial schedule showing notation in the left-hand column:
P200 photography
P250 – – photographic equipment
P251 – – – camera accessories
P251.3 – – – – flash guns
P251.5 – – – – tripods
P253 – – – cameras and camera components
P253.1 – – – – camera components
P253.13 – – – – – camera lenses
P253.15 – – – – – camera viewfinders
P253.2 – – – – cameras

notation system

set of symbols, with rules for combining them to create notations for concepts
This set may be any selection of numerals, upper and lower case alphabetic characters, and punctuation symbols. The larger the set of symbols on which the notation is based, the greater the number of concepts that can be represented by distinct notations of the same length.
Punctuation marks may be used in notations:
to show relationships between concepts, e.g. 53:61 “physics in relation to medicine”;
to act as facet indicators, showing that the subsequent symbols refer to a concept from a different facet, e.g. 61(94) “medicine in Australia”;
to show where the notation may be abbreviated if desired, e.g. 641.5’68, “cooking for special occasions”, may be abbreviated to 641.5, “cooking”;
to break up long strings to make them easier to read, e.g. the period and spaces in 635.977 138 9 “fertilisers for flowering trees” or KVQ EOM MUR “unemployment in rural communities in India”.
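
The remark above about the size of the symbol set can be checked with simple arithmetic: with k symbols there are k to the power n possible notations of length n. The sketch below assumes that every combination of symbols is allowed, which real notation systems usually restrict.

```python
import string


def distinct_notations(symbol_count, length):
    """Number of distinct notations of exactly `length` symbols."""
    return symbol_count ** length


digits = len(string.digits)            # 10 numerals
letters = len(string.ascii_uppercase)  # 26 upper case letters

for name, size in [("numerals", digits),
                   ("upper case letters", letters),
                   ("numerals and letters", digits + letters)]:
    print(name, distinct_notations(size, 3))
```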

one-to-many mapping

mapping where a single term, notation or concept in one vocabulary is represented by two or more terms, notations or concepts in another vocabulary

one-to-one mapping

mapping where a single term, notation or concept in one vocabulary is represented by a single term, notation or concept in another vocabulary
The representations in the two vocabularies may or may not be identical.
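
A minimal sketch of how one-to-one and one-to-many mappings might be told apart in a mapping table. The terms and the function name `mapping_type` are hypothetical and serve only to illustrate the two definitions above.

```python
# Hypothetical mappings from a source vocabulary to a target vocabulary.
MAPPINGS = {
    "cars": ["automobiles"],                      # one target term
    "bicycle repair": ["bicycles", "repairing"],  # two target terms
}


def mapping_type(source_term, mappings=MAPPINGS):
    """Classify a mapping by the number of target terms it has."""
    targets = mappings.get(source_term, [])
    if len(targets) == 1:
        return "one-to-one"
    if len(targets) > 1:
        return "one-to-many"
    return "unmapped"


print(mapping_type("cars"))            # one-to-one
print(mapping_type("bicycle repair"))  # one-to-many
```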


ontology

specification of the concepts of a domain and their relationships, structured to allow computer processing and reasoning
As the nature of the relationships can be specified as part of the ontology, many more types of relationship are possible than in a thesaurus.

orphan term

a preferred term that has no hierarchical relationships

paradigmatic relationship

use for a priori relationship; semantic relationship
relationship between concepts which is inherent in the concepts themselves
Such relationships are shown in a structured vocabulary, independently of any indexed document.

parametric searching

searching for concepts with specific values of characteristics of division
e.g. searching for wines for which the colour is red and the alcohol content is from 5% to 10%.
This type of search is for concepts that occur within one or more arrays of a single facet, e.g. narrower terms of wine in a “materials” facet grouped under the node labels (wine by colour) and (wine by alcohol content). In some systems it is possible to search for a range of values rather than just for specific values.
It is to be distinguished from searches for compound concepts which may be made up of concepts from different facets, such as wine from a “materials” facet combined with red colour and alcohol content from a “properties” facet.
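
The wine example might be sketched as a simple filter over records that carry a value for each characteristic of division. The records, field names and function name are invented for the illustration.

```python
# Hypothetical wine records with two characteristics of division.
WINES = [
    {"name": "Wine A", "colour": "red", "alcohol": 9.5},
    {"name": "Wine B", "colour": "red", "alcohol": 13.0},
    {"name": "Wine C", "colour": "white", "alcohol": 8.0},
]


def parametric_search(items, colour, min_alcohol, max_alcohol):
    """Select items by one specific value (colour) and one range (alcohol %)."""
    return [
        item["name"]
        for item in items
        if item["colour"] == colour
        and min_alcohol <= item["alcohol"] <= max_alcohol
    ]


print(parametric_search(WINES, "red", 5, 10))  # ['Wine A']
```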

polyhierarchical structure

hierarchical arrangement of concepts, in a thesaurus or classification scheme, in which each concept can have more than one broader concept
Compare with monohierarchical structure. In a polyhierarchical structure, a single concept can occur at more than one place in the hierarchy. Its attributes and relationships, and specifically its scope note and its narrower and related terms, are the same wherever it occurs.
e.g. in a polyhierarchical structure, pianos may be listed as a narrower term of both keyboard instruments and stringed instruments.
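
One possible way to hold a polyhierarchy is to store each concept once, with a list of broader concepts. The concept then appears under each parent in a display while keeping a single set of attributes. In the sketch below, "musical instruments" is an assumed top level added for the illustration.

```python
# Each concept is stored once; a list of broader concepts allows
# more than one parent.
BROADER = {
    "pianos": ["keyboard instruments", "stringed instruments"],
    "keyboard instruments": ["musical instruments"],
    "stringed instruments": ["musical instruments"],
}


def broader_terms(term, broader=BROADER):
    """Return the immediate broader concepts of a term."""
    return broader.get(term, [])


def narrower_terms(term, broader=BROADER):
    """Return every concept that lists `term` as a broader concept."""
    return sorted(t for t, parents in broader.items() if term in parents)


print(narrower_terms("keyboard instruments"))  # ['pianos']
print(narrower_terms("stringed instruments"))  # ['pianos']
```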

post-coordinate indexing

system of indexing in which the subject of a document is analysed into its constituent concepts by an indexer but the preferred terms so allocated are not combined until they are selected by a user at the search stage
e.g. when using post-coordinate indexing, a manual on bicycle repair might be assigned the three separate preferred terms
bicycles
repairing
instruction books
Someone searching for such a manual would compose a search statement such as (bicycles AND repairing AND instruction books). The document would also be retrieved by a search for (bicycles AND instruction books) or for any one or more of the preferred terms. Compare pre-coordinate indexing.
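
The search-stage combination of terms can be sketched with an inverted index, in which each preferred term points to the set of documents indexed with it, and an AND search intersects those sets. The document identifiers and the extra documents are invented for the example.

```python
from collections import defaultdict

# Hypothetical documents and the preferred terms assigned to each.
DOCUMENTS = {
    "doc1": {"bicycles", "repairing", "instruction books"},
    "doc2": {"bicycles", "racing"},
    "doc3": {"cars", "repairing", "instruction books"},
}


def build_index(documents):
    """Map each preferred term to the set of documents indexed with it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_and(index, *terms):
    """Combine terms at search time: documents indexed with all of them."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


index = build_index(DOCUMENTS)
print(search_and(index, "bicycles", "repairing", "instruction books"))  # {'doc1'}
```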

precision

measure of retrieval performance defined by R/T, where R is the number of relevant items retrieved and T is the total number of items retrieved
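
The ratio R/T can be computed as in the sketch below. Returning 0.0 when nothing has been retrieved is a convention chosen for this example; the definition itself leaves that case undefined.

```python
def precision(retrieved, relevant):
    """R/T: relevant items retrieved divided by total items retrieved."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention; R/T is undefined when T is 0
    return len(retrieved & set(relevant)) / len(retrieved)


# Four items retrieved, two of which are relevant: 2/4
print(precision({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))  # 0.5
```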

pre-coordinate indexing

system of indexing in which the preferred terms allocated to a particular document are syntactically combined in one or more sequences representing the only combinations available for retrieval purposes
e.g. when using pre-coordinate indexing, a manual on bicycle repair might be assigned the indexing string made up of three preferred terms in combination:
bicycles – repairing – instruction books
This brings all aspects of repairing bicycles together in a catalogue or browsing list, and might be followed by
bicycles – repairing – tools
There would be no direct alphabetical access to this subject under repairing, instruction books, or tools. This does not mean that the individual concepts within a pre-coordinated string cannot be searched for separately, either as controlled preferred terms or as free text, but such methods are not part of the pre-coordinate indexing system. Compare post-coordinate indexing.
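
A small sketch of the browsing list that pre-coordinated strings produce, showing that entries are reachable only under their lead term. The representation of a string as a tuple and the function names are assumptions made for the example.

```python
# Each indexing string is an ordered tuple of preferred terms.
STRINGS = [
    ("bicycles", "repairing", "tools"),
    ("bicycles", "repairing", "instruction books"),
]


def browsing_list(strings):
    """File the strings in order, as in a catalogue or browsing list."""
    return [" – ".join(s) for s in sorted(strings)]


def entries_under(strings, lead_term):
    """Direct access is available only under the first term of a string."""
    return [s for s in sorted(strings) if s[0] == lead_term]


for line in browsing_list(STRINGS):
    print(line)
print(entries_under(STRINGS, "repairing"))  # []
```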

preferred term

use for descriptor
term specified by a controlled vocabulary for use to represent a concept when indexing.
e.g. schools; school uniform; costs of schooling; teaching.
A preferred term should preferably be a noun or noun phrase.


quasi-synonym

one of two or more terms whose meanings are generally regarded as different in ordinary usage but which may be treated as labels for the same concept, for the purposes of a given controlled vocabulary
e.g. diseases, disorders; earthquakes, earth tremors


recall

measure of retrieval performance defined by R/N, where R is the number of relevant items retrieved and N is the total number of relevant items in the collection
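
The ratio R/N can be computed as in the sketch below. Returning 0.0 when the collection holds no relevant items is a convention chosen for this example; the definition itself leaves that case undefined.

```python
def recall(retrieved, relevant):
    """R/N: relevant items retrieved divided by all relevant items in the collection."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention; R/N is undefined when N is 0
    return len(set(retrieved) & relevant) / len(relevant)


# Three relevant items exist, two of them were retrieved: 2/3
print(recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```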


terms, notations, cross-references and scope notes set out to exhibit the content and structure of a structured vocabulary

scope note

note which defines or clarifies the meaning of a concept as it is used in the structured vocabulary
A preferred term used to label a concept may have several meanings in normal usage. A scope note may restrict the concept to only one of these meanings, and may refer to other concepts that are included or excluded from the scope of the concept being defined.

search thesaurus

vocabulary intended to assist searching even though it has not been used to index the documents being searched
Search thesauri are designed to facilitate choice of terms and/or expansion of search expressions to include terms for broader, narrower or related concepts, as well as synonyms. Optionally, a normal thesaurus may be used as a search thesaurus.

semantic network

actual or virtual graphical representation of concepts and the relationships between them
A semantic network is a way of representing an ontology. The vertices of the network represent concepts and the edges represent semantic relationships between them. The vertices are sometimes called “nodes”, which are not to be confused with the node labels of a thesaurus or a faceted classification.

semantic relationship

use paradigmatic relationship

sibling term

one of two or more terms with the same immediate broader term
e.g. In the following, biology, chemistry, geology and physics are sibling terms. So are nuclear physics and quantum physics.
– biology
– chemistry
– – analytical chemistry
– geology
– physics
– – nuclear physics
– – quantum physics
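
Sibling terms can be derived from a table of immediate broader terms, as in the sketch below. The display above does not name the term under which biology, chemistry, geology and physics are listed, so the placeholder "(top term)" is an assumption made for the example.

```python
# Immediate broader term of each term in the display above.
# "(top term)" is a placeholder for the unnamed top level.
BROADER = {
    "biology": "(top term)",
    "chemistry": "(top term)",
    "geology": "(top term)",
    "physics": "(top term)",
    "analytical chemistry": "chemistry",
    "nuclear physics": "physics",
    "quantum physics": "physics",
}


def sibling_terms(term, broader=BROADER):
    """Return the other terms that share the same immediate broader term."""
    parent = broader[term]
    return sorted(t for t, p in broader.items() if p == parent and t != term)


print(sibling_terms("physics"))          # ['biology', 'chemistry', 'geology']
print(sibling_terms("nuclear physics"))  # ['quantum physics']
```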

source vocabulary

language or vocabulary which serves as a starting point when seeking a corresponding term in another language or vocabulary
When working with two vocabularies, the source vocabulary for one concept may be the target vocabulary for another concept.


specificity

capability of a structured vocabulary to express a subject in depth and in detail
Specificity has an important influence on retrieval performance, as it determines the accuracy with which concepts may be pinpointed, and consequently the facility to exclude unwanted documents.


sequence of preferred terms representing a compound concept in a pre-coordinate indexing system

structured vocabulary

set of terms, headings or concept codes and their inter-relationships which may be used to support information


A structured vocabulary may also be used for other purposes. In the context of information retrieval, the vocabulary should be accompanied by rules for how to apply the terms.


subfacet

a subdivision of a facet, based on inherent categories
Subfacets, like facets, should be defined so that they are mutually exclusive. For example, an “agents” facet might be subdivided into “individuals” and “organisations” subfacets; an “activities” facet might be subdivided into transitive “actions” and intransitive “processes” subfacets.
Some writers use the term “subfacet” as synonymous with array, or with the slightly broader meaning of the whole subtree of concepts grouped under a node label showing a characteristic of division, rather than just the first level array of sibling terms. I suggest that it should not be used with these meanings, as the intuitive meaning is “a subdivision of a facet” and we already have the terms “array” and “subtree” for the other meanings.

subject heading list

use subject heading scheme

subject heading scheme

use for subject heading list
controlled vocabulary comprising single terms available for subject indexing, plus rules for combining the single terms in strings
The principles for constructing subject heading lists differ from the principles of thesaurus construction. Subject heading lists may have provision for the construction of pre-coordinated indexing strings including headings and one or more levels of subheading.

Subject retrieval techniques

A thesaurus is an essential component for reliable information retrieval, but it can usefully be complemented by two other types of subject retrieval mechanism: classification schemes and free text.


synonym

one of two or more terms whose meanings are considered to be the same in a wide range of contexts
Abbreviations and their full forms may be treated as synonyms. e.g. HIV, human immunodeficiency virus; guarantees, warranties

synonym ring

group of terms that are considered equivalent for the purposes of retrieval.
Synonym rings are particularly useful in search thesauri, used for searching unindexed material, where a search for any one of the terms in the ring can retrieve occurrences of any of the terms in the ring.
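
A minimal sketch of a synonym ring used for searching unindexed text. The rings, the sample texts and the function names are hypothetical, and the naive substring matching is an assumption; a real search engine would tokenise and normalise the text.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
RINGS = [
    {"guarantees", "warranties"},
    {"HIV", "human immunodeficiency virus"},
]


def expand(term, rings=RINGS):
    """Return all terms in the ring containing `term`, or just the term itself."""
    for ring in rings:
        if term in ring:
            return sorted(ring)
    return [term]


def search(texts, term):
    """Retrieve texts containing any term from the expanded query."""
    expanded = [t.lower() for t in expand(term)]
    return [
        doc_id
        for doc_id, text in texts.items()
        if any(t in text.lower() for t in expanded)
    ]


TEXTS = {
    "a": "Extended warranties for cameras",
    "b": "Camera repair manual",
}
print(search(TEXTS, "guarantees"))  # ['a']
```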

syntactic relationship

use syntagmatic relationship

syntagmatic relationship

use for a posteriori relationship; syntactic relationship
relationship between concepts that exists only because they occur together in a document being indexed
Such relationships are not generally valid in contexts other than the document being indexed, and therefore they do not form part of the structure of a thesaurus.

synthetic classification scheme

classification scheme in which users can synthesize terms or notation for complex concepts from lists of simpler concepts
Compare with enumerative classification scheme.

target vocabulary

language or vocabulary in which a term is sought corresponding to an existing term in a source language or vocabulary


When working with two vocabularies, the target vocabulary for one concept may be the source vocabulary for another concept.


taxonomy

monohierarchical classification of concepts, as used, for example, in the classification of biological organisms
The above definition is a personal opinion; the definition proposed in BS8723-3 is “structured vocabulary using classificatory principles as well as thesaural features, designed as a navigation tool for use with electronic media”. The term is used loosely to mean various types of classification schemes, subject heading lists or thesauri, particularly when applied to the indexing of Internet resources. In my opinion this use should be avoided because of its vagueness and uncertainty; when a non-specific meaning is intended, concept scheme or controlled vocabulary should be used instead.


term

word or phrase used to label a concept
Terms in a thesaurus can be either preferred terms or non-preferred terms.


thesaurus

controlled vocabulary in which concepts are represented by preferred terms, formally organized so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms
The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term or combination of preferred terms to represent a given subject.

topic map

concept scheme conforming to the specification given in the international standard ISO/IEC 13250 : Topic maps
ISO/IEC 13250 gives the following three definitions for “topic map”:
a) A set of information resources regarded by a topic map application as a bounded object set whose hub document is a topic map document conforming to the SGML architecture defined by this International Standard.
b) Any topic map document conforming to the SGML architecture defined by this International Standard, or the document element (topicmap) of such a document.
c) The document element type (topicmap) of the topic map document architecture.
The introduction to ISO/IEC 13250 says: “In general, the structural information conveyed by topic maps includes:
– groupings of addressable information objects around topics (‘occurrences’), and
– relationships between topics (‘associations’)”.

vocabulary control

restriction of choice of indexing terms to those in a specified list
This restriction increases the likelihood of indexers and searchers choosing the same term to label a concept.

From the Getty Glossary and Bibliography for Vocabularies

This is a selection:

access point
An entry point to a systematic arrangement of information, specifically an indexed field or
heading in a work record, vocabulary record, or other content object that is formatted and
indexed in order to provide access to the information in the record.

alphanumeric classification scheme
A set of controlled codes (letters or numbers, or both letters and numbers) that represent
concepts or headings and generally have an implied taxonomy that can be surmised from the
codes (for example, the Dewey Decimal System number 735.942). See also chain indexing.
alternate descriptor (ALT)
A variant form of a descriptor available for use; usually a singular form or a different part of
speech than the descriptor (for example, lithograph is an alternate descriptor for the plural
descriptor, lithographs). The relationship indicator for this type of term is ALT.

associative relationship
In a thesaurus, the relationship between concepts that are closely related conceptually, but the
relationship is not hierarchical because it is not whole/part or genus/species. The relationship
indicator for this relationship is RT (for related term). See also equivalence relationship and
hierarchical relationship.
asymmetric relationship
In the context of a thesaurus, refers to a reciprocal relationship that is different in one direction
than it is in the reverse direction, for example BT/NT. See also symmetric relationship.
authoritative source
A published source that is based on reliable documentary evidence that is accepted as true by
most experts, and used as a standard source in a given discipline.
authority file
Also called simply an authority. A file, typically electronic, that serves as a source of
standardized forms of names, terms, titles, etc. Authority files should include references or links
from variant forms to preferred forms. The main purpose of an authority is to enforce usage,
often requiring users to use only the preferred term for a given concept. Any type of vocabulary
can be used as an authority. See also controlled vocabulary and local authority.
authority heading
A preferred, authorized heading used in a vocabulary, particularly a bibliographic authority file,
typically including a string of names or terms with additional information as necessary to allow
disambiguation between identical headings (for example, United States--History--Civil War,
1861-1865--Battlefields and United States--History--Civil War, 1861-1865--Campaigns). The
types of authority headings used by the U.S. Library of Congress are subject authority headings,
name authority headings, title authority headings, name/title authority headings, and keyword
authority headings. See also heading.

automatic indexing
In the context of online retrieval, indexing by the analysis of text or other content using
computer algorithms. The focus is on automatic methods used behind the scenes with little or no
input from individual searchers, with the exception of relevance feedback. The results tend to be
broad and imprecise, as contrasted to human indexing.

broader term (BT)
Also called a broader context. A vocabulary record to which another record or multiple records
are subordinate in a hierarchy. In thesauri, the relationship indicator for this type of term is BT.
Variations on the notation include BTG (broader term generic), BTP (broader term partitive), BTI
(broader term instance), BT1 (broader term level 1), BT2 (broader term level 2), etc.
browsing
The process whereby a user of a system or Web site visually scans and maneuvers through
navigation lists, results lists, hierarchical displays, or other content in order to make a selection,
as contrasted to the user entering a search term in a search box.

cataloger
In the context of this book, the person who records information in records for works. See also
indexer and end user.
cataloging
In the context of this book, the process of describing and indexing a work or image, particularly
in a collections management system or other automated system. Cataloging involves the use of
prescribed fields of information and rules (for example, the rules described in CCO and CDWA).
cataloging rules
See editorial rules.
cataloging tool
A system that focuses on content description and labeling output (for example, wall labels or
slide labels), often part of a more complex collection management system.
chain indexing
Also called chain procedure. A technique for indexing that uses a numeric or alphanumeric
classification system where the entries have meaning beyond simple numeric sequencing, such
as the Dewey Decimal System (for example, in Dewey number 735.942, 735 means sculpture
after the year 1400 CE, 9 means geographic area, 4 means Europe, and 2 means England).
See narrower term.
In the context of this book, the process of arranging works or other content objects
systematically in groups or categories of shared similarity according to established criteria and
using terms to identify the classes.
classification notation
In a vocabulary, a numeric, alphabetic, or alphanumeric code in a system of codes used to
classify or categorize entries; may be used in a hierarchical arrangement to impose a display or
sorting order on the lines or levels in the hierarchy (for example, V, V.PC, V.PE). See also
classified display
See hierarchical display

controlled list
A simple list of terms used to control terminology. In a well-constructed controlled list, the
following should be true: each term must be unique; terms should all be members of the same
class; terms should not be overlapping in meaning; terms should be equal in
granularity/specificity; and terms should be arranged alphabetically or in another logical order. A
type of controlled vocabulary.
controlled vocabulary
An organized arrangement of words and phrases used to index content and/or to retrieve content
through browsing or searching. A controlled vocabulary typically includes preferred and variant
terms and has a limited scope or describes a specific domain.
co-occurrence mapping
Also called co-occurrence clustering. An automated method of compiling groups of terms that
tend to occur together in certain contexts and are therefore presumed to be related in some way;
the resulting groups of terms are considered to be loosely related and may be used to
automatically broaden a user’s search or to suggest alternative search terms to users in order to
improve search results. See also automatic indexing.
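
One simple way to compile such groups is to count how often pairs of terms occur in the same record, as sketched below. The records, the threshold `min_count` and the function names are assumptions made for the illustration; production systems typically use more elaborate statistical measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records, each listing the terms that occur in it.
RECORDS = [
    ["cameras", "lenses", "tripods"],
    ["cameras", "lenses"],
    ["cameras", "film"],
]


def co_occurrence_counts(records):
    """Count how often each pair of terms occurs in the same record."""
    counts = Counter()
    for terms in records:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def related_terms(counts, term, min_count=2):
    """Terms that co-occur with `term` at least `min_count` times."""
    related = []
    for (a, b), n in counts.items():
        if n >= min_count and term in (a, b):
            related.append(b if a == term else a)
    return sorted(related)


counts = co_occurrence_counts(RECORDS)
print(related_terms(counts, "cameras"))  # ['lenses']
```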

database index
Also called a data index. A particular type of data structure that improves the speed of
operations in a table by allowing the quick location of particular records based on key column
values. Indexes are essential for good database performance. The concept is distinguished from
indexing (human indexing) and automatic indexing

direct mapping
In the context of interoperability of vocabularies, refers to the matching of terms one-to-one in
two controlled vocabularies. While the vocabularies need not be the same size (one may be
smaller or larger) or cover exactly the same content, where overlap exists, there should be the
same meaning and level of specificity between the two terms in each controlled vocabulary. See
also switching.

displayed index
An index that is visible and available to end-users for browsing. See also non-displayed index.

entry array
A type of display, often used for headings, in which any two or more entries that have the same
broader heading (for example, Religious art--Ancient Egyptian, Religious art--Christian, Religious
art--Hindu, etc.) are grouped together vertically under the broader heading. While this is not a
true hierarchical display, it may resemble a hierarchical display through use of indentation.
equivalence relationship
In a thesaurus, the relationship between synonymous terms or names for the same concept,
typically distinguishing preferred terms (descriptors) and non-preferred terms (variants or UFs).
See also associative relationship and hierarchical relationship.

extension vocabulary
A thesaurus that is created with the intention of, or is later adapted for, linking to another
vocabulary that is larger, broader, or more generic; it is typically linked through node linking,
rather than being integrated at many points in the original vocabulary. See also node linking,
satellite vocabulary, and microcontrolled vocabulary.

folksonomy
A neologism referring to an assemblage of concepts, which are represented by terms and names
(called tags) that are compiled through social tagging, generally on the Web. A folksonomy
differs from a taxonomy in that it is not structured hierarchically, and the authors of the
folksonomy are typically the casual users of the content rather than professional indexers
following standard protocols and using standardized controlled vocabularies.

generic structure
A display format for a thesaurus in which all hierarchical levels are displayed by using
indentation, codes, or punctuation marks. See also flat format.

group-level cataloging
Describing and assigning indexing terms for a group of works as a whole, typically focusing on
the most important or most frequently occurring characteristics in the items of the group. See
also item-level cataloging.
guide term
Also called a node label. A record represented by a term or phrase that is created as a
hierarchical level to provide order and structure to thesauri by grouping narrower terms
according to a given logic. Guide terms are not used for indexing and are often enclosed in
angled brackets or otherwise distinguished from other terms in displays.

heading
Also called a label. A string of words comprising a term combined with other information that
serves to modify, disambiguate, amplify, or create a context for the main term in displays.
Examples include the listing of qualifiers and/or broader contexts for terms (for example, rhyta
(, containers)), place types and administrative broader
contexts for place names (for example, Dayr al-Bahri (deserted settlement) (Qinā governorate,
Egypt)), or biographical information for people’s names (for example, Francesco Aliunno (Italian
calligrapher, active 15th century)). See also subject heading list, name authority, and
authority heading.

hierarchical display
Also called a systematic display or classified display. In a thesaurus, a graphic arrangement
of terms showing broader/narrower relationships through the use of indentation, codes, or
another method.

hierarchical relationship
The broader and narrower (parent/child) relationship between two entities in a thesaurus,
namely whole/part (for example, Montréal is part of Québec), genus/species (for example,
bronze is a type of metal), or instance relationships (for example, Montréal is an instance of a
city). It is the basic structure that creates a hierarchy.
hierarchy
An organization of records related by levels of superordination and subordination. Each record in
the hierarchy, except the root, is a narrower context of the record above it. See also sub-facet,
polyhierarchy, and monohierarchy.

indentation
Also called indention. In the context of printing or other displays of typed words or texts, refers
to the white or blank space of a fixed width on a row or rows along the right or left margin of a
display, as commonly used to indicate the first line in a new paragraph of text. Right-hand
graduated indentation is used to indicate relationships between parents and their descendents in
hierarchical displays of thesauri.
indexer
A person who assigns indexing terms for a work or image, typically the same person as the
cataloger. See also cataloger.
indexer thesaurus
A thesaurus designed to control terminology and guide indexers in the choice of terms. See also
end-user thesaurus.
indexing
Also called human indexing. In the context of this book, the process of evaluating information
and designating indexing terms by using controlled vocabulary that will aid in finding and
accessing the cultural work record. Refers to indexing done by human labor, not to the automatic
parsing of data into a database index, which is used by a system to speed up search and

keyword index
An index based on individual words (keywords) found in a vocabulary term, text, or other content

See heading.

A set of correspondences between terms, fields, or element names, used for translating data
from one standard or vocabulary into another, or as a means of combining terms or data for
search and retrieval.

metadata
A structured set of descriptive elements used to describe a definable entity. This data may
include one or more pieces of information, which can exist as separate physical forms. In the
context of art information, metadata could include data associated with information
about the creation, physical characteristics, history, location, administration, or preservation of
the work.

microcontrolled vocabulary
Also called a microthesaurus. A controlled vocabulary that is limited in the range of topics
covered, but fits within the domain of a larger, broader, or more generic controlled vocabulary. It
typically contains highly specialized terms that are not necessarily in the broader controlled
vocabulary, but that map to the hierarchical structure of the broader controlled vocabulary. See
also satellite vocabulary and extension vocabulary.

In a compound term or name, the adjectival component that modifies the noun (for example,
flying in flying buttresses; Mount in Mount Etna). See also focus.
monohierarchy
A hierarchy in which each child has only one immediate parent. Distinguished from a
monolingual
Expressed in a single language, as distinguished from multilingual. In a monolingual thesaurus,
the terms and names are expressed in only one language.

multilingual
Expressed in more than one language, as distinguished from monolingual. In a multilingual
thesaurus, terms and other information may be expressed in more than one language.
name authority
An authority containing proper names, most often personal names. See also subject authority
narrow results
To adjust criteria in a search in order to retrieve a smaller number of more precise results that
better match the intention of the searcher.
narrower term
Also called narrower context. A record to which another record or multiple records are
superordinate in a hierarchy (for example, Brewster chair is a narrower term to armchair). In
thesauri, the relationship indicator for this type of term is NT. Variations on the notation include
NTG (narrower term generic), NTP (narrower term partitive), NTI (narrower term instance), NT1
(narrower term level 1), NT2 (narrower term level 2), etc.
natural language
Spoken or written texts, as distinguished from fielded data and controlled vocabulary.
natural order form
In the context of a controlled vocabulary, the form of a multiple-word name or term, where the
name or term is not inverted (as may be appropriate for an index) but appears in the form that
would be used in speech or a written text (for example, Christopher Wren or flying buttresses).
See also inverted form.

In the context of a thesaurus, any point or record in the hierarchy that is a location at which a
branch or individual record (leaf) is attached; thus, the basic conceptual unit used to build
node label
See guide term.
node linking
Also called leaf linking. In the context of combining multiple vocabularies, a method that uses
various nodes in the hierarchical structure of a source controlled vocabulary to link to more
detailed controlled vocabularies that are applicable to a single node of the parent hierarchy. The
vocabulary linked to a broader vocabulary in this way is often called an extension vocabulary.
non-displayed index
A machine-readable index that is not displayed for browsing or other direct access of end-users,
but is used behind the scenes to improve accuracy or speed in search and retrieval. Such indexes
may be created beforehand or on the fly at the time of the query. See also displayed index.

non-preferred term
Also called a non-preferred name. Any term in a vocabulary record that is not the preferred
term, which is the term flagged as preferred for use as default in displays.

For a thesaurus, the alphabetic code used to express term types (D, ALT, UF), associative
relationship (RT), hierarchical relationships (BT, NT, BTG, NTG, BTP, NTP, BTI, NTI, BT1, BT2,
NT1, NT2), and scope notes (SN), among others. See also classification notation.

ontology
A formal, machine-readable specification of a conceptual model, in which concepts, properties,
relationships, functions, constraints, and axioms are all explicitly defined. While an ontology is not
technically a controlled vocabulary, it uses one or more controlled vocabularies for a defined
domain and expresses the vocabulary in a representative language that has a grammar for using
vocabulary terms in an automated way to express something meaningful.

permuted index
A type of index where individual words of a term are rotated to bring each word of the term into
alphabetical order in the term list. See also inverted form.
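
The rotation described above might be sketched as follows: each word of a term is brought to the front in turn, and the rotated entries are filed alphabetically, each pointing back to the original term. The function name and the case-insensitive filing are assumptions; actual permuted displays vary in layout.

```python
def permuted_index(terms):
    """Rotate each term so that every word leads an entry, then file alphabetically."""
    entries = []
    for term in terms:
        words = term.split()
        for i in range(len(words)):
            rotated = " ".join(words[i:] + words[:i])
            entries.append((rotated, term))
    return sorted(entries, key=lambda entry: entry[0].lower())


for rotated, original in permuted_index(["flying buttresses"]):
    print(rotated, "->", original)
```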

pick list
A user interface feature that allows the user to select from a pre-set list of terms, typically used
to control vocabulary for indexing or to provide options in a query. A pick list is generally
populated with a controlled list.
A thesaurus in which any record may be linked to multiple parent records.

precision
A measure of a search system’s effectiveness in terms of retrieving only relevant results,
expressed as the ratio of relevant records or documents retrieved from a database to the total
number retrieved in response to the query. A high-precision search means that most of the
results retrieved will be relevant; however, a high-precision search will not necessarily retrieve all
relevant results. Recall and precision tend to be inversely related: when one goes up, the other usually goes down.
See also recall.

preferred term
Also called a preferred name. The term designated among all synonyms or lexical variants for a
concept to be used as the default term to represent the concept in displays and other situations.
In a monolingual thesaurus, the preferred term is also the only descriptor in the record. In a
multilingual thesaurus, there may be a descriptor for every language, but there is often only one
preferred term for the record as a whole. See also descriptor.

qualifier
A word or phrase used to distinguish a term in a vocabulary from otherwise identical terms that
have different meanings. A qualifier is separated from the term, generally displayed within
parentheses. It is also called a gloss, although strictly speaking a qualifier should be used only
with homographs, and a gloss has a more general meaning in the field of linguistics. See also

query expansion (QE)
Reformulating a query in order to return a broader or more comprehensive set of results (for
example, adding synonyms to the user’s search term expands a query).
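
A minimal Python sketch of query expansion using synonym rings follows. The ring contents and the function name are hypothetical and serve only to illustrate the idea; a production search engine would typically handle phrases, stemming, and ranking as well.

```python
# Hypothetical synonym rings: each set lists words treated as equivalent
# for retrieval purposes.
SYNONYM_RINGS = [
    {"sofa", "couch", "settee"},
    {"car", "automobile"},
]


def expand_query(query):
    """Return, for each query word, the word plus its ring equivalents."""
    expanded = []
    for word in query.lower().split():
        # Find the ring containing this word; fall back to the word alone.
        ring = next((r for r in SYNONYM_RINGS if word in r), {word})
        # Keep the user's own word first, then the equivalents alphabetically.
        expanded.append([word] + sorted(ring - {word}))
    return expanded


print(expand_query("red couch"))
```

Keeping the user's own word first in each group makes it straightforward to rank exact keyword matches above matches on equivalent terms.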

recall
A measure of a search system’s effectiveness in terms of retrieving all results that are possibly
relevant, expressed as the ratio of the number of relevant records or documents retrieved over
all the relevant records or documents. A high-recall search retrieves a comprehensive set of
relevant results; however, it also increases the likelihood that marginally relevant content objects
will also be retrieved. Recall and precision tend to be inversely related. See also precision.
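
The two ratios can be computed as in the following minimal Python sketch. The function name and the document identifiers are made up for this example, and the sketch assumes the full set of relevant documents is known, which is rarely the case outside a test collection.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers the system returned
    relevant:  identifiers judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}

# A narrow search: everything returned is relevant, but half is missed.
narrow = precision_and_recall({"d1", "d2"}, relevant)

# A broad search: all relevant documents are found, along with four others.
broad = precision_and_recall(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant
)

print(narrow, broad)
```

The narrow search scores high on precision and low on recall; the broad search shows the opposite pattern.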

related term (RT)
A concept that is associatively (not hierarchically) linked to another concept in a thesaurus. In
thesauri, the relationship indicator for this type of term is RT. See associative relationship.
relational table database
Also called a relational database. A database in which data is organized into columns and rows
according to specific defined relationships (for example, in a vocabulary database, a table of
terms may be linked to a table for languages).
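
The example in the definition, a table of terms linked to a table for languages, might be sketched as follows using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration and do not reflect any particular vocabulary database.

```python
import sqlite3

# An in-memory database, used here only for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (
        language_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE terms (
        term_id     INTEGER PRIMARY KEY,
        term_text   TEXT NOT NULL,
        -- Each term row points at one row in the languages table.
        language_id INTEGER NOT NULL REFERENCES languages(language_id)
    );
    INSERT INTO languages VALUES (1, 'English'), (2, 'Italian');
    INSERT INTO terms VALUES (10, 'still life', 1), (11, 'natura morta', 2);
""")

# Join the two tables through the shared language_id column.
rows = conn.execute("""
    SELECT terms.term_text, languages.name
    FROM terms
    JOIN languages ON terms.language_id = languages.language_id
    ORDER BY terms.term_id
""").fetchall()
print(rows)
```

Storing the language name once and referring to it by identifier avoids repeating it on every term row.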

relevance
The extent to which information retrieved in a search is judged by the user to meet the criteria of
the query.

results list
The records or other data retrieved in response to a query and presented online or in a system
in an organized display.

root
Also called root node or top term. The highest level of the hierarchy, from which all branches
descend. In thesauri, the relationship indicator for this type of term is TT, for top term.
rotated listing
See permuted display.
satellite vocabulary
A thesaurus that is created with the intention of, or is later adapted for, linking to another
vocabulary that is larger, broader, or more generic; it may be integrated at many points in the
original vocabulary. See also node linking, extension vocabulary, and microcontrolled vocabulary.

schema
Also called a scheme. In the context of this book, the organization, structure, and rules for a set
of data (for example, the set of tables, views, indices, and descriptions for columns in a
database, or the organization and description of an XML document).

scope note (SN)
A note explaining coverage, specialized usage, and meaning of terms. In thesauri, the
relationship indicator for this note is SN.
see reference
A type of cross-reference, usually in a printed index, directing the reader from a non-preferred
term or subject heading to the preferred term or subject heading for the same concept. The term
or subject heading at the see reference is a synonym for the preferred term or heading.
see also reference
A type of cross-reference, usually in a printed index, directing the reader to a related term or
entry. A see also reference differs from a see reference in that the see also reference is not made
between synonyms, but between terms or headings that are more peripherally related.
semantic linking
A method of linking terms in a vocabulary or larger database according to the meaning of the
terms and relationships between terms.
semantic relationship
See paradigmatic relationship.

sibling
A concept that shares the same immediate broader context (one level higher) as other concepts.
Siblings are subordinate to the same broader concept and are at the same hierarchical level.
single-to-multiple term equivalence
In the context of mapping terms from different vocabularies to each other, the situation that
occurs when a term in one vocabulary has no direct match in the second vocabulary, but instead
must be mapped to a combination of terms.
social tagging
The decentralized practice and method by which individuals and groups create, manage, and
share tags (terms, names, etc.) to annotate and categorize digital resources in an online “social”

sorting
In the context of this book, the automated process of organizing a results list, data elements in a
record, or other data, in a particular sequence based on established criteria or attributes of the
data, for example alphabetically, by parent string, or by an associated date. There may be
primary sort criteria and secondary sort criteria (for example, first sort place names in a results
list alphabetically, and then, for homographs in the list, sort by parent string).
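
Primary and secondary sort criteria can be sketched in Python with a tuple sort key. The record layout (a name plus a parent string) and the sample place names are assumptions made for this example.

```python
# Hypothetical results list containing homographs (places with the same name).
records = [
    {"name": "Springfield", "parents": "Missouri, United States"},
    {"name": "Columbus", "parents": "Ohio, United States"},
    {"name": "Springfield", "parents": "Illinois, United States"},
    {"name": "Columbus", "parents": "Georgia, United States"},
]


def sort_results(records):
    """Sort by name (primary), then by parent string (secondary)."""
    return sorted(
        records,
        # Tuples compare element by element, so the parent string is only
        # consulted when two names are identical.
        key=lambda r: (r["name"].lower(), r["parents"].lower()),
    )


for record in sort_results(records):
    print(f'{record["name"]} ({record["parents"]})')
```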

source
In the context of building vocabularies, a citable reference to a term in the literature that helps
establish its form, spelling, usage, and meaning. See also literary warrant.
source authority
In the context of this book, a bibliographic authority file used to control the citations providing
warrant for terms in a vocabulary or information in a work record.
source language
In the context of translating or mapping one vocabulary to a vocabulary in another language, the
language of the original vocabulary. See also target language.
specialized vocabulary
See microcontrolled vocabulary.

specificity
In the context of indexing, the degree of precision or granularity used in assigning terms.
Measures of greater specificity include the use of the narrowest applicable indexing term rather
than a broader, more generic term. See also exhaustivity.

standard
A vocabulary, set of rules, code of practice, or description of characteristics and parameters, that
is documented, established by experts or approved by an authoritative body, and widely
recognized or employed as an authoritative exemplar of correctness or best practice; used within
a discipline or domain in order to promote interoperability and efficiency.

stemming
In the context of mapping terms for search and retrieval, the alteration of a term by
automatically truncating or removing common suffixes, word endings, or prefixes in order to find
a match, usually applied to sets of related words that are derived from a common root and
appear in a variety of grammatical forms (for example, paint, painting, painted).
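
The following Python sketch shows a deliberately naive form of suffix stripping, enough to reduce the example words above to a common root. The suffix list and the minimum stem length are assumptions for illustration; this is not the Porter algorithm or any other published stemmer, and it will mis-stem many English words.

```python
# Suffixes are checked in this order, longest first.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common suffix, if doing so leaves at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # The length check keeps short words such as "red" intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["paint", "painting", "painted", "paints"]:
    print(w, "->", naive_stem(w))
```

Applying the same function to both the query and the indexed terms lets all four forms match one another.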

stop list
In the context of search and retrieval, words in a vocabulary or target data that are ignored in
searching or matching because they occur too frequently or are otherwise of little value in
retrieval for a given domain. Common stop lists for a text contain articles, conjunctions, and
prepositions, although these words are typically not included in a stop list for a vocabulary.
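
A minimal Python sketch of applying a stop list to a text before matching follows. The stop words shown are a small hypothetical sample, not a recommended list.

```python
# Hypothetical stop list of articles, conjunctions, and prepositions.
STOP_LIST = {"a", "an", "and", "in", "of", "the"}


def index_words(text):
    """Return the words of a text that are not on the stop list."""
    return [word for word in text.lower().split() if word not in STOP_LIST]


print(index_words("The Adoration of the Magi"))
```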
string syntax
Also called string indexing. The creation of headings by computer algorithm, characterized by
headings that are more consistent than the typically idiosyncratic headings created by hand (for
example, the automatic concatenation of a parent string in a heading for a geographic place, San
Gimignano (Siena province, Tuscany, Italy)).
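
The concatenation of a parent string can be sketched in Python as follows. The hierarchy table and function names are hypothetical; the sketch assumes each place has at most one parent and that the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: each place maps to its immediate parent.
PARENT_OF = {
    "San Gimignano": "Siena province",
    "Siena province": "Tuscany",
    "Tuscany": "Italy",
}


def parent_string(place):
    """Collect parents from the narrowest to the broadest."""
    parents = []
    current = PARENT_OF.get(place)
    while current is not None:
        parents.append(current)
        current = PARENT_OF.get(current)
    return parents


def heading(place):
    """Build a heading with the parent string in parentheses."""
    parents = parent_string(place)
    return f"{place} ({', '.join(parents)})" if parents else place


print(heading("San Gimignano"))
```

Because the heading is generated from the hierarchy, every place receives the same punctuation and ordering.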

structure
See data structure.

sub-facet
A major conceptual division of a thesaurus that is located near the top of the tree but under a
facet. Also called a hierarchy in the AAT, although hierarchy has a more general meaning as

subject
In the context of this book, the focus concept of a vocabulary record (for example, the subject of
a ULAN record is a person). Also used to refer to the subject matter (often iconographical
content) of what is depicted in or by a work of art or the content of a text.
subject heading list
An alphabetical list of words or phrases used to indicate the content of a text or other thing;
characterized by precoordination of terminology, meaning that several unique concepts are
combined in a string (for example, Archaeology and art--China--History--20th century). A type
of controlled vocabulary. See also authority heading.
subject indexing
A term typically used in the context of bibliographic cataloging but also applicable to cataloging
art, referring to the application of indexing terms to the content of the document, as contrasted
to a description of its physical characteristics.

syndetic structure
Also called cross-reference links. In the context of a vocabulary, refers to the linking of
equivalent, broader, narrower, and other related terms so that they can be used as cross
references to each other and to related headings for the purpose of access.

synonym
A term having a different form, but exactly or very nearly the same meaning as another term.
See also true synonymy and near synonymy. Compare lexical variant.
synonym ring list
A type of controlled vocabulary containing terms that are considered equivalent for the purposes
of retrieval, but not necessarily having true synonymy.

synonymy
A type of semantic relation in which two words or terms have the same or very similar meaning.
See also true synonymy and near synonymy.

syntax
In the context of this book, the structure of elements in a compound term or name (for example,
last name first, comma, first name, middle initial) or heading; also used to refer to the structure
of elements in a search query (for example, rules for the placement of Boolean operators OR,
AND, or NOT between terms), analogous to the linguistic structure of elements in a sentence.
synthesis note
A brief preliminary finding, example, or recommendation; this expression was used in the original
print publication of the AAT to refer to bottom-of-page notes throughout each sub-facet (or
hierarchy) that suggested ways in which descriptors from that sub-facet could be combined in
postcoordination with other descriptors (these recommendations are now found in the AAT
Editorial Manual).

taxonomy
A classification organized into a hierarchical structure and applicable for a defined domain. Often
used to refer to the classification of living organisms according to physical characteristics, but the
term and principles can be applied to classification in any discipline. Unlike thesauri, taxonomies
typically do not include synonyms and associative relationships.

term
A word or group of words representing a single concept; a vocabulary record comprises terms
and other information, including relationships, scope note, sources, etc. Additionally, in the
jargon of thesaurus construction, the word term is often used as shorthand to actually refer to
the concept that is represented by that term (for example, BT and NT actually refer to the
relationships between concepts). The distinction between a term in the strict sense and term
meaning a record must often be inferred from the context of the discussion.
term frequency (TF)
An automatic ranking method often used in a formula with inverse document frequency in
information retrieval and text mining to measure how important a term is to a set of data and
how useful it will be in retrieval.
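
One common way of combining term frequency with inverse document frequency is sketched below in Python. The exact weighting formula varies between systems; the variant used here (relative frequency multiplied by the natural log of the inverse document ratio), the function name, and the tiny corpus are all assumptions made for illustration.

```python
import math


def tf_idf(term, document, corpus):
    """Score a term for one document, where documents are lists of words."""
    if not document:
        return 0.0
    # Term frequency: share of the document's words that are this term.
    tf = document.count(term) / len(document)
    # Inverse document frequency: rarer terms across the corpus score higher.
    containing = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / containing) if containing else 0.0
    return tf * idf


# Hypothetical corpus of three short documents.
corpus = [
    ["gothic", "arch", "arch"],
    ["gothic", "window"],
    ["gothic", "vault"],
]

print(tf_idf("gothic", corpus[0], corpus))  # appears everywhere, so scores 0.0
print(tf_idf("arch", corpus[0], corpus))    # frequent here, rare elsewhere
```

A term that occurs in every document carries no weight, while a term concentrated in a few documents is treated as more useful for retrieval.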
term record
In the jargon of thesaurus construction, the collection of information associated with a descriptor,
including the history of the term, its relationships to other terms and records, etc. In this book, it
is referred to as a record (or a concept record), in order to distinguish it from the information
that is actually associated only with the term table in a relational database model (for example,
language of the term, contributor of the term).

thesaurus
A controlled vocabulary arranged in a specific order and characterized by three relationships:
equivalence, hierarchical, and associative. Thesauri may be monolingual or multilingual. Their
purposes are to promote consistency in the indexing of content and to facilitate searching and
top term
See root. In thesauri, the relationship indicator for this type of term is TT.

tree structure
A controlled vocabulary display format in which the complete hierarchy of records is shown or
accessible by clicking. The tree structure may be constructed by assigning a tree number or line
number to each record, or by another method. See also systematic display.
true synonymy
The characteristic of terms or names that have meanings that are identical or as nearly identical
as is possible with language. The purpose of enforcing true synonymy in a vocabulary is to
increase precision in indexing and retrieval. See also near synonymy.

truncation
In searching and matching, the action of cutting off characters in a search term in order to find
all terms with a certain common string of characters; it typically involves the user employing a
wildcard symbol to search for a string of characters no matter what other characters follow (or
sometimes, precede) that string (for example, searching for arch* will retrieve arch, arches,
architrave, architecture, architectural history, etc.).
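
Wildcard matching of this kind can be sketched with Python's standard fnmatch module, where * stands for any run of characters, including none. The term list and function name are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical vocabulary terms.
TERMS = [
    "arch",
    "arches",
    "architrave",
    "architecture",
    "architectural history",
    "march",
    "monarch",
]


def wildcard_search(pattern, terms):
    """Return the terms matching a pattern, ignoring letter case."""
    return [t for t in terms if fnmatchcase(t.lower(), pattern.lower())]


print(wildcard_search("arch*", TERMS))  # right truncation
print(wildcard_search("*arch", TERMS))  # left truncation
```

Right truncation retrieves the five terms that begin with "arch"; left truncation retrieves the terms that end with it.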

unique identifier
A number or other string that is associated with a record or piece of data, exists only once in a
database, and is used to uniquely identify and disambiguate that record or piece of data from all
others in the database.

used for term
Also called a UF. In thesaurus jargon, a term that is not a descriptor and not an alternate
descriptor. If the thesaurus is being used as an authority, a used for term is not authorized for
indexing. Used for terms typically comprise spelling or grammatical variants of the descriptor or
have true synonymy with the descriptor.

user
See end user.

user warrant
Justification for a term in a controlled vocabulary based on the frequency of user queries that
employ the term. User warrant may be used for terms intended for retrieval, but is typically not
sufficient warrant for posting a term in a thesaurus used for indexing. See also literary warrant
and organizational warrant.

variant term
In a vocabulary, a term that is not the preferred term but refers to the same concept, including
used for terms and alternate descriptors.

vocabulary control
The process of enforcing the use of certain terminology with the goal of providing consistency
and improving retrieval.
See controlled vocabulary.

