Improving the Classification Model Overview Pattern Matching


The previous pages have introduced various COGENT concepts by way of a series of models. One of the more significant issues that arose in those pages was that of representation. This page provides a more rigorous introduction to representation within COGENT. The style of the presentation is somewhat more formal than that of previous pages, but this is necessary given the precise syntactic details of COGENT's representation language.

Any information processing model requires that the information to be processed is represented in a way that the model understands. In the categorisation model developed earlier, the names of animals were represented by the corresponding English words. These words are part of the COGENT representation language, but the full representation language includes a variety of more complex structured representations, as well as representations for numbers and variables.

COGENT's representation language is borrowed from Prolog, a programming language developed in the 1970s for artificial intelligence applications. Those with a background in Prolog may skip this section.


Terms

The principal representational unit of COGENT (and Prolog) is the term. All information that is to be represented must be represented as a term. There are a number of different types of terms, allowing the representation of a number of different types of information. These include: atoms (for representing information with no internal structure); numbers (for representing numeric information); lists (for representing sequences); compound terms (for representing information with arbitrarily complex internal structure); and variables (for representing unknown or non-fixed information).


Atoms

Most of the information represented in the previous exercise was represented as atoms. An atom is just a sequence of letters or other characters such as cat. There are a number of limitations on precisely what characters may occur in an atom of this form. Firstly, the atom must start with a lower case letter. The characters following this may be of either upper or lower case, and can also include digits and the underscore character ("_"). Thus dog, four_legs, and a_X3bu are all atoms. Other combinations of characters can also be made into atoms by enclosing them in single quotation marks. The following are thus also atoms: 'CAT', 'four-legs', 'A&B'.
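
These rules can be checked in a Prolog interpreter. The following is a minimal sketch in standard Prolog (it assumes a system such as SWI-Prolog; the source does not say whether COGENT itself makes atom/1 available as a condition). The predicate name atom_checks is introduced here purely for illustration.

```prolog
% atom/1 is the standard Prolog test that succeeds only for atoms.
% atom_checks/0 is a hypothetical helper that succeeds if every claim holds.
atom_checks :-
    atom(dog),
    atom(four_legs),
    atom(a_X3bu),
    atom('CAT'),            % quoted, so an atom despite the capital letters
    atom('four-legs'),      % quoted, so '-' is just another character
    atom('A&B'),
    dog == 'dog',           % quotes change nothing when the name is already a valid atom
    \+ atom(four-legs),     % unquoted, this is the compound term '-'(four, legs)
    \+ atom(_Cat).          % an unquoted name starting with "_" or a capital is a variable
```

Running the query atom_checks. should succeed, confirming each line.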

Exercise 1: Which of the following are valid Prolog/COGENT atoms?
roderick
293
has-fur
X
alpha
y
Mary
a/b
'#*%@&"'
a___c
Answers are at the bottom of the page.

Atoms are generally used to represent atomic things, that is, things that have no internal structure (or whose internal structure is not relevant). It was sensible to use atoms in the previous exercise because the categorisation process really just involved looking up the names of animals in buffers. It would not normally be appropriate, however, to use an atom to represent the meaning of a sentence. For example, suppose it is necessary to represent the meanings of "Tigger is miaowing" and "Fido believes Tigger is miaowing". Although atoms could be used to represent each meaning, such a representation would lose information about the relationship between the meanings of these two sentences. As discussed below, this information can be retained by using compound terms.


Numbers

Numerical information may be represented in the standard form: as either integers (e.g., 9) or real numbers (e.g., 3.14159).
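
As a brief sketch, the standard Prolog type tests distinguish the two kinds of number from each other and from atoms. The helper name number_checks is hypothetical, and the block assumes an ordinary Prolog system rather than anything specific to COGENT.

```prolog
% integer/1, float/1 and number/1 are standard Prolog type tests.
number_checks :-
    integer(9),
    float(3.14159),
    number(9),
    number(3.14159),
    \+ atom(9),             % a number is not an atom
    \+ integer('9'),        % in single quotes, 9 becomes an atom
    9 \== 9.0.              % an integer and a real are different terms
```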


Lists

Much of the power of the Prolog/COGENT representation language comes from the possibility of constructing new terms from other elements of the language. Lists are one type of term that employs this construction. Lists are generally used to represent sequences of information. Thus, a list might be an appropriate representation to use when modelling a memory span task requiring the presentation and recall of an ordered sequence of words.

A list consists of a left square bracket followed by a comma-separated sequence of terms followed by a right square bracket, e.g., [cat, elephant, fish, lion, dog, fish] (a list with six atomic elements). Lists can have any number of elements. If a list has zero elements it is known as the empty list, and is represented as [].

A second common use of lists is to represent sets of things (or even multi-sets: sets whose elements may occur more than once). This can be done by simply ignoring the sequential ordering information contained in the list representation. Thus, the list [cat, elephant, fish, lion, dog] may be used to represent a set of animal names (rather than a sequence of animal names) by ensuring that all references to the list ignore the position of individual elements.
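
The difference between reading a list as a sequence and reading it as a set can be sketched in standard Prolog. The predicates same_set/2 and list_checks/0 are hypothetical names introduced here; same_set/2 relies on the standard sort/2, which orders a list and removes duplicates, so it treats lists as plain sets and not as multi-sets.

```prolog
% same_set(+ListA, +ListB): succeeds if the two lists hold the same
% elements once order and repetition are ignored.
same_set(ListA, ListB) :-
    sort(ListA, Sorted),
    sort(ListB, Sorted).

list_checks :-
    length([cat, elephant, fish, lion, dog, fish], 6),   % six elements
    length([], 0),                                       % the empty list
    [cat, dog] \== [dog, cat],                           % as sequences these differ
    same_set([cat, dog], [dog, cat]),                    % as sets they are the same
    same_set([cat, fish, fish], [fish, cat]).            % repetition is ignored too
```

Nothing in the list itself says whether it is a sequence or a set; the choice lies entirely in how the list is used.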

Compound Terms

Compound terms are, like lists, terms built from other terms. They are frequently used to represent structured information in which the structure is more complex than that which occurs in lists. Compound terms, for example, allow the representation of the meaning of sentences in terms of representations of the sentence parts. Thus, the meaning of "Tigger is miaowing" might be represented by the compound term miaows(tigger).

In general, a compound term consists of an atom (in the above case miaows) immediately followed by a left round bracket, followed by a comma-separated sequence of other terms, followed by a right round bracket. The initial atom is referred to as the compound term's functor. The sub-terms between a compound term's brackets are known as its arguments, and the number of arguments is the term's arity. Note that the comma-separated sequence of terms cannot be empty (i.e., the arity of a compound term cannot be 0), and there must not be any space between the compound term's functor and the opening round bracket. Space may be inserted freely between a compound term's arguments (or between those arguments and the commas that separate them), and should be used consistently to improve the readability of the representation.

The following are example compound terms: legs(four), has(dog, legs(4)), features(tigger, [legs(4), claws, skin(fur)]). Note how the final example consists of a term with two arguments, the second of which is a list of three elements, two of which are themselves compound terms. Highly complex representations may be built by using this structuring of terms.
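
Functors, arities and arguments can be inspected with the standard Prolog predicates functor/3 and arg/3. The sketch below assumes an ordinary Prolog system; compound_checks is a hypothetical helper name, and the term believes(fido, miaows(tigger)) is only one assumed way of encoding "Fido believes Tigger is miaowing".

```prolog
% functor(Term, Name, Arity) relates a term to its functor and arity;
% arg(N, Term, Arg) picks out the Nth argument (counting from 1).
compound_checks :-
    compound(legs(four)),
    functor(legs(four), legs, 1),
    functor(has(dog, legs(4)), has, 2),
    Term = features(tigger, [legs(4), claws, skin(fur)]),
    functor(Term, features, 2),
    arg(1, Term, tigger),
    arg(2, Term, Features),
    length(Features, 3),                 % the second argument is a three-element list
    % Assumed encoding: the embedded sentence survives as a sub-term.
    arg(2, believes(fido, miaows(tigger)), Belief),
    Belief == miaows(tigger).
```

The last two lines show what an atom-only representation would lose: the meaning of the simpler sentence is literally a part of the meaning of the more complex one.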

Exercise 2: What are the functors and arities of the following compound terms?
animal(cat, vertebrate)
numbers([12,15,34,35,37,41], [9,13,18,32,35,43])
Answers are at the bottom of the page.


Variables

Variables allow the representation of information that is either unknown or that may vary. Variables are represented as sequences of letters, digits, and underscore characters that begin with an upper case letter or the underscore character. Thus, CAT and Rat_4 are variables. Note that variables must not have single quotes around them: a character sequence beginning with an upper case letter that is surrounded by single quotes is understood by COGENT to be an atom.

Exercise 3: Which of the following are valid Prolog/COGENT variables?
roderick
293
has-fur
X
alpha
y
Mary
a/b
'#*%@&"'
'Paul'
Answers are at the bottom of the page.

Variables may occur by themselves (as in some of the above), or as sub-terms of lists or compound terms. [cat, ANIMAL, fox] is a list whose second element is a variable (and whose first and third elements are atoms). Similarly, colour(cat, X) is a compound term whose second argument is a variable.
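
The syntax of variables can be checked with the standard Prolog test var/1, which succeeds for a variable that has not yet been given a value. This is a sketch for an ordinary Prolog system; variable_checks is a hypothetical helper name, and the two uses of = merely take the list apart so that its elements can be tested.

```prolog
variable_checks :-
    var(CAT), var(Rat_4),
    CAT \== Rat_4,                   % two different variables
    \+ var('CAT'), atom('CAT'),      % quoted, so an atom
    \+ var(cat),                     % lower case initial, so an atom
    List = [cat, ANIMAL, fox],
    List = [First, Second, Third],
    atom(First), var(Second), atom(Third),
    Second == ANIMAL,                % the second element is the variable itself
    arg(2, colour(cat, X), Arg),
    var(Arg), Arg == X.              % the second argument is the variable X
```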

The role and use of variables within the representation language (i.e., the semantics of variables) is described in the next section, Pattern Matching.

Exceptional Terms

The above paragraphs cover most of the representational language of COGENT/Prolog, but there are two special representational short-hands that are commonly used: operators and head/tail list notation.


Operators

In the language as described above, a compound term representing a simple arithmetic expression (e.g., 3 + 4) must be written using a very clumsy notation: '+'(3, 4). This is a compound term whose functor is '+' and whose arity is 2. Operators allow certain compound terms (especially arithmetic expressions) to be written in a more readable way.

Certain pre-defined functors are understood by COGENT/Prolog to be operators. If a functor is a binary operator, then a term of the form '+'(3, 4) can be written in the conventional way, as 3 + 4. Note that the brackets and the single quotes are not required.

The range of pre-defined operators varies with the specific implementation of Prolog, but commonly defined ones are the standard arithmetic and comparison operators (+, -, *, /, < and >). These operators can be used in complex expressions, and when used in such expressions they have the usual precedences. Thus, 3 + 4 * 5 is a compound term with arity 2 and functor "+". The second argument of this term is 4 * 5, itself a compound term. Precedence can be over-ridden by using round brackets. Thus (3 + 4) * 5 is a compound term with arity 2 and functor "*". The first argument of this term is 3 + 4, again a compound term.

Operators may be used in all kinds of terms (not just arithmetic expressions). Thus, has-fur is the same as '-'(has, fur), and a/b is the same as '/'(a, b).
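
The equivalence between operator notation and ordinary compound-term notation can be sketched with the standard Prolog identity test ==. The helper name operator_checks is hypothetical. The last three goals go slightly beyond the text above: they use the standard Prolog predicate is/2 to show that an operator term is only a structure until something evaluates it, and the source does not say how COGENT itself handles evaluation.

```prolog
operator_checks :-
    3 + 4 == '+'(3, 4),
    3 + 4 * 5 == '+'(3, '*'(4, 5)),       % * binds more tightly than +
    (3 + 4) * 5 == '*'('+'(3, 4), 5),     % brackets override precedence
    functor(3 + 4 * 5, (+), 2),
    functor((3 + 4) * 5, (*), 2),
    has-fur == '-'(has, fur),             % not only arithmetic
    a/b == '/'(a, b),
    3 + 4 \== 7,                          % the term 3 + 4 is not the number 7
    X is 3 + 4 * 5,                       % is/2 evaluates the expression
    X =:= 23.
```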

Head/Tail List Notation

The notation [H|T] is often used to represent a list whose first element is H and whose remaining elements are the list T. The important things to notice here are the use of the list separator, |, which divides the list into two parts, and the fact that the second part of the list, the tail, is itself a list.
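
A short sketch in standard Prolog shows how the notation divides a list. The helper name head_tail_checks is hypothetical, and = is used here only to take the lists apart.

```prolog
head_tail_checks :-
    [H|T] = [cat, elephant, fish],
    H == cat,                        % the head is the first element
    T == [elephant, fish],           % the tail is the list of remaining elements
    [H1|T1] = [dog],
    H1 == dog,
    T1 == [],                        % a one-element list has the empty list as its tail
    \+ ([_|_] = []),                 % the empty list has no head or tail
    [cat|[elephant, fish]] == [cat, elephant, fish].
```

The final line shows that the two notations describe the same term: [cat|[elephant, fish]] is just another way of writing [cat, elephant, fish].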

Further Information

For more information on the Prolog language, see either your Prolog manual or one of the many books on the subject, such as Clocksin & Mellish (1987) or Bratko (1986).

Exercise 4: Representing Concepts with Feature-Lists

Change the buffer representation in the categorisation model to something like:

animal(cat, [has_fur, legs(4), has_tail, has_spine, ...])

and use the built-in list processing condition member/2 (as in member(has_spine, Features)) to determine if something is a vertebrate.

Note the use of lists, the use of compound terms within lists, and the use of the underscore character in terms like has_fur.
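
The idea behind this representation can be sketched in plain Prolog before it is built in COGENT. This is not a COGENT rule: it assumes the feature lists are stored as Prolog facts, and the snail entry, its features and the predicate name vertebrate/1 are all hypothetical. The cat entry uses only the four features named above and leaves out whatever the "..." stands for.

```prolog
:- use_module(library(lists)).          % member/2 (assumes an SWI-Prolog-style library)

% Illustrative facts only; a real model would hold these in a buffer.
animal(cat,   [has_fur, legs(4), has_tail, has_spine]).
animal(snail, [has_shell, legs(0)]).    % hypothetical entry with no has_spine

% vertebrate(?Name): Name is an animal whose feature list includes has_spine.
vertebrate(Name) :-
    animal(Name, Features),
    member(has_spine, Features).
```

With these facts the query vertebrate(cat). succeeds and vertebrate(snail). fails, because member/2 finds has_spine only in the first feature list.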

Answers to Selected Exercises

Exercise 1:
roderick: Yes.
293: No. This is an integer, not a sequence of letters.
has-fur: No. The '-' cannot normally appear as a symbol in an atom.
X: No. This is a variable.
alpha: Yes.
y: Yes.
Mary: No. The initial upper case letter makes this a variable.
a/b: No. The '/' cannot normally appear as a symbol in an atom.
'#*%@&"': Yes. The single quotes around the sequence of characters ensure that this is an atom.
a___c: Yes.
Exercise 2:
Functor: animal; Arity: 2
Functor: disease; Arity: 1
Functor: numbers; Arity: 2
Functor: believes; Arity: 2
Exercise 3:
roderick: No. This is an atom.
293: No. This is an integer.
has-fur: No. This is actually a compound term, though it is not the standard way of writing such terms.
X: Yes.
alpha: No. This is an atom.
y: No. This is an atom.
Mary: Yes.
a/b: No. This is another compound term.
'#*%@&"': No. This is an atom.
'Paul': No. The single quotes around the sequence of characters ensure that this is an atom.
