COGENT Online |

CONTENTS |

COGENT Version 2.3 Help |

Feed-forward network boxes provide COGENT with a general two-layer feed-forward network capability. They provide an object which consists of a set of input nodes and a set of output nodes, together with a set of weighted connections between the nodes. Properties can be used to set the number of nodes in the input and output sets to any arbitrary integer. Networks are able to map input vectors (of the specified width) to output vectors, and to learn input/output correspondences (subject to the usual perceptron learning limitations).

Networks can be sent a variety of messages. If a network is sent a raw
vector (represented by a list of numbers), and the width of that vector is
equal to the width of the network's input layer, the network will transform
the input vector and produce an output vector, which will be sent off along
any send arrows leaving the network. If, on the other hand, a network receives
a signal of the form `train(InputVector, OutputVector)`, where
`InputVector` is a vector whose width is equal to that of the network's
input layer and where `OutputVector` is a vector whose width is equal
to that of the network's output layer, then the network will adjust its weight
matrix so that, when triggered by `InputVector`, its output more nearly
approximates `OutputVector`. Any message received by a network which is
not of one of the above two forms will result in a warning.
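
The message handling described above can be sketched in Python. This is only an illustration, not COGENT's implementation: the function name `classify_message` is hypothetical, and a `train(InputVector, OutputVector)` signal is modelled here, by assumption, as the tuple `("train", inp, out)`.

```python
import warnings


def classify_message(message, input_width, output_width):
    """Sort a message into the two accepted forms, or warn (hypothetical helper).

    Returns ("input", vector), ("train", (inp, out)) or ("invalid", None).
    """
    def is_vector(value, width):
        # A vector is a list of numbers of exactly the expected width.
        return (isinstance(value, list) and len(value) == width
                and all(isinstance(x, (int, float)) and not isinstance(x, bool)
                        for x in value))

    # Form 1: a raw vector as wide as the input layer.
    if is_vector(message, input_width):
        return ("input", message)

    # Form 2: a training signal (assumed encoding: ("train", inp, out)).
    if (isinstance(message, tuple) and len(message) == 3
            and message[0] == "train"
            and is_vector(message[1], input_width)
            and is_vector(message[2], output_width)):
        return ("train", (message[1], message[2]))

    # Anything else only produces a warning.
    warnings.warn(f"unrecognised network message: {message!r}")
    return ("invalid", None)
```

For a network with three input nodes and two output nodes, `[1, 0, 1]` would be treated as an input vector, while `[1, 0]` would trigger the warning because its width does not match the input layer.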

If a network receives several input vectors at once, then all vectors are processed (in pseudo-parallel) and a set of output vectors is generated. If a network receives several training vector pairs at once, then all pairs are used to calculate a set of weight modifications, and the average weight modifications are applied to the network. Thus, a network can be trained sequentially (by feeding it a different training pair on each cycle), or it can be trained in a parallel burst (by feeding a whole set of training pairs at once). If a network is sent both raw vectors and training data on the same cycle, the raw vectors are processed before the training takes effect.
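
The per-cycle behaviour described above (raw vectors first, then one averaged weight update) might look roughly like the following. The sketch assumes a delta-rule update and omits the squashing function for brevity; `forward`, `delta_change` and `run_cycle` are hypothetical names, not part of COGENT.

```python
def forward(weights, inp):
    # weights[j][i] connects input node i to output node j.
    # The squashing function is left out to keep the sketch short.
    return [sum(w * x for w, x in zip(row, inp)) for row in weights]


def delta_change(weights, inp, target, rate):
    # Weight modifications proposed by a single training pair (delta rule assumed).
    out = forward(weights, inp)
    return [[rate * (t - o) * x for x in inp] for t, o in zip(target, out)]


def run_cycle(weights, raw_vectors, training_pairs, rate=0.1):
    # 1. Raw vectors are processed with the weights as they stand
    #    at the start of the cycle, before any training takes effect.
    outputs = [forward(weights, v) for v in raw_vectors]
    if not training_pairs:
        return weights, outputs

    # 2. Every training pair proposes changes against the same starting weights.
    changes = [delta_change(weights, i, t, rate) for i, t in training_pairs]

    # 3. The average of the proposed changes is applied once.
    n = len(changes)
    new_weights = [[w + sum(c[j][i] for c in changes) / n
                    for i, w in enumerate(row)]
                   for j, row in enumerate(weights)]
    return new_weights, outputs
```

Calling `run_cycle` with one pair per cycle corresponds to sequential training; passing the whole training set in one call corresponds to a parallel burst.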

Networks are highly configurable. Fourteen properties govern their behaviour:

**Input Width (possible values: integers in the range 1 -- 9999; default: 10)**

This parameter specifies the number of nodes in input vectors accepted by
the network.

**Output Width (possible values: integers in the range 1 -- 9999; default: 10)**

This parameter specifies the number of nodes in output vectors generated by
the network.

**Connectivity (possible values: any real number; default: 1.00)**

This determines the proportion of possible connections between input and
output nodes which are actually present in the network. A connectivity of 0.50
will mean that, on average, 1 in every 2 possible connections is actually
present. Although it is possible in principle to set this parameter to any
real number, in practice only numbers in the range 0 to 1 are sensible.
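
One way to picture this property is as a random mask over the possible connections. The sketch below assumes each connection is included independently with probability equal to the connectivity, which is consistent with the "on average" wording above but is not confirmed by it; `connection_mask` is a hypothetical name.

```python
import random


def connection_mask(input_width, output_width, connectivity, rng=None):
    """Return mask[j][i] == True where input i is connected to output j."""
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so a connectivity of 1.0 keeps every
    # connection and 0.0 keeps none. Values outside 0..1 behave like the
    # nearest limit, which is why only that range is sensible.
    return [[rng.random() < connectivity for _ in range(input_width)]
            for _ in range(output_width)]
```

With a connectivity of 0.50 and a large network, roughly half of the mask entries would be `True`.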

**Min Act (possible values: any real number; default: -1.00)**

This parameter specifies the minimum activation that an output node may
achieve.

**Max Act (possible values: any real number; default: 1.00)**

This parameter specifies the maximum activation that an output node may
achieve.

**Act Function (possible values: linear/sigmoidal; default: linear)**

The basic principle of network operation is that the output at each node is
the result of applying some squashing function to the sum of the weighted
inputs to that node. Weights are initially determined by further parameters
specified below, and change as the network learns. The shape of the squashing
function, however, is specified by the **Act Function** parameter. Two
basic functions are available:

- A **linear** function employs a piece-wise linear squashing function, such that inputs below one threshold are mapped to the minimum activation, and inputs greater than a second threshold are mapped to the maximum activation. Inputs between the thresholds are mapped linearly between the activation limits.
- A **sigmoidal** function employs a non-linear squashing function based on the sigmoid or logistic equation. Low inputs are mapped to values near the minimum activation, and high inputs are mapped to values close to the maximum activation. Intermediate inputs are mapped non-linearly between the activation limits. (Note that the sigmoidal function is equivalent to the standard sigmoid function when activations range from 0 to 1 and the standard hyperbolic tangent function when activations range from -1 to +1.)

The precise activation function (in terms of its thresholds, and the
gradient of the function at intermediate inputs) is determined from the values
of **Min Act**, **Max Act**, **Act Function**, **Act Slope** and
**Act Midpoint**.

**Act Midpoint (possible values: any real number; default: 0.00)**

As noted above, weighted inputs are summed before a squashing activation
function is applied. Extreme sums map to extreme activation values. This
parameter partially determines what extreme sums are by specifying what counts
as a middling sum. If the sum is equal to the activation midpoint, then the
output for that node will be the average of **Min Act** and **Max Act**.
This property is independent of the actual activation function selected.

**Act Slope (possible values: any real number; default: 1.00)**

This parameter specifies the gradient of the activation function when the
input to the function is that specified by **Act Midpoint**.
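
The exact formulas used by COGENT are not given here, but the following sketch shows one pair of functions that satisfies the description: both pass through the average of **Min Act** and **Max Act** at the midpoint, and both have the gradient **Act Slope** there. The function names are hypothetical, and the sketch assumes **Max Act** is greater than **Min Act**.

```python
import math


def linear_act(x, min_act=-1.0, max_act=1.0, midpoint=0.0, slope=1.0):
    # Straight line through (midpoint, centre) with the given slope,
    # clipped to the activation limits. For a positive slope the two
    # thresholds fall at midpoint -/+ (max_act - min_act) / (2 * slope).
    centre = (min_act + max_act) / 2.0
    return max(min_act, min(max_act, centre + slope * (x - midpoint)))


def sigmoidal_act(x, min_act=-1.0, max_act=1.0, midpoint=0.0, slope=1.0):
    # Logistic curve rescaled to [min_act, max_act], written via tanh
    # to avoid overflow for extreme inputs. The gain is chosen so that
    # the derivative at the midpoint equals `slope`.
    span = max_act - min_act
    centre = (min_act + max_act) / 2.0
    return centre + (span / 2.0) * math.tanh(2.0 * slope * (x - midpoint) / span)
```

Under these assumptions, limits of -1 and +1 with a slope of 1 give exactly `tanh(x)`, while limits of 0 and 1 give the standard logistic function `1 / (1 + exp(-x))` when the slope is 0.25, the gradient of that function at its midpoint.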

The specification of activation functions in terms of these various
parameters makes it simple to compare different activation functions.
Selecting different values for **Act Function** will preserve the basic
characteristics (in terms of steepness, and action on extreme inputs) of the
activation function, as shown in the following figure, which shows different
activation functions with equal minimum activation, maximum activation,
midpoint activation and slopes.

**Initialise (possible values: Each Trial/Each Block/Each Subject/Each Experiment/Each Session; default: Each Trial)**

The timing of network initialisation is determined by this property. When the
value is Each Trial, the network will automatically initialise itself at the
beginning of each trial. When the value is Each Block, the network will
initialise itself at the beginning of each block of trials (i.e., weights will
be preserved across trials within a block). Similarly, when the value is Each
Subject, weights will be preserved across simulated blocks. When the value is
Each Experiment, weights will be preserved across subjects, and when the value
is Each Session, weights will be preserved across experiments.
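
The nesting of these settings can be summarised in a few lines. The sketch assumes the units nest as trial, block, subject, experiment, session, so that the start of a larger unit is also the start of every smaller one; `should_initialise` and `UNITS` are hypothetical names.

```python
# Units ordered from smallest to largest (assumed nesting).
UNITS = ["Trial", "Block", "Subject", "Experiment", "Session"]


def should_initialise(setting, starting_unit):
    """Decide whether the network re-initialises.

    setting       -- value of the Initialise property, e.g. "Each Block"
    starting_unit -- largest unit that is beginning now, e.g. "Trial"
    """
    configured = setting.replace("Each ", "")
    # Re-initialise when the unit that is starting is at least as
    # large as the configured one.
    return UNITS.index(starting_unit) >= UNITS.index(configured)
```

With the setting Each Block, a new trial inside the same block would leave the weights alone, whereas a new block or a new subject would reset them.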

**Learning Rule (possible values: delta/Hebbian; default: delta)**

Feed-forward networks are capable of either delta-rule learning or Hebbian
learning. The value of this parameter controls the learning algorithm employed
by any specific network.

**Learning Rate (possible values: a real number greater than 0; default: 0.10)**

This parameter is used in the calculation of weight changes. In general, a
high learning rate will mean that the weight matrix responds more quickly to
input-output training pairs, but may result in the network being
insufficiently sensitive to its past training history.
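
As a rough guide to how the learning rule and learning rate interact, the sketch below uses the textbook forms of the two rules. These are assumptions rather than COGENT's documented formulas; in particular, the Hebbian variant shown pairs the input with the target output, and `weight_change` is a hypothetical name.

```python
def weight_change(rule, inp, target, output, rate=0.1):
    """Return change[j][i] for the weight from input i to output j."""
    if rule == "delta":
        # Delta rule: proportional to the output error and the input.
        return [[rate * (t - o) * x for x in inp]
                for t, o in zip(target, output)]
    if rule == "hebbian":
        # Hebbian rule: proportional to the co-activation of the input
        # and the (target) output; the actual output is not used.
        return [[rate * t * x for x in inp] for t in target]
    raise ValueError(f"unknown learning rule: {rule!r}")
```

In both cases the learning rate scales every change, so doubling it doubles the size of each step taken in response to a training pair.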

**Initial Weights (possible values: uniform/normal; default: uniform)**

This parameter governs the shape of the initial weight distribution function.
If it is set to **uniform**, then on initialisation, weights will be
randomly selected from a uniform distribution. The parameters which govern the
distribution (i.e., the minimum and maximum possible weights) are determined
by the two following properties: **Weight Parameter A** and **Weight
Parameter B**. If **Initial Weights** is set to **normal**, then on
initialisation, weights will be randomly selected from a normal distribution.
The parameters which govern the distribution (i.e., the mean and standard
deviation of the possible weights) are determined by the two following
properties: **Weight Parameter A** and **Weight Parameter B**.

**Weight Parameter A (possible values: any real number; default: -1.00)**

If **Initial Weights** is set to uniform, then this specifies the lower
limit of the weight distribution. If **Initial Weights** is set to normal,
then this specifies the mean of the weight distribution.

**Weight Parameter B (possible values: any real number; default: 1.00)**

If **Initial Weights** is set to uniform, then this specifies the upper
limit of the weight distribution. If **Initial Weights** is set to normal,
then this specifies the standard deviation of the weight distribution.
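
The way the three weight properties combine can be sketched as follows, using Python's standard `random` module. The function name `initial_weights` and its argument names are hypothetical; only the roles of the two parameters are taken from the descriptions above.

```python
import random


def initial_weights(input_width, output_width, distribution="uniform",
                    param_a=-1.0, param_b=1.0, rng=None):
    """Draw an output_width x input_width matrix of starting weights."""
    rng = rng or random.Random()
    if distribution == "uniform":
        # A is the lower limit, B the upper limit.
        draw = lambda: rng.uniform(param_a, param_b)
    elif distribution == "normal":
        # A is the mean, B the standard deviation.
        draw = lambda: rng.gauss(param_a, param_b)
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    return [[draw() for _ in range(input_width)]
            for _ in range(output_width)]
```

Note that the default values of -1.00 and 1.00 are read differently under the two settings: as the range -1 to 1 for a uniform distribution, but as a mean of -1 and a standard deviation of 1 for a normal one, so both parameters usually need revisiting when **Initial Weights** is changed.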

The weight matrix viewer provides access to a dynamically updated representation of the network's weight matrix. The viewer is available as a page on all windows associated with feed-forward networks, and is updated on each processing cycle.
