COGENT Online
CONTENTS
COGENT Version 2.3 Help

Network/Feed-Forward

Introduction

Feed-forward network boxes provide COGENT with a general two-layer feed-forward network capability. They provide an object which consists of a set of input nodes and a set of output nodes, together with a set of weighted connections between the nodes. Properties can be used to set the number of nodes in the input and output sets to any integer from 1 to 9999. Networks are able to map input vectors (of the specified width) to output vectors, and to learn input/output correspondences (subject to the usual perceptron learning limitations).

Training and Testing

Networks can be sent a variety of messages. If a network is sent a raw vector (represented by a list of numbers), and the width of that vector is equal to the width of the network's input layer, the network will transform the input vector and produce an output vector, which will be sent off along any send arrows leaving the network. If, on the other hand, a network receives a signal of the form train(InputVector, OutputVector), where InputVector is a vector whose width is equal to that of the network's input layer and where OutputVector is a vector whose width is equal to that of the network's output layer, then the network will adjust its weight matrix so that, when triggered by InputVector, its output more closely approximates OutputVector. Any message received by a network which is not of one of the above two forms will result in a warning.
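
The three cases above can be sketched in Python. This is an illustration only, not COGENT code: the function names are hypothetical, and representing train(InputVector, OutputVector) as a tuple ("train", InputVector, OutputVector) is an assumption made for the sketch.

```python
import warnings


def is_vector(value, width):
    """True if value is a list of exactly `width` numbers."""
    return (
        isinstance(value, list)
        and len(value) == width
        and all(
            isinstance(x, (int, float)) and not isinstance(x, bool)
            for x in value
        )
    )


def classify_message(message, input_width, output_width):
    """Return 'input', 'train' or 'invalid' for one incoming message."""
    # Case 1: a raw vector whose width matches the input layer.
    if is_vector(message, input_width):
        return "input"
    # Case 2: a training pair (hypothetical tuple encoding of train/2).
    if (
        isinstance(message, tuple)
        and len(message) == 3
        and message[0] == "train"
        and is_vector(message[1], input_width)
        and is_vector(message[2], output_width)
    ):
        return "train"
    # Case 3: anything else results in a warning.
    warnings.warn(f"network received an unrecognised message: {message!r}")
    return "invalid"
```

Note that a raw vector of the wrong width falls into the third case, as does a training pair in which either vector has the wrong width.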

If a network receives several input vectors at once, then all vectors are processed (in pseudo-parallel) and a set of output vectors is generated. If a network receives several training vector pairs at once, then all pairs are used to calculate a set of weight modifications, and the average weight modifications are applied to the network. Thus, a network can be trained sequentially (by feeding it a different training pair on each cycle), or it can be trained in a parallel burst (by feeding a whole set of training pairs at once). If a network is sent both raw vectors and training data on the same cycle, the raw vectors are processed before the training takes effect.
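
A minimal Python sketch of one processing cycle may make the ordering and the averaging concrete. It is not COGENT's implementation: the function names are hypothetical, the activation function is omitted, and a textbook delta rule stands in for whichever learning rule the network is configured to use.

```python
def forward(weights, x):
    """Weighted sum per output node; weights[j][i] links input i to output j."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def pair_changes(weights, x, target, rate):
    """Weight changes suggested by one training pair (textbook delta rule)."""
    out = forward(weights, x)
    return [[rate * (t - o) * xi for xi in x] for t, o in zip(target, out)]


def train_cycle(weights, pairs, rate=0.1):
    """Average the per-pair changes and apply them once; returns a new matrix."""
    if not pairs:
        return [row[:] for row in weights]
    # Every pair is evaluated against the same, pre-update weights.
    all_changes = [pair_changes(weights, x, t, rate) for x, t in pairs]
    n = len(pairs)
    return [
        [w + sum(ch[j][i] for ch in all_changes) / n for i, w in enumerate(row)]
        for j, row in enumerate(weights)
    ]


def run_cycle(weights, raw_vectors, pairs, rate=0.1):
    """Process raw vectors first, then let the training take effect."""
    outputs = [forward(weights, x) for x in raw_vectors]  # uses old weights
    return outputs, train_cycle(weights, pairs, rate)
```

In this sketch, sequential training corresponds to calling train_cycle with one pair per call, and a parallel burst to calling it once with the whole set. The two generally give different weights, because in a burst every pair is evaluated against the same starting matrix.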

Properties

Networks are highly configurable. Fourteen properties govern their behaviour:

Input Width (possible values: integers in the range 1 -- 9999; default: 10)
This parameter specifies the number of nodes in input vectors accepted by the network.

Output Width (possible values: integers in the range 1 -- 9999; default: 10)
This parameter specifies the number of nodes in output vectors generated by the network.

Connectivity (possible values: any real number; default: 1.00)
This determines the proportion of possible connections between input and output nodes which are actually present in the network. A connectivity of 0.50 will mean that, on average, 1 in every 2 possible connections is actually present. Although it is possible in principle to set this parameter to any real number, in practice only numbers in the range 0 to 1 are sensible.
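
One way to picture this property is as an independent random choice for each possible connection. The Python sketch below works on that assumption. The function name is hypothetical, and clamping out-of-range values to the interval 0 to 1 is a choice made for the sketch, not documented COGENT behaviour.

```python
import random


def connection_mask(input_width, output_width, connectivity, rng=None):
    """Return mask[j][i], True where input i is connected to output j."""
    if rng is None:
        rng = random.Random()
    # Assumption: values outside 0..1 are treated as the nearest sensible value.
    p = min(1.0, max(0.0, connectivity))
    return [
        [rng.random() < p for _ in range(input_width)]
        for _ in range(output_width)
    ]
```

With a connectivity of 1.00 every entry is True, and with 0.50 roughly half of the entries are True, although the exact proportion varies from one initialisation to the next.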

Min Act (possible values: any real number; default: -1.00)
This parameter specifies the minimum activation that an output node may achieve.

Max Act (possible values: any real number; default: 1.00)
This parameter specifies the maximum activation that an output node may achieve.

Act Function (possible values: linear/sigmoidal; default: linear)
The basic principle of network operation is that the output at each node is the result of applying some squashing function to the sum of the weighted inputs to that node. Weights are initially determined by further parameters specified below, and change as the network learns. The shape of the squashing function, however, is specified by the Act Function parameter. Two basic functions are available:

  1. A linear function employs a piece-wise linear squashing function, such that inputs below one threshold are mapped to the minimum activation, and inputs greater than a second threshold are mapped to the maximum activation. Inputs between the thresholds are mapped linearly between the activation limits.
  2. A sigmoidal function employs a non-linear squashing function based on the sigmoid or logistic equation. Low inputs are mapped to values near the minimum activation, and high inputs are mapped to values close to the maximum activation. Intermediate inputs are mapped non-linearly between the activation limits. (Note that the sigmoidal function is equivalent to the standard sigmoid function when activations range from 0 to 1 and the standard hyperbolic tangent function when activations range from -1 to +1.)

The precise activation function (in terms of its thresholds, and the gradient of the function at intermediate inputs) is determined from the values of Min Act, Max Act, Act Function, Act Slope and Act Midpoint.

Act Midpoint (possible values: any real number; default: 0.00)
As noted above, weighted inputs are summed before a squashing activation function is applied. Extreme sums map to extreme activation values. This parameter partially determines what extreme sums are by specifying what counts as a middling sum. If the sum is equal to the activation midpoint, then the output for that node will be the average of Min Act and Max Act. This property is independent of the actual activation function selected.

Act Slope (possible values: any real number; default: 1.00)
This parameter specifies the gradient of the activation function when the input to the function is that specified by Act Midpoint.
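
COGENT's exact formulas are not given here. The Python sketch below shows one parameterisation that is consistent with the descriptions of Min Act, Max Act, Act Midpoint and Act Slope: both functions return the average of the two limits at the midpoint and have gradient Act Slope there. The function names and the scaling factor of 4 in the sigmoidal case are assumptions, and the sketch assumes Max Act is greater than Min Act and the slope is positive.

```python
import math


def linear_act(total, min_act=-1.0, max_act=1.0, midpoint=0.0, slope=1.0):
    """Piece-wise linear squashing: a straight line clipped to the limits."""
    centre = (min_act + max_act) / 2.0
    return max(min_act, min(max_act, centre + slope * (total - midpoint)))


def sigmoidal_act(total, min_act=-1.0, max_act=1.0, midpoint=0.0, slope=1.0):
    """Logistic squashing, rescaled to the limits and the requested slope."""
    span = max_act - min_act
    # The factor 4 cancels the logistic's gradient of 1/4 at its centre.
    z = 4.0 * slope * (total - midpoint) / span
    # logistic(z) written via tanh so that large |z| cannot overflow.
    return min_act + span * 0.5 * (1.0 + math.tanh(z / 2.0))
```

Under these assumptions, the default settings (limits -1 and +1, midpoint 0, slope 1) make sigmoidal_act coincide with the hyperbolic tangent. With limits 0 and 1, it coincides with the standard sigmoid when the slope is 0.25, which is the gradient of the standard sigmoid at its midpoint. Whether COGENT scales Act Slope in the same way is not stated.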

The specification of activation functions in terms of these various parameters makes it simple to compare different activation functions. Selecting different values for Act Function will preserve the basic characteristics (in terms of steepness, and action on extreme inputs) of the activation function, as illustrated in the following figure, which shows different activation functions with equal minimum activation, maximum activation, activation midpoint and slope.

Initialise (possible values: Each Trial/Each Block/Each Subject/Each Experiment/Each Session; default: Each Trial)
The timing of network initialisation is determined by this property. When the value is Each Trial, the network will automatically initialise itself at the beginning of each trial. When the value is Each Block, the network will initialise itself at the beginning of each block of trials (i.e., weights will be preserved across trials within a block). Similarly, when the value is Each Subject, weights will be preserved across simulated blocks. When the value is Each Experiment, weights will be preserved across subjects, and when the value is Each Session, weights will be preserved across experiments.
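
The nesting of these settings can be sketched as a simple ordering check in Python. The names below are hypothetical, and the sketch assumes that the start of a larger unit (for example a new subject) also counts as the start of every smaller unit it contains (a new block and a new trial).

```python
# Units of an experimental run, from smallest to largest.
UNITS = ["Trial", "Block", "Subject", "Experiment", "Session"]


def should_initialise(initialise, starting_unit):
    """True if a network with the given Initialise value resets its weights
    when a new `starting_unit` begins."""
    setting_unit = initialise.split(" ", 1)[1]  # "Each Block" -> "Block"
    return UNITS.index(starting_unit) >= UNITS.index(setting_unit)
```

For example, should_initialise("Each Block", "Trial") is False, so weights carry over from one trial to the next within a block, while should_initialise("Each Block", "Subject") is True under the nesting assumption above.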

Learning Rule (possible values: delta/Hebbian; default: delta)
Feed-forward networks are capable of either delta-rule learning or Hebbian learning. The value of this parameter controls the learning algorithm employed by any specific network.
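
The difference between the two rules can be sketched using their textbook forms. COGENT's exact update formulas are not given here, so the Python below is an assumption: in particular, whether the delta rule includes the gradient of the activation function, and whether Hebbian learning uses the target or the actual output, may differ in COGENT. The function name is hypothetical.

```python
def weight_changes(rule, x, target, output, rate=0.1):
    """Return changes[j][i] for the connection from input i to output j."""
    if rule == "delta":
        # Delta rule: change is proportional to the output error.
        signal = [t - o for t, o in zip(target, output)]
    elif rule == "Hebbian":
        # Hebbian rule: change is proportional to the co-activation of
        # input and (here, target) output; the current output is unused.
        signal = list(target)
    else:
        raise ValueError(f"unknown learning rule: {rule!r}")
    return [[rate * s * xi for xi in x] for s in signal]
```

Under the delta rule the changes shrink to zero as the output approaches the target, whereas in this Hebbian form the same pair produces the same change every time it is presented.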

Learning Rate (possible values: a real number greater than 0; default: 0.10)
This parameter is used in the calculation of weight changes. In general, a high learning rate will mean that the weight matrix responds more quickly to input-output training pairs, but may result in the network being insufficiently sensitive to its past training history.

Initial Weights (possible values: uniform/normal; default: uniform)
This parameter governs the shape of the initial weight distribution. If it is set to uniform, then on initialisation, weights will be randomly selected from a uniform distribution, and the two following properties, Weight Parameter A and Weight Parameter B, give the minimum and maximum possible weights. If it is set to normal, then on initialisation, weights will be randomly selected from a normal distribution, and Weight Parameter A and Weight Parameter B give the mean and standard deviation of the weights.

Weight Parameter A (possible values: any real number; default: -1.00)
If Initial Weights is set to uniform, then this specifies the lower limit of the weight distribution. If Initial Weights is set to normal, then this specifies the mean of the weight distribution.

Weight Parameter B (possible values: any real number; default: 1.00)
If Initial Weights is set to uniform, then this specifies the upper limit of the weight distribution. If Initial Weights is set to normal, then this specifies the standard deviation of the weight distribution.
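
The way the three weight properties combine can be sketched in Python using the standard random module. The function names and the seed argument are hypothetical and exist only to make the sketch reproducible.

```python
import random


def initial_weight(distribution, a=-1.0, b=1.0, rng=random):
    """Draw one initial weight; a and b are Weight Parameters A and B."""
    if distribution == "uniform":
        return rng.uniform(a, b)  # a = lower limit, b = upper limit
    if distribution == "normal":
        return rng.gauss(a, b)    # a = mean, b = standard deviation
    raise ValueError(f"unknown distribution: {distribution!r}")


def initial_matrix(input_width, output_width, distribution="uniform",
                   a=-1.0, b=1.0, seed=None):
    """Build matrix[j][i], the weight from input i to output j."""
    rng = random.Random(seed)
    return [
        [initial_weight(distribution, a, b, rng) for _ in range(input_width)]
        for _ in range(output_width)
    ]
```

Note that the defaults (A = -1.00, B = 1.00) mean different things under the two settings: a uniform distribution between -1 and +1, but a normal distribution with mean -1 and standard deviation 1. Both parameters therefore need checking whenever Initial Weights is switched to normal.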

The Weight Matrix Viewer

The weight matrix viewer provides access to a dynamically updated representation of the network's weight matrix. The viewer is available as a page on all windows associated with feed-forward networks, and is updated on each processing cycle.

