18 Artificial Intelligence

Author

CG Tredoux

Note on this chapter

This chapter is supplementary reading. It covers AI and neural networks in more depth than the core course requires. Read it if the topic interests you, or return to it when these ideas come up elsewhere in your studies. Some of it is quite advanced, and since it is still in progress, read for interest only.

18.1 What is Artificial Intelligence?

Artificial intelligence (AI) is the project of building systems that perform tasks which, in humans, require intelligence: perception, learning, language use, reasoning, planning, and decision making. The term was introduced in the 1955 Dartmouth proposal, which framed the goal as describing aspects of intelligence precisely enough that a machine could simulate them (McCarthy et al., 1955).

AI can be treated as engineering or as theory of mind. An engineering system may work without matching human cognition; a cognitive model aims to explain behaviour and mechanism. Psychology draws on both: AI methods can be tools, and they can serve as models of representation and processing.

In 1950, Alan Turing proposed a practical approach to the question. Rather than asking “Can machines think?”, a question that gets stuck on definitions, he proposed the imitation game: a human interrogator communicates by text with two hidden partners, one human and one machine. If the machine can fool the interrogator consistently, Turing argued, we have as much reason to call it intelligent as we do a person. The test sidesteps philosophical debates and focuses on observable performance. The imitation game (now commonly called the Turing test) set the tone for a field that judges intelligence by what a system can do. (Turing, 1950)

18.2 A brief history of the idea

Long before electronic computers, Charles Babbage designed calculating machines in the 1820s and 1830s. His Analytical Engine anticipated the architecture of modern computers: it had a memory store, a processing unit, and punched cards for programming, borrowed from the Jacquard loom. Ada Lovelace, working with Babbage, argued in 1843 that such a machine could manipulate any symbols that could be formally represented, not just numbers. The machines were never completed, but the ideas behind them endured. (Science Museum Group, n.d.)

By the mid-20th century, Turing’s 1950 paper had reframed the problem as a test of behaviour. The next landmark was the Dartmouth workshop proposal of 1955. John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon wrote a funding proposal stating that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” The workshop, held in 1956, gave the field its name and established its core agenda: making machines that could use language, form abstractions, solve problems, and improve themselves. Many of the attendees led AI research for decades. (McCarthy et al., 1955)

Early AI work emphasised symbolic problem solving and rule-based reasoning. Intelligence was framed as the manipulation of symbols according to explicit rules. The key technical challenges were representation (encoding knowledge in a form the machine can work with) and search (navigating large spaces of possible actions). This approach produced demonstrations in games, theorem proving, and planning. But the real world is noisy, ambiguous, and incomplete. Machine translation was an early casualty: researchers discovered that translating natural language required not just grammar rules but vast world knowledge, and progress stalled. Expert systems in the 1970s and 1980s attempted to capture specialist knowledge as explicit rules. MYCIN, for example, encoded hundreds of rules for diagnosing bacterial infections. These systems worked in narrow domains but were brittle: they failed when they encountered situations their rules did not cover, even slightly novel ones. The resulting disappointment contributed to the AI winters: periods of sharply reduced funding and public confidence. A first AI winter struck in the mid-1970s; a second in the late 1980s. (McCorduck, 2004; Nilsson, 2010)

Running in parallel was a different approach rooted in associationism: the philosophical tradition holding that complex mental life arises from the association of simple elements. In modern form, those elements are artificial neurons connected by adjustable links. The distinction between the two approaches matters. The symbolic approach is top-down: the programmer specifies the rules and the machine follows them. The connectionist approach is bottom-up: the programmer specifies a network architecture and a learning procedure, and the system discovers its own representations by being exposed to examples. The knowledge lives not in explicit rules but in the pattern of connection strengths across many units. (Nilsson, 2010)

Early milestones: McCulloch and Pitts (1943) modelled neurons as logical devices; Hebb (1949) proposed that experience strengthens synaptic connections; Rosenblatt (1958) built the perceptron, the first practical learning machine. Minsky and Papert’s 1969 analysis showed that single-layer perceptrons had fundamental limits, cooling interest for more than a decade. The revival came in 1986 when Rumelhart, Hinton, and Williams demonstrated backpropagation, a training algorithm for multi-layer networks. The two-volume Parallel Distributed Processing (PDP) books, also published in 1986, provided the theoretical framework and worked examples. Networks could now learn to perform tasks that were impossible to specify as rules: a neural network trained on thousands of handwritten digits, for instance, can learn to recognise them with high accuracy, without any explicit rules about what constitutes a digit. (Hebb, 1949; McCulloch & Pitts, 1943; Minsky & Papert, 1969; Rosenblatt, 1958; Rumelhart, Hinton, et al., 1986; Rumelhart, McClelland, et al., 1986)

From the 1980s onward, probabilistic and statistical methods gained influence, reframing intelligence as inference under uncertainty. The modern AI landscape mixes symbolic and statistical approaches. By the 2010s, the amount of compute used in the largest training runs was rising far faster than Moore’s Law, while the internet supplied enormous datasets: billions of images, trillions of words, and vast archives of video and audio. These improvements made large-scale learning practical and changed what AI could do (OpenAI, 2018).

18.3 Overview

Of the many approaches to AI, one family now dominates both research and public attention: neural networks and their descendants, including deep learning and large language models. The successes of the 2010s and 2020s (image recognition at human-level accuracy, language models that write fluent prose, game-playing systems that defeat world champions) are all built on neural network architectures. This chapter focuses on neural networks because they are the most active area of current AI research and the approach that provides the most testable hypotheses about representation and learning in the mind.

Neural networks (NNs) are models built from simple processing units connected by weighted links. Each unit does a small computation; the network’s behaviour emerges from the pattern of connections. Learning means adjusting those connections so that outputs match desired targets. This simple idea, adjust connection strengths to reduce errors, underlies everything from a single perceptron to a large language model with billions of parameters. (Rosenblatt, 1958; Rumelhart, McClelland, et al., 1986)

The chapter proceeds in five steps. First, we trace the early neural network models that framed neurons as computational units and introduced learning rules. Second, we explain what a neural network is: weights, sums, activation functions, and layered structure. Third, we explain training with gradient descent and backpropagation. Fourth, we survey the main network types used today. Finally, we connect these models to psychological work and discuss large language models and their limits.

18.4 From neuron models to learning machines

18.4.1 The McCulloch-Pitts neuron

McCulloch (a neurophysiologist) and Pitts (a logician) proposed in 1943 that a neuron is a threshold device: it receives inputs, and if the total exceeds a threshold, the neuron fires (outputs 1); otherwise, it stays silent (outputs 0). This is a simplification of real neurons, but it had significant consequences. Their key result was that networks of such units can implement logic gates: AND, OR, and NOT.

A logic gate takes one or more binary inputs (each 0 or 1) and produces a binary output according to a fixed rule. An AND gate outputs 1 only if all its inputs are 1. An OR gate outputs 1 if any input is 1. A NOT gate flips its input. These are the building blocks of Boolean logic.

Because any digital computation can be broken into combinations of AND, OR, and NOT operations, the result meant that neural networks could, in principle, compute anything a digital computer can. McCulloch and Pitts thus provided a formal language for thinking about cognition in computational terms, framing the brain as a system that occupies a configuration at each moment (a state) and moves to a new configuration depending on its inputs, much like a digital computer stepping through a program. (McCulloch & Pitts, 1943)
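The gate constructions can be made concrete with a short sketch. It assumes a unit that fires when its weighted input total exceeds a threshold; the particular weights and thresholds are illustrative choices, and the negative weight used for NOT is a modern simplification rather than the original 1943 formulation.

```python
def mp_neuron(inputs, weights, threshold):
    """Threshold unit: output 1 if the weighted total exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def AND(a, b):
    # Both inputs must be active for the total (2) to exceed 1.5.
    return mp_neuron([a, b], [1, 1], threshold=1.5)


def OR(a, b):
    # A single active input (total 1) already exceeds 0.5.
    return mp_neuron([a, b], [1, 1], threshold=0.5)


def NOT(a):
    # An inhibitory (negative) weight flips the input.
    return mp_neuron([a], [-1], threshold=-0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Only the weights and thresholds differ between the three gates; the unit itself is the same in each case.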

Logic gates and XOR

18.4.2 Hebbian learning

Donald Hebb, a Canadian psychologist at McGill, proposed in 1949 that when neuron A repeatedly helps fire neuron B, the connection from A to B strengthens. This is often stated as “neurons that fire together wire together.” Hebb proposed that this strengthening of connections is the physical basis of association and memory: when two neurons are consistently active together, their link grows, so in the future, activity in one more reliably triggers activity in the other (i.e., what you were introduced to in Psy2014s as ‘long-term potentiation’). This links experience directly to changes in synapses.

Hebbian learning was not a complete learning algorithm: it did not specify how errors should be corrected, and it tends in theory to let connections grow without bound. But it provided a crucial bridge: a plausible biological mechanism by which the brain could store associations through altered connection strengths, and a starting point for designing learning rules in artificial networks. (Hebb, 1949)
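One common textbook formalisation of Hebb's idea (an assumption here, since Hebb stated the principle verbally) is to increase a weight in proportion to the product of the two units' activity. The sketch below uses a hypothetical learning rate of 0.1 and shows the unbounded growth mentioned above.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Strengthen the connection in proportion to joint activity (pre * post)."""
    return weight + learning_rate * pre * post


w = 0.0
history = []
for _ in range(50):
    # Both units are active on every trial, so the weight rises every time.
    w = hebbian_update(w, pre=1.0, post=1.0)
    history.append(w)

print("weight after 50 co-activations:", round(w, 2))

# If either unit is silent, the product is zero and nothing changes.
print("update with silent input:", hebbian_update(0.3, pre=0.0, post=1.0))
```

Nothing in the rule ever lowers a weight or stops it growing, which is why later learning rules add error correction or normalisation.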

18.4.3 The perceptron and its limits

Frank Rosenblatt, a psychologist at Cornell, built the perceptron in 1958: a device that takes several inputs, multiplies each by a weight, sums the results, and compares the sum to a threshold to produce a binary output. The key innovation is the learning rule: when the perceptron errs, weights are adjusted to make the correct answer more likely next time. The perceptron can learn any linearly separable classification: one where a straight line (or flat surface) can separate the correct answers from the incorrect ones. (Rosenblatt, 1958)

However, many real problems are not linearly separable. The XOR function is the classic case. XOR outputs 1 if exactly one of two inputs is 1, but 0 if both inputs are the same. If you plot the four possible input pairs on a graph, the two “positive” cases sit on opposite corners of a square and the two “negative” cases on the other two corners. No straight line separates the positives from the negatives. A perceptron can only draw straight lines, so it cannot learn XOR.

The XOR problem. A single-layer perceptron can only draw a straight line to divide “yes” from “no”, but there is no straight line that puts the two teal dots on one side and the two gray dots on the other. The groups are interleaved. This is what “linearly inseparable” means.

Minsky and Papert (1969) proved this mathematically. A single-layer perceptron cannot solve any non-linearly-separable problem. They acknowledged that multi-layer networks might overcome this, but expressed scepticism about training them. Their analysis, combined with the prestige of its authors, had a dampening effect on connectionist research that lasted nearly two decades. It would take backpropagation and multi-layer networks to show that the limitation belonged to single-layer networks, not to neural networks in general. (Minsky & Papert, 1969)
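The perceptron learning rule and its limit can be illustrated with a small sketch. It assumes the standard error-correction form of the rule (add the input to the weights after a miss, subtract it after a false alarm), zero starting weights, and a learning rate of 1; these settings are illustrative, not Rosenblatt's original hardware.

```python
def predict(weights, bias, x):
    """Binary output: 1 if the weighted sum plus bias is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0


def train_perceptron(data, epochs=25, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            # Nudge the weights only when the prediction was wrong.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def accuracy(data, weights, bias):
    return sum(predict(weights, bias, x) == t for x, t in data) / len(data)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND_DATA, *train_perceptron(AND_DATA))
xor_acc = accuracy(XOR_DATA, *train_perceptron(XOR_DATA))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

AND is linearly separable, so the rule settles on weights that classify all four cases. For XOR no such weights exist, so at least one case is always misclassified however long training runs.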

18.5 What a neural network is

A neural network is a collection of units arranged in layers. Each unit does the following:

receives a set of numerical inputs,
multiplies each input by a weight (its learned importance),
sums the weighted inputs and adds a bias constant,
passes the result through an activation function to produce its output.

Weights are the most important part of a neural network. A weight is a number attached to each connection between units. If a weight is large and positive, the corresponding input strongly promotes activity. If the weight is negative, the input inhibits activity. If the weight is near zero, that input has little effect. If you have encountered regression in statistics, the concept is similar: in \(y = b_0 + b_1 x_1 + b_2 x_2\), the coefficients \(b_1\) and \(b_2\) tell you how much each predictor contributes. Weights play the same role. The key difference is one of scale and use: regression coefficients are usually few and are interpreted one by one by the researcher, whereas a neural network may have thousands or millions of weights that are learned automatically from data and are rarely interpreted individually.

Bias is a constant added to the weighted sum before the activation function. It shifts the unit’s baseline response up or down, independent of the inputs, like an intercept in a regression.

Activation functions determine what output the unit produces given its weighted sum. This is where non-linearity enters. Without a non-linear function, stacking multiple layers achieves nothing: a linear function of a linear function is still linear, so a deep network of linear units collapses mathematically to a single layer. Non-linearity breaks this limit. When you stack layers, each adding its own non-linearity, the network can represent increasingly complex relationships. Three common activation functions are:

Step function: outputs 0 below threshold, 1 above. Used in the original perceptron, but the sharp jump makes it difficult to use with gradient-based learning.
Sigmoid: a smooth S-curve between 0 and 1, interpretable as a probability. Standard in the 1980s and 1990s connectionist revival.
ReLU (rectified linear unit): \(f(x) = \max(0, x)\). Here \(f(x)\) means “the output when the input is \(x\),” and \(\max(0, x)\) means “choose whichever is larger: 0 or \(x\).” So if \(x\) is positive, the function returns \(x\); if \(x\) is negative, it returns 0. Simple and effective; the most common activation function in modern networks.

Mathematically, the computation for one unit is:

\[s = \sum_{i=1}^{n} w_i x_i + b, \quad y = f(s)\]

where \(s\) is the weighted sum before activation, the symbol \(\sum\) means “add up,” \(i\) is just a counter that runs from 1 to \(n\), \(n\) is the number of inputs, \(w_i\) is the weight attached to input \(i\), \(x_i\) is input \(i\), \(b\) is the bias, \(y\) is the unit’s output, and \(f\) is the activation function.

In plain language: multiply each input by its weight, add those products together, add the bias, and then pass that total through the activation function to get the output.

Worked example: Suppose \(x_1 = 1\), \(x_2 = 0\), \(w_1 = 0.8\), \(w_2 = -0.4\), and \(b = 0.1\). The weighted sum is \(s = (0.8 \times 1) + (-0.4 \times 0) + 0.1 = 0.9\). With a sigmoid activation, the output is about 0.71. With a step activation at threshold 0, the output is 1. The same inputs produce different outputs depending on the activation function chosen.
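The worked example can be checked with a few lines of code. This is a minimal sketch using only the values given above; the function names are illustrative.

```python
import math


def weighted_sum(inputs, weights, bias):
    """s = sum of (weight * input) over all inputs, plus the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(s, threshold=0.0):
    return 1 if s > threshold else 0


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def relu(s):
    return max(0.0, s)


s = weighted_sum(inputs=[1, 0], weights=[0.8, -0.4], bias=0.1)
print("weighted sum:", round(s, 2))
print("step output:", step(s))
print("sigmoid output:", round(sigmoid(s), 2))
print("ReLU output:", round(relu(s), 2))
```

Changing `weights` or `bias` and re-running shows how the same inputs can be pushed above or below the threshold.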

Linking to psychological variables: These components connect to concrete research questions. Suppose the task is to classify whether a face is familiar. The inputs might be measurable features extracted from the face image, or behavioural measures such as response time and confidence. The output might be a binary decision (familiar or unfamiliar) or a continuous probability of familiarity. The weights determine how much each feature contributes to the decision, much as beta weights in multiple regression determine how much each predictor contributes. Learning, in this context, means adjusting weights so the network’s outputs move closer to the correct targets. (Rumelhart, McClelland, et al., 1986)

Single neuron: weights, sum, activation

Layers: A full network stacks many units into layers. The input layer takes the raw data. The output layer produces the network’s answer. Between them are one or more hidden layers, so called because their outputs are not directly observed. What do hidden layers compute? They compute features, measurable properties of the input that are useful for the task. A hidden unit combines several inputs (weighted and summed), applies a non-linear activation, and effectively creates a new feature that detects a particular pattern. For instance, one hidden unit might learn to respond to a diagonal edge; another to a colour gradient. These intermediate features are discovered during training, not specified by the designer.

The number of learnable parameters (weights and biases) grows quickly with the number of layers and units. A small network may have a few dozen parameters; a large language model can have hundreds of billions. This capacity allows the network to fit complex data, but it also raises the risk of overfitting: a network with enough parameters can memorise training data perfectly yet fail on new data. The practical challenge is building a model expressive enough to learn genuine structure but not so flexible that it memorises noise. (Goodfellow et al., 2016)

18.5.1 Hidden layers and why they matter

Hidden layers are the key move that allows neural networks to go beyond single-layer perceptrons. Recall the XOR problem: a single perceptron cannot solve it because no straight line separates the categories. A network with one hidden layer can solve XOR. The hidden units each learn to detect a different pattern in the inputs, re-describing the problem in new terms. In the re-described space, the categories are linearly separable, and the output unit can draw a straight line to separate them. In geometric terms, hidden units carve the input space into regions; the output unit combines those regions to produce the correct classification.
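To see the re-description at work, here is a sketch of a two-hidden-unit network that computes XOR. The weights are set by hand for illustration (a trained network might find a different solution): one hidden unit detects "at least one input on", the other detects "both inputs on".

```python
def step(s):
    return 1 if s > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two new features computed from the raw inputs.
    h_any = step(x1 + x2 - 0.5)    # fires if at least one input is 1 (OR)
    h_both = step(x1 + x2 - 1.5)   # fires only if both inputs are 1 (AND)
    # Output layer: a single straight-line rule in the new (h_any, h_both) space.
    output = step(h_any - h_both - 0.5)
    return (h_any, h_both), output


for x1 in (0, 1):
    for x2 in (0, 1):
        hidden, out = xor_network(x1, x2)
        print((x1, x2), "-> hidden", hidden, "-> output", out)
```

In the hidden space the two positive cases both map to (1, 0), while the negative cases map to (0, 0) and (1, 1), so a single threshold on `h_any - h_both` separates them.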

Hidden layers also allow the network to learn intermediate features. A hidden unit might learn to respond to a combination of inputs such as “x1 high and x2 low.” Those features are not specified by the researcher; they are emergent, discovered during training. The idea parallels the way perceptual systems combine simple features into complex ones.

Why are they called “hidden”? The input layer is visible because you feed it data; the output layer is visible because you read the answer from it. The layers in between are hidden from the outside world. Their representations are internal, discovered during training rather than specified by the designer. The process can be summarised in two steps. First, the hidden layer transforms the input into a new representation, re-describing the data in terms that are useful for the task. Second, the output layer draws a boundary in this new space. The hidden layer does the difficult work of untangling the data; the output layer draws the boundary in the untangled space. (Rumelhart, Hinton, et al., 1986)

In psychological terms, XOR is a case where cues do not combine additively. Neither input alone predicts the output; the combination of the two inputs determines the answer entirely. This is an interaction in statistical terms. Hidden layers allow the network to learn whichever combination rule fits the data, because they build intermediate detectors that capture the relevant interactions. This connects directly to psychology: the effect of lighting on face recognition may depend on the viewing angle, and neither factor alone tells the full story. (Minsky & Papert, 1969; Rumelhart, Hinton, et al., 1986)

18.6 Learning and backpropagation

Learning requires two ingredients: a loss function and an update rule.

18.6.1 Loss functions

The loss function (also called the error or cost function) measures how wrong the network’s current outputs are. It assigns a single number to performance: zero means perfect; larger values mean worse. It is the network’s only guide to learning. Two common choices:

Mean squared error (MSE): for each training example, compute the squared difference between the network’s output and the correct answer, then average across all examples. If the network predicts 0.7 and the correct answer is 1.0, the squared error is \((0.7 - 1.0)^2 = 0.09\). MSE is standard for regression tasks.
Cross-entropy: standard for classification. It penalises confident wrong answers heavily. If the network says it is 99% sure of the wrong category, cross-entropy assigns a very large loss; if it is only 51% sure, the loss is much smaller.

Entropy measures the average surprise in a set of outcomes: if a coin always lands heads, there is no surprise and entropy is zero; if it lands heads or tails with equal probability, surprise is high. Cross-entropy compares two probability distributions: the network’s predicted probabilities and the correct answers. It measures how surprised the network would be by the actual outcomes. High surprise means a large loss. Cross-entropy penalises confident wrong answers much more than hesitant ones, making it suited to classification tasks where the network outputs probabilities. (Goodfellow et al., 2016)
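Both losses are easy to compute by hand. The sketch below assumes the common single-example form of cross-entropy, the negative natural logarithm of the probability the network gave to the correct category; the probabilities are the illustrative ones from the text.

```python
import math


def mse(predictions, targets):
    """Average squared difference between predictions and correct answers."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(prob_of_correct_category):
    """Surprise at the true outcome: large when the correct answer was given low probability."""
    return -math.log(prob_of_correct_category)


print("MSE for prediction 0.7, target 1.0:", round(mse([0.7], [1.0]), 2))

# 99% sure of the WRONG category leaves 1% for the correct one.
confident_wrong = cross_entropy(0.01)
# 51% sure of the wrong category leaves 49% for the correct one.
hesitant_wrong = cross_entropy(0.49)
print("confidently wrong:", round(confident_wrong, 2))
print("hesitantly wrong:", round(hesitant_wrong, 2))
```

The confidently wrong answer costs several times more than the hesitant one, which is the asymmetry described above.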

18.6.2 Gradient descent

Gradient descent uses derivatives to reduce the loss. For each weight, the algorithm asks: if I increase this weight by a tiny amount, does the loss go up or down? The derivative of the loss with respect to that weight gives the answer. If the derivative is positive, increasing the weight increases the loss, so the weight should decrease. If negative, the weight should increase. The learning rate controls how large each step is: too large and the network overshoots; too small and learning is slow.

A landscape analogy helps here: imagine the loss as the height of terrain and the current weights as a position on that terrain. Gradient descent is like rolling a ball downhill. At each step, it moves in the direction of steepest descent. Eventually it settles in a valley, a combination of weights where the loss is low. The procedure is not guaranteed to find the global lowest point; it may settle in a local minimum. In practice, though, gradient descent usually finds weights that perform well. (Rumelhart, Hinton, et al., 1986)

For mean squared error, the loss is:

\[L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2\]

where \(L\) is the overall loss, \(N\) is the number of training examples, the hat in \(\hat{y}_i\) means “predicted value,” so \(\hat{y}_i\) is the model’s prediction for example \(i\), \(y_i\) is the correct answer for example \(i\), and \(\sum\) means add these squared errors across all examples before dividing by \(N\) to get the average.

In plain language: for each example, work out how far the prediction is from the correct answer, square that difference so that larger mistakes count more, and then average across all examples.
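A small sketch ties the loss formula to the gradient descent procedure. It assumes a deliberately tiny model with one weight and no bias, \(\hat{y} = w x\), made-up data generated by \(y = 2x\), and a hypothetical learning rate of 0.05.

```python
def mse(w, xs, ys):
    """Mean squared error of the one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of the MSE with respect to w: the average of 2 * (y_hat - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # the "correct" weight is 2

w = 0.0
learning_rate = 0.05
losses = []
for _ in range(50):
    losses.append(mse(w, xs, ys))
    # Step against the gradient: a positive gradient lowers w, a negative one raises it.
    w -= learning_rate * mse_gradient(w, xs, ys)

print("learned weight:", round(w, 4))
print("first loss:", round(losses[0], 4), "last loss:", round(losses[-1], 8))
```

With a much larger learning rate each step would overshoot the valley; with a much smaller one, 50 steps would not be enough to get close.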

18.6.3 Backpropagation

Backpropagation (backward propagation of errors) makes gradient descent practical for multi-layer networks. The problem is that in a multi-layer network, we can measure error at the output layer, but how do we adjust weights in the hidden layers, which have no direct access to the correct answer? Backpropagation answers this using the chain rule from calculus. The chain rule is: if a change in \(A\) causes a change in \(B\), and a change in \(B\) causes a change in \(C\), then the effect of \(A\) on \(C\) can be computed by multiplying the two individual effects. The symbol \(\partial\) that appears below means a partial derivative: it tells us how much one quantity changes when we vary just one variable and hold the others fixed.

Applied to neural networks: the chain rule lets us trace the effect of each weight on the final loss, layer by layer. The algorithm works backward through the network. First, it computes the error at the output. Then it asks: how much did each weight in the last hidden layer contribute to that error? Then: how much did each weight in the second-to-last layer contribute? And so on, back to the first hidden layer. Each hidden unit receives a signal about how much it contributed to the overall error, and its weights are adjusted accordingly. The chain rule for a weight \(w\) can be written as:

\[\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial s} \cdot \frac{\partial s}{\partial w}\]

where \(L\) is the loss, \(w\) is the weight we want to update, \(\hat{y}\) is the predicted output, \(s\) is the weighted sum before activation, and each fraction asks “how much does the top quantity change when the bottom quantity changes?” The dots mean multiply the three pieces together.

In plain language: this equation breaks a hard question, “how does this weight affect the loss?”, into three easier questions: how the loss depends on the prediction, how the prediction depends on the weighted sum, and how the weighted sum depends on the weight itself. Multiplying those pieces gives the weight’s contribution to the error. (Rumelhart, Hinton, et al., 1986)
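The three pieces can be computed explicitly for a single sigmoid unit. The sketch assumes a squared-error loss for one example, \(L = (\hat{y} - y)^2\), and reuses illustrative values (\(x = 1\), \(w = 0.8\), \(b = 0.1\), target \(y = 1\)). As a check, the result is compared with a numerical estimate obtained by nudging the weight slightly.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(w, x, b, y):
    """Return the weighted sum, the prediction, and the squared-error loss."""
    s = w * x + b
    y_hat = sigmoid(s)
    return s, y_hat, (y_hat - y) ** 2


x, w, b, y = 1.0, 0.8, 0.1, 1.0
s, y_hat, loss = forward(w, x, b, y)

dL_dyhat = 2 * (y_hat - y)        # how the loss depends on the prediction
dyhat_ds = y_hat * (1 - y_hat)    # slope of the sigmoid at s
ds_dw = x                         # how the weighted sum depends on the weight
dL_dw = dL_dyhat * dyhat_ds * ds_dw

# Numerical check: change w by a tiny amount and see how the loss responds.
eps = 1e-6
numeric = (forward(w + eps, x, b, y)[2] - forward(w - eps, x, b, y)[2]) / (2 * eps)

print("chain rule gradient:", round(dL_dw, 6))
print("numerical estimate: ", round(numeric, 6))
```

The gradient is negative here because the prediction is below the target, so gradient descent would increase the weight.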

Backpropagation is not a model of biological learning: it requires a global error signal and precise weight updates that are not obviously available to real neurons. It is, however, the standard algorithm in machine learning and the baseline for most deep learning systems.

Backpropagation with a hidden layer

18.6.4 Training, validation, and testing

Before training begins, available data are divided into three subsets:

Training set: what the network learns from. Weights are adjusted to reduce loss on these examples.
Validation set: never trained on directly. After each round of training, performance on the validation set is checked. This gives an independent measure of whether the network is learning the general pattern or memorising the training examples. The validation set is also used to choose hyperparameters, such as the number of layers, the learning rate, and the batch size. Hyperparameters are set by the researcher before training; they are distinct from the learned weights and biases.
Test set: held back entirely until the end. Used once, to estimate performance on truly new data.

This separation matters because a network can achieve perfect performance on its training data and still fail on new data (overfitting). The validation set acts as an early warning: if training loss keeps decreasing but validation loss starts rising, the network is memorising rather than generalising, and training should stop. (Goodfellow et al., 2016)
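A three-way split might be sketched as follows. The 70/15/15 proportions and the fixed random seed are illustrative assumptions; the text does not prescribe particular values.

```python
import random


def split_data(examples, train_pct=70, val_pct=15, seed=0):
    """Shuffle once, then cut into training, validation, and test subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # shuffle so each subset is a random sample
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]   # everything left over
    return train_set, val_set, test_set


train_set, val_set, test_set = split_data(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Each example lands in exactly one subset, so performance on the validation and test sets is never inflated by examples the network has already seen.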

18.6.5 Managing overfitting

Several techniques reduce overfitting:

Early stopping: stop training when validation loss begins to rise.
Weight decay: add a penalty for large weights, analogous to ridge regression (\(L2\) penalty) in statistics.
Dropout: randomly turn off a fraction of units during training, forcing the network to build redundant representations and preventing reliance on any single unit.
Data augmentation: create modified versions of training examples (flipping, rotating, or slightly distorting images) to expand the dataset. (Goodfellow et al., 2016)
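Two of these techniques are simple enough to sketch directly. The validation losses below are made-up numbers, and the `patience` setting (how many non-improving rounds to tolerate before stopping) and the decay strength are hypothetical choices.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch with the lowest validation loss, stopping once it stops improving."""
    best_loss, best_epoch = float("inf"), 0
    rounds_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break   # validation loss has been rising: stop training
    return best_epoch, best_loss


def loss_with_weight_decay(data_loss, weights, decay=0.01):
    """Add an L2 penalty: the sum of squared weights, scaled by the decay strength."""
    return data_loss + decay * sum(w * w for w in weights)


val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best_epoch, best_loss = early_stopping_epoch(val_losses)
print("keep the weights from epoch", best_epoch, "with validation loss", best_loss)

penalised = loss_with_weight_decay(0.40, weights=[1.0, -2.0])
print("loss including weight penalty:", round(penalised, 2))
```

In practice the weights saved at the best epoch are the ones kept, and the penalty term is added to the loss during training so that gradient descent itself favours smaller weights.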

18.6.6 Learning regimes

Supervised learning: the model is given input-output pairs and learns to predict the correct output. Covers classification and regression.

Unsupervised learning: inputs are given without labels and the model must discover structure. Relevant to perception because structure can emerge without explicit feedback.

Reinforcement learning: the model takes actions, receives rewards, and adjusts behaviour to maximise long-term reward. The learning signal is sparse and delayed, making credit assignment harder than in supervised learning. (Sutton & Barto, 2018)

For multi-class classification, a softmax activation is used in the output layer. It converts raw output scores for all categories into probabilities that sum to 1:

\[p_j = \frac{e^{z_j}}{\sum_{i=1}^{k} e^{z_i}}\]

where \(p_j\) is the predicted probability of category \(j\), \(z_j\) is the raw score the network gives to category \(j\) before converting scores into probabilities, \(k\) is the total number of categories, \(\sum_{i=1}^{k}\) means add across all categories from 1 to \(k\), and \(e\) is a mathematical constant, approximately 2.718.

In plain language: softmax takes all the raw output scores, turns them into positive numbers, and rescales them so that the final probabilities across all categories add up to 1. The category with the highest probability is the network’s prediction.
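The softmax formula translates directly into code. The raw scores below are made up for illustration. Subtracting the largest score before exponentiating is a common numerical precaution that does not change the result.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Shifting every score by the same amount leaves the probabilities unchanged
    # but avoids very large exponentials.
    exps = [math.exp(z - largest) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [2.0, 1.0, 0.1]        # one score per category
probabilities = softmax(raw_scores)
predicted = probabilities.index(max(probabilities))

print([round(p, 3) for p in probabilities])
print("sum:", round(sum(probabilities), 6), "predicted category:", predicted)
```

The ordering of the scores is preserved, so the category with the highest raw score always receives the highest probability.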

18.7 Network types and what they are for

Neural networks come in several forms. The key distinctions are how information flows through the network and what kind of data it handles.

The simplest type is the single-layer perceptron, discussed above. It connects inputs directly to outputs with no hidden layers and can only learn linear classification boundaries.

A feedforward multi-layer network adds hidden layers and can represent non-linear functions. Information flows in one direction only: from input through hidden layers to output, with no loops or feedback. “Deep learning” refers to networks with many layers trained with backpropagation; “deep” refers to the number of hidden layers between input and output. (Rumelhart, Hinton, et al., 1986)

18.7.1 Convolutional neural networks

Convolutional neural networks (CNNs) are designed for images and other data with spatial structure. A digital image is a grid of numbers; each pixel has a value for brightness or colour. The key idea is the convolutional filter (or kernel): a small grid of weights (for example, 3x3 or 5x5) that slides across the image, computing a weighted sum at each position. The same filter is applied at every location, so if the filter detects a vertical edge in the top-left corner, it can also detect vertical edges everywhere else. This weight sharing reduces the number of parameters compared to a fully connected network and makes the network tolerant of small position shifts: an edge is an edge, regardless of where it appears.

After convolution, pooling layers summarise local activity by taking the maximum or average value within small regions (e.g., 2x2 patches). This makes representations less sensitive to the exact position of features, a property called translation invariance that is important because objects can appear anywhere in an image.
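The sliding filter and the pooling step can be sketched in plain Python. The 6x6 image (dark on the left, bright on the right) and the hand-set vertical-edge filter are illustrative assumptions; in a trained CNN the filter weights would be learned.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image, taking a weighted sum at each position."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(out_cols)]
            for r in range(out_rows)]


def max_pool(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size patch."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


# Dark (0) on the left, bright (1) on the right: a vertical edge down the middle.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Responds when the right-hand column is brighter than the left-hand column.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = convolve2d(image, vertical_edge)
pooled = max_pool(feature_map)

for row in feature_map:
    print(row)
print("after pooling:", pooled)
```

The same nine weights are reused at every position, and the filter responds only where its window straddles the edge. After pooling, the map records that an edge is present in each region without recording exactly which column it fell in.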

CNNs learn feature hierarchies. Early layers detect simple features such as edges, corners, and colour gradients; middle layers combine those into textures, repeating patterns, and object parts; later layers respond to whole objects or categories. The network discovers this hierarchy on its own during training. This parallels Hubel and Wiesel’s work on the visual cortex in the 1960s, which showed that perception builds complex representations by composing simpler ones hierarchically. In the 2012 ImageNet challenge, AlexNet (designed by Krizhevsky, Sutskever, and Hinton) dramatically outperformed competing methods, reducing error by nearly half (Krizhevsky et al., 2012). Over the next few years, deep CNNs came to rival or surpass reported human baselines on some large-scale image-classification benchmarks, and that shift was a turning point for both research and industry.

For psychology, CNNs provide a testable model of visual processing. Researchers can ask: does the CNN confuse the same pairs of objects that humans confuse? Does it show the same difficulty with unusual viewpoints or poor lighting? Do activation patterns in its intermediate layers predict neural responses measured by fMRI? When answers are yes, it suggests the CNN has discovered computations similar to those used by the human visual system. When answers are no, it identifies where psychological theories of vision need further development. (Krizhevsky et al., 2012)

Network types (simplified diagrams)

18.7.2 Recurrent neural networks

Recurrent neural networks (RNNs) are designed for sequences: ordered data where position matters. Examples include words in a sentence, notes in a melody, eye fixations, or phonemes in a spoken word. The meaning of the current item depends on what came before: “bank” means something different after “river” than after “savings.”

Unlike feedforward networks, which process each input independently, RNNs have connections that loop back. At each time step, the network receives the current input and a copy of its previous hidden state, a running summary of everything seen so far. This hidden state acts as a memory. As each new input arrives, the hidden state is updated by combining the new input with the old state.
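The update of the hidden state can be sketched with a single recurrent unit. This is a deliberately tiny illustration: the names rnn_step and run_rnn and the weight values are assumptions, and real RNNs use vectors of hidden units with learned weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One update: the new hidden state combines the current input
    with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, w_x=1.0, w_h=0.5, b=0.0):
    h = 0.0  # empty memory before the sequence starts
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Two sequences that differ only in their first item.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a)
print(states_b)
```

The last two inputs are identical in both sequences, yet the final hidden states differ, because the state still carries a trace of the first input. The trace also gets weaker at every step.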

Standard RNNs have a practical weakness: they struggle with long-range dependencies. If relevant context is many steps in the past, information tends to fade or distort. Technically, gradients used in backpropagation either shrink towards zero (vanishing gradients) or grow explosively (exploding gradients), making it hard to learn relationships across long sequences.
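The arithmetic behind this can be illustrated with a simplification. Assume that passing a gradient back through each time step multiplies it by roughly the same factor. The function name and the factors 0.9 and 1.1 are illustrative assumptions; in a real network the factor depends on the weights and activations and varies from step to step.

```python
def gradient_scale(per_step_factor, steps):
    """Size of a gradient after being multiplied by the same factor at every step."""
    scale = 1.0
    for _ in range(steps):
        scale *= per_step_factor
    return scale

vanishing = gradient_scale(0.9, 100)   # factor slightly below 1
exploding = gradient_scale(1.1, 100)   # factor slightly above 1
print(vanishing)  # on the order of 0.00003
print(exploding)  # on the order of 14,000
```

Even factors close to 1 lead to extreme values over 100 steps, which is why learning signals from distant time steps tend either to disappear or to swamp everything else.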

Long short-term memory (LSTM) networks, introduced by Hochreiter and Schmidhuber in 1997, were designed to address this problem. The key innovation is a set of gates: learned mechanisms controlling information flow. A standard LSTM unit has three gates: an input gate (how much new information to let in), a forget gate (how much old memory to keep), and an output gate (how much stored memory to reveal). These gates allow the LSTM to selectively remember or forget, preserving important context across long sequences. LSTMs remained a leading architecture for sequence tasks until transformers began to replace them after 2017. (Hochreiter & Schmidhuber, 1997)
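The three gates can be sketched for a single LSTM unit. This follows the commonly used textbook form of the LSTM equations, reduced to one unit. The parameter values are set by hand purely for illustration (a trained network would learn them), and the names lstm_step and params are assumptions of this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM. params maps a name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b
    i = sigmoid(pre("input"))     # input gate: how much new information to let in
    f = sigmoid(pre("forget"))    # forget gate: how much old memory to keep
    o = sigmoid(pre("output"))    # output gate: how much stored memory to reveal
    candidate = math.tanh(pre("candidate"))
    c = f * c_prev + i * candidate   # updated memory cell
    h = o * math.tanh(c)             # hidden state passed to the next step
    return h, c

# Hypothetical hand-set parameters: let input in only when x is large,
# and keep almost all of the old memory.
params = {
    "input": (20.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (5.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0] + [0.0] * 19:   # one signal followed by 19 empty steps
    h, c = lstm_step(x, h, c, params)
print(c)  # the memory cell still holds most of the signal from step 1
```

With the forget gate held near 1 and the input gate closed on empty steps, the stored value barely decays over 20 steps, in contrast to the fading state of a simple recurrent unit.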

Psychological relevance: RNNs and LSTMs connect directly to working memory. The hidden state is a computational analogue of the active contents of working memory, carrying forward a representation of recent context that influences how new inputs are processed. The LSTM’s gating mechanism provides a model of how working memory might decide what to maintain and what to discard (interference management). Researchers have used these models to simulate serial recall, sentence processing, and sequential decision making. An LSTM trained to process sentences can learn to maintain subject-verb agreement across intervening clauses, a task that requires working memory in humans; when the model makes errors, they often resemble the errors humans make. (Hochreiter & Schmidhuber, 1997)

18.7.3 Transformers

Transformers, introduced by Vaswani and colleagues in 2017, replace recurrence with attention. In an RNN, information must pass step by step from one time point to the next, and information can be lost along the way. Transformers take a different approach: every element in the sequence can “look at” every other element directly, in a single step. There is no chain of hidden states to pass information through.

The core mechanism is self-attention. For each token in a sequence, the model computes how much attention it should pay to every other token. In “The cat sat on the mat because it was tired,” when processing “it,” the model needs to determine that “it” refers to “the cat” rather than “the mat.” Self-attention allows the model to assign high attention weight to “cat” and low weight to “mat.” This is computed simultaneously for all tokens, which is much faster than stepping through the sequence one token at a time. The original transformer paper demonstrated that this architecture matched or exceeded RNN performance on machine translation while training faster. Transformers now underlie virtually all large language models. (Vaswani et al., 2017)
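A minimal sketch of the attention computation, in the scaled dot-product form described by Vaswani et al. (2017), is given below. Each token has a query, a key, and a value vector. In a real transformer these are produced by learned projections of the token embeddings; here they are tiny two-dimensional vectors set by hand so that the query for “it” matches the key for “cat”. All numbers and names are illustrative assumptions.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """For each query, weight every value by softmax(query . key / sqrt(d))."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["cat", "mat", "it"]
# Hypothetical hand-set vectors (a trained model would learn these).
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
queries = [[0.0, 0.0], [0.0, 0.0], [3.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = self_attention(queries, keys, values)
for token, w in zip(tokens, weights[2]):
    print(token, round(w, 3))  # attention paid by "it" to each token
```

The loop over queries is written out for readability; in practice the same computation is done for all tokens at once as matrix multiplications, which is what makes it fast on parallel hardware.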

Psychological relevance: The word “attention” in the transformer literature differs from how psychologists typically use it, but there is a genuine parallel. In psychology, selective attention refers to focusing on relevant information while ignoring irrelevant information. In a transformer, attention is a learned mechanism that determines which parts of the context are relevant for processing the current input. Researchers can examine the attention patterns of a trained transformer and compare them to human eye-tracking data or reading-time measurements, asking whether the model and human focus on the same parts of a sentence. (Vaswani et al., 2017)

18.7.4 Reinforcement learning

Reinforcement learning (RL) is a different use of neural networks. In supervised learning, the network is told the correct answer for each example. In RL, there are no correct answers. Instead, an agent interacts with an environment, takes actions, and receives rewards. The agent must figure out through trial and error which actions lead to the best outcomes.

Consider training a network to play a computer game. The network sees the screen, chooses an action (e.g., move the paddle left, right, or stay), and receives a reward (e.g., points for breaking bricks, penalty for missing the ball). The network is not told which action is correct; it must discover through thousands of games that moving the paddle to meet the ball leads to better outcomes. This is learning from the consequences of actions.

The connection to psychology is direct: this is the behaviourist framework of Skinner’s operant conditioning, perhaps made more mathematically precise. (Sutton & Barto, 2018)

There are two key concepts here: the policy (a learned rule specifying what action to take in each state) and the value function (the estimated long-term payoff of being in a particular state, not just the immediate reward). Many RL algorithms learn both simultaneously. The value function helps the agent evaluate its situation (“How good is this state?”), while the policy determines what it does (“What action should I take?”). This maps onto a psychological distinction between evaluation (assessing how good an outcome is) and choice (selecting an action), which are thought to involve different neural systems.

Exploration versus exploitation: the agent must balance trying new actions against using actions that already seem good. This is a precise version of a problem that arises throughout human cognition. (Sutton & Barto, 2018)
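These ideas can be sketched with one of the simplest RL settings, a two-armed bandit, in which there is only one state and two possible actions. The agent keeps a running estimate of each action's value and follows an epsilon-greedy policy: usually it exploits the action with the highest estimate, but with a small probability it explores at random. The reward probabilities, the epsilon of 0.1, and the function name are assumptions for illustration, and the approach follows the general style of the introductory examples in Sutton and Barto (2018) rather than any specific algorithm from the text.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # learned value of each action
    counts = [0] * len(reward_probs)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(reward_probs)),
                         key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Move the estimate towards the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: action 1 pays off far more often than action 0.
estimates, counts = run_bandit([0.2, 0.8])
print(estimates)
print(counts)
```

The agent is never told which action is better. It discovers this from rewards alone, and without the occasional exploratory choice it could stay stuck on the first action that happened to pay off.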

18.8 Representation learning and distributed coding

A central idea in modern networks is distributed representation. To understand it, consider two ways a neural system might represent information. In a localist representation, each concept is represented by a single unit or small group of dedicated units. There is some evidence for this kind of coding: Quiroga and colleagues (2005) reported individual neurons in the human medial temporal lobe that responded selectively to specific people, one neuron firing when the patient saw pictures of Jennifer Aniston regardless of angle or context. This “Jennifer Aniston neuron” (or “grandmother cell”) suggests a localist code.

In a distributed representation, each concept is represented by a pattern of activity across many units, and each unit participates in representing many different concepts. “Dog” might be represented by a pattern where units 3, 7, 12, and 45 are highly active, while “cat” might be represented by a pattern where units 3, 7, 14, and 50 are active. The patterns overlap; both “dog” and “cat” share active units 3 and 7, which might encode features they share (four legs, fur, pet). This overlap is why distributed representations capture similarity: items similar in meaning have similar activation patterns.
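The overlap in this example can be computed directly. The sketch below uses the unit numbers from the paragraph above for “dog” and “cat”, and adds a hypothetical unrelated concept (“car”, with made-up unit numbers) for contrast. The overlap measure used, shared active units divided by all units active in either pattern, is one simple choice among several.

```python
# Active units for each concept ("dog" and "cat" use the numbers from the text).
dog = {3, 7, 12, 45}
cat = {3, 7, 14, 50}
car = {21, 28, 33, 60}  # hypothetical unrelated concept, not from the text

def overlap(a, b):
    """Shared active units divided by all units active in either pattern."""
    return len(a & b) / len(a | b)

print(overlap(dog, cat))  # two of six units are shared
print(overlap(dog, car))  # no shared units
```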

This idea was central to the parallel distributed processing (PDP) tradition of Rumelhart, McClelland, and colleagues in the 1980s. Distributed representations have several advantages over localist ones: (1) the system captures similarity through pattern overlap; (2) it shows graceful degradation when units are damaged (because information is spread across many units, losing one unit degrades performance gradually rather than catastrophically); (3) it generalises to new items that share features with known items; and (4) it can represent many concepts using combinatorial patterns across a modest number of units, just as 26 letters can represent millions of words. (Rumelhart, McClelland, et al., 1986)

Lashley’s work in the 1920s through the 1950s anticipated this. He spent decades removing brain regions from rats, trying to localise memory. His conclusion was that memory is not localised in any single region but distributed across the cortex: removing any region degraded performance somewhat, but no single region was indispensable. In connectionist models, similarly, a memory is stored across many connection weights; damage to some units degrades the memory gradually but does not erase it, matching real brain behaviour better than localist models. (Rumelhart, McClelland, et al., 1986)

Embeddings. A modern application of distributed representation is the embedding. An embedding represents an item (a word, a sentence, an image) as a vector of numbers in a multi-dimensional space. For example, the word “king” might be represented as a vector of 300 numbers learned during training by predicting words from their context in large bodies of text. The key property of learned embeddings is that geometric relationships in the vector space correspond to semantic relationships. Words similar in meaning end up close together. Consistent relationships appear as consistent directions: the vector from “king” to “queen” is approximately the same as the vector from “man” to “woman.” Modern large language models represent every word (or sub-word token) as an embedding, and these embeddings are the foundation on which the model builds its understanding of language. Embeddings are not a complete theory of meaning, but they provide a concrete, computational framework for thinking about how meaning might be represented and compared. (Landauer & Dumais, 1997)
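The geometry can be illustrated with a toy example. The two-dimensional vectors below are invented by hand (roughly, one dimension for royalty and one for gender) so that the relationship holds exactly; real embeddings have hundreds of dimensions, are learned from text, and show the pattern only approximately.

```python
import math

# Hypothetical hand-made embeddings, not taken from any trained model.
embeddings = {
    "king":  [0.9,  0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1,  0.8],
    "woman": [0.1, -0.8],
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Start at "king", remove the "man" direction, add the "woman" direction.
target = [k - m + w for k, m, w in zip(embeddings["king"],
                                       embeddings["man"],
                                       embeddings["woman"])]

nearest = max(embeddings, key=lambda word: cosine(target, embeddings[word]))
print(nearest)
```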

18.9 Psychological models that used networks

18.9.1 Face recognition: Bruce, Young, and Burton

In 1986, Vicki Bruce and Andrew Young proposed an information-flow model of face recognition. An information-flow model (sometimes called a “box-and-arrow” model) describes a cognitive process as a series of stages, each performing a different function. Bruce and Young proposed the following stages: structural encoding creates a representation of the face’s physical configuration; face recognition units (FRUs) compare this to stored representations of known faces; person identity nodes (PINs) link the recognised face to stored information about the person; and name generation retrieves the name. Separate parallel pathways handle emotional expression and facial speech (lip-reading). The model was supported by neuropsychological patients with selective impairments at different stages, and by the common experience of recognising a face without being able to recall the name. (Bruce & Young, 1986)

Face recognition architecture (schematic)

The Bruce and Young model was a verbal, box-and-arrow description. It specified the stages of face recognition but not the mechanisms: how does a face recognition unit actually match a perceived face to a stored representation? How fast does information flow between stages?

In 1990, Andrew Burton, Vicki Bruce, and Rob Johnston addressed these questions by implementing key parts of the Bruce and Young framework as an interactive activation and competition (IAC) network, where units are connected by excitatory and inhibitory links and activation spreads through the network over time. The move to a network model was motivated by precision: a working computational model generates quantitative predictions that a verbal description cannot.

In the IAC model, FRUs are connected to PINs by excitatory connections. When a face is perceived, its FRU becomes active and activation spreads to the corresponding PIN, which activates connected name and semantic information nodes. Connections are bidirectional: activation flows not only from FRU to PIN but also back. This reciprocal activation produces several predictions:

Familiarity effects: familiar faces, which have strong FRU-to-PIN connections built through experience, produce faster and stronger activation than unfamiliar faces. This matches experimental findings.
Semantic priming: having just seen Prince William leaves his PIN partially active, sending activation to associated PINs (e.g., King Charles). This makes it easier to recognise King Charles immediately after, an effect demonstrated in experiments.
Competition: units at the same level inhibit each other, so the system settles on a single identification.

The specific architecture is less important than the general principle: a network model embodies a psychological theory, making its assumptions explicit and its predictions quantitative. A box-and-arrow model says “structural encoding feeds into face recognition units”; a network model specifies the connection strengths, the activation dynamics, and the time course, allowing researchers to simulate experiments and compare model behaviour to human data trial by trial. (Burton et al., 1990)
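To make the idea of activation spreading over time concrete, here is a heavily simplified sketch in the spirit of an interactive activation and competition network. It has only two face recognition units and two person identity nodes, and the weights, parameter values, and update rule are illustrative assumptions based on the general interactive activation scheme, not the values used by Burton et al. (1990).

```python
REST, MIN_A, MAX_A = -0.1, -0.2, 1.0  # resting, minimum, and maximum activation
DECAY, RATE = 0.1, 0.1                # pull back towards rest; step size per cycle

units = ["FRU_A", "FRU_B", "PIN_A", "PIN_B"]

# Hypothetical weights: excitation between a face unit and its person node
# (in both directions), inhibition between units at the same level.
weights = {
    ("FRU_A", "PIN_A"): 0.5, ("PIN_A", "FRU_A"): 0.5,
    ("FRU_B", "PIN_B"): 0.5, ("PIN_B", "FRU_B"): 0.5,
    ("FRU_A", "FRU_B"): -0.5, ("FRU_B", "FRU_A"): -0.5,
    ("PIN_A", "PIN_B"): -0.5, ("PIN_B", "PIN_A"): -0.5,
}

def run_iac(external, cycles=200):
    act = {u: REST for u in units}
    for _ in range(cycles):
        new_act = {}
        for u in units:
            # Only units with positive activation send output to their neighbours.
            net = external.get(u, 0.0) + sum(
                w * max(act[src], 0.0)
                for (src, dst), w in weights.items() if dst == u
            )
            if net > 0:
                change = (MAX_A - act[u]) * net   # excitation pushes towards the maximum
            else:
                change = (act[u] - MIN_A) * net   # inhibition pushes towards the minimum
            change -= DECAY * (act[u] - REST)     # decay pulls back towards rest
            new_act[u] = min(MAX_A, max(MIN_A, act[u] + RATE * change))
        act = new_act
    return act

final = run_iac({"FRU_A": 0.5})  # person A's face is presented
for unit in units:
    print(unit, round(final[unit], 2))
```

Presenting face A activates its FRU, activation spreads to the matching PIN, and the competing units are pushed below their resting level. Even in this toy version, every assumption (weights, decay, resting level) has to be stated explicitly, which is the point the section makes about network models.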

Interactive activation model (simplified)

18.9.2 Past tense learning

Rumelhart and McClelland (1986) trained a connectionist network on English verb pairs and showed it could learn regular and irregular past tenses from examples alone, producing over-regularisation errors (such as “goed”) resembling children’s errors. The model suggested that rule-like behaviour could emerge from distributed representations without any symbolic rules. The network is trained on many verb pairs (walk/walked, go/went); it adjusts weights so that the output phonology matches the target past tense; it then produces over-regularisations when the statistical structure of the data favours regular patterns. (Rumelhart & McClelland, 1986)

Past tense network (schematic)

Pinker and Prince (1988) challenged the model on several grounds: its phonological representations were unrealistic and contained encoding artefacts that inflated apparent success; the model could not handle certain linguistic distinctions (for example, verbs that sound alike but take different past tenses, such as “ring/rang” versus “wring/wrung”); and children’s over-regularisation errors do not follow the same statistical pattern as the model’s errors. Most fundamentally, Pinker and Prince argued that language requires rules: that regular past tenses are generated by a symbolic rule (“add -ed”) that is categorically different from memory-based retrieval of irregular forms. A single connectionist network, they argued, cannot capture this distinction. (Pinker & Prince, 1988)

The debate is instructive. The question is not merely “Does the model produce the right outputs?” but “Does it produce them for the right reasons, using mechanisms that match human behaviour in detail?” A model that gets answers right but makes the wrong kinds of errors is not a good psychological theory even if it is an effective engineering solution.

Symbolic versus connectionist: The past tense controversy reflects a broader debate. Symbolic theories propose that the mind operates on discrete symbols according to explicit rules; grammar, logical reasoning, and mathematical proof are naturally described this way. Connectionist theories propose that knowledge is stored implicitly in connection weights and that rule-like behaviour emerges from statistical regularities, without explicit rules. The dispute is not simply about which is correct; it is about what kind of explanation is adequate for different cognitive tasks. Many cognitive scientists now think the mind uses both types of processing, and the interesting question is which tasks use which kind of representation and how they interact. These models do not settle the debate, but a computational model forces a theory to specify mechanisms that can be tested against data. (Pinker & Prince, 1988; Rumelhart & McClelland, 1986)

18.10 Deep learning and scaling

The resurgence of neural networks depended on both algorithmic and hardware changes. Backpropagation provided a practical learning rule for multi-layer networks; later increases in computation made it possible to train larger models. Geoffrey Hinton, Yann LeCun, and Yoshua Bengio received the 2018 ACM Turing Award for their contributions to deep learning (Association for Computing Machinery, 2019a). Hinton and John Hopfield later shared the 2024 Nobel Prize in Physics for foundational discoveries and inventions that enable machine learning with artificial neural networks (Nobel Prize Outreach, 2024).

AlexNet’s 2012 ImageNet result was a turning point. Over the following years, training runs became much larger, datasets kept growing, and benchmark performance improved rapidly (OpenAI, 2018). This convinced both researchers and industry that deep learning was practically superior for many perceptual tasks, and investment in AI surged.

Video: Hinton Turing Award lecture

Source: ACM Turing Award lecture video. (Association for Computing Machinery, 2019b)

18.11 Transformers and large language models

18.11.1 Pretraining, fine-tuning, and tokens

To understand large language models (LLMs), several terms need to be defined.

A token is the basic unit the model processes. Tokens are not always whole words. Most LLMs use subword tokenisation, which breaks text into frequent chunks. For example, “unbelievable” might split into “un”, “believ”, and “able.” Common words like “the” are usually a single token. A rough rule of thumb is that one token is about three-quarters of a word in English.
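A toy tokeniser shows the splitting in action. It uses greedy longest-match against a small fixed vocabulary, which is loosely similar to how some subword tokenisers apply their vocabulary. The vocabulary itself is invented for this example; real tokenisers learn vocabularies of tens of thousands of chunks from data, and their splits of “unbelievable” may differ.

```python
def tokenize(word, vocab):
    """Greedy longest-match: repeatedly take the longest chunk found in vocab,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens

# Hypothetical vocabulary, chosen to reproduce the example in the text.
vocab = {"un", "believ", "able", "the"}

print(tokenize("unbelievable", vocab))
print(tokenize("the", vocab))
```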

Pretraining is the first and most expensive phase. The model is exposed to an enormous text corpus (billions or trillions of words scraped from books, websites, Wikipedia, code repositories, and other sources) and learns to predict text. Two main prediction objectives are used:

Next-token prediction (GPT-style): given all the tokens so far, predict the next token. For example, given “The capital of France is”, predict “Paris.” This is done trillions of times, and the model gradually acquires grammar, facts, reasoning patterns, and style.
Masked prediction (BERT-style): randomly hide some tokens in the input and predict the missing ones from the surrounding context. Because the model sees context on both sides of the masked token, this is a bidirectional objective.
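Next-token prediction can be illustrated with a far simpler model than an LLM: a bigram model, which predicts the next token from counts of what followed the current token in the training text. The three-sentence corpus is invented for this example. Unlike a transformer, this model looks only one token back and learns nothing beyond raw counts, so it is a sketch of the objective, not of how LLMs achieve it.

```python
from collections import Counter, defaultdict

# Tiny hypothetical training corpus, already split into tokens.
corpus = ("the capital of france is paris . "
          "the capital of italy is rome . "
          "the capital of france is paris .").split()

# Count which token follows which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_token_probs(token):
    """Estimated probability of each possible next token."""
    counts = following[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

probs = next_token_probs("is")
prediction = max(probs, key=probs.get)
print(probs)
print(prediction)
```

Because it sees only the word “is”, this model cannot use “france” to rule out “rome”. Using the whole preceding context is exactly what larger models add.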

Fine-tuning is the second phase. The pre-trained model is further trained on a smaller, more specific dataset, for example a set of question-answer pairs to become a question-answering system, or on human feedback about helpfulness to become a better conversational assistant (a process called reinforcement learning from human feedback, RLHF). Fine-tuning is much cheaper and faster than pretraining because the model already knows a great deal about language. (Brown et al., 2020; Devlin et al., 2018)

18.11.2 Key models

BERT (Google, 2018): trained with bidirectional masked prediction; strong on comprehension tasks. Helped popularise the pre-training/fine-tuning pattern now standard across the field.

GPT-3 (OpenAI, 2020): 175 billion parameters, trained on next-token prediction. Its most notable feature was few-shot learning: given just a few examples in the prompt, it could perform a wide range of tasks without additional fine-tuning.

GPT-4 (OpenAI, 2023): multimodal (text and images), substantially better on benchmarks than earlier versions, including some professional and academic exams.

The landscape of LLMs has become crowded. Many other organisations have developed powerful models: Anthropic’s Claude, Meta’s LLaMA, Google’s Gemini, Mistral AI’s Mistral, and DeepSeek’s models. These differ in architecture details, training data, and design philosophy (for example, whether model weights are openly available), but all share the transformer architecture and the pre-training/fine-tuning approach. (Brown et al., 2020; OpenAI, 2023)

18.11.3 Prompting and in-context learning

GPT-3 demonstrated a notable capability: a model can adapt to new tasks from examples provided in the prompt, without any weight updates. This is called in-context learning. For example, including a few translation examples at the start of a prompt can cause the model to continue translating. It is not long-term learning (the model’s weights do not change), but it can adapt behaviour within a single conversation.

This gave rise to prompt engineering: the practice of crafting input prompts to elicit the best responses. Small changes in wording can produce different answers. Techniques include giving examples of desired outputs (“few-shot” prompting), asking the model to “think step by step” (chain-of-thought prompting), or assigning the model a role. (Brown et al., 2020)

18.11.4 Capabilities and limits

LLMs are effective at generating coherent text, summarising documents, translating, writing and debugging code, and answering questions. These abilities follow from the training objective: predicting the next token forces the model to represent grammar, facts, reasoning patterns, and style.

LLMs have a well-known failure mode: hallucination, producing fluent, confident text that is factually wrong. This is not a bug that can be easily patched; it is a consequence of how the models work. They generate the most probable continuation, which is not always the true one. A common example is fabricated academic references: a model asked to cite sources may generate realistic-looking author names, journal titles, and dates that correspond to no real publication.

“Temperature” controls how variable a model’s outputs are. At low temperature, the model tends to choose the most probable next token, producing safer, more predictable, and often more repetitive text. At high temperature, it more often selects less probable tokens, producing more varied and creative but also more error-prone text. The term does not refer to the machine’s physical temperature; it is simply the name given to this sampling parameter.
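In the usual formulation, the model's raw scores for each candidate token (often called logits) are divided by the temperature before being turned into probabilities with the softmax function. The sketch below uses three made-up scores; the function name and the temperature values are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.5)
medium = softmax_with_temperature(logits, 1.0)
high = softmax_with_temperature(logits, 2.0)

for name, probs in [("low", low), ("medium", medium), ("high", high)]:
    print(name, [round(p, 2) for p in probs])
```

At low temperature almost all of the probability goes to the top-scoring token; at high temperature the probabilities are closer together, so sampling picks the less likely tokens more often.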

LLMs have a fixed context window: they can only use the tokens within it to generate responses. Long conversations or documents may exceed this limit. The models do not build long-term memories unless embedded in systems that store information externally. (Brown et al., 2020; OpenAI, 2023)

18.11.5 What LLMs offer psychology

LLMs provide a testbed for theories of language and cognition.

Surprisal and reading times: psycholinguists have established that less predictable words take longer to read. LLM token probabilities generate surprisal scores that predict human reading times well. This supports theories that human language processing is fundamentally predictive.
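Surprisal is conventionally defined as the negative logarithm of a word's probability given its context, often measured in bits (base 2). A minimal sketch follows; the two probabilities are hypothetical values of the kind a language model might assign, not outputs from a real model.

```python
import math

def surprisal_bits(probability):
    """Surprisal in bits: low-probability words carry high surprisal."""
    return -math.log2(probability)

# Hypothetical model probabilities for a predictable and an unexpected word.
predictable = surprisal_bits(0.5)
unexpected = surprisal_bits(0.125)
print(predictable, unexpected)
```

Each halving of the probability adds one bit of surprisal, and the psycholinguistic prediction is that reading time rises with it.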

Semantic similarity: LLM embeddings predict how similar people judge two words or sentences to be. This supports distributed representation theories of meaning.

Syntactic processing: researchers test whether LLMs show the same sensitivity to grammatical structure as humans, for example subject-verb agreement across intervening clauses. When models succeed, it shows that statistical learning from text can capture syntactic structure; when they fail, it reveals what text alone cannot teach.

Cognitive biases: some studies have found that LLMs show human-like biases such as framing effects or anchoring. When they do, it suggests these biases may arise from statistical patterns in language rather than from specifically human cognitive architecture.

If a model trained only on text can approximate certain human behaviours, that tells us what can be learned from linguistic input alone. If it fails in systematic ways, that highlights what kinds of experience or structure might be missing: embodied experience, social interaction, visual grounding, or explicit reasoning. LLMs are best treated as baseline models: they show what statistical learning from text can achieve, which helps clarify what additional cognitive mechanisms humans might need. (Brown et al., 2020; OpenAI, 2023)

One approach is to treat LLMs as simulated experimental participants. Researchers present the model with the same stimuli used in human experiments (sentence completions, moral dilemmas, categorisation tasks) and compare model responses to human data. The point is not to claim the model is a person or has genuine understanding. Rather, the model serves as a null hypothesis: if statistical learning from text alone reproduces a human behavioural pattern, that pattern does not require more complex cognitive mechanisms. If the model fails, that tells us something important about what is missing from a purely statistical account.

18.12 Limits and evaluation

Neural networks are effective when the task is well specified and training data are sufficient. They do not constitute a general theory of cognition. They can approximate mathematical functions and produce impressive outputs, but they do not by themselves explain how humans understand meaning, set goals, or model other people’s beliefs and intentions. For psychology, neural networks are best treated as models to compare with human data, tools for generating precise predictions and identifying where theories succeed or fail, not as replacements for psychological theory. (OpenAI, 2023)

Models can be hard to interpret, and their outputs can be confident even when wrong. Evaluation means testing behaviour on new data and considering failure modes, not trusting the internal story the model seems to tell.

18.13 Criticisms of AI

The rapid development and deployment of AI systems has provoked serious criticism on multiple fronts.

Bias and discrimination. AI systems learn from data, and data reflect the world as it is, including its inequalities. If a facial recognition system is trained mainly on lighter-skinned faces, it will perform poorly on darker-skinned faces. If a hiring algorithm is trained on historical decisions, it may learn to discriminate against underrepresented groups because those groups were historically excluded. These are not hypothetical scenarios: studies have documented racial and gender biases in commercial facial recognition systems, language models, and hiring tools. The algorithm is not prejudiced; it faithfully reproduces statistical patterns in biased training data.

Environmental costs. Training large models consumes substantial energy and water. Training a single large language model can produce carbon emissions comparable to the lifetime emissions of several cars. As models grow larger and are trained more frequently, these costs increase.

Exploitative labour. Data labelling and content moderation, both necessary for AI systems, are often outsourced to workers in low-income countries who are paid low wages. Content moderation, reviewing harmful or disturbing material that AI systems produce or encounter, has been documented as causing psychological harm to the workers who perform it.

Intellectual property. LLMs and image generators are trained on material scraped from the internet, including copyrighted books, articles, artwork, and code. Artists, writers, and programmers have argued that using their work to train commercial AI systems without permission or compensation is ethically and legally wrong. Several lawsuits are testing whether this constitutes copyright infringement.

Deepfakes. AI can generate convincing fake images, audio, and video. These are already used for fraud, political disinformation, and non-consensual pornography.

Transparency and accountability. Deep neural networks are difficult to interpret. When a model makes a decision (denying a loan, flagging a person as high-risk, recommending a sentence), it may be impossible to explain why in terms a human can evaluate. Who is responsible when an AI system makes a harmful error?

Concentration of power. The development of the most capable AI systems requires enormous resources, concentrating influence in a small number of large companies, primarily in the United States and China. This raises concerns about monopolistic behaviour and the exclusion of smaller organisations and poorer countries from shaping a technology that will affect everyone.

These criticisms do not argue for abandoning AI. Many applications are genuinely beneficial. They do argue for critical engagement: technology is shaped by the decisions of the people who build, fund, and deploy it, and it carries the values and biases of its creators and their data.

18.14 Key terms

Activation function: A function applied to a unit’s weighted sum to determine its output. Non-linear activation functions give networks the ability to model complex relationships. Common types: sigmoid, ReLU, step function.

Attention mechanism: A component of neural networks (especially transformers) that allows each element in a sequence to selectively weight contributions from other elements by relevance. Enables modelling of long-range dependencies.

Backpropagation: The algorithm that computes how much each weight in a multi-layer network contributed to the overall error, using the chain rule from calculus. Allows gradient descent to update weights in all layers, not just the output layer.

Catastrophic forgetting: The tendency of a neural network to lose previously learned information when trained on new data. Contrasts with human memory, where new learning rarely erases old knowledge so completely.

Distributed representation: A scheme in which a concept is represented by a pattern of activity across many units rather than by a single dedicated unit. Allows similarity to be captured through pattern overlap and provides robustness to damage.

Embedding: A learned representation of an item (e.g., a word) as a vector of numbers in a multi-dimensional space. Items similar in meaning are represented by nearby vectors.

Gradient descent: An optimisation algorithm that adjusts each weight in the direction that reduces the loss function, using the derivative of the loss with respect to each weight.

Hebbian learning: A learning rule proposed by Donald Hebb (1949): when one neuron repeatedly contributes to the firing of another, the connection between them is strengthened. Often summarised as “neurons that fire together wire together.”

Large language model (LLM): A neural network (typically a transformer) with billions of parameters, pre-trained on vast text corpora to predict the next token. Examples: GPT-4, Claude, LLaMA, Gemini.

Perceptron: A single-layer neural network that multiplies inputs by weights, sums the results, and applies a threshold function to produce a binary output. Can learn linearly separable classification tasks. Invented by Frank Rosenblatt (1958).

Recurrent neural network (RNN): A neural network with connections that loop back, allowing it to maintain a hidden state across time steps. Designed for sequential data such as language, speech, and time series.

Reinforcement learning: A learning framework in which an agent interacts with an environment, takes actions, and receives rewards. The goal is to learn a policy that maximises long-term reward.

Transformer: A neural network architecture that uses self-attention instead of recurrence to process sequences. Underlies virtually all modern large language models.

18.15 Short starting points

Textbook-style intro (AI/NN overview): Goodfellow, Bengio, and Courville, Deep Learning, Chapter 1 (open online). (Goodfellow et al., 2016)
Popular overview (non-technical): Mitchell, Artificial Intelligence: A Guide for Thinking Humans. (Mitchell, 2019)
Brief online primer: CS50x AI notes overview. (CS50, 2026)

18.16 Test Yourself

18.17 Open-answer Check-in

Allen, H., Brady, N., & Tredoux, C. (2009). Perception of ’best likeness’ to highly familiar faces of self and friend. Perception, 38(12), 1821–1830. https://doi.org/10.1068/p6424

Alogna, V. K., Attaya, M. K., Aucoin, P., Bahńik, Š., Birch, S., et al. (2014). Registered replication report: Schooler & Engstler-Schooler (1990). Perspectives on Psychological Science, 9(5), 556–578. https://doi.org/10.1177/1745691614545653

American Civil Liberties Union. (2020). ACLU files lawsuit in landmark case of wrongful arrest due to faulty face recognition technology. https://www.aclu.org/cases/williams-v-city-of-detroit-face-recognition-false-arrest

Association for Computing Machinery. (2019a). ACM announces 2018 Turing award recipients. ACM Bulletin. https://www.acm.org/articles/bulletins/2019/march/turing-award-2018

Association for Computing Machinery. (2019b). Geoffrey Hinton and Yann LeCun, 2018 ACM A.M. Turing award lecture: The deep learning revolution. YouTube video. https://www.youtube.com/live/VsnQf7exv5I

Bacci, N., Davimes, J. G., Steyn, M., & Briers, N. (2021). Forensic facial comparison: Current status, limitations, and future directions. Biology, 10(12), 1269. https://doi.org/10.3390/biology10121269

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge University Press.

Benjamin, R. (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Press.

Benson, P. J., & Perrett, D. I. (1994). Visual processing of facial distinctiveness. Perception, 23(1), 75–93. https://doi.org/10.1068/p230075

Bentin, S., Allison, T., Puce, A., Perez, E., & McCarthy, G. (1996). Electrophysiological studies of face perception in humans. Journal of Cognitive Neuroscience, 8(6), 551–565. https://doi.org/10.1162/jocn.1996.8.6.551

Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L’Année Psychologique, 11, 191–244.

Bobak, A. K., Hancock, P. J. B., & Bate, S. (2016). Super-recognisers in action: Evidence from face-matching and face memory tasks. Applied Cognitive Psychology, 30(1), 81–91. https://doi.org/10.1002/acp.3170

Boring, E. G. (1950). A history of experimental psychology (2nd ed.). Appleton-Century-Crofts.

Bothwell, R. K., Deffenbacher, K. A., & Brigham, J. C. (1987). Correlation of eyewitness accuracy and confidence: Optimality hypothesis revisited. Journal of Applied Psychology, 72(4), 691–695. https://doi.org/10.1037/0021-9010.72.4.691

Brennan, S. E. (1985). Caricature generator: The dynamic exaggeration of faces by computer. Leonardo, 18(3), 170–178. https://doi.org/10.2307/1578048

Brewer, N., Caon, A., Todd, C., & Weber, N. (2006). Eyewitness identification accuracy and response latency. Law and Human Behavior, 30(1), 31–50. https://doi.org/10.1007/s10979-006-9002-7

Brewer, N., & Wells, G. L. (2006). The confidence–accuracy relationship in eyewitness identification: Effects of lineup instructions, foil similarity, and target-absent base rates. Journal of Experimental Psychology: Applied, 12(1), 11–30. https://doi.org/10.1037/1076-898X.12.1.11

Brigham, J. C., & Bothwell, R. K. (1983). The ability of prospective jurors to estimate the accuracy of eyewitness identifications. Law and Human Behavior, 7(1), 19–30. https://doi.org/10.1007/BF01045284

Brigham, J. C., Maass, A., Snyder, L. D., & Spaulding, K. (1982). Accuracy of eyewitness identifications in a field setting. Journal of Personality and Social Psychology, 42(4), 673–681. https://doi.org/10.1037/0022-3514.42.4.673

Broadbent, D. E. (1958). Perception and communication. Pergamon Press.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. arXiv. https://doi.org/10.48550/arXiv.2005.14165

Bruce, V., Burton, A. M., Hanna, E., Healey, P., Mason, O., Coombes, A., Fright, R., & Linney, A. (1993). Sex discrimination: How do we tell the difference between male and female faces? Perception, 22(2), 131–152. https://doi.org/10.1068/p220131

Bruce, V., Healey, P., Burton, A. M., Doyle, T., Coombes, A., & Linney, A. (1991). Recognizing facial surfaces. Perception, 20(6), 755–769. https://doi.org/10.1068/p200755

Bruce, V., Ness, H., Hancock, P. J. B., Newman, C., & Rarity, J. (2002). Four heads are better than one: Combining face composites yields improvements in face likeness. Journal of Applied Psychology, 87(5), 894–902. https://doi.org/10.1037/0021-9010.87.5.894

Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77(3), 305–327. https://doi.org/10.1111/j.2044-8295.1986.tb02199.x

Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. John Wiley & Sons.

Brunet, M., Taddei, A., Py, J., Paubel, P.-V., & Tredoux, C. G. (2022). Social contact, own-group recognition bias and visual attention to faces. British Journal of Psychology, 114(Suppl. 1), 112–133. https://doi.org/10.1111/bjop.12603

Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81(3), 361–380. https://doi.org/10.1111/j.2044-8295.1990.tb02367.x

Burton, A. M., Kramer, R. S. S., Ritchie, K. L., & Jenkins, R. (2016). Identity from variation: Representations of faces derived from multiple instances. Cognitive Science, 40(1), 202–223. https://doi.org/10.1111/cogs.12231

Carragher, D. J., & Hancock, P. J. B. (2020). Surgical face masks impair human face matching performance for familiar and unfamiliar faces. Cognitive Research: Principles and Implications, 5(1), 59. https://doi.org/10.1186/s41235-020-00258-x

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354–380. https://doi.org/10.1037/0033-2909.132.3.354

Chang, L., & Tsao, D. Y. (2017). The code for facial identity in the primate brain. Cell, 169(6), 1013–1028. https://doi.org/10.1016/j.cell.2017.05.011

Charman, S. D., Carlucci, M., Vallano, J., & Hyman Gregory, A. (2010). The selective cue integration framework: A theory of postidentification witness confidence assessment. Journal of Experimental Psychology: Applied, 16(2), 204–218. https://doi.org/10.1037/a0019495

Chevroulet, C., Paterson, H. M., Yu, A., Chew, E., & Kemp, R. I. (2021). The impact of recall timing on the preservation of eyewitness memory. Psychiatry, Psychology and Law, 29(3), 471–486. https://doi.org/10.1080/13218719.2021.1926366

Chiroro, P. M., Tredoux, C. G., Radaelli, S., & Meissner, C. A. (2008). Recognising faces across continents: The effect of within-race variations on the own-race bias in face recognition. Psychonomic Bulletin & Review, 15(6), 1089–1092. https://doi.org/10.3758/pbr.15.6.1089

Chiroro, P., & Muller, K. (2005). Child witnesses. In C. Tredoux, D. Foster, A. Allan, A. Cohen, & D. Wassenaar (Eds.), Psychology and law (pp. 226–253). Juta Academic.

Chomsky, N. (1959). A review of B. F. Skinner’s Verbal Behavior. Language, 35(1), 26–58.

Christianson, S.-Å. (1992). Emotional stress and eyewitness memory: A critical review. Psychological Bulletin, 112(2), 284–309. https://doi.org/10.1037/0033-2909.112.2.284

Cicerone, K. D., Langenbahn, D. M., Braden, C., Malec, J. F., Kalmar, K., Fraas, M., Felicetti, T., Laatsch, L., Harley, J. P., Bergquist, T., Azulay, J., Cantor, J., & Ashman, T. (2011). Evidence-based cognitive rehabilitation: Updated review of the literature from 2003 through 2008. Archives of Physical Medicine and Rehabilitation, 92(4), 519–530. https://doi.org/10.1016/j.apmr.2010.11.015

City and County of San Francisco. (2019). San Francisco administrative code, chapter 19B (surveillance technology), section 19B.2. https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-56057

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19. https://doi.org/10.1093/analys/58.1.7

Clark, S. E. (2005). A re-examination of the effects of biased lineup instructions in eyewitness identification. Law and Human Behavior, 29(4), 395–424. https://doi.org/10.1007/s10979-005-5690-7

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Craik, K. J. W. (1943). The nature of explanation. Cambridge University Press.

CS50. (2026). Artificial intelligence — CS50x 2026. Course notes. https://cs50.harvard.edu/x/notes/ai/

Cutler, B. L., & Penrod, S. D. (1988). Improving the reliability of eyewitness identification: Lineup construction and presentation. Journal of Applied Psychology, 73(2), 281–290. https://doi.org/10.1037/0021-9010.73.2.281

Cutler, B. L., Penrod, S. D., & Dexter, H. R. (1989). The eyewitness, the expert psychologist, and the jury. Law and Human Behavior, 13(3), 311–332. https://doi.org/10.1007/BF01067032

Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge University Press. https://doi.org/10.1017/CBO9780511524059

Darwin, C. (1872). The expression of the emotions in man and animals. John Murray.

Davis, J. P., Forrest, C., Treml, F., & Jansari, A. (2018). Identification from CCTV: Assessing police super-recogniser ability to spot faces in a crowd and susceptibility to change blindness. Applied Cognitive Psychology, 32(3), 337–353. https://doi.org/10.1002/acp.3405

Davis, J. P., & Valentine, T. (2009). CCTV on trial: Matching video images with the defendant in the dock. Applied Cognitive Psychology, 23(4), 482–505. https://doi.org/10.1002/acp.1490

Deffenbacher, K. A. et al. (2008). Forgetting the once-seen face: Estimating the strength of an eyewitness’s memory representation. Journal of Experimental Psychology: Applied, 14(2), 139–150. https://doi.org/10.1037/1076-898X.14.2.139

Deffenbacher, K. A., Bornstein, B. H., & Penrod, S. D. (2006). Mugshot exposure effects: Retroactive interference, mugshot commitment, source confusion, and unconscious transference. Law and Human Behavior, 30(3), 287–307. https://doi.org/10.1007/s10979-006-9008-1

Deffenbacher, K. A., Bornstein, B. H., Penrod, S. D., & McGorty, E. K. (2004). A meta-analytic review of the effects of high stress on eyewitness memory. Law and Human Behavior, 28(6), 687–706. https://doi.org/10.1007/s10979-004-0565-x

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://doi.org/10.48550/arXiv.1810.04805

Diamond, D. M., Campbell, A. M., Park, C. R., Halonen, J., & Zoladz, P. R. (2007). The temporal dynamics model of emotional memory processing: A synthesis on the neurobiological basis of stress-induced amnesia, flashbulb and traumatic memories, and the Yerkes-Dodson law. Neural Plasticity, 2007, 1–33. https://doi.org/10.1155/2007/60803

Diamond, J. (2005). Collapse: How societies choose to fail or succeed. Viking.

Diamond, R., & Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of Experimental Psychology: General, 115(2), 107–117. https://doi.org/10.1037/0096-3445.115.2.107

Duchaine, B. C., & Nakayama, K. (2006a). Developmental prosopagnosia: A window to content-specific face processing. Current Opinion in Neurobiology, 16(2), 166–173. https://doi.org/10.1016/j.conb.2006.03.003

Duchaine, B. C., & Nakayama, K. (2006b). The Cambridge Face Memory Test: Results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia, 44(4), 576–585. https://doi.org/10.1016/j.neuropsychologia.2005.07.001

Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology, 6(5), 178–190. https://doi.org/10.1002/(sici)1520-6505(1998)6:5<178::aid-evan5>3.0.co;2-8

Dunning, D., & Perretta, S. (2002). Automaticity and eyewitness accuracy: A 10-to-12-second rule for distinguishing accurate from inaccurate positive identifications. Journal of Applied Psychology, 87(5), 951–962. https://doi.org/10.1037/0021-9010.87.5.951

Ebbinghaus, H. (1885). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie. Duncker & Humblot.

Ebbinghaus, H. (1913). Memory: A contribution to experimental psychology (H. A. Ruger & C. E. Bussenius, Trans.). Teachers College, Columbia University.

Edwards, P. N. (1996). The closed world: Computers and the politics of discourse in Cold War America. MIT Press.

Ekman, P. (1992). An argument for basic emotions. Cognition & Emotion, 6(3–4), 169–200. https://doi.org/10.1080/02699939208411068

Ellis, H. D., Shepherd, J. W., & Davies, G. M. (1979). Identification of familiar and unfamiliar faces from internal and external features: Some implications for theories of face recognition. Perception, 8(4), 431–439. https://doi.org/10.1068/p080431

Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251. https://doi.org/10.1037/0033-295X.87.3.215

Fawcett, J. M., Russell, E. J., Peace, K. A., & Christie, J. (2013). Of guns and geese: A meta-analytic review of the weapon focus literature. Psychology, Crime & Law, 19(1), 35–66. https://doi.org/10.1080/1068316X.2011.599325

Festinger, L. (1957). A theory of cognitive dissonance. Stanford University Press.

Fisher, R. P. (1995). Interviewing victims and witnesses of crime. Psychology, Public Policy, and Law, 1(4), 732–764. https://doi.org/10.1037/1076-8971.1.4.732

Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47(6), 381–391. https://doi.org/10.1037/h0055392

Fitzgerald, R. J., Rubínová, E., & Juncu, S. (2021). Eyewitness identification around the world. In A. M. Smith, M. Toglia, & J. M. Lampinen (Eds.), Methods, measures, and theories in eyewitness identification tasks (pp. 294–316). Routledge. https://doi.org/10.4324/9781003138105-16

Fitzgerald, R. J., Tredoux, C. G., & Juncu, S. (2023). Estimation of eyewitness error rates in fair and biased lineups. Law and Human Behavior, 47(4), 463–483. https://doi.org/10.1037/lhb0000538

Flowe, H. D., Carline, A., & Karoğlu, N. (2018). Testing the reflection assumption: A comparison of eyewitness ecology in the laboratory and criminal cases. The International Journal of Evidence & Proof, 22(3), 239–261. https://doi.org/10.1177/1365712718782996

Forensic Science Regulator. (2025). Forensic science regulator’s code of practice and conduct. https://www.gov.uk/government/publications/forensic-science-regulators-code-of-practice-and-conduct

Frowd, C. D., Bruce, V., Gannon, C., Robinson, M., Tredoux, C., Park, J., McIntyre, A., & Hancock, P. J. B. (2007). Evolving the face of a criminal: How to search a face space more effectively. 2007 ECSIS Symposium on Bio-Inspired, Learning, and Intelligent Systems for Security (BLISS 2007), 3–10. https://doi.org/10.1109/BLISS.2007.28

Frowd, C. D., Carson, D., Ness, H., McQuiston-Surrett, D., Richardson, J., Baldwin, H., & Hancock, P. J. B. (2005). Contemporary composite techniques: The impact of a forensically-relevant target delay. Legal and Criminological Psychology, 10(1), 63–81. https://doi.org/10.1348/135532504X15358

Gabbert, F., Hope, L., & Fisher, R. P. (2009). Protecting eyewitness evidence: Examining the efficacy of a self-administered interview tool. Law and Human Behavior, 33(4), 298–307. https://doi.org/10.1007/s10979-008-9146-8

Gabbert, F., Memon, A., & Allan, K. (2003). Memory conformity: Can eyewitnesses influence each other’s memories for an event? Applied Cognitive Psychology, 17(5), 533–543. https://doi.org/10.1002/acp.885

Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. Macmillan.

Galton, F. (1883). Inquiries into human faculty and its development. Macmillan.

Garrett, B. L. (2011). Convicting the innocent: Where criminal prosecutions go wrong. Harvard University Press.

Garrett, B. L. (2020). Convicting the innocent: Where criminal prosecutions go wrong (2nd ed.). Harvard University Press.

Garry, M., Manning, C. G., Loftus, E. F., & Sherman, S. J. (1996). Imagination inflation: Imagining a childhood event inflates confidence that it occurred. Psychonomic Bulletin & Review, 3(2), 208–214. https://doi.org/10.3758/BF03212420

Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3(2), 191–197. https://doi.org/10.1038/72140

Gauthier, I., & Tarr, M. J. (1997). Becoming a “Greeble” expert: Exploring mechanisms for face recognition. Vision Research, 37(12), 1673–1682. https://doi.org/10.1016/S0042-6989(96)00286-6

Gauthier, I., Tarr, M. J., Anderson, A. W., Skudlarski, P., & Gore, J. C. (1999). Activation of the middle fusiform “face area” increases with expertise in recognizing novel objects. Nature Neuroscience, 2(6), 568–573. https://doi.org/10.1038/9224

Gering, M., Johnson, T., & Tredoux, C. (2023). Non-linear effects of stress on eyewitness memory. South African Journal of Science, 119(3/4), 1–8. https://doi.org/10.17159/sajs.2023/12102

Germine, L. T., Duchaine, B., & Nakayama, K. (2011). Where cognitive development and aging meet: Face learning ability peaks after age 30. Cognition, 118(2), 201–210. https://doi.org/10.1016/j.cognition.2010.11.002

Gibson, J. J. (1979). The ecological approach to visual perception. Houghton Mifflin.

Gilligan, C. (1982). In a different voice: Psychological theory and women’s development. Harvard University Press.

Goldstein, A. G., & Chance, J. E. (1980). Memory for faces and schema theory. Journal of Psychology, 105(1), 47–59. https://doi.org/10.1080/00223980.1980.9915131

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://www.deeplearningbook.org/contents/intro.html

Gould, S. J. (1981). The mismeasure of man. W. W. Norton.

Greathouse, S. M., & Kovera, M. B. (2009). Instruction bias and lineup presentation moderate the effects of administrator knowledge on eyewitness identification. Law and Human Behavior, 33(1), 70–82. https://doi.org/10.1007/s10979-008-9136-x

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. Wiley.

Greenspan, R. L., & Bergold, A. N. (2025). Can AI-generated faces serve as fillers in eyewitness lineups? Memory, 33(4), 416–429. https://doi.org/10.1080/09658211.2025.2467134

Greenspan, R. L., & Loftus, E. F. (2020). Eyewitness confidence malleability: Misinformation as post-identification feedback. Law and Human Behavior, 44(3), 194–208. https://doi.org/10.1037/lhb0000369

Griffin, J. W., Azu, M. A., Cramer-Benjamin, S., Franke, C. J., Herman, N., Iqbal, R., Keifer, C. M., Rosenthal, L. H., & McPartland, J. C. (2023). Investigating the face inversion effect in autism across behavioral and neural measures of face processing: A systematic review and Bayesian meta-analysis. JAMA Psychiatry, 80(10), 1026. https://doi.org/10.1001/jamapsychiatry.2023.2105

Grist, C., & Tredoux, C. G. (2013). Manufacturing foils for police lineups with an artificial face synthesizer. Paper presented at the annual meeting of the American Psychology-Law Society. https://doi.org/10.1037/e571212013-366

Gronlund, S. D., Wixted, J. T., & Mickes, L. (2014). Evaluating eyewitness identification procedures using receiver operating characteristic analysis. Current Directions in Psychological Science, 23(1), 3–10. https://doi.org/10.1177/0963721413498891

Gross, S. R., O’Brien, B., Hu, C., & Kennedy, E. H. (2014). Rate of false conviction of criminal defendants who are sentenced to death. Proceedings of the National Academy of Sciences, 111(20), 7230–7235. https://doi.org/10.1073/pnas.1306417111

Grother, P., Ngan, M., & Hanaoka, K. (2019). Face recognition vendor test part 3: Demographic effects (NISTIR 8280). National Institute of Standards and Technology. https://doi.org/10.6028/NIST.IR.8280

Harding, S. (1986). The science question in feminism. Cornell University Press.

Haw, R. M., & Fisher, R. P. (2004). Effects of administrator-witness contact on eyewitness identification accuracy. Journal of Applied Psychology, 89(6), 1106–1112. https://doi.org/10.1037/0021-9010.89.6.1106

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233. https://doi.org/10.1016/S1364-6613(00)01482-0

Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Wiley.

Henderson, J. M., Williams, C. C., & Falk, R. J. (2005). Eye movements are functional during face learning. Memory & Cognition, 33(1), 98–106. https://doi.org/10.3758/BF03195300

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://doi.org/10.1017/S0140525X0999152X

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Horry, R., Hughes, C., Sharma, A., Gabbert, F., & Hope, L. (2021). A meta-analytic review of the Self-Administered Interview©: Quantity and accuracy of details reported on initial and subsequent retrieval attempts. Applied Cognitive Psychology, 35(2), 428–444. https://doi.org/10.1002/acp.3753

Horry, R., Wright, D. B., & Tredoux, C. G. (2010). Recognition and context memory for faces from own and other ethnic groups: A remember–know investigation. Memory & Cognition, 38(2), 134–141. https://doi.org/10.3758/MC.38.2.134

Hugenberg, K., Young, S. G., Bernstein, M. J., & Sacco, D. F. (2010). The categorization-individuation model: An integrative account of the other-race recognition deficit. Psychological Review, 117(4), 1168–1187. https://doi.org/10.1037/a0020463

Hutchins, E. (1995). Cognition in the wild. MIT Press.

Ienca, M., & Andorno, R. (2017). Towards new human rights in the age of neuroscience and neurotechnology. Life Sciences, Society and Policy, 13(1), 5. https://doi.org/10.1186/s40504-017-0050-1

Imai, M. (1986). Kaizen: The key to Japan’s competitive success. McGraw-Hill.

Innocence Project. (2025). DNA exonerations in the United States. https://innocenceproject.org/dna-exonerations-in-the-united-states/

Innocence Project. (2026). Impact. https://innocenceproject.org/exonerations-data/

James, W. (1890). The principles of psychology. Henry Holt.

Jenkins, R., Dowsett, A. J., & Burton, A. M. (2018). How many faces do people know? Proceedings of the Royal Society B, 285(1888), 20181319. https://doi.org/10.1098/rspb.2018.1319

Jenkins, R., White, D., Van Montfort, X., & Burton, A. M. (2011). Variability in photos of the same face. Cognition, 121(3), 313–323. https://doi.org/10.1016/j.cognition.2011.08.001

Johnson, M. H., Dziurawiec, S., Ellis, H. D., & Morton, J. (1991). Newborns’ preferential tracking of face-like stimuli and its subsequent decline. Cognition, 40(1–2), 1–19. https://doi.org/10.1016/0010-0277(91)90045-6

Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88(1), 67–85. https://doi.org/10.1037/0033-295X.88.1.67

Jordan, D. T., Scott, A. J., & Thomson, D. M. (2023). Appearances can be deceiving: How naturalistic changes to target appearance impact on lineup-based decision-making. Psychology, Crime & Law, 31(4), 371–398. https://doi.org/10.1080/1068316x.2023.2243001

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291. https://doi.org/10.2307/1914185

Kalra, N., & Paddock, S. M. (2016). Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Transportation Research Part A: Policy and Practice, 94, 182–193. https://doi.org/10.1016/j.tra.2016.09.010

Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11), 4302–4311. https://doi.org/10.1523/JNEUROSCI.17-11-04302.1997

Kassin, S. M., Ellsworth, P. C., & Smith, V. L. (1989). The “general acceptance” of psychological research on eyewitness testimony: A survey of the experts. American Psychologist, 44(8), 1089–1098. https://doi.org/10.1037/0003-066X.44.8.1089

Kassin, S. M., Tubb, V. A., Hosch, H. M., & Memon, A. (2001). On the “general acceptance” of eyewitness testimony research: A new survey of the experts. American Psychologist, 56(5), 405–416. https://doi.org/10.1037/0003-066X.56.5.405

Kelly, D. J., Quinn, P. C., Slater, A. M., Lee, K., Gibson, A., Smith, M., Ge, L., & Pascalis, O. (2007). The other-race effect develops during infancy: Evidence of perceptual narrowing. Psychological Science, 18(12), 1084–1089. https://doi.org/10.1111/j.1467-9280.2007.02029.x

Kemp, R., Towell, N., & Pike, G. (1997). When seeing should not be believing: Photographs, credit cards and fraud. Applied Cognitive Psychology, 11(3), 211–222. https://doi.org/10.1002/(SICI)1099-0720(199706)11:3<211::AID-ACP430>3.0.CO;2-O

Kempen, K., & Tredoux, C. G. (2012). “Seeing” is believing: The effect of viewing and constructing a composite on identification performance. South African Journal of Psychology, 42(3), 434–445. https://doi.org/10.1177/008124631204200315

Kendrick, K. M., Costa, A. P. da, Leigh, A. E., Hinton, M. R., & Peirce, J. W. (2001). Sheep don’t forget a face. Nature, 414(6860), 165–166. https://doi.org/10.1038/35102669

Kocab, K., & Sporer, S. L. (2016). The weapon focus effect for person identifications and descriptions: A meta-analysis. In M. K. Miller & B. H. Bornstein (Eds.), Advances in psychology and law (Vol. 1, pp. 71–117). Springer. https://doi.org/10.1007/978-3-319-29406-3_3

Köhnken, G., Milne, R., Memon, A., & Bull, R. (1999). The cognitive interview: A meta-analysis. Psychology, Crime & Law, 5(1–2), 3–27. https://doi.org/10.1080/10683169908414991

Kovera, M. B. (2024). The role of suspect development practices in eyewitness identification accuracy and racial disparities in wrongful conviction. Social Issues and Policy Review, 18(1), 125–147. https://doi.org/10.1111/sipr.12102

Kramer, R. S. S., & Cartledge, C. (2024). Crowds improve human detection of AI-synthesised faces. Applied Cognitive Psychology, 38(5). https://doi.org/10.1002/acp.4245

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.

Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.

Lamb, M. E., Sternberg, K. J., Orbach, Y., Hershkowitz, I., Horowitz, D., & Esplin, P. W. (2002). The effects of intensive training and ongoing supervision on the quality of investigative interviews with alleged sex abuse victims. Applied Developmental Science, 6(3), 114–125. https://doi.org/10.1207/S1532480XADS0603_2

Lamont, A. C., Stewart-Williams, S., & Podd, J. (2005). Face recognition and aging: Effects of target age and memory load. Memory & Cognition, 33(6), 1017–1024. https://doi.org/10.3758/bf03193209

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240. https://doi.org/10.1037/0033-295x.104.2.211

Lave, J. (1988). Cognition in practice: Mind, mathematics and culture in everyday life. Cambridge University Press.

Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appropriate reliance. Human Factors, 46(1), 50–80. https://doi.org/10.1518/hfes.46.1.50_30392

Lee, J. K., & Penrod, S. D. (2022). Three-level meta-analysis of the other-race bias in facial identification. Applied Cognitive Psychology, 36(5), 1106–1130. https://doi.org/10.1002/acp.3997

Lee, J., & Penrod, S. D. (2019). New signal detection theory-based framework for eyewitness performance in lineups. Law and Human Behavior, 43(5), 436–454. https://doi.org/10.1037/lhb0000343

Lee, K., Byatt, G., & Rhodes, G. (2000). Caricature effects, distinctiveness, and identification: Testing the face-space framework. Psychological Science, 11(5), 379–385. https://doi.org/10.1111/1467-9280.00274

Leopold, D. A., O’Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4(1), 89–94. https://doi.org/10.1038/82947

Lewandowsky, S., Ecker, U. K. H., Seifert, C. M., Schwarz, N., & Cook, J. (2012). Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3), 106–131. https://doi.org/10.1177/1529100612451018

Lindsay, R. C. L., Mansour, J. K., Beaudry, J. L., Leach, A.-M., & Bertrand, M. I. (2009). Sequential lineup presentation: Patterns and policy. Legal and Criminological Psychology, 14(1), 13–24. https://doi.org/10.1348/135532508X382708

Lindsay, R. C. L., & Wells, G. L. (1985). Improving eyewitness identifications from lineups: Simultaneous versus sequential lineup presentation. Journal of Applied Psychology, 70(3), 556–564. https://doi.org/10.1037/0021-9010.70.3.556

Littlejohn, K. T., Cho, C. J., Liu, J. R., Silva, A. B., Yu, B., Anderson, V. R., Kurtz-Miott, C. M., Brosler, S., Kashyap, A. P., Hallinan, I. P., Shah, A., Tu-Chan, A., Ganguly, K., Moses, D. A., Chang, E. F., & Anumanchipalli, G. K. (2025). A streaming brain-to-voice neuroprosthesis to restore naturalistic communication. Nature Neuroscience, 28(4), 902–912. https://doi.org/10.1038/s41593-025-01905-6

Loftus, E. F. (1979). Eyewitness testimony. Harvard University Press.

Loftus, E. F., Miller, D. G., & Burns, H. J. (1978). Semantic integration of verbal information into a visual memory. Journal of Experimental Psychology: Human Learning and Memory, 4(1), 19–31. https://doi.org/10.1037/0278-7393.4.1.19

Loftus, E. F., & Pickrell, J. E. (1995). The formation of false memories. Psychiatric Annals, 25(12), 720–725. https://doi.org/10.3928/0048-5713-19951201-07

Luria, A. R. (1976). Cognitive development: Its cultural and social foundations. Harvard University Press.

Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search. Quarterly Journal of Experimental Psychology, 1(1), 6–21. https://doi.org/10.1080/17470214808416738

Malpass, R. S. (1981). Effective size and defendant bias in eyewitness identification lineups. Law and Human Behavior, 5(4), 299.

Malpass, R. S., & Devine, P. G. (1981). Eyewitness identification: Lineup instructions and the absence of the offender. Journal of Applied Psychology, 66(4), 482–489. https://doi.org/10.1037/0021-9010.66.4.482

Marr, C., Otgaar, H., Sauerland, M., Quaedflieg, C. W. E. M., & Hope, L. (2021). The effects of stress on eyewitness memory: A survey of memory experts and laypeople. Memory & Cognition, 49(3), 401–421. https://doi.org/10.3758/s13421-020-01115-4

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.

Martschuk, N., & Sporer, S. L. (2018). Face recognition in old age: A meta-analytic review. Psychology and Aging, 33(6), 904–923. https://doi.org/10.1037/pag0000282

McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dartmouth summer research project on artificial intelligence. https://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

McCorduck, P. (2004). Machines who think: A personal inquiry into the history and prospects of artificial intelligence (2nd ed.). A K Peters.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133. https://doi.org/10.1007/BF02478259

McKone, E., Crookes, K., Jeffery, L., & Dilks, D. D. (2012). A critical review of the development of face recognition: Experience is less important than previously believed. Cognitive Neuropsychology, 29(1–2), 174–212. https://doi.org/10.1080/02643294.2012.660138

Megreya, A. M., & Bindemann, M. (2018). Feature instructions improve face-matching accuracy. PLOS ONE, 13(3), e0193455. https://doi.org/10.1371/journal.pone.0193455

Megreya, A. M., & Burton, A. M. (2006). Unfamiliar faces are not faces: Evidence from a matching task. Memory & Cognition, 34(4), 865–876. https://doi.org/10.3758/BF03193433

Meissner, C. A., & Brigham, J. C. (2001). Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review. Psychology, Public Policy, and Law, 7(1), 3–35. https://doi.org/10.1037/1076-8971.7.1.3

Meissner, C. A., Brigham, J. C., & Kelley, C. M. (2001). The influence of retrieval processes in verbal overshadowing. Memory & Cognition, 29(1), 176–186. https://doi.org/10.3758/BF03195751

Meissner, C. A., Sporer, S. L., & Susa, K. J. (2008). A theoretical review and meta-analysis of the description-identification relationship in memory for faces. European Journal of Cognitive Psychology, 20(3), 414–455. https://doi.org/10.1080/09541440701728581

Meissner, C. A., Tredoux, C. G., Parker, J. F., & MacLin, O. H. (2005). Eyewitness decisions in simultaneous and sequential lineups: A dual-process signal detection theory analysis. Memory & Cognition, 33(5), 783–792. https://doi.org/10.3758/BF03193074

Memon, A., & Higham, P. A. (1999). A review of the cognitive interview. Psychology, Crime & Law, 5(1-2), 177–196. https://doi.org/10.1080/10683169908415000

Memon, A., Hope, L., Bartlett, J., & Bull, R. (2002). Eyewitness recognition errors: The effects of mugshot viewing and choosing in young and old adults. Memory & Cognition, 30(8), 1219–1227. https://doi.org/10.3758/BF03213404

Memon, A., Meissner, C. A., & Fraser, J. (2010). The cognitive interview: A meta-analytic review and study space analysis of the past 25 years. Psychology, Public Policy, and Law, 16(4), 340–372. https://doi.org/10.1037/a0020518

Menne, N. M., Winter, K., Bell, R., & Buchner, A. (2023). Measuring lineup fairness from eyewitness identification data using a multinomial processing tree model. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-33101-6

Menon, N., White, D., & Kemp, R. I. (2015). Variation in photos of the same face drives improvements in identity verification. Perception, 44(11), 1332–1341. https://doi.org/10.1177/0301006615599902

Mickes, L., Flowe, H. D., & Wixted, J. T. (2012). Receiver operating characteristic analysis of eyewitness memory: Comparing the diagnostic accuracy of simultaneous versus sequential lineups. Journal of Experimental Psychology: Applied, 18(4), 361–376. https://doi.org/10.1037/a0030609

Miller, E. J., Steward, B. A., Witkower, Z., Sutherland, C. A. M., Krumhuber, E. G., & Dawel, A. (2023). AI hyperrealism: Why AI faces are perceived as more real than human ones. Psychological Science, 34(12), 1390–1403. https://doi.org/10.1177/09567976231207095

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158

Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Penguin Random House. https://www.penguinrandomhouse.co.uk/books/294649/artificial-intelligence-by-mitchell-melanie/9780241404843

Mojtahedi, D., Ioannou, M., & Hammond, L. (2018). The dangers of co-witness familiarity: Investigating the effects of co-witness relationships on blame conformity. Journal of Police and Criminal Psychology, 33(4), 316–326. https://doi.org/10.1007/s11896-018-9254-4

Morris, J. P., Pelphrey, K. A., & McCarthy, G. (2007). Controlled scanpath variation alters fusiform face activation. Social Cognitive and Affective Neuroscience, 2(1), 31–38. https://doi.org/10.1093/scan/nsl023

Münsterberg, H. (1908). On the witness stand: Essays on psychology and crime. Doubleday, Page & Company.

Nadel, L., & Moscovitch, M. (1997). Memory consolidation, retrograde amnesia and the hippocampal complex. Current Opinion in Neurobiology, 7(2), 217–227. https://doi.org/10.1016/S0959-4388(97)80010-4

National Registry of Exonerations. (2022). Race and wrongful convictions in the United States. https://www.law.umich.edu/special/exoneration/Documents/Race_and_Wrongful_Convictions.pdf

National Registry of Exonerations. (2025). Exonerations by year. https://www.law.umich.edu/special/exoneration/

National Research Council. (2009). Strengthening forensic science in the United States: A path forward. National Academies Press. https://doi.org/10.17226/12589

National Research Council. (2014). Identifying the culprit: Assessing eyewitness identification. National Academies Press. https://doi.org/10.17226/18891

Neisser, U. (1967). Cognitive psychology. Appleton-Century-Crofts.

Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. W. H. Freeman.

Neisser, U. (1982). Memory observed: Remembering in natural contexts. W. H. Freeman.

Newell, A., & Simon, H. A. (1956). The logic theory machine: A complex information processing system. IRE Transactions on Information Theory, 2(3), 61–79. https://doi.org/10.1109/TIT.1956.1056797

Newell, A., & Simon, H. A. (1972). Human problem solving. Prentice-Hall.

Nightingale, S. J., & Farid, H. (2022). AI-synthesized faces are indistinguishable from real faces and more trustworthy. Proceedings of the National Academy of Sciences, 119(8), e2120481119. https://doi.org/10.1073/pnas.2120481119

Nilsson, N. J. (2010). The quest for artificial intelligence: A history of ideas and achievements. Cambridge University Press.

Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108(2), 291–310. https://doi.org/10.1037/0033-295X.108.2.291

Nobel Prize Outreach. (2024). Press release: The Nobel Prize in Physics 2024. NobelPrize.org. https://www.nobelprize.org/prizes/physics/2024/press-release/

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

Norman, D. A. (1988). The design of everyday things. Basic Books.

Nortje, A., Tredoux, C. G., & Vredeveldt, A. (2020). Eyewitness identification of multiple perpetrators. South African Journal of Criminal Justice, 33(2), 348–381.

Nyman, T. J., Antfolk, J., Lampinen, J. M., Tuomisto, M., Kaakinen, J. K., Korkman, J., & Santtila, P. (2019). A stab in the dark: The distance threshold of target identification in low light. Cogent Psychology, 6(1), 1632047. https://doi.org/10.1080/23311908.2019.1632047

Nyman, T. J., Korkman, J., Lampinen, J. M., Antfolk, J., & Santtila, P. (2023). The masked villain: The effect of disguise on eyewitness identification accuracy. Psychology, Crime & Law, 31(3), 332–370. https://doi.org/10.1080/1068316X.2023.2242999

O’Toole, A. J., Castillo, C. D., Parde, C. J., Hill, M. Q., & Chellappa, R. (2018). Face space representations in deep convolutional neural networks. Trends in Cognitive Sciences, 22(9), 794–809. https://doi.org/10.1016/j.tics.2018.06.006

O’Toole, A. J., Deffenbacher, K. A., Valentin, D., & Abdi, H. (1994). Structural aspects of face recognition and the other-race effect. Memory & Cognition, 22(2), 208–224. https://doi.org/10.3758/BF03208892

Olaborede, A. O., & Meintjes-van der Walt, L. (2020). The dangers of convictions based on a single piece of forensic evidence. Potchefstroom Electronic Law Journal, 23, 1–38. https://doi.org/10.17159/1727-3781/2020/v23i0a6169

Oorsouw, K. van, Broers, N. J., & Sauerland, M. (2019). Alcohol intoxication impairs eyewitness memory and increases suggestibility: Two field studies. Applied Cognitive Psychology, 33(3), 439–455. https://doi.org/10.1002/acp.3561

Oosterhof, N. N., & Todorov, A. (2008). The functional basis of face evaluation. Proceedings of the National Academy of Sciences, 105(32), 11087–11092. https://doi.org/10.1073/pnas.0805664105

OpenAI. (2018). AI and compute. OpenAI. https://openai.com/index/ai-and-compute/

OpenAI. (2023). GPT-4 technical report. arXiv. https://doi.org/10.48550/arXiv.2303.08774

Orbach, Y., Hershkowitz, I., Lamb, M. E., Esplin, P. W., & Horowitz, D. (2000). Assessing the value of structured protocols for forensic interviews of alleged child abuse victims. Child Abuse & Neglect, 24(6), 733–752. https://doi.org/10.1016/s0145-2134(00)00137-x

Parsons, T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Frontiers in Human Neuroscience, 9, 660. https://doi.org/10.3389/fnhum.2015.00660

Pascalis, O., Haan, M. de, & Nelson, C. A. (2002). Is face processing species-specific during the first year of life? Science, 296(5571), 1321–1323. https://doi.org/10.1126/science.1070223

Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex (G. V. Anrep, Trans.). Oxford University Press.

Pennekamp, P. (2025). Verbal and numeric eyewitness confidence differentially affect decision-making. Applied Cognitive Psychology, 39(1). https://doi.org/10.1002/acp.70030

Phillips, P. J., Yates, A. N., Hu, Y., Hahn, C. A., Noyes, E., Jackson, K., Cavazos, J. G., Jeckeln, G., Ranjan, R., Sankaranarayanan, S., Chen, J.-C., Castillo, C. D., Chellappa, R., White, D., & O’Toole, A. J. (2018). Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. Proceedings of the National Academy of Sciences, 115(24), 6171–6176. https://doi.org/10.1073/pnas.1721355115

Pickel, K. L. (1999). The influence of context on the “weapon focus” effect. Law and Human Behavior, 23(3), 299–311. https://doi.org/10.1023/a:1022356431375

Pickel, K. L., & Sneyd, D. E. (2018). The weapon focus effect is weaker with Black versus White male perpetrators. Memory, 26(1), 29–41. https://doi.org/10.1080/09658211.2017.1317814

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1-2), 73–193. https://doi.org/10.1016/0010-0277(88)90032-7

Pozzulo, J. D., & Lindsay, R. C. L. (1998). Identification accuracy of children versus adults: A meta-analysis. Law and Human Behavior, 22(5), 549–570. https://doi.org/10.1023/A:1025739514042

Pozzulo, J. D., & Lindsay, R. C. L. (1999). Elimination lineups: An improved identification procedure for child eyewitnesses. Journal of Applied Psychology, 84(2), 167–176. https://doi.org/10.1037/0021-9010.84.2.167

Racine, E., Bar-Ilan, O., & Illes, J. (2005). fMRI in the public eye. Nature Reviews Neuroscience, 6(2), 159–164. https://doi.org/10.1038/nrn1609

Read, J. D., Tollestrup, P., Hammersley, R., McFadzen, E., & Christensen, A. (1990). The unconscious transference effect: Are innocent bystanders ever misidentified? Applied Cognitive Psychology, 4(1), 3–31. https://doi.org/10.1002/acp.2350040103

Reason, J. (1990). Human error. Cambridge University Press.

Rhodes, G., Brennan, S., & Carey, S. (1987). Identification and ratings of caricatures: Implications for mental representations of faces. Cognitive Psychology, 19(4), 473–497. https://doi.org/10.1016/0010-0285(87)90016-8

Rhodes, G., & Jeffery, L. (2006). Adaptive norm-based coding of facial identity. Vision Research, 46(18), 2977–2987. https://doi.org/10.1016/j.visres.2006.03.002

Rhodes, G., Locke, V., Ewing, L., & Evangelista, E. (2009). Race coding and the other-race effect in face recognition. Perception, 38(2), 232–241. https://doi.org/10.1068/p6110

Rhodes, G., & Tremewan, T. (1994). Understanding face recognition: Caricature effects, inversion, and the homogeneity problem. Visual Cognition, 1(2-3), 275–311. https://doi.org/10.1080/13506289408402303

Risinger, D. M. (2007). Innocents convicted: An empirically justified factual wrongful conviction rate. Journal of Criminal Law and Criminology, 97(3), 761–806.

Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002

Robertson, D. J., Noyes, E., Dowsett, A. J., Jenkins, R., & Burton, A. M. (2016). Face recognition by metropolitan police super-recognisers. PLOS ONE, 11(2), e0150036. https://doi.org/10.1371/journal.pone.0150036

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. https://doi.org/10.1037/h0042519

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. https://doi.org/10.1038/323533a0

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tense of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, volume 2: Psychological and biological models (pp. 216–271). MIT Press.

Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition, volume 1: Foundations. MIT Press. https://doi.org/10.7551/mitpress/5236.001.0001

Russell, R., Duchaine, B., & Nakayama, K. (2009). Super-recognizers: People with extraordinary face recognition ability. Psychonomic Bulletin & Review, 16(2), 252–257. https://doi.org/10.3758/PBR.16.2.252

Rust, A., & Tredoux, C. (1998). Identification parades: An empirical survey of legal recommendations and police practice in South Africa. South African Journal of Criminal Justice, 11, 196–218.

Sacks, O. (2010). The mind’s eye. Alfred A. Knopf.

Sauerland, M., & Sporer, S. L. (2009). Fast and confident: Postdicting eyewitness identification accuracy in a field study. Journal of Experimental Psychology: Applied, 15(1), 46–62. https://doi.org/10.1037/a0014560

Schooler, J. W., & Engstler-Schooler, T. Y. (1990). Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology, 22(1), 36–71. https://doi.org/10.1016/0010-0285(90)90003-M

Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 815–823. https://doi.org/10.1109/CVPR.2015.7298682

Schweinberger, S. R., Pickering, E. C., Jentzsch, I., Burton, A. M., & Kaufmann, J. M. (2002). Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Cognitive Brain Research, 14(3), 398–409. https://doi.org/10.1016/S0926-6410(02)00142-8

Science Museum Group. (n.d.). Charles Babbage’s difference engines and the Science Museum. https://www.sciencemuseum.org.uk/objects-and-stories/charles-babbages-difference-engines-and-science-museum

Seale-Carlisle, T. M., Quigley-McBride, A., Teitcher, J. E. F., Crozier, W. E., Dodson, C. S., & Garrett, B. L. (2024). New insights on expert opinion about eyewitness memory research. Perspectives on Psychological Science, 20(5), 903–924. https://doi.org/10.1177/17456916241234837

Simons, D. J., Boot, W. R., Charness, N., Gathercole, S. E., Chabris, C. F., Hambrick, D. Z., & Stine-Morrow, E. A. L. (2016). Do “brain-training” programs work? Psychological Science in the Public Interest, 17(3), 103–186. https://doi.org/10.1177/1529100616661983

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. D. Appleton-Century Company.

Skinner, B. F. (1971). Beyond freedom and dignity. Alfred A. Knopf.

Smalarz, L., Ireri, H., & Fink, J. A. (2021). Presumed-blind lineup administrators can influence eyewitnesses’ identification decisions and confidence. Psychology, Public Policy, and Law, 27(4), 466–478. https://doi.org/10.1037/law0000317

Sparrow, B., Liu, J., & Wegner, D. M. (2011). Google effects on memory: Cognitive consequences of having information at our fingertips. Science, 333(6043), 776–778. https://doi.org/10.1126/science.1207745

Sporer, S. L. (1992). Post-dicting eyewitness accuracy: Confidence, decision-times and person descriptions of choosers and non-choosers. European Journal of Social Psychology, 22(2), 157–180. https://doi.org/10.1002/ejsp.2420220205

Sporer, S. L. (2001). Recognizing faces of other ethnic groups: An integration of theories. Psychology, Public Policy, and Law, 7(1), 36–97. https://doi.org/10.1037/1076-8971.7.1.36

Sporer, S. L., Kaminski, K. S., Davids, M. C., & McQuiston, D. (2016). The verbal facilitation effect: Re-reading person descriptions as a system variable to improve identification performance. Memory, 24(10), 1329–1344. https://doi.org/10.1080/09658211.2015.1106561

Sporer, S. L., Penrod, S. D., Read, J. D., & Cutler, B. L. (1995). Choosing, confidence, and accuracy: A meta-analysis of the confidence–accuracy relation in eyewitness identification studies. Psychological Bulletin, 118(3), 315–327. https://doi.org/10.1037/0033-2909.118.3.315

Sporer, S. L., Tredoux, C. G., Vredeveldt, A., Kempen, K., & Nortje, A. (2020). Does exposure to facial composites damage eyewitness memory? A comprehensive review. Applied Cognitive Psychology, 34(5), 1166–1179. https://doi.org/10.1002/acp.3705

Steblay, N. K., Dysart, J. E., & Wells, G. L. (2011). Seventy-two tests of the sequential lineup superiority effect: A meta-analysis and policy discussion. Psychology, Public Policy, and Law, 17(1), 99–139. https://doi.org/10.1037/a0021650

Steblay, N. M. (1997). Social influence in eyewitness recall: A meta-analytic review of lineup instruction effects. Law and Human Behavior, 21(3), 283–297. https://doi.org/10.1023/A:1024890732059

Steblay, N. M., Dysart, J., Fulero, S., & Lindsay, R. C. L. (2003). Eyewitness accuracy rates in police showup and lineup presentations: A meta-analytic comparison. Law and Human Behavior, 27(5), 523–540. https://doi.org/10.1023/A:1025438223608

Suchman, L. A. (1987). Plans and situated actions: The problem of human-machine communication. Cambridge University Press.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.

Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition. The Quarterly Journal of Experimental Psychology Section A, 46(2), 225–245. https://doi.org/10.1080/14640749308401045

Terman, L. M. (1916). The measurement of intelligence: An explanation of and a complete guide for the use of the Stanford revision and extension of the Binet-Simon intelligence scale. Houghton Mifflin.

Thompson, P. (1980). Margaret Thatcher: A new illusion. Perception, 9(4), 483–484. https://doi.org/10.1068/p090483

Todorov, A., Mandisodza, A. N., Goren, A., & Hall, C. C. (2005). Inferences of competence from faces predict election outcomes. Science, 308(5728), 1623–1626. https://doi.org/10.1126/science.1110589

Todorov, A., Said, C. P., Engell, A. D., & Oosterhof, N. N. (2008). Understanding evaluation of faces on social dimensions. Trends in Cognitive Sciences, 12(12), 455–460. https://doi.org/10.1016/j.tics.2008.10.001

Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189–208. https://doi.org/10.1037/h0061626

Topp-Manriquez, L. D., McQuiston, D., & Malpass, R. S. (2016). Facial composites and the misinformation effect: How composites distort memory. Legal and Criminological Psychology, 21(2), 372–389. https://doi.org/10.1111/lcrp.12054

Tredoux, C. (2002). A direct measure of facial similarity and its relation to human similarity perceptions. Journal of Experimental Psychology: Applied, 8(3), 180–193. https://doi.org/10.1037/1076-898X.8.3.180

Tredoux, C. G. (1998). Applied psychology: Application of psychological knowledge or nominalist error? In J. Mouton & J. Muller (Eds.), Knowledge, method and the public good. HSRC Press.

Tredoux, C. G., & Chiroro, P. (2005). Eyewitness testimony. In C. G. Tredoux, D. Foster, A. Allan, A. Cohen, & D. Wassenaar (Eds.), Psychology and law (pp. 193–225). Juta.

Tredoux, C. G., Fitzgerald, R. J., Allan, A., & Nortje, A. (2024). Identification parades in South Africa: Time for a change? South African Law Journal, 141(1), 84–111. https://doi.org/10.47348/SALJ/v141/i1a5

Tredoux, C. G., Frowd, C., Vredeveldt, A., & Scott, K. (2023). Construction of facial composites from eyewitness memory. In L. Shapiro & P. M. Rea (Eds.), Biomedical visualisation: Volume 13 – the art, philosophy and science of observation and imaging (Vol. 1392, pp. 149–190). Springer. https://doi.org/10.1007/978-3-031-13021-2_8

Tredoux, C. G., Meissner, C. A., Malpass, R. S., & Zimmerman, L. A. (2004). Eyewitness identification. In C. D. Spielberger (Ed.), Encyclopedia of applied psychology (Vol. 1, pp. 875–887). Elsevier Academic Press. https://doi.org/10.1016/B0-12-657410-3/00971-5

Tredoux, C. G., Nunez, D. T., Oxtoby, O., & Prag, B. (2006). An evaluation of ID: An eigenface-based construction system. South African Computer Journal, 37, 90–97.

Tredoux, C. G., & Py, J. (2020). Evidence of identification from eyewitnesses. In R. Bull & I. Blandón-Gitlin (Eds.), The Routledge international handbook of legal and investigative psychology (pp. 268–286). Routledge.

Tredoux, C. G., Sporer, S. L., Vredeveldt, A., Kempen, K., & Nortje, A. (2021). Does constructing a facial composite affect eyewitness memory? A research synthesis and meta-analysis. Journal of Experimental Criminology, 17(4), 713–741. https://doi.org/10.1007/s11292-020-09432-z

Tredoux, C. G., Thomas, K. G. F., Malcolm-Smith, S., Schrieff-Brown, L., Njomboro, P., Lipinska, G., & Christ, B. (2023). Applied cognitive science in South Africa. Journal of Applied Research in Memory and Cognition, 12(4), 497–501. https://doi.org/10.1037/mac0000131

Tsao, D. Y., Freiwald, W. A., Tootell, R. B. H., & Livingstone, M. S. (2006). A cortical region consisting entirely of face-selective cells. Science, 311(5761), 670–674. https://doi.org/10.1126/science.1119983

Tuhiwai Smith, L. (2012). Decolonizing methodologies: Research and indigenous peoples (2nd ed.). Zed Books.

Tupper, N., Geisendörfer, A. K., Lorei, C., Sporer, S. L., Tredoux, C. G., & Sauerland, M. (2023). Police trainees versus laypeople: Identification performance and confidence–accuracy relationship for facial and body lineups. Applied Cognitive Psychology, 37(4), 845–860. https://doi.org/10.1002/acp.4085

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460. https://doi.org/10.1093/mind/LIX.236.433

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86. https://doi.org/10.1162/jocn.1991.3.1.71

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124

Valentine, T. (1991). A unified account of the effects of distinctiveness, inversion, and race in face recognition. The Quarterly Journal of Experimental Psychology Section A, 43(2), 161–204. https://doi.org/10.1080/14640749108400966

Valentine, T., Harris, N., Colom Piera, A., & Darling, S. (2003). Are police video identifications fair to African-Caribbean suspects? Applied Cognitive Psychology, 17(4), 459–476. https://doi.org/10.1002/acp.880

VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221. https://doi.org/10.1080/00461520.2011.611369

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv. https://doi.org/10.48550/arXiv.1706.03762

Vredeveldt, A., Charman, S. D., Blanken, A. den, & Hooydonk, M. (2018). Effects of cannabis on eyewitness memory: A field study. Applied Cognitive Psychology, 32(4), 420–428. https://doi.org/10.1002/acp.3414

Vredeveldt, A., Groen, R. N., Ampt, J. E., & Koppen, P. J. van. (2017). When discussion between eyewitnesses helps memory. Legal and Criminological Psychology, 22(2), 242–259. https://doi.org/10.1111/lcrp.12097

Vredeveldt, A., & Koppen, P. J. van. (2018). Recounting a common experience: On the effectiveness of instructing eyewitness pairs. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.00284

Vredeveldt, A., Tredoux, C. G., Nortje, A., Kempen, K., Puljević, C., & Labuschagne, G. N. (2015). A field evaluation of the Eye-Closure Interview with witnesses of serious crimes. Law and Human Behavior, 39(2), 189–197. https://doi.org/10.1037/lhb0000113

Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press.

Wade, K. A., Garry, M., Read, J. D., & Lindsay, D. S. (2002). A picture is worth a thousand lies: Using false photographs to create false childhood memories. Psychonomic Bulletin & Review, 9(3), 597–603. https://doi.org/10.3758/BF03196318

Wagenaar, W. A., & Schrier, J. H. van der. (1996). Face recognition as a function of distance and illumination: A practical tool for use in the courtroom. Psychology, Crime & Law, 2(4), 321–332. https://doi.org/10.1080/10683169608409787

Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20(2), 158–177. https://doi.org/10.1037/h0074428

Weber, N., Brewer, N., Wells, G. L., Semmler, C., & Keast, A. (2004). Eyewitness identification accuracy and response latency: The unruly 10–12-second rule. Journal of Experimental Psychology: Applied, 10(3), 139–147. https://doi.org/10.1037/1076-898X.10.3.139

Webster, M. A., & MacLin, O. H. (1999). Figural aftereffects in the perception of faces. Psychonomic Bulletin & Review, 6(4), 647–653. https://doi.org/10.3758/BF03212974

Wells, G. L. (1978). Applied eyewitness-testimony research: System variables and estimator variables. Journal of Personality and Social Psychology, 36(12), 1546–1557. https://doi.org/10.1037/0022-3514.36.12.1546

Wells, G. L., & Bradfield, A. L. (1998). “Good, you identified the suspect”: Feedback to eyewitnesses distorts their reports of the witnessing experience. Journal of Applied Psychology, 83(3), 360–376. https://doi.org/10.1037/0021-9010.83.3.360

Wells, G. L., & Bradfield, A. L. (1999). Distortions in eyewitnesses’ recollections: Can the postidentification-feedback effect be moderated? Psychological Science, 10(2), 138–144. https://doi.org/10.1111/1467-9280.00121

Wells, G. L., Kovera, M. B., Douglass, A. B., Brewer, N., Meissner, C. A., & Wixted, J. T. (2020). Policy and procedure recommendations for the collection and preservation of eyewitness identification evidence. Law and Human Behavior, 44(1), 3–36. https://doi.org/10.1037/lhb0000359

Wells, G. L., Memon, A., & Penrod, S. D. (2006). Eyewitness evidence: Improving its probative value. Psychological Science in the Public Interest, 7(2), 45–75. https://doi.org/10.1111/j.1529-1006.2006.00027.x

White, D., Burton, A. M., Jenkins, R., & Kemp, R. I. (2014). Redesigning photo-ID to improve unfamiliar face matching performance. Journal of Experimental Psychology: Applied, 20(2), 166–173. https://doi.org/10.1037/xap0000009

White, D., Kemp, R. I., Jenkins, R., & Burton, A. M. (2014). Feedback training for facial image comparison. Psychonomic Bulletin & Review, 21(1), 100–106. https://doi.org/10.3758/s13423-013-0475-3

White, D., Kemp, R. I., Jenkins, R., Matheson, M., & Burton, A. M. (2014). Passport officers’ errors in face matching. PLOS ONE, 9(8), e103510. https://doi.org/10.1371/journal.pone.0103510

Willis, J., & Todorov, A. (2006). First impressions: Making up your mind after a 100-ms exposure to a face. Psychological Science, 17(7), 592–598. https://doi.org/10.1111/j.1467-9280.2006.01750.x

Wilson, B. M., Donnelly, K., Christenfeld, N., & Wixted, J. T. (2019). Making sense of sequential lineups: An experimental and theoretical analysis of position effects. Journal of Memory and Language, 104, 108–125. https://doi.org/10.1016/j.jml.2018.10.002

Wittwer, T., Tredoux, C. G., Py, J., & Paubel, P.-V. (2019). Training participants to focus on critical facial features does not decrease own-group bias. Frontiers in Psychology, 10, 2081. https://doi.org/10.3389/fpsyg.2019.02081

Wixted, J. T., Mickes, L., Dunn, J. C., Clark, S. E., & Wells, W. (2016). Estimating the reliability of eyewitness identifications from police lineups. Proceedings of the National Academy of Sciences, 113(2), 304–309. https://doi.org/10.1073/pnas.1516814112

Wixted, J. T., & Wells, G. L. (2017). The relationship between eyewitness confidence and identification accuracy: A new synthesis. Psychological Science in the Public Interest, 18(1), 10–65. https://doi.org/10.1177/1529100616686966

Wolpaw, J. R., Birbaumer, N., McFarland, D. J., Pfurtscheller, G., & Vaughan, T. M. (2002). Brain-computer interfaces for communication and control. Clinical Neurophysiology, 113(6), 767–791. https://doi.org/10.1016/S1388-2457(02)00057-3

Wright, D. B., Boyd, C. E., & Tredoux, C. G. (2003). Inter-racial contact and the own-race bias for face recognition in South Africa and England. Applied Cognitive Psychology, 17(3), 365–373. https://doi.org/10.1002/acp.898

Yerkes, R. M. (Ed.). (1921). Psychological examining in the United States Army. Government Printing Office.

Yerkes, R. M., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology, 18, 459–482. https://doi.org/10.1002/cne.920180503

Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141–145. https://doi.org/10.1037/h0027474

Young, A. W., Hellawell, D., & Hay, D. C. (1987). Configurational information in face perception. Perception, 16(6), 747–759. https://doi.org/10.1068/p160747

Zeigarnik, B. (1927). Das Behalten erledigter und unerledigter Handlungen. Psychologische Forschung, 9, 1–85.

Zuboff, S. (2019). The age of surveillance capitalism: The fight for a human future at the new frontier of power. PublicAffairs.