Research describing the cellular coding of faces in non-human primates often provides the underlying physiological framework for our understanding of face processing in humans. Models of face perception, explanations of perceptual after-effects from viewing particular types of faces, and interpretation of human neuroimaging data rely on monkey neurophysiological data and the assumption that neurophysiological responses of humans are comparable to those recorded in the non-human primate. Here, we review studies that describe cells that preferentially respond to faces, and assess the link between the physiological characteristics of single cells and social perception. Principally, we describe cells recorded from the non-human primate, although a limited number of cells have been recorded in humans, and are included in order to appraise the validity of non-human physiological data for our understanding of human face and social perception.
The behaviour of single neurons is often incorporated into models of social perception and used to explain neuroimaging and psychological data. Many models of perceptual processing of social information in humans feature components that mimic the physiological properties of cells recorded in monkeys (e.g. [1–7]). Interpretation of functional magnetic resonance imaging (fMRI) data needs an understanding of how the measured blood oxygen level-dependent (BOLD) response relates to activity at the single-neuron level [8–10]. Psychophysical adaptation studies investigating social perception in humans make the explicit assumption that there are individual neurons in the brain that are selectively suppressed by prolonged exposure to perceptual stimuli [11–14]. The physiological characteristics of single cells are used to constrain the interpretation of these psychological data. Therefore, precisely defining the characteristics of sensory neurons is important for both the physiology and psychology of social cognition.
Perhaps the most important source of social information is the face. Observing another individual's face can provide information about the individual's identity, gender, age, health, emotions and intentions. This information can be used to form strategies of how to behave, and interact with that individual. In the early 1970s, single neurons were described that selectively responded to faces rather than other tested stimuli . While the characterization of such cells was limited to the brief statement ‘three units, complex coloured patterns (e.g. photographs of faces, trees) were more effective than standard stimuli, but the crucial features of these stimuli were never determined’ [15, p. 103], nonetheless debate started as to whether neurons, such as these, could form the basis of a brain system underlying face processing. Correspondingly, the behaviour of these neurons received considerable interest and studies followed detailing the properties of single cells coding faces in several distinct cortical regions. It was not until 2006 that a causal link between cell responses to faces and face perception was established. Monkeys were first trained to categorize different ambiguous images into face or non-face categories . Cells were recorded from the inferior temporal (IT) cortex during passive fixation, to establish selectivity for faces compared with other non-face stimuli. Face-selective cells were found to be grouped into clusters (as has been noted in a variety of reports: [17–21]), and direct micro-stimulation of these clusters during a subsequent face categorization task biased the monkeys' decisions towards the face category.
Here, we consider the link between single cells coding faces and social perception by assessing early and more recent findings documenting the physiological properties of cells responding selectively to faces. Principally, this information comes from single unit recordings in the non-human primate brain (and typically macaque monkeys: genus Macaca). There are, however, some studies investigating cell coding of faces in humans. Where possible we compare between non-human and human recordings in order to appraise the validity of non-human physiological data for our understanding of human face perception.
2. Face-sensitive cell general response properties
(a) Face selectivity
During typical physiological experiments, it is impossible to be entirely sure that one has found a ‘face-selective cell’. To establish this, it would be necessary to measure and compare the response of the cell to all visual stimuli, a clearly unfeasible approach. Despite these obvious practical difficulties, it has become well understood that there are individual neurons both in the non-human primate and human that respond to faces significantly more than other stimuli (figure 1).
The degree of response selectivity for faces (or ‘tuning’) has been investigated using a range of different sets of stimuli. The trade-off is between using a small stimulus set to isolate cells sensitive to faces quickly in order to investigate other properties of the cells, while knowing very little about the tuning of the cell, or indeed if the cell responds preferentially to faces when compared with other likely visual stimuli. Indeed, some cells will respond to faces, but also to other complex stimuli [22,23]. Alternatively, the time available to record from the cell can be spent determining the degree of selectivity for a face by comparing responses to faces against responses to a large range of non-face stimuli, at the cost of establishing other cell characteristics. By using a rapid serial visual presentation (RSVP) of large numbers (greater than 1000) of stimuli, Földiák et al.  attempted to establish an objective description of the functional characteristics of cells sensitive to faces in a viable timescale. From the sample of cells recorded in superior temporal sulcus (STS), a distribution was found between cells that responded to the images containing faces and cells that responded to the images without faces, confirming the existence of a population of cells in the cortex that respond ‘selectively’ for faces (e.g. figure 2; see also ).
The response of a face-selective cell represents one point within a multi-dimensional stimulus space (although see ). Rather than measuring the narrowness/broadness of cell tuning within that entire space, researchers typically assess cell tuning along a restricted dimension, for example, view of the face (e.g. ), by testing whether responses to the preferred stimulus differ significantly from responses to non-preferred stimuli and from spontaneous activity using parametric statistics (analysis of variance (ANOVA)). Alternative approaches might use stimulus optimization procedures similar to those used successfully in identifying cell sensitivity in early visual cortex (e.g. ); unfortunately, the visual elements contributing to responses to faces are more complex than pixels [30,31]. To date, such approaches for identifying tuning functions of face-sensitive cells in high-dimensional stimulus space have been limited by computational challenges . Nonetheless, the design of efficient algorithms that optimally adapt the experimental design during the collection of neurophysiological data will help maximize information about the overall stimulus tuning function of face-selective cells .
Functional brain imaging indicates that there are six patches of cells along the monkey STS (three in the right hemisphere, three in the left hemisphere) that respond to faces. These patches have distinct connections with other brain areas (e.g. the parietal cortex, ). Single-cell recording confirms that the patches are active when faces are viewed: 99 per cent of the cells within the middle patches respond more to images of faces than to images of other objects .
In humans, functional brain imaging also indicates six face processing patches in posterior cortical areas (three per hemisphere); these occur in the fusiform gyrus, which lies on the ventral surface of the temporal lobe, in an occipital region , and in a region within the STS (e.g. ). Of these, the right fusiform area is the most consistently located and reliably activated by faces, and activity in the STS is most closely related to gaze signals. High-resolution mapping indicates that the size of the fusiform patch in humans  is roughly equivalent to that of the patches described in the monkey STS [17,18]. The homology between the human and monkey face-responsive patches is not yet clear, but the similarities are striking.
(b) Latency and response profile
The time from the onset of a stimulus to the moment that a cell's firing rate rises significantly above its background rate is called the cell response latency. The magnitude of a significant response, however, can vary from a few spikes per second to over 50 Hz. Cell response latencies can be calculated separately for each stimulus to which a cell responds, although the latency of the response to the most effective stimulus is usually reported. For each stimulus, a spike density function (SDF) is first calculated by summing across trials and smoothing using a Gaussian function. Response latencies can be measured as the first 1 ms time bin where the SDF exceeds 3 s.d. above the background firing rate (calculated within a time-window prior to stimulus onset, e.g. 100 ms) for a significant period (e.g. 15 ms) after stimulus onset [35–37]. There is considerable variation in the latency of face-sensitive cell responses. First, different cells recorded in one cortical area can have different response latencies to the same stimulus. For example, large latency ranges have been reported for STS cell responses to the same face stimuli: 80–160 ms  and 56–171 ms .
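The latency criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited study: the function names, the Gaussian width (`sigma_ms=10`), the zero-padded smoothing and the synthetic spike trains are all assumptions made for the example. Only the 1 ms bins, the 3 s.d. threshold, the 100 ms baseline window and the 15 ms hold period come from the text.

```python
import math
from statistics import mean, pstdev

def spike_density(trials, t_start, t_stop, sigma_ms=10.0):
    """Sum spikes across trials in 1 ms bins, then smooth with a Gaussian.

    trials: list of spike-time lists (ms, relative to stimulus onset at 0).
    The kernel width sigma_ms is an assumed value, not taken from the text.
    """
    n_bins = t_stop - t_start
    counts = [0.0] * n_bins
    for spikes in trials:
        for t in spikes:
            b = int(math.floor(t)) - t_start
            if 0 <= b < n_bins:
                counts[b] += 1.0
    half = int(3 * sigma_ms)  # truncate the kernel at 3 sigma
    kernel = [math.exp(-0.5 * (k / sigma_ms) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    sdf = []
    for i in range(n_bins):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n_bins:  # bins outside the window count as zero
                acc += w * counts[idx]
        sdf.append(acc / norm)
    return sdf

def response_latency(sdf, t_start, baseline_ms=100, n_sd=3.0, hold_ms=15):
    """First post-onset 1 ms bin where the SDF stays above
    baseline mean + n_sd * s.d. for hold_ms consecutive bins.
    Returns the latency in ms, or None if the criterion is never met."""
    onset = -t_start  # index of the bin at stimulus onset
    if onset < baseline_ms:
        raise ValueError("recording window starts too late for the baseline")
    baseline = sdf[onset - baseline_ms:onset]
    threshold = mean(baseline) + n_sd * pstdev(baseline)
    run = 0
    for i in range(onset, len(sdf)):
        run = run + 1 if sdf[i] > threshold else 0
        if run == hold_ms:
            return i - hold_ms + 1 - onset
    return None

# Synthetic data (hypothetical): 20 trials, sparse background firing
# throughout, plus a dense response starting 100 ms after onset.
trials = []
for k in range(20):
    background = [-200 + (k * 7) % 50 + 50 * m for m in range(10)]
    response = [100 + k % 5 + 5 * m for m in range(40)]
    trials.append(background + response)

sdf = spike_density(trials, t_start=-200, t_stop=300)
latency = response_latency(sdf, t_start=-200)
print("estimated latency (ms):", latency)
```

Because the Gaussian kernel is symmetric, the smoothed SDF begins to rise slightly before the first response spike, so the estimate depends on the kernel width as well as on the threshold and hold period.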
Second, a single cell's response latency can vary depending upon the nature of the stimulus. Kiani et al.  measured the response latencies of face-sensitive cells in IT to a range of different human, monkey and animal faces. Responses to human faces were significantly earlier (mean latency 103 ms) than responses to animal faces (mean latency 118 ms); response latencies to monkey faces were not significantly different from those to human faces, but significantly earlier than responses to non-primate animal faces.
Image contrast has additional and independent effects on latency with cells responding 220 ms later to faint low-contrast images than to high-contrast versions of the same image . Remarkably, how much a given cell responds and how sluggishly it responds are largely independent: responses to low-contrast stimuli are invariably delayed but not necessarily decreased in magnitude. Many image transformations, such as change in the perspective view of the face, produce no change in latency [35,41] although they diminish cell responses.
Finally, response latencies depend upon where the cell is recorded (see figure 3). Cells sensitive to faces can be recorded from several different cortical areas. It is generally agreed that the initial selectivity for faces occurs in the temporal lobe via a hierarchical process where each stage feeds forward to the next. The first evidence of face selectivity occurs in IT cells. IT (or anterior-ventral TE; TEav) cells, in general, respond with a latency range of 50–250 ms (mean ± s.d. 135.2 ± 92.1 ms, median 106 ms) . This estimate includes the subset of cells in IT cortex that respond to faces. In other areas, the latency range for cells that respond selectively to faces does not differ from the latency range of cells selective for other stimulus classes .
The upper bank, lower bank and fundus of the STS receive input from IT and response latencies of cells responsive to faces range from 56 to 171 ms . STS connections with IT cortex are bidirectional and it is difficult to be certain of the sequence of processing between the two areas, particularly as both the STS and IT cortex are composed of a variety of functionally and anatomically distinct areas along the anterior–posterior extent [43–45].
Both IT and STS provide feed-forward input to amygdala. In a comparison between cell response latencies to the same stimuli, STS cells showed earlier responses (90–140 ms) than amygdala cells (110–200 ms, ). Pre-frontal cortex also contains visually responsive cells that are selective for face stimuli; cells here can respond as early as 70 ms, mean latency is 138 ms, although cells have been found with latencies as late as 360 ms .
A few studies have measured the response latencies of face-sensitive cells recorded in humans. Face-selective cells were recorded from the medial temporal lobe (MTL: comprising the hippocampus, amygdala, entorhinal cortex and parahippocampal gyrus) and found to respond approximately 250–350 ms after stimulus onset . Much larger variances in response latencies have been reported for visually responsive cells in the amygdala (95–385 ms), entorhinal cortex (90–328 ms) and hippocampus (107–371 ms); indeed some MTL cells showed latencies as early as 52 ms .
3. Cell-receptive field size
For a long time, it has been assumed that face-sensitive neurons in temporal cortex have very large receptive fields. The earliest findings reported large (greater than 50°) receptive fields that always covered the fovea, which could be contralateral, or ipsilateral to the recorded hemisphere . Furthermore, many of these cells in the STS had receptive fields that extended both ipsilaterally and contralaterally , and were thought to be the first cells in the visual-processing hierarchy that could represent objects in either visual field. In temporal cortex cells selective for faces, the large receptive fields and broad tuning profiles (across low-level image statistics, e.g. size), were considered commensurate with their position at the latter end of the ventral visual-processing stream.
Along with large receptive fields, face-sensitive cells can show a relative tolerance to the position of the face within the receptive field and the size of the face itself, although typically, there is a foveal ‘hot spot’ where faces elicit maximal cell responses . Cell responses decline slowly as faces are presented progressively further away from this central region into the periphery. Tovee et al.  studied the sensitivity of a range of IT and STS cells, which were selective for different face identities, to changes in the size of the face and the position within the receptive field. Receptive fields extended at least 5° into the ipsilateral field, and the greatest cell response was observed when the face was presented at the fovea. These cells could tolerate quite large shifts in the position of the face without a significant decrease in the cell response. Responses to images of ‘large’ faces subtending an angle of 17° showed no significant diminution in response even when fixation was beyond the edge of the face itself. The cell firing rate, which provided information about the stimulus, predominantly coded facial identity rather than face position. Relative position invariance, as well as slow decline in cell responses towards the periphery, seems to underlie and support findings of position invariance for face adaptation in human studies [54,55].
Since the early descriptions of the receptive field profiles of face-sensitive cells, there have been an increasing number of reports of much smaller receptive fields (discussed in Afraz & Cavanagh ). Furthermore, the concept of the face-sensitive cell receptive field as a static filtering device for faces might be questioned. Rolls & Tovee  describe STS cell receptive field position sensitivity changing depending upon the presence of other non-face stimuli in the visual scene. STS cell responses to images of faces away from the fixation point were markedly reduced when a non-effective stimulus was presented at the fovea. Thus, the typical translation invariance observed in these cells  is reduced in the presence of other stimuli. Shrinking of the effective receptive field size and weighting of the response to the stimulus present at the fovea allow these cells to effectively represent the face that is being fixated, rather than responding when the face occurs anywhere in the receptive field.
We have found, testing at multiple locations in the visual field, that STS cells selective for faces (and those cells selective for other social stimuli such as hand actions) can have restricted and eccentrically located fields (D.-K. Xiao, N. E. Barraclough & D. I. Perrett 2004, unpublished data). A response field would include the fovea, but the maximally sensitive receptive field position could lie away from the fovea by 3–5°. Considering cells collectively, the fovea would be the most effective single location for responses, but individual cells would have receptive fields centred away from the fovea. This finding is particularly relevant for understanding how face-sensitive cells operate in naturalistic environments. Faces are very rarely experienced in isolation, being only one part of our rich and cluttered social scene. If cells responded to faces almost anywhere within a large scene (showing complete translational invariance across central vision), then a large number of face-sensitive cells could be simultaneously active. This large population could not be used to determine where a face lies exactly in relation to the fixation point. With a population of cells that are selective for both pattern (be it a face or a hand) and position, then it is possible to define from the population the presence of different objects and their locations. Indeed, conjoint tuning for object and location (within moderately large receptive fields 5–10° across) makes it possible to derive the relation of objects to one another, for example, how a hand or face is interacting with another object .
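The argument that conjoint tuning for pattern and position lets a population signal both what is present and where can be illustrated with a toy model. The sketch below is hypothetical throughout: the one-dimensional visual field, the Gaussian receptive field profile, its width (`rf_sigma=3.0`, giving a field roughly 7° across at half height) and the response-weighted average used for read-out are assumptions for illustration, not a description of recorded cells.

```python
import math

def cell_response(cell, stimulus, rf_sigma):
    """Response (arbitrary units) of a cell tuned to pattern and position.
    The cell is silent for the wrong pattern; for the right pattern the
    response falls off as a Gaussian of distance from the field centre."""
    if cell["pattern"] != stimulus["pattern"]:
        return 0.0
    d = cell["centre"] - stimulus["position"]
    return math.exp(-0.5 * (d / rf_sigma) ** 2)

def decode_position(cells, stimuli, pattern, rf_sigma=3.0):
    """Estimate where `pattern` lies as the response-weighted mean of the
    receptive field centres of cells selective for that pattern."""
    weighted = [(sum(cell_response(c, s, rf_sigma) for s in stimuli), c["centre"])
                for c in cells if c["pattern"] == pattern]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return None
    return sum(w * x for w, x in weighted) / total

# Hypothetical population: face and hand cells with field centres tiled
# every degree from -15 to +15 along one axis (0 = fovea).
cells = [{"pattern": p, "centre": x}
         for p in ("face", "hand") for x in range(-15, 16)]

# A cluttered scene: a face 4 degrees left and a hand 3 degrees right of fixation.
scene = [{"pattern": "face", "position": -4.0},
         {"pattern": "hand", "position": 3.0}]

print("face decoded at", decode_position(cells, scene, "face"))
print("hand decoded at", decode_position(cells, scene, "hand"))

# Fully translation-invariant cells (infinitely wide fields) all respond
# equally, so the same read-out carries no information about position.
print("invariant cells:", decode_position(cells, scene, "face", rf_sigma=math.inf))
```

With position-tuned fields the read-out recovers the two locations separately, whereas with fully invariant fields it returns the centre of the array wherever the face appears.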
4. Effects of adaptation on face-sensitive cells
In the past decade, there has been considerable use of adaptation, employed both during psychophysical experiments and neuroimaging experiments, as a technique to investigate the brain mechanisms underlying face processing [55,59–63]. In its most basic form, adaptation results from prolonged exposure to a stimulus that causes a selective suppression of the neurons that code that particular stimulus, sparing neurons that code different stimuli. This short period of selective suppression can result in a period of imbalance in activity across the perceptual system causing ‘after-effects’ in which perception is biased.
During the 1970s and 1980s, there was a proliferation of experiments demonstrating adaptation of both neural responses in monkey neurons, and monkey and human perception, after exposure to a range of simple stimuli (e.g. colour, oriented lines and moving dots; for a recent review see ). These experiments suggested that it was possible to investigate the responses of individual neurons coding particular stimuli in humans using psychophysical adaptation techniques. Since that time, it has become clear that there is not a simple link between exposure to a stimulus and the resultant response decrease in a select population of neurons. Adaptation can have differing effects on neurons recorded in different cortical areas [65,66], and more than one mechanism of adaptation might be acting after exposure to a stimulus (e.g. ), resulting in different post-adaptation perceptual effects.
Despite increasing evidence that adaptation has a complex effect on neural mechanisms underlying perception, and the reliance of studies investigating face processing on adaptive techniques, there are few reports of how adaptation changes neuronal coding of faces. Much of the early evidence of adaptation to repeated presentations of complex stimuli comes from studies recording from the IT cortex. Cells in IT respond to a range of different complex stimuli (shapes, pictures or faces) and after an initial presentation of the novel stimulus, a subsequent presentation of that same stimulus can result in a smaller cell response [26,68–71]. In a study of adaptation in IT cells (referred to in the paper as a habituation-like response), responsiveness of cells immediately declined following the initial presentation of the stimulus . Cell sensitivity slowly increased with time following this initial presentation; the effect of adaptation was evident 12 s after stimulus exposure, but not after 20 s. These results indicated that the effect of adaptation on IT cells lasted at least 12 s, with normal sensitivity resuming within 20 s.
The above studies did not explicitly describe faces as part of their stimulus sets. Rolls et al. , however, did test the sensitivity of cells in IT to repeated presentations of face stimuli. Many cells selective for faces showed larger responses to the initial presentation of a novel face stimulus than to a subsequent presentation; a few other face-sensitive cells were observed that showed greater responses to the subsequent presentation of the face . In an investigation of IT cell responses to familiarity, Li et al.  included faces in their stimulus set, although they did not distinguish face-sensitive cells from cells preferentially sensitive to other complex stimuli. Again, in this study, responses to repeated presentation of the stimulus declined to a level about 40 per cent of the response to the initial presentation of that stimulus. This decline in sensitivity was observed even when up to 150 intervening stimuli were presented between the first and second presentation of the stimulus . In contrast to Rolls et al. , no cells showed a larger response on subsequent presentations of the same stimulus, although this may be owing to an influence of the delayed match to sample task in which the monkeys were engaged during Li et al.'s  experiment. Li et al.'s study indicates that cellular adaptation resulting from the presentation of a face can last a significant period of time, considerably longer than seen during psychophysical adaptation experiments .
Psychophysical adaptation experiments and functional magnetic resonance-adaptation (fMR-A; [75,76]) studies of face processing make assumptions about the nature of face-sensitive neuronal adaptation. Often the goal of these studies in the human is to determine the stimulus sensitivity of the mechanisms underlying face processing. The explicit assumption here is that the magnitude of a cell's response to different stimuli is directly and proportionally related to the degree of adaptation of the cell's response with increased exposure to that stimulus. By studying the degree of adaptation (in terms of perceptual after-effects or decreases in the BOLD responses), it would therefore be possible to investigate cellular sensitivity indirectly.
Concern for the relative lack of evidence on the relationship between cell tuning to complex stimuli and the stimulus selectivity of adaptation led Sawamura et al. to investigate this directly . As in many of the earlier studies of responses to repeated stimulation, IT cells were presented with a range of complex images (objects and animals) without specifically studying face-selective cells. Neurons were observed with distinct tuning profiles, where a discrete selection of images would generate responses. The range of stimuli that generated response suppression, however, was more restricted, illustrating tighter tuning profiles. For these IT cells, the response tuning profile and the adaptation tuning profile therefore did not coincide: adaptation tuning was the narrower of the two. So, if one were to measure adaptation tuning and then use this information to infer the response selectivity of the neurons, one would underestimate the breadth of the response tuning profile. Psychophysical adaptation and fMR-A  experiments typically provide data that show cross adaptation between some stimuli, but not others. For example, adaptation to stimulus A may cause an after-effect in, or suppression of the response to, not only repeated instances of stimulus A but also stimulus B. Adaptation to stimulus A, however, may cause very little or no after-effect in, or suppression of the response to, stimulus C. A conclusion might be drawn that the adapted neurons respond to stimuli A and B, and not to stimulus C. Sawamura et al.'s data indicate that caution should be taken in drawing such a conclusion, as the adaptive technique may be underestimating the true sensitivity of the adapted neurons to stimulus C.
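The risk of inferring response tuning from adaptation tuning can be made concrete with a toy calculation. The sketch below is purely illustrative and is not a model fitted to Sawamura et al.'s data: the one-dimensional stimulus axis, the Gaussian tuning shapes, the two widths (4 and 2 arbitrary units) and the half-maximum criterion for counting 'effective' stimuli are all assumptions.

```python
import math

def gaussian(x, centre, sigma):
    """Unnormalised Gaussian tuning curve with a peak value of 1."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def breadth(profile):
    """Number of stimuli whose effect is at least half of the peak effect."""
    peak = max(profile)
    return sum(1 for v in profile if v >= 0.5 * peak)

# Hypothetical stimulus set: 21 stimuli spaced evenly along one dimension,
# with the neuron's preferred stimulus at 0.
stimuli = list(range(-10, 11))

# Assumed tuning: the neuron responds over a broad range of stimuli...
response_tuning = [gaussian(s, 0, 4.0) for s in stimuli]
# ...but adapting to the preferred stimulus suppresses responses to a
# narrower range of test stimuli.
adaptation_tuning = [gaussian(s, 0, 2.0) for s in stimuli]

response_breadth = breadth(response_tuning)
adaptation_breadth = breadth(adaptation_tuning)
print("stimuli driving the cell:", response_breadth)
print("stimuli showing suppression:", adaptation_breadth)
```

Under these assumed widths, an experimenter who counted only the stimuli showing suppression would conclude that the cell is tuned to fewer stimuli than actually drive it, which is the direction of error that the text warns about for stimulus C.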
Is this a problem for studies of face processing using adaptation? First, despite the above evidence, it is still not clear how face-sensitive cells adapt to repeated presentation of face stimuli. Different cortical areas contain neurons that adapt differently ; areas that contain significant proportions of neurons that respond selectively to faces  may show different effects of adaptation. Second, Sawamura et al.'s results  pertain to the precise relationship between response and adaptation tuning to stimuli that lie close to each other in stimulus space. Investigators should bear in mind that psychological and fMR-A experiments involving faces and other complex objects may be underestimating the breadth of tuning of neurons responsible.
(a) Short-term adaptation
The effects of stimulus repetition may well involve several different mechanisms operating over different timescales. Repetition of a similar image after a short time interval (less than 1 s) produces a marked reduction of the response to the second presentation. This short-term response suppression is found for all STS cells responsive to faces and other objects . It is likely to be the counterpart to the forward masking that occurs in human perception when the presentation of one image disturbs the perception or detection of similar images presented after a brief interval .
The disruptive effect that one stimulus has on following stimuli is somewhat paradoxical given that experience of visual sequences allows anticipation and faster reactions to upcoming stimuli . To understand this paradox, a comparison was made of cell responses to body images presented individually, in pairs and in action sequences. Responses to one image did suppress responses to similar images for about 500 ms. This suppression led to responses peaking 100 ms earlier to image sequences than to isolated images (e.g. during head rotation, face-selective activity peaks before the face confronts the observer). Thus, forward masking has an unrecognized benefit for perception because it can transform neuronal activity to make it predictive during natural changes in view.
The anticipatory responses that occur in natural sequences of images parallel the speeding up of reaction times to images that are about to occur. Observers detect a face view faster in a film of a head turning to confront a camera than when it is presented in isolation or in a sequence in which the images comprising the film have been randomly reordered . The advantage in reaction times to views anticipated in natural sequences also has a hidden cost: when the natural sequence is violated and the anticipated view does not occur, there is a bias to report (erroneously) its presence. Hence, interactions between successive stimuli allow anticipation which has benefits in speeding up reactions to stimuli that are predictable given the continuation of natural and familiar sequences, but the brain's anticipation can mean observers ‘jump the gun’ and respond to events and stimuli before they actually occur.
(b) Long-term adaptation
As noted above there may be several different mechanisms, operating over different time courses, that contribute to adaptation at the neural and perceptual levels. Most perceptual experiments on adaptation and after-effects document rather short duration effects lasting over seconds or minutes . One type of face (e.g. expanded) is shown repeatedly and subsequently a negative after-effect is induced, whereby normal-shaped faces now appear contracted (e.g. ). Such effects can be contingent on facial characteristics (e.g. age, sex, race, expression, species, view, orientation) and it is possible to adapt different classes of faces in opposite ways such that simultaneous after-effects are induced in opposite directions, for example, after looking at male and female faces transformed in opposite directions (e.g. ). These effects are often interpreted as evidence for separate populations of cells encoding different categories of faces (e.g. [61,81,82]). It is possible that category contingent perceptual after-effects with faces could have long duration, much like contingent after-effects reported for simple visual parameters . Long duration after-effects may be missed because participants return to normal face experience following experiments.
There are reports that perceptual after-effects from viewing faces can be detected after much longer periods of more than 24 h [63,84,85]. These after-effects must depend on neuronal effects of repetition lasting days, and such effects have been described. For example, in the reticular nucleus of the thalamus and in the mamillary bodies , cell responses change when a stimulus becomes familiar after being viewed for a brief period of 1–5 s. The next time the same stimulus is seen, the responses are augmented or depressed depending on the particular cell . Several of the cells studied showed effects of stimulus familiarity lasting days, and estimates of the duration of the ‘memory’ of individual cells indicate that it can span more than 700 intervening stimuli between the first and subsequent presentations . The thalamic cells exhibit familiarity effects with all types of visual stimuli and show limited generalization over image transformations (such as rotation in the image plane). The thalamic cells could be pooling long-term perceptual learning and familiarity effects generated in the visually responsive cells within temporal cortex (longer lasting than those demonstrated so far, ); or they may be an independent memory effect generated in the entorhinal–hippocampal system and passed to the mamillary bodies and thalamus via the fornix. Exactly how these neural mechanisms for long-term ‘familiarity’ interact with sensory encoding of faces and other objects is unclear, but they could well be involved in the phenomenology of many perceptual after-effects where unusual faces (or other objects) come to look ‘normal’ or ‘familiar’ after prolonged viewing.
5. Coding of identity
Accurate recognition of other individuals is critical to successful social functioning. There are clear advantages to being able to dissociate familiar individuals, with whom one may have had a previous positive or negative interaction, from other individuals on whom no information is available. Face structure varies widely, and this information can be used to identify specific individuals. Several reports have described that a significant proportion of cells selective for faces can also distinguish between different facial identities [18,23,28,88–92]. These studies have received considerable interest, as the cells described could be part of a system underlying the recognition of individuals.
In a facial recognition system, the ability to distinguish faces under varying conditions is important. Neurons selective for faces are relatively insensitive to changes in low-level stimulus attributes, like contrast, size and colour [38,93], which may occur under different viewing circumstances. Some neurons in IT and the STS will respond selectively to the faces of specific monkeys, irrespective of other facial characteristics  and irrespective of the viewing conditions with changes in lighting, face orientation, size and distance and in some cases change in perspective view [18,23,88–90]. Monkeys make different facial expressions under different behavioural circumstances, and these expressions change the facial shape considerably. Despite these changes in facial shape, the coding of facial identity can be preserved . Hasselmo et al.  recorded from face-selective neurons while systematically varying facial identity and facial expression. Thirty-three per cent of the recorded neurons showed significant differences in their responses to different monkeys, independent of the expression of the face itself. Very few of these neurons in IT showed an interaction between identity and expression, a very different coding from that observed in amygdala cells (see §7).
The selective coding of individual faces by single neurons provides a basis from which a system of facial recognition can be built. This system is unlikely to consist of a single neuron coding a specific individual (so-called ‘grandmother’ cell coding), as this brings considerable practical and computational problems (for discussion, see [28,89,95–97]). A more likely coding system for facial identity is one involving populations of neurons that are, to a lesser or greater degree, selective for individual faces. Young & Yamane  recorded from a large population of anterior IT cells and STS cells while the monkeys performed a facial discrimination task. By applying multi-dimensional scaling (MDS) to the two populations of cells, they quantified the respective population responses, and compared them with measurements of the physical characteristics of the faces and the relative familiarity of the faces. The IT population response showed a significant relationship with the physical characteristics of the face, and the STS population response showed a significant relationship with the familiarity of the face. Young & Yamane  also suggest that a population of cells coding facial characteristics does not have to be particularly large (‘sparse coding’); only a few tens of cells were necessary to form a precise code sufficient to identify individual face images from their collection of faces. From the available evidence, it seems that recognizing any familiar face across a variety of viewing situations requires a population of cells, but the population does not need to be vast since individual cells respond reliably.
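The logic of relating a population response to stimulus properties can be sketched in a few lines. The example below is not the analysis used in the study described above: it assumes a small, entirely hypothetical set of firing rates and face measurements, and it stops at the step that precedes MDS, namely computing pairwise distances between faces in ‘neural space’ and asking how well they track pairwise distances in ‘physical space’. MDS would go on to embed such distances in a low-dimensional map. All names and numbers are illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(vectors):
    """Distances for every unordered pair of rows, in a fixed pair order."""
    return [euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: 4 faces x 3 cells (mean firing rates, spikes/s).
responses = [
    [10.0, 2.0, 5.0],
    [12.0, 3.0, 5.0],
    [30.0, 20.0, 6.0],
    [33.0, 22.0, 7.0],
]

# Hypothetical physical measurements of the same 4 faces
# (e.g. two inter-feature distances, arbitrary units).
physical = [
    [1.0, 1.0],
    [1.1, 1.0],
    [2.0, 1.8],
    [2.1, 1.9],
]

neural_d = pairwise_distances(responses)
physical_d = pairwise_distances(physical)

# A high correlation means faces that are physically similar also
# evoke similar patterns of activity across the population.
r = pearson(neural_d, physical_d)
print(f"correlation between neural and physical distances: {r:.2f}")
```

The same comparison could be run against a familiarity measure instead of physical measurements, which is the contrast between the IT and STS populations drawn in the text.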
Investigations of MTL cell coding of faces in humans also indicate that identity may be coded by a sparse population of neurons, despite the apparent precise tuning of a few recorded cells. Some cells in MTL show tuning for a single individual, while generalizing across incidental visual properties of the stimulus, and responding to very different pictures of that specific individual . Quiroga et al. , however, argue that the probability of finding neurons that respond only to the individuals to which they are apparently tuned is too great, and indeed point out that many cells have specific tuning to more than one individual. In a probabilistic analysis of the data available from several recordings of selective cells in the human MTL, Waydo et al.  argue that each recorded neuron is likely to respond to 50–150 different images. This analysis indicates a sparse code in which faces and objects are represented by a small subset of neurons rather than either a large population of cells or by individual cells. The selectivity of MTL cells and sparseness of representation are likely to be underestimated because interviews with patients prior to testing allow the experimenters to focus testing on faces and places that are well known and of particular interest to the patients; hence the probability of finding specific responses is greatly magnified when compared with a random unguided search.
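The link between a sparseness estimate and the number of images driving each neuron is simple arithmetic, and can be sketched as follows. The values used here are assumptions made purely for illustration and are not given in the text above: a sparseness (the fraction of items to which a neuron responds) of 0.5 per cent, and a repertoire of 10 000 to 30 000 recognizable items. They were chosen only because they yield figures in the 50–150 range quoted above.

```python
def items_per_neuron(sparseness, n_items):
    """Expected number of items that drive a single neuron.

    sparseness: assumed fraction of items a neuron responds to (0..1).
    n_items:    assumed size of the repertoire of recognizable items.
    """
    return sparseness * n_items


# Illustrative assumptions only (see lead-in).
ASSUMED_SPARSENESS = 0.005
for repertoire in (10_000, 30_000):
    n = items_per_neuron(ASSUMED_SPARSENESS, repertoire)
    print(f"repertoire of {repertoire} items: about {round(n)} items per neuron")
```

On this view a neuron is neither a ‘grandmother’ cell (one item per neuron) nor part of a fully distributed code (most items drive most neurons), but responds to a small fraction of a large repertoire.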
In monkey STS and IT, different parts of the cellular response can carry different information about the face . The early part of the response (peak transmission 117 ms following stimulus onset) appears to distinguish between global categories of stimuli, for example, whether they are monkey faces, human faces or objects. The later part of the cellular response (peak transmission 165 ms following stimulus onset) is sensitive to much finer analysis of the face and can distinguish between different facial expressions and identities within the category . Sugase et al.  argue that the initial response of these face-sensitive neurons is likely to reflect a feed-forward process distinguishing between broad categories of faces and may provide an initial orienting of the face-processing network. The later response could be guided by feedback from the amygdala, which contains face-sensitive neurons that distinguish between monkey face identity and expression.
Recently, psychophysical experiments in humans have indicated that faces with ‘average’ identities play a particularly important role as a reference to which other face identities are coded [55,100]. The importance of the average identity face is also apparent in monkey IT neurons coding faces . Leopold et al. [55,74,101] created a range of different face stimuli by taking four individuals' faces and morphing between them to generate a multi-dimensional ‘face-space’. At the centre of this face-space is a face with the average of the four identities: the ‘average face’. Cells with tuning centred on this average face were particularly well represented; cells with tuning centred on an identity at the periphery of the face-space were less common. For cells most responsive to identity, responses increased as the face configuration was moved away from the average along the identity axis.
6. Opponent and population coding
A recent study of cells responsive to faces in the middle temporal patch confirms that cells are tuned to several facial features and to their configuration (as had been indicated in early recordings, e.g. ). The systematic exploration of responses to 16 real and cartoon faces and to their component features revealed that cell tuning to values of individual face features (e.g. eye separation) was ramp shaped (with maximal response to one end of the feature continuum) . Thus, cells were tuned either to an aspect ratio making the face slender or to one making it broad; to long or to short hair; to thick or to thin hair; to widely or to closely separated eyes; to large or to small eyes; to eyes fixating centrally or eccentrically; to high or low positioning of the features within the face outline; and to slanting or to non-slanting eyebrows.
Psychophysics of perceptual adaptation to faces is often explained in terms of underlying coding of a face-space centred on average facial values. Within this space, it is argued that perception of particular facial values relies on opponent coding [62,103,104]. This mode of coding is akin to that assumed to underlie perception of motion, where movement to the left is contrasted with motion to the right, and motion up is contrasted with motion down. Likewise, opponent processing is assumed to underlie colour perception, with red being contrasted with green, and blue contrasted with yellow . Prolonged viewing of downward motion produces a familiar motion after-effect of upward motion (the waterfall illusion), and prolonged viewing of red gives a green after-effect. Adaptation to faces distorted in one way, e.g. features lowered, makes the features of a normal face appear distorted in the opposite way, e.g. raised. The cell tuning for individual facial parameters described by Freiwald et al.  fits the predictions for opponent processing of facial features with linear coding .
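How ramp-shaped tuning and opponent read-out could together produce such an after-effect can be illustrated with a toy model. The sketch below is not taken from any of the studies cited: it assumes two hypothetical pools of cells with opposite linear ramp tuning to a single facial feature (scaled from -1, e.g. features lowered, to +1, e.g. features raised), a perceived value read out as the difference between the two pools, and adaptation modelled as a gain reduction proportional to each pool's response to the adaptor. All names and parameter values are illustrative.

```python
def ramp(x, slope):
    """Linear ramp tuning over feature value x in [-1, 1].

    Returns a response between 0 and 1. slope=+1 gives a pool that
    prefers high values, slope=-1 a pool that prefers low values.
    """
    return min(1.0, max(0.0, 0.5 + 0.5 * slope * x))


def decode(x, gain_low=1.0, gain_high=1.0):
    """Perceived feature value: response of the 'high' pool minus
    the response of the 'low' pool, each scaled by its current gain."""
    high = gain_high * ramp(x, +1.0)
    low = gain_low * ramp(x, -1.0)
    return high - low


def adapt(adaptor, strength=0.5):
    """Return (gain_low, gain_high) after prolonged viewing of `adaptor`.

    Each pool loses gain in proportion to how strongly the adaptor
    drives it (an assumed, deliberately simple adaptation rule).
    """
    gain_high = 1.0 - strength * ramp(adaptor, +1.0)
    gain_low = 1.0 - strength * ramp(adaptor, -1.0)
    return gain_low, gain_high


# Before adaptation the read-out is veridical: an average face (0.0)
# is perceived as average.
baseline = decode(0.0)

# Adapt to a face with lowered features (-0.8), then view the average face.
gain_low, gain_high = adapt(-0.8)
after_effect = decode(0.0, gain_low, gain_high)

print(f"average face before adaptation: {baseline:+.2f}")
print(f"average face after adapting to lowered features: {after_effect:+.2f}")
```

Because the pool preferring low values is driven hardest by the adaptor, it loses most gain, and the balance for a subsequently viewed average face tips towards the opposite pole. This is the direction of the after-effect described in the text, although real adaptation need not follow this simple gain rule.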
Not all dimensions of facial coding are binary or opponent. For example, with respect to head view, there are cells tuned to many different views in the horizontal plane, although there is a preponderance of coding of the face, profile and back of the head views [28,90]. There are further cells tuned to the vertical posture of the head, coding head raised towards the sky or head lowered towards the ground. Perceptual adaptation to different gaze directions (to the left, right and directly at the observer) provides evidence that human perception of gaze direction reflects coding by at least three populations rather than a binary opponent-processing mechanism contrasting gaze left and right .
At the cellular level, there is evidence of a hierarchy in the encoding of gaze, head and body direction. Many cells are sensitive to two parameters and some are sensitive to all three parameters. Where conjoint sensitivity is found, the cells give most weight to gaze direction, followed by the head direction [23,90]. Given the interactions evident at the cellular level (particularly those between head and gaze), one would expect psychophysics to provide evidence that gaze, head and body direction interact and perhaps cross-adapt.
It is not clear with ‘norm’ or average-based face coding how many norms there are, or how perceptual judgements are referenced to the appropriate face category average value. Perceptual adaptation effects are indicated for face species, view, race and for age [63,106–108]. For each of these dimensions, it is possible to think that several categories of cells exist depending on an observer's experience. Take age for example. We distinguish readily between infants, toddlers, children, teens, young adults and old adults. It is insufficient to conceive of a space of faces centred on a single average of all faces experienced. Instead, it is more likely that experience of particular face categories builds up cells tuned to the dimensions of this category (e.g. ). It should be possible to define adaptations dependent on such perceptual expertise.
7. Coding emotion and communicative gestures
One's face can convey information about internal states, whether this is involuntary or under conscious control. Involuntary facial movements of an agent can indicate that the agent is in pain, or convey the emotions of the agent, like anger or surprise. Conscious control over the movement of the face allows voluntary communication with other individuals. A sophisticated facial recognition system should be able to distinguish between different facial configurations, and use this information to read another individual's behaviour and intentions.
There is considerable evidence that monkeys, like humans, are able to use the facial gestures of conspecifics (e.g. [109,110]). Monkey facial expressions and facial communicative gestures, however, are considerably different from human facial gestures. Properties of neurons recorded in the monkey that code facial gestures and expressions need to be interpreted in the light of what is known about monkey behaviour rather than human behaviour. There is not an immediate and obvious translation from the monkey to the human model in this case, although in a more closely related non-human primate, the chimpanzee, it has been suggested that emotional expressions homologous to those observed in humans are found .
Monkeys have unique and clear facial expressions that convey positive and negative emotions. They also use oro-facial gestures during social interactions with other monkeys potentially subserving a communicative role (figure 4). Cells responding selectively to static images, movies or real-life monkey facial gestures can be found in many cortical regions, including the upper bank, lower bank and fundus of the STS [18,94], amygdala [46,112] and area F5 of pre-motor cortex .
STS cells will distinguish between images of faces showing neutral expressions, open-mouthed mild and strong threat gestures, and lip-smacking (an affiliative gesture [94,99]). The cells also distinguish between mouth opening made during emotional expressions and chewing or yawning [26,114]. Cells distinguishing between similar categories of expression are also found within several nuclei within the amygdala .
Coding of facial expressions appears different within different areas. Within areas IT and STS, neurons separately code either facial expression or facial identity; Hasselmo et al.  found that only 7 per cent of neurons expressed an interaction between expression and identity. Furthermore, selectivity for facial expressions in IT and STS neurons is always expressed as an increase in the firing rate of the neuron to a particular expression . The amygdala shows a different coding pattern. Here, cells are often more selective, coding for both facial expression and facial identity [46,112]. Gothard et al.  found 64 per cent of amygdala cells showed coding of unique combinations of identity and expression, underlining the role of amygdala cells in the interpretation of complex social interactions. In contrast to neurons recorded in more posterior areas of the temporal lobe, amygdala neuronal selectivity is not always exemplified as an increase in firing rate; many cells will also show a selective decrease in firing rate to a particular expression .
Information about the proportion of cells coding a particular expression in one area will of course be subject to the constraints of the search set and methodology used. In the amygdala, there does not appear to be a particular predominance of cells coding one expression over another, but Gothard et al.  found that there are some differences in the nature of the response. Cells selective for threat expressions typically increased their response compared with responses to other non-threat expressions. Cells selective for lip-smacking typically decreased their response compared with responses to other facial gestures. Expression-selective neurons that responded preferentially to threatening faces also tended to have higher firing rates than neurons selective for other expressions. This difference in population response was found only during the period 120–250 ms following stimulus onset.
In contrast to temporal cortex, neurons in pre-motor cortex code facial expressions in a different fashion. Ferrari et al.  analysed the visual responses of motor neurons responding to movements of the face. These neurons express ‘mirror’ properties, responding to both execution and observation of actions, and show many similar properties to the hand action ‘mirror neurons’ often cited [115,116]. Ferrari et al.'s neurons demonstrated motor properties, responding during the execution of mouth movements alone (e.g. pursing of the lips) or during both mouth and hand actions (e.g. bringing food to the mouth to eat). The neurons often responded selectively to the sight of faces: most (85 per cent) responded to the sight of ingestive mouth actions, but 15 per cent responded to the sight of communicative facial gestures (e.g. lip-smacking). The range of different facial stimuli to which these neurons responded was quite large, including grasping with the mouth, chewing, sucking, lip-smacking, lip protrusion, tongue protrusion and teeth-chattering. All the neurons responded during the execution of ingestive acts. The clear visual selectivity indicates a potentially significant role in the understanding of other agents. The mismatch between the visual selectivity and motor selectivity in those neurons that respond to communicative facial gestures leads to some problems in the interpretation of their role. Ferrari et al.  argue that this link between communicative gestures and ingestion in single neurons may result from similar motor acts that occur during positive social interactions like grooming and eating.
8. Face-sensitive cell response modulation by other modalities
Facial movements are often accompanied by sounds. For example, an individual expressing surprise will not only move their face in a distinctive fashion, but this may also be accompanied by an audible intake of air. Furthermore, speech occurs with specific facial movements that, when observed, help disambiguate the auditory signal . The common pairing of specific facial movements with particular sounds provides ‘matched’ audiovisual information that is available to the brain in order to decode a particular facial gesture.
There is evidence that neurons in the STS are sensitive to audiovisual oro-facial stimuli and respond in a way that allows them to use both auditory and visual signals to distinguish specific facial communicative gestures from other possibilities . Barraclough et al.  recorded from single neurons in the upper bank, lower bank and fundus of the STS while searching for neurons that responded to different audiovisual facial gestures performed by monkeys and humans (as well as hand and body actions). We found that 23 per cent of cells recorded had their visual response significantly modulated by the concurrent presentation of the ‘matching’ sound: in half of those cells, the response increased with the addition of the sound; in the other half, responses decreased with the sound. In those cells whose visual responses increased with the addition of an auditory signal, the increase was dependent upon the sound matching the visual stimulus. Non-matching sounds had little effect on the visual response (figure 5 illustrates the responses of such a cell). Neurons with such specific audiovisual characteristics could form the basis of a multimodal representation of social stimuli  and underlie the ability to recognize facial gestures including communication signals (cf. ).
In contrast, for cells that showed a decrease in the visual responses with the addition of an auditory signal, this was not dependent upon the vision and sound matching; in fact, any auditory signal appeared to reduce the visual response. What role these neurons have in multimodal perception is not clear but they appear prevalent in other areas of cortex that show multimodal integration. For example, neurons that respond selectively to monkey vocalizations in lateral belt auditory cortex can show response augmentation or attenuation with the addition of visual signals (e.g. ).
9. Concluding remarks
In this paper, we have reviewed cell properties in relation to face perception. There has been a rapid development of adaptation experiments on face perception and of functional brain imaging studies using adaptation effects. The interplay between these experiments and the physiological studies of cells responsive to faces has been increasingly convergent, although there are limitations to the correspondences. One domain that will need to be clarified concerns the time-courses of adaptation and the duration of after-effects. It is most likely that there are multiple cellular mechanisms at play. It will be important to differentiate separate adaptation effects that are dependent on different levels of visual processing and to differentiate these from adaptation effects that are dependent on brain systems involved in familiarity and memory. Yet other effects of long-term experience (e.g. [63,107]) may produce shifts in criteria, so that particular nuances in expression, age or mix of ethnicity come to be regarded as typical. It is not unreasonable to speculate that long-term experience of mood disorders (whether in those affected themselves or in their close family) may change the level of facial expressions that come to be regarded as normal, which could have consequences for social interactions that are clinically significant.
That cells responsive to faces are sensitive to multiple modalities implies that there should be psychological interactions that depend on the congruency of visual and auditory cues. While some such interactions have been demonstrated (e.g. ), future psychophysical, perceptual and fMRI adaptation studies could well tap into these strong cross-modal links evident at the cellular level. High-level cross-adaptations between faces and other conceptually linked visual stimuli should be expected (e.g. between hand and face, between face and body, and between face and gaze).
One contribution of 10 to a Theme Issue ‘Face perception: social, neuropsychological and comparative perspectives’.
This journal is © 2011 The Royal Society