The fusiform face area: a cortical region specialized for the perception of faces

Nancy Kanwisher, Galit Yovel

Abstract

Faces are among the most important visual stimuli we perceive, informing us not only about a person's identity, but also about their mood, sex, age and direction of gaze. The ability to extract this information within a fraction of a second of viewing a face is important for normal social interactions and has probably played a critical role in the survival of our primate ancestors. Considerable evidence from behavioural, neuropsychological and neurophysiological investigations supports the hypothesis that humans have specialized cognitive and neural mechanisms dedicated to the perception of faces (the face-specificity hypothesis). Here, we review the literature on a region of the human brain that appears to play a key role in face perception, known as the fusiform face area (FFA).

Section 1 outlines the theoretical background for much of this work. The face-specificity hypothesis falls squarely on one side of a longstanding debate in the fields of cognitive science and cognitive neuroscience concerning the extent to which the mind/brain is composed of: (i) special-purpose (‘domain-specific’) mechanisms, each dedicated to processing a specific kind of information (e.g. faces, according to the face-specificity hypothesis), versus (ii) general-purpose (‘domain-general’) mechanisms, each capable of operating on any kind of information. Face perception has long served both as one of the prime candidates of a domain-specific process and as a key target for attack by proponents of domain-general theories of brain and mind. Section 2 briefly reviews the prior literature on face perception from behaviour and neurophysiology. This work supports the face-specificity hypothesis and argues against its domain-general alternatives (the individuation hypothesis, the expertise hypothesis and others).

Section 3 outlines the more recent evidence on this debate from brain imaging, focusing particularly on the FFA. We review the evidence that the FFA is selectively engaged in face perception, by addressing (and rebutting) five of the most widely discussed alternatives to this hypothesis. In §4, we consider recent findings that are beginning to provide clues into the computations conducted in the FFA and the nature of the representations the FFA extracts from faces. We argue that the FFA is engaged both in detecting faces and in extracting the necessary perceptual information to recognize them, and that the properties of the FFA mirror previously identified behavioural signatures of face-specific processing (e.g. the face-inversion effect).

Section 5 asks how the computations and representations in the FFA differ from those occurring in other nearby regions of cortex that respond strongly to faces and objects. The evidence indicates clear functional dissociations between these regions, demonstrating that the FFA shows not only functional specificity but also area specificity. We end by speculating in §6 on some of the broader questions raised by current research on the FFA, including the developmental origins of this region and the question of whether faces are unique versus whether similarly specialized mechanisms also exist for other domains of high-level perception and cognition.

1. Face perception: domain-specific versus domain-general hypotheses

One of the longest running debates in the history of neuroscience concerns the degree to which specific high-level cognitive functions are implemented in discrete regions of the brain specialized for just that function. The current consensus view was recently synopsized in a respected textbook of neuroimaging as follows: ‘unlike the phrenologists, who believed that very complex traits were associated with discrete brain regions, modern researchers recognize that … a single brain region may participate in more than one function’ (Huettel et al. 2004). Despite this currently popular view that complex cognitive functions are conducted in distributed and overlapping neural networks, substantial evidence supports the hypothesis that at least one complex cognitive function—face perception—is implemented in its own specialized cortical network that is not shared with many if any other cognitive functions. Here, we review the evidence for this hypothesis, focusing particularly on functional magnetic resonance imaging (fMRI) investigations of a region of human extrastriate cortex called the fusiform face area (FFA; Kanwisher et al. 1997).

Face perception has long served as a parade case of functional specificity, i.e. as a process that is implemented in specialized cognitive and neural mechanisms dedicated to face perception per se. This ‘face-specificity hypothesis’ has a certain intuitive appeal, given the enormous importance of face perception in our daily lives (and in the lives of our primate ancestors), the unique computational challenges posed by the task of face recognition and the processing advantages that can result from the use of dedicated neural hardware specialized for a specific task. Yet the face-specificity hypothesis has remained controversial, and many researchers have favoured alternative ‘domain-general’ hypotheses which argue that the mechanisms engaged by faces are not specific for a particular stimulus class (i.e. faces), but for a particular process that may run on multiple stimulus classes.

For example, according to the individuation hypothesis, putative face-specific mechanisms can be engaged whenever fine-grained discriminations must be made between exemplars within a category (Gauthier et al. 1999a, 2000b). The idea here is that when we look at faces, we do not merely decide that the stimulus is a face, but we also automatically identify which face it is, whereas with cars or tables or mugs we may often extract only the general category of each stimulus (car versus table) without identifying the specific individual (which car). Thus, according to the individuation hypothesis, faces automatically recruit a domain-general mechanism for individuating exemplars within a category, which can be recruited in a task-dependent fashion by non-faces.

According to the expertise hypothesis (which is a special case of the individuation hypothesis), putative face-specific mechanisms are specialized not for processing faces per se, but rather for distinguishing between exemplars of a category that share the same basic configuration and for which the subject has gained substantial expertise. The idea here is that we are all experts at recognizing faces, and if we had similar expertise discriminating exemplars of a non-face category, then the same processing mechanisms would be engaged. This idea originates from a seminal study by Diamond & Carey (1986) who reported that people with many years of experience judging dogs (‘dog experts’) exhibit behavioural signatures of face-like processing when perceiving dogs, as well as from more recent studies in which it has been claimed that just 10 h of laboratory training on novel stimuli can lead to ‘face-like’ processing of those stimuli (Gauthier et al. 1998; Tarr & Gauthier 2000).

We argue here that substantial evidence favours the face-specificity hypothesis over these and other domain-general alternatives. Before reviewing the relevant literature on the FFA, we briefly synopsize the evidence for the face-specificity hypothesis from other methods.

2. Specialized mechanisms for face perception: evidence from neuropsychology, behaviour and electrophysiology

Evidence from neuropsychology, behaviour and electrophysiology has long been marshalled in the debate over the nature of face-processing mechanisms.

(a) Evidence from neuropsychology: prosopagnosia and agnosia

The first evidence that face perception engages specialized machinery distinct from that engaged during object perception came from the syndrome of acquired prosopagnosia, in which neurological patients lose the ability to recognize faces after brain damage. Prosopagnosia is not a general loss of the concept of the person, because prosopagnosic subjects can easily identify individuals on the basis of their voice or a verbal description of the person. Impairments in face recognition are often accompanied by deficits in other related tasks such as object recognition, as expected, given the usually large size of lesions relative to functional subdivisions of the cortex. However, a few prosopagnosic patients have been described who show very selective impairments in which face-recognition abilities are devastated despite the lack of discernible deficits in the recognition of non-face objects (Wada & Yamamoto 2001). Some prosopagnosic subjects have preserved abilities to discriminate between exemplars within a category (McNeil & Warrington 1993; Henke et al. 1998; Duchaine et al. 2006), arguing against the individuation hypothesis. Normal acquisition of expertise for novel stimuli (‘Greebles’) was found in an individual with ‘developmental prosopagnosia’ (Duchaine et al. 2004), a lifelong impairment in face recognition (Behrmann & Avidan 2005) with no apparent neurological lesion (see §6a). A recent report tested each of the domain-general hypotheses that have been discussed in the literature in a highly selective case of developmental prosopagnosia. Findings from six experiments ruled out each of the domain-general hypotheses in favour of the face-specificity hypothesis (Duchaine et al. 2006). Taken together, studies of prosopagnosic individuals support the face-specificity hypothesis.

Is face recognition just the most difficult visual recognition task we perform, and hence the most susceptible to brain damage? Apparently not: the striking case of patient CK (Moscovitch et al. 1997; see also McMullen et al. 2000) showed severe deficits in object recognition, but normal face recognition, indicating a double dissociation between the recognition of faces and objects. Further, patient CK, who had been a collector of toy soldiers, lost the ability to discriminate these stimuli, showing a further dissociation between face recognition (preserved) and visual expertise (impaired). Thus, taken together, these selective cases of prosopagnosia and agnosia support the face-specificity hypothesis and are inconsistent with its domain-general alternatives.

(b) Behavioural signatures of face-specific processing

Classic behavioural work in normal subjects has also shown dissociations between the recognition of faces and objects by demonstrating a number of differences in the ways that faces and objects are processed. Best known among these signatures of face-specific processing is the face-inversion effect, in which the decrement in performance that occurs when stimuli are inverted (i.e. turned upside-down) is greater for faces than for non-face stimuli (Yin 1969). Other behavioural markers include the ‘part–whole’ effect (Tanaka & Farah 1993), in which subjects are better able to distinguish which of two face parts (e.g. two noses) appeared in a previously shown face when they are tested in the context of the whole face than when they are tested in isolation, and the ‘composite effect’ (Young et al. 1987), in which subjects are slower to identify one-half of a chimeric face, if it is aligned with an inconsistent other half-face than if the two half-faces are misaligned. Consistent with the holistic hypothesis, Yovel et al. (2005a) have found that the probability of correctly identifying a whole face is greater than the sum of the probabilities of matching each of its component face halves. Taken together, these effects suggest that upright faces are processed in a distinctive ‘holistic’ manner (McKone et al. 2001; Tanaka & Farah 2003), i.e. that faces are processed as wholes rather than processing each of the parts of the face independently. All the holistic effects mentioned above are either absent or reduced for inverted faces and non-face objects (Tanaka & Farah 1993; Robbins 2005), indicating that this holistic style of processing is specific to upright faces.

According to the expertise hypothesis, it is our extensive experience with faces that leads us to process them in this distinctive holistic and orientation-sensitive fashion. The original impetus for this hypothesis came from Diamond & Carey's (1986) classic report that dog experts show inversion effects for dog stimuli. However, there have been no published replications of this result in the 20 years since it was published, and one careful and extensive recent effort completely failed to replicate the original result (Robbins 2005). Another recent study also failed to find a significant inversion effect for objects of expertise (fingerprints in fingerprint experts), although this study argues for holistic processing of these stimuli by experts based on superadditive contributions to performance accuracy from the two halves of the stimulus (Busey & Vanderkolk 2005). Other studies have investigated much shorter term cases of visual expertise, claiming that a mere 10 h of laboratory training can produce ‘face-like’ processing of non-face stimuli (Gauthier et al. 1998). However, an examination of the actual data in those studies in fact reveals little or no evidence for disproportionate inversion effects, part–whole effects or composite effects for laboratory-trained stimuli (McKone & Kanwisher 2005; McKone et al. in press). Even 10 h of training on inverted faces does not lead to holistic processing of inverted faces (Robbins & McKone 2003). Thus, despite widespread claims to the contrary, behavioural data from normal subjects do not support the expertise hypothesis. Instead, behavioural signatures of configural/holistic processing are either reduced (as in the inversion effect and the part–whole effect) or absent (in the composite effect) for non-face stimuli, including objects of expertise. These findings support the face-specificity hypothesis and argue against each of its domain-general alternatives.

(c) Electrophysiology in humans

Face-selective electrophysiological responses occurring 170 ms after stimulus onset have also been measured in humans using scalp electrodes (Bentin et al. 1996; Jeffreys 1996). Although it has been claimed that this face-selective N170 response is sensitive to visual expertise with non-face stimuli (Tanaka & Curran 2001; Rossion et al. 2002; Gauthier et al. 2003), no study has demonstrated the basic result that would support this finding: an event-related potential (ERP) response that is higher both for faces than non-faces (thus demonstrating face selectivity) and objects of expertise than control objects (thus demonstrating a role for expertise; McKone & Kanwisher 2005). Showing the selectivity of the N170 for faces in each experiment is important because the N170 is not face selective at all electrode locations (and not even necessarily at the canonical face-selective locations of T5 and T6), so this face selectivity must be demonstrated in each study. One study did show a delay of the N170 for inverted compared with upright fingerprints in fingerprint experts, resembling the similar delay seen in the N170 to inverted versus upright faces (Busey & Vanderkolk 2005). However, in the same study, the behavioural inversion effect for these stimuli was not significant, and as the authors of this study note, the delay of the N170 for inverted stimuli has been found for cars (in non-experts; Rossion et al. 2003b), and it is therefore not a specific marker of face-like processing. Finally, a magnetoencephalography (MEG) study investigating the similarly face-selective magnetic ‘M170’ response (Halgren et al. 2000; Liu et al. 2002) found no elevated response to cars in car experts and no trial-by-trial correlation between the amplitude of the M170 response and successful identification of cars by car experts (Xu et al. 2005). Thus, the N170 and M170 appear to be truly face selective and at least the M170 response is not consistent with any of the domain-general hypotheses discussed above.

Although the spatial resolution of ERP and MEG is limited, subdural ERP measurements in epilepsy patients have shown strongly face-selective responses in discrete patches of the temporal lobe (Allison et al. 1994, 1999). A powerful demonstration of the causal role of these regions in face perception comes from two studies demonstrating that electrical stimulation of these ventral temporal sites can produce a transient inability to identify faces (Puce et al. 1999; Mundel et al. 2003).

(d) Neurophysiology and fMRI in monkeys

Data from monkeys show stunning face specificity at both the single-cell level and the level of cortical regions. Numerous studies dating back decades have reported face-selective responses from single neurons (‘face cells’) in the temporal lobes of macaques (Desimone et al. 1984; Tsao et al. 2003). More recently, face-selective regions have been reported in macaques using fMRI (Tsao et al. 2003; Pinsk et al. 2005) and in vervets using a novel dual-activity mapping technique based on induction of the immediate early gene zif268 (Zangenehpour & Chaudhuri 2005).

Strong claims of face selectivity entail the prediction that no non-face stimulus will ever produce a response as strong as a face; since the set of non-face stimuli is infinite, there is always some possibility that a future study will show that a putative face-selective cell or region actually responds more to some previously untested stimulus (say, armadillos) than to faces. However, recent advances in neurophysiology have addressed this problem about as well as can practically be hoped for. Foldiak et al. (2004) used rapid serial visual presentation to test each cell on over 1000 natural images and found some cells that were truly face selective: for some cells, the 70 stimuli producing the strongest responses all contained faces, and the next ‘best’ stimuli produced less than one-fifth the maximal response.

Although these data demonstrate individual cells that are strikingly face selective, they do not address the face selectivity of whole regions of cortex. However, a new study demonstrates a spectacular degree of selectivity of whole regions of cortex: Tsao et al. (2003) directed electrodes into the face-selective patches they had previously identified with fMRI and found that 97% of the visually responsive cells in this region responded selectively (indeed, for most cells, exclusively) to faces (figure 1). These stunning data suggest that the weak responses of the FFA to non-face stimuli may result from ‘partial voluming’, i.e. from the inevitable blurring of face-selective and non-face-selective regions that arises when voxel sizes are large relative to the size of the underlying functional unit. Thus, these data suggest an answer to the question of whether ‘non-preferred’ responses carry discriminative information about non-preferred stimuli (Haxby et al. 2001; see §3f): at least in face-selective regions in macaques, non-preferred responses cannot carry much information because these responses are close to zero.
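
The partial-voluming argument can be made concrete with a small numerical sketch. The Python below is illustrative only: the function name `voxel_response`, the 70/30 mixture and the response values are hypothetical choices, not measurements from any study cited here. It assumes that a voxel's signal is simply the average of the responses of the neurons it contains.

```python
def voxel_response(face_fraction, selective_resp, nonselective_resp):
    """Average response of a voxel containing two neural populations.

    face_fraction     -- assumed fraction of neurons that are face selective
    selective_resp    -- response of the face-selective neurons to a stimulus
    nonselective_resp -- response of the remaining neurons to the same stimulus
    """
    return face_fraction * selective_resp + (1.0 - face_fraction) * nonselective_resp


# Hypothetical values: the face-selective neurons respond only to faces,
# while the neighbouring neurons respond equally (0.5) to faces and objects.
FACE_FRACTION = 0.7
to_faces = voxel_response(FACE_FRACTION, selective_resp=1.0, nonselective_resp=0.5)
to_objects = voxel_response(FACE_FRACTION, selective_resp=0.0, nonselective_resp=0.5)

print(f"voxel response to faces:   {to_faces:.2f}")
print(f"voxel response to objects: {to_objects:.2f}")
```

Under these assumed numbers the voxel still shows a weak response to objects even though its face-selective neurons do not respond to objects at all, which is the pattern that partial voluming would be expected to produce.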

Figure 1

Tsao et al. recorded the response of single cells within an fMRI identified face-selective patch of cortex. The figure shows the average response across all 320 visually responsive neurons in the face-selective patches of two monkeys, to 96 different stimulus images, indicating very high selectivity for faces by the cells in this patch.

(i) Section summary

Taken together, these lines of research make a compelling case for the existence of specialized cognitive and neural machinery for face perception per se (the face-specificity hypothesis), and argue against the individuation and expertise hypotheses. First, neuropsychological double dissociations exist between face recognition and visual expertise for non-face stimuli, casting doubt on the claim that these two phenomena share processing mechanisms. Second, behavioural data from normal subjects show a number of ‘signatures’ of holistic face processing that are not observed for other stimulus classes, such as inverted faces and objects of expertise. Third, electrophysiological measurements indicate face-specific processing at or before 200 ms after stimulus onset (N170). Fourth, fMRI and physiological investigations in monkeys show strikingly selective (and often exclusive) responses to faces both within individual neurons and more recently also within cortical regions. Against this backdrop, one might have expected that fMRI studies demonstrating face-selective responses in the human temporal lobe (Kanwisher et al. 1997; McCarthy et al. 1997b) would be considered relatively uncontroversial. As we see next, this expectation would have been wrong (Gauthier et al. 2000a; Haxby et al. 2001).

3. Evidence from fMRI: functional specificity of the FFA

In the early 1990s, PET studies demonstrated activation of the ventral visual pathway, especially the fusiform gyrus, in a variety of face perception tasks (Haxby et al. 1991; Sergent et al. 1992). fMRI studies of the specificity of these cortical regions for faces per se began in the mid-1990s, with demonstrations of fusiform regions that responded more strongly to faces than to letter strings and textures (Puce et al. 1996), flowers (McCarthy et al. 1997a), and other stimuli, including mixed everyday objects, houses, and hands (Kanwisher et al. 1997). Although face-specific fMRI activations could also be seen in many subjects in the region of the superior temporal sulcus (fSTS) and in the occipital lobe in a region named the ‘occipital face area’ (OFA), the most consistent and robust face-selective activation was located on the lateral side of the mid-fusiform gyrus in a region we named the ‘fusiform face area’ or FFA (Kanwisher et al. 1997; figure 2). With the methods currently used in our laboratory, we can functionally identify this region in almost every normal subject in a short ‘localizer’ fMRI scan contrasting the response to faces versus objects. In the ‘functional region of interest’ (fROI) approach, the FFA is first functionally localized in each individual, then its response magnitude is measured in a new set of experimental conditions. This method enables the FFA to be studied directly despite its anatomical variability across subjects, in a statistically powerful yet unbiased fashion (Saxe et al. 2006). Since the FFA is the most robust of the three face-selective regions (Kanwisher et al. 1997; Yovel & Kanwisher 2004), it has been investigated most completely and will be the focus of this review, although later in §5b we contrast the functional properties of the FFA with those of the other two face-selective regions. Here, we review evidence bearing on five of the most widely proposed alternatives to the face-specificity hypothesis for the FFA.
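
The logic of the fROI approach, selecting voxels with one data set and measuring them with another, can be illustrated with simulated data. The following Python is a minimal sketch and not the analysis pipeline of any cited study: the helper names (`simulate_run`, `select_roi`, `roi_mean`), the number of voxels, the ROI size and the noise level are all assumptions made for illustration. To show why independent data matter, it deliberately simulates voxels with no true face selectivity.

```python
import random


def simulate_run(true_effects, noise_sd, rng):
    """Return one noisy measurement of the faces-minus-objects effect per voxel."""
    return [effect + rng.gauss(0.0, noise_sd) for effect in true_effects]


def select_roi(localizer_effects, n_voxels):
    """Indices of the n_voxels voxels with the largest localizer effect."""
    ranked = sorted(range(len(localizer_effects)),
                    key=lambda i: localizer_effects[i], reverse=True)
    return ranked[:n_voxels]


def roi_mean(effects, roi):
    """Mean effect across the voxels in the ROI."""
    return sum(effects[i] for i in roi) / len(roi)


rng = random.Random(0)            # fixed seed so the sketch is repeatable
true_effects = [0.0] * 1000       # assumption: no voxel is truly face selective

localizer_run = simulate_run(true_effects, 1.0, rng)
experiment_run = simulate_run(true_effects, 1.0, rng)

roi = select_roi(localizer_run, 50)
same_data_estimate = roi_mean(localizer_run, roi)      # selection and measurement share noise
independent_estimate = roi_mean(experiment_run, roi)   # measured in data not used for selection

print(f"estimate from localizer data:   {same_data_estimate:.2f}")
print(f"estimate from independent data: {independent_estimate:.2f}")
```

In this simulation the estimate taken from the localizer data is inflated by the selection step, whereas the estimate from the independent run stays near the true value of zero. Measuring the response in a separate data set is what keeps the fROI estimate free of this selection bias.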

Figure 2

Face-selective activation (faces > objects, p<0.0001) on an inflated brain of one subject, shown from lateral and ventral views of the right and left hemispheres. Three face-selective regions are typically found: the FFA in the fusiform gyrus along the ventral part of the brain, the OFA in the lateral occipital area and the fSTS in the posterior region of the superior temporal sulcus.

(a) Is the FFA selective for simple visual features?

Three lines of evidence indicate that the FFA responds specifically to faces, and not to lower level stimulus features usually present in faces (such as a pair of horizontally arranged dark regions). First, the FFA responds strongly and similarly to a wide variety of face stimuli that would appear to have few low-level features in common, including front and profile photographs of faces (Tong et al. 2000), line drawings of faces (Spiridon & Kanwisher 2002), cat faces (Tong et al. 2000) and two-tone stylized ‘Mooney faces’. Second, the FFA response to upright Mooney faces is almost twice as strong as the response to inverted Mooney stimuli in which the face is difficult to detect (Kanwisher et al. 1998; Rhodes et al. 2004), even though most low-level features (such as spatial frequency composition) are identical in the two stimulus types. Finally, for bistable stimuli such as the illusory face–vase (Hasson et al. 2001; Andrews et al. 2002), or for binocularly rivalrous stimuli in which a face is presented to one eye and a non-face is presented to the other eye (Tong et al. 1998; Pasley et al. 2004; Williams et al. 2004), the FFA responds more strongly when subjects perceive a face than when they do not see a face even though the retinal stimulation is unchanged. For all these reasons, it is difficult to account for the selectivity of the FFA in terms of lower level features that covary with faceness. Nonetheless, the face-specificity hypothesis of the FFA has been challenged with a number of other alternatives that we discuss next (Gauthier et al. 2000a; Haxby et al. 2001).

(b) The individuation hypothesis applied to the FFA

Is the FFA engaged not simply during face perception, but whenever subjects must discriminate between similar exemplars within a category (Gauthier et al. 1999a)? Early evidence against this hypothesis was presented in our first paper on the FFA (Kanwisher et al. 1997), in which the FFA responded much less strongly when subjects performed a 1-back (consecutive matching) task on blocks of house stimuli or hand stimuli (see also McCarthy et al. 1997b). Although this task was not matched for difficulty, a more recent experiment from our laboratory carefully adjusted the difficulty of within-category discrimination for faces and houses (see figure 3) and still found about three times the FFA response during face discrimination as house discrimination (Yovel & Kanwisher 2004). Thus, the FFA does not simply respond strongly whenever subjects make a difficult discrimination between exemplars of any category. The individuation hypothesis is thus not a viable account of the operations conducted in FFA.

Figure 3

Face and house stimuli designed to test the face-specificity hypothesis, from a study by Yovel & Kanwisher (Kanwisher et al. 1997; Yovel & Kanwisher 2004). House stimuli were constructed in exactly the same way as the face stimuli: the faces or houses differed in either their parts (eyes and mouth for faces, and windows and door for houses) or the spacing among these parts. Subjects performed a discrimination task on pairs of faces or houses that differed in either spacing or parts. Performance was matched across the stimuli and the spacing and part conditions. Thus, discrimination of the faces and of the houses is very similar in overall difficulty and in the nature of the perceptual discriminations required. The threefold higher FFA response for the face tasks than the house tasks (Kanwisher et al. 1997; Yovel & Kanwisher 2004) therefore provides strong support for the face-specificity hypothesis and is inconsistent with the individuation hypothesis and with the hypothesis that the FFA conducts domain-general processing of configuration/spacing information.

(c) The expertise hypothesis applied to the FFA

According to the expertise hypothesis, the FFA responds when subjects view stimuli for which they have gained substantial perceptual expertise. This hypothesis has been argued for vigorously by Gauthier, Tarr and colleagues (Gauthier & Tarr 1997) on the basis of fMRI studies in which subjects undergo extensive training in the laboratory on novel stimuli called ‘Greebles’, as well as other studies of real-world expertise for cars and birds (akin to the dog experts tested in the original Diamond & Carey study). We discuss these two kinds of studies in turn.

Gauthier et al. (1999b) scanned subjects looking at faces and Greebles, and report that activation for upright minus inverted Greebles in the FFA region increased throughout Greeble training. While Gauthier et al. (1999b) interpreted their data as evidence for an expertise effect in the FFA, there are several problems with this conclusion. First, rather than measuring the per cent signal change from baseline for each stimulus type, they reported only the difference between upright and inverted orientations; this tells us nothing about the crucial question of the magnitude of response to upright Greebles and upright faces after training. Second, since Greebles resemble faces (and/or bodies), they are a poor choice of stimulus to distinguish between the face-specificity and expertise hypotheses. Third, the ‘FFA’ was defined as a large square ROI, over a centimetre on a side, a method that guarantees the inclusion of voxels neighbouring but not in the FFA. Thus, it is possible, for example, that any training effects on Greebles may arise from the body-selective ‘fusiform body area’ (FBA; Peelen & Downing 2005; Schwarzlose et al. 2005) which is adjacent to the FFA (see §3e) rather than from the FFA itself. Finally, ‘activation’ was defined as the sum across the 64 voxels in the ROI of t-values resulting from a comparison of upright to inverted responses within each voxel (after excluding all t-values less than 0.1). This truncated ‘sum-of-ts’ measure (see also Gauthier & Tarr 2002) confounds an increase in signal change for upright versus inverted stimuli after training with a reduction in variance of this measure after training. Further, the authors failed to separately report the per cent signal change values for the upright and inverted conditions, which is standard in both behavioural and neural investigations of inversion effects. These problems leave the results of this study difficult to interpret.
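
The confound in the truncated ‘sum-of-ts’ measure can be seen in a short worked example. The Python below is a simplified sketch under stated assumptions: it uses a one-sample t statistic (mean difference divided by its standard error), and the signal difference, standard deviations and number of observations are hypothetical values, not figures from the study under discussion. Only the ROI size of 64 voxels and the exclusion threshold of 0.1 are taken from the description above.

```python
import math


def t_value(mean_diff, sd, n_obs):
    """One-sample t statistic for an upright-minus-inverted difference (assumed form)."""
    return mean_diff / (sd / math.sqrt(n_obs))


def truncated_sum_of_t(t_values, threshold=0.1):
    """Sum the t-values across voxels after excluding those below the threshold."""
    return sum(t for t in t_values if t >= threshold)


N_VOXELS = 64      # ROI size given in the text
MEAN_DIFF = 0.2    # hypothetical signal difference, identical before and after training
N_OBS = 16         # hypothetical number of observations per voxel

# The only thing that changes with 'training' here is the noise level.
before = truncated_sum_of_t([t_value(MEAN_DIFF, 1.0, N_OBS)] * N_VOXELS)
after = truncated_sum_of_t([t_value(MEAN_DIFF, 0.5, N_OBS)] * N_VOXELS)

print(f"sum of t before training: {before:.1f}")
print(f"sum of t after training:  {after:.1f}")
```

With the signal difference held constant, halving the variability doubles the summed measure in this example. An increase in the sum of t-values therefore cannot by itself distinguish a larger upright-minus-inverted response from a reduction in variance, which is why per cent signal change for each condition is the more interpretable quantity.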

In three recent studies that avoid these problems (Moore et al. 2006; Yue et al. in press; Op de Beeck et al. submitted), subjects were trained for many hours on fine-grained discrimination between exemplars of novel stimuli that do not resemble faces or bodies. None of these studies found a significant increase in the response of the FFA for trained compared to untrained object classes after training, but all three found significant training-induced increases in response in a nearby region called the lateral occipital complex (LOC), which is responsive to object shape in general, not faces in particular. Thus, laboratory training studies to date provide no evidence for the expertise hypothesis, instead supporting the face-specificity hypothesis.

Of course, 10 h of laboratory training is a far cry from the decades of expertise involved in face recognition or real-world expertise for dogs, cars or birds. Gauthier et al. (2000a) reported a greater increase in the right FFA response for cars and birds versus control objects in car and bird experts, respectively. This result has been replicated in one study (Xu 2005), but produced only a marginally significant trend in another study (Rhodes et al. 2004), and no effect at all in another (Grill-Spector et al. 2004). Note that even in the studies that do find expertise effects in the FFA, the effect size is very small and the response to faces (in per cent signal increase from fixation) remains at least twice as high as to any objects of expertise. Further, although Gauthier et al. emphasize as their strongest finding the correlation across subjects between behavioural expertise for cars/birds and the FFA response to cars/birds (Gauthier et al. 2000a), this correlation was in fact not found in the very task where the expertise hypothesis would predict it, namely during a task requiring discrimination of objects of expertise, but only when subjects were performing a location discrimination task on the same objects. The observed pattern is hard to account for within the expertise hypothesis, but is accounted for naturally by the alternate hypothesis that ‘expertise effects’ merely reflect increased attentional engagement (Wojciulik et al. 1998) of an expert on their objects of expertise, an effect that would be expected to be larger in the context of an orthogonal location task than an object discrimination task which forces attention onto object shape anyway. Consistent with the idea that the elevated activation for objects of expertise is simply due to greater attentional engagement by these objects, the available evidence suggests that any increased responses with expertise are not restricted to the FFA. Indeed, Rhodes et al. (2004) found significantly larger expertise effects outside the FFA than inside, and although Gauthier et al. (2000a) emphasize expertise effects in the FFA, their fig. 6 shows what appears to be substantially larger effects of expertise in parahippocampal cortex. Thus, real-world expertise effects are not restricted to the FFA, and when they are found in the FFA they are small in magnitude and uncorrelated with behavioural performance on expert object individuation.

Taken together, laboratory training studies and real-world expertise studies do not provide convincing evidence for the expertise hypothesis.

(d) Domain-general processing of configuration/spacing in the FFA

Although the individuation and expertise hypotheses have received the greatest attention in the literature, other domain-general accounts of the function of FFA are possible. Given the behavioural evidence that we are highly sensitive to the particular location of face parts (spacing) in upright faces (Haig 1984; Kemp et al. 1990), is it possible that this processing of configuration/spacing information could be applied to non-faces, and if so, might it engage the FFA? We tested this hypothesis by attempting to force subjects to process houses in the same way they process faces (Yovel & Kanwisher 2004; see figure 3). To do this, we constructed house stimuli that varied in the relative positions of the windows and doors, and a parallel set of faces was constructed that varied in the positions of eyes and mouths. These stimuli were carefully adjusted until performance in same–different discrimination of successively presented stimulus pairs was exactly matched across pairs of faces and of houses. Subjects were further informed that when two faces, or two houses, differed, it would be in the relative position of the parts of the face/house. Thus, we did everything possible to induce the same kinds of processing on the faces and houses. Nonetheless, the FFA response to faces was about three times as strong as the FFA response to houses in this task. Evidently, it is not possible to engage the FFA on non-face stimuli by inducing subjects to process those stimuli like faces. However, note that it remains an open question whether there is any way to induce face-like holistic processing on non-face stimuli, and whether such processing would recruit the FFA (for review, see Tanaka & Farah 2003).

(e) Is the FFA specific not only for faces but also for bodies?

Several recent studies have reported strong FFA responses to stimuli depicting headless bodies or body parts (Cox et al. 2004; Peelen & Downing 2005; Spiridon et al. 2005), challenging the specificity of the FFA for faces. Does the FFA actually respond strongly to body parts or is this apparently high response instead due to spillover activation from the adjacent body-selective FBA described by Peelen & Downing (2005)? To find out, we scanned subjects with relatively high-resolution fMRI (1.4×1.4×2 mm voxels instead of the more standard 3×3×4 mm voxels; Schwarzlose et al. 2005). We found that at high resolution, two distinct regions can be identified, one exclusively selective for faces but not bodies (the FFA*) and another exclusively selective for bodies but not faces (the FBA*). Thus, the strong FFA response to body stimuli seen at standard scanning resolution apparently reflects the pooling of responses from two distinct regions (‘partial voluming’), one truly face selective and the other truly body selective. Interestingly, regions selective for faces and bodies are also nearby or adjacent in the region of the STS in humans (Downing et al. 2001), and they are also adjacent in macaques (Tsao et al. 2003; Pinsk et al. 2005). Once again, these findings support the face-specificity hypothesis.
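The partial-voluming argument can be illustrated with a toy simulation. The sketch below is not the analysis of Schwarzlose et al. (2005); it assumes a hypothetical one-dimensional strip of cortex sampled at 1 mm, with a purely face-selective patch lying next to a purely body-selective patch. All response values, patch sizes and voxel widths are invented for illustration.

```python
# Toy 1-D illustration of partial voluming (all numbers are hypothetical).
# Each entry is a 1 mm sample of cortex: (response to faces, response to bodies).
FACE_PATCH = [(2.0, 0.0)] * 3   # 3 mm strip responding only to faces
BODY_PATCH = [(0.0, 2.0)] * 3   # adjacent 3 mm strip responding only to bodies
cortex = FACE_PATCH + BODY_PATCH


def sample(cortex, voxel_mm, offset_mm=0):
    """Average the underlying responses within consecutive voxels of a given width."""
    voxels = []
    for start in range(offset_mm, len(cortex) - voxel_mm + 1, voxel_mm):
        chunk = cortex[start:start + voxel_mm]
        face = sum(f for f, _ in chunk) / voxel_mm
        body = sum(b for _, b in chunk) / voxel_mm
        voxels.append((face, body))
    return voxels


high_res = sample(cortex, voxel_mm=1)               # each voxel covers one patch only
low_res = sample(cortex, voxel_mm=4, offset_mm=1)   # one voxel straddling the border

print("1 mm voxels:", high_res)
print("4 mm voxel: ", low_res)
```

In this toy case every 1 mm voxel responds to only one category, whereas the single 4 mm voxel that straddles the border responds equally to faces and bodies even though no underlying tissue does.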

(f) Do ‘non-preferred’ responses in the FFA form part of the code for non-faces?

In an important challenge to a more modular view of face and object processing, Haxby et al. (2001) argued that objects and faces are coded via the distributed profile of response across much of the ventral visual pathway. Central to this view is the suggestion that ‘non-preferred’ responses, for example to objects in the FFA, may form an important part of the neural code for those objects. While this ‘distributed coding’ hypothesis is still an active matter of debate, several considerations suggest that the FFA does not, in fact, play an important role in the representation of non-face objects. First, two studies have found that the profile of response across the voxels within face-selective patches in humans (Spiridon & Kanwisher 2002) and monkeys (Tsao et al. 2003) does not contain information enabling discrimination between different non-faces. Further, note that even if some discriminative information about non-face objects were present in the FFA (perhaps at higher resolution), it is not clear that this information would be used in perceptual performance. Indeed, the fact that some people with acquired prosopagnosia have apparently normal object recognition (Wada & Yamamoto 2001; Humphreys 2005) suggests that cortical regions that are necessary for face recognition are not necessary for object recognition. Finally, Tsao's single-unit recordings from face-selective patches in monkeys (see §2d) indicate that non-preferred responses in face-selective regions are virtually non-existent (Tsao et al. 2006), suggesting that the non-preferred responses observed in the FFA with fMRI may result from blurring of responses from an extremely face-selective FFA with neighbouring non-face-selective cortex (Schwarzlose et al. 2005). For all these reasons, we doubt that non-preferred responses in the FFA play an important role in coding for non-face objects. 
Indeed, more recently, Haxby and his colleagues have conceded that ‘preferred regions for faces…are not well suited to object classifications that do not involve faces…’ (O'Toole et al. 2005).

(i) Section summary

The evidence reviewed here argues against each of the five alternatives to the face-specificity hypothesis: the FFA does not appear to be selective for either lower level features or for the higher level category of bodies. Further, the evidence described here does not support a domain-general role for the FFA in individuation of exemplars of any category (including categories of expertise) or in extraction of the relative positions of parts within any stimulus type. Finally, we argue against the hypothesis that the FFA forms part of a distributed representation of non-face objects (Haxby et al. 2001), because damage to this region is devastating to face recognition but often leaves object recognition intact, and because physiological data from monkeys find almost no evidence for any response to non-face stimuli within face-selective patches in the first place. (In §4b, we describe evidence against another alternative hypothesis that the FFA is engaged in processing semantic information about people.) Instead, existing data support the hypothesis that the FFA is selectively engaged in the processing of faces per se. This conclusion brings us to the more interesting questions of what computations are performed on faces in the FFA, and what kinds of representations it extracts from faces.

4. What is the nature of the face representations in the FFA?

Many experiments implicate the FFA in determining face identity, i.e. in extracting the perceptual information used to distinguish between individual faces. For example, we showed a higher FFA response on trials in which subjects correctly identified a famous face than on trials in which they failed to recognize the same individual (Grill-Spector et al. 2004), implicating this region in the extraction of information about face identity. (No comparable correlation between the FFA response and performance was seen for identification of specific types of cars, guitars, buildings, etc.) Further evidence that the FFA is critical for distinguishing between individual faces comes from the fact that the critical lesion site for prosopagnosia is very close to the FFA (Barton et al. 2002; Bouvier & Engel 2005). However, these results tell us nothing about the nature of the representations extracted from faces in the FFA, which we turn to next.

What aspects of a face does the FFA respond to? Three prominent features of face stimuli are the classic frontal face configuration (the arrangement of two horizontally and symmetrically placed parts above two vertically placed parts), the presence of specific face parts (eyes, nose and mouth) and the bounding contour of a roughly oval shape with hair on the top and sides. Which of these stimulus properties are important in driving the response of the FFA? Liu et al. (2003) created stimuli in which each of these three attributes was orthogonally varied. The face configuration was either canonical or scrambled (with face parts rearranged to occur in different positions), veridical face parts were either present or absent (i.e. replaced by black ovals) and external features were either present or absent (with a rectangular frame showing only internal features, omitting chin and hairline). This study found that the FFA responds to all three kinds of face properties. Another study from our laboratory leads to the consistent conclusion that the FFA is involved in processing both the parts and the spacing among the parts of faces. We (Yovel & Kanwisher 2004) scanned subjects while they performed a successive discrimination task on pairs of faces that differed in either the individual parts or the configuration (i.e. spacing) of those parts (figure 3). Subjects were informed in advance of each block which kind of discrimination they should perform. The FFA response was similar and strong in both conditions, again indicating a role of the FFA in the discrimination of both face parts and face configurations. Thus, the FFA does not appear to be sensitive to only a few specific face features, but instead seems to respond generally to a wide range of features spanning the whole face.

(a) Invariances of face representations in the FFA

To understand the representations of faces extracted by the FFA, we need to determine their equivalence classes: which sets of stimuli are taken to be the same and which are taken to be different? If the FFA is involved in discriminating between individuals, then it must extract different representations for different individuals. But are these representations invariant across images of the same face that differ in size, position, view, etc?

The best current method for approaching this problem with fMRI is fMR adaptation (Grill-Spector et al. 1999; Kourtzi & Kanwisher 2001; Koutstaal et al. 2001), in which the blood oxygenation level dependent (BOLD) response to two (or more) stimuli in a given region of the brain is lower when they are the same than when they are different, indicating a sensitivity of that brain region to that stimulus difference. This sensitivity to the sameness of two stimuli enables us to ask each brain region which stimulus pairs it takes to be the same and which it takes to be different. Thus, this method enables us to discover equivalence classes and invariances in neural representations of faces in the FFA (Grill-Spector et al. 1999). Several studies have found robust fMR adaptation for faces in the FFA, i.e. a lower response to an identically repeated face than to new faces (e.g. Gauthier & Nelson 2001; Yovel & Kanwisher 2004; Avidan & Behrmann 2005; Eger et al. 2005; Pourtois et al. 2005b; Rotshtein et al. 2005). Does this adaptation reflect a representation of face identity that is invariant across different images of the same person? Indeed, several studies have found adaptation across repeated images of the same face even when those images differ in position (Grill-Spector et al. 1999), image size (Grill-Spector et al. 1999; Andrews & Ewbank 2004) and spatial scale (Eger et al. 2004). Further, Rotshtein et al. (2005) used categorical perception of morphed faces to show adaptation across physically different images that were perceived to be the same (i.e. two faces that were on the same side of a perceptual category boundary), but not across physically different images that were perceived to be different (i.e. two faces that straddled the category boundary). Thus, representations in the FFA are not tied to very low-level image properties, but instead show at least partial invariance to simple image transformations.
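The logic of using adaptation to probe invariance can be sketched in a few lines. The example below is only an illustration under stated assumptions: the response values are invented, the condition names are hypothetical, and the fixed cut-off stands in for the statistical test that an actual study would use.

```python
# Hypothetical mean BOLD responses (per cent signal change) in one region.
# Keys describe the relation between the two stimuli in a pair; values are invented.
responses = {
    "different identity": 1.00,
    "same identity, same image": 0.60,
    "same identity, different size": 0.65,
    "same identity, different viewpoint": 0.98,
}


def adaptation(different, repeated):
    """Proportional response reduction: 0 means no adaptation, larger means more."""
    return (different - repeated) / different


def classify(responses, baseline="different identity", criterion=0.2):
    """Label each repeated condition as adapted (treated as 'same') or not.

    The criterion is an arbitrary illustrative cut-off, not a statistical test.
    """
    ref = responses[baseline]
    return {
        cond: adaptation(ref, value) >= criterion
        for cond, value in responses.items()
        if cond != baseline
    }


for cond, adapted in classify(responses).items():
    label = "adapts (treated as same)" if adapted else "no adaptation (treated as different)"
    print(f"{cond}: {label}")
```

With these invented values the region would be described as invariant to image size but not to viewpoint, which is the kind of inference about equivalence classes that the method is meant to support.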

However, representations in the FFA do not appear to be invariant to non-affine changes in lighting direction (Bradshaw 1968), viewpoint (Warrington et al. 1971; Pourtois et al. 2005a; see also Fang & He 2005) and combinations thereof (Avidan & Behrmann 2005; Pourtois et al. 2005b). A recent study by Fang et al. (2006), on the other hand, reveals evidence for view-invariant representation of face identity in the FFA, in particular when the first stimulus (adaptor) is presented for a long duration (25 s). These findings suggest that long-term adaptation may reveal invariant properties of face representation in face-selective regions, which are not found in the typically used short-term adaptation.

In sum, studies conducted to date converge on the conclusion that neural representations of faces in the FFA discriminate between faces of different individuals and are partly invariant to simple image transformations including size, position and spatial scale. However, these representations are not invariant to changes in viewpoint, lighting and other non-affine image transformations.

(b) Does the FFA discriminate between familiar and unfamiliar faces?

A finding that the FFA responds differently to familiar and unfamiliar faces would support the role of this region in face recognition (though it is not required by this hypothesis as discussed shortly). Several fMRI studies have investigated this question (Sergent et al. 1992; Gorno-Tempini et al. 1998; George et al. 1999; Haxby et al. 2000; Leveroni et al. 2000; Wiser et al. 2000; Henson et al. 2002) using either famous faces or faces studied in the laboratory as familiar faces. For the purpose of this review, we will mainly focus on studies that report the response of the FFA to familiar and unfamiliar faces.

Two studies that investigated faces learned in the laboratory found opposite results, one showing an increase in the response to familiar compared with unfamiliar faces in the FFA (Lehmann et al. 2004) and the other (using PET) finding a decrease in the response to familiar faces (Rossion et al. 2003c). Although this discrepancy may be due to the use of different tasks in the two experiments (Rossion et al. 2003c; see also Henson et al. 2002), studies of famous faces, which provide a stronger manipulation of familiarity, do not give a much clearer picture. One study found a small but significant increase in the response to famous compared with non-famous faces (Avidan & Behrmann 2005), but two other studies found no difference in the response to famous versus non-famous faces in the FFA (Eger et al. 2005; Pourtois et al. 2005b; see also Gorno-Tempini et al. 1998; Gorno-Tempini & Price 2001). Taken together, these studies do not show a consistently different FFA response for familiar versus unfamiliar faces. Although these studies do not strengthen the case that the FFA is important for face recognition, it is important to note that they do not provide evidence against this hypothesis either. These results may simply show that the FFA merely extracts a perceptual representation from faces in a bottom-up fashion, with actual recognition (i.e. matching to stored representations) occurring at a later stage of processing. It is also possible that information about face familiarity is represented in the FFA but not by an overall difference in the mean response.

However, these studies do enable us to address a different question about the FFA, concerning its role in processing of non-visual semantic information about people. Since famous faces are associated with rich semantic information about the person, but non-famous faces are not, the lack of a consistently and robustly higher response for famous than non-famous faces in the FFA casts doubt on the idea espoused by some (Martin & Chao 2001), that this region is engaged in processing not only perceptual but also semantic information about people (Turk et al. 2005).

(c) The face-inversion effect and holistic processing in the FFA

As described in §2b, behavioural studies have discovered distinctive ‘signatures’ of face-like processing, including the face-inversion effect (Yin 1969) and the ‘composite’ effect. Does the FFA mirror these behavioural signatures of face-specific processing?

Early studies of the face-inversion effect in the FFA found little (Haxby et al. 1999; Kanwisher et al. 1999) or no (Aguirre et al. 1999; Leube et al. 2003) difference in the response to upright and inverted faces. However, we recently reported a substantially higher FFA response for upright compared with inverted faces (Yovel & Kanwisher 2004). A subsequent study (Yovel & Kanwisher 2005) added two further findings. First, the FFA-face-inversion effect was correlated across subjects with the behavioural face-inversion effect: subjects who showed a large increment in performance for upright versus inverted faces also showed a large increment in the FFA response to upright versus inverted faces. Second, we found greater fMR adaptation for upright than inverted faces, indicating that the FFA is more sensitive to identity information in upright than inverted faces (Yovel & Kanwisher 2005; see also Mazard et al. 2005). Thus, consistent with the behavioural face-inversion effect, the FFA better discriminates faces when they are upright than inverted. In summary, in contrast to earlier studies that found only a weak relationship between the FFA and the face-inversion effect, our findings show a close link between these behavioural and neural markers of specialized face processing.

The larger inversion effect for faces than objects has been taken as evidence for holistic processing of upright but not inverted faces (Farah et al. 1995). However, more direct evidence for holistic processing comes from the composite effect (Young et al. 1987) in which subjects are not able to process the upper or lower half of a composite face independently from the other half of the face even when instructed to do so, unless the two halves are misaligned. This effect is found for upright but not inverted faces. If the FFA is engaged in holistic processing of faces, then we might expect it to show an fMRI correlate of the composite effect. Indeed, a recent study used fMRI adaptation to show evidence for a composite face effect in the FFA. In particular, the FFA only showed adaptation across two identical top halves of a face (compared with two different top halves) when the bottom half of the face was also identical, consistent with the behavioural composite face effect. As with the behavioural composite effect, the fMRI composite effect was found only for upright faces and was absent for inverted faces or misaligned faces.

Thus, fMRI measurements from the FFA show neural correlates of the classical behavioural signatures of face-like processing, including the face-inversion effect and the composite effect. These findings serve to link the behavioural evidence on face-specific processing with research on the FFA, as well as helping to characterize the operations and representations that occur in the FFA.

(d) Norm-based coding of faces

The power of caricatures to capture the likeness of a face suggests that face identity is coded in terms of deviation from the norm or average face, a hypothesis supported by behavioural studies (Rhodes et al. 1987; Leopold et al. 2001). A recent fMRI study found higher FFA responses to atypical compared with average faces, implicating the FFA in such norm-based coding of face identity (Loffler et al. 2005). However, efforts in this study to unconfound such face typicality effects from the greater adaptation effects expected between highly similar faces (in the average-face condition) versus very different faces (in the atypical face condition) were not entirely satisfactory. Therefore, the interesting hypothesis that the FFA codes faces in terms of deviation from the average face remains to be completely tested and explored.

(e) Is the FFA involved in representing facial expression information?

Functional MRI studies of face expression have primarily focused on the amygdala (e.g. Glascher et al. 2004; Williams et al. 2004). Studies that have investigated the response of the temporal cortex have found higher responses to emotional than neutral faces in the fusiform gyrus (Breiter et al. 1996; Dolan et al. 2001; Vuilleumier et al. 2001, 2003; Williams et al. 2004). It has been suggested that this effect is modulated by connections from the amygdala (Dolan et al. 2001). Consistent with this hypothesis, effects of facial expression (in contrast to face identity) are not specific to the FFA. Given the higher arousal generated by emotional faces, the higher response to expressive than neutral faces in the FFA may reflect a general arousal effect rather than specific representation of facial expression. Indeed, a recent fMR-adaptation study (Winston et al. 2003), in which expression and identity were manipulated in a factorial manner, did not reveal significant fMR adaptation to expression information in the fusiform gyrus, but did find fMR adaptation to face expression in regions in the STS. These findings are consistent with the idea that the FFA is involved in identity, but not expression processing, whereas the STS shows the opposite pattern of response (Haxby et al. 2000). However, a recent study found a higher FFA response during expression judgements than during identity judgements on faces (Ganel et al. 2005), casting some doubt on the simple idea that the FFA is involved exclusively in processing face identity information.

(i) Section summary

The results reviewed in this section provide the beginnings of a characterization of the computations and the representations that occur in the FFA. The FFA is implicated in face detection and face recognition, but evidence on the role of the FFA in discriminating familiar from unfamiliar faces or in discriminating emotional expressions in faces is inconsistent. Representations of faces in the FFA are partly invariant to simple image transformations such as changes in size, position and spatial scale, but largely not invariant to most changes in the viewpoint and lighting direction of the face image. The FFA shows both a face-inversion effect (i.e. a higher response for upright than inverted faces) and holistic processing of faces, as expected if this region plays a major role in face-processing phenomena established in previous behavioural work.

5. Areal specificity: do representations in the FFA differ from those in nearby cortical regions?

Does the FFA show not only functional specificity, i.e. a different profile of response to faces versus other stimuli (see §3 above), but also areal specificity, i.e. a different profile of response from that seen in other nearby cortical regions? Here, we contrast the pattern of response in the FFA with that of: (i) the nearby (and sometimes slightly overlapping) object-selective LOC (Malach et al. 1995), and (ii) the two other most widely reported face-selective regions, the OFA and the face-selective region in the STS.

(a) Contrasting the response of the FFA and the LOC

Numerous behavioural experiments have suggested that our representations of faces differ in important respects from our representations of non-face objects (e.g. see §2b). If the FFA plays an important role in the generation of these ‘special’ face representations, we should see parallel differences in the pattern of the BOLD response in FFA versus response of other cortical regions involved in representing object shape, such as the LOC. Importantly, in the studies described below, object-selective regions were defined as cortical regions that respond more strongly to objects than to scrambled images of objects, rather than as regions that respond more strongly to objects than faces, a comparison that has been used in some studies (Aguirre et al. 1999; Haxby et al. 1999; Andrews & Schluppeck 2004), but that is likely to yield not the LOC but a functionally very different region called the parahippocampal place area (PPA; Epstein & Kanwisher 1998). The problem with using the region identified with a contrast of objects greater than faces is that the response to faces is very low to begin with in this region, so the absence of sensitivity to stimulus manipulations here might be merely due to floor effects. In contrast, the LOC shows a high response to faces, in particular in its lateral occipital region, and it is therefore a more valid region to compare to the FFA.

Several studies have recently reported robust dissociations between the response of the LOC and the FFA. First, the FFA and LOC exhibit important and striking differences in the face-inversion effect. Whereas the FFA shows a significantly higher response to upright than inverted faces, the LOC shows the opposite effect, a higher response to inverted than upright faces (Yovel et al. 2005b; see also Aguirre et al. (1999) and Haxby et al. (1999), who found a similar pattern in non-face-selective regions that responded more strongly to houses than faces). Furthermore, we measured the correlation across subjects between the magnitude of the fMR-face-inversion effect (i.e. the difference between fMRI response to upright and inverted faces), and the behavioural face-inversion effect (i.e. the difference between performance level for upright and inverted faces in a face discrimination task that subjects performed in the scanner; Yovel et al. 2005b). Only in the FFA was the fMR-face-inversion effect correlated across subjects with the behavioural face-inversion effect. This correlation was absent with the (opposite direction) fMR-face-inversion effect in LOC. These findings suggest that the FFA, but not LOC, is a neural source of the behavioural face-inversion effect.
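The across-subject analysis described above amounts to computing two difference scores per subject and correlating them. The following is a minimal sketch of that computation, assuming invented data for five hypothetical subjects; the variable names and values are illustrative and are not taken from Yovel et al. (2005b).

```python
from math import sqrt

# Hypothetical per-subject data (invented for illustration only).
# Behavioural accuracy (proportion correct) and ROI response (per cent signal change).
subjects = [
    # (acc_upright, acc_inverted, roi_upright, roi_inverted)
    (0.92, 0.70, 1.60, 1.10),
    (0.88, 0.75, 1.40, 1.15),
    (0.85, 0.80, 1.30, 1.20),
    (0.95, 0.68, 1.70, 1.05),
    (0.90, 0.78, 1.35, 1.10),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Inversion effect = upright minus inverted, computed separately for each subject.
behavioural_fie = [up - inv for up, inv, _, _ in subjects]
neural_fie = [up - inv for _, _, up, inv in subjects]

r = pearson(behavioural_fie, neural_fie)
print(f"across-subject correlation r = {r:.2f}")
```

Running the same computation on responses from a second region (for example the LOC) and comparing the two coefficients is what allows the dissociation between regions to be stated. A sample of five subjects is far too small for inference and is used here only to keep the example short.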

Second, the sensitivity of the FFA to identity information in faces was recently assessed using an event-related fMR-adaptation technique (Yovel et al. 2005b). As explained in §4a, in fMRI adaptation, a higher response in a given brain region to two successively presented stimuli when they are different than when they are the same indicates sensitivity to that stimulus difference in that region of the brain. We created face stimuli with subtle differences between the faces (e.g. the faces shared the same hair but differed subtly in face identity information) and found robust adaptation for these faces in the FFA but no adaptation to faces in LOC. These data again suggest that only the FFA (not the LOC) is sensitive to subtle differences between different faces.

Third, as mentioned above, Grill-Spector et al. (2004) found a higher FFA response on trials in which subjects correctly identified famous faces versus when they were incorrect on faces of the same individuals. Importantly, LOC did not show this trial-by-trial correlation with successful discrimination of faces, showing once again a greater involvement of the FFA than the LOC in face identification.

Finally, we reported that the right FFA response was similar when subjects discriminated faces that differed in their parts or in the spacing among these parts (Yovel & Kanwisher 2004). The FFA response to houses was much lower than to faces and also similar for the spacing and part tasks. In contrast, LOC showed a higher response on the part task than the spacing task for both faces and houses (see figure 4). These findings resonate with theories of object recognition, which emphasize the role of parts in representations of object shape (Hoffman & Richards 1984; Biederman 1987), and contrast sharply with theories of face processing, which emphasize holistic representations.

Figure 4

The response of the FFA and LOC to the face and house stimuli (see figure 3) when subjects discriminate the stimuli based on their parts (eyes and mouth for faces, and windows and doors for houses) or the spacing among the parts. Findings show a clear dissociation between the FFA, which responds more strongly to faces than houses but similarly on the spacing and part tasks, and the LOC, which shows a similar response to faces and houses and a higher response when subjects discriminate stimuli based on parts than based on the spacing among the parts.

Taken together, these findings indicate that the representations in the FFA differ in many respects from the representations in LOC. Thus, the FFA is not only selective for faces, but also generates a specialized representation of faces that is qualitatively different from the representations of faces in other regions. Next, we contrast the FFA with other face-selective regions.

(b) Dissociation between face-selective regions (FFA, OFA and STS)

Several studies have compared the response of the FFA to the response of the two other face-selective regions, the OFA in the lateral occipital cortex and what we will call the fSTS (a face-selective region in the posterior part of the superior temporal sulcus). Figure 2 shows these face-selective activations on an inflated brain from one subject. Overall, these studies suggest that the FFA and OFA are primarily involved in distinguishing between individual faces, whereas the fSTS apparently extracts other dimensions of faces such as their emotional expression and gaze (Haxby et al. 2000).

FFA versus OFA. First, Rotshtein et al. (2005) showed that the OFA is more sensitive to physical aspects of the face stimulus than the FFA. In their morphed face experiment, the OFA showed a similar response to two faces that differed physically regardless of whether the subject perceived the two stimuli as similar or different. This finding contrasts with the FFA, which was sensitive to the perceived similarity, but not the physical similarity, in their study. Second, in a recent study that investigated the neural basis of the face-inversion effect, Yovel et al. (2005b) found that the OFA showed a similar response to upright and inverted faces, and there was no correlation across subjects between the magnitude of the behavioural face-inversion effect and the difference in the response of the OFA to upright and inverted faces (OFA-face-inversion effect). In contrast, the FFA showed a higher response to upright than inverted faces and this difference was correlated across subjects with the behavioural face-inversion effect. Finally, whereas the FFA responds to first-order stimulus information about both face parts and face configurations, the OFA is sensitive only to face parts (Liu et al. 2003). Taken together, these findings suggest that the representation of faces in the FFA is closer to the perceived identity of the face, whereas the OFA representation reflects more closely the physical aspects of the face stimulus.

Evidence that the OFA may be a critical stage in the face-recognition pathway comes from the case of an acquired prosopagnosic patient with no OFA in either hemisphere (Rossion et al. 2003a). Although this result by itself makes sense, a puzzle arises from the fact that the same patient shows an FFA in fMRI. One possible account of these findings is that this patient's FFA is present but not functioning normally because normal input from the OFA is disrupted. Indeed, a recent paper has used fMRI adaptation to show that the FFA in this subject does not discriminate between individual faces (Schiltz & Rossion 2006).

FFA versus fSTS. Studies that have examined the response of both the FFA and the fSTS show clear functional dissociations between the two regions. First, two studies have found that the FFA but not the fSTS is correlated with successful face detection. Andrews & Schluppeck (2004) presented ambiguous stimuli (Mooney faces) that were perceived as faces on some trials but as novel blobs on others. Whereas the FFA response was stronger for face than blob percepts (see also Kanwisher et al. 1998), the fSTS showed no difference between the two types of trials. These findings are consistent with Grill-Spector et al. (2004), who found that the response of the FFA was correlated with successful detection of faces in brief masked stimuli, but the response of the fSTS was not. The failure to find a correlation with successful face detection in the fSTS when stimuli are held constant (or are similar) is somewhat surprising, given that this region by definition responds more strongly when faces are present than when they are not. In any event, the correlation with successful face detection of the FFA but not fSTS, which was found in both studies, shows a dissociation between the two regions.

Given the findings just described, it is not surprising that the fSTS shows no sensitivity to face identity information. The first study to report a dissociation between FFA and fSTS found a higher response in the FFA when subjects performed a 1-back task on face identity than on gaze information, and vice versa in the face-selective fSTS (Hoffman & Haxby 2000). Consistent with these findings, Grill-Spector et al. (2004) found no correlation of the fSTS response with successful identification of faces. Similarly, studies that used fMR adaptation found sensitivity to face identity in the FFA but not in the fSTS (Andrews & Ewbank 2004; Yovel et al. 2005b). The face-selective fSTS did show fMR adaptation for identical faces relative to faces that differed in expression, gaze and viewpoint (Andrews & Ewbank 2004). However, since the faces differed in all three dimensions, it is hard to know whether the fSTS was sensitive to only expression, gaze or head rotation or to any combination of the three.
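The inference drawn from fMR adaptation can be made concrete with a small sketch. The Python code below is illustrative only and is not taken from any of the studies cited. The function name `adaptation_effect` is hypothetical, all values are made-up placeholders rather than real data, and a simple difference of condition means is assumed as the measure. The sketch assumes that each ROI yields one response estimate per trial for pairs of faces that share an identity and for pairs that differ in identity.

```python
from statistics import mean


def adaptation_effect(resp_different, resp_same):
    """Mean response to different-identity pairs minus same-identity pairs.

    A clearly positive value means the response is reduced when identity
    repeats, i.e. the region distinguishes the two identities.
    """
    return mean(resp_different) - mean(resp_same)


# Made-up placeholder trial responses for two hypothetical ROIs (NOT real data).
responses = {
    "ROI_A": {"different": [1.4, 1.6, 1.5, 1.7], "same": [1.0, 1.1, 0.9, 1.0]},
    "ROI_B": {"different": [0.8, 0.9, 0.7, 0.8], "same": [0.8, 0.8, 0.9, 0.7]},
}

for roi, cond in responses.items():
    effect = adaptation_effect(cond["different"], cond["same"])
    print(f"{roi}: adaptation effect = {effect:.2f}")
```

In a real analysis the difference would be tested statistically across subjects; the sketch shows only the direction of the comparison.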

Several studies have found a robust face-inversion effect (higher response to upright than inverted faces) in the fSTS (Haxby et al. 1999; Leube et al. 2003; Yovel et al. 2005b). However, in contrast to the FFA, this difference between upright and inverted faces was not correlated with the behavioural face-inversion effect measured in a face identity discrimination task (Yovel et al. 2005b). These findings are consistent with the idea that the fSTS is not involved in face identity processing. Its higher response to upright than inverted faces may suggest that the computations performed in the fSTS to extract dynamic aspects of facial information are specific to upright faces.

Taken together, these data indicate a robust dissociation between the face representations in the fSTS and the FFA, in which the FFA but not the fSTS represents identity information.

(i) Section summary

The evidence reviewed here indicates that the FFA differs functionally in a number of respects from both the shape-selective LOC and the two other best-known face-selective regions of cortex, the OFA and fSTS. Functional ROI analyses are sometimes criticized for focusing narrowly on one brain region, while ignoring the rest of the brain. Here, we show that a functional ROI investigation of the FFA which is accompanied by similar analyses of nearby regions allows us to assess the extent to which the FFA response is indeed ‘special’. The clear functional dissociations between these regions also demonstrate that the functional localizers used to define these regions indeed are picking out functionally distinct regions, reinforcing the importance of studying them independently. Many of the functional dissociations described in this section would probably not be apparent in a group analysis, because the necessarily imperfect registration of physically different brains would blur across nearby but functionally distinct regions such as the FFA and LOC.

6. Open questions

As our review of the literature shows, considerable progress has been made in understanding the FFA and its role in face perception. However, fundamental questions remain unanswered. In our final section, we speculate on two of these: the developmental origins of the FFA; and the question of whether the FFA is unique in the cortex or whether it is one of a large number of other cortical regions specialized for domain-specific cognitive functions. We end with a summary of the main conclusions from this review.

(a) Origins of the FFA

How does the FFA arise in development? Recent neuroimaging studies show that the FFA is still developing into the early teenage years (Passarotti et al. 2003; Aylward et al. 2005; Golarai et al. 2005). Intriguing as this finding is, it does not tell us about the mechanisms that give rise to the FFA. Is it constructed by a process of experience-dependent cortical self-organization (Jacobs 1997) or is it partly innately specified? For the case of faces, this question is hard to answer because both experiential and evolutionary arguments are plausible, and we have very little data to constrain our speculation.

On the one hand, experience must surely play some instructive role in the development of face areas, given the ample evidence that neurons in the ventral visual pathway are tuned by experience (Baker et al. 2002; Op de Beeck et al. submitted). Evidence of such experiential tuning of face perception, in particular, is seen in the ‘other race effect’, in which behavioural performance (Malpass & Kravitz 1969; Meissner & Brigham 2001) and neural responses (Golby et al. 2001) are higher for faces of a familiar than an unfamiliar race, even if the relevant experience occurs after age 3 (Sangrigoli et al. 2005). On the other hand, at least some aspects of face perception appear to be innately specified, as infants less than 24 h old preferentially track schematic faces compared with visually similar scrambled or inverted faces (Johnson et al. 1991; Cassia et al. 2004). However, these two observations leave open a vast space of possible scenarios in which genes and environment could interact in the construction of a selective region of cortex such as the FFA.

What does seem pretty clear is that the development of normal adult face processing (and thus by hypothesis the development of the FFA) is constrained both anatomically and chronologically. First, the very fact that the FFA lands in roughly the same location across subjects, along with its predominant lateralization to the right hemisphere, suggests some constraints on its development. Second, neuropsychological patients who selectively lose face-recognition abilities as a result of focal brain damage are rarely, if ever, able to relearn this ability, suggesting that the remaining visual cortex (which is adequate for visual recognition of non-face objects) cannot be trained on face recognition in adulthood (but see DeGutis et al. (in press) for evidence of short-term improvement in face recognition in a case of developmental prosopagnosia following extensive perceptual training with faces). Third, this apparent inability to shift face processing to alternate neural structures may be set very early in development, as evidenced by a patient who sustained damage to the fusiform region when only 1 day old, and who as an adult still has severe difficulties in the recognition of faces (and some other object categories; Farah et al. 2000). Although it is not clear what is so special about this region of the fusiform gyrus that the FFA apparently has to live here, one intriguing clue comes from reports that face-selective cortex also responds more strongly to central than peripheral visual stimuli (even non-faces; Levy et al. 2001). This fact may suggest that face-selective regions reside in centre-biased cortex either because it has computational properties necessary for face processing, or because we tend to foveate faces during development (Kanwisher 2001).

Other clues about the development of specialized mechanisms for face processing come from individuals with ‘developmental prosopagnosia’, who have no brain damage discernible from MRI images or life histories, but who have severe and lifelong impairments of face recognition (Behrmann & Avidan 2005). For at least some of these individuals, the deficit is remarkably selective for face processing only (Duchaine et al. 2006), providing powerful converging support for the face-specificity hypothesis. Anecdotal reports suggest that developmental prosopagnosia may run in families (De Haan 1999; Duchaine & Nakayama 2005; Kennerknecht et al. 2006).

The possible heritability of this syndrome, its strong specificity for faces and its developmental nature all suggest that genetic factors may contribute to the construction of face-processing mechanisms. One as yet unresolved mystery is why many developmental prosopagnosic subjects have FFAs (Hasson et al. 2003; see also Vuilleumier et al. 2003). This may indicate that either the deficit in these subjects arises at a later stage of processing or the FFAs in these subjects exist but do not function normally (Schiltz & Rossion 2006). Although Avidan et al. (2005) have argued that the FFAs of prosopagnosic subjects show normal fMR adaptation for face identity, these studies were conducted using a blocked design which is subject to attentional confounds (see footnote).

Evidence that very early experience is also crucial in the development of normal adult face recognition comes from studies of individuals born with dense bilateral cataracts (Maurer et al. 2005). These people have no pattern vision until their cataracts are surgically corrected between two and six months of age. After surgery, pattern vision is generally intact, though not quite normal. Surprisingly, these individuals never develop normal face perception. As adults, they are impaired (relative to normal subjects) at discriminating between upright faces. It has been claimed that the deficit in these patients is specific to discriminations between faces on the basis of the position of the features, not the shapes of individual features, but the stimuli used in the study making this case (i.e. the Jane face) are problematic in two respects. First, they confound spacing/part changes with overall difficulty: studies that matched the task difficulty of the spacing and the part tasks found that prosopagnosic individuals showed deficits for both spacing and part discrimination tasks (Yovel & Duchaine 2006). Second, the face parts used by Le Grand et al. (2004) differed not only in shape, but also in contrast/brightness information (e.g. lipstick). A recent study showed that prosopagnosic individuals can discriminate normally between faces in which the parts differ in contrast/brightness in addition to shape information (Yovel & Duchaine 2006). Thus, to determine the role of spacing and part-based information in face recognition in these patients, it will be important to retest the early-cataract subjects with these more balanced stimuli in which face parts differ by shape and not by contrast/brightness information, which can be discriminated by non-face mechanisms.

Studies that examined holistic processing showed that these patients do not show the composite effect (described in §1; Young et al. 1987), indicating a failure to process faces holistically (Le Grand et al. 2004). Thus, pattern vision in the first few months of life is necessary for the development of normal face processing as an adult; years of subsequent visual experience with faces are not sufficient. Most intriguingly, it is early deprivation of input specifically to the right hemisphere that leads to adult impairments in face processing in these individuals; early deprivation of visual input to the left hemisphere does not (Le Grand et al. 2003). Thus, although these investigations point to a critical role of experience in the construction or maintenance of face-processing mechanisms, this experience must be directed to a specific anatomical target (the right hemisphere) and must occur very early in development. Two important pieces of this puzzle have yet to be answered empirically. First, is the deficit in cataract patients specific to face perception? Here, it would be particularly useful to measure the performance of these people on closely matched face and non-face stimuli such as those shown in figure 3. Second, what happens to the FFA in individuals with early bilateral cataracts? We speculate that they may have FFAs (as developmental prosopagnosic subjects do), but their FFAs may not function normally.

A brief comment is in order about studies reporting a supposed lack of FFAs in individuals with autism spectrum disorder (ASD; Schultz et al. 2000; Critchley et al. 2001; Pierce et al. 2001). This finding has been cited as evidence for a role of experience in the construction of the FFA, based on the argument that ASD subjects tend not to look at faces during development as much as normal subjects do. However, this argument has multiple flaws. First, few would doubt the conclusion that experience with faces is important in the development of the FFA. The interesting question is whether experience plays an instructive rather than a permissive role (Crair 1999). (An instructive role for experience might predict that people, or more likely monkeys, raised in an environment where faces had a very different structure would develop face-processing mechanisms that are selectively responsive to this alternate structure.) Studies of autism cannot answer this question. Second, even if individuals with ASD lacked FFAs as claimed, this would not demonstrate the importance of experience for the development of the FFA, because these disorders also have a genetic component which could itself be responsible for the lack of an FFA. Third, given the well-documented tendency of individuals with ASD to avoid looking at faces, any failure to find FFAs in subjects with ASD may result from the failure of the subjects to look at the stimuli during the scans (!). Indeed, studies that required subjects to fixate faces found normal face activation in the fusiform gyrus in subjects with ASD (Hadjikhani et al. 2004; Dalton et al. 2005). Thus, current investigations of FFAs in ASD subjects do not help us understand the developmental mechanisms by which FFAs are constructed.

One way to unconfound genetic and experiential factors in the development of category-specific regions of cortex is to consider a category for which a specific role of genes is unlikely: visual word recognition. People have only been reading for a few thousand years, which is probably not long enough for natural selection to have produced specialized machinery for visual word recognition (Polk & Farah 1998). Thus, strong evidence for a region of cortex selectively involved in the visual recognition of letters or words would provide an existence proof that experience alone with a given category of stimulus, without a specific genetic predisposition, can play an instructive role in the construction of a region of cortex that is selectively involved in the recognition of stimuli of that category. Some evidence has been reported for cortical specializations for visually presented letters (Polk et al. 2002) and words (Cohen et al. 2000). Ongoing work in our laboratory reinforces these conclusions, showing small letter-string-selective regions in most subjects tested individually, and further showing that these selectivities are shaped by experience. Of course, the fact that experience can apparently create cortical selectivities in the absence of a specific genetic blueprint for that cortical region does not imply that this is the origin of the FFA.

In sum, substantial evidence indicates important roles for both genetic factors and specific early experience in the construction of the FFA. Although a detailed account of this process remains elusive, the recent discovery of a possible homologue of the FFA in macaques (Tsao et al. 2003; see §2d) opens up the exciting new possibility of investigating the effect of early experience on the development of face-selective regions of cortex.

(b) Cortical specialization for other functions?

Of course, the evidence for the face-specificity hypothesis reviewed here need not imply that all of cognition is conducted by domain-specific mechanisms. Are faces unique in this degree of functional specificity or do other similarly selective regions of cortex exist in the human brain? Within the occipitotemporal pathway, we have characterized two other category-selective regions, the PPA, which responds selectively to images of places (Epstein & Kanwisher 1998) and the extrastriate body area (EBA) that responds selectively to images of bodies and body parts (Downing et al. 2001). Like the FFA, these areas can be found in more or less the same anatomical location in almost every normal subject. These category-selective regions thus constitute part of the basic functional architecture of the human brain.

Are these three category-selective regions just the tip of the iceberg, with dozens more in the occipitotemporal pathway waiting to be discovered? In a broad survey of 20 different stimulus categories, Downing et al. (2006) replicated the FFA, PPA and EBA in the vast majority of subjects, but failed to find other categories that produce the kind of strongly selective response in a focal region of cortex seen in the FFA, PPA and EBA. Of course, there are many ways to fail to detect a category-selective region that actually exists and new ones may be evident when we scan at higher resolution (Schwarzlose et al. 2005). Nonetheless, it appears that we do not have special regions of cortex on the spatial scale of the FFA, PPA and EBA for many common categories; faces, places and bodies may be ‘special’ in the cortex.

Why these categories and (apparently) not others? In our efforts to answer this question, explorations of other domains of cognition may provide important clues. The recent discovery of a region in the temporoparietal junction which is very selectively involved in the representation of other people's beliefs (Saxe & Kanwisher 2003; Saxe & Wexler 2005) shows that a high degree of cortical specificity is not restricted to the realm of high-level vision. Ongoing work is investigating the possibility that the human brain also contains cortical regions selectively involved in other domains of cognition, such as number (Dehaene et al. 2004; Shuman & Kanwisher 2004), language (Caplan 2001) and music (Peretz & Zatorre 2005).

7. Summary

In this review, we began with the classic question of whether face processing recruits domain-specific mechanisms specialized for face perception per se (the face-specificity hypothesis). This question has remained at the heart of theoretical and experimental work on face perception for decades. The research reviewed here shows that the field is in fact making progress in resolving this longstanding debate, as the evidence supporting the face-specificity hypothesis is getting ever stronger. Studies of the FFA have contributed importantly, enabling us to rule out five of the most widely discussed domain-general accounts of the function of this cortical region and supporting the face-specificity hypothesis.

We then turned to the question of what the FFA does with faces and what kinds of representations it extracts from them. Many studies implicate the FFA in extracting the perceptual representations of faces used in face recognition (and face detection), and several studies have further shown that the pattern of response in the FFA mirrors classic behavioural signatures of face processing such as the face-inversion effect. Further work using fMRI adaptation has enabled researchers to characterize the representations of faces in the FFA, which are partly invariant to simple image transformations (such as changes in size and position), but not to changes in viewpoint or lighting direction. In §5, we reviewed the evidence that the FFA shows not only functional specificity (for faces versus objects) but also area specificity: the response profile of the FFA differs in many respects from that of the nearby shape-selective LOC, as well as that of two other face-selective cortical regions (the OFA and the fSTS). We then speculated about the origins of the FFA in development, noting that experience with faces is likely to be crucial, but that evidence also suggests strong anatomical and chronological constraints on when and where this experience can be used in the construction of the FFA.

Finally, we returned to the question of domain specificity of mind and brain, pointing out that despite the very strong evidence for domain-specific mechanisms for face perception, there is no reason to assume that all or even most of cognition will be implemented in similarly domain-specific mechanisms. Thus, the nature and specificity of the mechanisms underlying other domains of cognition can only be resolved by detailed investigation of each. In this enterprise, the cognitive neuroscience of face perception will serve as an informative case study.

Acknowledgments

We would like to thank Chris Baker, Mike Mangini, Scott Murray and Rachel Robbins for their comments on the manuscript. We also thank Bettiann McKay for help with manuscript preparation. This research was supported by NIH grants 66696 and EY13455 to N.K.

Footnotes

  • One caveat should be noted here, however. Several fMRI-adaptation studies (Tarr & Gauthier 2000) have used blocked designs, which is problematic because subjects are likely to pay less attention to a block in which the identical stimulus is presented many times in a row than to a block in which each stimulus is new. Thus, this design confounds adaptation with attention (which is well known to affect the FFA response; Wojciulik et al. 1998), leading to potential overestimation of adaptation effects. For this reason, most current studies minimize this confound by using event-related methods to measure adaptation (Kourtzi & Kanwisher 2001).
