This article discusses four different scenarios to specify increasingly complex mechanisms that enable increasingly flexible social interactions. The key dimension on which these mechanisms differ is the extent to which organisms are able to process other organisms' intentions and to keep them apart from their own. Drawing on findings from ecological psychology, scenario 1 focuses on entrainment and simultaneous affordance in ‘intentionally blind’ individuals. Scenario 2 discusses how an interface between perception and action allows observers to simulate intentional action in others. Scenario 3 is concerned with shared perceptions, arising through joint attention and the ability to distinguish between self and other. Scenario 4 illustrates how people could form intentions to act together while simultaneously distinguishing between their own and the other's part of a joint action. The final part focuses on how combining the functionality of the four mechanisms can explain different forms of social interactions. It is proposed that basic interpersonal processes are put to service by more advanced functions that support the type of intentionality required to engage in joint action, cultural learning, and communication.
Humans have an amazing ability to cooperate with one another to achieve things they cannot achieve alone. Almost every single action we perform is embedded in a long chain of events that involves hundreds, if not thousands, of interacting people. Think of the simple act of making coffee. The coffee maker had been designed by a team of engineers, assembled by a team of workers and delivered to a store through the workings of a logistics company before you went to buy it. A similar complex chain of social interactions brought coffee beans, milk and sugar, as well as mug and spoon into your kitchen. Whereas some forms of human social interaction appear to be unique in their complexity, there are also many basic forms of human social interaction, some of which seem to be shared with other animals (cf. Barresi & Moore 1996; Tomasello & Call 1997), be it bees communicating the location of food sources (e.g. Riley et al. 2005), a school of fish moving in synchrony (e.g. Stone et al. 2003), lions hunting together (Stander 1991) or apes grooming one another (de Waal 1989). How can we distinguish between different forms of social interaction, and what are the mechanisms underlying them?
We will discuss four different scenarios to specify increasingly complex mechanisms that enable increasingly flexible social interactions. The key dimension on which these mechanisms differ is the extent to which organisms are able to process other organisms' intentions and keep them apart from their own. Of particular interest to us is the ability to engage in joint action, defined as any form of social interaction where two or more individuals coordinate their actions in space and time to bring about a change in the environment (Sebanz et al. 2006a). We will start with a brief review of previous thinking about the role of intentions in social interaction. In four scenarios, we will then move from the interaction of ‘intentionally blind’ organisms to that of organisms that can simultaneously keep their own and others' intentions in mind, discussing different notions of intention used in current research as we go along. Finally, we will consider how combining the functionality of the four mechanisms can explain different forms of social interactions. Our guiding hypothesis is that basic interpersonal processes are put to service by more advanced functions that support the type of intentionality required to engage in joint action.
2. The role of intention in previous thinking
The distinction between controlled and automatic processing (Schneider & Shiffrin 1977) has dominated psychological research on social cognition for the last three decades (Bargh 1984; Wegner & Bargh 1997; Greenwald et al. 2002) and continues to be strong (e.g. Dijksterhuis & Nordgren 2006). In this distinction, consciousness and intentionality are equated (controlled=conscious, automatic=unconscious), leading to a categorical distinction between intentional and non-intentional processes within individual cognitive systems. Accordingly, there has been a strong focus on individual processing of social information, which is alive and well in current research in social cognitive neuroscience (Lieberman 2007; Ochsner 2007). Social psychologists have been keen to demonstrate how social stimuli affect social behaviours outside of awareness (e.g. Banaji & Hardin 1996; Dijksterhuis & van Knippenberg 1998; Dasgupta & Greenwald 2001; Bargh & Williams 2006). Research on shared intentions and reciprocity in social interactions has tended to be restricted to the conscious, controlled level (see Smith & Semin (2004) for an alternative approach).
Individual higher-level cognition has also been the focus of the philosophical mainstream within cognitive science, addressing the representation of mental states like desires and beliefs (e.g. Fodor 1975) rather than intentions. Most relevant for the present purpose is the work on Theory of Mind, our ability to attribute mental states to others (see Flavell 2004 for a review). This work has guided research on social cognitive development towards the study of explicit knowledge about others and has influenced research on the neural underpinnings of mind reading (e.g. Vogeley et al. 2001; Frith & Frith 2006; Saxe 2006; Apperly 2008). One central question of Theory of Mind research has been how individuals reason about one another (e.g. Wimmer & Perner 1983; Repacholi & Gopnik 1997). Some theories suggest that knowing and reasoning about the social world are not much different from knowing and reasoning about other domains such as physics (Gopnik & Wellman 1992; Saxe 2005). Only recently has there been increasing interest in how the development of intentional action affects social understanding and social interaction (Gergely et al. 2002; Elsner & Aschersleben 2003; Sommerville & Woodward 2005; Tomasello et al. 2005; Falck-Ytter et al. 2006).
In another philosophical approach, philosophy of action (e.g. Searle 1983; Bratman 1987; Mele 1992; Pacherie 2005), intention is a central construct. Philosophers of action have explicitly addressed intentions arising in reciprocal social interaction where people work together (Bratman 1992; Tuomela 1993; Gilbert 2003). One main issue of this debate is whether individuals' intentions mainly refer to their part in a social interaction or whether they refer to what the group as a whole wants to achieve (‘we-intentions’, see Pettit & Schweikard 2006). Philosophical approaches to joint action have influenced empirical work on collaborative activities with a focus on language use (Clark 1996; Brennan 2005), but otherwise have rarely been subject to empirical testing. At the same time, the contribution of lower-level processes to social interaction has hardly been considered. This has led philosophers to postulate complex intentional structures that often seem to be beyond human cognitive ability in real-time social interactions—leading to a sort of ‘intention inflation’.
In contrast to the approaches described above, several schools of thought, broadly pertaining to embodied cognition (cf. Clark 1997; Barsalou 2008), have stressed that higher-level cognition is grounded in basic perception and action processes or emerges out of the interaction of the organism with its environment (Gibson 1979; Smith & Thelen 1994; Port & van Gelder 1995; Van Orden et al. 2003). Only recently, it has been recognized that these assumptions may have fundamental implications for social interaction (Rizzolatti & Arbib 1998; Barsalou et al. 2003; Gallese et al. 2004; Arbib 2005; Knoblich & Sebanz 2006; Marsh et al. 2006; Sebanz et al. 2006a; Sommerville & Decety 2006; Spivey 2007). The core idea is that basic perceptual and motor processes are sufficient to enable many basic forms of social interaction and are still part of the machinery that makes more complex social interactions possible.
If one assumes that these basic forms of social interactions are not void of intentions (Shaw 2001; Jordan & Ghin 2007), it seems possible that the evolution of intentional mechanisms could be the key dimension that has enabled increasingly sophisticated social interactions (Barresi & Moore 1996; Tollefsen 2005; Tomasello et al. 2005; Pacherie & Dokic 2006). In the following we will spell out this idea based on recent empirical findings, thereby attempting to bridge the gap between embodiment accounts and purely cognitive accounts of social interaction (cf. Barresi & Moore 1996). We start with a scenario where organisms lack any functionality that would allow them to share or recognize intentions.
3. Scenario 1: social couplings between ‘intentionally blind’ individuals
Scenario 1 illustrates social interactions as envisaged by ecological psychology (Marsh et al. 2006). In this scenario, the behaviour of two moving actors A1 and A2 can become coupled either because they mutually affect each other's behaviour (entrainment; figure 1a) or because an object (O) in the environment provides the same individual action opportunity for both actors (simultaneous affordance; figure 1b). To illustrate entrainment, two people sitting next to each other in rocking chairs tend to synchronize their individual rocking frequencies. To illustrate simultaneous affordance, buffets invite hungry people to pile food onto their plates, resulting in converging movement towards the buffet and a high density of people moving around it.
In order to properly interpret concepts such as affordance and entrainment, it is important to keep in mind that ecological psychology is probably the most radical version of embodiment, rejecting any notion of representation that is internal to the actor. In this emphatically interactionist view of how actors and environment relate (Gibson 1979; Turvey 1990; Shaw 2001), it is assumed that information arises as an invariant relation between actors' dynamically changing movements and their dynamically changing perception. As a consequence, perception and movement reciprocally (co-)specify each other. In contrast to most cognitive science notions, intentions are not considered to be a mental or psychological state within a person. Instead they are considered to be a property of the ecosystem (Shaw 2001) arising in the interaction between organisms and their environment. Accordingly, intentions are considered to be an aspect of the physical world rather than the mental world. A key concept that illustrates this notion is ‘affordance’, which refers to the ‘action possibilities’ that a particular environment provides for an organism given the organism's particular action repertoire. A further implication of the ecological approach is that actor–object relations and actor–actor relations are considered to be governed by the same dynamical principles.
The central role of dynamical relationships in the ecological framework has led researchers in this field to primarily explore temporal synchronization during social interaction. The first studies tested the assumption that the same dynamical principles hold when a single person coordinates the movement of two limbs (Kugler & Turvey 1987; Kelso 1995) and when two people coordinate the movement of one limb each (Schmidt et al. 1990; Mottet et al. 2001). This is expected because in both cases two moving entities become entrained, regardless of whether they belong to one or two people (Spivey 2007). It was found that participants swinging one leg each from left to right in a coordinated fashion showed a dynamical relation between their legs, which is typically observed in single participants moving two limbs in a coordinated fashion. In particular, as they sped up together, they switched from a less stable parallel mode where they both synchronously swung their legs in the same direction (≫, ≪) to a more stable symmetric mode where they both synchronously swung their legs in opposite directions (<>, ><). The same pattern was observed when single participants moved two limbs synchronously. Similar results have been obtained for pendulum swinging (Schmidt et al. 1998).
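The mode switch described above is often formalised with the Haken-Kelso-Bunz (HKB) relative-phase equation, which the text itself does not state. The following Python sketch is offered only as an illustration under that assumption. Here `a` and `b` are hypothetical coupling parameters whose ratio b/a is assumed to fall as movement frequency rises, relative phase π stands in for the less stable mode, and relative phase 0 stands in for the more stable mode. All function names and parameter values are invented for this example.

```python
import math


def hkb_rate(phi, a, b):
    """Rate of change of relative phase phi under the HKB equation:
    d(phi)/dt = -a*sin(phi) - 2*b*sin(2*phi)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def settle(phi0, a, b, duration_s=200.0, dt=0.01):
    """Euler-integrate the relative phase from phi0 and return its final value."""
    phi = phi0
    for _ in range(int(duration_s / dt)):
        phi += dt * hkb_rate(phi, a, b)
    return phi


def less_stable_mode_survives(a, b):
    """Linear stability of the fixed point at phi = pi.
    The slope of hkb_rate at pi is a - 4*b, so the mode is stable only if b/a > 1/4."""
    return a - 4.0 * b < 0.0


# Assumed 'slow movement' (b/a = 1): a small perturbation away from pi decays.
slow_final = settle(math.pi - 0.1, a=1.0, b=1.0)

# Assumed 'fast movement' (b/a = 0.1): the same perturbation grows and the
# relative phase ends up at 0, the more stable mode.
fast_final = settle(math.pi - 0.1, a=1.0, b=0.1)
```

In this sketch the transition is one-directional: the mode at relative phase 0 remains stable for every positive `a` and `b`, so once a pair has switched, nothing in the model pushes it back. The same equation applies whether the two oscillating limbs belong to one person or two, which is the point of the studies cited above.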
Later studies showed that similar temporal entrainment effects occur even when people are not instructed to synchronize their movements (Schmidt & O'Brien 1997; Richardson et al. 2005). A suitable example for this comes from a study where two participants sat side by side in rocking chairs that had more or less similar natural rocking frequencies (Richardson et al. 2007, 2008). This was manipulated by positioning weights on a platform attached at the base of the chair (the higher the weight, the slower the natural rocking frequency). Participants either looked at each other's chairs or looked away from one another. In the condition where they looked at one another, they tended to rock together in synchrony even when the natural frequencies of the rocking chairs differed. In a sense, participants rocked against natural frequencies in order to rock in synchrony.
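The rocking-chair result can be illustrated with a toy model of two phase oscillators that have different natural frequencies and a symmetric sinusoidal coupling term. This is a Kuramoto-type sketch chosen for simplicity, not the analysis used in the cited studies. The frequencies, the coupling strength standing in for visual contact, and all function names are hypothetical.

```python
import math


def simulate_relative_phase(freq_a_hz, freq_b_hz, coupling, duration_s=60.0, dt=0.001):
    """Euler-integrate two coupled phase oscillators.
    Returns the (unwrapped) relative phase theta_b - theta_a at every step."""
    omega_a = 2.0 * math.pi * freq_a_hz
    omega_b = 2.0 * math.pi * freq_b_hz
    theta_a, theta_b = 0.0, 1.0  # arbitrary starting phases
    history = []
    for _ in range(int(duration_s / dt)):
        diff = theta_b - theta_a
        # Each oscillator is pulled towards the other's phase.
        theta_a += dt * (omega_a + coupling * math.sin(diff))
        theta_b += dt * (omega_b - coupling * math.sin(diff))
        history.append(theta_b - theta_a)
    return history


def is_entrained(history, tail_fraction=0.2, tolerance=0.05):
    """Treat the pair as entrained if the relative phase stays nearly
    constant over the final part of the simulation."""
    tail = history[int(len(history) * (1.0 - tail_fraction)):]
    return max(tail) - min(tail) < tolerance


# Chairs with different assumed natural frequencies (0.6 Hz and 0.7 Hz).
looking_at_each_other = simulate_relative_phase(0.6, 0.7, coupling=1.0)
looking_away = simulate_relative_phase(0.6, 0.7, coupling=0.0)
```

In this model the relative phase obeys d(phi)/dt = delta_omega - 2 * coupling * sin(phi), so the pair phase-locks only when the frequency difference is no larger than twice the coupling strength. With the values above, the coupled pair settles at a constant phase lag despite the frequency mismatch, whereas the uncoupled pair drifts apart continuously.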
Whereas entrainment arises in a direct interaction between two (or more) organisms perceiving each other, the ecological framework seems to leave room for another mechanism of coordinated behaviour that is mediated by object affordances (cf. ‘funktionale Toenung’, von Uexküll 1920; Gibson 1977). When two organisms have a similar action repertoire and perceive the same object, they are likely to exhibit similar behaviours because the object ‘affords’ (invites) the same actions for them. Although object affordances have been studied extensively in research on individual perception (Jones 2003), we are not aware of any psychological research looking at the role of affordances in coordinating behaviour between different individuals.
Note that some researchers have started to explore how the presence of another person provides affordances for acting together (Richardson et al. 2007, 2008). This is different from the mechanism we consider here, because in our scenario actors do not perceive actor–object relations. We mean the simple fact that if somebody spreads bread crumbs on a Venetian Piazza he/she will probably be surrounded by dozens of pigeons that, presumably, are not looking for company. Such simultaneous affordances can probably act as a magnet for ‘social encounters’ which increase the likelihood of direct interactions between individuals, such as entrainment.
4. Scenario 2: relating to others through action simulation
Scenario 2 depicts social interaction as envisaged by extensions of James's ideomotor theory (James 1890; extensions: Prinz 1997, Jeannerod 1999, Hommel et al. 2001) and supported by findings on mirroring (Decety & Grezes 1999, 2006; Rizzolatti & Craighero 2004). The ideomotor approach maintains that individuals perceive others' actions in the light of their own action repertoire (see figure 1c,d). Perceiving an actor manipulating an object activates a corresponding representation of the perceived action in the observer. Through this match, the observer simulates performing the perceived action. The same applies to perceiving how one actor directs his/her actions at another. To illustrate, when one sees someone grasping a glass of beer, one's own motor programmes of grasping a glass get partially activated leading to a simulation of the observed action. Similarly, when one sees someone patting a third person on the shoulder, the motor programmes for patting will be partially activated in the observer.
In contrast to the ecological approach, the ideomotor approach puts intentional representations into the organism and postulates an interface between perception and action that allows observers to simulate intentional action in others. Two central components of this interface can be distinguished. The first component is a representational level of common codes (Prinz 1997) capturing aspects of a situation that remain invariant across situations where one acts upon objects or individuals oneself and situations where one perceives another person acting upon objects or individuals. These invariants can lie in the effect the action has on the object (action effect, Hommel et al. 2001) or in the movement with which the action is implemented. The second component consists of simulation mechanisms tapping into the observer's motor system (Blakemore & Decety 2001; Grush 2004; Wilson & Knoblich 2005). These mechanisms can be used not only to derive action goals during or after observing actions (Bekkering et al. 2000; Rizzolatti & Craighero 2004; Hamilton & Grafton 2006), but also to predict the outcomes of actions as they unfold (Knoblich & Flach 2001; Umiltà et al. 2001; Schubotz & von Cramon 2004; Wilson & Knoblich 2005). In a nutshell, the assumption is that when one observes others' actions, one can project intentional relations guiding one's own object- or person-directed actions onto the observed actions.
There is rich empirical evidence to support the mechanisms outlined above (for a recent review see Keysers & Gazzola 2006), ranging from single cell studies in monkeys to behavioural and brain imaging studies in humans. The ideomotor approach received broad attention following the discovery of mirror neurons in the ventral premotor (Gallese et al. 1996) and inferior parietal (cf. Fogassi et al. 2005) cortex of macaque monkeys (hence the term ‘mirroring’). These neurons fire not only when the monkey performs an object-directed action, such as grasping a grape, but also when the monkey observes another individual perform the same action. Thus, mirror neurons provide a neural substrate for the direct perception–action match described above. In humans, brain activity is observed in analogous areas not only when they observe object-directed actions but also when they observe pure bodily movements (Decety et al. 1997; Buccino et al. 2001, 2004; Grezes et al. 2003), such as dancing (Calvo-Merino et al. 2005; Cross et al. 2006). Behaviourally, the close link between perception and action manifests itself in facilitation and interference effects, where it is easier to perform the same actions one is concurrently observing (Stürmer et al. 2000; Brass et al. 2001) and more difficult to perform actions opposite to those concurrently observed (Kilner et al. 2003).
So far, our discussion has focused on how the ideomotor machinery allows an observer to identify actor–object relations. Hardly explored so far is the question of how actor–actor relationships are perceived (but see Prinz in press). Does action simulation also occur when one perceives an organism acting upon another, such as when a monkey perceives one monkey grooming another? It would be surprising if this were not the case; otherwise one would need to assume that monkeys are able to distinguish between actor–object relations and actor–actor relations and that a perception–action match occurs only for the former. Another question that arises in this context is how the action simulation mechanism deals with situations where two organisms interact. Whereas actor–object relations are asymmetrical by definition (actor acts upon object), actor–actor relations are frequently symmetric with two organisms acting upon each other. This raises the question of whose actions get simulated: those of one actor, the other or both? We will come back to this issue in our discussion of the next scenario.
5. Scenario 3: sharing perceptions with others
Scenario 3 (see figure 1e) depicts social interaction as envisaged by developmental psychologists studying joint attention (Moore & D'Entremont 2001; Tomasello & Carpenter 2007). Research on joint attention addresses the question of how people manage to attend to the same objects or actors in the world together (Eilan et al. 2004). Two different components of joint attention can be distinguished. One is the ability to derive the location an observed actor is attending to, using cues such as eye gaze (Flom et al. 2006) or body orientation (Jellema et al. 2000) to simulate what the other perceives or does not perceive. A further critical component is to relate one's own and the observed actor's perceptual experiences, and in particular, to determine whether these experiences are shared (Tomasello & Carpenter 2007). Thus, the focus is on shared perceptions rather than shared intentions. However, we believe that the self–other distinction arising in the attention domain may pave the way for keeping one's own and others' intentions apart.
Empirical studies on joint attention have focused more on developmental trajectories than on specific mechanisms. One central finding is that the ability to derive the location to which an observed actor is attending (e.g. gaze following) develops earlier than the ability to relate one's own and others' perceptions, both phylogenetically (Kaminski et al. 2005) and ontogenetically (Tomasello et al. 2005). Gaze following has been shown in behavioural studies on goats (Kaminski et al. 2005), dogs (Hare & Tomasello 2005) and chimpanzees (Hare et al. 2000). Single cell studies in monkeys have revealed that neurons in the anterior part of the superior temporal sulcus may crucially contribute to the ability to follow others' gaze and to determine what they are seeing (e.g. Jellema et al. 2000). In contrast, the ability to relate one's own and others' perceptions seems to be present only in humans (Tomasello & Carpenter 2007) emerging from 12 months of age onwards (Moore & D'Entremont 2001; Liszkowski et al. 2004).
The mechanisms behind the ability to relate one's own and others' perceptions are still somewhat underspecified. Tomasello et al. (2005, p. 682) refer to a ‘special motivation to […] perceive together with others’. However, the functional mechanisms that need to be in place to achieve this ability are not spelt out. Clearly, in order to determine to what extent perceptions are shared with others, one needs to be able to keep the perceptions of self and other apart. This is a crucial difference to the previous two scenarios. In the present scenario, an observed actor–object or actor–actor relationship leads to a perceptual simulation of what the actor perceives (cf. current imagined schema, Barresi & Moore 1996), which is separable from one's own perception. In addition to this separation, one needs to postulate mechanisms that compare the two perceptions. Such mechanisms may drive the development of new actions to guide others' attention, e.g. pointing somewhere the other should look (Kita 2003).
It is somewhat unsatisfying that one needs to suddenly resort to a mechanism that keeps self and other apart without being able to explain how it came into existence. One possible solution is to look for aspects of the previous two (simpler) scenarios that can support a developing self–other differentiation (cf. Rochat 2003). In the ecological scenario 1, there is an asymmetry with respect to how one interacts with actors (entrainment) or with objects (affordances). Whereas objects tend to remain stationary, other actors tend to move. This could lead to particular invariances that only exist in the interaction with other actors and would thus provide dynamical cues to distinguish between actors and objects. Such an animate–inanimate distinction (e.g. Wheatley et al. 2007) could be a first step towards distinguishing between self and other, because it paves the way for ‘conceiving’ of oneself as an actor and not an object.
Within the ideomotor scenario 2, a further avenue towards distinguishing self and other arises through the asymmetry between actor–object relations and actor–actor relations. The latter are special in that one can not only simulate carrying out an observed action but that one may also develop the ability to simulate what it feels like being the recipient of the observed action. Evidence for this type of simulation comes from brain imaging studies demonstrating that the brain areas involved in feeling touch are also activated when one sees someone else being touched (e.g. Keysers et al. 2004). Similarly, observing someone receiving a painful stimulation leads to activation in brain areas involved in feeling pain (e.g. Singer et al. 2004). The two different types of simulation, in turn, could give rise to a basic distinction between actor and recipient (agents and patients), which could be a further building block the self–other distinction rests on. Simulating the two roles of the actor–actor relationship could pave the way for conceiving of oneself as actor and recipient and to attribute the complementary role to an entity like oneself, which becomes the ‘other’. These and further developments could become channelled into a coherent representation of self and other and thus provide the functionality needed for scenario 3.
6. Scenario 4: intending with others
Scenario 4 (see figure 1f) illustrates the intentional machinery that completes the minimal functionality that is needed to engage in joint action. Unlike the previous three scenarios, we cannot link this scheme directly to a particular theoretical approach. It shares some similarities with the Theory of Mind approach because a central component is to distinguish between one's own and others' mental states. However, we focus on the representation of intentions rather than beliefs or desires (cf. Pacherie 2005). Furthermore, our actors share the same physical environment enabling them to derive intentions from perceived actor–object and actor–actor relations. In contrast, Theory of Mind research typically uses more abstract tasks where participants are not directly involved in a social interaction.
We propose that three critical components are needed to explain how people can form intentions to act together while simultaneously distinguishing between their own and the other's parts of a joint action. First, actors need to be able to derive the intentions behind object-directed actions (Runeson & Frykholm 1983; Grezes et al. 2004) and actor-directed actions (Heider & Simmel 1944; Schultz et al. 2005). This is different from the action simulation described in scenario 2, because it implies that the other is perceived as an intentional agent (Dennett 1987).
Second, the actors in scenario 4 need to be able to keep derived intentions separate from their own intentions. This could be achieved through a similar mechanism of self–other distinction as the one needed to keep one's own and others' perceptions apart in scenario 3. Whereas these assumptions are straightforward, the third assumption is critical and miraculous at the same time. There needs to be an intentional structure that allows an actor to relate his/her own intention and the other's intention to an intention that drives the joint activity (Roepstorff & Frith 2004). In other words, two actors need to share an intention, but they also need to plan their respective parts in order to achieve the intended outcome. This creates a link to philosophical accounts of joint action as described in the introduction, but we will argue below that people only resort to this high level if the simpler functionality described in the previous scenarios proves insufficient.
Even though the third assumption sounds quite intricate, there is some empirical evidence providing at least partial support for it. When distributing two parts of a task between two actors, we found that each actor represented not only his or her own part of the task but also the other's part of the task (Sebanz et al. 2003, 2005). Compared with performing the same part of the task alone, acting together led to increased demands on executive control, as actors needed to decide whether it was their turn or the other's turn to act (Sebanz et al. 2006b). Finally, using fMRI (Sebanz et al. 2007), we found evidence that acting together led to increased brain activity in areas involved in self–other distinction (ventral mediofrontal cortex, cf. Brass et al. 2005; Mitchell et al. 2005; Amodio & Frith 2006). Thus, these findings suggest that humans have a strong tendency to take others' tasks (and the related intentions) into account, while at the same time possessing mechanisms to keep them apart. An open question is how joint intentions are formed, and how individual intentions are related to them when two people perform a joint action.
7. Linking the scenarios
So far, we have described different social functions in isolation. However, we believe that their full power only reveals itself once they work in concert. Thus, we do not think of these functions as being contained in relatively isolated modules but as organized in a highly interactive hierarchical network with the simple sensorimotor mechanisms described in scenario 1 at the bottom and the joint intentionality described in scenario 4 at the top. This is similar to the assumptions made by hierarchical models of individual action control (Koechlin et al. 2003; Pacherie 2005; Jordan & Ghin 2007). Of course, this implies that the functionality of lower levels is retained when more complex functions arise and that the functionality of the latter depends on the former. At the same time we assume that simpler mechanisms tend to be controlled by more complex ones. As a consequence, the functionality of lower levels is embedded in new control structures and can be used in a more flexible way. In the following, we will illustrate how embedding the functionality of scenarios 1–3 within the intentional machinery postulated in scenario 4 can support different forms of joint action.
Embedding mechanisms for entrainment and simultaneous affordance within joint intentionality allows one to understand a variety of joint actions that require synchronous actions. Examples where joint action depends on entrainment are easily found in domains like music, art and sport. Think of two drummers creating a particular rhythm together or show dancers like Radio City Music Hall's Rockettes moving in synchrony. Some of the studies on interpersonal synchronization described earlier actually presuppose this kind of interaction between joint intentionality and entrainment. Instructing participants to synchronize their actions (e.g. Schmidt et al. 1990) implies that each of them will have the intention of performing the same action as the other participant at the same time. This is usually not discussed in the ecological accounts of social interaction because it would require assuming some form of internal representation of intention, which is contrary to the fundamental ecological credo (Marsh et al. 2006).
Combining simultaneous affordance with joint intentionality allows one to address the issue of how different actors perform non-identical actions upon the same object to achieve a joint goal. For example, the way people lift a two-handled basket depends on whether they lift it alone or together. When alone, a person would normally grasp each handle with one hand. When together, one person would normally grasp the left handle with his/her right hand and the other person would grasp the right handle with his/her left hand. Thus, embedded in joint intentionality, simultaneous affordance changes into a joint affordance, inviting two different actions from two co-actors. In other situations, joint affordance can help co-actors to determine when one needs the help of the other. This was demonstrated in a recent experiment (Richardson et al. 2007, 2008; experiment 4) where participants lifted planks of ascending or descending length from a conveyor belt by touching them at their ends. Of interest was at which length participants would switch from solo lifting to joint lifting and vice versa. The result of interest here was that the switch occurred as a function of the participants' combined arm span. Thus, the plank's affordance depended on the team's joint action capabilities.
What can we gain from embedding action simulation (scenario 2) in joint intentionality (scenario 4)? The main gain is that it becomes possible to keep apart action simulations that pertain to one's own actions from action simulations that pertain to others' actions (cf. Knoblich & Jordan 2002; Decety & Grezes 2006). The idea is that common codes and the ensuing simulation mechanisms can be used to plan one's own actions as well as to predict others' actions and their outcomes, in parallel and in relation to a jointly intended outcome.
Examples where such parallel simulations would come in handy abound in music, art and sports. Consider two jazz musicians improvising together. Each of them needs to predict what the other will be doing next in order to keep dissonances within the range allowed by a particular style. Likewise, aerial acrobats not only need exquisite timing but also need to predict how their partner's movements will unfold. Finally, the happiest moments in watching football arise when the midfielder of the team one supports passes the ball to a spot that the striker will reach before the defenders of the opposing team can catch up with him.
Using a simple tracking task that can be performed alone or together, Knoblich & Jordan (2003) investigated whether teams are able to coordinate their actions with respect to future outcomes of their joint activity as successfully as a single actor performing the whole task alone. The results showed that co-actors took their respective actions into account and learned to reciprocally adjust their actions so that their coordination was almost indistinguishable from the coordination individuals could achieve with their two hands. In this task, good coordination could only be achieved through integrating the effects of one's own and the other's actions into a prediction of the joint outcome. Thus, the findings provide behavioural evidence for the parallel simulation assumption.
A recent brain imaging study where people performed actions identical or complementary to those they had observed provides further support for this assumption (Newman-Norlund et al. 2007). Activation in areas pertaining to the mirror system (premotor and parietal cortex) was stronger when the participants performed complementary actions than when they performed the same action as the one observed. This suggests that the perceived action and one's own action were simulated in parallel. Finally, in behavioural studies using the same experimental paradigm, it was found that participants were as fast at responding to pictures of actions by performing complementary actions as by performing identical actions (Van Schie et al. submitted). This is surprising given that perceiving an action should activate the corresponding motor programme, facilitating performance of the same action. However, the finding can be explained if one assumes that a higher-level intentional structure controlled action simulation.
Finally, what can we gain from embedding joint attention mechanisms (scenario 3) within joint intentionality? Tomasello et al. (2005) provide a detailed discussion of this question, which will not be repeated here. In a nutshell, being able to represent one's own and others' intentions allows one to determine whether one's partner has sufficient and adequate perceptual information to perform his/her part of the task. If this is not the case, then one can employ attention-guiding gestures such as pointing in order to actively direct the other's attention to locations providing this perceptual information (cf. Liszkowski et al. 2004). For instance, when repairing a bike together, one may point out the location of the screwdriver to one's partner when one sees the partner looking around while holding a screw in his/her hand.
8. Beyond immediate social interaction: culture
In the previous sections we have seen how basic social interactions differ with respect to the extent to which others are perceived as acting intentionally. Embedding basic processes of social interaction and action understanding within joint intentionality has allowed us to address a broad variety of joint activities. However, so far we have ignored two main players that have probably revolutionized the ways in which organisms can interact: tool use and symbolic communication. Of course, we are not able to do justice to these players in this final section (nor will we be able to explain how coffee makers get into kitchens). Instead, we will provide two interfaces for our tool-less, non-verbal organism, which may help to get it admitted into a larger society.
Let us start with tools. First traces of tool use are likely to be found in our actors in scenario 2. At this level, individuals may be able to discover that they can use one object to manipulate another. This would lead to an extension of their action repertoire that includes not only pure actor–object relations but also actor–object relations that are mediated by what we would consider a simple tool such as a stick. In fact, there is evidence that macaque monkeys are well able to learn to use tools in order to obtain desirable objects (Iriki et al. 1996; Imamizu et al. 2000) and that apes make use of tools in the wild (Breuer et al. 2005). In accordance with the action simulation account, this extension of one's own action repertoire would probably lead to a corresponding understanding of other actors performing similar tool-mediated actions.
However, scenario 2 does not entail any means for learning tool use through imitation. We suggest that the full machinery for joint intentionality described in scenario 4 needs to be in place before the know-how about tools can be passed on between individuals or generations. In other words, whereas each actor in scenario 2 needs to discover a tool anew, actors in scenario 4 have the ‘intentional equipment’ to find ways of sharing tool-related discoveries. The via regia to achieve this is, of course, imitation. Thus we concur with Tomasello et al.'s (2005) view that imitation of tool use became possible only once joint intentionality was in place and that it went hand in hand with tool-making abilities that created the first artefacts.
Once these two abilities were in place, cultural evolution could thrive. However, it is very important to remember that even the low-level mechanisms described in scenario 1 gain new relevance once cultural transmission starts. The creation of enduring artefacts opened up a whole new world of affordances and ways of interacting with the world in a direct manner. The resulting fact that artefacts embody socially transmitted knowledge about ways of interacting with objects is hardly ever acknowledged in the research on object perception.
Turning to symbolic communication, the potential interface with our non-verbal system is straightforward. We concur with Clark (1996) that language can be regarded as an extremely powerful coordination device for joint action, cementing the self–other distinction, defining different potential roles for actors and extending the temporal horizon of joint activities. Accordingly, joint intentionality would be a prerequisite for symbolic communication. Discussing the many different accounts of language evolution is well beyond the scope of this article. However, we do believe that even symbolic communication remains grounded to some extent in the basic interpersonal functions described in our scenarios.
A study by Shockley et al. (2003) clearly demonstrates how the entrainment mechanism of scenario 1 reappears in conversation. They showed that people talking to each other synchronized their postural sways (micromovements of the body that are needed to maintain an upright body position) even when they could not see one another. This demonstrates that the rhythm of language (prosody) remains coupled to the rhythm of the body.
Studies on mimicry during conversation (Chartrand & Bargh 1999) suggest a link to the action simulation mechanisms of scenario 2. People talking to each other have a tendency to mimic each other's mannerisms, such as wiggling a foot or touching the face. This can be interpreted as an overt behaviour reflecting a spillover of non-inhibited action tendencies that arise through simulation and that take on the function of keeping up the bond between speakers (Lakin & Chartrand 2003).
Finally, the work of Richardson & Dale (2005) provides evidence for the contribution of the joint attention mechanisms of scenario 3 to conversation. They recorded eye movements of speakers talking about the happenings between different characters from a famous TV series while the speakers were looking at their pictures. The eye movements of listeners who could remember well what the speaker had said coincided more closely in space and time with the speaker's eye movements than those of listeners who remembered less. This shows that joint attention can play a crucial role in successful conversation. A similar conclusion can be drawn from Clark & Krych's (2004) finding that people who were attending to the same workspace while performing a joint action communicated less, and more efficiently, than people who did not share the same workspace. This suggests that joint attention can reduce the need for language use in joint action.
We would like to conclude with an observation that seems almost paradoxical in the context of the present discussion. Social psychologists and sociologists increasingly use dynamical principles very similar to those described in scenario 1 for modelling large-scale interactions that occur on a cultural level. Examples are the spreading of certain opinions and attitudes (Vallacher et al. 2002) or the development of cooperation strategies in a society (Axelrod et al. 2002). Thus, once one proceeds from an individual level of analysis to a societal level of analysis, things seem to start all over again at the most basic, socially blind level.
One contribution of 14 to a Theme Issue ‘The sapient mind: archaeology meets neuroscience’.
© 2008 The Royal Society