Sign language descriptions that use an analytic model borrowed from spoken language structural linguistics have proved to be not fully appropriate. Pictorial and action-like modes of expression are integral to how signed utterances are constructed and to how they work. However, observation shows that speakers likewise use kinesic and vocal expressions that are not accommodated by spoken language structural linguistic models, including pictorial and action-like modes of expression. These, also, are integral to how speaker utterances in face-to-face interaction are constructed and to how they work. Accordingly, the object of linguistic inquiry should be revised, so that it comprises not only an account of the formal abstract systems that utterances make use of, but also an account of how the semiotically diverse resources that all languaging individuals use are organized in relation to one another. Both language as an abstract system and languaging should be the concern of linguistics.
1. Signing, speaking and language
Modern sign language research, it is commonly agreed, began with the publication, in 1960, of William Stokoe's monograph, Sign language structure: an outline of the visual communication systems of the American deaf. Stokoe, originally a student of Medieval English, worked as an Instructor in English at Gallaudet College (later, Gallaudet University), an institution located in Washington, DC, founded to promote higher education for deaf students. It seemed obvious to him that the signing he witnessed among his students showed much more structure than he had been led to believe, and in the late 1950s he began to investigate this. To do so, he took as an analytic tool the approach to linguistic analysis then prevailing in America. This was the so-called structural linguistic model that had developed within the tradition of Sapir and Bloomfield, especially as it was expounded by Trager & Smith. Their model of language analysis had a strong influence on Stokoe, who writes of how, from personal acquaintance with both of them, he developed the conviction ‘that their methods of linguistic analysis are sufficiently mathematical to apply to a symbol system in a different sensory medium’ [1, p. 3]. Using this approach, Stokoe successfully demonstrated that the communication system he observed could be analysed as if its lexical units, or signs, were composed from a limited repertoire of contrastive features. That is to say, it seemed to have an organization comparable to the phonological level in spoken language. He was also able to show that there are consistent patterns of sign combination, which indicated that utterances using this system were constructed according to a syntax. He was thus able to claim that this system, contrary to what was often asserted at the time, had the ingredients of a language. His 1960 monograph was followed a few years later by the Dictionary of American sign language, which amplified these earlier claims.
Stokoe was also quite active in promoting his insights about American Sign Language (ASL) in various academic and educational settings. Nevertheless, his initial efforts were met with some scepticism among academic linguists (for example, see the review in Language of Stokoe's monograph by Lander) and a good deal of resistance among the educators of the deaf. Many of these educators shared the widely accepted view that signing was a loose collection of pantomimic gestures, that it could not be considered to have any of the features of a language, and that it therefore could not be a suitable vehicle for education and intellectual advancement.
Some years after Stokoe's 1960 publication, studies of sign language that followed Stokoe's lead began to appear, though many of them were undertaken by psychologists or psycholinguists rather than by traditional linguists. A factor that contributed to this expanded interest in sign language was the announcement by the Gardners, in 1969, that they had successfully taught a young home-raised chimpanzee to use signs derived from ASL. This, for some, lent a kind of urgency to the question of whether or not ASL could really be considered a language, because of a widespread and persistent view that only humans can have language. If chimpanzees could be shown to be capable of learning something like a human language, this would challenge cherished beliefs about the nature of humanness. The claims made about chimpanzee language accomplishments had either to be dismissed as a fraud (see ) or to be given very serious consideration. Thus, it became especially important to establish the nature of the system that the chimpanzee Washoe was purported to have learned. The establishment of a unit to investigate sign language at the Salk Institute, under the direction of Ursula Bellugi, was an indirect consequence of this. By the late 1970s, several important publications had appeared (see [9–13]), and these were followed by the first integrated survey of the new understanding of sign language that had now emerged. This was The signs of language by Klima and Bellugi, which appeared in 1979. This publication was addressed to a wide academic audience. It showed, beyond any doubt, that ASL (linguistic studies of other sign languages were then very scarce) was, in the words of Charles Hockett, structurally and functionally ‘as much like a spoken language as it possibly could be, given the difference in channel’.
Hockett, who wrote these words in 1978, was one of the leading American academic linguists of the day and his recognition of the linguistic character of sign language carried considerable weight [15, p. 273].
Hockett drew attention to an important difference between spoken and signed languages, however. These differed, he said, in terms of what he called ‘syntactic dimensionality’. That is, as he put it, in speech ‘the only possible arrangement of words is linear’. On the other hand, in a sign language, ‘there are four usable dimensions, three of space and one of time’. Because of this, sign languages can be iconic to an extent to which spoken languages cannot. He writes: ‘when a representation of some four-dimensional hunk of life has to be compressed into the single dimension of speech, most iconicity is necessarily squeezed out’ [15, p. 275]. If one has a four-dimensional system such as a sign language, on the other hand, much less iconicity is lost. For Hockett, thus, systems such as spoken languages or sign languages do their work with the properties that they have, and he suggests that spoken languages, just because of this linearity that squeezes out iconicity, have limitations that sign languages do not. Nevertheless, he says, because ‘in 50 000 years or so of talking we have learned to make a virtue of necessity’, we have become proud of the arbitrariness of speech [15, pp. 273–275].
The pride that Hockett refers to here is part of what is responsible for the moral loading that the issue of ‘linguisticness’ bears. We like to make a point of the arbitrariness of language, as if this is something that makes it superior to systems that are not arbitrary. Why this should be so, I am not sure. There is a view that iconicity is somehow ‘easier’ than arbitrariness, and when, as adults, we are confronted with learning a new language, we do indeed have to apply a certain conscious mental discipline to the task. This is regarded as virtuous, of course. Hockett seems free from this prejudice, however. For him, it is clear that a system that shows iconicity can be just as respectable as one that does not. For example, he agrees that, in the light of a careful reading of Stokoe's monograph, sign languages have what he calls ‘duality of patterning’—an important property, also, of spoken languages. He adds, however, that ‘[j]ust as speech in any language is characteristically accompanied by various paralinguistic and kinesic effects … so also signing can be accented and punctuated by purely iconic or expressive body motions that lack cenematic structuring’ [i.e. lack a phonology, or something analogous to it] [15, p. 276]. ‘Iconic devices’ in sign language are thus, for him, part of the picture, just as they are in spoken languages.
Hockett's broadmindedness with regard to sign languages was not always found in those others who, at that time, were taking up the study of sign languages in a serious way. As Wilcox has observed, a good deal of the research in sign language that followed Stokoe's demonstration of the structural analogies between sign language and spoken language has involved attempts to show that sign languages can be analysed, at least grammatically, in the same way as spoken languages can be, and efforts have been made to argue that even the iconic or expressive devices that Hockett mentions and which, as he says, lack cenematic structuring, after all somehow do show this. There was an ideological agenda behind these efforts, however, not just a scientific one. This was an agenda that derived from the moral superiority attributed to what is counted as being ‘truly linguistic’. At least since the middle of the nineteenth century, sign languages had come to be dismissed as unworthy. They were regarded as nothing but loose gesturings or pantomimes that could not be a vehicle for the intellectual development of the deaf. Among many who were concerned with the education of the deaf, there was an immense prejudice against sign language (at least in the USA). Accordingly, for those who knew how misguided this was, it became of great importance to demonstrate just how unlike ‘loose gesturing and pantomime’ sign language really was. However, as Wilcox reminds us, many of the attempts to analyse sign languages just as if they are spoken languages (compelling them, as it were, to fit a model of language reared through the analysis of spoken languages) meant that many features of what signers actually do when constructing utterances either had to be overlooked or represented as something that they were not.
From the beginning of what Battison has described as a ‘renaissance’ in sign language linguistics (for him this is the decade 1970–1980), Wilcox's observations notwithstanding, there were a number of students of sign language who had already seen that the structural linguistic model, as borrowed unchanged from spoken language linguistics, could not serve as a complete framework for the analysis of sign languages [19–22]. Indeed, Stokoe himself had realized this. For example, he proposed a technical terminology for the analysis of the structure of signs that was different from that used in spoken language phonology, to accommodate his view that the lexical units of sign language are structured as simultaneous configurations of features, rather than as sequences, as spoken language phonology dictated. He also thought that the separation of phonology from morphology, which is insisted upon in spoken language linguistic analysis, could not be applied to sign language, and later wrote of what he called ‘semantic phonology’. Other early students of sign language, for example Boyes-Braem, reached a similar conclusion. She writes of the ‘morpho-phonemics’ of sign language, showing that there are consistent relationships between the handshapes used in a sign and sign meanings, and suggesting that these derive from the visual metaphors that the language selects in building much of its lexicon.
More recently, the notion that the structural–linguistic framework should be modified for the description of sign languages has become more widespread . It is becoming recognized that gradient or analogical forms of expression, the use of pantomime and pictorial depiction through bodily movement, spatial inflections of individual signs and of units of signed discourse, and the possibilities of complex simultaneities in expression, all play integral roles in signed discourse. This recognition has led linguists such as Liddell  to suggest that perhaps our conception of language in general is too narrow, and should be revised. He argues that ‘spoken and signed languages both make use of multiple types of semiotic elements in the language signal’ (p. 332), adding that once we recognize this then we must agree that what is widely accepted as ‘language’ excludes much of what should really belong there. Similarly, Johnston et al. have written: ‘Rather than being homogenous systems as commonly assumed [so that] all major elements of signing behaviour are equally part of a morpho-syntactic system, signed (and spoken) languages may be best analysed as essentially heterogeneous systems in which meanings are conveyed using a combination of elements, including gesture’ [26, pp. 197–198].
As my quotations from Charles Hockett show, the semiotic heterogeneity of human communicative action is something that had long been recognized. In the history of what has been distinguished as the functional approach to spoken language (see ), there has always been a recognition that what speakers do, over and beyond the write-downable or scriptable words that they utter and the contexts in which they utter them, may be crucial to understanding how such words work as bearers of meaning. Nevertheless, until quite recently there have been few attempts to develop a systematic understanding of what the different semiotic resources are, how they work in relation to one another, and what governs their deployment. An important reason for this has been that, for a long time, the various voicings, voice qualities and speech tunes and tempos, as well as the visible actions encountered in utterances, could not be ‘fixed’ in a way that would allow them to be inspected and analysed. Once audio and, a little later, audio-visual recording techniques that make this possible did become available, the question of how to describe and transcribe what could be observed became more urgent. As yet, however, although established techniques for analysing many aspects of the use of voice in speaking have been available for a long time, no methods of analysis have been developed that have gained general acceptance when visible bodily actions are also to be considered.
If we accept, as surely we must, that utterances produced by living languagers (speakers or signers—see p. 36 in ) in the ordinary co-present circumstances of life—diverse as these may be—always involve the mobilization of several different semiotic systems in different modalities and deployed in an orchestrated relationship with one another, then we must go beyond the issue of trying to set a boundary between ‘language’ and ‘non-language’, and occupy ourselves, rather, with an approach that seeks to distinguish these different systems, at the same time analysing their interrelations. Liddell , as we noted above, drawing on insights gained in his studies of sign language, suggested that the definition of ‘language’ has for a long time been too narrow. Yet, a system to which the term ‘language’ is often applied (the formal system, to apply Dik's terminology ), which Liddell feels is too narrowly defined, can certainly be isolated. Extracting just those aspects that can admit of phonological and morpho-syntactic analysis in the structural tradition remains central in the linguistics of both spoken and signed languages. However, this system must be seen as only part of the story. A more comprehensive understanding of how utterers achieve meaningful utterances will require that we incorporate in a systematic way these other systems that do not admit of a formal-linguistic analysis.
2. Manual action in speaker utterance construction
I now wish to refer back to what Hockett termed ‘syntactic dimensionality’. As already mentioned, by this Hockett meant to refer to the ‘geometry of the field in which the constituents of a message are displayed’ [15, p. 274]. In a sign language, this geometry includes the three dimensions of space as well as the dimension of time. As Hockett argued, this means that sign languages can be iconic to an extent that is impossible for spoken languages. However, besides this, it is also important to consider the anatomical capacities of the instruments by which sign languages are produced. Signers make use of two hands, they make use of the head and face, gaze direction and bodily orientation. And because these various instruments of sign production can, to some extent, at least, be used differentially, this makes it possible for some body parts to persist longer in some actions than others, or for something to be done with one body part while something else is being done with another. Accordingly, in a signed discourse, signs not only may follow one another successively in time, but also various kinds of significant action can be produced concurrently. This means that a construction in sign language can involve several different components that overlap with one another in time. Signs need not only be produced in linear order, one sign at a time, as words must be. They can also enter into what have been called simultaneous constructions (see  for many examples).
However, speakers also have these same anatomical resources available to them and in making use of them, perhaps they also can be seen to produce utterances in which several different expression units are performed at the same time. They can move their hands differentially, they can engage in head movements and in actions of the face, and they can do all this while they are speaking. If we pay attention to the visible bodily actions that speakers engage in when they speak, then we can find plenty of examples in which, in various ways, such visible actions enter into the creation of the speaker's meaning, and do so in ways which, from the point of view of how the speaker's propositional meanings are arrived at, are comparable to the ways the morpho-syntactic components manifested in speech do so. Just as we may find ‘simultaneous constructions’ in signers, that is to say, we can also find them in speakers.
For the most part, however, the visible bodily actions of speakers are not looked upon in this way. They are not usually counted as being part of talk (as one might put it), because it seems generally to be supposed that, when a speaker has something to say, he aims to say it in words only. To the extent that he does not do so, this often seems to be seen as a kind of failure. In much research on the visible bodily actions of speakers, these are approached rather as if they are auxiliaries to the process of word production. For example, the hand movements speakers make have been said to help the speaker find needed words , they help in packaging the speaker's thoughts so that they can be formulated in words , or they may make manifest the speaker's ‘mental imagery’ , but they are less often looked upon as components of the speaker's final product, they are less often seen as integral to the utterance that the speaker constructs (see ). This is because it seems that a speaker's visible bodily actions, even when they contribute to the speaker's meaning, can, in the end, be done without. All that really is deemed to matter are the words that can be written down. Of course, these visible bodily actions may be interesting and illuminating from the point of view of what they may reveal about the speaker's mental processes or otherwise unobservable mental imagery, but they are not regarded as part of the talk itself, because, it seems, we can almost always make ourselves clearly understood in words alone.
For example, a speaker can always re-say what has just been said in another way, re-saying in a more fully verbal way what had previously just been said using visible actions as well as words. Because of this, the visible actions speakers use as part of their utterance tend to be less constrained by conventions of performance and so may show a good deal of individual and situational variation. Accordingly, where spoken words are also used, these visible actions tend to be less codified and regularized in their forms and uses. This is why they are usually not seen as part of what is said and, in consequence, they are not treated as part of the language, if by this is meant a shared, relatively stable, socially instituted system of vocal expressions. Thus it is that, for spoken language linguistics, the visible actions employed by a speaker are seen as something on one side, either as a part of what is termed paralanguage, or else as something spontaneous and idiosyncratic, providing a way of observing otherwise hidden aspects of the processes of utterance formation, but not a part of the utterance as the speaker constructs it. As such, from the point of view of building a description of what are the parts of the system that show least dependence upon individual idiosyncrasies, visible bodily actions tend to be seen as less deserving of serious attention.
For the signer, on the other hand, who really has only visible bodily action to rely on to construct utterances, the things done with head and eyes and face, as well as with arms and hands, are all indispensable for making a fully meaningful utterance. Accordingly, in looking at how signers construct utterances, all aspects of visible bodily action that are involved are open to being included in what is taken to be the signer's language. What is counted as being a part of language, thus, may be different for signers from what it is for speakers, depending upon how we wish to use this term. Yet, the visible bodily actions that speakers use and which are deployed as part of utterance construction are often quite similar to many of the expressive practices followed by signers. From the perspective of an approach that seeks to understand how producers of utterances achieve semantically significant packages of action and that takes into consideration the full range of semiotic forms that are used in this, it will be seen that, as far as visible bodily action used in utterance construction is concerned, signers and speakers share many things in common [33, pp. 307–325]. However, because signers can only use visible bodily action when they produce utterances, they have only this medium to share in developing a language. In signers, therefore, a much wider range of visible bodily action becomes stabilized and regularized for utterance use than seems to be the case for speakers.
3. Illustrations of what speakers do when producing utterances
It will be useful now to look at speakers engaging in utterance production to illustrate some of the ways in which, in doing this, the different resources and capacities of speech and visible bodily action are used in conjunction. In these examples, we shall see how visible bodily action can enter directly into utterance construction in a number of different ways. We shall see that it does so, not as an auxiliary or an add-on, but as an integral part of how the utterance was constructed in that occasion of speaking.
The resources offered by visible bodily action that I shall examine here are certain kinds of hand movements that speakers often make. Speakers do not use only their hands when talking, however. They also make movements of the head, the face, often of the whole body itself, changing its posture or how it is positioned and oriented with respect to the others with whom the speaker may be in interaction. These aspects, too, play various roles of importance in shaping the utterances of which they are a part.
For example, speakers embark upon utterances when co-present with others, typically in those moments when, within the flow of activity in the occasion of interaction, they are given or they obtain from others a ‘slot’ or a ‘turn’ to do so. Utterances are usually embarked upon when the speaker has an audience. Accordingly, in the organization of an utterance, there is always an aspect of it that may be called an address. That is, an utterance always includes an indication of its intended audience. Utterances are always constructed for others, whether for specific others, for several others simultaneously, for non-present or virtual others, or even only for the speaker (as when talking to oneself). For whom the utterance is produced is always a feature of its construction and, with respect to this, bodily posture, head orientation and gaze direction all play very important roles.
Such utterance framing functions (as we might call them) can, for analytic purposes, be kept distinct from those aspects that serve to provide for what may be called the content of the utterance. That is, it is useful to distinguish what is said from for whom it is said, and it is with this what that I am mostly concerned. And although visible bodily actions in the torso, head and face can and do play roles in what is said in an utterance, here I shall concentrate upon the way hand actions interact with what is spoken in the production of content.
Here is a simple example to illustrate some issues I would like to raise. Figure 1 shows three moments taken from a video of MC (right) who, with his wife (S, middle) and a friend (A, who stands opposite him), is describing some things about the grocery shop his father used to own, and some of what he did in running it. Here, he is talking about how the cheeses his father sold in the shop were packed when they arrived. If you show this video-extract to observers without letting them hear what is said, all recognize that MC is talking and they recognize that the liftings of his hands away from the table, in figure 1b and c, are hand movements made as a part of his talking (I rely here on classroom experiments and a study not yet completed). They are seen as voluntary movements, done as part of MC's description of something. Observers cannot tell what he is describing but they do suggest that, with the hand action depicted in figure 1b, he is showing the length and depth of some object. The hand actions depicted in figure 1c, in which both hands with index fingers extended are moved in a linear fashion, diagonally outward and downward, and then inward and downward, tend to be seen as movements that depict the shape of something. In other words, these movements have a recognizable semantic character and their meanings can be understood, to some extent, in very general terms.
As soon as these movements are perceived in conjunction with the concurrent speech (the normal circumstance, of course), they are understood in a much more specific way and their role within the utterance then becomes clear. For example, as the speaker in this clip lifts up and extends his two hands forward, hands open, palms facing one another (figure 1b), he says ‘and the cheeses used to come in big crates about as long as that’, and it is immediately understood that the hands held out in the way they are, previously recognized as engaging in an action that shows the size and shape of something, are now taken to be showing the size of crates used for cheeses. The hand action is taken to refer to the ‘crates’ rather than to ‘cheeses’, because it is ‘crates’ that are understood as having length and breadth (and the hand action is seen as showing length and breadth of a static object) and because it coincides with the expression ‘about as long as that’. As he performs the actions depicted in figure 1c, still talking of the crates, he says ‘an’ they were shaped like a threepenny bit’ (here he refers to the shape of the 12-sided threepenny coin that was in circulation in Britain until 1972). In his words, thus, he talks about the length of the crates, and he describes the sort of shape they had, whereas his hand actions are now seen as showing the length and the shape. It is as if he is using his hands to draw sketches of the objects he is talking about and, by means of these sketches, he adds a kind of description, allowing, perhaps, the nature of the objects to be envisaged in a more precise way than the verbal description by itself might allow. The total meaning of what he is now saying is a product of an interaction between the meanings of his verbal phrases and the manually sketched illustrations that go with them. This is an example of what Enfield has called a composite utterance. This seems clear enough.
In order to take this further, however, what needs to be developed is an understanding of how such composites work. What are the principles according to which they are constructed?
Here, there are at least two questions that need to be explored: first, there is the question of how the hand movements achieve their semantic functions. What are the representational principles that are followed in their production? The second is the question of how the semantic interaction between the spoken language constructs and the kinesic constructs is brought about. Here, we need to understand the nature of the semantic coherence that is established between the hand actions and the co-occurring spoken expression, and also how this coherence is established. We are dealing with a semantic interaction between the spoken expressions and the speaker's manual actions, through which a combined meaning comes about.
With regard to the first question, a certain amount of systematic investigation has been undertaken. Thus, Mandel  has analysed so-called iconic devices in sign language. I made use of this work in my own analyses, conducted in relation to two different sign languages, one a primary sign language from the Papua New Guinea highlands , and the other an alternate sign language in use among the Warlpiri and other groups of the north central desert regions of Australia [36,37 ch. 6]. This was also made use of in a discussion of highly conventionalized gestures used among speakers  (so-called emblems; see ). Müller  has undertaken work on what she calls ‘modes of representation’ observed in forelimb actions used by speakers in conversation, and Sowa  has likewise examined the representational devices speakers use when using their hands in describing complex objects. The complex work of Calbris  must also be taken into account. All of these investigators, in different ways, have developed analyses of how the forelimb actions used in movements that are regarded as representing the features of an object or an action, achieve such representations.
From this work, a number of general points have emerged. The hands, when used as part of an utterance, are intelligible mainly because they are seen as manipulatory actions acting in a virtual world (see also ). The hands act upon this virtual world in various ways, and the objects or actions that they are understood as evoking or depicting arise from an understanding of the objects implied by these manipulations. Seeing the hand act in a certain way, the object in relation to which actions of this sort might be performed can be envisaged.
Thus, a hand held forward with the fingers posed as if grasping something suggests features of the object being grasped, such as its size, its weight and how it might be structured to enable it to be held or grasped.
Two hands held as if they are placed at either end of an oblong object provide the basis for imagining such an object, and thus understanding its length and also, perhaps, its breadth or volume (as in figure 1b).
If the hand held with fingers extended and palm down is moved in a linear fashion, such actions may be understood as an action on a surface, modifying it to make it smooth and flat, or it may be understood as representing an object moving along an already existing flat surface. In either case, the notion of a flat surface tends to be evoked. If the movement is performed in an upward or downward sloping fashion, then a sloping surface may be envisaged.
If the hand is moved with a well-defined trajectory through space which involves changes in direction, with only the index finger extended, then it is seen as tracing a line in a virtual medium—in this way, speakers can create virtual sketches of shapes (as in figure 1c).
A hand similarly posed with index finger extended, but which engages in a simple linear movement outward and away from the speaker, may be seen as ‘pointing’.
A hand with fingers extended and adducted, held with palm facing upwards and moved outward into the shared interactional space may be seen as an action of offering or as an action of holding the hand out to receive something.
Sometimes, the hand itself is configured so that its action suggests that the hand itself is an instrumental object of some kind—as when a flat hand is moved as a blade might be moved in cutting something. The hands may also be used in such a way as to suggest not an object being manipulated, nor that the hand itself is an object being moved, but, rather, a pattern of movement. Here, the hand does not perform a manipulation; it is no longer a hand acting on something but serves, rather, to represent ‘something that moves’. In this way, a movement pattern can be depicted.
These (and other) representational practices that I have just described are widely shared and are subject to varying degrees of social conventionalization. Some forelimb utterance actions may become so standardized that they acquire meanings that may be glossed with stable verbal expressions (often known as ‘emblems’ ), and, as such, are sometimes used as substitutes for spoken words in some contexts. In this case, we have something comparable to a lexical sign in a sign language. The hands can also be used to provide conventional representations of graphic signs: they can depict symbolic objects that are already established in other media, as in the thumb and index posed to form a circle, which is taken to mean ‘zero’ in some parts of Europe, the so-called V-for-victory hand shape or, perhaps, the fingers being held up to represent numbers of objects.
The second question raised above is: how, in speakers, are the meanings of utterance hand movements and the meanings of associated spoken expressions combined? This has received much less systematic attention. The processes involved in this semantic interaction are usually simply taken for granted. However, some thought has recently been given to this issue in the work of Engle  (who, in collaboration with Herb Clark, first put forward the notion of the ‘composite utterance’), Enfield  and Lascarides & Stone , who have tried to analyse the processes by which semantic coherence is established between utterance hand actions and spoken expression. In explicating the examples now to be described, we will make use of ideas suggested by this work.
The examples that I now describe are intended as illustrations of just some of the different ways in which meanings expressed in speech and meanings expressed in utterance forelimb actions interact (see [33, pp. 176–198] for an earlier and fuller discussion).
I begin with a very simple example. When the clip from which figure 2 is taken is viewed without sound, the hand action of the speaker is understood as being done to show the length or size of something. The speaker in this case is acting as a guide to an archaeological site. He has just talked about some large beams that were found in an ancient swamp. He then says ‘And underneath that they found a huge bronze spearhead, it was wedged underneath’. As he says ‘they found’ he lifts his hands up, and as he says ‘a huge bronze spearhead’, he holds his hands forward, index fingers extended, in a manner that most people recognize as being either a ‘length-specifier’ or a ‘size-specifier’ action. Because this is done coincidentally with the nomination of an object (‘huge bronze spearhead’), the size-specifier action is taken to refer to that object, which in any case is the only object referred to. The size-specifier action here provides the limits to be set on how the adjective ‘huge’ is to be interpreted (figure 2).
The following example (figure 3) is similar. The speaker is talking about the chef whom his father knew at a local hotel, who used to favour him with some of the soup he made for the hotel. He says ‘We used to get soup from him as well, he used to make lovely soup, an’ he used to give us, in them days he'd a (…) a two pint milk bottle, and uh, he'd fill this full of soup and uh we'd bring it home for dinner’. As he says ‘ 'ad a (…) a two pint milk bottle’ he lifts up both hands, one held palm facing down toward the other hand, palm facing up. He then moves them so the palms of both hands are facing each other. Seen without sound, most people recognize this as an action that suggests the height and width of some cylindrical object, longer than it is wide, positioned in an upright fashion. This is suggested by the vertical space evoked between the speaker's hands in figure 3a (compare the speaker's hand positions in figure 1b), and by the hand shapes used in figure 3b, which suggest holding something with round sides rather than flat sides.
Carried out in conjunction with naming the object, the ‘milk bottle’, this demonstration is taken to refer to this object. With his hands, the speaker provides a diagram or sketch of the milk bottle's height and width.
Such object–dimension demonstrations are commonly linked to the object nominated in speech because of their juxtaposition with the nominating expression, but on occasion the speaker will treat these demonstrations as if an object has been created which can now be referred to with a deictic expression. Thus, in the following example, the same speaker we have just seen tells about how the chef also used to make his family a rabbit pie. He says: ‘He used to make us a pie (…) like that (…) a rabbit pie’. In the pause that follows ‘pie’ he leans forward over the table and places his hands, index fingers extended, in the start position for sketching out a rectangular shape over the table surface (very similar to what we see in figure 4). Anyone seeing this action recognizes it as a ‘shape–sketch’ action and would recognize the size and shape depicted. As the speaker does this action, he says ‘like that’, in this way establishing the object that he has sketched as a depiction of the pie he has just mentioned. He then further specifies the pie by saying ‘a rabbit pie’.
The space delineated by an action of this sort may then be treated as if the object nominated persists after the depictive action has been completed. In an example similar to the one just described, this same speaker talks about a large Christmas cake that his father, who used to own a grocery shop, had sent down from London at Christmastime. The cake was displayed in the shop, and customers could ask to buy pieces of it. The speaker says ‘Every Christmas, he used to have sent down from London, a Christmas Cake, and it was this sort of size’. As he says ‘and it was this sort of size’, the speaker leans forward over the table in front of him and, with both hands, index fingers extended, moves them together over the table in such a way as to sketch out a large rectangular area. Conjoined with ‘an’ it was this sort of size’, this is taken as a representation of the cake (figure 4). He then says: ‘an’ you cut it off in bits’. As he says this, he lifts his left hand, now with all fingers extended and oriented so that the palm is vertical, and lowers it toward the table, within the space previously delineated for the cake (figure 5). The lowering of the hand is conjoined with the verb ‘cut’, this action being seen as doing a cutting action, as if with a knife. The speaker is thus treating the area on the table where he had sketched the cake as if it is still occupied by it, as if the cake is there and can be acted on with a knife.
Here, then, the utterance contains a verbal component, which names an object and an operation that is performed upon this object, whereas the manual actions show the dimensional features of this object and show something of how the operation of ‘cutting’ is performed on this object. Suggested by the hand shape used with the action associated with the verb ‘cut’, one may note, is the idea of a broad-bladed knife being inserted into the virtual cake in such a way as to imply that the ‘bits’ would be oblong blocks of cake rather than thin slices (figure 5).
Utterance forelimb actions are also often used to modify a verb or verb phrase, commonly by suggesting the manner of action referred to. The ‘cutting’ hand action just described illustrates this. As a further illustration, here is a contrasting pair of examples, again from the same speaker, in which hand actions are used in conjunction with a verb phrase involving the verb ‘throw’. The speaker does not specify the manner of throwing verbally; the associated hand actions seem to do that.
In the first of these two examples (figure 6), the speaker is talking about his father (the grocery shop owner). He is describing how his father stored the cheeses he was to sell in the cellar and how, as the cheeses ripened, they exuded moisture. To absorb this moisture his father used ground rice, which he threw over the cheeses. In the passage shown here, he says ‘an’ he used to go down there an’ throw ground rice over it’. Just as he says ‘throw’, he lifts up his hand, palm up with fingers curled over, and extends his wrist twice in succession, the second of these extensions being done in a very small pause that follows his pronunciation of ‘throw’.
Seen without speech, this action tends to be recognized as an action of scattering something, such as a powder or sand. Seen conjoined here with ‘throw’ in the sentence ‘throw ground rice over it’ we see how the hand action sets limits upon the interpretation of the verb ‘throw’. It shows us what sort of ‘throwing’ it was.
In the second of these two examples (figure 7), the speaker again uses the verb ‘throw’, but this time, the hand action associated with it is quite different. Here, the speaker is talking about American airmen who, during the Second World War, were stationed at an airfield not far from the little town where the speaker lived as a boy. He describes how, at the end of the war, when they were preparing to go back to America, the airmen used to drive their lorries through the town and throw things such as oranges and chewing gum into the streets for the children to pick up. He says ‘an’ they used to come through Oundle and throw oranges and chewing gum all off the lorries to us kids in the streets’. As he says ‘oranges and chewing gum’, he extends his full arm, lifting it up and moving it back rapidly, once as he says ‘oranges’ and again as he says ‘chewing gum’.
Seen without speech, this arm action is understood as an action of either pulling or throwing something backwards, behind the speaker. Conjoined with the phrase ‘throw oranges and chewing gum’, it is taken to refer to the action of throwing objects away in an undirected manner. Here, the arm action refers to the kind of action performed with the oranges and chewing gum and suggests a kind of throwing action for which the English verb ‘chuck’ or ‘toss’ might be appropriate. Note, by the way, that these arm actions are not performed as the verb ‘throw’ is pronounced, but as the objects of the verb's action are nominated. The rhythmic coordination of each of the two arm actions with the stressed syllables of ‘ORAnges and CHEWingum’, respectively, which are spoken with an even rhythm, suggests the repetitive or continual nature of the soldiers' actions. The arm actions here (and also the manner of speech) thus provide a depiction of the manner of throwing and of its aspect as a continual action.
Hand and arm actions can also combine with verbal object nominations in such a way as to refer not to features of the object, but to how that object is treated or processed in relation to an account of some process or sequence of operations. For example, a Neapolitan speaker is giving an account of how she makes spaghetti bolognese (figure 8). As she names each ingredient, she executes a hand movement that is understood to be a form of action, interpreted here as indicating how the ingredient is acted on as it is prepared for the dish. Thus, listing the ingredients, she says: ‘Un po’ di sedano, 'na bella carota, con la cipolla—A little celery, a nice carrot, with onion’. As she says ‘un po’ di sedano’, she places her right index finger just above the wrist of her left arm in an action which is used to indicate a ‘short length of something’ (cf. [28, pp. 330–331])—so here she says, in effect, ‘short piece of celery’. As she says ‘ 'na bella carota’, her right hand, now with all fingers extended, is rapidly moved down and up five times in a ‘chopping’ movement above the extended fingers of her left hand. In this way, she indicates that the carrot is chopped up, so it is as if she said ‘carota tritata—chopped carrot’. She repeats this action as she says ‘con la cipolla—with the onion’—again indicating ‘chopped onion’. In this way, the hand actions add information about how the celery, carrot and onion are prepared.
Finally, here are two further Neapolitan examples, which illustrate two additional ways in which hand actions combine with speech to result in a conceptually more complex expression.
In the first of these (figure 9), a speaker uses a manual expression to express a concept that is an implication of his verbal expression. The speaker, who is a bus driver, is describing how young people behave on the buses he drives in the city of Salerno. He complains that the boys write obscene phrases on the backs of the seats of the buses and they do this in full view of the girls, who laugh and join in this fun initiated by the boys. He is shocked that the girls enjoy this. He says of them: ‘Sono contente, quindi sono consapevole anche loro. Gli sta bene anche a loro—They are happy [about it], hence also they are aware [of it]. It's OK also for them’. An implication of what he is saying is that the girls participate equally in this activity with the boys. This implication is given explicit expression kinesically. While he is uttering the above words, he lifts up both hands, index fingers extended, and moves them towards one another so that they make contact, repeating this movement several times (figure 9). This action of placing two extended fingers in side-to-side contact in this fashion is a very well-known expression, widely used in southern Italy. It is understood to refer to two people who are very close, as allies or as lovers, or, more generally, to refer to two things that are equal or the same. This expression, already described with these meanings in the nineteenth century by De Jorio [28, p. 90], is here used to add to the speaker's verbal expression the idea of equality of participation of girls with the boys.
In the second of these last two Neapolitan examples, a speaker uses a manual expression that is commonly recognized as expressing negation. Here, however, it is used in conjunction with a verbal expression that makes a positive assertion. The manual action is one in which the hand, palm facing downwards with all fingers extended and together, is moved rapidly and horizontally away from the speaker's midline (for a description of this expression, referred to as a ‘ZP’, and a discussion of its contexts of use see [33, pp. 255–264]). The example comes from a conversation in which the speaker, a native resident of the historical centre of Napoli, has been asked to explain what part of the city she thinks is ‘true Naples’. She mentions the streets that bound the area that she considers to be ‘true Naples’ and then adds ‘proprio questa qua è Napoli—just this here is Napoli’. As she says ‘è Napoli—is Naples’, she moves her right hand laterally in the manner described (figure 10). Here, this adds to what she is saying, the idea that her delineation of ‘true Naples’ is this part of the city, and nothing besides. She might have said (using a common Italian expression) ‘Questa qua è Napoli, e basta—This here is Naples, that's all’ (basta means ‘enough’, ‘sufficient’). The horizontal palm down hand action here serves to add this idea that there is no more, nothing besides what she has said, that is Napoli.
Here, a kinesic expression of negation is used as a way of intensifying an assertion by denying in advance, as it were, possible rival claims (see [33, pp. 262–264]). Once again, we see how an abstract notion is added kinesically to a verbal expression to create a more complex expression.
The examples I have described exemplify some of the ways in which manual actions appear to work semantically in relation to spoken expression. There are three points I want to emphasize in conclusion.
In the examples shown here, we have concentrated on the use of manual actions in association with speech. Regarding these manual actions as components of utterance, I have sought to draw attention to the point that in serving as components of utterance, these actions have their own semantics. I have stressed that they are intelligible, and I have sought to show in what ways they are. What is clear from the examples is that they are not elaborate pantomimes of specific action sequences nor carefully constructed shape descriptions of specific objects (although you can find examples in which speakers come very close to this). Rather, they are highly schematic actions that refer to very general or abstract concepts such as ‘flat surface’, ‘roundness’, ‘length–breadth’, ‘going down’, ‘going up’ or, as in our ‘throw’ examples, schematics that refer to different kinds of ‘throwing’ but in an abstract general form. To a very considerable extent, manual actions of speakers that are deemed ‘representational’ (sometimes called ‘iconic’, following McNeill ) are kinesic representations of abstract concepts. Furthermore, these kinesic representations follow patterns and principles that are to varying degrees conventionalized. They are not de novo productions, but are constructed according to certain principles and may often draw upon something like a repertoire of forms that the speaker already has access to and which is shared. Like words, they provide clues to concepts and, taken together with verbal expression, they allow the speaker to assemble sets of concepts in particular patterns that can serve to make complex meanings available. To the extent that this is so, they share features with lexical expression.
As I once argued , such actions, from a semiotic point of view, show a wide range of forms, all the way from being actual equivalents of spoken expressions (and even being used in substitution for them) to being closer to action–mimicry or picture-like expressions.
I have sought to draw attention to the issue of how the semantics of these manual actions of speakers interact with verbal semantics. There are two issues, closely related. First, there is the question of how the speaker's manual actions get ‘assigned’ to concurrent words. Second, there is the question of how the meanings of the two forms of expression combine. In the examples described, I have illustrated several different kinds of combination. Thus, we illustrated (i) verbal object nomination with visible action providing dimensional information through action, where the action is treated as if it provides a version of the object that can now be referred to with a deictic expression; (ii) visible action expressing verb manner (and aspect); (iii) an ‘additive’ relationship in which the hand action performed as an object is named indicates an action on the object, which changes its state or condition; (iv) another kind of ‘additive’ relationship where a concept implied in the verbal discourse, but not verbally expressed, is expressed by visible action that refers to that concept; (v) a final example, in which the visible action serves as a kind of negative, here used (in our interpretation) to forestall any alternative ideas that a recipient might have that would counter the assertion that the speaker is making.
If our interpretations of these combinations can be accepted, it will be seen that, in the examples given, the speakers are giving expression to several ideas at once. In doing so, they partially overcome the constraints that the linearity of speech imposes on the expression of thought. Thoughts come in ‘wholes’, yet to express them in spoken language they have to be unpacked and organized in certain ways (McNeill  has suggested something like this in his notion of the ‘growth point’ and its unpacking). When we take the manual actions of speakers into account, however, we find that they make it possible for speakers to get around this difficulty, at least to some extent, by exploiting the additional ‘syntactic dimensionality’ that movable body parts provide. Because, among speakers, the primary focus of attention is upon what is spoken, these other dimensions of expression are more often backgrounded. Yet they are always there, and form an available resource for speakers that can be exploited whenever needed (and can also be exploited by addressees, of course).
More generally, then, my examples illustrate the point that speakers quite routinely take advantage of their several different anatomical resources and exploit several different semiotic practices as they go about constructing utterances. ‘Parallel construction’ is not something that only signers engage in. For signers, however, because visible action is all that is available for linguistic expression, the ‘parallel constructions’ that anatomical structure makes possible become part of how sign language works as a linguistic system. For speakers, on the other hand, spoken expression in itself offers enough flexibility and complexity to mean that, at least if enough time is available for verbal formulation, what is done in parallel, using visible action, can often be left on one side. For this reason, the uses of visible action by speakers that make these kinds of constructions possible have not been considered a part of ‘language’ when this is viewed as an abstractable formal system that exists as a social institution. On the other hand, if we approach ‘language’ as something that people engage in, something that they do, and consider how units of language action or utterances are constructed, then the resources of visible action as used by speakers, as well as used by signers, must be considered as a part of it, and from this point of view they may be included in the purview of ‘linguistics’.
6. Closing comment
I began this paper by discussing how sign language descriptions that used an analytic model borrowed from structural spoken language linguistics were not fully appropriate. This led to the idea that the concept of ‘language’, as it developed in academic linguistics in the first part of the twentieth century, is too narrow. If sign languages are to be considered true languages, and yet they are found to use modes of expression that cannot be accommodated by models derived from the description of spoken languages, then these models should be revised and our concept of ‘language’ should be changed, accordingly.
This, in turn, has suggested that spoken languages may also deserve a new model. In recent decades, it has become a commonplace to observe that, when speaking, speakers do more than utter words. They also engage in various kinds of visible bodily actions that are integrated with the activity of speaking. If this is looked upon from the point of view of how these actions contribute to the utterance as the speaker constructs it in the moment of interaction, a point of view I have tried to put forward here, it becomes clear that speakers also make use of the dimensions of expression that visible bodily action makes possible. Often this is done in ways that can be compared with the ways signers make use of these dimensions. A new model of language that might incorporate these aspects, however, would be a model that would accommodate language as a mode of action, rather than treating it as an abstract, quasi-static social institution. That is to say, languaging, or doing language, would become the object of study [cf. 47]. In such a case, how visible bodily action is used in utterance construction by speakers becomes as much a part of the study of speakers as, necessarily, it is already a part of the study of signers.
All figures in this paper are reproduced from drawings made by the author. Drawings in figure 1 and figures 4–10 have previously appeared in Adam Kendon Gesture: visible action as utterance. Cambridge: Cambridge University Press, 2004, and are reproduced here with the permission of the publisher.
One contribution of 12 to a Theme Issue ‘Language as a multimodal phenomenon: implications for language learning, processing and evolution’.
© 2014 The Author(s). Published by the Royal Society. All rights reserved.