How long did it take you to read this sentence? Chances are your response is a ballpark estimate whose value depends on how fast you scanned the text, how prepared you were for this question, perhaps your mood or how much attention you paid to these words. Time perception is here addressed in three sections. The first section summarizes theoretical difficulties in time perception research, specifically those pertaining to the representation of time and temporal processing. The second section offers a non-exhaustive review of temporal effects in multisensory perception. Sensory modalities interact in temporal judgement tasks, suggesting that (i) at some level of sensory analysis, the temporal properties across senses can be integrated in building a time percept and (ii) the representational format across senses is compatible for establishing such a percept. In the last section, a two-step analysis of temporal properties is sketched out. In the first step, it is proposed that temporal properties are automatically encoded at early stages of sensory analysis, thus providing the raw material for the building of a time percept; in the second step, time representations become available to perception through attentional gating of the raw temporal representations and via re-encoding into abstract representations.
Time presupposes a view of time. It is, therefore, not like a river, not a flowing substance. (Merleau-Ponty 1945)
How does ‘physical time’ relate to ‘experiential time’? The classic ‘river metaphor’ (Jackson & Michon 1992) describes time as a continuous and coherent unfolding of events from past to present to future. Newtonian time and the laws of entropy (e.g. the ageing of the body) exemplify the irreversibility of an absolute time (Ruhnau 1994; Sachs 1996). Time as we perceive it is directional: if a glass shatters on the floor, there is no means to reverse this effect except through imagination. In the theory of relativity, there is no absolute time per se (Ruhnau 1994) but a (causal) relationship between events occurring in the same reference frame. In perceiving time, the observer is the reference frame (Van De Grind 2002); the past and the future only exist with respect to a referent, the perceived present. The necessity of a referent or observer was long intuited: for example, Aristotle (350 BC) asked whether there would be time if there were no soul, suggesting that an observer is necessary for time to emerge while casting doubt on the existence of an absolute time. Additionally, Kant (1781/1997) suggested that ‘simultaneity or succession would not themselves come into perception if the representation of time did not ground them a priori’, thereby laying theoretical grounds for the existence of internal operations underlying the temporal structuring of perception.
The operational and internal constraints that shape perception are crucial for one's conscious appreciation of the world: suffice it to consider the seminal thought experiment on what being a bat is like (Nagel 1974). From an evolutionary standpoint, the temporal constraints that primates are endowed with may be very different from, say, those of a snail, for which a Mozart sonata would have to be played so slowly that it would lose any perceptual coherence for the human ear (Brecher 1932; Blumenthal 1977). With respect to time perception proper, the matter is complicated: whereas sensory receptors receive multisensory information over time, there is no specialized receptor for the transduction of time. This is perhaps consistent with the notion that there is no absolute time out there, at least at the level of scrutiny of the perceptual systems. Thus, time perception should not even exist (Pöppel 1997) but it does, somehow extracted out of external events (Gibson 1975) and internal ones, and is crucially shaped by the anatomical and dynamical constraints of the nervous system. The perception of time includes estimating duration as the temporal lapse between events, ordering events (arguably a necessity to establish causal relationships), assessing simultaneity and temporal coincidence and discriminating temporal rates and rhythms. One can grasp the temporal features of the surrounding world, sequence, manipulate and evaluate them in a number of ways as well as use the temporal relationships of one's memories and imagine future events, i.e. play with one's internal representations of temporality.
This review focuses on the ‘subjective present’ (Pöppel 1988) i.e. time perception below a 2–3 s span. I will first present the difficulties encountered in thinking about time perception, keeping in mind the cognitive levels of analysis—computation, algorithm and implementation (Marr & Poggio 1977; Marr 1982). I will then review some empirical work pertaining to temporal effects and time perception in multisensory research in order to illustrate the amodal nature of temporal representations in perception. Finally, I will turn to possible means by which time representations acquire an abstract status and reach conscious awareness.
2. Two-way non-identity problem: subjective time is not objective time is not linear neural time
(a) Subjective time does not equate to objective time
Time perception is not identical to objective time (Efron 1970a), namely the physical duration of an event does not map one-to-one onto its subjective duration. This may come as no surprise as there is no ‘absolute time’ ability (analogous to ‘absolute pitch’), suggesting a lack of invariant time representation in the brain. The internal state of the observer readily modulates time perception: attention (Block & Zakay 1997; Coull et al. 2004), emotional state or valence of the stimuli (Angrilli et al. 1997; Droit-Volet et al. 2004; Noulhiane et al. 2007; Droit-Volet & Gil 2009) and expectancy levels (Tse et al. 2004) as well as task demands (Fraisse 1957; Block & Zakay 1997) can all affect time perception. For short durations, the role of attention may be less salient (Lewis & Miall 2003) than that of stimulus parameters. For instance, for the same objective duration, a filled interval, an interval containing more discrete events (e.g. a click train) or an interval containing events at a faster rate is judged longer than an empty interval or an interval containing fewer or slower discrete elements, respectively (Fraisse 1957; Goldstone & Lhamon 1976). Numerous cases of subjective time distortions (‘dilation’ when objective time is overestimated, ‘compression’ when it is underestimated) have recently been reported using different paradigms across sensory and motor modalities (Rose & Summers 1995; Yarrow et al. 2001, 2004; Hodinott-Hill et al. 2002; Sasaki et al. 2002; Park et al. 2003; Yarrow & Rothwell 2003; Tse et al. 2004; Morrone et al. 2005; van Wassenhove et al. 2008). For example, an oddball within a stream of standard events of identical duration is perceived as longer than a standard event (Tse et al. 2004). The perceptual salience of a stimulus can also lead to dilation and compression of subjective duration: a looming object, even if expected, tends to be judged longer than a stationary object of the same duration in both audition and vision (van Wassenhove et al. 2008).
Evidence will be provided throughout illustrating that subjective time does not equate to objective time and that, as Searle (1992) noted, ‘phenomenological time does not exactly match real time’.
(b) The represented and the representing
In addressing the problem of mapping objective time to subjective time, one is confronted with the potential fallacy that there would be a direct identity relationship between objective time, its neural representation and its subjective perception. The aforementioned temporal illusions exemplify the non-identity mapping of objective to subjective time. I now focus on how objective time functionally maps onto neural activities and relates to subjective time. Specifically, I will focus on time perception—i.e. time as a perceptual construct—as opposed to time processing—i.e. neural temporal properties: the former focuses on the kind of functional isomorphism the brain uses to handle subjective temporality, whereas the latter focuses on the temporal resolution or granularity of the dynamics in the nervous system. For instance, one would not claim that the property of being red in the sensory world maps onto redness in the brain leading to the percept ‘red’. Rather, there is a formal correspondence between the physical properties of redness and its representation as instantiated in the neural activity of the brain and which constitutes the isomorphism proper (see for instance Gallistel 1989). These aspects of cognition are not accessible to conscious awareness, just as we are not aware of the automatic computations of our brain allowing us to say that ‘2×7 makes 14’ (Dehaene 1997). In the tradition of distinguishing levels of analysis in cognition (Marr 1982; Fodor & Pylyshyn 1988), we are now dealing with time perception and its representations versus its implementation as neural temporal dynamics.
The non-identity between neural and perceptual time can be addressed by examining differences of neural latencies within the visual system and across sensory modalities. Zeki (2001) reviewed the problem of local asynchronies within the visual system where, for instance, motion and colour pathways are processed with different latencies despite our percept being of a temporally unified ‘moving red ball’. An analogous problem exists in multisensory perception: neural latencies in audition (Celesia 1976; Lakatos et al. 2005) can be tens of milliseconds shorter than in vision (Buchner et al. 1997; Schroeder 1998; Zeki 2001), yet this does not prevent robust informational binding across sensory modalities (for a classic example of audio-visual speech illusion, see McGurk & MacDonald 1976). It is here argued that reasoning in terms of neural latencies implicitly assumes that time is represented as linear time in the brain and that awareness of an event emerges when the processing of that event ends, e.g. upon entering cortex, time is 0, next processing latency, time is 20 ms and so on. Dennett & Kinsbourne (1992) argued that, ‘what matters for the brain is not necessarily when individual representing events happen in various parts of the brain (as long as they happen in time to control the things that need controlling!) but their temporal content’. A perceptual phenomenon illustrating the distinction between content and format is the ‘flash-lag effect’ (FLE; Metzger 1932; Mackay 1958; Nijhawan 1994; Eagleman & Sejnowski 2000), which is a riddle of temporal ordering in perception. In the FLE, a brief flash presented during a moving visual stimulus is perceived as lagging behind the moving event by approximately 80 ms. An audio-visual FLE has been reported (Alais & Burr 2003), which cannot be accounted for by differences in neural latencies based on the estimation of visual motion integration and reaction times (Arrighi et al. 2005).
A recent interpretation of this illusion has been ‘postdiction’, whereby the awareness of events occurs only after the events have been reconstructed in time (Eagleman & Sejnowski 2000). Phenomena such as the ‘rabbit illusions’ (Geldard & Sherrick 1972) are perhaps more explicit with regard to postdiction: tapping the skin repeatedly at the same spot (say, on the forearm) followed by a single tap at a different location (still on the forearm) leads the participant to perceive evenly spaced taps between the first and last tap, i.e. to perceive more locations than the actual two. Attention modifies the spacing (perceived timing) of taps (Kilgard & Merzenich 1995). This effect has been recently extended to the auditory, visual and audio-visual modalities (Moradi & Shimojo 2004; Kamitani & Shimojo 2005): when presented with three sounds aligned with a visual apparent motion stimulus (two flashes sequentially presented at different locations), a participant perceives a ‘ghost’ flash between the first and second flash. Crucially, although the direction of motion cannot be determined prior to the last stimulus in the sequence, the ghost percept is captured in the direction of motion (Moradi & Shimojo 2004; Kamitani & Shimojo 2005). At first glance, these perceptual phenomena illustrate spatial distortions of perception, but they importantly suggest that events may be temporally tagged (encoding of temporal content) and re-ordered in the elaboration of a percept and the subjective appreciation of its temporality: as such, they are temporal illusions (Dennett & Kinsbourne 1992; Grush 2005).
The sensory independence of postdictive phenomena suggests that the temporal content can be read throughout different systems and that it evolves as amodal representations constrained by either centralized operations or similar distributed computational architectures. It is unclear when and how (and if, for the anti-representationalists; Varela 1999) time is encoded, but these illusions suggest that temporal content (i) is encoded rapidly and early upon analysis of sensory events and (ii) can be implicitly manipulated and reassessed in building a percept. To my knowledge, the existence of temporal representations and content has been seldom discussed theoretically (Dennett & Kinsbourne 1992; Ruhnau 1994, 1995; Grush 2005) and not tested empirically with this distinct goal in mind (but see Efron 1970a,b). The possible symbolic (Eagleman 2001) or abstract (Walsh 2003) nature of time representation has been pointed out but neither clear evidence nor formalization of a discrete unit for time perception nor temporal tagging (Pöppel 1978; Johnston & Nishida 2001; Nishida & Johnston 2002) has been demonstrated independently of neural time processing.
The distinction between the content and the format of a representation is particularly non-trivial for time perception proper. A striking illustration of content versus format is provided in two studies looking at speech comprehension. Saberi & Perrot (1999) showed that local time reversal over 50–100 ms of auditory speech signals has nearly no detrimental effect on speech recognition; similar findings were reported in sentential contexts (Kiss et al. 2008). These results suggest that acoustic inputs can be processed with no better temporal resolution than what is provided to the system, namely 50–100 ms. Said differently, acoustic information is integrated over a temporal window. Auditory speech is composed of fine acoustic temporal structures and, as such, the analysis of speech is often posited to require neural mechanisms with adequately fine temporal resolution. The aforementioned results challenge this notion and recent models have indeed made use of specifically sized temporal windows of integration for the analysis of speech signals (Poeppel 2003; Poeppel et al. 2008), models that heavily rely on neuroimaging evidence (e.g. Boemio et al. 2005). In this example, it is clearly the content of a temporal window that matters, not the temporal window itself, i.e. the brain operates on a parsed or temporally windowed information flow and not on a linear timing or latency-based mechanism. With respect to time perception proper, subjective time has been argued to be adirectional within such small temporal windows (Ruhnau 1994, 1995), yet, as shown here, this does not impair the directionality of the speech percept. Can these examples on the temporal structuring of auditory speech be transposed to the problem of time perception proper? How would processing windows apply to time perception?
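The local time-reversal manipulation discussed above can be made concrete with a minimal sketch. It assumes a signal stored as a plain list of samples; the function name `locally_reverse` and its parameters are hypothetical and are not taken from the cited studies, whose exact stimulus-generation procedures are not reproduced here.

```python
def locally_reverse(samples, sample_rate_hz, window_ms):
    """Reverse a signal within consecutive, non-overlapping windows.

    The order of the windows is preserved; only the samples inside
    each window are played backwards.
    """
    # Window length in samples (at least one sample per window).
    window = max(1, int(round(sample_rate_hz * window_ms / 1000.0)))
    out = []
    for start in range(0, len(samples), window):
        segment = samples[start:start + window]
        out.extend(reversed(segment))
    return out


# Toy signal: 10 samples at 100 Hz with 50 ms windows (5 samples each).
signal = list(range(10))
print(locally_reverse(signal, sample_rate_hz=100, window_ms=50))
# -> [4, 3, 2, 1, 0, 9, 8, 7, 6, 5]
```

The sketch shows that the manipulation scrambles temporal order within each window while leaving the sequence of windows, and hence the coarse temporal structure of the signal, intact.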
(c) Discreteness in subjective time?
The ‘time quantum’ in physiology (Von Baer 1876), the ‘specious present’ in psychology (Clay 1882; James 1890) and the ‘cinematographic hypothesis’ in philosophy (Bergson 1888, 1909) are contemporary notions in which the ‘river metaphor of time’ vanishes and which initiate a novel approach consisting in determining the unit(s) of time perception (Mabbott 1951; Stroud 1955; White 1963; Blumenthal 1977). To date, the lack of distinction between content and format has prevented this question from being resolved. For instance, whereas the evidence for the discreteness of time processing is plentiful, the evidence for the discreteness of time perception is sparser. Chronometric studies have demonstrated robust periodicities in reaction times (approx. 10–30 ms) that are task-dependent but attention-independent (for a critical review see Dehaene 1993); similar multimodal distributions of saccadic reaction times have been reported with periodicities of 30–60 ms (Frost & Pöppel 1976). These data strongly suggest that automatic temporal processes underlying perception and action are discrete; however, temporal processing underlies perception at large and not uniquely time perception. Importantly, sensorimotor timing (perception–action loop) is finer than that observed in time perception (Repp 1999, 2000): ‘implicit timing’ thus refers to those temporal processes that escape conscious access but which can be tracked behaviourally, for instance by chronometric studies (see however a potential methodological confound if the measured processes are implemented in non-sequential parallel paths; Pöppel 2009). It remains unclear how ‘implicit’ and ‘explicit’ timing relate to each other and this is a crucial issue for any theory of time perception (Michon 1990).
Direct assessment of time perception uses several measurement methods and concepts. The first, subjective simultaneity, encompasses distinct mechanisms (Piéron 1952): two events can be simultaneous or successive in time and, if they are successive, the ordering of events should be perceivable. The distinction between ‘simultaneity’ and ‘order’ has led to several types of psychophysical thresholds: the ‘fusion threshold’ or the amount of time above which the observer can perceive several events and not just one; the ‘temporal order threshold’ (TOT) or the amount of time required for two events to be correctly ordered in time; the ‘simultaneity threshold’ or the time separation required for two events to be correctly perceived as successive or simultaneous in time. Below the fusion threshold, two identical stimuli cannot be singled out and rather fuse into one unitary event. In audition, the fusion threshold approximates 1–2 ms (Exner 1875; Von Bekesy 1936) and the TOT approximates 20–40 ms (Exner 1875), i.e. one order of magnitude higher: two identical auditory events separated by approximately 10 ms are perceived in succession but cannot be ordered. Temporal information below the fusion threshold is accessible and registered for the computation of auditory space (Jeffress 1948; Carr & Konishi 1990; Carr 1993): temporal cues do not necessarily result in a time percept and non-temporal cues can contribute to the time percept. In vision, the fusion threshold (or ‘critical flicker fusion’; Landis 1954) is nearly reducible to the TOT, namely approximately 20 ms (Exner 1875). When two visual events are perceived, so is their order but with an additional trick: temporal delays approximating the TOT between two visual events lead to perceptual motion phenomena ranging from flickering to phi motion, beta motion and alternation percepts (Wertheimer 1912).
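The relationship between the fusion threshold and the TOT can be summarized in a minimal sketch. It treats both thresholds as sharp cut-offs, which is a deliberate simplification since psychophysical thresholds are probabilistic; the values are rounded from the approximate figures quoted above (the 30 ms auditory TOT is an assumed midpoint of the 20–40 ms range), and the names `THRESHOLDS_MS` and `expected_percept` are hypothetical.

```python
# Approximate thresholds in milliseconds, rounded from the text.
# Assumption: sharp cut-offs; real thresholds are graded and task-dependent.
THRESHOLDS_MS = {
    "audition": {"fusion": 2.0, "order": 30.0},
    "vision": {"fusion": 20.0, "order": 20.0},
}


def expected_percept(modality, soa_ms):
    """Return the percept expected for two identical transient events
    separated by a given stimulus onset asynchrony (SOA)."""
    thresholds = THRESHOLDS_MS[modality]
    soa = abs(soa_ms)
    if soa < thresholds["fusion"]:
        return "one fused event"
    if soa < thresholds["order"]:
        return "two events, order not resolved"
    return "two events, order resolved"


for modality in ("audition", "vision"):
    for soa in (1, 10, 25, 40):
        print(modality, soa, "ms:", expected_percept(modality, soa))
```

Under these assumed values, a 10 ms separation falls in the intermediate auditory regime (succession without order), whereas in vision that regime is essentially empty because the fusion threshold and the TOT nearly coincide.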
When using transient stimuli, an average TOT of approximately 30 ms is often observed within and across auditory, visual and tactile sensory modalities (Hirsh & Sherrick 1961; Hirsh & Fraisse 1964). The similarity of TOT across sensory modalities for simple stimuli is surprising given the differences of neural latencies between audition (Celesia 1976; Lakatos et al. 2005) and vision (Buchner et al. 1997; Schroeder 1998; Zeki 2001). Several groups have suggested the possibility of mechanisms compensating for the neural delays across the senses (Engel & Dougherty 1971; Sugita & Suzuki 2003; Kopinska & Harris 2004), but they remain controversial (Lewald & Guski 2004; Arnold et al. 2005) and may not take place below the ‘horizon of simultaneity’, i.e. when auditory and visual sources are less than 10 m away from the observer (Pöppel 1988; Pöppel et al. 1990). Additional results on the TOT will be addressed later on.
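A rough sense of the scale involved in the ‘horizon of simultaneity’ can be obtained by computing how long sound takes to travel a given distance. The sketch below assumes a speed of sound of about 343 m/s (dry air near 20 °C), a value not given in the text, and neglects the travel time of light; the function name `acoustic_delay_ms` is hypothetical.

```python
# Assumption: speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m):
    """Time (ms) for sound to travel from a source to the observer."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S


for distance in (1, 5, 10, 20):
    print(f"{distance:>2} m -> {acoustic_delay_ms(distance):.1f} ms")
```

Under this assumption, the acoustic delay at 10 m is close to 29 ms, i.e. of the same order as the approximately 30 ms TOT and the tens of milliseconds by which auditory neural latencies are shorter than visual ones. This is only an illustration of orders of magnitude, not a model of any compensation mechanism.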
If a dedicated system exists for time perception, be it localized or distributed, one ought to define on what representations that system operates and, for instance, whether invariant representations of time are available to the system. Can the TOT be de facto taken as the perceptual unit of time perception or is a 30 ms duration a ‘temporal integration unit’ (Pöppel 1971, 1997)? In the first scenario, explicit time perception would be discrete; in the second scenario, it is the implicit time processing that is discrete. Neither scenario entails nor excludes the other, but to date, available data mostly speak of the discreteness of time processing and not of what the representational unit of time may be for time perception, if such a unit exists.
3. An amodal representational space for time perception?
Empirical evidence is briefly reviewed for different levels of time perception across sensory modalities. The specific question is whether time perception should be conceived as being tightly coupled to a sensory modality or whether the representations for time perception acquire a level of abstraction. The distinction between temporal processes and percepts remains highlighted when necessary.
(a) Fusion, order and simultaneity
The fusion thresholds for audition (approx. 2 ms) and vision (approx. 20 ms) differ by one order of magnitude: how then can a coherent percept of time emerge? As Mabbott (1951) rightly sensed decades ago, ‘But what duration, if two of the senses which contribute to its content have different specious presents?’ Above the fusion threshold, sensory events can be individualized, but it is only above approximately 30 ms that they can be ordered within and across sensory modalities. For transient stimuli (e.g. clicks, flashes) the TOT approximates 20–40 ms; as the stimuli become more complex, the TOT gets larger. Fraisse (1984) reported that the TOT varied as a function of stimulus complexity and it generalizes here across sensory modalities. For transient stimuli, the TOT (e.g. Hirsh & Sherrick 1961) and the simultaneity thresholds (e.g. Zampini et al. 2005) are very similar and approximate 20–30 ms. However, as the complexity of the stimuli increases, larger differences are seen between the TOT and the simultaneity thresholds. For example, Vatakis & Spence (2006, experiment 2) used audio-visual speech stimuli (a face articulating speech sounds) and asked participants to make a temporal order judgement, namely to respond to ‘which of the auditory or visual event came first?’. They found a just-noticeable difference for audio-visual speech of approximately 70 ms. In a simultaneity task using audio-visual speech and in which participants were asked ‘are the auditory and visual events simultaneous or successive?’, two independent groups (Conrey & Pisoni 2006; van Wassenhove et al. 2007) reported that asynchronies remained undetected and were deemed synchronized within a temporal window of approximately 200–300 ms. Two pieces of key information on the simultaneity task data should be highlighted.
First, each participant showed a ‘temporal window of integration’ within which the auditory and visual speech stimuli appeared simultaneous: thus, the perceptual window was not an artefact of averaging across individuals. Second, the temporal profile obtained in the assessment of simultaneity was nearly identical to that obtained in the evaluation of the speech percept resulting from the fusion of audio-visual speech (McGurk & MacDonald 1976). Hence, whether participants assessed the synchrony or the perceptual outcome of audio-visual integration, a temporal window of approximately 200–300 ms was obtained (van Wassenhove et al. 2007). The comparison of the temporal order and simultaneity for audio-visual speech is particularly puzzling in light of a question asked by Fraisse (1957), ‘Peut-on percevoir l'ordre là où l'on ne pourrait même pas distinguer la succession?’ (‘Can one discriminate order when not even succession is being perceived?’). It appears that order can indeed be resolved despite stimuli being perceived as simultaneous and fused as one perceptual outcome. One possibility is that although temporal processing may operate with fine temporal resolution, this resolution may not permeate to conscious awareness: refined implicit temporal processing does not equate to refined explicit temporal representation. Indeed, learning on a temporal order task does not transfer to a synchrony task (Mossbridge et al. 2008). Attention to particular temporal features or changes in temporal expectations (Nobre et al. 2007) may affect these thresholds, but it is probably not a coincidence that naturalistic events tolerate a large range of delays, i.e. they remain robust despite increases in (temporal) noise.
If the nervous system is adapted to what are true realizations in the physical world (Fodor 2000) and if, for biologically relevant stimuli, specialized modules can be hypothesized, the underlying computations and the invariant representations these modules entail could tolerate higher noise levels. Said differently, when external invariance is lacking, internal invariance may compensate for it. To speculate further on this point, one would predict that for those encapsulated systems that operate on invariant representations (e.g. speech), more temporal noise would be tolerated than for those that do not (e.g. a click and a flash that do not provide a categorical perceptual correlate beyond the features that compose them).
Several factors further complicate the determination of TOT. For instance, participants can substantially lower their TOT by using non-temporal cues and different perceptual strategies (Fink et al. 2006). The effects of attention and ‘prior entry’—more attention allocated to the first stimulus—may play important roles in the evaluation of the TOT (for review, see Spence et al. 2001). Context and prior experience, even if short, also affect the TOT: in a recalibration paradigm, adaptation to intersensory or sensorimotor desynchrony widens the window of tolerance and lowers the precision of order perception under various types of stimulations (Fujisaki et al. 2004; Vroomen et al. 2004; Navarra et al. 2005; Stetson et al. 2006; Vatakis et al. 2007; Hanson et al. 2008).
Still, the variability in order thresholds across sensory modalities is largely comparable with that observed within a single modality, which is, as previously mentioned, quite impressive given the neural latency delays across sensory systems. At the scale of a few tens of milliseconds (i.e. within the order and simultaneity ranges), auditory information affects the quality of a visual percept: two transient sounds in a rapid sequence bias the number of perceived flashes (Shams et al. 2000) and transient sounds at various timings during a visual apparent motion display alter the thresholds at which different kinds of visual motion emerge (Soto-Faraco et al. 2003; Arrighi et al. 2006; Getzmann 2007). Numerous pieces of evidence have accumulated for the interaction of sensory modalities in motion perception; a thorough review has been provided elsewhere (Soto-Faraco et al. 2003). Surprisingly, less is known of the possible interactions at the fusion thresholds, although some reports suggest that auditory noise affects the fusion threshold in vision (Kravkov 1934; Maier et al. 1961); this remains controversial (Landis 1954).
(b) Temporal order and causality
‘[…] if only events are causes, then order and duration cannot be causes, since order and duration are features of events, not themselves events’. (Le Poidevin 2004)
The perception of order can hardly be dissociated from that of causality, the former being considered a necessary condition for the latter. This view is, however, biased towards conceiving order and causality as products of a serial and hierarchical temporal processing of events as they arrive in and are analysed by the brain, i.e. it assumes a linear identity between objective and subjective time (see above discussion on neural latencies). No perceptual causality should be present below the TOT, although I am not aware of specific studies addressing this question outside of temporal perception. The FLE and rabbit illusions suggest that the order of external events could be internally and implicitly re-assessed for conscious perception; similar postdictive mechanisms have been discussed for perceptual causality (Choi & Scholl 2006), further suggesting that events may be temporally tagged (Pöppel 1978; Nishida & Johnston 2002).
Humans tend to naturally attribute causal relationships and intentionality to moving geometric figures (Michotte 1954); perceptual causality may derive from specialized automatic rules (Scholl & Tremoulet 2000), but it remains unclear whether these rules underlie perceptual processing or entail higher cognitive processes. One functional magnetic resonance imaging (fMRI) study showed activation of the right inferior prefrontal cortex during observation of stimuli leading to perceptual causality (Fugelsang et al. 2005); this area is crucially activated in time perception (Lewis & Miall 2006) and in the retrieval of temporally structured sequences (Fuster 2001), suggesting the implication of central mechanisms for the evaluation of temporal order (Pöppel 1997)—see also Battelli et al. (2007) for a review on the involvement of parietal cortex in ordering events.
One question is whether the temporal aspect of event order accesses conscious awareness outside a temporal task, i.e. outside experimentally constrained situations in which an explicit assessment of temporal order is required. As mentioned above, complex stimuli show larger TOT values and may involve larger integrative windows than would be expected based on an approximate 30 ms TOT. Likewise, perceptual causality tolerates larger desynchronies than would be expected from the TOT. Early findings on the temporal constraints of perceptual causality for geometric forms have estimated at approximately 100 ms the delay above which reports of perceptual causality diminish (Michotte 1954). A similar finding was observed when desynchronies were introduced in stimuli leading to perceptual causality (Choi & Scholl 2006); the authors mentioned that participants were well aware of the temporal mismatch but still reported a causal relationship between events. Additionally, and perhaps more strikingly, perceptual causality in a sensorimotor context can be reversed in a recalibration paradigm such that a sound presented before a button press is perceived as being the consequence of the button press (Stetson et al. 2006). As noted by the authors, if the delay between the action and the auditory event is too large, i.e. above approximately 100 ms, recalibration effects are reduced. Perceptual causality has also been observed across sensory modalities. In the ‘bounce-stream illusion’, two identical visual objects crossing each other's path can be experienced as streaming past each other or as bouncing off each other (Sekuler & Sekuler 1999): at the meeting point of these two objects, the presentation of a transient sound elicits a robust increase in bouncing percept, indicating causal inferences across sensory modalities (Sekuler et al. 1997; Watanabe & Shimojo 2001b).
The duration of the post-meeting point trajectory of the disc also affected the illusion with 150–200 ms eliciting the most robust increase in bouncing percepts, suggesting possible postdictive mechanisms in this effect as well (Watanabe & Shimojo 2001a).
It is thus suggested that perceptual causality does not solely depend on computations of event timing but also, and to a very large extent, on prior implicit knowledge and expectations of the system, ultimately leading to predictive mechanisms in perception (Enns & Lleras 2008). Although approximately 30 ms may constitute a temporal unit (an integrative unit for ‘time gestalt’; Ruhnau 1995), a perceptual moment or frame is probably coupled with conscious awareness at a time scale that is not reducible to its smallest constituents.
(c) Temporal rate
The manipulation of temporal rate provides classic examples of multisensory effects. Temporal rate is studied using transient stimuli (clicks or flashes) presented with constant or variable rates of presentation. Visual temporal rate perception is classically affected by the concurrent presentation rate of auditory events, i.e. is subjectively sped up or slowed down as a function of the rate of auditory events. These findings have long been observed (Exner 1875; Hamlin 1895; Mass 1938; Gebhard & Mowbray 1959) and Shipley (1964) provided the first robust quantification of this ‘flutter-driven flicker’ perception. The dominance of auditory rate and rhythm over that of vision in time perception has been reported several times (Recanzone 2003; Wada et al. 2003; Guttman et al. 2005; Arrighi et al. 2006), but temporal cross-capture, a perceptual effect in which audition and vision influence each other, has also been observed (Wada et al. 2003). It is easier to recall an auditory rhythm than it is to recall a visual rhythm (Glenberg et al. 1989) and audition affects motor rhythm more than vision (Repp & Penel 2004). The temporal aspects of audio-visual perception are generally considered to be dominated by audition with effects such as temporal capture (Fendrich & Corballis 2001). Learning of temporal rate discrimination in somatosensation transfers to audition (Nagarajan et al. 1998), suggesting that the extraction of temporal rate is available to other sensory modalities, although plasticity associated with temporal rate learning is observed early in the sensory processing stream (van Wassenhove & Nagarajan 2007). The mechanisms of intersensory transfer in temporal learning are unknown. Temporal rate is also relevant for perceptual causality, especially in the context of the rabbit effects and postdiction phenomena described earlier. 
In those examples, information extracted from a temporal rate is used to establish the quality of a percept, not a pure temporal percept; the effects are perhaps more striking since the perceptual outcome is a structure seemingly based on the most likely ordering of events given the implicit ‘belief system’ of the perceptual system.
(d) Simultaneity and coincidence
The notion of simultaneity is seldom addressed as opposed to that of TOT. A window of tolerance for several multisensory illusions surrounding strict objective simultaneity approximates 100 ms (Van De Par & Kohlrausch 2000): sensory events occurring within approximately 100 ms have a greater likelihood of being integrated than not, hence a multisensory perceptual unit emerges. This time period may represent a ‘perceptual frame of subjective time’. For more complex stimuli, however, the window of tolerance can reach approximately 250–300 ms (Massaro et al. 1996; Munhall et al. 1996; Grant et al. 2004a,b; Conrey & Pisoni 2006; Vatakis & Spence 2006; van Wassenhove et al. 2007) despite lower TOT.
Subcortical and cortical sites of multisensory integration contain neurons responsive to more than one sensory modality (Stein & Meredith 1993). The window of temporal tolerance or temporal tuning of these neurons is a few tens to hundreds of milliseconds (Benevento et al. 1977; Meredith et al. 1987) with an optimal integration window estimated at approximately 250 ms (Meredith et al. 1987). This window pertains to the latency of arrival of different converging neural streams, hence to internal neural processing time and not necessarily to the objective time separating two stimuli (see discussion in §1). A neural moment is thus not a point in time but a window of time, and models of multisensory perception have started to include such a notion of temporal window of integration (Colonius & Diederich 2004).
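The distinction between objective asynchrony and the asynchrony of neural arrival can be made concrete with a minimal sketch. It is an illustration only: the latency values, the `Event` type and the function names are hypothetical placeholders, and the 100 ms window is simply the approximate figure quoted above for simple stimuli.

```python
from dataclasses import dataclass

# Hypothetical neural latencies in ms (placeholders, not measured values).
LATENCY_MS = {"auditory": 10.0, "visual": 50.0}


@dataclass(frozen=True)
class Event:
    modality: str    # "auditory" or "visual"
    onset_ms: float  # objective stimulus onset


def arrival_ms(event):
    """Time at which the event reaches a convergence site (onset + latency)."""
    return event.onset_ms + LATENCY_MS[event.modality]


def is_integrated(a, b, window_ms=100.0):
    """True if the two neural arrival times fall within one integration window."""
    return abs(arrival_ms(a) - arrival_ms(b)) <= window_ms


sound = Event("auditory", 0.0)
flash_sync = Event("visual", 0.0)     # objectively simultaneous with the sound
flash_late = Event("visual", 80.0)    # flash lags the sound by 80 ms
flash_early = Event("visual", -80.0)  # flash leads the sound by 80 ms

print(is_integrated(sound, flash_sync))
print(is_integrated(sound, flash_late))
print(is_integrated(sound, flash_early))
```

Under these assumed latencies, the same 80 ms objective asynchrony falls outside the window when the flash lags and inside it when the flash leads, because the window applies to neural arrival times rather than to stimulus onsets.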
A window of integration importantly suggests that ‘temporal coincidence’ is not a point in time. Recent investigations have suggested that rapidly co-occurring arbitrary audio-visual pairings can be implicitly associated (Seitz et al. 2007), i.e. learned without awareness of the temporal relationships between these events. It is unclear how much temporal disparity between the audio-visual pair is acceptable for learning, but it is likely that implicit learning would be insensitive to constant temporal disparities since simultaneity across sensory modalities is plastic (Fujisaki et al. 2004), e.g. if an asynchrony of 80 ms is consistently introduced between auditory and visual presentations; with unpredictable temporal jitters (each audio-visual pair is presented with a different desynchrony value on each presentation), implicit learning is likely to be more difficult. The comparison between implicit learning of temporally jittered multisensory events and their explicit temporal perception may bring interesting data that could highlight differences of implicit temporal processes and explicit temporal representations in multisensory perception.
(e) Duration
Duration, unlike temporal order, is the fundamental time percept as it explicitly requires a measure of the time that passes by. In duration perception, some intriguing differences across sensory modalities have been reported: given the same objective duration, an auditory event is perceived as longer than a visual event and, oftentimes, audition is more precise than vision in temporal perception (Goldstone et al. 1959; Goldstone & Goldfarb 1963, 1964a,b; Wearden et al. 1998; Penney et al. 2000; Penney & Tourret 2005). Two kinds of explanation for the differences between audition and vision have been put forward within the prominent clock models (Treisman 1963; Allan 1979; Gibbon et al. 1984). The minimalist clock model consists of a pacemaker generating discrete events at a fixed frequency and an accumulator counting these events. A switch between the pacemaker and the accumulator regulates the counting mechanism: when it is closed, the units add up in the accumulator; when it is open, accumulation stops. With regard to the observed differences in subjective duration between audition and vision, the latency of the auditory switch may be more stable and/or the rate of the auditory pacemaker may be faster than their visual counterparts (Wearden et al. 1998; Penney et al. 2000; Penney & Tourret 2005; Droit-Volet et al. 2006). Both hypotheses assume that each sensory modality has its own clock. Recent data have suggested that interactions across audition and vision are difficult to reconcile with independent clock models (van Wassenhove et al. 2008), but more data are needed to formalize multisensory interactions during duration discrimination. One parsimonious explanation for the differences in perceived duration across sensory modalities may be the lack of intensity-matching controls; intensity-duration tradeoffs have clear temporal effects on audition (Oléron 1952; Moore 1997) and vision (Goldstone et al. 1978; Eagleman et al. 2004).
As such, prior evaluation of subjective intensity matching between an auditory and a visual event may lead to different patterns of results.
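The pacemaker, switch and accumulator described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not an implementation of any published model: the pacemaker rates, the switch-latency means and standard deviations, and all function names are hypothetical values chosen only to show how a faster pacemaker lengthens the count and how a more variable switch latency reduces precision.

```python
import random
import statistics


def pulse_count(duration_ms, rate_hz, switch_latency_ms):
    """Pulses accumulated while the switch is closed.

    The switch closes switch_latency_ms after stimulus onset and opens
    at stimulus offset; only whole pulses are counted.
    """
    closed_ms = max(0.0, duration_ms - switch_latency_ms)
    return int(closed_ms * rate_hz / 1000.0)


def simulate(duration_ms, rate_hz, latency_mean_ms, latency_sd_ms,
             n_trials=2000, seed=0):
    """Mean and standard deviation of the count over repeated trials."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        # Latency cannot be negative: the switch cannot close before onset.
        latency = max(0.0, rng.gauss(latency_mean_ms, latency_sd_ms))
        counts.append(pulse_count(duration_ms, rate_hz, latency))
    return statistics.mean(counts), statistics.stdev(counts)


# Hypothetical parameters: faster pacemaker and steadier switch for audition.
aud_mean, aud_sd = simulate(500.0, rate_hz=110.0,
                            latency_mean_ms=30.0, latency_sd_ms=5.0)
vis_mean, vis_sd = simulate(500.0, rate_hz=100.0,
                            latency_mean_ms=30.0, latency_sd_ms=30.0)

print("auditory count: mean", aud_mean, "sd", aud_sd)
print("visual count:   mean", vis_mean, "sd", vis_sd)
```

With these assumed parameters, the same 500 ms interval yields a larger and less variable count for the auditory clock, which corresponds to the two hypotheses above (faster pacemaker, more stable switch). Note that the sketch builds in the assumption of one independent clock per modality, which is precisely what the multisensory data call into question.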
In summary, the systematic study of time as a process and as a percept in multisensory research is still in its infancy. Temporal relationships of multisensory events are an important factor in building a percept; the sensitivity to temporal features across sensory modalities is already well developed in infants a few months old (Lewkowicz 2000). In turn, the ability to match temporal properties across sensory modalities suggests that they are largely accessible to conscious awareness. This section highlights that interactions across the sensory modalities occur at multiple levels of time perception and remain to be understood. A systematic approach to the study of time perception integrating multisensory interactions would provide much needed information on the level(s) at which temporal information gets integrated across different sensory systems. Recent efforts have been made in this direction (e.g. N'diaye et al. 2004; Nouhliane et al. 2008; van Wassenhove et al. 2008).
4. Shuffling time in the brain
There be three times; a present of things past, a present of things present, and a present of things to come. (Saint Augustine (400 AD))
Time is in the observer's mind, yet no invariant or representational unit of time is available: what then does an internal clock count? Clock models classically posit a linear metric of time (specifically for duration), but the temporal representations on which these models operate are clearly underspecified. The phenomenology of time perception encompasses the perception of order, rate, simultaneity, successiveness and coincidence, all discussed in §2: each temporal property is presumably extracted by specific neural mechanisms and accessed consciously with a lower resolution than would be expected considering the fine temporal granularity of neural operations. I now sketch a novel framing of time representation for perception.
(a) Brain oscillations and the temporal structuring of perception
‘Perceptual units’ are a ‘consequence of temporal integration’ over approximately 30 ms temporal windows (Pöppel 1971, 1997). With a periodicity of milliseconds to seconds, brain oscillations naturally lend themselves as temporal processors parsing the sensory field into cycles of brain time. The neural periodicities of brain oscillations echo the time scales of perceptual structuring in audition, somatosensation and vision, and they present a hierarchical structure at local and global spatio-temporal scales (Başar 1998; Llinás et al. 1998; Buzsáki 2006). In particular, the gamma band (more than 30 Hz) is likely to support approximately 30 ms integrative windows (Pöppel 1971, 1997). Synchronizations of neural populations in the gamma range are largely recognized as essential features of brain function both locally, for instance as a mechanism of feature binding in perceptual processing (Singer 1998), and globally as a large-scale support for cognitive operations (Engel et al. 2001). Synchronization of neural populations entails higher order features such as phase between synchronizing neural populations (Womelsdorf et al. 2007), including those neural populations that synchronize in different frequency regimes such as gamma and theta (Freeman 2000; Canolty et al. 2006).
With respect to time perception, the underlying assumption in positing a 30 ms perceptual unit is that the temporal content of a brain event (one gamma cycle) equates to the brain event itself. If approximately 30 ms were the perceptual unit of time or minimal time content for the time perception system, it would follow that any perceivable duration would be a multiple of approximately 30 ms. However, this is not the case: for instance, a 50 ms duration can be discriminated from a 57 ms duration for filled intervals and from 70 ms for empty intervals (Rammsayer & Lima 1991). A second issue is that although approximately 30 ms between two events are necessary (e.g. two gamma cycles) for ordering them, less than approximately 30 ms is necessary for the awareness of two events. One magnetoencephalographic (MEG) study targeting the neurophysiological correlates of perceiving one versus two auditory events as a function of their temporal separation showed that perceiving one event correlates with one gamma band response (GBR), whereas perceiving two events correlates with two GBRs (Joliot et al. 1994). This suggests potential correlations between GBR and fusion threshold but not with the TOT as would have been predicted. In fact electroencephalographic (EEG) findings conflict with these data and depict a more complex picture: one complete GBR per auditory transient was observed when events were at least 100 ms apart (Boemio 2003). 
In his study, Boemio (2003) reports a correlation between the temporal structuring of acoustic events and the GBRs yielding three distinct perceptual zones: (i) when transient events (0.1 ms clicks) are separated by more than 100 ms, they elicit a single isolated GBR and clicks are perceived as discrete events; (ii) as the time between clicks decreases from 100 to 10 ms, the GBRs increasingly summate over time and so do the individual clicks that acquire a pitch-like quality; and (iii) below 10 ms, a single GBR is elicited to the first click only and the percept of the individual click is lost. These findings suggest that the temporal fine structure of acoustic events is preserved in cortex (Boemio 2003) and comparable results have been reported in vision where V1 neurons can follow flicker frequencies that are below perceptual resolution (Gur & Snodderly 1997).
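The argument against a fixed approximately 30 ms unit of time content can be checked with simple arithmetic. The sketch below is a hedged illustration: counting whole units from stimulus onset is an assumed quantization rule, and the function names are hypothetical. The durations are those cited above from Rammsayer & Lima (1991).

```python
UNIT_MS = 30.0  # approximate perceptual unit posited in the text


def whole_units(duration_ms, unit_ms=UNIT_MS):
    """Number of complete units spanned by a duration (assumed rule)."""
    return int(duration_ms // unit_ms)


def distinguishable(d1_ms, d2_ms, unit_ms=UNIT_MS):
    """Under the fixed-unit account, two durations differ only if
    they span a different number of whole units."""
    return whole_units(d1_ms, unit_ms) != whole_units(d2_ms, unit_ms)


for standard_ms, comparison_ms in [(50.0, 57.0), (50.0, 70.0)]:
    print(standard_ms, comparison_ms,
          distinguishable(standard_ms, comparison_ms))
```

Under this assumed rule, 50 and 57 ms both span a single unit and should be indistinguishable, whereas observers do discriminate them for filled intervals; a fixed 30 ms unit of time content therefore cannot account for the reported thresholds.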
The view suggested by Pöppel (1997, 2009) is relevant for the temporal structuring of perception at large. For instance, in auditory speech, the main acoustical difference between [bæ] and [pæ] resides in the delay between the consonantal burst and the vowel onset. If the delay is below approximately 30 ms, a [bæ] is perceived; if the delay is above approximately 30 ms, a [pæ] is perceived. The approximately 30 ms delay defines the categorical boundary between two distinct percepts and the perceptual realization is [bæ] ([pæ]) irrespective of delays below (above) the categorical boundary. This suggests that conscious access to temporal resolution is lost at the expense of another form of perception, here phonological; for time perception proper, some temporal resolution may similarly not be accessible for the representation of time. One possibility is that the integrative window underestimates the coding capabilities for temporal information and that the actual encoding of temporal properties operates on a much finer resolution (for a detailed account on the distinction between integrative and encoding windows, see Theunissen & Miller 1995). This possibility could account for the discrepancies of time resolution in processing (‘implicit timing’, encoding windows would be non-transparent to conscious perception) and perception (‘explicit timing’, integrative windows would be transparent to conscious temporality).
(b) Raw temporal representations for time perception
Fundamentally, there is no dedicated perceptual system for time perception by analogy to the auditory or visual systems. There is not one but many time receptors since all senses receive temporal properties. The major source of temporal information is provided by external events when it is the duration of those events that is being perceptually assessed. These temporal properties are presumably encoded through the same analytical pathways as other aspects of stimulus content such as colour, motion or pitch. Why? First, to date, there has been no evidence of analytical pathways specialized in encoding temporal features that would run in parallel with other analytical pathways within each sensory system. At a later analytical stage, however, a ‘when’ pathway has been hypothesized (Calvert 2001; Battelli et al. 2007). Second, there is no prototype of what 100 ms is (for instance, by analogy to the prototype of what a chair is) and there is no absolute time representation against which an objective duration could be matched, except in experimental tasks in which a ‘standard duration’ is learned and stored in memory for future comparison with another ‘test duration’. The lack of internal invariance and the absence of an analytical pathway suggest that time encoding does not function as a typical encapsulated series of feature extractors as is classically conceived in sensory systems. Finally, there is a priori no frame of reference for time perception other than perhaps the self, i.e. an egocentric reference frame (Van De Grind 2002).
One overlooked but increasingly scrutinized notion in systems neuroscience is that of ‘spontaneous activity’, ‘resting state activity’ or ‘default network state’ (Buzsáki 2006): unsurprisingly, neural structures show organized spontaneous activity in the absence of external stimulation. Although the effects of sensory deprivation on subsecond timing are unknown, the perceptual system does assign some temporal structure to a steady stimulus or to a stable environment. For instance, a steady patch of light displayed extra-foveally eventually fades away from perception (Troxler 1804); when presented with a steady ambiguous stimulus, perception oscillates between the possible perceptual interpretations whether in vision (e.g. the Necker cube) or in audition (Pressnitzer & Hupé 2006). Llinás et al. (1998) remarked that external stimulations act as modulations of ongoing activity, not as triggers of brain activity: activity at a given instant incorporates and is partially determined by the system state just prior to this instant. Some crucial implications are that (i) the ‘neural present’ is a window encompassing the objective present and the just recent past and (ii) the neural present can act as a predictor of the soon to be present (i.e. the objective future). It follows, theoretically, that the brain can be seen as a complex and dynamical inferential system (Friston 2002) and empirically that large-scale neural synchronizations may implement top-down influences that shape the analysis of incoming inputs (Engel et al. 2001). Recent empirical evidence has supported the notion of inferential or predictive analysis in visual perception (Enns & Lleras 2008) and in speech perception (van Wassenhove et al. 2005; Poeppel et al. 2008).
In time perception, non-clock models (Buonomano 2000; Karmarkar & Buonomano 2007) are based on the state of the network, its contextual history and its intrinsic temporal properties, such as short-term synaptic plasticity and time-dependent changes occurring over the network at the cellular and population levels. Such models suggest that time is ubiquitous and intrinsically encoded in the activity of neural populations: any neuron in the network has thus the potential to affect time representation within that network, and conversely, any neural population can become the site of time representation. Note that ‘objective time is not linear neural time’ does not hold here insofar as it is the complex temporal dynamics that are considered, not the latency per se within a serial processing pathway. Additionally, temporal encoding in a network receiving inputs from audition and vision would naturally represent time irrespective of the origin of these inputs, hence providing a first level of amodal representation for temporal information. This model provides an elegant and parsimonious means to represent temporal properties, yet it provides an account neither of how this information reaches conscious awareness nor of how it becomes the object of mental manipulations.
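The idea that elapsed time can be read out from the state of a network, without any pacemaker or counter, can be illustrated with a deliberately simplified toy. It is not the recurrent, plasticity-based model of Buonomano and colleagues: it swaps in a bank of independent leaky units, and the time constants, candidate grid and function names are all hypothetical choices made for illustration.

```python
import math

# Hypothetical decay time constants (ms) of five leaky units.
TAUS_MS = [20.0, 50.0, 100.0, 200.0, 400.0]


def network_state(elapsed_ms):
    """Population activity elapsed_ms after a brief input at time zero.

    Each unit decays exponentially at its own rate, so the pattern
    across units changes continuously with elapsed time.
    """
    return [math.exp(-elapsed_ms / tau) for tau in TAUS_MS]


def decode_elapsed(state, candidates_ms):
    """Read out elapsed time as the candidate whose expected state is
    closest (squared Euclidean distance) to the observed state."""
    def distance(t_ms):
        expected = network_state(t_ms)
        return sum((s - e) ** 2 for s, e in zip(state, expected))
    return min(candidates_ms, key=distance)


candidates = list(range(0, 501, 10))  # readout grid, 10 ms resolution
print(decode_elapsed(network_state(130.0), candidates))
print(decode_elapsed(network_state(133.0), candidates))
```

Nothing in the readout refers to the modality that triggered the input, which is the sense in which such a representation could be called amodal; the toy leaves open, as the text notes, how the decoded value would reach awareness.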
(c) Accessing time
Recent theoretical and empirical advances in the field of consciousness studies have posited the existence of a global workspace (Baars 1988) postulating two major types of computational systems: one which is composed of encapsulated modules (automatic processors) and the other, the ‘global workspace’, which enables large-scale computations via long-range neural projections accessing the sets of distributed and specialized neural populations of the first system (Dehaene et al. 1998). This model posits two durations: the first one is the duration sufficient for a stimulus to elicit significant changes of activation to be registered at the level of the automatic processors and the second one is the duration needed for the represented stimulus to access the global workspace (Dehaene & Naccache 2001b), hence one's conscious awareness of the stimulus. Additional key features of this model are the prominent role of thalamocortical networks and re-entrant processes: re-entrant processes make it possible, for instance via attentional focus, to amplify the incoming signal of interest, thus producing a closed-loop system between bottom-up signal processors and top-down amplification signals. Temporal attention could be one mode by which time becomes the property of interest for the global workspace and research has started to focus on the properties of attentional focus in time (Nobre 2001; Nobre et al. 2007), a research topic much less explored than spatial attention. Another implication of the closed-loop system is that this mechanism imposes a temporal resolution or ‘granularity’ on the stream of consciousness (Dehaene & Naccache 2001b), suggesting that a perceptual moment may have a specific duration. At least 100 ms would be required for this process to take place (Dehaene & Naccache 2001b).
(d) Working hypotheses
I now turn to working hypotheses and supporting available data.
Non-clock models are the most parsimonious means to provide, at an early analytical stage, the raw material for (subsecond) time perception.
At this level, temporal processing is automatic.
Temporal representations do not rely on a specific set of neurons and any network is potentially an ‘automatic processor’ for temporal processing.
Ubiquitous temporal representations are accessible to the global workspace.
Attentional focus to time directed towards, for instance, a particular sensory modality enables the global workspace to access the temporal properties of that network. The implication is twofold: without attention, time does not access awareness; with attention, time is perceptually available. In either case, temporal information is nevertheless automatically and implicitly represented at early stages of sensory analysis (including somesthetic, see Craig 2009).
Once temporal information has entered the global workspace, it is encoded in an abstract form, which affords manipulations such as sequencing, ordering or quantifying.
EEG and MEG evidence has shown that deviance in duration is automatically detected via a mismatch negativity (MMN) paradigm (Näätänen et al. 2004) in the early stages of sensory processing, suggesting that temporal information is represented early on independently of attention. In a duration discrimination task, early steady-state auditory responses and visual sustained fields reflect the temporal properties of the stimulus (N'diaye et al. 2004) consistent with the MMN data. Early variations in these sensory responses are a good predictor of the subsequent perceptual classification of a stimulus as short or long (Bendixen et al. 2005). Concurrently with the sensory-specific responses, a classic contingent negative variation (indicative of anticipatory attention) develops in a fronto-parietal network independent of sensory modalities (N'diaye et al. 2004). The effect of attending to time is seen as a P300 component in EEG experiments, i.e. a component that follows the early processing stages (Nobre 2001). Available fMRI studies suggest that the parietal cortex (specifically, the left inferior parietal sulcus) is differentially activated when attending to time (Coull & Nobre 1998). Together these results strongly suggest the role of a parieto-frontal network during temporal evaluation, i.e. when participants attend to time.
With respect to the hypothesis that time representations acquire a degree of abstraction only when they reach the global workspace, evidence is accumulating for shared neural substrates underlying the computations of time, space and numerosity as magnitudes (Walsh 2003; Bueti & Walsh 2009): repetitive transcranial magnetic stimulation (rTMS) over the parietal cortex, also implicated in the representation of numbers (Hubbard et al. 2005), impairs temporal perception (Giacomo et al. 2003; Walsh 2003; Alexander et al. 2005; Battelli et al. 2007; Koch et al. 2009). Recent psychophysical data have demonstrated the automatic influence of size (Xuan et al. 2007) and numerosity (Dormal et al. 2006) on duration judgements; however, and importantly, duration does not impair numerosity (Dormal et al. 2006). This asymmetry suggests that access to temporal representations is not automatic and needs explicit attentional focus to be brought to awareness: in this process, numerosity can affect the extraction of temporal representations, especially if it uses similar computational resources, for instance, in reaching the global workspace.
With respect to time perception and the granularity of the conscious stream suggested in the global workspace model (Dehaene & Naccache 2001a), the distinct concepts of the duration of the represented versus the representation of duration have been highlighted in Efron's (1970b) seminal work that compares the objective duration of an event and the duration of its actual percept: ‘The hypothesis put forward in this paper applies only to the durations of perceptions, that is, to the durations of the conscious awareness of an existent. It does not necessarily apply to the durations of the awareness of an attribute of an existent nor does it necessarily apply to the durations of the neurophysiological mechanisms […]’. The evidence for perceptual temporal frames has increased and the minimum duration of a percept has been estimated at approximately 100 ms (for a review of this history, see White 1963; for the earliest estimates, see Stroud 1955). Several examples of multisensory percepts in §2 suggested that they bear temporal resolutions of the order of approximately 100 ms. An abundant literature in visual perception has stressed the effect of persistence at different levels of visual processing: the consequence of persistence is that the briefest flash will leave a neural trace in the brain system. This is at the core of research on sensory memory, and for instance a memory trace of approximately 100 ms in vision has been proposed (Di Lollo 1977). More recently, this time range (approx. 13 Hz) has been interpreted as an attentional sampling rate for visual motion (Vanrullen et al. 2006). This sampling could potentially be related to the rate at which information from the automatic processing level reaches the global workspace, but it does not constitute an estimate of duration; rather, it may constitute an estimate of the rate at which temporal representations in the automatic temporal encoding levels can be accessed.
Experiential time is abstracted out of the temporal structuring of the external and internal events. In this section, I have suggested that time perception emerges in two steps. First, time is encoded as raw material in the modulation of the spontaneous and ongoing temporal structuring of the brain system—perhaps as state-space dynamics—and thus, time is automatically and intrinsically represented but does not automatically reach conscious awareness. In the second step, within the global workspace framework, it is only when attention or task requirements are focused on time that raw time representations are converted into an abstract magnitude, which affords mental operations. I summarize below several alternative conceptualizations for time perception research which have been raised throughout:
(i) There is no minimal representational unit of time perception used to construct a time percept. Time perception is intrinsically contained in the temporal dynamics of the brain and naturally derives from the very temporal structuring of neural processes. Time perception is the ‘anti-module’ par excellence. One such kind of model would be the state-dependent or non-clock models (Buonomano 2000; Mauk & Buonomano 2004; Karmarkar & Buonomano 2007). Taken as the sole means of representing time, such a model would fit an anti-representationalist view of time perception (Varela 1999). One intriguing consequence would be that time is directly transparent to consciousness.
(ii) There is no minimal representational unit of time used to construct a time percept. Any information extracted in the course of perceptual processing is also a potential feature for constructing a time percept. In this scenario, neural latencies could be used as markers of time, leading, for instance, to micro-consciousnesses (Zeki 2001) and there exist several time zeroes for the brain.
(iii) A minimal representational unit of time perception exists and its unit is neural time: time content is thus reducible to time representation. The outcome is likely to become either (i) or (ii) and thus postulating a time representation is a conundrum.
(iv) A minimal representational unit of time perception exists. Its unit is not time, perhaps something abstract such as a ‘quantifier’ (Walsh 2003) or a symbol (Eagleman 2001), although it is unclear what encoding strategy would be used in such cases. Anecdotal evidence in time agnosia suggests that the manipulation of conscious temporal events may share common computations and/or a neural basis with numerical additions as it is frequently paired with acalculia (Georgiev 2004). Further investigations on such time impairments and their correlates would bring much needed clarification on the nature of internal time representation.
Within the global workspace theory (Baars 1988; Dehaene & Naccache 2001a), a combination of (i) and (iv) has been suggested here. Within particular sets of automatic processors, the state-dependent networks provide automatic temporal encoding (Buonomano 2000; Karmarkar & Buonomano 2007), but it is only when reaching the global workspace through attentional gating (i.e. when temporal information is the needed information for the task or goal at hand) that time representation affords abstraction, perhaps as a magnitude comparable to numerosity.
It has been argued that the representational units of time, if they exist, are largely underspecified in the literature and that time perception studies often make the hidden assumption that subjective time is neural time. One of the most puzzling questions about time perception is why fast temporal processing (implicit timing) does not access consciousness as such? Why does auditory localization arise from fast temporal processing and not time perception itself? There is no direct mapping of objective time to subjective time, and our perceptual system clearly evolves more slowly in time than our nervous system affords. The mapping from objective to neural time is probably more intricate than intuition would suggest. For the perception of time to exist, a necessary processing step is unlike any encoding of external information seen in sensory systems: the raw material for time perception is likely to be the very states of networks involved in specific computations of very diverse informational content (e.g. pitch, colour and emotion) and this automatic, unavoidable and ubiquitous time processing may be remapped onto abstract representations for conscious perception. In other words, one perceives time by reading out one's brain dynamics.
This work was partly written while I was a post-doctoral fellow in the Division of Biology at Caltech, thanks to the support of a 2008 Gordon Ross Fellowship. I would like to thank Anthony Boemio, Dean Buonomano and two anonymous reviewers for their critical insights into an earlier version of this manuscript.
One contribution of 14 to a Theme Issue ‘The experience of time: neural mechanisms and the interplay of emotion, cognition and embodiment’.
© 2009 The Royal Society