Evaluating dedicated and intrinsic models of temporal encoding by varying context

Rebecca M.C. Spencer, Uma Karmarkar, Richard B. Ivry

Abstract

Two general classes of models have been proposed to account for how people process temporal information in the milliseconds range. Dedicated models entail a mechanism in which time is explicitly encoded; examples include clock–counter models and functional delay lines. Intrinsic models, such as state-dependent networks (SDN), represent time as an emergent property of the dynamics of neural processing. An important property of SDN is that the encoding of duration is context dependent since the representation of an interval will vary as a function of the initial state of the network. Consistent with this assumption, duration discrimination thresholds for auditory intervals spanning 100 ms are elevated when an irrelevant tone is presented at varying times prior to the onset of the test interval. We revisit this effect in two experiments, considering attentional issues that may also produce such context effects. The disruptive effect of a variable context was eliminated or attenuated when the intervals between the irrelevant tone and test interval were made dissimilar or the duration of the test interval was increased to 300 ms. These results indicate how attentional processes can influence the perception of brief intervals, as well as point to important constraints for SDN models.

1. Introduction

Our interaction with the world occurs as a nearly seamless sequence of events involving perception and action through time. Despite time's ubiquitous importance, our understanding of the mechanisms involved in temporal processing remains elusive, particularly at the sub-second level. Millisecond timing is crucial for the production of skilled movements that involve the coordinated integration of multiple joints and muscles (Hore et al. 1996; Timmann et al. 1999; Lewis & Miall 2003). The precise representation of temporal information is also essential for a range of sensory tasks spanning all modalities. In sensorimotor control, correctly estimating both the place and the time of a moving ball is essential to correctly position the hand (Lee et al. 1983). In audition, many phonetic cues require the discrimination of brief temporal cues (Tallal et al. 1996; Ackermann et al. 1999). In somatosensation, sensorimotor predictions involve generating precise predictions of expected sensory consequences that dynamically change over time (Blakemore et al. 1999; Diedrichsen et al. 2005).

In general, two broad categories of models have been developed to describe timing in the range of hundreds of milliseconds (Ivry & Schlerf 2008). The first is extrinsic or dedicated models that use timers or clock-like mechanisms specialized for temporal processing. The second is intrinsic models that describe timing as arising from inherent temporal properties of neural networks. Aside from postulating a range of mechanisms that could, in principle, measure time, the differences between these classes of models have important implications for how information is coded as well as integrated and compared across modalities.

(a) Dedicated models of temporal encoding

Dedicated models focus on mechanisms that are designed specifically to represent temporal information. Such models typically draw on metaphors that relate to mechanical devices created for measuring time. One such metaphor that has been widely invoked, especially in the animal cognition literature, is the clock–counter model (Creelman 1962; Treisman 1963). Clock–counter models specify a number of component processes that subserve judgements of the passage of time. One process is a pacemaker that marks units of time either in a periodic manner or as a stochastic process. The output of these units is accumulated by a counter. In such models, a 300 and 400 ms auditory interval would engage the same pacemaker, but the difference would be captured by the fact that the longer interval achieves a larger count in the accumulator. Clock–counter models entail an extrinsic representation of time in two ways. First, time is explicitly represented in the combined output of the pacemaker and accumulator. Second, the information from the counter can be compared with long-term representations of previously encoded temporal intervals (Gibbon et al. 1984).
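The pacemaker and accumulator described above can be illustrated with a minimal sketch. It is not drawn from any of the cited models: the function name `count_pulses`, the 200 Hz pulse rate and the choice of a Poisson process for the stochastic pacemaker are all assumptions made for illustration.

```python
import random

def count_pulses(duration_ms, rate_hz=200.0, periodic=True, rng=None):
    """Accumulate pacemaker pulses emitted during an interval.

    rate_hz is a hypothetical pacemaker rate, not a value from the literature.
    """
    if periodic:
        period_ms = 1000.0 / rate_hz
        return int(duration_ms // period_ms)
    # Stochastic pacemaker: exponential inter-pulse gaps (a Poisson process).
    rng = rng or random.Random()
    elapsed_ms, count = 0.0, 0
    while True:
        elapsed_ms += rng.expovariate(rate_hz / 1000.0)  # gap in ms
        if elapsed_ms > duration_ms:
            return count
        count += 1

# The same pacemaker serves both intervals; only the final count differs.
short_count = count_pulses(300)
long_count = count_pulses(400)
```

In this sketch the duration judgement reduces to comparing two counts, which is the sense in which time is explicitly represented in the accumulator output.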

An alternative model of dedicated timing is based on the metaphor of delay lines or a bank of hourglasses (Ivry 1996). In these models, temporal intervals are coded by the activation of specific sets of neural elements. For example, a 300 ms tone would lead to the activation of a set of elements, some of which would be sustained for 300 ms, others for 400 ms, etc. Temporal judgements would then be based on a comparison of which elements were activated (or deactivated) at the critical point in time. As with the clock–counter models, delay lines assume the existence of a specialized process that is capable of providing a metrical representation of time. Unlike the clock–counter models, there is no single core element (e.g. a pacemaker) whose output is essential for all judgements.

Models of dedicated timing generally have, at least implicitly, a modular perspective. This perspective has led to considerable research designed to identify neural regions that may be specialized for performing their component operations. The basal ganglia is frequently cited as a core element of an internal timing system (e.g. Meck & Benson 2002; see Buhusi & Meck 2009), based on evidence showing that the perception of time can be altered by dopaminergic manipulations. Dopamine agonists and antagonists produce behavioural changes that are consistent with an increased and decreased rate of a pacemaker, respectively (Rammsayer 1993; but see Meck & Benson 2002; Ivry & Spencer 2004).

An alternative neurobiological model emphasizes the role of the cerebellum in dedicated timing. This work draws on theoretical models that describe how neural interactions at Purkinje cells might function in a manner similar to delay lines (Kotani et al. 2003) as well as experimental evidence showing a pivotal role for the cerebellum in tasks that require precise timing (e.g. Ivry et al. 1988; Ackermann et al. 1999). A favoured example here is eyeblink conditioning, a simple form of learning in which the organism learns to make a conditioned response after an initially neutral stimulus (e.g. a tone) is repeatedly paired with an aversive unconditioned stimulus (e.g. an airpuff). Adaptive learning requires not only associating the two stimuli, but also representing their precise temporal relationship. Lesions of the cerebellum prevent this form of learning or, if induced after learning, result in poorly timed conditioned responses. These same lesions spare forms of classical conditioning that do not require precise timing (Perrett et al. 1993; Gerwig et al. 2003).

A priori, dedicated timing mechanisms need not be localized to a single brain region. It is possible that the component operations are distributed across networks of neural regions, an idea that has been reinforced by neuroimaging studies of timing (Pouthas et al. 2005). Nonetheless, much of the experimental work to date has focused on evaluating the specific contribution of areas such as the basal ganglia, cerebellum or prefrontal cortex. A key strategy in this work, motivated by the basic assumption of dedicated timing models, has been to look for temporal contributions that are supramodal. Indeed, evidence of supramodal impairment has been taken as the signature of dedicated timing, providing a parsimonious account of why damage to a particular region might produce increased variability in rhythmic tapping, impaired judgements of the duration of an auditory or visual stimulus and impaired eyeblink conditioning. However, the experimental picture remains muddied by the fact that patterns of supramodal deficits have been observed in patients with pathologies as distinct as Parkinson's disease (e.g. Artieda et al. 1992; Harrington et al. 1998a,b; Elsinger et al. 2003), right hemisphere lesions (Kagerer et al. 2002) or cerebellar degeneration (e.g. Spencer & Ivry 2005).

(b) Intrinsic models of temporal coding

Intrinsic models offer a very different perspective on temporal processing. They are grounded in the idea that timing is an inherent property of neural processing (Buonomano & Merzenich 1995; Buonomano 2000). As such, tasks that require precise timing need not depend on the recruitment of a specialized mechanism; rather, the local intrinsic dynamics of neural activity can be exploited in a task-specific manner. Thus, the periodic responses required in rhythmic tapping emerge from the continuous activation and deactivation of the signals that control the successive actions or the expected sensory consequences of these actions. Similarly, the duration of a stimulus is coded by the same neural elements that respond to other sensory properties of that stimulus. By this view, the dynamic properties of neurons in area MT/V5 can provide not only a representation of the motion of a stimulus, but also metrical information that might be required for judgements of absolute time.

In some models of intrinsic timing, these representations are inherently interdependent. For example, one form of intrinsic timing postulates that duration is encoded in the magnitude of neural activity. As such, there will be a bias to perceive a stimulus of a fixed duration as longer if it is brighter or louder (Pariyadath & Eagleman 2007; Eagleman 2008). An alternative form of intrinsic timing is based on the general properties or functioning of neural circuits, with the representation of time arising as a result of patterns of activity throughout these networks (Buonomano & Merzenich 1995).

An example of the latter is a state-dependent network (SDN), a model in which time is implicitly represented in the synaptic properties or state of a neural network (Buonomano & Mauk 1994; Yamazaki & Tanaka 2005; Karmarkar & Buonomano 2007). The manner in which such networks represent time can be understood by considering how the network would respond to a pair of tones separated by a 100 ms interval. With a series of simulations of an SDN, Karmarkar & Buonomano (2007) showed that the first tone of the pair will generate activity in the network which changes predictably over time. This activity will include fast and slow inhibitory post-synaptic potentials and short-term synaptic plasticity in the connections between nodes in the network. The response to the second tone of the interval, even if identical in duration, pitch and frequency, will produce a pattern of activity different from the first owing to the fact that the dynamical state of the network has been changed by the first tone. The temporal interval defined by the two tones can be interpreted from the final state of the network.

It is important to note that SDN models do not involve any explicit or independent representation of time. Time can only be inferred through changes in the pattern of activity over the network. Consider how the response of an SDN can distinguish between a 150 and a 200 ms interval. The stimulus will trigger a cascade of physiological changes in the network. Recognizing that there are various time constants associated with these excitatory and inhibitory processes, the SDN will be in one state at the end of the 150 ms stimulus and a different state at the end of a 200 ms stimulus. With training, one could learn to map the first state to a response category ‘short’ and the second state to a response category ‘long’. Owing to generalization, a typical psychophysical function could be derived from the output of the network when presented with stimuli of intermediate duration.

This form of temporal encoding contrasts with the mechanisms invoked in dedicated models in which, at least on a functional level, one might identify a chronotropic tuning profile for individual elements. In an SDN, the elements interact in a continuous, dynamic manner such that a range of unique intervals or combinations of intervals can be represented over the resulting patterns. In addition, temporal coding is entirely dependent on the context in which an input is received. This can be a useful asset. For example, in speech, the perception of a temporal cue may remain invariant over different speaking rates, preserved by its neighbouring stimuli (Pisoni 1993). Similarly, state estimation models can be useful for encoding temporal sequences (e.g. ordinal relationships), independent of the actual rate of the individual events. However, a potential problem associated with context dependency is that the output of the network may be highly sensitive to noise; perturbations to the system may lead to radically different final output states. This problem does not apply to dedicated models, with their linear representation of time. In fact, the observation that variability is proportional to mean duration is a direct consequence of the cumulative effects of noise (see Killeen & Weiss 1987). Nonetheless, recent simulations have shown that, within limits, SDN can tolerate at least some levels of noise (e.g. Karmarkar & Buonomano 2007). In the larger context of the brain, stability of these types of networks might arise from learning mechanisms such as long-term synaptic and homeostatic plasticity (see Buonomano & Maass 2009).

In the present paper, we focus on one important characteristic of SDN; namely, that the output of an SDN to a stimulus of a fixed duration will differ when the initial context is varied. For example, a 150 ms stimulus leads the network to one state if the stimulus is presented in isolation (or with a long inter-trial interval) and to a different state if the same 150 ms stimulus is preceded by an event that perturbs the initial state. As such, the dynamics that allow an SDN to encode the duration of an event actually preclude a direct way to code that two identical stimuli are of equal duration if the context in which they are presented varies. This issue, referred to as the ‘reset problem’, would not be expected to occur with clock-type models that can faithfully represent each temporal interval. As long as the processing mechanisms of a dedicated model are appropriately activated by the onset of the stimulus to be timed, their output should be independent of context. An important issue concerns the conditions that allow such mechanisms to be appropriately activated or engaged. We first describe some empirical work conducted to test the counter-intuitive predictions concerning context dependency derived from an SDN framework, and then return to this issue to develop alternative hypotheses.
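The context dependence described above can be illustrated with a deliberately simple toy, which should not be read as the network simulated by Karmarkar & Buonomano (2007). The state here is just three decaying traces; the time constants in `TAUS_MS`, the `gain` parameter and the saturating update rule are all assumptions chosen for illustration.

```python
import math

TAUS_MS = (30.0, 100.0, 300.0)  # hypothetical decay time constants

def final_state(tone_times_ms, taus=TAUS_MS, gain=0.5):
    """Return the toy network state immediately after the last tone."""
    state = [0.0] * len(taus)
    previous = tone_times_ms[0]
    for t in tone_times_ms:
        dt = t - previous
        # Each trace decays exponentially since the previous tone...
        state = [s * math.exp(-dt / tau) for s, tau in zip(state, taus)]
        # ...and a tone has less impact on a trace that is still active
        # (a crude stand-in for short-term synaptic depression).
        state = [s + gain * (1.0 - s) for s in state]
        previous = t
    return state

# Two isolated intervals end in different states, so duration can be read out.
state_150 = final_state([0, 150])
state_200 = final_state([0, 200])

# The same 100 ms test interval ends in different states when an earlier
# tone arrives 75 ms or 125 ms before it: the 'reset problem'.
state_after_75 = final_state([0, 75, 175])
state_after_125 = final_state([0, 125, 225])
```

In this toy, as in the argument above, nothing in the final state marks the 100 ms test interval as the same across the two three-tone sequences; a readout would have to learn each context separately.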

(c) The reset task

To test the importance of context variation in time perception, Karmarkar & Buonomano (2007) introduced the reset task (figure 1). Participants were presented with a pair of tones, separated by a variable test interval. On each trial, they were required to judge whether this test interval was shorter or longer than an implicit standard of 100 ms. In addition to varying the length of time between the tones, two other factors were manipulated. First, for each block of trials, the test pair of tones was presented either in isolation (two tones or 2T) or preceded by an irrelevant third tone (three tones or 3T), creating what we will refer to as an irrelevant interval between the irrelevant tone and the first tone of the test interval. Second, the irrelevant interval was either fixed at 100 ms or varied around 100 ms. These conditions were tested in separate blocks on alternating days. Thus, in fixed blocks, the participants heard either two or three tones with the interval between the first and second tones on the 3T trials fixed at 100 ms. In the variable blocks, they again heard either 2T or 3T stimuli on each trial, but now the irrelevant interval on the 3T trials varied in duration.

Figure 1

Diagram of the reset task (adapted from Karmarkar & Buonomano 2007). (i) The two-tone (2T) trials and (ii) the three-tone (3T) trials, in which an irrelevant interval (D) precedes the test interval (T), are depicted. (a) In the fixed condition, the irrelevant interval was of a fixed duration, either 100 or 300 ms, across the block. (b) In the variable condition, the duration of the irrelevant interval was variable from trial to trial.

The critical comparison arises from the two 3T conditions. For fixed trials, the initial state of the SDN should be, relatively speaking, constant from trial to trial. By contrast, for variable trials, the context and thus the network state will be altered by the inclusion of an initial, irrelevant tone. This should interfere with accurate discrimination of the test interval by adding variability to the final state of the network across trials. Consistent with this prediction, a disruption of timing was found in only the 3T variable condition where the threshold was more than twice as high as in the other three conditions. Interestingly, there was only a modest (and non-significant) effect of the irrelevant interval in the fixed condition even though the SDN model might assume that the resulting activation patterns for the 2T and 3T conditions would also differ here (since the irrelevant tone also produces a change in the initial state of the network at the onset of the test interval). Presumably, the participants were able to simultaneously learn two classifications, one for patterns in the 2T condition and the other for patterns in the 3T condition.

(d) Attentional factors and the reset task

While the SDN model provides a compelling explanation of the psychophysical results on the reset task, it is important to consider the processing demands of the reset task and how these might influence performance. First, as described above, the 2T and 3T trials were randomly interleaved for each block type. While this design ensures that participants attend carefully to all tones in a stimulus, it also creates a high degree of uncertainty. Namely, when the second tone is presented, the participant does not know whether this tone marks the end of the test interval (as would be true on a 2T trial) or the start of the test interval (as is true on 3T trials). As such, it seems reasonable to assume that the participants attend to and even encode the duration of this interval. It is only when the third tone occurs that they can realize the first interval is irrelevant. Thus, the 3T condition has a higher degree of uncertainty than the 2T condition. Moreover, it may require a rapid shift of attention, either to access the representation of the duration of the test interval defined by the second and third tones, or to discard a response associated with the first (irrelevant) interval in favour of one associated with the second (test) interval.

Second, in Karmarkar & Buonomano's (2007) experiment, the irrelevant interval is quite brief, ranging from 50 to 150 ms. The irrelevant tone may trigger a form of automatic attentional capture (Posner 1978) and interfere with the participant's registration of the initial tone of the test interval. By this hypothesis, a variable irrelevant interval might impair performance by introducing noise into the participant's ability to orient to the test interval. A similar hypothesis was proposed by Grondin & Rammsayer (2003) to account for the effect of a variable foreperiod prior to the presentation of brief test intervals.

Third, in addition to the high level of uncertainty in the 3T condition, there may also be some crosstalk between temporal tags linked to the (irrelevant) initial interval and the test interval. For example, a short irrelevant interval in the variable condition might implicitly prime the concept ‘shorter’ and introduce biases with respect to the categorization of the test interval (Grondin & Rammsayer 2003; Xuan et al. 2007).

The uncertainty problem and effects of attentional capture should influence performance in both the fixed and variable duration conditions of the reset task. However, the irrelevant interval on fixed blocks could actually be viewed as providing a task-relevant context for the temporal judgements. Rather than making perceptual judgements based on reference to an implicit standard, one could use the irrelevant interval as an explicit standard. Specifically, the task in the 3T fixed condition could be performed by judging whether the interval separating the second and third tones is shorter or longer than the interval separating the first and second tones. Such a strategy might be expected to actually improve performance in the 3T compared with the 2T fixed condition. However, psychophysical studies have shown that people show minimal cost on duration discrimination tasks when the standard interval is implicit (e.g. Karmarkar & Buonomano 2003). Additionally, benefits from using the ‘irrelevant’ interval as a point of comparison in the 3T condition could be offset by costs associated with shifting attention to the test interval.

In sum, we have outlined a set of hypotheses concerning why performance may deteriorate when a test interval is preceded by a variable context. In the SDN model, the irrelevant tone perturbs the context of the network, introducing variability in the patterns elicited by the test interval. Performance suffers because the number of patterns to be mapped to the response categories is expanded. By contrast, the attentional factors described above emphasize how non-temporal factors might influence performance, independently of whether the representation of duration is based on dedicated or intrinsic mechanisms. We have outlined how the effect of a variable context may be related to the particular conditions used in the experiment reported by Karmarkar & Buonomano (2007), in which both the test and irrelevant intervals varied around 100 ms. This configuration would maximize uncertainty as well as invite the effects of attentional capture, and even decision-based priming. As such, we vary the irrelevant and test intervals in the following experiments to contrast the predictions of an SDN model and the uncertainty and attention hypotheses.

2. Experiment 1

To distinguish the predictions of an SDN model from those derived from attentional considerations, we compared the performance of two groups of participants on variants of the reset task in which the test interval varied around 100 ms. For the first group, the duration of the irrelevant interval varied around 100 ms, providing a replication of Karmarkar & Buonomano (2007). For the second group, the duration of the irrelevant interval varied around 300 ms. In both cases, the inclusion of a varying irrelevant tone should produce a variable context in an SDN at the onset of the test interval across trials. As such, we would expect to observe an increase in the difference threshold in the variable, but not the fixed, 3T conditions regardless of the length of the irrelevant interval.

However, uncertainty should be greatly reduced or abolished when the irrelevant and test intervals are of very different durations (i.e. 300 and 100 ms on average). Thus disruptive effects related to uncertainty should be attenuated when the irrelevant interval is centred at approximately 300 ms. Moreover, costs associated with attentional capture in the variable 3T condition should dissipate with a longer irrelevant interval since the participants have more time to reorient to the test tones. The fixed 3T condition is the best point of comparison for isolating the effect of varying the duration of the irrelevant interval. As a result, the full design provides a replication of Karmarkar & Buonomano (2007).

(a) Methods

(i) Participants

Fourteen individuals (nine males; five females) between 18 and 28 years of age participated in the experiment. All procedures were approved by the Institutional Review Board at the University of California, Berkeley, CA. Informed consent was obtained before the experiment commenced.

(ii) Task

We used the reset task described in Karmarkar & Buonomano (2007). Blocks consisted of 120 trials in which a test interval, defined by two 15 ms tones (1 kHz, 5 ms ascending and descending ramps), was presented. Participants were instructed to judge whether the test interval was ‘shorter’ or ‘longer’ than 100 ms. Responses were made by pressing one of two buttons on a computer mouse. Note that the standard 100 ms interval was not presented during the experimental trials; rather, it was presented at the beginning of an experimental block until the participant indicated they had established an internal representation of this standard. Feedback (‘correct’ or ‘incorrect’) was presented on the computer monitor after each response. This feedback helped the participants maintain their internal representation of the standard 100 ms interval.

A staircase procedure was employed to determine the duration of the test interval on each trial. The initial value of the test interval was set to 100±10 ms. Following Karmarkar & Buonomano (2003, 2007), we used a ‘three up–one down’ adaptive procedure in which the duration of the test interval was reduced following three consecutive correct responses or increased following every incorrect response. The step size was initially set to 5 ms. Following the third reversal, the step size was reduced to 2 ms. The difference threshold for that block of trials was defined as two times the average of the final three reversals, using reversal values obtained with the smaller step size; the averaged value corresponds to a difference at which participants would be correct on 79 per cent of the trials.
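The adaptive rule can be sketched in code. This is an illustration under stated assumptions, not the software used in the experiment: the class name `Staircase`, the choice to track the difference between test and standard rather than the raw duration, the 1 ms floor, and the use of every reversal after the third in `threshold` are all hypothetical.

```python
class Staircase:
    """Difference shrinks after three consecutive correct responses
    and grows after every incorrect response."""

    def __init__(self, start_diff_ms=10.0, big_step=5.0, small_step=2.0,
                 floor_ms=1.0):
        self.diff = start_diff_ms      # |test - standard| in ms (assumed)
        self.big_step = big_step
        self.small_step = small_step
        self.floor = floor_ms          # hypothetical lower bound
        self.correct_run = 0
        self.last_direction = 0        # -1 = harder, +1 = easier
        self.reversals = []

    def step_size(self):
        # The step drops from 5 ms to 2 ms once three reversals have occurred.
        return self.big_step if len(self.reversals) < 3 else self.small_step

    def update(self, correct):
        if correct:
            self.correct_run += 1
            if self.correct_run < 3:
                return self.diff       # no change until three in a row
            self.correct_run = 0
            direction = -1
        else:
            self.correct_run = 0
            direction = +1
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.diff)   # direction changed: a reversal
        self.last_direction = direction
        self.diff = max(self.floor, self.diff + direction * self.step_size())
        return self.diff

    def threshold(self):
        # Twice the mean of the reversal values recorded with the small step.
        small = self.reversals[3:]
        if not small:
            raise ValueError("no reversals recorded with the smaller step")
        return 2.0 * sum(small) / len(small)
```

A rule of this kind settles where three consecutive correct responses are as likely as not, that is, where p³ = 0.5 and p ≈ 0.79, which is the source of the 79 per cent figure.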

Half of the 120 trials within a block were composed of only the two tones (2T) that formed the test interval. In the remaining half, an irrelevant tone preceded the test interval (3T) forming an irrelevant interval with the first tone of the test interval. With separate staircases maintained for each condition, 2T and 3T trials were randomly interleaved.

There were two types of block. In the fixed condition, the irrelevant tone occurred at a fixed time prior to the start of the test interval, defining an irrelevant interval of a constant duration. In the variable condition, the irrelevant tone occurred at a variable time prior to the onset of the first tone of the test interval, thus defining a variable irrelevant interval. The duration of the irrelevant interval was selected randomly on each trial from a uniform distribution. The range of this distribution was set to ±25 per cent of the mean irrelevant interval. For the 100–100 group (n=9), the mean duration of the irrelevant interval was 100 ms, with a range from 75 to 125 ms in the variable condition. In the 300–100 group (n=9), the mean duration of the irrelevant interval was 300 ms, with a range from 225 to 375 ms in the variable condition.
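The block structure described above can be summarized in a short sketch. The function names `irrelevant_interval` and `make_block`, the tuple layout and the seeding are assumptions for illustration; the staircase-controlled test interval is deliberately left out.

```python
import random

def irrelevant_interval(mean_ms, variable, rng, spread=0.25):
    """Irrelevant interval for one 3T trial, in ms."""
    if not variable:
        return float(mean_ms)
    # Uniform draw within +/-25 per cent of the mean irrelevant interval.
    return rng.uniform(mean_ms * (1 - spread), mean_ms * (1 + spread))

def make_block(mean_ms, variable, n_trials=120, seed=None):
    """Return a shuffled list of (trial_type, irrelevant_ms) pairs.

    Half the trials are 2T (no irrelevant tone, so irrelevant_ms is None)
    and half are 3T, randomly interleaved.
    """
    rng = random.Random(seed)
    kinds = ["2T"] * (n_trials // 2) + ["3T"] * (n_trials - n_trials // 2)
    rng.shuffle(kinds)
    return [
        (kind, irrelevant_interval(mean_ms, variable, rng) if kind == "3T" else None)
        for kind in kinds
    ]

# One variable block for the 300-100 group and one fixed block for the 100-100 group.
variable_block = make_block(300, variable=True, seed=1)
fixed_block = make_block(100, variable=False, seed=1)
```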

(iii) Procedures

Each participant was tested in four separate sessions within a 5-day span, with a minimum of 24 hours separating sessions. The sessions alternated between the fixed and variable conditions, with the starting condition counterbalanced across the participants. Within each session, the participants completed six blocks of trials (720 trials total). Four of the participants were tested in both the 100–100 and 300–100 groups, with a minimum of one week between their testing in the two groups.

(b) Results and discussion

Each participant completed 12 staircase procedures (two sessions with six blocks each) for each of the four conditions (2T or 3T, fix and var). The first staircase for each day was treated as practice; thus, for each condition, we calculated a difference threshold as the average of the other 10 threshold estimates. For each group, we performed a three-way ANOVA with the factors tone number (2T versus 3T), condition (fix versus var) and block number (blocks 2–6 for each session). For graphical purposes, we adopted the convention in Karmarkar & Buonomano (2007), in which the depicted values are double the average of the reversal values from the staircase and thus express the difference threshold as the range between the upper and lower thresholds.

The results (figure 2a) for the 100–100 group replicate those reported by Karmarkar & Buonomano (2007). The main effects of tone number and condition were significant, F(1,320)=45.9, p<0.001 and F(1,320)=9.7, p=0.002, respectively. More importantly, there was a significant interaction of tone number and condition, F(1,320)=5.3, p=0.02. This interaction reflects the fact that thresholds were elevated in the 3T condition with a variable irrelevant interval to a greater extent than with a fixed irrelevant interval. The cost, 16 ms on average, is smaller than the 25 ms increase reported in Karmarkar & Buonomano (2007). This reduction is likely owing to the fact that the range of irrelevant intervals in the current study (±25 ms) is smaller than the range used in the previous work (±50 ms).1

Figure 2

Difference thresholds for experiment 1 in which the standard interval was 100 ms. (a) In the 100–100 group, the irrelevant interval was 100 ms (fix) or varied around 100 ms (var). (b) In the 300–100 group, the irrelevant interval was 300 ms (fix) or varied around 300 ms (var). The difference threshold corresponds to twice the value required to be correct on 79% of the trials.

The results for the 300–100 group are presented in figure 2b. There was a main effect of tone number (F(1,320)=6.1, p=0.01); the thresholds were elevated in both the fixed and variable conditions in the 3T conditions relative to the 2T conditions. The main effect of condition was not significant (F(1,320)=2.1, p=0.15) and, unlike the 100–100 group, there was no interaction of these factors (F(1,320)<1). In terms of individual effects, only one participant showed an increase in the 3T variable condition that fell within the range of the participants in the 100–100 group.

Taken together, the results are at odds with the SDN model. By the SDN model, we would assume that an irrelevant tone that precedes the test interval would alter the state of a network in both the 100–100 and 300–100 conditions. If the context established by this tone varies across trials, then a cost should have been observed in the 3T variable condition for both groups. Instead, a cost was observed only with the 100–100 group.

By contrast, the dissociation between the 100–100 and 300–100 conditions is consistent with the hypothesis that uncertainty could disrupt timing in a variable context. We assume that by making the irrelevant interval considerably longer than the test interval in the 300–100 conditions, uncertainty was reduced. Here participants could identify the irrelevant interval and attend to the test interval, with a similar (and modest) cost in the 3T variable condition as that observed in the 3T fixed condition. For the 100–100 group, the impairment in 3T performance may have arisen because participants could not be sure whether the second tone marked the end of the test interval (as on 2T trials) or the start of the test interval (as on 3T trials).

The uncertainty hypothesis is further supported by the fact that overall performance was poorer in the 100–100 group compared with the 300–100 group. Since the 2T and 3T conditions were interleaved within each block, there should have been considerable uncertainty present even when the irrelevant interval was fixed. Moreover, this uncertainty would also affect the performance when there were only two tones since participants could not know that it was a test interval until they recognized that there was no third tone. However, this hypothesis must be treated cautiously given that this is a between-subject comparison and only a subset of the participants were tested in both groups.

The results can also be matched with the predictions derived from the attentional capture hypothesis. In all conditions, the presence of the irrelevant tone led to an increase in discrimination thresholds. We assume that the participants initially oriented to the irrelevant tone (and irrelevant interval) and then had to shift attention to the test interval after hearing the second and third tones. The capture hypothesis would predict that the decrement in performance from an irrelevant tone would be attenuated as the interval between that tone and the test interval was lengthened (Grondin & Rammsayer 2003). In terms of mean values, there was a larger overall cost for the 100–100 group, consistent with this prediction. Nonetheless, there remained a small, yet reliable cost in both the fixed and variable conditions for the 300–100 group when the irrelevant tone was presented (3T), consistent with the idea that the presence of this tone entailed some attentional cost.

As noted in the introduction, having similar durations for the irrelevant and test intervals may not only entail attentional costs, related to uncertainty, capture or both, but could also produce crosstalk between the perceived duration of the irrelevant and test intervals. To assess this hypothesis, we conducted a post hoc analysis in which we evaluated whether the response to the test interval was biased by the duration of the irrelevant interval. For this analysis, we focused on the variable conditions and divided the trials into six bins based on the irrelevant interval duration, ignoring the duration of the test interval.
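The binning step of this analysis can be illustrated with a short sketch. It is not the analysis code used in the study: the function name, the trial representation and the use of equal-width bins are assumptions made for illustration, since the text specifies only that six bins were formed from the irrelevant interval duration.

```python
from typing import List, Optional, Tuple

def percent_long_by_bin(
    trials: List[Tuple[float, bool]],
    lo_ms: float,
    hi_ms: float,
    n_bins: int = 6,
) -> List[Optional[float]]:
    """Percentage of 'long' responses per irrelevant-interval bin.

    trials: (irrelevant_interval_ms, responded_long) pairs from the
            3T variable trials; the test interval duration is ignored.
    lo_ms, hi_ms: range of irrelevant intervals (hypothetical inputs).
    Equal-width bins are an assumption, not a detail from the paper.
    """
    width = (hi_ms - lo_ms) / n_bins
    counts = [0] * n_bins
    longs = [0] * n_bins
    for duration, responded_long in trials:
        # Clamp so that a duration equal to hi_ms falls in the last bin.
        idx = min(max(int((duration - lo_ms) / width), 0), n_bins - 1)
        counts[idx] += 1
        if responded_long:
            longs[idx] += 1
    # Bins with no trials are reported as None rather than 0 per cent.
    return [100.0 * l / c if c else None for l, c in zip(longs, counts)]

# Made-up trials for a 100 ms mean irrelevant interval (75 to 125 ms range).
example = [(76, False), (78, True), (99, False), (101, True), (124, True)]
result = percent_long_by_bin(example, lo_ms=75, hi_ms=125)
```

If responses were unrelated to the irrelevant interval, the percentages would be roughly flat across bins; a rising profile corresponds to the bias described here.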

A strong bias was evident in the responses of the 100–100 group (figure 3a). These participants were much more likely to respond short when the irrelevant interval was short and long when the irrelevant interval was long, even though there was no correlation between the durations of the irrelevant and test intervals. The magnitude of the bias was quite strong, varying by almost 30 per cent over the range of irrelevant intervals. This bias adds further support to the conjecture that the participants in the 100–100 ms group had difficulty selectively attending to the test interval. A similar, but somewhat weaker, bias was also present for the 300–100 group (figure 3b).

Figure 3

Response bias analysis for experiment 1. Percentage of trials in which the participant responded ‘long’ as a function of the duration of the irrelevant interval on the 3T variable trials ((a) 100–100 group, (b) 300–100 group). The irrelevant intervals were sorted into six bins.

In summary, the results of experiment 1 fail to conform to the predictions of the SDN model. The disruptive effects of a variable context were limited to conditions in which the irrelevant intervals were short and overlapped with the duration of the test intervals. We note here one important caveat. The length of the irrelevant interval in the 300–100 group may have been sufficient to allow the network to reset prior to the onset of the test interval. In this case, the absence of an effect in the variable condition may not contradict SDN models, but may be indicative of their temporal range for duration discrimination. While we return to this issue in §4, we first present a second experiment to further examine the attentional factors at play in these duration discrimination tasks.

3. Experiment 2

The goal of experiment 2 was to compare the uncertainty and attentional capture hypotheses. In all conditions, the test interval varied around a 300 ms implicit standard. As in experiment 1, this interval was either presented alone (2T), or preceded by a tone that defined an irrelevant interval (3T). The mean of the irrelevant interval was either 100 ms or 300 ms. By the uncertainty hypothesis, discrimination thresholds should be the poorest when the irrelevant interval varies around 300 ms since this condition would maximize similarity between the irrelevant and test intervals. That is, the interaction between the number of tones and the type of irrelevant interval should now be found for the 300 ms irrelevant interval condition instead of the 100 ms condition. Indeed, the uncertainty hypothesis would predict minimal disruption of performance with a 100 ms irrelevant interval. By contrast, a capture-based attentional account in which the irrelevant tone disrupts the registration of the initial test tone would lead to the opposite prediction. By this hypothesis, thresholds should be more affected when the irrelevant interval is short (100 ms) compared with when it is long (300 ms).

Experiment 2 also provides a second test of the SDN model. As in experiment 1, the most general form of the model would predict that a variable context should disrupt performance; that is, thresholds should be elevated in the 3T variable conditions with either 100 ms or 300 ms irrelevant intervals (see footnote 2). Assuming that the contextual effects dissipate with time, this effect would be attenuated with a variable 300 ms irrelevant interval.

(a) Methods

(i) Participants

Sixteen individuals (6 males; 10 females) between 18 and 32 years of age were recruited for the experiment.

(ii) Task and procedure

The task and procedures were identical to those of experiment 1 with the following changes. First, the standard interval was set to 300 ms. Second, the initial value of the test interval was set to 300±30 ms and the step sizes were initially 30 ms, dropping to 15 ms after three reversals.
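The step-size schedule described above can be sketched as follows. This is only an illustration of the stated rule (30 ms steps, reduced to 15 ms after three reversals). The function names are hypothetical, and the rule that decides whether the test interval moves towards or away from the standard on each trial is deliberately left out because it is not described in this section.

```python
from typing import Sequence

def count_reversals(track: Sequence[float]) -> int:
    """Count direction changes in a sequence of test-interval values (ms)."""
    reversals = 0
    last_direction = 0  # 0 until the track has moved at least once
    for prev, cur in zip(track, track[1:]):
        delta = cur - prev
        if delta == 0:
            continue  # no movement, so no direction to compare
        direction = 1 if delta > 0 else -1
        if last_direction != 0 and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals

def step_size_ms(
    reversals_so_far: int,
    initial_step: float = 30.0,
    reduced_step: float = 15.0,
    switch_after: int = 3,
) -> float:
    """Step size for the next trial: 30 ms at first, 15 ms after three reversals."""
    return initial_step if reversals_so_far < switch_after else reduced_step

# Made-up track starting at 300 + 30 ms.
track = [330.0, 300.0, 330.0, 300.0, 315.0]
n_rev = count_reversals(track)
next_step = step_size_ms(n_rev)
```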

There were two groups of participants, a 100–300 group (n=9) and a 300–300 group (n=9). The range of irrelevant intervals in the variable conditions was again ±25 per cent of the mean irrelevant interval. Four individuals were tested in both groups. One participant was also tested in both groups of experiment 1. As in experiment 1, within each group, participants completed four test sessions, two with the 2T and 3T fixed conditions and two with the 2T and 3T variable conditions.
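As a small worked example of the ±25 per cent rule, the bounds of the variable irrelevant interval for each group can be computed directly. The function name is hypothetical, and how individual intervals were selected within these bounds is not assumed here.

```python
from typing import Tuple

def irrelevant_interval_range(mean_ms: float, proportion: float = 0.25) -> Tuple[float, float]:
    """Lower and upper bounds (ms) of the variable irrelevant interval."""
    half_width = mean_ms * proportion
    return (mean_ms - half_width, mean_ms + half_width)

range_100 = irrelevant_interval_range(100)  # 100-300 group
range_300 = irrelevant_interval_range(300)  # 300-300 group
```

With a 300 ms standard, the 225 to 375 ms range of the 300–300 group overlaps the test intervals, whereas the 75 to 125 ms range of the 100–300 group does not.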

(b) Results and discussion

Difference thresholds were again calculated as the average of the final five threshold estimates for each session. The results for both groups are shown in figure 4a,b. For the 100–300 group, the effect of tone number was significant, F1,320=34.0, p<0.001. Neither the effect of condition (variable versus fixed: F1,320<1) nor the interaction (F1,320<1) was reliable. For the 300–300 group, the main effect of tone number approached significance, F1,320=2.9, p=0.09. The main effect of condition and the interaction were not significant (all Fs<1).

Figure 4

Difference thresholds for experiment 2 in which the standard interval was 300 ms. Irrelevant intervals were determined in the same way as shown in figure 2 ((a) 100–300 group, (b) 300–300 group).

The results of this experiment fail to support the predictions of both the SDN model and the uncertainty hypothesis. At odds with the SDN model, we did not observe an increase in threshold when the context varied compared with when the context was fixed in either group. The result for the 100–300 group is especially striking given that the mean and range of the irrelevant interval are identical to those used with the 100–100 group in experiment 1. The results for the 300–300 group argue against the uncertainty hypothesis given that there was no indication of the predicted interaction despite the high degree of overlap between the irrelevant and test intervals.

The results, at least in part, are in accord with the attentional capture hypothesis. A significant cost in performance was observed when there was a brief interval between the irrelevant tone and the test interval. Under such conditions, participants would not be fully prepared for the presentation of the test tones (Grondin & Rammsayer 2003). Unlike experiment 1, this effect was similar for the fixed and variable conditions. A smaller, non-significant effect was also found for the 300–300 group, suggesting that the cost associated with an irrelevant tone decreases with increasing time between that tone and the test interval.

As in experiment 1, we performed an analysis to identify biasing effects induced by the irrelevant interval. The data were binned by the duration of the irrelevant interval while ignoring the duration of the test interval. As shown in figure 5, the duration of a short irrelevant interval produced a strong bias in the judgement of the test interval. We note that the bias created by a short irrelevant interval is much smaller on a longer test interval (300 ms) than on a short test interval (100 ms as in experiment 1).

Figure 5

Response bias analysis for experiment 2. Data are plotted as in figure 3 ((a) 100–300 group, (b) 300–300 group).

To summarize, the results of experiment 2 provide a further challenge to the SDN account of context effects, as well as argue against the uncertainty hypothesis. Instead, they are consistent with an attentional capture hypothesis. An irrelevant tone can disrupt the perception of the duration of an interval marked by auditory information, most notably when there is only a brief gap between the irrelevant tone and the test interval. Moreover, in situations with these brief gaps, the duration of the gap introduces a strong bias on the reported percept.

4. General discussion

An important and appealing feature of intrinsic models of timing is that they suggest how temporal information may be encoded in a task-specific, flexible manner. By using intrinsic neural properties, the representation of time does not require a dedicated system, but rather is available from network dynamics. Such dynamics, by definition, are highly context sensitive. The flexibility that allows temporal information to be encoded in a distributed manner also imposes a strong constraint on the generality of such systems. For example, perceptual training in one modality or with a particular duration may show little transfer to another modality or duration (e.g. Wright et al. 1997; Karmarkar & Buonomano 2003).

The reset task introduced by Karmarkar & Buonomano (2007) provided an intriguing demonstration of context specificity effects in a simple duration discrimination task. The two experiments described here provide further evidence of how contextual factors can influence perception. We first review how these results might be interpreted within the framework of one particular form of intrinsic timing, SDN. We then turn to an alternative perspective in which the effects of context are attributed to non-temporal factors that may influence performance on duration discrimination tasks, independent of whether the temporal representation is encoded intrinsically or by a dedicated timing mechanism.

(a) Constraints on the temporal extent of state-dependent networks

The current results point to limitations in the explanatory power of SDN. The disruptive effects of a variable context were limited to the condition in which both the irrelevant and test intervals were centred at approximately 100 ms. If either interval was centred at approximately 300 ms, the inclusion of a third tone tended to increase the discrimination threshold, but the magnitude of this increase was similar for the fixed and variable contexts. One interpretation of these findings is that the temporal extent of SDN may be limited to events spanning up to only a few hundred milliseconds. Computational studies of SDN (e.g. Buonomano & Merzenich 1998; Markram et al. 1998; Reyes & Sakmann 1999) have always incorporated the constraint that the underlying physiological mechanisms are of limited temporal extent. The network is essentially reset for events that exceed this temporal extent, thus eliminating contextual effects. Indeed, Karmarkar & Buonomano (2007) included control conditions in which they used an irrelevant interval centred at approximately 1000 ms. With intervals of this length, performance was similar for the variable and fixed conditions.

It is possible that the longer intervals used in the present experiments exceed the temporal extent of an SDN. Such a constraint would be consistent with the results of experiment 1 in which we failed to find a disruptive effect of a 300 ms irrelevant interval on the participants' ability to judge a 100 ms interval. Similarly, the 300 ms test intervals in experiment 2 might be outside the temporal boundary of an SDN, even when this test interval was preceded by a brief irrelevant interval.

While consistent with the results, this interpretation would place an important temporal limitation on the applicability of SDN for understanding basic phenomena in the timing literature. Temporal processing of intervals limited to just a few hundred milliseconds is certainly important for many tasks; for example, most of the critical temporal cues in speech perception are less than 100 ms (see Buonomano & Merzenich 1998). However, substantial behavioural, neuropsychological and neuroimaging literature related to temporal perception and production is based on intervals that generally range from 250 to 1000 ms (see reviews in Lewis & Miall 2003; Ivry & Spencer 2004). The current results should serve as a cautionary note for theorists who seek to apply concepts derived from biologically plausible models of SDN, as well as other forms of intrinsic timing, to a larger domain of temporal phenomena.

(b) Attentional influences on time perception

We considered alternative accounts of why an irrelevant, variable context would lead to a marked increase in discrimination thresholds. A first hypothesis related to the specific task used in Karmarkar & Buonomano's (2007) study. By mixing trials in which the irrelevant interval was either present or absent, the participants faced an ambiguous situation: the second tone could either mark the end of the test interval (2T) or mark the start of the test interval (3T). We tested this hypothesis in experiments 1 and 2 by using either similar or dissimilar ranges for the irrelevant and test intervals. While the predictions of the uncertainty hypothesis were supported in experiment 1, the level of uncertainty did not influence performance in experiment 2.

A more parsimonious account of the current results emerges from the consideration of two other attentional effects, what we have referred to as attentional capture and bias. The former refers to the fact that orienting processes are known to be strongly influenced by auditory signals (Posner 1978). We suggest that the irrelevant tone may transiently capture attention. A consequence of this would be an increase in noise associated with the registration of the first tone defining the test interval, and correspondingly, an increase in discrimination thresholds. In both experiments, discrimination thresholds were higher when there was an irrelevant tone, an effect observed in both the fixed and variable conditions (although this effect was only marginally reliable for the 300–300 group in experiment 2). Moreover, the magnitude of this effect was larger when the irrelevant interval was 100 ms compared with when it was 300 ms. With a longer irrelevant interval, there may be sufficient time to orient to the task-relevant stimulus.

The duration of the irrelevant interval in the variable conditions was also found to have a biasing effect on the participants’ judgements of the duration of the test interval. Again, this effect was much stronger when the variable interval was centred at approximately 100 ms compared with when it was centred at approximately 300 ms, independent of the range of the test interval. An explanation of the biasing effects can be derived from an SDN framework: the irrelevant interval might be coded together with the test interval as a single temporal object. A shorter irrelevant interval would reduce the total duration of a 3T object, creating the bias found in the responses.

However, the biasing effects are also consistent with the attentional capture hypothesis. Suppose the irrelevant tone produces a delay in the registration of the first tone of the target interval. If the irrelevant interval is short, the participant would be relatively late in orienting to the first test tone. This would result in the test interval being perceived as shorter in duration. Grondin & Rammsayer (2003) described similar orienting effects on both difference thresholds and bias in a series of experiments in which the duration of the foreperiod between the end of one trial and the onset of the next was manipulated.

The biasing effects may also arise at later stages of processing. An incidental temporal tag might be generated for the irrelevant interval. Crosstalk between this tag and the temporal tag associated with the target interval could influence decision processes (Xuan et al. 2007; Ivry & Schlerf 2008). Thus, for a test interval of a particular duration, short irrelevant intervals would produce a bias to respond short and long irrelevant intervals would produce a bias to respond long. The current design does not provide a clear way to assess whether the biasing effects are related to attentional capture, response bias or both.

One point to emphasize in closing is that the attentional issues we have discussed underscore one way in which non-temporal factors may influence the fidelity of a temporal representation. Such factors are independent of the mechanisms that constitute these representations, and as such are important to consider for both dedicated and intrinsic models of temporal processing (Ivry & Schlerf 2008).

Acknowledgments

All procedures were approved by the Institutional Review Board at the University of California, Berkeley. R.M.C.S. was funded by NIH F32 NS048012 and is currently funded by K99/R00 AG29710. R.B.I. is funded by NIH HD060306.

Footnotes

  • The first two authors contributed equally to this work.

  • One contribution of 14 to a Theme Issue ‘The experience of time: neural mechanisms and the interplay of emotion, cognition and embodiment’.

  • 1 We opted to use the narrower range for two reasons. First, we wanted the same proportional range of irrelevant intervals for the 100–100 and 300–100 groups. Second, we wanted to ensure that the distractor and test intervals were very distinct for the 300–100 group.

  • 2 We note that a similar prediction would be made should both capture and uncertainty be operative, with the former predicting this effect for the shorter irrelevant interval and the latter predicting this effect for the longer irrelevant interval. While this is problematic, the interaction was not observed in either condition.

References
