The ability to determine the interval and duration of sensory events is fundamental to most forms of sensory processing, including speech and music perception. Recent experimental data support the notion that different mechanisms underlie temporal processing in the subsecond and suprasecond range. Here, we examine the predictions of one class of subsecond timing models: state-dependent networks. We establish that the interstimulus interval (ISI), i.e. the interval between the standard and the comparison interval in a two-interval forced-choice discrimination task, alters the accuracy of interval discrimination but not the point of subjective equality; that is, while timing was impaired, subjective time contraction or expansion was not observed. We also examined whether the deficit in temporal processing produced by short ISIs can be reduced by learning, and determined the generalization patterns. These results show that training subjects on a task using a short or long ISI produces dramatically different generalization patterns, suggesting that different forms of perceptual learning are being engaged. Together, our results are consistent with the notion that timing in the range of hundreds of milliseconds is local as opposed to centralized, and that rapid stimulus presentation rates impair temporal discrimination. This interference is, however, decreased if the stimuli are presented to different sensory channels.
1. Introduction
Timing in the range of tens of milliseconds to a few seconds is of fundamental importance for a wide range of sensory and motor tasks (Ivry & Spencer 2004; Mauk & Buonomano 2004; Buhusi & Meck 2005; van Wassenhove 2009). For example, the ability to discriminate the interval and duration of sounds is critical for speech processing (Liberman et al. 1956; Scott 1982; Drullman 1995; Shannon et al. 1995; Aasland & Baum 2003). However, the neural mechanisms involved even in a simple temporal task, such as interval discrimination, remain unknown.
Advances in the understanding of the neural basis of learning and memory benefited tremendously from the realization that memory was not a unitary process, but could be divided into declarative and non-declarative memory and each of these into further subdivisions (Squire 1986). Similarly, the emerging realization that temporal processing is not a unitary neural process, but probably encompasses a number of independent or interdependent processes, is an important factor in understanding existing data and in guiding future experiments.
The mammalian brain processes temporal information and tells time over time scales exceeding 10 orders of magnitude: from the few microseconds used for sound localization, to daily, monthly and yearly rhythms relevant to sleep–wake, menstrual and seasonal cycles, respectively (Buonomano 2007). It is well established that the neural mechanisms underlying the shortest and longest extremes of temporal processing, sound localization and circadian rhythms, are entirely distinct and independent (Carr 1993; King & Takahashi 2000; Panda et al. 2002). While the mechanisms underlying timing in the intermediary range of milliseconds to minutes are not understood, it is becoming increasingly evident that this range is likely to also encompass distinct mechanisms (Fraisse 1984; Gibbon et al. 1997), and the distinction has been made between perceptual/automatic versus cognitive timing (Michon 1985; Rammsayer 1999; Lewis & Miall 2003), millisecond timing versus interval timing (Buhusi & Meck 2005) and millisecond versus second timing (Mauk & Buonomano 2004). Thus, in order to address the neural mechanisms of timing, it is useful to distinguish between various potential divisions of temporal processing. Although the correct taxonomy of temporal processing remains an open question, relevant classification dimensions include:
Time scale: perceptual versus cognitively mediated timing. A number of groups have proposed and presented evidence suggesting that there is a mechanistic distinction between time perception on the scale of a few hundred milliseconds and time estimation on the scale of seconds and minutes (Michon 1985; Rammsayer & Lima 1991; Rammsayer 1999; Buonomano & Karmarkar 2002; Lewis & Miall 2003). Exactly where the boundary lies is debated. However, it is likely that there is a time range in which there is significant overlap between rapid perceptual and slower cognitively mediated timing (see §4).
Sensory versus motor timing. Another distinction that should be considered is whether sensory and motor timing rely on shared mechanisms (Ivry 1996; Mauk & Buonomano 2004; Buonomano 2005). For example, does a complex sensory temporal task such as Morse code recognition rely on the same circuits as generating Morse code? Although accuracy in sensory and production tasks is correlated (Ivry & Hazeltine 1995; Merchant et al. 2008), it is not known whether such correlations reflect a shared timer or common performance, memory or cognitive factors (Helmbold et al. 2007).
Centralized versus local. The notion of a ‘central clock’ has been prevalent in the timing field, and implies that temporal processing across sensory modalities relies on the same neural circuitry. The opposing view is that timing is local and distributed (Buonomano & Karmarkar 2002; Ivry & Spencer 2004)—an interval discrimination task in the auditory and visual modality would rely on distinct neural circuits. A number of recent experiments have favoured the local hypothesis (Johnston et al. 2006; Burr et al. 2007; Karmarkar & Buonomano 2007) leading to the possibility that the subsecond timing scale is performed locally, while longer conscious time estimation could rely on a centralized mechanism.
Dedicated versus intrinsic. An additional dichotomy, related to the issue of central versus local timing, is whether the neural mechanisms that are ultimately performing the temporal computations—independent of their location—are specialized for timing (Ivry & Schlerf 2008). Dedicated models maintain the presence of specialized neural mechanisms, such as an internal clock composed of a pacemaker and counter, whose primary or sole function would be to tell time. Intrinsic models hold that temporal processing is a general feature of neural circuits, and that these same circuits process both spatial and temporal information in a multiplexed fashion.
Determining the correct temporal taxonomy will be critical in establishing a coherent and consistent interpretation of the increasing number of experiments aimed at understanding temporal processing. These issues and the different models of temporal processing will not be addressed in detail here, as they have been discussed in a number of recent reviews (Lewis & Miall 2003; Ivry & Spencer 2004; Mauk & Buonomano 2004; Buhusi & Meck 2005; Ivry & Schlerf 2008) as well as in the accompanying articles in this issue. Here, we will focus primarily on describing what we will refer to as the state-dependent network (SDN) model, which relates to subsecond sensory timing, and experimentally examine some of its predictions.
(a) State-dependent network model
The SDN model proposes that temporal processing is inherently encoded in the state of neural networks (Buonomano & Merzenich 1995; Buonomano 2000). A useful analogy is dynamics in a liquid. A pebble thrown into a pond will create a spatial–temporal pattern of ripples, and the pattern produced by any subsequent pebbles will be a complex nonlinear function of the interaction of the stimulus (the pebble) and the internal state of the liquid (the current pattern of ripples). Ripples thus establish a short-lasting and dynamic memory of the recent stimulus history of the liquid. The state of a neural network includes ongoing activity (the active state) and the presence of time-dependent neuronal properties (the hidden state) (Buonomano & Maass 2009). In the case of an auditory interval discrimination task, there is an ‘empty’ period in the stimulus, during which the auditory cortex neurons generally stop firing, thus timing would rely primarily on the hidden state; i.e. the change in network state produced by properties such as short-term synaptic plasticity. In an interval discrimination task, the first tone will activate a population of neurons within a local cortical network; given the presence of many experimentally characterized neuronal and synaptic properties, with time constants in the order of hundreds of milliseconds, this local network should be in a different state before the arrival of the second pulse 100 ms later. For example, as a result of short-term synaptic plasticity (Zucker 1989; Reyes & Sakmann 1999) synapses may be stronger or weaker, which should alter the population response to the same input. Differences in the population response can in turn code for time. In a sense, in the same manner that long-term potentiation provides a memory of coincident activity between groups of synapses that occurred minutes or hours in the past (Brown et al. 1990; Karmarkar et al. 2002), short-term synaptic plasticity provides a memory of an event that happened a hundred milliseconds ago.
The SDN model can be considered an intrinsic model of timing, in that it does not rely on what most would consider specialized timing mechanisms—although it could be argued that one of the specialized functions of short-term synaptic plasticity is temporal processing. Similarly, this class of models is also local, i.e. any cortical network could potentially process temporal information. Furthermore, interval discrimination could potentially rely on temporal processing at multiple sequential stages in the sensory hierarchy, and the relative contribution of low- and high-level areas could depend on the nature and design of the task.
The SDN model predicts that the arrival of each sensory event is encoded in the temporal context of previous events. Specifically, the second tone of a 100 ms interval arrives in the network state established by the first tone, and thus the population response can encode this interval. However, if that 100 ms interval happened to be preceded by another tone, then it will be superimposed on yet another neural network state. In the same manner that previous ripples on the surface of a pond will establish a ‘context’ or state that will alter the ripples produced by the next pebble thrown in, each sensory event will alter the response to the next. Because the superposition of these states is highly nonlinear, this model predicts that there is no built-in linear metric of time, such as the ticks of a clock. Thus, during a two-interval forced-choice interval discrimination task, the presentation of the standard interval can interfere with the processing of the comparison interval if the network has not had time to ‘reset’. Here, reset would correspond to the network returning to some baseline state; the time required for this would be determined by the time constants of the relevant time-dependent neuronal properties. For short-term synaptic plasticity, this is in the range of a few hundred milliseconds. Recent experimental results have established that short interstimulus intervals (ISIs) in an interval discrimination task do indeed impair temporal processing (Karmarkar & Buonomano 2007). Importantly, however, if both intervals were presented using tones of different frequencies, little or no impairment was observed. This suggests that timing is occurring locally, i.e. one interval does not interfere with the timing of the next if it arrives in a different local network, as would be expected during the presentation of different tone frequencies as a result of the tonotopic organization of the auditory cortex.
Here, we examine a number of related predictions generated by the SDN model.
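The state dependence described above can be illustrated with a deliberately simplified sketch. It is not the SDN model itself, which concerns populations of neurons in recurrent networks. It assumes a single depressing synapse with one hidden-state variable (available synaptic resources) that is depleted by each pulse and recovers exponentially. The function name, the depletion fraction of 0.5 and the 300 ms recovery time constant are hypothetical values chosen only for illustration.

```python
import math

def pulse_responses(pulse_times_ms, use_fraction=0.5, tau_recovery_ms=300.0):
    """Return the response evoked by each pulse in a toy depressing synapse.

    Hypothetical illustration: the hidden state is the pool of available
    synaptic resources, which each pulse depletes and time restores.
    """
    resources = 1.0          # fully recovered baseline state
    last_time = None
    responses = []
    for t in pulse_times_ms:
        if last_time is not None:
            dt = t - last_time
            # Exponential recovery toward the baseline value of 1.0
            resources = 1.0 - (1.0 - resources) * math.exp(-dt / tau_recovery_ms)
        response = use_fraction * resources
        responses.append(response)
        resources -= response  # depletion caused by this pulse
        last_time = t
    return responses

# The response to the second pulse depends on the interval, so it carries
# information about elapsed time.
r_100 = pulse_responses([0, 100])[-1]
r_200 = pulse_responses([0, 200])[-1]

# The same 100 ms comparison interval, preceded by a 100 ms standard interval
# and an ISI of 250 ms or 750 ms, arrives in a different hidden state.
isolated = r_100
short_isi = pulse_responses([0, 100, 350, 450])[-1]
long_isi = pulse_responses([0, 100, 850, 950])[-1]
```

In this sketch, the response to the final pulse after a 250 ms ISI deviates more from the isolated-interval response than after a 750 ms ISI, because the hidden state has had less time to return to baseline.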
2. Material and methods
Subjects were paid undergraduate and graduate students from the UCLA community, between the ages of 18 and 30, who reported having normal hearing. All experiments were run in accordance with the University of California human subjects guidelines.
Two-interval forced-choice procedure (experiment 1). Subjects were presented with both a standard and comparison interval on each trial. The comparison interval was equal to the standard (100 ms)±Δt. Δt was varied adaptively according to a three-down and one-up procedure (Levitt 1971; Wright et al. 1997). The standard stimulus was always presented first. Following an ISI, which varied according to the experimental condition, the comparison interval was presented and subjects were asked to judge whether the first or the second stimulus was the longer. The point of subjective equality (PSE) and the difference limen (DL; just noticeable difference) were calculated from the psychometric functions (see below).
The mean ISI for the short and long conditions was 250 and 750 ms, respectively. For each trial, the ISI was chosen from a uniform distribution between ISI±ISI×0.25. Subjects responded by pressing one of two buttons on a computer mouse, and were provided with immediate visual feedback after each response. All stimuli were generated in Matlab and presented through headphones. Each interval was bounded by a 15 ms long tone including a 5 ms on and off ramp. In the same frequency conditions, both intervals were bounded by 1 kHz tones; in the different frequency conditions, the standard and comparison intervals were presented with 1 and 4 kHz tones, respectively. A total of 19 subjects participated in experiment 1.
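The stimulus parameters above can be sketched as follows. This is an illustrative reconstruction in Python rather than the Matlab code used in the experiments; the function names, the 44.1 kHz sample rate and the linear ramp shape are assumptions, since the text specifies only the ISI distribution, the 15 ms tone duration and the 5 ms on and off ramps.

```python
import math
import random

def jittered_isi(mean_isi_ms, jitter_fraction=0.25, rng=random):
    """Draw an ISI uniformly from mean - mean*fraction to mean + mean*fraction."""
    half_width = mean_isi_ms * jitter_fraction
    return rng.uniform(mean_isi_ms - half_width, mean_isi_ms + half_width)

def tone_marker(freq_hz=1000.0, duration_ms=15.0, ramp_ms=5.0, sample_rate_hz=44100):
    """Return samples of a tone with onset and offset ramps.

    Assumed for illustration: linear ramps and a 44.1 kHz sample rate.
    """
    n_samples = int(round(duration_ms * sample_rate_hz / 1000.0))
    n_ramp = int(round(ramp_ms * sample_rate_hz / 1000.0))
    samples = []
    for i in range(n_samples):
        # Envelope rises over the first ramp and falls over the last ramp
        envelope = min(1.0, (i + 1) / n_ramp, (n_samples - i) / n_ramp)
        samples.append(envelope * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz))
    return samples

# ISI ranges implied by the +/-25% jitter around each mean
short_range = (250 * 0.75, 250 * 1.25)   # 187.5 to 312.5 ms
long_range = (750 * 0.75, 750 * 1.25)    # 562.5 to 937.5 ms
```

Under this reading, the short-ISI condition spans roughly 190 to 310 ms and the long-ISI condition roughly 560 to 940 ms.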
Two-interval forced-choice procedure (experiment 2). The same two-interval forced-choice procedure described above was used except that the comparison interval was equal to the standard +Δt, and the presentation order of the standard and comparison interval was randomized. In this task, threshold was defined as the mean of the reversal values (after excluding the first three reversals), which corresponds to a 79 per cent correct performance level (Wright et al. 1997). As in the same frequency condition of experiment 1, all stimuli consisted of 1 kHz tones. A total of 15 subjects participated in experiment 2.
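The adaptive rule and the reversal-based threshold can be sketched as below. A three-down one-up rule converges on the stimulus level at which three consecutive correct responses are as likely as not, i.e. p³ = 0.5, which gives p ≈ 0.794 and matches the 79 per cent level cited above. The simulated observer, the multiplicative step size, the starting Δt and all names are hypothetical; the text does not specify the step rule used in the experiments.

```python
import math
import random

def run_staircase(observer_scale_ms, n_trials=60, start_delta_ms=30.0,
                  step_factor=1.2, n_discard=3, seed=0):
    """Simulate a three-down one-up staircase and return (threshold, reversals)."""
    rng = random.Random(seed)
    delta = start_delta_ms
    correct_streak = 0
    last_direction = None
    reversals = []
    for _ in range(n_trials):
        # Hypothetical observer: accuracy rises from chance (0.5) toward 1.0
        p_correct = 0.5 + 0.5 * (1.0 - math.exp(-delta / observer_scale_ms))
        correct = rng.random() < p_correct
        direction = None
        if correct:
            correct_streak += 1
            if correct_streak == 3:      # three correct in a row: make it harder
                correct_streak = 0
                direction = "down"
        else:                            # one error: make it easier
            correct_streak = 0
            direction = "up"
        if direction is not None:
            if last_direction is not None and direction != last_direction:
                reversals.append(delta)  # record the level at which direction flipped
            last_direction = direction
            delta = delta / step_factor if direction == "down" else delta * step_factor
    kept = reversals[n_discard:]         # exclude the first reversals
    threshold = sum(kept) / len(kept) if kept else None
    return threshold, reversals

# Performance level targeted by the three-down one-up rule
target_p = 0.5 ** (1.0 / 3.0)

# Long simulated run, used here only to illustrate convergence
threshold_ms, reversal_values = run_staircase(observer_scale_ms=20.0, n_trials=600)
```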
Learning experiments (experiment 3). As in experiment 1, a two-interval forced choice with an adaptive procedure that allowed for both ±Δt values was used. The threshold was defined as the mean of the reversal values. A total of 24 subjects participated in experiment 3.
Protocol. All experiments consisted of the presentation of at least three blocks of each condition. Each block was composed of 60 trials and presented in pseudo-random order. In the studies described in experiment 1, two 1 hour sessions (on consecutive days) were administered, with 12 blocks in each session—for a total of six blocks for each of the four conditions. In the learning experiments, during the 8 training days, subjects performed 12 blocks, all of a single condition. Feedback was presented after each trial in all the experiments presented here.
Estimation of PSE and DL. Analysis of experiment 1 consisted of fitting the data from the adaptive procedure with the logistic function for the estimation of the psychometric function (Kaernbach 2001). We used all the data from all blocks for a given condition except the first block—which was treated as a ‘practice’ session. The bisection point at p=0.5 was taken as the PSE, and the gain of the logistic function times log (0.75/0.25) as the DL (Lapid et al. 2008).
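The PSE and DL estimates can be illustrated with a minimal sketch. It assumes that the ‘gain’ above refers to the scale parameter s of a logistic function p(x) = 1/(1 + exp(−(x − PSE)/s)), in which case the p = 0.75 point lies s·log(0.75/0.25) above the PSE. The grid-search maximum-likelihood fit is a stand-in chosen for simplicity and is not the fitting procedure of Kaernbach (2001); the data are synthetic and generated from known parameters, not experimental results.

```python
import math

def logistic(x, pse, scale):
    """Probability of judging the comparison interval as longer than the standard."""
    return 1.0 / (1.0 + math.exp(-(x - pse) / scale))

def difference_limen(scale):
    """Distance from the PSE (p = 0.5) to the p = 0.75 point of the logistic."""
    return scale * math.log(0.75 / 0.25)

def fit_logistic(trials, pse_grid, scale_grid):
    """Grid-search maximum-likelihood fit; trials is a list of (x, responded_longer)."""
    best = None
    for pse in pse_grid:
        for scale in scale_grid:
            log_lik = 0.0
            for x, longer in trials:
                # Clamp to avoid log(0) for extreme predicted probabilities
                p = min(max(logistic(x, pse, scale), 1e-9), 1.0 - 1e-9)
                log_lik += math.log(p if longer else 1.0 - p)
            if best is None or log_lik > best[0]:
                best = (log_lik, pse, scale)
    return best[1], best[2]

# Synthetic data only: 100 trials per comparison interval, generated from a
# logistic with PSE = 100 ms and scale = 10 ms.
synthetic_trials = []
for x in range(70, 131, 5):
    n_longer = round(100 * logistic(x, 100.0, 10.0))
    synthetic_trials += [(x, True)] * n_longer + [(x, False)] * (100 - n_longer)

fit_pse, fit_scale = fit_logistic(
    synthetic_trials,
    pse_grid=[90.0 + i for i in range(21)],
    scale_grid=[5.0 + i for i in range(16)],
)
fit_dl = difference_limen(fit_scale)
```

In this parametrization a change in the DL with an unchanged PSE corresponds to a change in the slope of the psychometric function without a horizontal shift.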
3. Results
(a) Effects of different ISIs and frequency on accuracy and PSE
As mentioned above, a previous study determined that short ISIs impaired interval discrimination if both intervals were presented at the same frequency, but not if they were presented using different frequencies (Karmarkar & Buonomano 2007). This study, however, did not examine whether the ISI effect was produced by a shift in the PSE (corresponding to time compression or dilation). Here, we first examined the effect of ISI and of changing frequencies on the PSE and DLs using a two-interval forced-choice procedure that allowed for the estimation of the psychometric functions.
The standard interval was 100 ms. We used a 2×2 design, varying the ISI and frequency. The ISI was either short or long (mean of 250 or 750 ms, respectively). The standard and comparison intervals were of the same or different frequencies (see §2), resulting in four conditions: ShortISI-SameFr; LongISI-SameFr; ShortISI-DiffFr; and LongISI-DiffFr. The fitted psychometric functions for all subjects are shown in figure 1a. Group data suggest different DLs, but similar PSE values for all four conditions (figure 1b,c). A two-way analysis of variance with repeated measures revealed a significant interaction between ISI and frequency on the DLs, indicating an increase in threshold in the ShortISI-SameFr condition (F(1,18)=33, p<0.0005). By contrast, there was no significant interaction or main effects on the PSE.
These results replicate the main finding of previously published experiments (Karmarkar & Buonomano 2007): that short ISIs impair interval discrimination. Since this effect is limited to cases in which both the standard and comparison intervals are presented at the same frequency, it seems that it is not a result of a general or non-specific effect of the increased stimulus presentation rate, but rather a result of the interference of the preceding stimulus on subsequent processing of intervals coming in on the same channel. Additionally, these results indicate that the impairment was not produced by a time compression or dilation effect, since there was no detectable shift in the PSE. Rather, the decrease in performance is attributable to a change in the precision of temporal discrimination. In these experiments, the subjects received feedback after each trial; thus it is possible that the lack of a change in the PSE was due to ongoing ‘recalibration’ during each block. However, a separate set of experiments in which the feedback was omitted still revealed the same effect on the DL and no effect on the PSE.
(b) Effects of different ISIs on threshold
The above results and previously published data (Rammsayer 1999) are consistent with the notion that there is a transition between different neural mechanisms underlying timing somewhere in the range of hundreds of milliseconds. In order to gain insights as to where the boundary between millisecond and second timing lies, we performed further experiments in which we varied the ISI over five different intervals (50, 250, 500, 750 and 1000 ms), again using a 100 ms standard. Additionally, we performed a set of control experiments in which we examined the effect of three ISIs (250, 500 and 750 ms) on a frequency discrimination task.
To obtain accurate threshold estimates with fewer runs, we used the reversal values of the adaptive procedure as opposed to the estimation of the psychometric functions to quantify performance. As shown in figure 2, the thresholds were higher for the two shorter ISIs. A repeated-measures ANOVA revealed a significant effect of ISI (F(4,56)=6.6, p<0.001). A planned comparison revealed that the only significant difference between adjacent ISIs was between 250 and 500 ms (p=0.016; Bonferroni corrected). In contrast to the effect of ISI on interval discrimination, there was no significant effect of the three ISIs examined on frequency discrimination (F(2,28)=0.79, p=0.46).
These results demonstrate that the impairment of 100 ms discrimination produced by short ISIs is strongest at 50 and 250 ms. Interestingly, the magnitude of the impairments was not significantly different between the 50 and 250 ms ISIs. The presence of a significant difference in performance between the 250 and 500 ms ISIs, together with the absence of a difference between 500 and 750 ms, suggests that in the framework of the SDN model, local networks settle back to a baseline state between 250 and 500 ms.
(c) Learning and ISI specificity
Another approach to examining whether time is encoded in the population response of local networks, which in turn are influenced in a nonlinear fashion by the temporal context established by previous sensory events, is to examine generalization patterns of perceptual learning. Previous studies have used generalization to examine both the temporal specificity of interval learning and whether it generalizes across frequency channels and sensory modalities (Wright et al. 1997; Nagarajan et al. 1998; Westheimer 1999; Meegan et al. 2000; Karmarkar & Buonomano 2003). We next examined the results of training two groups of subjects on either the ShortISI-SameFr or LongISI-SameFr condition. The first goal of this study was to determine whether training on the ShortISI-SameFr could overcome the performance deficits observed above. The second goal was to examine, if learning occurred, whether it would generalize to the remaining three conditions.
Experiments were performed over 10 days. During the first and last days, subjects were administered three blocks of each of the four conditions. In the intervening 8 days, subjects ran 12 blocks on the trained condition (ShortISI-SameFr or LongISI-SameFr). Figure 3a,b show the learning curves of the subjects in both conditions. In each condition, eight subjects exhibited significant learning curves as determined by a significant linear trend using a one-way repeated-measures ANOVA. The analysis of the generalization patterns was based on this subgroup of ‘learners’ (Wright et al. 1997; Karmarkar & Buonomano 2003). Importantly, however, there was a significant difference in the pre- and post-test values for both groups when tested on their trained conditions (ShortISI-SameFr or LongISI-SameFr) independent of whether all subjects or the subset of learners were considered. To determine whether learning in each group generalized to the other three conditions, we performed a two-way ANOVA (repeated measures on both factors), where one factor was pre-test versus post-test, and the other, the three naive conditions. As shown in figure 3b, in the ShortISI-SameFr group, there was no significant main effect of training (F(1,7)=0.82, p=0.39) or of the interaction (F(2,14)=0.58, p=0.57). By contrast, in the subjects trained on the LongISI-SameFr condition, there was a highly significant effect of training (F(1,7)=25.6, p<0.002) and no significant interaction (F(2,14)=0.68, p=0.52).
These results established that, independent of whether subjects were trained on the ShortISI-SameFr or LongISI-SameFr conditions, they improved on the trained stimulus set. However, while the subjects trained on the ‘easy’ condition (LongISI-SameFr) showed robust generalization, those trained on the ‘hard’ condition (ShortISI-SameFr) did not show any significant transfer to the naive conditions. Interestingly, these transfer results are consistent with generalization patterns in other forms of perceptual learning; specifically, training on an easy condition produces more robust generalization (Ahissar & Hochstein 1997). Indeed, training on LongISI-SameFr seemed to be as effective as actual training on the ShortISI-SameFr in improving ShortISI-SameFr performance. Specifically, the post-test ShortISI-SameFr threshold was on average lower in the LongISI-SameFr group than in the ShortISI-SameFr group.
4. Discussion
The above results provide a new set of constraints that must be accounted for by any general model of temporal processing in the millisecond range. While the results are largely consistent with the predictions made by the SDN model, they also highlight the need to further refine this model and cannot exclude a number of additional models. Below we address the implications of the current results.
(a) The boundary between time perception and time estimation
Some of the first psychophysical evidence that millisecond and second timing may rely on distinct mechanisms was provided by Rammsayer and colleagues, who showed that discrimination of a 1 s interval was impaired when subjects performed an additional cognitive task, but 50 ms discrimination was not (Rammsayer & Lima 1991). Additionally, pharmacological manipulations of the dopaminergic system and benzodiazepines can differentially affect 50–100 ms and 1 s discrimination (Rammsayer 1997, 1999). The observation that the coefficient of variation, which measures the relationship between performance and the standard interval, is higher for short intervals has also been used to argue that there is a transition between timing mechanisms in the range of hundreds of milliseconds (Gibbon et al. 1997; Mauk & Buonomano 2004). Additionally, experiments which show that short intervals are more impaired in inter-modal timing tasks are consistent with the notion that millisecond processing may rely more on local channel-specific networks, while longer intervals may be more centralized and less influenced by channel manipulations (Rousseau et al. 1983). In a meta-analysis study, Lewis & Miall (2003) suggested that the differential patterns of blood-oxygen-level-dependent activity in short- and long-interval discrimination tasks are also consistent with distinct neural mechanisms. Recent results have further supported the presence of different mechanisms by showing that a distractor stimulus preceding the interval to be discriminated impairs 100 ms, but not 1 s discrimination (Karmarkar & Buonomano 2007).
While there is mounting evidence for distinct mechanisms for perceptual and cognitive timing, the boundary and degree of overlap between them are unclear. One of the goals of experiment 2 was to use the hypothesis that perceptual timing relies on local state-dependent computations and thus is susceptible to interference by preceding stimuli, and to examine the issue of where the transition between short- and long-interval mechanisms lies. The results suggest that the boundary may lie between 250 and 500 ms. This range is consistent with the proposal that time-dependent neural properties such as short-term synaptic plasticity may underlie temporal processing, since many forms of short-term synaptic plasticity seem to take a few hundred milliseconds to ‘reset’, i.e. return to baseline PSP amplitude (Markram et al. 1998; Reyes & Sakmann 1999; Marder & Buonomano 2003).
An additional task that has been used to examine the boundary between different timing scales was one in which a variable ‘distractor’ is presented before the comparison interval. Similarly to the task studied here, this distractor was predicted to alter the subsequent timing by placing the network in a different state during each trial. It was originally shown that a distractor with a 100 (50–150) ms mean significantly impaired discrimination of a 100 ms interval, but a proportional distractor did not impair a 1 s discrimination task (Karmarkar & Buonomano 2007). A recent study replicated this finding for a 100 ms standard interval, but reported that a 300 (225–375) ms distractor did not alter discrimination of a 300 ms interval (Spencer et al. 2009); however, an additional study has reported a significant effect for a standard interval of 300 ms using variable distractors with the same mean but with a range of 150–450 ms (Rocca & Burr 2007).
Together, current studies suggest a boundary between perceptual and cognitive timing in the range of hundreds of milliseconds, and well below 1 s. However, in addressing the existence of distinct mechanisms for millisecond and second timing, it is critical to note that an actual ‘hard’ boundary is unlikely; rather, a transition range with a significant degree of overlap is likely to be present. Furthermore, within this transition zone, it is likely that both mechanisms could operate in parallel and their respective contributions could depend on the nature of the task at hand.
(b) Temporal perceptual learning
Previous studies on perceptual learning of interval discrimination have revealed that learning is temporally specific; learning of one interval does not generalize to other intervals (Wright et al. 1997; Nagarajan et al. 1998; Karmarkar & Buonomano 2003). However, these and other studies have also demonstrated that interval learning can generalize to different auditory frequencies (Wright et al. 1997; Karmarkar & Buonomano 2003), visual locations (Westheimer 1999) and from one modality to another (Nagarajan et al. 1998; Meegan et al. 2000). The interval specificity could be interpreted as meaning that there are specialized timing circuits for each interval; however, this is also what is expected from the SDN model if one assumes that learning consists of an improved readout of the population code specific to each interval (Buonomano 2000). By contrast, the generalization to different spatial channels could be used to argue that there is a central timer (see below).
The perceptual learning results presented here further establish that interval discrimination undergoes learning, and demonstrate that the severe impairment produced by presenting the standard and comparison intervals in close temporal proximity can be overcome to the extent that performance becomes similar for both the short and long ISIs (figure 3b). Interestingly, however, training on the ShortISI-SameFr condition did not improve performance on the LongISI-SameFr condition. This result is unique in that it demonstrates a highly specific form of learning, i.e. there was no generalization to the same standard interval of 100 ms when the ISI was 750 ms; in other words, in this case learning was specific to both the ISI of 250 ms and the frequency condition. By contrast, training on the LongISI-SameFr condition transferred to other conditions. Thus, LongISI-SameFr training did result in improvement in the ShortISI-SameFr condition; however, there was still a significant difference between LongISI-SameFr and ShortISI-SameFr after training (p=0.003), which was not the case after ShortISI-SameFr training. Thus, the fact that the ISI impairment was erased after training on ShortISI-SameFr but not after LongISI-SameFr training suggests that qualitatively different learning strategies are being engaged (see below).
(c) Open questions in the SDN model
As recently pointed out by Ivry & Schlerf (2008), a number of critical issues remain unaddressed in most models of temporal processing, including in the SDN model. One issue relates to the transfer of interval discrimination learning to different sensory channels (Wright et al. 1997). First, in interpreting these psychophysical results, it is critical to recall that the often implicit assumption that there exists a single mechanism or site of learning is unlikely to be true. Neurophysiological and psychophysical perceptual learning studies have indicated that there are probably a number of different forms and sites of plasticity operating in parallel (Gilbert et al. 2001; Ahissar & Hochstein 2004; Amitay et al. 2006). A perceptual task relies on a number of distinct cognitive mechanisms. In the case of interval discrimination, in addition to a means to measure time per se, it is also necessary to temporarily store the standard interval, compare the measured intervals and make a decision based on this comparison. While temporal perceptual learning may indeed rely primarily on improvement of the temporal component, there is little evidence that it could not be a result of improved memory of the standard interval or of improved comparison of both intervals. Indeed, an improvement in either of these mechanisms could explain the interval specificity of learning as well as the spatial generalization. Additionally, it is important to emphasize that while the SDN model directly addresses the potential timing mechanisms, it does not make any strong predictions regarding the mechanisms of temporal perceptual learning.
Independent of the mechanisms of temporal perceptual learning, a critical question common to all local models of temporal processing, including the SDN model, remains: how are intervals on different channels compared? Specifically, if we assume that temporal computations occur in local cortical networks, how do we compare the interval at one frequency with that from another frequency or modality? The population response ‘signatures’ to a 100 ms interval in the auditory and somatosensory cortices should be entirely unrelated. This is a fundamental problem, but not one unique to timing; it is a restatement of the problem of how the brain performs invariant pattern recognition (Olshausen et al. 1995; Buonomano & Merzenich 1999; DiCarlo & Cox 2007). How do we know that the letter ‘A’ in the left hemifield corresponds to the same symbol when it is flashed to the right hemifield? Or similarly, how do we know that a word spoken in a low-pitched or a high-pitched voice is the same word? In both cases, the sets of primary cortical neurons activated by the two stimuli are non-overlapping. Although the mechanisms underlying invariant pattern recognition remain unknown, a number of proposed solutions require experience-dependent mapping of different sensory representations to a common higher order representation. It seems inevitable that all local models of temporal processing will have to rely on some similar mapping, which would allow intervals on different sensory channels to be mapped to a shared representation. A related possibility is that generalization across different frequencies or modalities could occur even though timing per se takes place in different networks, because the code could be the same. A simple example of an SDN model of this kind would be a ‘suppression code’. Specifically, it is well established that the neural and population response to the second of a pair of tones can be suppressed (forward masked) by the first, and that the magnitude of this suppression is time dependent (Brosch & Schreiner 1997; Rennaker et al. 2007). Thus, the magnitude of the response could encode time. Such a code could be considered a type of energy model of timing, and could potentially be read out universally by downstream neurons.
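The idea of a suppression code can be made concrete with a minimal sketch. It assumes, purely for illustration, that the response to the second tone recovers exponentially from forward suppression; the functional form, the parameter values and the function names below are hypothetical and are not taken from the cited studies. Under that assumption, the response magnitude is a monotonic function of the interval, so a single downstream readout could in principle recover the interval regardless of which local network produced the response.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the source).
MAX_SUPPRESSION = 0.8     # fraction of the response lost at a 0 ms interval
RECOVERY_TAU_MS = 150.0   # assumed time constant of recovery from suppression


def second_tone_response(interval_ms):
    """Relative response to the second tone (1.0 means unsuppressed).

    Assumes exponential recovery from forward suppression.
    """
    return 1.0 - MAX_SUPPRESSION * math.exp(-interval_ms / RECOVERY_TAU_MS)


def decode_interval(response):
    """Invert the assumed recovery curve to read out the interval in ms."""
    return -RECOVERY_TAU_MS * math.log((1.0 - response) / MAX_SUPPRESSION)


for interval in (50, 100, 200):
    magnitude = second_tone_response(interval)
    print(interval, round(magnitude, 3), round(decode_interval(magnitude), 1))
```

Because the readout depends only on response magnitude, the same decoder would apply to any channel that shares the recovery curve. If the recovery curves differed across channels, the decoder would need to be calibrated per channel, which returns to the mapping problem discussed above.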
(d) Nonlinear metrics of time
The strong prediction of the SDN model is that there is no linear metric of time. This means that the population code for a 100 ms interval is not inherently related in any linear fashion to the population code for a 200 ms interval. By contrast, in a clock model, if 100 ticks corresponds to 100 ms it can be immediately established that 200 ticks corresponds to 200 ms. However, it is important to note that the SDN model does not imply that the appropriate mapping of the network response to a linear metric cannot be learned through experience. Indeed, we interpret the fact that subjects improved in the ShortISI-SameFr condition as evidence of this (recall that in this condition the ISI varied between approx. 190 and 310 ms). Clearly, we learn to identify the same intervals in a multitude of different temporal contexts. For example, anyone fluent in Morse code must learn to identify whether the duration of a tone was short or long in the context of an extremely complex and rapid sequence of previous tones. Morse code and language are, of course, complex tasks, requiring years to learn, and some of this learning may be devoted to establishing that the same stimulus can produce different neural population codes depending on the temporal context. The interference between successive stimuli would be lessened by decreasing the presentation rate of the stimuli, which may be related to why the initial stages of Morse code and language learning are facilitated by slow rates.
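The contrast between a clock code and a code without a linear metric can be illustrated with a toy sketch. The population below is a deliberately simplified stand-in for a state-dependent network: each unit is assumed to relax exponentially after the first stimulus, with its own hypothetical time constant. None of the names or values come from the SDN model itself. The sketch only shows that doubling the 100 ms code predicts the 200 ms code exactly for a tick counter but not for the population.

```python
import math


def clock_code(interval_ms, tick_ms=1.0):
    """Clock model: the code is a tick count, linear in elapsed time."""
    return interval_ms / tick_ms


# Hypothetical relaxation time constants (ms) for four units; illustrative only.
TAUS_MS = (40.0, 80.0, 160.0, 320.0)


def population_code(interval_ms):
    """Toy state-dependent code: each unit's state when the second stimulus arrives."""
    return [math.exp(-interval_ms / tau) for tau in TAUS_MS]


def relative_error(predicted, actual):
    """Euclidean distance between two codes, normalized by the actual code's norm."""
    distance = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))
    norm = math.sqrt(sum(a ** 2 for a in actual))
    return distance / norm


# Doubling the 100 ms code reproduces the 200 ms code for the clock...
clock_error = abs(2 * clock_code(100) - clock_code(200))

# ...but not for the toy population code.
doubled = [2 * state for state in population_code(100)]
population_error = relative_error(doubled, population_code(200))

print(clock_error, round(population_error, 2))
```

In this toy case the 200 ms code is still a fixed function of the 100 ms code, so a readout could learn the mapping from examples, in keeping with the point that a linear metric can be acquired through experience even if it is not built into the code.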
It is clear that the SDN and other models of temporal processing are not sufficient to explain all facets of temporal processing, particularly regarding the mechanisms underlying temporal perceptual learning. As we develop more elaborate models and theories of temporal processing, it will be important to distinguish between task components that reflect true temporal processing and those that correspond to more general cognitive components shared by non-temporal perceptual tasks, such as the buffering and comparison of stimulus features, and invariant forms of pattern recognition. Additionally, while our current focus remains on simple temporal tasks, such as interval and duration tasks, it is ultimately necessary that the same models account for complex forms of temporal processing, such as temporal sequences or Morse code. The SDN model has this potential, but predicts that previous stimuli can interfere with the encoding of subsequent temporal features. This is both an inherent strength and an inherent weakness of the model. It is a strength because the model naturally encodes complex temporal patterns as well as simple intervals (Buonomano 2000); it is a weakness because, by encoding every object in the context of the previous one, it becomes challenging to identify specific temporal objects embedded in a stream of stimuli (Knüsel et al. 2004).
This research was supported by the NIMH.
One contribution of 14 to a Theme Issue ‘The experience of time: neural mechanisms and the interplay of emotion, cognition and embodiment’.
© 2009 The Royal Society