## Abstract

Number sense, a spontaneous ability to process approximate numbers, has been documented in human adults, infants and newborns, and many other animals. Species as distant as monkeys and crows exhibit very similar neurons tuned to specific numerosities. How number sense can emerge in the absence of learning or fine tuning is currently unknown. We introduce a random-matrix theory of self-organized neural states where numbers are coded by vectors of activation across multiple units, and where the vector codes for successive integers are obtained through multiplication by a fixed but random matrix. This cortical implementation of the ‘von Mises' algorithm explains many otherwise disconnected observations ranging from neural tuning curves in monkeys to looking times in neonates and cortical numerotopy in adults. The theory clarifies the origin of Weber–Fechner's Law and yields a novel and empirically validated prediction of multi-peak number neurons. Random matrices constitute a novel mechanism for the emergence of brain states coding for quantity.

This article is part of a discussion meeting issue ‘The origins of numerical abilities’.

## 1. Introduction

What is the origin of our ability to represent numbers? The evidence is now overwhelming for an approximate number system [1] shared between several species and relying on number neurons distributed in parietal and prefrontal cortex (PFC) [2]. Number neurons are cells whose firing varies systematically with the number of objects or events, typically with coarse tuning around a cell-specific preferred number, independently of low-level properties (e.g. size, spacing, intensity). Yet their tuning properties are debated [3], and the precise circuit that allows them to exhibit numerical tuning is still unknown.

An outstanding theoretical question about number sense concerns its presence in humans and other animals prior to learning. Number neurons have been found in untrained animals [4] and numerical discrimination is present in human neonates [5]. Such findings suggest that self-organizing properties of neural circuits, early on during development, are responsible for the emergence of number-coding neurons. Furthermore, the number sense possesses very specific empirical properties, labelled here as D1–7, that any convincing theory should confront: (D1) The number sense obeys Weber–Fechner's Law: our ability to discriminate between numbers varies as the log of their ratios [6]. (D2) Number neurons are tuned to a broad range of preferred numerosities up to 30 items [7], including quantity zero [8–10], with log-Gaussian tuning curves [7,11], and are found in monkeys even in the absence of numerical training [4]. (D3) Neurons tuned to low and high numbers are more frequent than neurons tuned to the middle of the tested range [7,11]. (D4) Monkeys trained to order small numerosities spontaneously generalize to larger numerosities [12,13]. (D5) Neonates, hours after birth, can discriminate numbers that are in a sufficiently large ratio [5]. (D6) In the course of development, number sense becomes increasingly precise [14,15]. (D7) In human parietal cortex, voxels selective for similar numbers are more likely to be contiguous, forming a macroscopic cortical map (‘numerotopy’ [16]).

Previous computational models based on number detector units have mostly focused on properties D1–3, but have typically failed to address the broader issue of the emergence of these properties without training. In order to obtain operational number detectors, these models relied either on fine-tuned hand-wired circuitry [17,18] or on learning through exposure to many numerical sets [19,20]. An exception is Miller [21], which reported evidence for D1 and D2 in a randomly connected network of spiking neurons subject to short-term plasticity. Indeed, random networks are known to mimic the selectivity and sparsity of neural activations in particular regimes [22].

Here, we present a random-matrix theory of number sense that approaches numerosity at the level of vector states. We reasoned that, if each number is represented by a vector of activity across a large number of neurons, passing from one number to the next could be achieved by matrix multiplication (see [23] for a related approach in computational linguistics). We here consider the simplest possibility, which is that this successor matrix is random. We therefore envisaged that the approximate number system could be built on the powers of a certain type of random matrix: starting from some initial vector, the vectors coding for successive numbers would be generated by successive multiplications by the same fixed matrix. This operation would constitute an analogue, within the approximate number system, of the successor function ‘+1’, a foundational concept of Peano arithmetic. We show how all of the above properties D1–7 inevitably follow from this simple theory. We implement the theory in two different models with different degrees of biological realism, which both possess very little initial structure and draw no statistics at all from the environment. From a computational neuroscience perspective, each model is a recurrent network with a specific form of random but topographic connectivity, thus causing an initially localized pattern of activity to progressively radiate across neural space, and thereby engendering a series of distributed vector codes for successive numbers.

## 2. Results

### (a) The minimal number sense model

We start by presenting the minimal implementation of the theory: the minimal number sense model, illustrated in figure 1. The model is entirely described by an *n* × *n* connectivity matrix *M*, an initial activation vector *S*_{0} of dimension *n* and discrete time dynamics for updating unit activations. *M* represents a random pattern of connectivity between number neurons which is stable for the timescale of an experiment. Each entry *M _{ij}* is a Gaussian random variable whose amplitude drops exponentially with the distance between units. The adjacency matrix of the minimal model is a well-studied object in mathematics—a random band matrix [24]

*—*which has been used to describe spatially constrained interactions in several physical systems (see [25] for an exposition). In our case, such a random band matrix is intended to capture both the sparsity of functional connectivity in the nervous system and its exponential decrease with the distance between neocortical neurons [26].

*S*

_{0}represents the initial firing rates of number neurons in the absence of any stimulus.

*S*

_{0}is localized: its non-zero components are clustered together on the left side of the line (although even this assumption can be relaxed: see electronic supplementary material, Note S1).

Our abstract model does not process external stimuli. Rather, what we strive for in this article is to introduce an algorithm that produces a certain sequence of brain activity states (*S _{k}*)

_{k}_{=0…n}appropriate to represent integers inasmuch as they are discrete and linked by a successor operation, as defined in Peano arithmetic, and to offer a close examination of the properties of those neural states. Given any state

*S*, the next number state

_{k}*S*

_{k}_{+1}is obtained by multiplying

*S*by

_{k}*M*, adding a small Gaussian random noise term

*ɛ*, before rectifying activations above zero and normalizing (figure 1

*d*). In this manner, different numbers are assigned distinct vector codes, while the successor function is implemented by matrix multiplication. We remain agnostic as to the exact neural mechanism that triggers matrix multiplication, though a global gating system acting on all number neurons would appear mandatory. We note that in the cortex, rectification could be the result of competition among neurons or of firing thresholds at the cell level [27], while normalization has been described as a canonical neural computation [28]. Critically, aside from the noise and the rectification enforced onto the system, this set-up is nothing else than the Power method, also known as ‘von Mises iteration’ [29,30], a well-known algorithm that identifies the eigenvector with largest eigenvalue of a matrix by iterative matrix multiplication from an arbitrary vector state.

Figure 2 compares our simulation results with the electrophysiological data on number neurons [7,11]. Although we start with a simple clustered vector and update it with a random matrix, units tuned to number spontaneously emerge: simulations show that, for every number *i*, including zero and up to 30, one can find units whose activity peaks at iteration *i* (see electronic supplementary material, Note S2). Furthermore, the average normalized activation of all model units that prefer a given number *i* parallels the log-Gaussian tuning curves reported in actual neuronal recordings [7]. On a linear scale, tuning curves overlap and show increasing skews, as reflected by increased responses to the numerosities immediately next to the peak of the curve, as numerosity increases. Once plotted on a log scale, however, tuning curves become symmetrical and of similar width (D2). These properties reflect the fact that under weak conditions on *M* and *S*_{0}, the von Mises iteration converges geometrically (see electronic supplementary material, Note S9). Furthermore, if units are binned according to their preferred number (figure 2*c*), a U-shaped distribution emerges (D3), with more units selective to the lowest and highest tested numerosities, in agreement with the experimental data (figure 2*d*).

Next, we ask whether the model is capable of assigning distinct vector states to larger numbers. Without changing any parameters, we produce 30 consecutive number states by successive application of matrix *M* to the initial vector *S*_{0}. We then compare these theoretical vectors with the empirical vectors of firing rates of PFC number neurons recorded in monkeys presented with numerosities 1–30 [7]. Figure 2*e,f* presents the normalized activities of model units and of PFC neurons, ordered by decreasing preferred number. In both cases, the sigmoidal white crests of maximal unit responses confirm the underlying U-shape distribution of number preferences previously observed for numbers 1–5. The increasing bandwidth of tuning curves is also reflected by the widening yellow regions around the white crest as numerosity increases. Interestingly, disconnected ‘islands’ of high activity appear in some rows, suggesting that some neurons and units may be sensitive to multiple numbers.

Figure 2*g,h* shows planar projections of number states in the model and in the data, obtained by multi-dimensional scaling. This procedure yields trajectories in state space, or multi-dimensional ‘number lines’. We observe that number lines in the data and in the model share three properties: they are compressed, with a diminishing portion of the curve devoted to increasing numbers; they are bounded, with each trajectory converging slowly to an attractor state; and they are sinuous, showing oscillations particularly for high numbers. Such oscillations are known to arise in the convergence of the von Mises iteration when the two dominant eigenvalues of the matrix are complex conjugates [31].

Figure 2*e,f* also shows that the Weber–Fechner Law scales up to numbers 1–30: the discriminability of number states, though higher in the model than in the data, is in each case to a good approximation proportional to the log of the number ratio (D1). Thus, our random-matrix algorithm captures the previously reported average and large-scale properties of the number sense: the average unit represents a log-Gaussian tuned number neuron, and discriminability is determined by log ratio.

We now describe how this minimal model accounts for additional behavioural data from adult monkeys, human neonates and children of different ages*.*

To capture ordinal knowledge in trained monkeys [12], we trained linear classifiers (Support Vector Machines, SVMs) to identify the larger number among all 12 possible distinct pairs of vector states between 1 and 4, and then tested them on the 72 number pairs between 1 and 9, thus evaluating their capacity to generalize to numbers outside of the original training range (see electronic supplementary material, Note S5). Figure 3*a* shows simulated accuracies averaged over all classifiers, for each possible number pair. Matching the empirically observed performance of monkeys (figure 3*b*), the classifiers' performance exhibits a distance effect and is above chance in comparing pairs of numbers that both fall outside of the training range 1–4 (novel/novel pairs, property D4). It may seem counterintuitive that an arbitrary vector, iteratively updated by a random matrix, produces a non-random sequence of states that contains generalizable information about numerical order. However, the presence of a spontaneous order in vector space arises naturally from the slow convergence of the ‘von Mises’ algorithm to the first eigenvector (i.e. with largest eigenvalue) of matrix *M*, and is attested by multi-dimensional scaling (figure 2*g,h*).

Infant recovery-from-adaptation paradigms [5] can be simulated by training a one-class support vector machine on noisy vector states obtained from different runs of the same model for the same numerosity (see electronic supplementary material, Note S6). When tested with a new vector state that either matches or differs from the familiar numerosity, emulating the procedure in [5], the classifier rejects novel numerosities more than habitual ones, and this effect reproduces infant looking times (figure 3*c*). Critically, this response to numerical novelty is larger when the habituation and deviant numerosities differ by a 3 : 1 ratio than when they differ by a 2 : 1 ratio, consistent with Weber's Law and in agreement with empirical data in neonates (figure 3*d*, D1, D5).

Using the same logic, we simulated the improved number discrimination that accompanies human development and education (D6) [14,15] (see electronic supplementary material, Note S7)*.* According to our theory, vector representations for numbers are innate and stable. Why, then, does the behavioural Weber fraction decrease with age and education? We attribute this evolution to a progressive refinement in the way numerical information is extracted from the neural population activity. This postulate fits with direct electrophysiological evidence that parietal number neurons are already present in untrained monkeys, in proportions that remain unchanged by training, and that only the proportion of prefrontal number neurons is enhanced when monkeys are trained in a numerical match-to-sample task [32]. We therefore simulated the development of numerical comparison abilities as the training of a classifier to decide which of two numbers is larger. When performance is assessed early on during training, classifiers initially display poor discrimination, with performance varying as a shallow function of the log ratio of the numbers involved, mimicking young children's data. With additional training, discrimination is increasingly better fitted by a steep sigmoidal, reflecting sharper number judgements and a shift in the apparent Weber fraction (figure 3*e,f*, D6).

We note that, if the same population of neurons also contained overlapping codes for non-numerical parameters such as physical size [33–35], our hypothesis that development consists in a progressive focusing of decision-making on the relevant numerical dimension could provide a natural explanation for why children initially confuse number with other physical dimensions (e.g. Piaget's classical ‘number-conversation errors’), and why such non-numerical interference decreases in the course of development [36]. We did not explicitly simulate this aspect of our theory here, however, because this would require additional assumption about the coding of those non-numerical dimensions.

### (b) The extended number sense model

The above model is deliberately abstract and lives in a space of dimension 1. In a second implementation of our theory, we aimed to study whether the same properties would hold in a more realistic model of neuronal dynamics, including a network of excitatory and inhibitory units embedded within a cortical plane, with sparse and local connectivity (figure 4). This extended model follows Dale's principle [37]: units can either be excitatory or inhibitory, but not both (see [38] for a similar application of Dale's principle to random networks). As observed in the number system, excitatory units in the extended model outnumber inhibitory units by 4 to 1 [2]. The absolute value for each entry *M _{ij}* is still given by a Gaussian random variable whose amplitude drops exponentially with the distance between units. We also introduce a more realistic sparsity constraint on all network connections. We now show that this model retains all the properties of the minimal model, and also accounts for several additional phenomena.

Figure 5 assesses the behaviour of the extended model on the same properties as figure 2. It shows that number states in the extended model continue to display the same properties as those of number neurons. Specifically, the tuning curves overlap and exhibit skews that increase with numerosity (D2, figure 5*a*), more units are selective for the first and last number in the tested range (D3, figure 5*c,e*), number states still appear to converge exponentially to an attractor (figure 5*g*) and the Weber–Fechner Law is conserved (D1, figure 5*i*). In addition, the distinction that we now make between excitatory and inhibitory units captures more detailed properties of number neurons. Figure 5*k* shows that model responses are similar for nearby pairs of excitatory units, but complementary for pairs of excitatory and inhibitory units (see electronic supplementary material, Note S4). This interaction closely mirrors the empirical relationship of tuning curves for adjacent number neurons in the PFC (figure 5*l*), as reported for broad and narrow spiking cells (corresponding to putative excitatory and inhibitory cells) recorded from the same electrode [2,39]. This finding suggests that even detailed properties of the micro-circuitry of the approximate number system can be captured, in first approximation, by our simple model.

Figure 6 likewise evaluates the extended model on the same number sense properties as figure 3, and shows that it can still account for properties D4 (figure 6*a*), D5 (figure 6*c*) and D6 (figure 6*e*). Importantly, our two-dimensional (2D) model now enables the investigation of mesoscale aspects of the number sense as detected by high-field imaging in human adults. We thus ask whether the kind of spontaneous ordering of number states shown by the model could account for the emergence of ‘numerotopy’ [16], i.e. the similar location of voxels selective for similar numerosities (D7). When the initial vector *S*_{0} is clustered on the left side of the grid, the preferred number diffuses locally from that initial cluster (figure 6*e*), mirroring the medial-to-lateral gradient of selectivity observed using functional magnetic resonance imaging in human subjects (figure 6*f*). On average, the location of a unit along the medial-to-lateral axis correlates with its number preference (D7, *r*_{Pearson} = 0.95, *p*_{Pearson}0.001; excluding initial state selective units: *r*_{Pearson} = 0.86, *p*_{Pearson}0.001), as observed experimentally (figure 6*g,h*). According to our model, however, such a medial-to-lateral direction is not universal but contingent upon the location of the units that are activated in the initial state *S*_{0}: different locations would produce different gradients, depending on the diffusion opportunities offered by the source. In fact, an initial vector *S*_{0}, containing multiple randomly placed clusters of units, results in multiple coexisting gradients that run in opposite directions on different parts of the simulated cortex, as is also manifest in the data [16] (figure 6*i,j*). Similarly, our simulations show that the convexity or concavity of the gradient is not universal, but depends on the spread of the initial state cluster. Overall, the diffusion process by which we explain numerotopy is not unrelated to the self-organizing Turing reaction–diffusion model [40], although in our case, the large-scale topography of number regions only obtains after smoothing of the selectivity map by a Gaussian filter. We thus predict that higher-resolution imaging should uncover a more intertwined topography of number preferences (see electronic supplementary material, Note S8). The fact that voxel-level properties such as numerotopy can be accounted for by exactly the same model as single-cell effects would also suggest that the approximate number system is self-similar across scales.

### (c) Drawing quantitative predictions from the theory: two critical tests

We now offer two quantitative predictions that set our theory distinctly apart from other alternatives. First, our theory predicts that individual units can respond to more than one numerosity (figure 2*e*), suggesting that the log-normal model more often captures an average firing rate profile over many neurons with the same preferred numerosity. To test this prediction and quantify the proportion of ‘multi-peak’ tuning curves in real data, we conducted a *post hoc* analysis on all neurons previously identified as number selective in the sample period of [7] (see electronic supplementary material, Note S3). Given a neuron with preferred numerosity *n*, the general logic of this analysis was to grant another peak at numerosity *n′* whenever the firing rate of the cell did not differ significantly in response to *n* and *n′*, but was significantly reduced for numerosities between *n* and *n′*.

We discovered that 10.6% of number neurons exhibit unambiguous tuning to multiple numbers, against 4.6% units in the minimal model and 3.4% in the extended model (see examples in figure 7). While the proportion of multi-peak neurons is smaller in the model than in data, increasing the noise in number states at each iteration increases the proportion of multi-peak neurons in both our models. Importantly, units with multiple number preferences are a generic behaviour of our random-matrix theory: they appear even in the absence of noise, as a counterpart at the unit level of the oscillatory behaviour previously reported at the population level. The existence of multi-peak neurons is an original prediction of our model, and the fact that actual number neurons conform to this prediction thus provides strong support for the random-matrix theory.

Our second quantitative prediction concerns the fundamental assumption of our theory, that of a random band matrix which implements a successor function. Because this matrix links number codes together, it is legitimate to ask whether its dimension *n* and bandwidth *w* would impact on number sense. In fact, a long-standing conjecture [41] in random-matrix theory states that for random band matrices whose underlying space is of dimension 1, a critical bandwidth *w** exists that marks a sharp phase transition between two regimes: a strongly disordered, ‘insulator’ regime for bands smaller than , where eigenvectors are localized in the sense that most of the norm of the vector is carried by a few elements, and a weakly disordered, ‘metallic’ regime for bands larger than , where eigenvectors are delocalized (i.e. more distributed). Though there is no exact counterpart of this conjecture for the more biologically motivated matrix used in the extended model, some results for dimension 2 random band matrices point to a critical bandwidth that would scale like [42]. We therefore examined whether the models changed behaviour when *w* approaches the critical bandwidth.

Figure 8 shows how Weber–Fechner scores vary as a function of *n* and *w* in the minimal and extended models. Both models exhibit a smooth landscape of Weber–Fechner scores. In the minimal model, global maxima (white crosses) are situated strikingly close to the conjectured critical bandwidths of , (figure 8*a*, dotted curve). In other words, whatever the dimension of the minimal model, the critical bandwidth conjectured in random-matrix theory is also precisely where Weber–Fechner's Law holds with the greatest precision. In the extended model, there is a sharp transition from high to low scores after a critical bandwidth that also scales approximately like the square root of *n*. Note that in the extended model, the precise shape of the relation between critical *w* and *n* depends exquisitely on the structure of the initial state (see electronic supplementary material, Note S11), though a common feature is that of critical bandwidths increasing monotonically with *n*. Thus, in our theory, the Weber–Fechner Law appears to be linked to conjectures on phase transitions in random band matrices.

This prediction sets our theory distinctly apart from those which hold that number representations do not exist in the abstract, but are transient constructs computed from combinations of low-level stimulus properties (e.g. in the case of visual dot displays, total surface area or density [18]): according to these views, one would not expect any simple relation to hold between the composite representations obtained for all consecutive numbers, let alone any link to criticality phenomena in random band matrices. Testing this prediction would require estimation of the properties of matrix *M*, for instance, by computing the cross-correlation matrix between a significant fraction of number neurons as a function of their cortical distance, for different subjects, trials and numbers. High-density 2D arrays of multi-electrodes could be sufficient for the task, as they are known to provide sufficient resolution to distinguish between inhibitory and excitatory cells [26], which would be necessary to test the predictions of the extended model.

## 3. Discussion

We have shown how a simple theory, based on random matrices, reproduces several important aspects of number sense, including some for which learning cannot be assumed. The theory conserves its explanatory power across a range of parameters and implementation choices (see electronic supplementary material, Note S10), but its key ingredients lie in the assumptions that (i) each number is coded by a sparse, rectified and normalized vector; and (ii) the vectors for consecutive numbers are iteratively linked through multiplication by a fixed random band matrix *M*. Number states in our theory are therefore originally sequential: activating *S _{n}* requires running through all the sequence of number states from 0 to

*n*. Not only does this theory provide a better account of number neurons than our previous log-Gaussian approach, particularly in predicting and accounting for the new discovery of multi-peak number neurons, but it also endows number codes with a ‘matrix’ successor function (+1 is implemented by one application of

*M*). In the present view, distinct numbers are not simply indexed by distinct populations of number neurons, but these neurons actually reflect a vector-based system of interrelated symbols linked by a fixed successor operation [43]. Indeed, other basic arithmetic operations such as +2 or −1 could be implemented by simple matrix operations (

*M*

^{2},

*M*

^{−1}). Our theory therefore offers a first glimpse of how uneducated adults, monkeys and even young infants, without training, could be endowed with a sensitivity for non-symbolic arithmetic and its violations [44–46]

*.*

### (a) Numerosity zero and the initial state

Our theory builds on an initial activation pattern, *S*_{0}, whose successors represent the numerosities one, two, etc. A straightforward interpretation of *S*_{0} is therefore that it represents numerosity zero. As previously noted, zero-selective units are consistent with several electrophysiological studies that have reported cells tuned to numerosity zero (i.e. the absence of objects) in the monkey brain ([9–11], for review, see [47]). In these reports, as in our theory, numerosity zero appears to be seamlessly integrated into the approximate number system, for instance, allowing monkeys and young children to infer that zero (absence of objects) is a smaller numerosity than any other set size [48,49]. However, the present model does not fully cover all of the stages of the cultural emergence of a concept of zero [47], which at the highest level involves assigning zero a precise symbolic role in the system of arithmetic (place-holder in Arabic notation, neutral element for addition, etc.).

### (b) The successor function

Our neural implementation of an analogue of Peano's successor function, a fundamental requisite of arithmetic, should not be confused with the successor function familiar to developmental psychologists, i.e. the laborious acquisition, during childhood, of a system of exact number symbols 1, 2, 3 … linked through counting. While our model accounts for how human and non-human primates, as well as other species, can be innately endowed with approximate number vectors, much remains to be understood about how these vectors are linked, on the perceptual side, to sets of objects, and on the abstract side, to symbols such as Arabic numerals or verbal number words. In fact, because the number states in our theory have exponentially vanishing distances, it is doubtful that exactly the same mechanism could be recycled for exact numbers, where the distance between consecutive numbers would presumably need to remain constant.

### (c) Levels of description

Although our theory for the origins of number vectors includes realistic properties such as Dale's principle, sparsity of neural activity patterns and sparse and distance-dependent connectivity, it describes brain activity at an abstract mathematical level similar in spirit to the classical Hopfield model for neural networks [50]. This abstract nature need not be a disadvantage, for it also allows knowledge acquired in the mathematical field of random matrices to shed light onto the neuroscience of number. Specifically, our simulations suggest that the Weber–Fechner Law for numbers arises from an eigenvector algorithm that runs on a population of neurons with random synaptic strengths, and a propensity to connect with neighbours that is set at criticality (figure 8; electronic supplementary material, Note S11).

Among the many steps that one might take towards more realistic models, using spiking units and differentiating the firing rate levels and connectivity ranges for excitatory and inhibitory units would no doubt help capture the micro-circuitry of number neurons. Most importantly, the present proposal leaves unspecified the nature of the gating mechanism that would implement the required step-like application of matrix *M* each time one moves from numerosity *i* to *i* *+* 1. This assumption sets our theory apart from otherwise similar proposals of ‘liquid computing’ or ‘echo-state networks’ where an internal vector representation is allowed to evolve continuously under the continuous application of a fixed and possibly random dynamics [51–53], and which capture several aspects of the neural representation of time [54].

### (d) Sequentiality and subitizing

One challenge facing our theory is to reconcile its inherent sequentiality with the absence of observed delays in neural firing patterns for different numerosities [4,11]. By analogy with the dynamic waves of spontaneous retinal activity that ultimately generate static retinotopic visual maps [55], we suggest that the postulated matrix iterations may occur dynamically during gestation, shaping number sense by generating ordered vector states that are only later being associated in parallel with external stimuli. Given the noise present at each iteration as well as the exponential nature of our theory, vectors for small numerosities are inherently more stable and more distant from each other: the resulting parallel associations for small numbers could thus possibly explain subitizing, one of the few number sense phenomena not addressed in this article. A more detailed description of the way such parallel associations are built would require specifying how our abstract models interact with the computational machinery that processes actual stimuli in the visual and auditory pathways. While the organization of such perceptual processing has been the main focus of previous theoretical models of number sense [17–20], the main virtue of the present theory is to address the problem from the opposite direction: we demonstrate how structured neural codes for number can emerge from sheer randomness through an endogenous mechanism that is minimal enough to be specified genetically. Such a spontaneous emergence, without training or specific sensory interactions, fits with the presence of number neurons in untrained animals [3,4], of parietal-lobe responses to number in human infants [56] and of normal parietal brain networks for number sense even in congenitally blind subjects [57].

### (e) Criticality and the number sense

One might wonder why, in the minimal model, the random band matrix that exhibits the most canonical number sense has a bandwidth set at criticality. We speculate that there may be an advantage in using the smallest possible bandwidth that still results in distributed eigenvectors. Small bandwidths mean fewer synapses to maintain, while distributed eigenvectors imply more distributed number states, which may, for instance, improve generalization performance for learning systems lying downstream. These two opposite pressures could conspire to bring the system near criticality.

### (f) Predictions

Our theory makes several novel predictions. The first, which we verified here for the first time, to our knowledge, through a reanalysis of existing data, is that individual number neurons can exhibit multiple peaks of numerosity preference (figure 7). It mirrors the recent finding of multi-peak time-coding cells in the hippocampus [58], and lends support to the notion that similar iterative processes could be at work in the cognition of time and space [54]. A second prediction is that the cross-correlogram between number neurons should be stable across numbers, and in normal subjects should assume the form of a random band matrix with bandwidth *w* set close to criticality, and away from criticality in subjects with an impaired number sense (figure 8). Third, numerotopy should only be an average property, and imaging with higher fields should reveal distributed number tuning in parietal neurons, as well as complex multi-peak neurons. Fourth, number neurons and numerotopy should already be present in the newborn brain. Given the presence of number sense in many species, our theory hints that the nervous system may have been harnessing the properties of random matrices as a self-organizing representational system for millions of years.

## 4. Methods

Our random-matrix theory of the number sense relies on an iterative process that constructs a sequence of number states of dimension *n*, using an (*n* × *n*) random band matrix *M* as a successor function, and a clustered initial state.

### (a) The minimal model

In the minimal model, each unit *i* = 1, … , *n* is disposed regularly on a line, and the distance between two units is defined as their absolute distance. Matrix *M* constitutes the adjacency matrix of the network defined on this line: each entry *M _{ij}* is a Gaussian random variable, rescaled so as to decrease exponentially with the distance between units

*i*and

*j*: 4.1where

The minimal model has very little structure: in the absence of locality (*α* = 0), we recover a standard Gaussian matrix.

Starting from a deliberately simple initial state *S*_{0}, in which activation is confined to the leftmost part of the line, each new state is obtained by multiplying the current one by *M*, adding a small standard Gaussian noise *ɛ* with amplitude , rectifying, and normalizing:
4.2Where stands for rectification above zero.

Hence, the minimal model has three main parameters: ** n**,

**, . In addition, the initial state requires another parameter**

*α*

*r*_{0}: before normalizing, all units in the initial state are set to 0, except for the leftmost

*r*_{0}

**units which are set to 1. Table 1 gives an exhaustive list of parameters in the minimal model.**

*n*### (b) The extended model

In the extended model, each unit *i* = 1, … , *n* is embedded in a square grid, with integral coordinates (*x _{i}*,

*y*) given by and

_{i}*.*The distance between two units is defined as the L2 norm. The matrix

*M*now constitutes the adjacency matrix of the network defined on this grid. Each entry

*M*is a half-normal random variable, rescaled so as to decrease exponentially with the distance between units

_{ij}*i*and

*j*, and multiplied by a Bernoulli variable:

where

Parameter *p* controls the proportion of inhibitory units, while parameters *ρ* and *α*, respectively, control the density and locality of the system. Notice that inhibitory weights are rescaled so as to ensure that the adjacency matrix always has mean 0, despite the imbalance between excitatory and inhibitory connections (when *p*≠0.5). The dynamics of the extended model are unchanged, and new model states are still iteratively obtained by application of equation (4.2).

The initial state in the extended model is still clustered and stable across trials, but the activation pattern on initially active units is now modelled as a Gaussian 2D bump of activation whose centre varies uniformly across subjects between on the *x*-axis, and [0, *n*] on the *y*-axis. The extended model therefore has six parameters in total: , , , , and , listed in table 2.

### (c) General simulation procedure

In all of our simulations, we distinguish between different subjects (captured by model runs with different initializations of the random matrix *M* and different initial states), different trials (model runs with the same matrix and same initial state) and numerosities (model runs with the same *M* and the same initial state, for a number of iterations). The exact numbers of subjects, trials and numerosities are adapted to the experiment under simulation. Unless otherwise specified, all simulations were run with the default values given in tables 1 and 2.

### (d) Code availability

The full python code used in the simulations is available upon request to the first author, and will be made publicly available on the Unicog website (http://www.unicog.org/biblio/) after publication of the article.

## Data accessibility

The methods and tools used for this article are described in detail in the electronic supplementary material.

## Authors' contributions

T.H. and S.D. conceived the theory; A.N. acquired the electrophysiological data; T.H., A.N., P.V. and S.D. analysed the data; T.H., A.N., P.V. and S.D. wrote and revised the manuscript.

## Competing interests

We declare we have no competing interests.

## Funding

This work was supported by INSERM, CEA, and the Bettencourt-Schueller Foundation. T.H. was supported by the European Commission Seventh Programme (FP7/2007-2013) under grant agreement 604102 (Human Brain Project).

## Acknowledgements

We gratefully acknowledge V. Izard, J. Touboul, E. Eger and G. Dubach for helpful discussions and V. Izard, E. Brannon and B. Harvey for data sharing.

## Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3935788.

One contribution of 19 to a discussion meeting issue ‘The origins of numerical abilities’.

- Accepted October 23, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.