## Abstract

Inferring resting-state connectivity patterns from functional magnetic resonance imaging (fMRI) data is a challenging task for any analytical technique. In this paper, we review a probabilistic independent component analysis (PICA) approach, optimized for the analysis of fMRI data, and discuss the role which this exploratory technique can take in scientific investigations into the structure of these effects. We apply PICA to fMRI data acquired at rest, in order to characterize the spatio-temporal structure of such data, and demonstrate that this is an effective and robust tool for the identification of low-frequency resting-state patterns from data acquired at various different spatial and temporal resolutions. We show that these networks exhibit high spatial consistency across subjects and closely resemble discrete cortical functional networks such as visual cortical areas or sensory–motor cortex.

## 1. Introduction

Functional magnetic resonance imaging (fMRI) has become an important neuroscientific tool for probing neural mechanisms in the human brain. Typical fMRI experiments have focused on the acquisition of T2*-sensitive MR images during periods of increased oxygen consumption (owing to neuronal response to externally controlled experimental conditions) and contrast the measured image intensities with recordings obtained at ‘rest’. Critically, some important quantitative concepts in fMRI analysis, such as the calculation of per cent signal change or the interpretation of de-activation, implicitly hinge on a suitable definition of this baseline/rest signal. The baseline ‘resting-state’ of the brain itself, however, is a somewhat ill-defined and poorly understood concept.

Of particular interest in this context are certain low-frequency fluctuations of the measured cerebral haemodynamics (around 0.01–0.1 Hz) which exhibit complex spatial structure reminiscent of fMRI ‘activation maps’ and which can be identified in fMRI data taken both under rest condition and under external stimulation. Recently, some attention has been focused on the characterization of these maps and the identification of possible origins of slow variations in the measured blood-oxygen level dependent (BOLD) signal. Various researchers have suggested that these signal variations, temporally correlated across the brain, are of neuronal origin and correspond to functional resting-state networks (RSNs) which jointly characterize the neuronal baseline activity of the human brain in the absence of deliberate and/or externally stimulated neuronal activity, and may reflect functionally distinct networks.

Biswal *et al*. (1995) first demonstrated the feasibility of using fMRI to detect such spatially distributed networks within primary motor cortex during resting-state by calculating temporal correlations across the brain with the time-course from a seed voxel whose spatial location was chosen from a prior finger-tapping study. The temporal signal from a seed voxel in the motor cortex was correlated with other motor cortex voxels and uncorrelated with other voxels, with major frequency peaks in the resting correlations at around 0.02 Hz. Lowe *et al*. (1998) found similar results using both single-slice dates with low time of repetition (TR) of 130 ms and whole-head volumes with longer TR (2000 ms), while Xiong *et al*. (1999) describe functional connectivity maps that cover additional non-motor areas. Based also on findings from positron emission tomography (PET) studies, the existence of a *default mode brain network* involving several regions including the posterior cingulate cortex has been proposed (Shulman *et al*. 1997; Mazoyer *et al*. 2001; Raichle *et al*. 2001). Using simultaneously acquired electroencephalogram (EEG) and fMRI data under rest, Goldman *et al*. (2002) have shown that the variation in alpha rhythm in EEG (8–12 Hz) is correlated with the fMRI measurements. In particular, the authors report that increased alpha power was correlated with decreased BOLD signal in multiple regions of occipital, superior temporal, inferior frontal and cingulate cortex, and with increased signal in the thalamus and insula. These results have important implications for interpretation of RSNs, as they suggest a neuronal cause for these fluctuations.

Alternatively, it has been argued that these effects simply reflect vascular processes unrelated to neuronal function, which would make RSNs of less interest to neuroscience (although still of potential clinical interest). Physiological noise in the resting brain and its echo-time (TE) and field-strength dependencies were investigated by Krüger & Glover (2001), who showed that physiological noise demonstrates a field-strength dependency, exceeds the thermal as well as scanner noise at 3T and is increased in grey matter (e.g. Woolrich *et al*. 2001). Various researchers have investigated the relation between low-frequency fluctuations in the measured BOLD signal and other physiological observations. Obrig *et al*. (2000) reviewed and studied low-frequency variations in oxygenation, cerebral blood flow and metabolism, and report significant correlations with similar fluctuations observed by near infrared spectroscopy (NIRS). More recently, Wise *et al*. (2004) have investigated the influence of arterial carbon dioxide fluctuations by using the endtidal level of exhaled carbon dioxide as covariate of interest in a general linear model (GLM) analysis. The most significant changes were concentrated in the occipital, parietal and temporal lobes, as well as in the cingulate cortex, and suggest that vascular processes (unrelated to neuronal function) play a significant role in the generation of such resting-state patterns.

Estimating the temporal and spatial characteristics of these low-frequency fluctuations from fMRI data presents a formidable challenge to analytical techniques. In the majority of existing studies, resting patterns are inferred by a correlation analysis of the voxel-wise fMRI recordings against a reference time-course obtained from secondary recordings (e.g. from EEG, NIRS or physiologic measurements such as the carbon dioxide concentration), or simply by regressing against a single voxel's time-course from resting data, which is believed to be of functional relevance (seed-voxel-based correlation analysis). These techniques fundamentally test very specific hypotheses about the temporal structure of these effects. Recently, however, independent component analysis (ICA) has successfully been applied to the estimation of certain low-frequency patterns (Goldman & Cohen 2003; Kiviniemi *et al*. 2003; Greicius *et al*. 2004). An important benefit of such exploratory techniques over more hypothesis-based techniques is the ability to identify various types of signal fluctuations by virtue of their spatial and/or temporal characteristics without the need to specify an explicit temporal model. Such flexibility in data modelling is essential in cases where the effects of interest are not well understood and cannot be predicted accurately.

This paper is organized as follows: in §2 we review a probabilistic approach to independent component analysis (PICA) specifically optimized for the analysis of fMRI data (Beckmann & Smith 2004). Section 3 discusses the constraints of this exploratory data analysis technique when used for the identification of large-scale noise fluctuations. In particular, we demonstrate that optimization for maximally independent spatial sources does not imply an inability to estimate largely overlapping spatial maps. We demonstrate the ability of PICA to extract resting fluctuations and apply the technique to fMRI resting data in order to test a set of important hypotheses about the structure of resting-state connectivity in the human brain. In particular, we will investigate (i) if and how estimated source processes are driven by less-interesting physiological effects such as the cardiac or respiratory cycle, (ii) the spatial characteristics of estimated maps in terms of locality within grey matter and (iii) the consistency of maps obtained from multiple subjects.

## 2. Decomposing fMRI data using ICA

Independent component analysis (Comon 1994; Bell & Sejnowski 1995; McKeown *et al*. 1998) is a technique which decomposes a two-dimensional (time×voxels) data matrix1 into a set of time-courses and associated spatial maps, which jointly describe the temporal and spatial characteristics of underlying hidden signals (components). A probabilistic ICA model extends this by assuming that the *p*-dimensional vectors of observations (time-series in the case of fMRI data) are generated from a set of *q*(<*p*) statistically independent non-Gaussian sources (spatial maps) via a linear and instantaneous ‘mixing’ process corrupted by additive Gaussian noise, * η*(

*t*)(2.1)Here,

**x**_{i}denotes the individual measurements2 at voxel location

*i*,

**s**_{i}denotes the non-Gaussian source signals contained in the data and

**η**_{i}denotes Gaussian noise3 .

The *p*×*q* dimensional mixing matrix * A* is assumed to be non-degenerate, i.e. of rank

*q*. Solving the blind separation problem requires finding a linear ‘unmixing’ matrix

*of dimension*

**W***q*×

*p*such thatis a good approximation to the true source signals

*.*

**s**The PICA model is similar to the standard GLM with the difference that, unlike the design matrix in the GLM, the mixing matrix * A* is no longer pre-specified prior to model fitting but will be estimated from the data. The spatial source signals correspond to parameter estimate images in the GLM, with the additional constraint of being statistically independent of each other.

### (a) Parameter estimation

Without loss of generality we can assume that the source signals have unit variance. If the noise covariance **Σ**_{i} is known, we can pre-whiten the data and obtain a new representation , where , i.e. where the noise covariance is isotropic at every voxel location. To simplify notation, we will henceforth assume isotropic noise and drop the additional bar.

Noise and signal are uncorrelated, so the data covariance matrix , i.e. the unknown mixing matrix * A* can be estimated as the matrix square root of

**R**_{x}−

*σ*

^{2}

*: let*

**I***be a*

**X***p*×

*N*matrix containing all

*N*different fMRI time-series in its columns and let be the singular value decomposition of

*. Then(2.2)where*

**X**

**U**_{q}and

**Λ**

_{q}contain the first

*q*eigenvectors and eigenvalues. The matrix

*denotes a*

**Q***q*×

*q*orthogonal rotation matrix, i.e. a matrix with

**QQ**

^{t}=

*. This matrix is not directly identifiable from the data covariance matrix since*

**I**

**R**_{x}is invariant under post-multiplication of

*by any orthogonal rotation given that .*

**A**Estimating the mixing matrix * A*, however, reduces to identifying the square matrix

*after whitening the data with respect to the noise covariance*

**Q****Σ**

_{i}and projecting the temporally whitened observations onto the space spanned by the

*q*eigenvectors of

**R**_{x}with largest eigenvalues. The maximum likelihood estimates of sources and

*σ*are obtained using generalized least-squares(2.3)

Solving the model in the case of an *unknown* noise covariance can be achieved by iterating estimates of the mixing matrix and the sources and re-estimating the noise covariances from the residuals . The form of **Σ**_{i} typically is constrained by a suitable parametrization; here we use the common approaches to fMRI noise modelling (Bullmore *et al*. 1996; Woolrich *et al*. 2001), and restrict the structure to autoregressive noise. However, since the exploratory approach allows modelling of various sources of variability, for example, temporally consistent physiological noise, as part of the signal in equation (2.1), the noise model itself can actually be quite simplistic.

A consequence of the isotropic noise model is that as an initial pre-processing step we will modify the original data time-courses to be normalized to zero-mean and unit variance. This pre-conditions the data under the null hypothesis of no signal: the data matrix * X* is identical (up to second order statistics) to a simple set of realizations from a (0,

*) noise process. Any signal will have to reveal itself via its deviation from Gaussianity.*

**I**The maximum likelihood estimators depend on knowledge of the number of underlying sources *q*. In the noise-free case, this quantity can easily be deduced from the rank of the covariance of the observations **R**_{x}, which is of rank *q*. In the presence of isotropic noise, however, the covariance matrix will be of full rank where the additional noise has the effect of raising the eigenvalues of the covariance matrix by *σ*^{2} (Roberts & Everson 2001). Inferring the number of estimable source processes amounts to testing for sphericity of eigenspaces beyond a given threshold level (Beckmann & Smith 2004). Simplistic criteria like the reconstruction error or predictive likelihood will naturally predict that the accuracy steadily increases with increased dimensionality. Thus, criteria, such as retaining 99.9% of the variability, result in arbitrary threshold levels (Beckmann *et al*. 2001). This problem is intensified by the fact that the data covariance **R**_{x} is being estimated by the sample covariance matrix. In the absence of any source signals, the eigenspectrum of this sample covariance matrix is not identical to *σ*^{2}**I**_{p}, but instead distributed skewed around the true noise covariance: the eigenspectrum will depict an apparent difference in the significance of individual directions within the noise (Everson & Roberts 2000), even in the absence of signal. In the case of Gaussian noise, however, this ‘skew’ of the eigenspectrum is of analytic form: the eigenvalues have a Wishart distribution and we can adjust the observed eigenspectrum by the quantiles of the predicted cumulative distribution of eigenvalues from Gaussian noise (Johnstone 2000), prior to estimating the model order. If we assume that the source distributions *p*(* s*) are Gaussian, the model then reduces to probabilistic principal component analysis (PCA) (Tipping & Bishop 1999) and we can use Bayesian model selection criteria. Within the PICA approach, we use the Laplace approximation to the posterior distribution of the model evidence that can be calculated efficiently from the adjusted eigenspectrum (Minka 2000; Beckmann & Smith 2004).

In order to complete the estimation of the mixing matrix and the sources, we need to optimize an orthogonal rotation matrix * Q* in the space of whitened observations:(2.4)where denotes the spatially whitened data.

Hyvärinen & Oja (1997) have presented an elegant fixed-point algorithm that uses approximations to negentropy in order to optimize for non-Gaussian source distributions, and gives a clear account of the relation between this approach to statistical independence. In brief, the individual sources are obtained by projecting the data * x* onto the individual rows of

*, i.e. the*

**Q***r*th source is estimated aswhere denotes the

*r*th row of

*. In order to optimize for non-Gaussian source estimates, Hyvärinen & Oja (1997) propose the following contrast function:(2.5)where*

**Q***ν*denotes a standardized Gaussian variable, denotes the expectation and

*F*is a general non-quadratic function that combines the high-order moments of

**s**_{r}in order to estimate the amount of non-Gaussianity in the individual sources.

From equation (2.5), the vectors are optimized to maximize using an approximative Newton method (Hyvärinen & Oja 1997).

### (b) Inference

After estimating the mixing matrix , the source estimates are calculated by projecting each voxel's time-course onto the time-courses contained in the columns of the unmixing matrix . In the case where the model order *q* was estimated correctly, the estimated noise is a linear projection of the true noise and is unconfounded by residual signal. At every voxel location we have pre-conditioned the data such that **x**_{i} has unit standard deviation and the estimate of the noise variance at each voxel location will approximately equal the true variance of the noise. We can thus convert the individual spatial IC maps **s**_{r.} into ‘*Z*-statistic maps’ **z**_{r.}, by dividing the raw IC maps by the standard error of the residual noise.

In order to assess the *Z*-maps for ‘significantly activated’ voxels, we employ mixture modelling of the probability density of the *Z*-statistic spatial maps.

From equation (2.3), it follows that , i.e. the noise term in equation (2.1) manifests itself as additive Gaussian noise in the estimated sources. We therefore model the distribution of the spatial intensity values of each *Z*-map by a mixture of one Gaussian and two Gamma distributions, to model background noise and positive and negative BOLD effects (Hartvig & Jensen 2000; Beckmann *et al*. 2003). The mixture is fitted using an expectation-maximization algorithm (Dempster *et al*. 1977). In cases where the number of ‘active’ voxels is very small, the relative proportions of the Gamma densities in the overall mixture distribution might be estimated as zero. In this case, a simple transformation to spatial *Z*-scores and subsequent thresholding is appropriate, i.e. reverting to null-hypothesis testing instead of the otherwise preferable alternative-hypothesis testing. Otherwise, we can evaluate the fitted mixture model to calculate the posterior probability of ‘activation’ as the ratio of the probability of intensity value under the ‘noise’ Gaussian relative to the sum of probabilities of the value under the ‘activation’ Gamma densities.4

Any threshold level, although arbitrary, directly relates to the loss function we like to associate with the estimation process, e.g. a threshold level of 0.5 places an equal loss on false positives and false negatives (Hartvig & Jensen 2000).

### (c) PICA algorithm overview

The individual steps that constitute the PICA are illustrated in figure 1. The de-meaned original data are first normalized to unit variance at each voxel location. If appropriate spatial information is available, this is encoded in the estimation of the sample covariance matrix **R**_{x}. Individual voxel weights, e.g. grey-matter segmentation, can be used to calculate a weighted covariance matrix, while voxel-pair weightings can be used to calculate the within-group covariance (Beckmann & Smith 2004). Probabilistic PCA is used to infer upon the unknown number of sources and results in an estimate of the noise and a set of spatially whitened observations. We can estimate the noise covariance structure **Σ**_{i} from the residuals in order to voxel-wise (temporally) pre-whiten and re-normalize the data and iterate the entire cycle. Estimation of **Σ**_{i} from residuals in the case of autocorrelated noise can be achieved as described by Woolrich *et al*. (2001). In practice, the results do not suggest a strong dependency on the form of **Σ**_{i}, and preliminary results suggest that it is sufficient to iterate these steps only once. From the spatially whitened observations, the individual component maps are obtained using a modified fixed-point iteration scheme (FastICA; Hyvärinen & Oja 1997) to optimize for non-Gaussian source estimates via maximizing the negentropy. These maps are separately transformed to *Z*-scores. In contrast to raw IC estimates, which only encode the estimated signal, these *Z*-score maps depend on the amount of variability explained by the entire decomposition at each voxel location relative to the residual noise similar to statistical parametric maps from a GLM analysis. This is an important aspect of the probabilistic ICA model, as now these maps also reflect the degree to which the signal explained within this model fits to the data and, unlike standard ICA, no longer ignores the signal variation which remains unaccounted for. Finally, Gaussian/Gamma mixture models are fitted to the individual *Z*-maps in order to infer voxel locations that are significantly modulated by the associated time-course.

## 3. Estimating overlapping maps using ICA

The choice of optimizing for independence between spatial maps could equally well be replaced by optimizing for independence between time-courses. Different authors have argued in favour of one or the other technique, where the main objection appears to revolve around the question of whether orthogonality (i.e. uncorrelatedness) between estimated sources should be enforced in the temporal or spatial domain (Friston 1998; Petersen *et al*. 2000). At a conceptual level, the notion of orthogonality is overly restrictive in either domain: for temporal modes, the existence of stimulus correlated effects (e.g. motion artefacts or higher-order brain function) means that enforced orthogonality necessarily results in a misrepresentation of underlying temporal signals. Similarly, for spatial modes, Friston (1998) has argued that even though different brain function might be spatially localized, the principle of ‘functional integration’ might imply that neuronal processes share a large proportion of cortical anatomy. These arguments suggest that independence and implied orthogonality are always suboptimal for the analysis of data which is as complicated as that obtained from functional MRI experiments.

From a signal detection point of view, however, it is important to consider the extent to which signal ‘appears’ in space or time. Within the temporal domain, signal often spans the entire length of an experiment. If the ‘true’ temporal characteristics of different signals are partially correlated (e.g. stimulus-correlated motion), a decomposition which enforces orthogonality in the temporal domain will necessarily misrepresent at least one of the time-series in order to satisfy the constraint. In the spatial domain, however, ‘signals’ in fMRI are sparse and typically are contained in a small proportion of all voxels. Even for what in fMRI are considered ‘large’ activation clusters or for artefactual sources with large spatial extent (e.g. image ghosts), only a fraction of intracranial voxels are involved.5 In the presence of noise, the majority of voxels in any spatial maps have random ‘background noise’ value and will reduce the observed spatial correlation, such that even when ‘true’ spatial maps are significantly overlapping, a decomposition which enforces orthogonality between estimated spatial maps can still give a relatively accurate representation of the signal.

Formally, consider the case of two source signals *s*_{1} and *s*_{2}, represented as column vectors of length *N*, and (zero-mean) Gaussian noise *η*_{1} and *η*_{2} with variance and . In the presence of noise, the correlation changes fromto

When signals are sparse, Var(*s*_{1}) and Var(*s*_{2}) are small and the denominator of is dominated by the noise variance. The reduction in correlation between the noise-free and noisy case is a function of the signal amplitude modulation, the sparseness and the relative noise level.

As an example, figure 2*a* shows two partially overlapping spatial ‘signals’ each occupying approximately 17% of the total image areas, together with two artificial time-courses. Owing to their partial overlap, these source signals are spatially correlated with a correlation of *ρ*∼0.5. In the absence of noise, these maps cannot be estimated accurately by any technique enforcing orthogonality between estimated spatial maps. In the presence of noise,6 however, the spatial correlation between linear estimates reduces significantly: figure 2*b* shows the spatial maps obtained from performing linear regression of the data against the ‘true’ time-series. The spatial maps obtained from a PCA decomposition (figure 2*c*) have ∼0 spatial correlation, and fail to identify the ‘true’ spatial maps. Also, the temporal characteristics of the signal are not well represented. By comparison, the estimated spatial maps from an ICA decomposition (figure 2*d*) well represent signal in space and time. Although the spatial sources are clearly visible, the spatial correlation between the estimated spatial maps is still ∼0. This is a consequence of the optimization for maximally non-Gaussian source projections. Final thresholded ICA maps derived from a Gaussian/Gamma mixture model on the noisy maps give a reasonably good spatial representation for the original sources: the estimated thresholded maps (figure 2*e*) have large spatial correlation (*ρ*∼0.47).

This example demonstrates that the mathematical constraint of orthogonality within the set of spatial maps does not necessarily imply that large areas of ‘activation’ which overlap significantly between maps can no longer be extracted. Instead, the amount to which this mathematical constraint restricts the estimation of partly overlapping sources is a function of (i) the overall sparseness of signals and (ii) the signal-to-noise ratio. This suggests that, in practice, the constraints induced by optimizing for independence are less restrictive in the spatial domain than the temporal domain. Although compensating for partial correlation of ‘signal’ by anticorrelating ‘noise’ conceptually is also possible in the temporal domain, the significantly lower number of time-points does not typically provide a sufficient number of ‘background’ time-points that could be utilized to ensure orthogonality while not altering ‘interesting’ portions of the estimated source signals. This property is particularly important for investigating RSNs, because it means that functionally distinct systems can overlap anatomically as long as they have sufficiently distinct time-courses.

## 4. Experimental method

In order to characterize the low-frequency structured noise components in ‘resting’ data, we acquired different datasets to address four specific questions:

To compare seed-voxel correlation techniques with PICA, we collected 200 volumes from a single subject under rest and active finger tapping (30 s on/off block design). Data were acquired on a Phillips NT 1.5T MRI system with a notional 2×2×8 mm resolution, a TR of 3 s and a TE of 40 ms.

To evaluate the extent to which neural effects can be distinguished from non-neural physiological effects such as aliased cardiac or respiratory cycles, we collected resting data with a high temporal resolution (B0=3T, TR=120 ms, TE=30 ms, 3.75×3.75 mm in-plane resolution). The data consist of 2160 slices through a single axial plane chosen to intersect the sensory–motor cortices bilaterally. In addition, we collected 60 volumes under a 30 s on/off bilateral finger tapping paradigm at a typical TR of 3 s. All data were acquired on a Varian-Siemens 3T MRI system.

To determine whether low frequency resting fluctuations appear within grey matter or are instead driven by contributions from larger blood vessels, we collected 300 volumes (12 slices) of resting data at 3T with spatial resolution of 2×2×6 mm and TR=3 s.

Finally, to investigate the spatial consistency of resting-state patterns across subjects, data were collected from 10 subjects during rest. For each, 200 volumes of whole head functional data were acquired at 3T with typical fMRI resolution (3×3×3 mm, TR=3.4 s, TE=40 ms). In addition, a high-resolution T1-weighted reference scan (1×1×1.5 mm resolution) was also acquired for the purpose of anatomical localization.

Subjects were lying supine in the MRI scanner and were instructed to keep their eyes closed and not to fall asleep during functional scanning.

### (a) Analysis methods

The individual datasets were pre-processed before running correlation-based or ICA-based statistical analyses using tools from the FMRIB Software Library (www.fmrib.ox.ac.uk/fsl). Time-series were first realigned to correct for small head movements (Jenkinson *et al*. 2002). Then, non-brain (e.g. scalp and CSF) were removed using an automated brain extraction tool (Smith 2002). Finally, the data were spatially smoothed using a Gaussian kernel (5 mm). After statistical analysis (whether correlation- or ICA-based), the resulting statistical maps were thresholded using histogram mixture modelling as described above.

### (b) Seed-voxel-based correlation analysis versus multivariate PICA regression

Activation-seeded correlation analysis is based on the hypothesis that in resting data, the low-frequency temporal fluctuations are correlated in regions which are functionally associated (Biswal *et al*. 1995; Lowe *et al*. 1998). In this approach, an activation dataset (e.g. under a simple motor paradigm) is acquired along with data at rest. The activation data are first analysed to identify areas which activate significantly. The coordinates of the highest activating voxel are then used to define a seed voxel in the resting data. A statistical map is generated by calculating the correlation of all time-courses in the resting data against the time-course of the seed voxel in order to find a temporally consistent resting network. The applicability of this technique, however, is limited by the fact that seed-voxel-based analysis relies on the time-course at the seed-voxel location being a ‘good’ representative for the set of correlated voxels under rest. As a consequence, the seed-voxel-based approach is restricted to cases where seed areas can be inferred accurately and robustly from activation studies (like motor area). Furthermore, the choice of the seed voxel is rather arbitrary (as, indeed, is the exact location of a peak *Z*-stat) and can be biased by different types of fMRI noise. In particular, the usefulness of such a correlation analysis is severely limited in cases where the reference time-course itself is a mixture of time-courses, e.g. different low-frequency fluctuations, spatially structured high-frequency signals such as that induced by the *N*/2 ghost, head motion, and so on. These problems are analogous to those that characterize the difference between simple correlation analysis and the GLM for model-based fMRI analysis: a multiple regression model can account for temporal effects which coincide at a single voxel location.

Figure 3 illustrates the difference between seed-voxel-based correlation analysis and PICA using data from a simple finger-tapping experiment and data acquired under rest (i.e. the first dataset). Model-based analysis of the activation data produced plausible motor cortex activation in the contra-lateral hemisphere (figure 3*a*), although in this case, the peak *Z*-score was located in the post-central gyrus rather than in the motor cortex. Using this voxel as the seed time-course in a subsequent correlation-based analysis of the resting data (i.e. excluding the motor task blocks) shows significant correlation in similar areas, such as the motor cortex bilaterally and the supplemental motor area (SMA) along the midline (figure 3*b*). On the other hand, medial posterior cortical areas and frontal parts also show significant correlation, despite not being identified as parts of the motor system engaged in the finger tapping contrast (figure 3*a*). Based on the Laplace approximation of the Bayesian model evidence, the PICA approach estimates 40 components, including various artefacts such as Nyquist ghosting, head-motion and large blood vessels. Two of the remaining components (shown in figure 3*c*) jointly cover almost identical post-thresholded areas as the map obtained from seed-voxel-based correlation analysis. Within the PICA approach, these areas are separated into different spatial maps owing to the fact that the associated time-courses are sufficiently different (as can be seen by the different power spectra). The PICA decomposition suggests that the voxels shown in figure 3*b* are part of two different spatial patterns which appear in the single correlation map by virtue of the fact that the seed voxel has partial correlation with voxels shown in the two PICA-derived maps. The multiple regression analysis underlying a PICA decomposition, by comparison, can separate these effects and gives a more plausible representation of a motor network.

### (c) Relation between physiological noise and resting-state fluctuations

One question that arises with respect to RSNs such as the one shown in figure 3, is whether the findings are functionally significant, or whether they are simply a consequence of aliased physiological effects such as the cardiac and respiratory cycles. In order to investigate the frequency characteristics of resting fluctuations we acquired single slice data covering the motor cortex at low TR (120 ms) and at a more typical TR (3 s). The high temporal sampling data are necessary to separate low-frequency effects from signal fluctuations due to cardiac or respiratory effects. These occur naturally at frequencies of 0.3–1 Hz, and so can easily become aliased by normal TR sampling to give significant power at the low frequencies (0.015–0.035 Hz) typical of resting-state fluctuations. By collecting low TR data, such aliasing will be avoided. These data have about twice as many samples in the temporal domain than voxels across space. In order to reduce computational load, therefore, we assumed a block-diagonal form of the data covariance matrix for the initial PCA dimensionality reduction, which is part of the spatial PICA decomposition.

Figure 4 shows PICA estimates obtained from low-TR resting data (left) and analysis results from activation data (self-paced bilateral finger tapping) acquired at a more typical sampling rate of 3 s (right). On the low-TR data, the separate components found include a single cardiac-cycle-related map with peak frequency of ∼1 Hz (figure 4*a*), a respiration-related map with peak frequency of approximately 0.3 Hz (figure 4*b*), as well as a map showing the spatial extent of lower frequency resting fluctuations (0.02–0.1 Hz, figure 4*c*)—in this case largely contained within sensory–motor areas. With high temporal sampling, the PICA approach clearly separates simple physiological noise components from resting fluctuations. By comparison, figure 4*d*–*f* shows corresponding PICA maps estimated from data acquired with a more typical TR. Here, the spatial maps of the respiratory and cardiac fluctuations were identified purely based on their spatial correspondence with maps shown in figure 4*a*,*b* (spatial correlation of 0.64 and 0.42, respectively). At this more normal temporal sampling rate, the simple frequency characteristics of these effects are no longer detectable owing to aliasing. The primary activation map, however, relates to activation owing to a 30 s on/off block design (periodicity of 0.01677) and can be identified easily in both the spatial and frequency domain. This suggests that even at higher TR, PICA-derived spatial maps separate cardiac and respiratory effects from other effects such as activation maps or low-frequency resting-state fluctuations.

### (d) Spatial characteristics of resting-state fluctuations

The functional relevance of low-frequency patterns depends on the spatial locality in grey matter. At typical spatial fMRI resolution, the estimated low-frequency patterns indeed appear to be ‘grey matter networks’. Figure 5*a* shows example spatial maps found using PICA on the data with higher in-plane spatial resolution (2×2 mm; third dataset). The significant voxels generally lie within grey matter and include little or no white matter. In almost all cases, a PICA decomposition also generates spatial maps depicting ‘blood vessel networks’ (BVNs). These have a similar power spectrum with peak frequencies of around 0.2 Hz, but mainly show larger blood vessels and surrounding tissue (figure 5*b*).

### (e) Consistency of resting-state fluctuations across subjects

Our initial analysis suggests that low-frequency resting patterns in fMRI are not only predominantly contained within grey matter, but also appear to be localized within discrete areas of functional significance. In order to investigate this further, we performed an exploratory group analysis for data from 10 subjects obtained under rest (fourth dataset).

Data were first motion corrected and coregistered using affine linear registration (Jenkinson & Smith 2001) into a common space defined by the mid-transformation of all 10 transformation matrices which take the individual low-resolution datasets to the space of the Montreal Neurological Insitute (MNI) template. This resulted in 10 new echo-planar imaging (EPI) datasets, which all experienced the same average amount of displacement owing to coregistration, and which were kept at the original EPI resolution. The individual datasets were then spatially smoothed by a Gaussian kernel with FWHM 7 mm and pre-processed by performing voxel-wise variance normalization and de-meaning as described in §2. The PCA eigenbasis, however, was calculated from the mean data covariance matrix, i.e. from the covariance matrix of the spatially concatenated data matrix of size (time×(voxels×subject)). All temporal eigenvectors showed dominant low-frequency content; the initial data reduction stage therefore effectively amounts to a temporal low-pass filtering of the original data by projecting each of the 10 datasets onto the 30 dominant eigenvectors. The 10 individual (dimensionality-reduced) datasets were then temporally concatenated to form a data matrix of size 300×58 032. Using the Laplace approximation to the Bayesian evidence for model order selection, the ICA decomposition estimated 23 spatio-temporal processes. The resulting spatial maps were thresholded using the Gaussian/Gamma alternative hypothesis testing approach. Note that this methodology differs from the Group-ICA methodology introduced by Calhoun *et al*. (2001) in that the individual data are not projected onto an individual sub-space, but instead, initial data reduction is performed by projecting each dataset onto a common PCA eigenbasis.

Figure 6 shows example sagittal, coronal and axial slices for eight spatial patterns (out of 23), overlayed onto the mean subjects' high-resolution structural image (1×1×1.5 mm) aligned to the MNI template (all coordinates are in mm from the anterior commissure). The final thresholded maps can be classified as follows:

*Medial visual cortical areas*: these include primary visual areas located in the calcarine sulcus bilaterally as well as medial, but not lateral, extrastriate regions such as the lingual gyrus. In addition, activation was seen in the inferior division of precuneous cortex and in the lateral geniculate nucleus of the thalamus, the primary ‘relay-station’ linking visual input to primary visual cortex. Indeed, previous diffusion weighted imaging (DWI) studies have shown that this region of thalamus connects to the occipital lobe with high probability (Behrens*et al*. (2003); the thalamic connectivity atlas derived from DWI is available at http://www.fmrib.ox.ac.uk/connect/). Although each of these areas (including thalamus) fall within the vascular territory of the posterior cerebral artery, the pattern of results is unlikely to be due to simple temporal patterns of blood supply for two reasons. First, lateral occipito-temporal regions are also supplied by the posterior cerebral artery and these do not exhibit the same resting time-course. In fact, one region of the middle temporal gyrus (MT, blue) demonstrated deactivation at rest. Second, the blood supply to the thalamus is also driven by contributions from other arteries via the circle of Willis, and thus would be unlikely to exhibit the same temporal pattern of blood flow as the posterior cerebral artery alone. Instead, these activations may correspond to a set of connected regions primarily composed of the primary visual system.*Lateral visual cortical areas*: these included the occipital pole extending laterally towards the occipito-temporal junction, encompassing non-primary regions of visual cortex. In addition, activation was seen more dorsally in superior parietal regions. This set of regions is frequently found to be coactivated in functional studies of visual attention or visuo-spatial attention. De-activation is found in posterior cingulate cortex (Brodmann's area 30), a region which has been found to coactivate in studies of orientation and navigation in large-scale spaces (Maguire 2001), and, in a more recent study, has been found to de-activate in a delayed-match-to-sample task under sleep deprivation (Habeck*et al*. 2004).*Auditory system*: activation in this component encompassed primary and secondary auditory cortices, including Heschl's gyrus, planum polare and planum temporale, the lateral superior temporal gyrus and posterior insular cortex (Rivier & Clarke 1997; Rademacher*et al*. 2001). Additional activation was observed in the anterior cingulate cortex, anterior supramarginal gyrus and thalamus. Although these regions are primarily supplied by the middle cerebral artery (MCA), like previous RSNs, this appears to be confined to functionally related areas rather than encompass the entire fronto-parietal area supplied by the MCA.*Sensory–motor system*: these included activation in pre- and post-central gyri extending from the superior bank of the Sylvian fissure to the medial wall of the interhemispheric fissure and included the SMA. This pattern of activation corresponds closely to that seen in bimanual motor tasks. Similar patterns have been identified in previous RSN studies (Biswal*et al*. 1995).*Visuo-spatial system*: activation was observed in the posterior parietal cortex at the occipito-parietal junction, along the midline in the pre-cuneus and posterior cingulate cortex, and in the frontal pole. Within this spatial map, deactivation was mainly found in the pre-SMA. These regions are the same as the physiological baseline areas proposed by Gusnard & Raichle (2001). The authors hypothesize that activity within the posterior cingulate cortex and adjacent pre-cuneus is associated with the representation of the world around us. Similarly, previous studies have demonstrated that lesions affecting the lateral posterior parietal areas lead to severe deficits in spatial attention (Mesulam 1981; Posner*et al*. 1984), while disrupting the input to the region prevents patients from benefiting from spatial cues (Alivisatos & Milner 1989; Koski*et al*. 1998; Petrides & Pandya 2002). Similar findings in macaques show that neuronal responses in a corresponding region (area 7a) are suppressed when stimuli appear at a cued spatial location (Constantinidis & Steinmetz 2001). Taken together, these findings suggest that the posterior parietal cortex is engaged in orienting to salient visuo-spatial cues. Interestingly, the only region to show deactivation was the pre-SMA, a region consistently implicated in internally (i.e. memory-driven), rather than externally cued, tasks (Rushworth*et al*. 2004). Overall then, this pattern of findings is consistent with findings by Gusnard & Raichle (2001), that activation in this component may reflect a system of cortical regions engaged in attending to visuo-spatial information. A similar spatial map has been identified using both seed-voxel-based correlation analysis and PICA (Greicius*et al*. 2003, 2004), and has been shown to have reduced activity in the posterior cingulate cortex in Alzheimer's disease.*Executive control*: areas of activation include superior and middle prefrontal cortices, anterior cingulate and paracingulate gyri, and ventrolateral prefrontal cortex. In addition, there was subcortical activation in a region of the thalamus which connects to prefrontal cortex with high probability (Behrens*et al*. 2003; Johansen-Berg*et al*. in press). These areas have been hypothesized to provide bias signals to other areas of the brain in order to implement cognitive control (Miller & Cohen 2001).*Dorsal visual stream*: unlike all of the previous components, these two components showed primarily lateralized activations corresponding roughly to the dorsal visual stream. For instance, (*g*) shows activation in right lateral occipital complex, right inferior parietal cortex, bilateral intraparietal sulcus, and right middle and superior frontal gyri. The complementary pattern is seen in the left hemisphere in (*h*), including the bilateral intraparietal activation. In fact, this is the only area to show a bilateral response in these two components and activation in the region overlaps in the two components, demonstrating that PICA can be used to reveal separate neural systems that contain spatial overlap (i.e. functional integration; cf. Friston 2002).

## 5. Discussion

Various researchers have demonstrated that an ICA decomposition can be used to identify patterns of activation, image artefacts and physiologically generated components including RSNs (DeLuca *et al*. 2002*a*,*b*, submitted; Goldman & Cohen 2003; Kiviniemi *et al*. 2003; Beckmann & Smith 2004; Greicius *et al*. 2004). Here, we extend the scope of previous investigation into the use of ICA in fMRI, investigating a variety of important aspects relating to the applicability of ICA to resting-state studies. We have demonstrated that an ICA approach can identify a variety of fluctuations (even in cases where these signals coincide at a particular voxel's location) and that ICA is able to estimate even largely overlapping spatial processes. Furthermore, using high- and low-TR data, we have provided evidence that ICA-derived spatial maps of RSNs are unaffected by respiratory/cardiac fluctuations even though at normal TR the temporal structure of the latter becomes aliased into a frequency range which overlaps that of resting-state fluctuations. Our results suggest that the resting patterns, which qualitatively resemble fMRI activation maps, are largely contained within grey matter and have a different spatial characteristic to ICA maps of major blood vessels. Finally, using data from 10 subjects, we have shown that resting-state patterns are spatially consistent across subjects, clearly identifying networks of functional significance including areas such as visual, sensory or motor cortex as being reproducibly found across subjects.

The above results do not necessarily imply that these spatial patterns are of neuronal origin; they might simply relate to changes in local physiology, such as fluctuations linked to local cytoarchitecture and/or local vasculature, which would make RSNs of less interest to neuroscience. As an example, Wise *et al*. (2004) have demonstrated that large cortical areas of the occipital, parietal and temporal lobes exhibit fluctuations which covary significantly with the endtidal level of exhaled carbon dioxide. Further studies are required to characterize the extent to which these non-neuronal processes relate to the ICA findings. The maps generated by the ICA-group analysis separate these large areas into smaller networks, suggesting that, while there might be a common underlying low-frequency signal induced by vascular processes owing to the arterial carbon dioxide fluctuations, these networks have characteristic additional signal fluctuations which can be detected by ICA.

Regardless of their underlying cause, however, RSNs are a major source of structured non-modelled noise in fMRI, and as such, deserve to be better understood. Not only do they contribute significantly to the residual variance (lowering sensitivity to true activation), but because they often correlate temporally with experimental paradigm timings, can cause positive or negative bias in the estimated activation (i.e. cause false positives or false negatives).

A limiting factor to the interpretability of PICA-derived maps clearly stems from the fact that the fMRI BOLD signal is an indirect measure of neuronal activity. The spatial sensitivity and specificity from using ICA on resting fMRI data, however, is relatively high, which could be utilized for more advanced approaches, e.g. joint temporal and spatial exploratory data decompositions such as those introduced by Martinez-Montes *et al*. (2004), which simultaneously explain data from imaging modalities with high temporal (EEG) and high spatial resolution (fMRI).

## Acknowledgments

The authors thank Heidi Johansen-Berg and Tim Behrens for access to DWI-derived thalamic connectivity data. The authors gratefully acknowledge the financial support from the UK EPSRC, MRC and GlaxoSmithKline.

## Footnotes

One contribution of 21 to a Theme Issue ‘Multimodal neuroimaging of brain connectivity’.

↵Here, we only discuss the case of a decomposition into spatially independent source signals; the reason for this will become apparent later.

↵For simplicity we assume de-meaned data.

↵The covariance of the noise is allowed to be voxel dependent in order to encode the vastly different noise covariance observed within different tissue types (Woolrich

*et al*. 2001).↵In this case, ‘activation’ is to be understood as a signal that ‘cannot be explained as random correlation coefficient’.

↵For residual motion artefacts, every voxel is theoretically influenced by an associated motion time-series, but only voxels near intensity boundaries are detectable.

↵Zero-mean Gaussian noise with

*σ*=3; the maximum signal amplitude modulation is 2.

- © 2005 The Royal Society