## Abstract

The power of today's laboratory equipment allows more and more data channels to be easily recorded. However, some misunderstandings about processing such multivariate data may still be found in the literature. The typical mistake is to treat a multichannel system as comprising pairs of channels; this approach does not use all the collected information about the investigated system, and may lead to erroneous conclusions. In this paper, the differences between single- and multichannel approaches will be briefly summarized and some examples of problems will be described.

## 1. Introduction

One important problem in data analysis is determining causal relations between signals. Precise estimation of the pattern of transmissions can significantly increase knowledge about functional relations in the investigated system. Of course, this question arises not only in the fields of biology and medicine but in practically every case where multichannel data can be recorded, e.g. in geophysics or economics. There have been many attempts to create a method which could give information about causal relations between data channels. Early publications, mostly from the political and social sciences from the 1950s, presented methods that attempted to deal with this problem. Unfortunately, they cannot be directly used in the field of biomedical data analysis because of their lack of objectiveness in obtaining the result—they assume certain relations within the dataset *a priori*. Sometimes they do not even use the temporal relations between channels. One of the pioneers in the field of causality estimation research is Hirotugu Akaike, who proposed a relative power contribution (RPC) method based on a multichannel (multivariate) autoregressive model in 1968 (Akaike 1968). The formalism presented was later used and redeveloped by many others. Since the late 1970s, when computers were introduced into biomedical data processing, many works have been published proposing various methods dealing with the problem of causal dependences estimation. For instance, Saito & Harashima (1981) defined directed coherence, Gersch (1972) developed a method of regression analysis for pairs and triplets of channels for identifying source signals and engineers considered causal relations in terms of feedback loops in multichannel systems (e.g. Caines & Chan 1975). In the next decade, measures estimating nonlinear coupling between channels appeared in the literature. The notion of directed influence and dependence between random variables was also considered in another approach called Bayesian networks (or belief networks and graphical models). This formalism is popular in the fields of artificial intelligence and the systems theory (Jensen 2001).

Unfortunately, practically all of the proposed measures were defined for pairs of channels only. Today, even average laboratory equipment allows for recording dozens of data channels simultaneously. using a bivariate method to process every pair of channels from a multichannel set is troublesome but possible, and often performed. However, such an approach is basically incorrect for what will be discussed in this paper. In certain situations, typical to biomedical experimental practice, pairwise estimation of causal relations will give erroneous results.

Kamiński & Blinowska (1991) proposed the directed transfer function (DTF) function, a truly multichannel method for estimating information flow between signals. The method will be used in the presented examples for comparison between the multichannel and bivariate approaches. However, it should be noted that the general conclusions based on the results presented in this paper do not depend on the bivariate approach used.

## 2. Methods

### (a) The AR model

In this paper, a linear approach to the evaluation of time-series, namely, the AR model, will be considered. In terms of the model, all discussed properties will be presented. For electro-encephalogram (EEG) signals linear approximation is sufficient (except for a case of certain epochs of an epileptic seizure) as demonstrated by Pijn *et al*. (1991, 1997), Stam *et al*. (1999), Blinowska & Malinowski (1991) and Achermann *et al*. (1994). The AR model assumes that a value of the signal *x* at time *t* depends on its previous values (*p*) with a random component *ϵ*(2.1)

The *a*_{j} constants are called the model coefficients and *p* is the model order. This type of approach, despite its simplicity, is widely and successfully used in biomedical signal analysis, and is especially useful in EEG analysis. All the quantities describing our data can now be expressed in terms of the AR model parameters *a*. In a multivariate (*k* channels) case, the process value at time *t* is a vector of size *k*, the model coefficients are *k*×*k* matrices and the noise component is a vector of size *k*. Since spectral properties of signals are the most interesting, we can transform equation (2.1) to the frequency domain. The details of this procedure are in the literature (e.g. Marple 1987). Finally, we get(2.2)where * X*(

*f*),

*(*

**E***f*),

*(*

**A***f*) are Fourier transforms of the signal

*x*, the noise components

*ϵ*and model coefficients

*a*, respectively. The matrix

*(*

**H***f*)=

**A**^{−1}(

*f*) is called the transfer matrix of the system. The power spectrum

**S**(

*f*) is then given by(2.3)where

*is the noise component variance and * denotes the transposition and the complex conjugate. Note that the variance of noise does not depend on the frequency.*

**V**### (b) Non-directional measures

When analysing a single channel of data, several quantities describing the data, such as variance, spectrum and many others, can be calculated. In the case of two signals these quantities can be calculated for each of the channels. But these so-called auto-quantities would not describe inter-relations between the two channels. Inter-dependencies in the system are described by cross-functions. The well-known measures of that type are cross-correlations and cross-spectrum or its normalized version, the (ordinary) coherence. The cross-spectrum is described by the off-diagonal elements of the spectral matrix * S*. The ordinary coherence between channels

*i*and

*j*is defined as:(2.4)where

*S*

_{ij}are elements of the spectral matrix

*. The ordinary coherence describes the amount of in-phase components in both signals at the given frequency*

**S***f*. The modulus of

*K*

_{ij}(

*f*) takes values from the [0,1] range; values close to one represent signals highly correlated at the frequency

*f*while zero indicates the lack of such a relation.

When we consider a system of more than two channels of data, new possibilities of inter-channel relations appear.

As shown in figure 1, one channel can be related to several other channels simultaneously, or two channels can be indirectly connected via other channels. For systems consisting of more than two channels, new types of coherences can be applied (Jenkins & Watts 1968). The partial coherence, defined as:(2.5)describes the amount of in-phase components in signals *i* and *j* at the given frequency *f* when the influence of other channels of the set is statistically removed. Its modulus takes values from [0,1] range, similar to ordinary coherence, but it is only non-zero when the relation between channel *i* and *j* is a direct one. **M**_{ij} is a minor (determinant) of matrix * S* with

*i*th row and

*j*th column removed.

The multiple coherence:(2.6)describes the amount of in-phase component in the channel *i* and all the other channels (the rest) of the set. Similarly, values of its modulus are in the [0,1] range.

In a two channel system, partial and multiple coherences are identical to the ordinary coherence. Starting from three channel systems, the behaviour of different coherences depends on interrelations in the whole system, not just between pairs of signals. The example in figure 2 shows moduli of all the three types of coherences calculated for a seven channel system of correlated data. Ordinary coherences (above diagonal) are quite large, all above 0.5, and they form a pattern. Multiple coherences (on the diagonal) are all close to 1, indicating that every channel is correlated with the rest of the set. However, partial coherences (below the diagonal) show a quite different connection pattern of direct relations only.

The partial coherence can be a very important tool in multichannel data analysis. It helps to distinguish which relation in the set is direct. A practical example of the application of the coherences will be given later in the text.

### (c) Directional measures

Each coherence, and in particular the ordinary coherence, is a complex number which can be expressed as modulus and phase. The modulus expresses the connection strength and the phase expresses phase difference. It means that the coherence phase may be used to determine the direction of causal influence between channels. In reality, this is only possible for certain simple situations; for instance, for narrow band signals with clear unidirectional relation. Moreover, the phase estimated by means of Fourier transform is defined modulo 2*π*, which makes estimation of the direction ambiguous. Because of this property, the coherences cannot be in practice considered as quantities able to show causal relations between signals. They only show that a relation exists so the direction of influence should be estimated in a different way.

To be able to speak precisely about causal relations in time-series, a definition of the notion of causality is needed. This need was satisfied by Granger (1969). Based on the work of Wiener (1956), he presented a definition of causality in terms of time-series, called Granger causality, which became widely accepted. The idea is based on the predictability of time-series. Prediction of signal *X*_{1} from its *p* previous values can be expressed by the equation:(2.7)where *E* is the prediction error. If, by adding to the prediction *q* past values of a process *X*_{2}(2.8)we can lower the variance of the prediction error (*E*′), then we say that *X*_{2} Granger causes *X*_{1}. This definition can be expressed in terms of linear modelling of time-series, especially with the AR model, and has received a great deal of attention in the field of biological signal analysis.

The DTF function, a multichannel causality measure proposed by Kamiński & Blinowska (1991), is based on an AR model fitted to the data. The normalized version of DTF was defined using elements of the transfer matrix * H* of the system:(2.9)

It describes a ratio between the inflow from channel *j* to channel *i* to all the inflows to the channel *i*. This function takes values from a [0,1] range. In some cases, especially when comparing results of different experiments, the non-normalized version is more suitable. It is given by:(2.10)

In its non-normalized form, DTF is related to Akaike's RPC method, but being based on the transfer function, DTF describes properties of the system generating the signals not including multivariate autoregressive (MVAR) model input noise properties.

Granger causality definition was originally given for a pair of channels. This concept may be extended to a multivariate case of *k* channels. In this approach we will try to predict a value of signal *X*_{1} using all available signals from the set:(2.11)

We say that *X*_{2} causes *X*_{1} in the sense of Granger causality in multichannel cases when, by adding the past values of *X*_{2} to the prediction (2.11), we can lower the variance of the prediction error in comparison to the situation when *X*_{2} was not included. In Kamiński *et al*. (2001) it was shown that the non-normalized version of DTF can interpret Granger causality in the multivariate sense. In this paper, the non-normalized DTF will be used in calculations.

## 3. Bivariate versus multivariate approach: simulation examples

As noted earlier, the application of a bivariate method for each pair of channels from the multichannel set does not make use of all its covariance structure information. A pairwise approach would not detect some features characteristic for multichannel systems, which may lead to erroneous interpretations of the results. The case of a common source is especially problematic.

The first simulated example will concern the situation when one channel is a source of a signal which appears in several other channels. Such a case is quite typical for EEG analysis practice. Let us consider simulation 1, a three channel system, where channel 1 is driving channels 2 and 3 with different time delays. The signal in the channel 1 was constructed from a bandpass-filtered EEG signal (1280 samples) with noise added. Signals in channels 2 and 3 were constructed from the delayed (of one and two samples, respectively) channel 1 signal with additional uncorrelated noise added (of variance equal to delayed signal variance). In figure 3, the scheme of the simulated flows is presented together with the results of DTF calculations performed on all three channels simultaneously and DTF applied to each pair of channels separately. A substantial difference in results can be observed. The pairwise approach will detect correctly transmissions 1→2 and 1→3. Additionally, because channels 2 and 3 will contain the same signal from channel 1 (time shifted), the transmission 2→3, non-existent in the system, will be detected as well. During calculations for the 2–3 pair (separately) the information about the true source, channel 1, was simply not included. We also see that a multichannel method, using information from all channels simultaneously, gave correct results.

The coherences for the simulation 1 dataset are presented in figure 4. They are calculated using the multichannel method (note that the partial and multiple coherences make sense only for multichannel cases). The ordinary and multiple coherences in this case are high and only demonstrate the fact that all the channels are related to each other. However, partial coherence is very low for the 2–3 pair, which indicates that channels 2 and 3 are not directly connected, as expected.

The detection of the phenomenon of fake transmission between channels 2 and 3 in simulation 1 is responsible for the occurrence of ‘sinks’ of activity at certain locations. Simulation 2 shows the mechanism of appearance of such a spurious sink. The simulated signal was constructed analogically to simulation 1. Now, a source of activity drives several other areas with different delays (e.g. owing to different distances). The DTF results of simulation 2 with four channels are presented in figure 5. In the case of the bivariate approach, channel 4, with the biggest delay in relation to the real source signal, seems to be driven by channels 1, 2 and 3 and a target for many fake ‘sources’ of activity. Again, the multichannel measure was able to detect the true pattern of connections.

The indirect transmission case is approached in simulation 3. The signal in channel 1 contained bandpass-filtered EEG signal and a noise. The channel 2 signal was constructed by delaying the signal from the channel 1 by one sample and adding an uncorrelated noise. The *n*th channel was created the same way, using signal from channel *n*−1. We can see here a set of transmissions in every pair of channels. Let us focus on the 1→4 transmission. Of course, there is no direct relation between channels 1 and 4. However, the signal from channel 1 was transmitted through channels 2 and 3 and finally appeared in channel 4 as a part of its signal. The DTF is not a partial measure itself. It means that if a transmission occurs in a system where the signal is not transmitted directly from channel *i* to channel *j*, but through other channels from the set, the DTF will still show a relation between channels *i* and *j*. To deal with this problem several solutions were proposed. Kamiński *et al*. (2001) proposed the DC method, which is based on testing values of *a*_{j} coefficients (equation (2.1)). Sameshima & Baccalá (1999) presented partial directed coherence method defined by means of * A*(

*f*) parameters (equation (2.2)). Korzeniewska

*et al*. (2003) defined a function called direct DTF (dDTF). This function is based on an idea of combining DTF and partial coherences. Partial coherences have non-zero values only for direct relations between channels. A quantity obtained by multiplication of DTF and modulus of partial coherence should remain non-zero only when there is a causal relation between channels with the correct direction, and when these channels are related directly. The result of dDTF function based on this approach applied to data from simulation 3 is presented in figure 6

*d*and figure 6

*e*. The resulting pattern of direct transmissions is as expected.

## 4. Discussion

In this paper a quick introduction of possible problems resulting from improper multichannel data analysis was presented. Simulations were designed to be very simple to clearly show the presented ideas. Although the DTF function and Granger causality were used here, it was more for demonstration purposes. This paper was not intended as an overview of any particular method of causal relations estimation. Moreover, some details important in real data calculations were omitted for simplicity. For a method of estimating the statistical significance of the results, detailed presentations of the applied functions with extended discussions and additional examples can be found elsewhere (e.g. Kuś *et al*. 2004). A comparison between bivariate and multivariate analysis performed on a real multichannel EEG data record is presented by Blinowska *et al*. (2004).

As was shown before, even in very simple situations presented in the simulated examples, analysis of causal influences performed on separate pairs of channels can lead to incorrect results. What should be stressed at this point is the fact that the conclusions presented here do not depend on the bivariate method used. Considering the presented simulations, the existence of fake relations in such situations (situations quite common in biomedical signal practice) will be shown by any causality measure applied in a bivariate way. Depending on the properties of the bivariate method used—if it operates in frequency or time domain; if it is a nonlinear or a linear measure—the numerical value describing properties of the given connection will be different, but the fact that such channels are causally related will be detected. This should always be considered a property of bivariate approaches and the estimated results should be interpreted accordingly. Granger (1980) gives a strong statement in his later work that a test for causality is impossible unless the set of interacting channels is complete. Unfortunately, there is no mathematical way to assure that all the relevant information was included in the measured set of data. Careful analysis of the investigated problem and precise planning of the experiment is the only possibility. This concerns a proper selection of investigated sites, which must include all possible sources of information. On the other hand, when the number of data channels at our disposal is too big to be analysed simultaneously by a single multivariate autoregressive (MVAR) model, the number of channels must be reduced. Again, this procedure should not exclude important sources of activity because it would corrupt the investigated pattern of connections. Moreover, in a case when a bivariate measure should be applied to a selected pair of signals from a bigger set, it would be advisable to first apply a multichannel measure to the bigger set to get a preliminary insight into the general pattern of connections, and to avoid incorrect interpretations. The use of combined specialized multivariate functions like DTF and partial coherences can provide more accurate insight into the system being explored.

## Acknowledgments

I thank my collaborator, R. Kuś, for the preparation of the presented simulations. This work was partly supported by the Polish Committee for Scientific Research grant 4T 11E 02823.

## Footnotes

One contribution of 21 to a Theme Issue ‘Multimodal neuroimaging of brain connectivity’.

- © 2005 The Royal Society