## Abstract

Mapping evolutionary trajectories of discrete traits onto phylogenies receives considerable attention in evolutionary biology. Given the trait observations at the tips of a phylogenetic tree, researchers are often interested in where on the tree the trait changes its state and whether some changes are preferential in certain parts of the tree. In a model-based phylogenetic framework, such questions translate into characterizing probabilistic properties of evolutionary trajectories. Current methods of assessing these properties rely on computationally expensive simulations. In this paper, we present an efficient, simulation-free algorithm for computing two important and ubiquitous evolutionary trajectory properties. The first is the mean number of trait changes, where changes can be divided into classes of interest (e.g. synonymous/non-synonymous mutations). The mean evolutionary reward, accrued proportionally to the time a trait occupies each of its states, is the second property. To illustrate the usefulness of our results, we first employ our simulation-free stochastic mapping to execute a posterior predictive test of correlation between two evolutionary traits. We conclude by mapping synonymous and non-synonymous mutations onto branches of an HIV intrahost phylogenetic tree and comparing selection pressure on terminal and internal tree branches.

## 1. Introduction and background

Reconstructing evolutionary histories from present-day observations is a central problem in quantitative biology. Phylogenetic estimation is one example of such reconstruction. However, phylogenetic reconstruction alone does not provide a full picture of an evolutionary history, because evolutionary paths (mappings) describing trait states along the phylogenetic tree remain hidden. Although one is rarely interested in a detailed reconstruction of such mappings, certain probabilistic properties of the paths are frequently used in evolutionary hypothesis testing (Nielsen 2002; Huelsenbeck *et al*. 2003; Leschen & Buckley 2007). For example, given a tree and a Markov model of amino acid evolution, one can compute the expected number of times a transition from a hydrophobic to a hydrophilic state occurs, conditional on the observed amino acid sequence alignment. Such expectations can inform researchers about model adequacy and provide insight into the features of the evolutionary process overlooked by standard phylogenetic techniques (Dimmic *et al*. 2005).

Nielsen (2002) introduced stochastic mapping of trait states on trees and employed this new technique in a model-based evolutionary hypothesis testing context. The author starts with a discrete evolutionary trait *X* that attains *m* states. He further assumes that this trait evolves according to an evolutionary model described by a parameter vector **θ**, where **θ** consists of a tree *τ* with *n* tips and branch lengths **T**, root distribution **π** and a continuous-time Markov chain (CTMC) generator $\mathbf{Q} = \{q_{ij}\}$ for $i, j = 1, \ldots, m$. Let mapping $M_{\theta}$ be a collection of CTMC trajectories along all branches of *τ* and $H(M_{\theta})$ be a real-valued summary of $M_{\theta}$. Clearly, even when parameters **θ** are fixed, $H(M_{\theta})$ remains a random variable. Nielsen (2002) proposed to test evolutionary hypotheses using prior and posterior expectations $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$, where $\mathbf{D} = (D_1, \ldots, D_n)$ are trait values observed at the *n* tips of *τ*. Since these expectations are deterministic functions of **D** and **θ**, they can be used as discrepancy measures for posterior predictive *p*-value calculations (Meng 1994; Gelman *et al*. 1996).

A major advantage of Nielsen's stochastic mapping framework is its ability to account for uncertainty in model parameters, including phylogenies. A major limitation of stochastic mapping is its current implementation that relies on time-consuming simulations. In describing his method for calculating $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$, Nielsen (2002) wrote ‘In general, we can not evaluate sums in equations 5 and 6 directly, because the set [of all possible mappings] is not of finite size.’ However, the infinite number of possible mappings does not prevent one from explicitly calculating these expectations for some choices of *H*. For example, if

$$H(M_{\theta}) = 1_{\{M_{\theta}\ \text{agrees with}\ \mathbf{D}\ \text{at the tips of}\ \tau\}}, \quad (1.1)$$

then $\mathrm{E}[H(M_{\theta})] = \Pr(\mathbf{D})$, the familiar phylogenetic likelihood that can be evaluated without simulations (Felsenstein 2004). Therefore, hope remains that other choices of *H* may also permit evaluation of $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$ without simulations.

In this paper, we consider a class of additive mapping summaries of the form

$$H(M_{\theta}) = \sum_{b \in \Omega} h(M_b), \quad (1.2)$$

where $h(M_b)$ is a one-dimensional summary of the Markov chain path along branch *b* and *Ω* is an arbitrary subset of all branches of *τ*. Moreover, we restrict our attention to the two most popular choices of function *h*. Let $\mathcal{L} = \{(i, j)\}$ be a set of ordered index pairs that label transitions of trait *X*. For each Markov path $\{X_t\}$ and interval [0,*t*), we count the number of labelled transitions in this interval and arrive at

$$N_{\mathcal{L}}(t) = \#\{\text{labelled transitions of } \{X_t\} \text{ during } [0, t)\}, \quad (1.3)$$

where we omit dependence on **θ** and *b* for brevity. Although our second choice of *h* is more abstract, it is motivated by Huelsenbeck *et al*. (2003), who used Nielsen's stochastic mapping algorithm to calculate the mean dwelling time of a trait in a particular state. Let $\mathbf{w} = (w_1, \ldots, w_m)$ be a set of rewards assigned to each trait state. Trait *X* is ‘rewarded’ the amount $t \times w_i$ for spending time *t* in state *i*. We obtain the total reward of Markov path $\{X_t\}$ by summing up all rewards that *X* accumulates during interval [0,*t*),

$$R_{\mathbf{w}}(t) = \int_0^t w_{X_z}\, dz. \quad (1.4)$$

To obtain dwelling times of *X* in a predefined set of trait states, we set $w_i = 1$ if *i* belongs to the set of interest and $w_i = 0$ otherwise.

For these two choices of function *h*, we provide an algorithm for exact, simulation-free computation of $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$. Similar to phylogenetic likelihood calculations of Pr(**D**), this algorithm relies on the eigen decomposition of **Q** and requires traversing *τ*. Our work generalizes and unifies exact calculations previously developed for stochastic mapping (Holmes & Rubin 2002; Guindon *et al*. 2004; Dutheil *et al*. 2005). Despite the restricted form of the considered mapping summaries, our results cover nearly all current applications of stochastic mapping. We conclude with two applications of stochastic mapping illustrating the capabilities of exact computation. In our first example, we examine coevolution of two binary traits and demonstrate that a previously developed simulation-based test of independent evolution can be executed without simulations. We then turn to a large codon Markov state space, on which simulation-based stochastic mapping generally experiences severe computational limitations. Using our exact computations, we study temporal patterns of synonymous and non-synonymous mutations in intrahost HIV evolution.

## 2. Local, one branch calculations

In this section, we provide the mathematical details needed for calculating expectations of stochastic mapping summaries on one branch of a phylogenetic tree. We first motivate the need for such local computations by making further analogies between calculations of Pr(**D**) and expectations of stochastic mapping summaries. The additive form of *H* reduces calculation of $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$ to computing branch-specific expectations $\mathrm{E}[h(M_b)]$ and $\mathrm{E}[h(M_b) \mid \mathbf{D}]$. Recall that, according to most phylogenetic models, trait *X* evolves independently on each branch of *τ*, conditional on trait states at all internal nodes of *τ*. This conditional independence is the key behind the dynamic programming algorithm that allows for efficient calculation of Pr(**D**) (Felsenstein 1981). For this likelihood calculation algorithm, it suffices to compute finite-time transition probabilities $\mathbf{P}(t) = \{p_{ij}(t)\}$, where

$$p_{ij}(t) = \Pr(X_t = j \mid X_0 = i) = \left(e^{Qt}\right)_{ij}, \quad (2.1)$$

for arbitrary branch length *t*. Similarly, to obtain $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$, we require means of computing local expectations $\mathbf{E}(h, t) = \{e_{ij}(h, t)\}$, where

$$e_{ij}(h, t) = \mathrm{E}\left[h(M_b)\, 1_{\{X_t = j\}} \mid X_0 = i\right], \quad (2.2)$$

and $1_{\{\cdot\}}$ is the indicator function. After illustrating how to compute $\mathbf{E}(N_{\mathcal{L}}, t)$ and $\mathbf{E}(R_{\mathbf{w}}, t)$ without resorting to simulations, we provide an algorithm that efficiently propagates local expectations $\mathbf{E}(h, t)$ and finite-time transition probabilities $\mathbf{P}(t)$ along *τ* to arrive at $\mathrm{E}[H(M_{\theta})]$ and $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$.

### (a) Expected number of labelled Markov transitions

Abstracting from phylogenetics, let $N_{\mathcal{L}}(t)$ count the number of labelled transitions of a CTMC $\{X_t\}$ during time interval [0,*t*). It follows from the theory of Markov chain-induced counting processes that

$$\mathbf{E}(N_{\mathcal{L}}, t) = \int_0^t e^{Qz}\, Q_{\mathcal{L}}\, e^{Q(t-z)}\, dz, \quad (2.3)$$

where $Q_{\mathcal{L}} = \{q_{ij} 1_{\{(i,j) \in \mathcal{L}\}}\}$ (Ball & Milne 2005). Since most evolutionary models are locally reversible, we can safely assume that **Q** is diagonalizable with eigen decomposition $Q = U \Lambda U^{-1}$, where eigenvectors of **Q** form the columns of **U**, $d_1, \ldots, d_m$ are the real eigenvalues of **Q**, and $\Lambda$ is a diagonal matrix with elements $d_1, \ldots, d_m$ on its main diagonal. Such an analytic or numeric diagonalization procedure permits calculation of finite-time transition probabilities $\mathbf{P}(t) = U e^{\Lambda t} U^{-1}$, needed for likelihood calculations (Lange 2004). Minin & Suchard (2008) have shown that one can use the same eigen decomposition of **Q** to calculate local expectations

$$\mathbf{E}(N_{\mathcal{L}}, t) = \sum_{i=1}^m \sum_{j=1}^m U E_i U^{-1}\, Q_{\mathcal{L}}\, U E_j U^{-1}\, I_{ij}(t), \quad (2.4)$$

where $E_i$ is a matrix with zero entries everywhere except at the *ii*-th entry, which is one, and

$$I_{ij}(t) = \begin{cases} \dfrac{e^{d_i t} - e^{d_j t}}{d_i - d_j}, & d_i \neq d_j,\\[6pt] t\, e^{d_i t}, & d_i = d_j. \end{cases} \quad (2.5)$$
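As a concrete sanity check, the spectral formula (2.4)–(2.5) can be coded in a few lines and compared against direct numerical quadrature of the integral representation (2.3). The following Python sketch uses an arbitrary two-state reversible generator and label set chosen purely for illustration:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

def expected_labelled_counts(Q, label_mask, t):
    """E(N_L, t) of equation (2.4): e_ij = E[N_L(t) 1{X_t=j} | X_0=i].

    Assumes Q is diagonalizable with real eigenvalues, which holds for
    reversible chains."""
    d, U = np.linalg.eig(Q)
    d, U = d.real, U.real
    Uinv = np.linalg.inv(U)
    QL = Q * label_mask                       # Q_L: keep rates of labelled transitions only
    A = Uinv @ QL @ U                         # Q_L rotated into the eigenbasis
    di, dj = d[:, None], d[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        I = (np.exp(di * t) - np.exp(dj * t)) / (di - dj)   # I_ij(t) of (2.5), d_i != d_j
    same = np.isclose(di, dj)
    I[same] = (t * np.exp(di * t) * np.ones_like(I))[same]  # limiting case d_i = d_j
    return U @ (A * I) @ Uinv

# toy reversible generator; label only the 1 -> 2 transitions
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
mask = np.array([[0.0, 1.0], [0.0, 0.0]])
t = 0.7
E_spec = expected_labelled_counts(Q, mask, t)

# check against direct quadrature of the integral representation (2.3)
zs = np.linspace(0.0, t, 2001)
vals = np.array([expm(Q * z) @ (Q * mask) @ expm(Q * (t - z)) for z in zs])
E_quad = trapezoid(vals, zs, axis=0)
assert np.allclose(E_spec, E_quad, atol=1e-5)
```

Because (2.4) reuses the eigen decomposition already required for likelihood calculations, the marginal cost of the local expectations amounts to a Hadamard product and a few matrix multiplications.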

### (b) Expected Markov rewards

For the reward process $R_{\mathbf{w}}(t)$, we define a matrix cumulative distribution function $\mathbf{V}(x, t) = \{v_{ij}(x, t)\}$, where

$$v_{ij}(x, t) = \Pr\left(R_{\mathbf{w}}(t) \le x,\ X_t = j \mid X_0 = i\right). \quad (2.6)$$

Neuts (1995, ch. 5) demonstrated that local reward expectations can be expressed as

$$\mathbf{E}(R_{\mathbf{w}}, t) = -\left.\frac{\partial}{\partial s} \mathbf{V}^*(s, t)\right|_{s=0}, \quad (2.7)$$

where

$$\mathbf{V}^*(s, t) = \int_0^{\infty} e^{-sx}\, d\mathbf{V}(x, t) = e^{\left(Q - s\, \mathrm{diag}(\mathbf{w})\right) t} \quad (2.8)$$

is the Laplace–Stieltjes transform of $\mathbf{V}(x, t)$. It is easy to see that the matrix exponential in equation (2.8) satisfies the following differential equation:

$$\frac{d}{dt} \mathbf{V}^*(s, t) = \mathbf{V}^*(s, t) \left(Q - s\, \mathrm{diag}(\mathbf{w})\right). \quad (2.9)$$

Differentiating this matrix differential equation with respect to *s*, exchanging the order of differentiation and evaluating both sides of the resulting equation at *s* = 0, we arrive at the differential equation for local expectations

$$\frac{d}{dt} \mathbf{E}(R_{\mathbf{w}}, t) = \mathbf{E}(R_{\mathbf{w}}, t)\, Q + \mathbf{P}(t)\, \mathrm{diag}(\mathbf{w}), \quad (2.10)$$

where $\mathbf{E}(R_{\mathbf{w}}, 0)$ is the *m* × *m* zero matrix. Multiplication of both sides of equation (2.10) by the integrating factor $e^{-Qt}$ from the right and integration with respect to *t* produces the solution,

$$\mathbf{E}(R_{\mathbf{w}}, t) = \int_0^t e^{Qz}\, \mathrm{diag}(\mathbf{w})\, e^{Q(t-z)}\, dz. \quad (2.11)$$

Similarity between equations (2.3) and (2.11) invites calculation of the expected Markov rewards via the spectral decomposition of **Q**,

$$\mathbf{E}(R_{\mathbf{w}}, t) = \sum_{i=1}^m \sum_{j=1}^m U E_i U^{-1}\, \mathrm{diag}(\mathbf{w})\, U E_j U^{-1}\, I_{ij}(t). \quad (2.12)$$

In summary, formulae (2.4) and (2.12) provide a recipe for exact calculations of local expectations for the number of labelled transitions and rewards.
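The reward expectation (2.12) differs from (2.4) only in that diag(**w**) replaces $Q_{\mathcal{L}}$. A minimal Python sketch (again with an illustrative two-state generator) exploits the exact identity that unit rewards give $\mathbf{E}(R_{\mathbf{w}}, t) = t\, \mathbf{P}(t)$, since the accumulated reward is then the elapsed time itself:

```python
import numpy as np
from scipy.linalg import expm

def expected_rewards(Q, w, t):
    """E(R_w, t) of equation (2.12): e_ij = E[R_w(t) 1{X_t=j} | X_0=i]."""
    d, U = np.linalg.eig(Q)
    d, U = d.real, U.real
    Uinv = np.linalg.inv(U)
    A = Uinv @ np.diag(w) @ U                # diag(w) rotated into the eigenbasis
    di, dj = d[:, None], d[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        I = (np.exp(di * t) - np.exp(dj * t)) / (di - dj)   # I_ij(t) of (2.5)
    same = np.isclose(di, dj)
    I[same] = (t * np.exp(di * t) * np.ones_like(I))[same]
    return U @ (A * I) @ Uinv

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
t = 0.9

# sanity check: unit rewards make R_w(t) = t exactly, so E(R_w, t) = t * P(t)
E_ones = expected_rewards(Q, np.ones(2), t)
assert np.allclose(E_ones, t * expm(Q * t))

# dwelling time: rows of E(R_w, t) sum to the expected time spent in state 0
E_dwell = expected_rewards(Q, np.array([1.0, 0.0]), t)
```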

## 3. Assembling pieces together over a tree

### (a) Notation for tree traversal

Let us label the internal nodes of *τ* with integers {1, …, *n*−1} starting from the root of the tree. Recall that we have already arbitrarily labelled the tips of *τ* with integers {1, …, *n*}. Let $\mathcal{I}$ be the set of internal branches and $\mathcal{E}$ be the set of terminal branches of *τ*. For each branch $b \in \mathcal{I}$, we denote the internal node labels of the parent and child of branch *b* by *p*(*b*) and *c*(*b*), respectively. We use the same notation for each terminal branch *b*, except that *p*(*b*) is an internal node index, while *c*(*b*) is a tip index. Let $\mathbf{i} = (i_1, \ldots, i_{n-1})$ denote the internal node trait states. Then, the complete likelihood of the unobserved internal node states and the observed states at the tips of *τ* is

$$\Pr(\mathbf{i}, \mathbf{D}) = \pi_{i_1} \prod_{b \in \mathcal{I}} p_{i_{p(b)} i_{c(b)}}(t_b) \prod_{b \in \mathcal{E}} p_{i_{p(b)} D_{c(b)}}(t_b). \quad (3.1)$$

We form the likelihood of the observed data by summing over all possible states of internal nodes,

$$\Pr(\mathbf{D}) = \sum_{\mathbf{i}} \Pr(\mathbf{i}, \mathbf{D}). \quad (3.2)$$

Clearly, when data on the tips are not observed, the prior distribution of internal nodes becomes

$$\Pr(\mathbf{i}) = \pi_{i_1} \prod_{b \in \mathcal{I}} p_{i_{p(b)} i_{c(b)}}(t_b). \quad (3.3)$$
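The equivalence of the complete-likelihood sum (3.2) and Felsenstein's pruning recursion can be verified numerically on a toy three-tip tree; the generator, branch lengths and tip states below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

# hypothetical two-state chain and three-tip rooted tree:
# root -> tip 1 (length t1), root -> node u (length tu), u -> tips 2, 3 (t2, t3)
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])         # stationary distribution of Q
t1, tu, t2, t3 = 0.4, 0.3, 0.6, 0.2
D1, D2, D3 = 0, 1, 0                          # observed tip states
P1, Pu, P2, P3 = (expm(Q * t) for t in (t1, tu, t2, t3))

# brute force: complete likelihood (3.1) summed over internal states, as in (3.2)
lik_sum = sum(pi[r] * P1[r, D1] * Pu[r, u] * P2[u, D2] * P3[u, D3]
              for r in range(2) for u in range(2))

# Felsenstein pruning: the same quantity from two upward steps
F_u = P2[:, D2] * P3[:, D3]                   # partial likelihoods at internal node u
F_root = P1[:, D1] * (Pu @ F_u)
lik_prune = pi @ F_root

assert np.isclose(lik_sum, lik_prune)
```

Enumeration costs $O(m^{n-1})$ while pruning costs $O(n m^2)$; they agree exactly because each factor of (3.1) is touched once in either scheme.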

### (b) Posterior expectations of mapping summaries

Consider an arbitrary branch *b*^{*} connecting parent internal node *p*(*b*^{*}) to its child *c*(*b*^{*}). First, we introduce restricted moments,

$$\mathrm{E}\left[h(M_{b^*})\, 1_{\{\mathbf{D}\}}\right] = \mathrm{E}\left[h(M_{b^*}) \mid \mathbf{D}\right] \Pr(\mathbf{D}), \quad (3.4)$$

where $1_{\{\mathbf{D}\}}$ indicates that the mapping agrees with the observed data. The expectation (3.4) integrates over all evolutionary mappings consistent with **D** on the tips of *τ*. Invoking the law of total expectation and the definition of conditional probability, we deduce

$$\mathrm{E}\left[h(M_{b^*})\, 1_{\{\mathbf{D}\}}\right] = \sum_{\mathbf{i}} \mathrm{E}\left[h(M_{b^*}) \mid \mathbf{i}, \mathbf{D}\right] \Pr(\mathbf{i}, \mathbf{D}) = \sum_{\mathbf{i}} \frac{e_{i_{p(b^*)} i_{c(b^*)}}(h, t_{b^*})}{p_{i_{p(b^*)} i_{c(b^*)}}(t_{b^*})} \Pr(\mathbf{i}, \mathbf{D}). \quad (3.5)$$

The last expression in derivation (3.5) illustrates that, in order to calculate the posterior restricted moment (3.4) along branch $b^* \in \mathcal{I}$, we merely need to replace the finite-time transition probability $p_{i_{p(b^*)} i_{c(b^*)}}(t_{b^*})$ with the local expectation $e_{i_{p(b^*)} i_{c(b^*)}}(h, t_{b^*})$ in the likelihood formula (3.2). Similarly, if $b^* \in \mathcal{E}$, we substitute $e_{i_{p(b^*)} D_{c(b^*)}}(h, t_{b^*})$ for $p_{i_{p(b^*)} D_{c(b^*)}}(t_{b^*})$. Given matrices $\mathbf{P}(t_b)$ for $b \ne b^*$ and $\mathbf{E}(h, t_{b^*})$, we can sum over internal node states using Felsenstein's pruning algorithm to arrive at the restricted mean and then divide this quantity by Pr(**D**) to obtain $\mathrm{E}[h(M_{b^*}) \mid \mathbf{D}]$.

This procedure is efficient for calculating the posterior expectations of mapping summaries for one branch of *τ*. However, in practice, we need to calculate the mapping expectations over many branches and consequently execute the computationally intensive pruning algorithm many times. A similar problem is encountered in maximum-likelihood phylogenetic estimation during differentiation of the likelihood with respect to branch lengths. An algorithm, reviewed by Schadt *et al*. (1998), allows for computationally efficient, repeated replacement of one of the finite-time transition probabilities with an arbitrary function of the corresponding branch length in equation (3.2). This algorithm has found informal use since the 1980s in pedigree analysis (Cannings *et al*. 1980) and is implemented in most maximum-likelihood phylogenetic reconstruction software such as PAUP (J. Huelsenbeck 2007, personal communication).
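The ‘replace *p* with *e*’ recipe of derivation (3.5) is easy to check numerically on a two-tip tree. In the Python sketch below (illustrative parameter values), we use the unit-reward summary, for which the reward on a branch of length $t_1$ is exactly $t_1$, so the restricted moment must equal $t_1 \Pr(\mathbf{D})$:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

def expected_rewards(Q, w, t, steps=2000):
    # local expectation E(R_w, t), computed by quadrature of the integral (2.11)
    zs = np.linspace(0.0, t, steps + 1)
    vals = np.array([expm(Q * z) @ np.diag(w) @ expm(Q * (t - z)) for z in zs])
    return trapezoid(vals, zs, axis=0)

# hypothetical two-tip tree rooted between tips 1 and 2
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])
t1, t2 = 0.5, 0.8
D1, D2 = 0, 1
P1, P2 = expm(Q * t1), expm(Q * t2)
w = np.ones(2)                                # unit rewards, so the branch reward is t1

# likelihood (3.2) and its "replace p with e" analogue from derivation (3.5)
lik = pi @ (P1[:, D1] * P2[:, D2])
restricted = pi @ (expected_rewards(Q, w, t1)[:, D1] * P2[:, D2])

# the restricted moment for branch 1 must equal t1 * Pr(D)
assert np.isclose(restricted, t1 * lik, atol=1e-8)
```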

Let $F_u = (F_{u1}, \ldots, F_{um})$ be the vector of forward, often called partial or fractional, likelihoods at node *u*. Element $F_{ui}$ is the probability of the observed data at only the tips that descend from the node *u*, given that the state of *u* is *i*. If *u* is a tip, then we initialize partial likelihoods via $F_{ui} = 1_{\{D_u = i\}}$. In case of missing or ambiguous data, $D_u$ denotes the subset of possible trait states, and forward likelihoods are set to $F_{ui} = 1_{\{i \in D_u\}}$. During the first, upward traversal of *τ*, we compute forward likelihoods for each internal node *u* using the recursion

$$F_{ui} = \left[\sum_{j=1}^m p_{ij}(t_{b_1}) F_{c(b_1)j}\right] \left[\sum_{j=1}^m p_{ij}(t_{b_2}) F_{c(b_2)j}\right], \quad (3.6)$$

where $b_1$ and $b_2$ are indices of the branches descending from node *u* and $c(b_1)$ and $c(b_2)$ are the corresponding children of *u*. We denote the directional likelihoods in square brackets of equation (3.6) by $S_{b_1 i}$ and $S_{b_2 i}$ and record $S_{b_1}$ and $S_{b_2}$ together with $F_u$. Finally, we define backward likelihoods $G_u = (G_{u1}, \ldots, G_{um})$, where $G_{ui}$ is the probability of observing state *i* at node *u* together with other tip states on the subtree of *τ* obtained by removing all lineages downstream of node *u*. A second, downward traversal of *τ* yields $G_u$ given the precomputed $S_b$.

For each branch *b*^{*}, we can sandwich $p_{ij}(t_{b^*})$ among the forward, directional and backward likelihoods and write

$$\Pr(\mathbf{D}) = \sum_{i=1}^m \sum_{j=1}^m G_{p(b^*)i}\, S_{b'i}\, p_{ij}(t_{b^*})\, F_{c(b^*)j}, \quad (3.7)$$

where *b*′ is the second branch descending from the parent node *p*(*b*^{*}). Therefore, with $F_u$, $S_b$ and $G_u$ precomputed for all nodes and branches of *τ*, we can replace $p_{ij}(t_{b^*})$ with any other quantity for any arbitrary branch *b*^{*} without repeatedly traversing *τ*. In particular, the posterior restricted moment for branch *b*^{*} can be expressed as

$$\mathrm{E}\left[h(M_{b^*})\, 1_{\{\mathbf{D}\}}\right] = \sum_{i=1}^m \sum_{j=1}^m G_{p(b^*)i}\, S_{b'i}\, e_{ij}(h, t_{b^*})\, F_{c(b^*)j}. \quad (3.8)$$

In figure 1, we use an example tree to illustrate the correspondence between each quantity in sandwich formula (3.8) and the part of the tree involved in computing this quantity. We summarize all steps that lead to the computation of the global mean $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$ in algorithm 1. Note that Pr(**D**), needed to transition between conditional and restricted expectations in formula (3.4), is computed with virtually no additional cost in step (iv) of the algorithm.

### (c) Pulley principle for evolutionary mappings

Suppose that we are interested in a mean mapping summary $\mathrm{E}[H(M_{\theta}) \mid \mathbf{D}]$, obtained as a sum of local mapping summaries over *all branches* of the phylogenetic tree *τ*. We would like to know whether this quantity changes when we move the root of *τ* to a different location.

Recall that reversibility of the Markov chain $\{X_t\}$ makes Pr(**D**) invariant to the root placement in *τ* if the root distribution **π** is the stationary distribution of $\{X_t\}$ (Felsenstein 1981). Felsenstein's pulley principle rests on the detailed balance condition

$$\pi_i\, p_{ij}(t) = \pi_j\, p_{ji}(t) \quad (3.9)$$

and the Chapman–Kolmogorov relationship

$$p_{ij}(t_1 + t_2) = \sum_{k=1}^m p_{ik}(t_1)\, p_{kj}(t_2). \quad (3.10)$$

Applying both formulae (3.9) and (3.10) once in equation (3.2) allows one to move the root to any position along the two root branches without changing Pr(**D**). Therefore, Pr(**D**) is invariant to moving the root to any position on any branch of *τ*, since we can repeatedly apply the detailed balance condition and the Chapman–Kolmogorov equation.

Invariance of Pr(**D**) to root placement together with formula (3.4) suggests that root position invariance of conditional expectations holds if and only if invariance of joint expectations is satisfied. Consider a two-tip phylogeny with branches of length $t_1$ and $t_2$ leading to observed trait states $D_1$ and $D_2$, respectively. According to formulae (1.2) and (3.5), we may expect that

$$\mathrm{E}\left[\left(h(M_{b_1}) + h(M_{b_2})\right) 1_{\{\mathbf{D}\}}\right] = \sum_{i=1}^m \pi_i\, e_{iD_1}(h, t_1)\, p_{iD_2}(t_2) + \sum_{i=1}^m \pi_i\, p_{iD_1}(t_1)\, e_{iD_2}(h, t_2) = \pi_{D_1}\, e_{D_1 D_2}(h, t_1 + t_2) \quad (3.11)$$

depends only on the sum $t_1 + t_2$. Therefore, we can move the root anywhere on this phylogeny without altering the expectations if $\{X_t\}$ is reversible. It is easy to see that repeated application of derivation (3.11) readily allows for extension of the root invariance principle to *n*-tip phylogenies.

In derivation (3.11), we use the identities

$$e_{ij}(h, t_1 + t_2) = \sum_{k=1}^m \left[e_{ik}(h, t_1)\, p_{kj}(t_2) + p_{ik}(t_1)\, e_{kj}(h, t_2)\right] \quad (3.12)$$

and

$$\pi_i\, e_{ij}(h, t) = \pi_j\, e_{ji}(h, t). \quad (3.13)$$

Equation (3.12) splits computation of the expected summary on interval [0, *t*_{1}+*t*_{2}) into calculations bound to intervals [0,*t*_{1}) and [*t*_{1},*t*_{1}+*t*_{2}) with the help of the total expectation law and the Markov property. This derivation parallels that of the Chapman–Kolmogorov equation (3.10). Identity (3.13) requires more care as it *does not hold* for all choices of function *h*. Using equation (2.12), detailed balance condition (3.13) holds for $h = R_{\mathbf{w}}$. However, equation (2.3) suggests that we can guarantee the detailed balance condition (3.13) for $h = N_{\mathcal{L}}$ if and only if the labelled set is symmetric, that is, $(i,j) \in \mathcal{L}$ implies $(j,i) \in \mathcal{L}$.
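The root invariance claimed by derivation (3.11) can be checked numerically for a reward summary, for which detailed balance (3.13) always holds. The sketch below (an illustrative two-state chain) slides the root along a two-tip path while keeping $t_1 + t_2$ fixed:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

def expected_rewards(Q, w, t, steps=4000):
    # local expectation E(R_w, t) via quadrature of the integral (2.11)
    zs = np.linspace(0.0, t, steps + 1)
    vals = np.array([expm(Q * z) @ np.diag(w) @ expm(Q * (t - z)) for z in zs])
    return trapezoid(vals, zs, axis=0)

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])         # stationary root distribution, so the pulley applies
w = np.array([1.0, 0.0])                      # reward = dwelling time in state 0
D1, D2 = 0, 1

def joint_expectation(t1, t2):
    # left-hand side of (3.11) for a root placed t1 from tip 1 and t2 from tip 2
    E1, E2 = expected_rewards(Q, w, t1), expected_rewards(Q, w, t2)
    P1, P2 = expm(Q * t1), expm(Q * t2)
    return pi @ (E1[:, D1] * P2[:, D2]) + pi @ (P1[:, D1] * E2[:, D2])

# sliding the root while keeping t1 + t2 fixed leaves the expectation unchanged
a = joint_expectation(0.3, 0.9)
b = joint_expectation(0.7, 0.5)
assert np.isclose(a, b, atol=1e-6)
```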

### (d) Prior expectations of mapping summaries

In many applications of stochastic mapping, one wishes to compare the prior to posterior expectations of summaries (Nielsen 2002). In this section, we derive the formulae necessary for computing prior expectations. Similar to our derivation of posterior expectations, we begin by considering an arbitrary branch *b*^{*} and use the law of total expectation to arrive at

$$\mathrm{E}\left[h(M_{b^*})\right] = \sum_{\mathbf{i}} \frac{e_{i_{p(b^*)} i_{c(b^*)}}(h, t_{b^*})}{p_{i_{p(b^*)} i_{c(b^*)}}(t_{b^*})} \Pr(\mathbf{i}), \quad (3.14)$$

where $e_{ij}(h, t)/p_{ij}(t)$ is the marginal local expectation of the mapping summary, conditional on the states at the ends of the branch.

The identity $\sum_{j=1}^m p_{ij}(t) = 1$ allows us to eliminate summation over some internal node states in formula (3.14) and consider only those internal nodes that lie on the path connecting the root of *τ* and *c*(*b*^{*}). If **π** is the stationary distribution of $\{X_t\}$, then formula (3.14) together with the identities $\sum_{i=1}^m \pi_i\, p_{ij}(t) = \pi_j$ and $\sum_{j=1}^m p_{ij}(t) = 1$ simplifies prior local expectations even further,

$$\mathrm{E}\left[h(M_{b^*})\right] = \sum_{i=1}^m \sum_{j=1}^m \pi_i\, e_{ij}(h, t_{b^*}). \quad (3.15)$$
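Formula (3.15) can be sanity checked numerically: for the dwelling-time reward in state *i*, the prior expectation at stationarity must equal $t\, \pi_i$. A Python sketch with an illustrative two-state chain:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

def expected_rewards(Q, w, t, steps=2000):
    # local expectation E(R_w, t) via quadrature of the integral (2.11)
    zs = np.linspace(0.0, t, steps + 1)
    vals = np.array([expm(Q * z) @ np.diag(w) @ expm(Q * (t - z)) for z in zs])
    return trapezoid(vals, zs, axis=0)

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
pi = np.array([2.0 / 3.0, 1.0 / 3.0])         # stationary distribution of Q
w = np.array([1.0, 0.0])                      # reward = dwelling time in state 0
t = 0.6

# formula (3.15): prior expectation = sum_ij pi_i e_ij(h, t)
prior = pi @ expected_rewards(Q, w, t) @ np.ones(2)

# at stationarity the expected dwelling time in state 0 over [0, t) is t * pi_0
assert np.isclose(prior, t * pi[0], atol=1e-6)
```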

The fact that prior local expectations at stationarity compute with virtually no additional burden has immediate practical implications. In the context of posterior predictive model checking, researchers often need to simulate *L* independent and identically distributed (iid) realizations $\mathbf{D}_1, \ldots, \mathbf{D}_L$ of data at the tips of *τ* and then calculate the summary-based discrepancy measure (Nielsen 2002). Since we know the ‘true’ model under simulation, we can approximate this discrepancy measure with the prior expectation

$$\frac{1}{L} \sum_{l=1}^L \mathrm{E}\left[H(M_{\theta}) \mid \mathbf{D}_l\right] \approx \sum_{\mathbf{D}^*} \mathrm{E}\left[H(M_{\theta}) \mid \mathbf{D}^*\right] \Pr(\mathbf{D}^*) = \mathrm{E}\left[H(M_{\theta})\right], \quad (3.16)$$

where **D**^{*} ranges over all possible trait values at the tips of *τ*. In molecular evolution applications, *L* is of the order of 10^{3}–10^{5}, and hence approximation (3.16) should work well.

### (e) Comparison with simulation-based stochastic mapping

We commented earlier that the Monte Carlo algorithm for stochastic mapping is an alternative and very popular way to compute expectations of mapping summaries. This algorithm consists of two major steps. In the first step, the tree is traversed once to compute $F_u$ for each node. In the second step, internal node states **i** are simulated conditional on **D** (Pagel 1999). Then, conditional on **i**, one simulates CTMC trajectories on each branch of the tree and computes summaries of interest. This second step is repeated *N* times, producing a Monte Carlo sample of mapping summaries whose averages approximate the branch-specific expectations.

The running time of the algorithm depends on *N* and the computational efficiency of generating CTMC trajectories. The number of samples *N*, required for accurate estimation, varies with **Q** and **T**. Unfortunately, this aspect of stochastic mapping is largely ignored in the literature and by practitioners. Although more than one way exists to simulate trajectories conditional on starting and ending states (Rodrigue *et al*. 2008), rejection sampling is the most common (Nielsen 2002). Assessing the efficiency of rejection sampling is complicated, because the efficiency depends not only on the choice of CTMC parameters, but also on the observed data patterns **D**. We illustrate these difficulties using a CTMC on a state space of 64 codon triplets. We take two sites from an alignment of 129 HIV sequences that we discuss later in one of our examples. We call the first site slow evolving, as it exhibits only three different codons. The second, fast evolving site exhibits nine different codons. We take a one-parameter slice of our Markov chain Monte Carlo (MCMC) sample and run rejection sampling to estimate the expected number of mutations for all branches of *τ*. For each trial, we record the total number of rejections on all branches of *τ* and the absolute errors of the estimated mean number of synonymous mutations summed over all branches. We summarize the results of this experiment in table 1. We see a 400-fold increase in the number of rejections required to simulate trajectories conditional on the fast site compared with equivalent simulations based on the slow site. The Monte Carlo error decreases as the number of Monte Carlo iterations increases, but not as fast as one would hope.

In summary, simulation-based stochastic mapping requires simulation of CTMC trajectories, which is not a trivial computational task. Assessing the accuracy of simulation-based methods is cumbersome and difficult to automate. Our algorithm 1 removes both simulation components from stochastic mapping calculations and therefore should be the preferred way of calculating expectations of mapping summaries.

## 4. Examples

### (a) Detecting coevolution via dwelling times

In this section, we reformulate a previously developed simulation-based method for detection of correlated evolution (Huelsenbeck *et al*. 2003) in terms of a Markov reward process. We consider two primate evolutionary traits, oestrus advertisement (EA) and multimale mating system (MS), analysed by Pagel & Meade (2006). These authors first use cytochrome *b* molecular sequences to estimate the posterior distribution of the phylogenetic relationship among 60 Old World monkey and ape species. Using 500 MCMC samples from the posterior distribution of phylogenetic trees, Pagel and Meade run another MCMC chain, this time with a reversible jump component that explores a number of CTMC models involving EA and MS traits, and assess the models' posterior probabilities to learn about EA/MS coevolution. The authors find support in favour of a hypothesis stating that EA presence correlates with MS presence. The trait data are shown in figure 2 together with a phylogenetic tree, randomly chosen from the posterior sample. While this is not the case for Pagel & Meade (2006), reversible jump MCMC for model selection can be difficult to implement, especially as the number of trait states grows. Methods that simply require data fitting under the null model are warranted. Consequently, we revisit this dataset and apply posterior predictive model diagnostics to test the hypothesis of independent evolution between EA and MS traits.

Our null model assumes that EA and MS evolve independently as two two-state Markov chains $\{X_t^{(1)}\}$ and $\{X_t^{(2)}\}$, where 0 and 1, respectively, stand for trait absence and presence. Let the infinitesimal generators of the EA and MS CTMCs be

$$Q_{\mathrm{EA}} = \begin{pmatrix} -\alpha_1 & \alpha_1\\ \beta_1 & -\beta_1 \end{pmatrix} \quad \text{and} \quad Q_{\mathrm{MS}} = \begin{pmatrix} -\alpha_2 & \alpha_2\\ \beta_2 & -\beta_2 \end{pmatrix}. \quad (4.1)$$

We form a product Markov chain $Y_t = (X_t^{(1)}, X_t^{(2)})$ on the state space {(0,0), (0,1), (1,0), (1,1)} that keeps track of presence/absence of the two traits simultaneously, assuming that they evolve independently. The generator of the product chain is obtained via the Kronecker sum (⊕),

$$Q_Y = Q_{\mathrm{EA}} \oplus Q_{\mathrm{MS}} = Q_{\mathrm{EA}} \otimes I_2 + I_2 \otimes Q_{\mathrm{MS}}. \quad (4.2)$$

The Kronecker sum representation extends to general finite state-space Markov chains and to an arbitrary number of independently evolving traits (Neuts 1995). Computationally, this representation is advantageous, because eigenvalues and eigenvectors of a potentially high-dimensional product chain generator derive analytically from eigenvalues and eigenvectors of the low-dimensional individual generators (Laub 2004).
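A short Python sketch illustrates the Kronecker sum construction (4.2) and the eigenvalue property mentioned above; the rate values below are hypothetical and chosen only for illustration:

```python
import numpy as np

def two_state_generator(alpha, beta):
    # generator of a binary presence/absence chain, as in (4.1)
    return np.array([[-alpha, alpha], [beta, -beta]])

# hypothetical rate values for illustration only
Q_EA = two_state_generator(0.3, 1.1)
Q_MS = two_state_generator(0.8, 0.4)
I2 = np.eye(2)

# Kronecker sum (4.2): product chain on states (0,0), (0,1), (1,0), (1,1)
Q_Y = np.kron(Q_EA, I2) + np.kron(I2, Q_MS)
assert np.allclose(Q_Y.sum(axis=1), 0.0)      # rows of a generator sum to zero

# eigenvalues of a Kronecker sum are the pairwise sums of the factors' eigenvalues
d1 = np.linalg.eigvals(Q_EA)
d2 = np.linalg.eigvals(Q_MS)
pairwise = np.sort(np.add.outer(d1, d2).ravel())
assert np.allclose(np.sort(np.linalg.eigvals(Q_Y)), pairwise, atol=1e-9)
```

This pairwise-sum property is what makes the diagonalization of a large product chain essentially free once the small factor generators are diagonalized.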

To test the independent evolution model fit via posterior predictive diagnostics, we need a discrepancy measure (Meng 1994). Following Huelsenbeck *et al*. (2003), we employ mean dwelling times to form a discrepancy measure. Let $T_{\mathrm{tot}} = \sum_b t_b$ be the tree length of *τ*. We define the mean dwelling times $Z_i^{(1)}$ and $Z_i^{(2)}$ of traits $X_t^{(1)}$ and $X_t^{(2)}$ in state *i*, and the mean dwelling time $Z_{ij}$ of the product chain $Y_t$ in state (*i*, *j*) for *i*, *j* = 0, 1. More formally, we set

$$Z_i^{(k)} = \mathrm{E}\left[\sum_{b} R_{\mathbf{w}(i)}(M_b) \,\Big|\, \mathbf{D}^{(k)}\right], \quad k = 1, 2, \qquad Z_{ij} = \mathrm{E}\left[\sum_{b} R_{\mathbf{w}(i,j)}(M_b) \,\Big|\, \mathbf{D}^{(1)}, \mathbf{D}^{(2)}\right], \quad (4.3)$$

where the reward vector **w**(*i*) assigns reward one to trait state *i* and zero to the other state, **w**(*i*, *j*) assigns reward one to product state (*i*, *j*) and zero to the remaining product states, the sums extend over all branches of *τ*, and $D^{(1)}$, $D^{(2)}$ are observations of the two traits on the tips of *τ*.

Using the dwelling times, we define ‘expected’ $\epsilon_{ij}$ and ‘observed’ $\eta_{ij}$ fractions of time the two traits spend in states *i* and *j*,

$$\epsilon_{ij} = \frac{Z_i^{(1)}}{T_{\mathrm{tot}}} \times \frac{Z_j^{(2)}}{T_{\mathrm{tot}}} \quad \text{and} \quad \eta_{ij} = \frac{Z_{ij}}{T_{\mathrm{tot}}}. \quad (4.4)$$

We use quotation marks because both quantities are not directly observed. To quantify the deviation from the null hypothesis of independence seen in the data, we introduce the discrepancy measure,

$$\Delta = \sum_{i=0}^1 \sum_{j=0}^1 \left(\eta_{ij} - \epsilon_{ij}\right)^2. \quad (4.5)$$

This measure returns small values under the null and implicitly depends on *τ* and branch lengths **T**. We account for this dependence and phylogenetic uncertainty by averaging our results over a finite sample from the posterior distribution of *τ* and **T**, obtained from the molecular sequence data.

We use the software package BayesTraits to accomplish this averaging and to produce an MCMC sample from the posterior distribution of $Q_{\mathrm{EA}}$ and $Q_{\mathrm{MS}}$, assuming the null model of independent evolution of the two traits (Pagel *et al*. 2004). Each iteration sample from the output of BayesTraits consists of $(Q_{\mathrm{EA}}, Q_{\mathrm{MS}}, \tau, \mathbf{T})$ drawn from their joint posterior distribution. Given these model parameters, we generate a new dataset $D^{\mathrm{rep}}$ and compute the observed and predicted discrepancies for each iteration. We then compare their marginal distributions by plotting their corresponding histograms (figure 3). In figure 3, we also plot the observed against predicted discrepancies to display the correlation between these two random variables. The apparent disagreement between observed and predicted discrepancies is a manifestation of poor fit of the independent model of evolution. The observed discrepancy consistently exceeds the predicted quantity. To illustrate the performance of predictive diagnostics when the independent model fits the data well, we simulate trait data under this model along one of the 500 *a posteriori* supported phylogenetic trees. Figure 3*c*,*d* depicts the result of the posterior model diagnostics applied to the simulated data. In contrast to the observed primate data, the simulated data do not exhibit disagreement between the observed and predicted discrepancies.

Disagreement between the observed and predicted discrepancies can be quantified using a tail probability, called a posterior predictive *p*-value,

$$\mathrm{ppp} = \Pr\left[\Delta(\mathbf{D}^{\mathrm{rep}}) \ge \Delta(\mathbf{D}) \mid \mathbf{D}\right], \quad (4.6)$$

where the tail probability is taken over the posterior distribution of the independent model. In practice, given *N* MCMC iterations, one estimates posterior predictive *p*-values via

$$\widehat{\mathrm{ppp}} = \frac{1}{N} \sum_{g=1}^N 1_{\left\{\Delta\left(\mathbf{D}^{\mathrm{rep},g};\, \theta^{(g)}\right) \ge \Delta\left(\mathbf{D};\, \theta^{(g)}\right)\right\}}, \quad (4.7)$$

where $\theta^{(g)}$ are the parameter values realized at iteration *g*, and $\mathbf{D}^{\mathrm{rep},g}$ is a dataset simulated using these parameter values. Following this recipe, we estimate ppp for the primate data and the artificial data. The disagreement between the ‘observed’ and ‘predicted’ discrepancies is reflected in a low ppp = 0.0128. By contrast, the ppp = 0.3139 for the simulated data supports agreement between the observed and predicted distributions of *Δ*.
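The estimator (4.7) amounts to one comparison per MCMC iteration. The following toy Python sketch (with synthetic discrepancy draws standing in for real MCMC output) shows that, when the observed and replicated discrepancies share a distribution, the estimated ppp sits near 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000                                    # number of MCMC iterations (toy value)

# toy discrepancies: Delta(D; theta^(g)) and Delta(D^rep,g; theta^(g)) per iteration;
# here both are drawn from the same distribution, mimicking a well-fitting model
obs_disc = rng.gamma(shape=2.0, scale=1.0, size=N)
rep_disc = rng.gamma(shape=2.0, scale=1.0, size=N)

# Monte Carlo estimator (4.7) of the posterior predictive p-value (4.6)
ppp = np.mean(rep_disc >= obs_disc)
assert abs(ppp - 0.5) < 0.02                  # near 0.5 when the model fits
```

Extreme estimated values (near 0 or 1) flag systematic disagreement between the observed and replicated discrepancies, as seen for the primate data above.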

### (b) Mapping synonymous and non-synonymous mutations

In this section, we consider the important task of mapping synonymous and non-synonymous mutations onto branches of a phylogenetic tree. Our point of departure is a recent ambitious analysis of HIV intrahost evolution by Lemey *et al*. (2007), who use sequence data originally reported by Shankarappa *et al*. (1999). The authors attempt to estimate branch-specific synonymous and non-synonymous mutation rates and then project these measurements onto a time axis. This projection enables one to relate the time evolution of selection processes with clinical covariates. Lemey *et al*. (2007) find fitting codon models computationally prohibitive in this case. Instead, they first fit a DNA model to the intrahost HIV sequences, obtain a posterior sample of phylogenies with branch lengths, and then use these phylogenies to fit a codon model to the same DNA sequence alignment. Instead of fitting two different models to the data, we propose to use just a DNA model and exploit mapping summaries to predict synonymous and non-synonymous mutation rates.

Suppose we observe a multiple DNA sequence alignment **D** of a protein-coding region with *L* sites and that all *C* = *L*/3 codons are aligned to each other such that the coding region starts at site 1 of **D** (in other words, there is no frameshift). Following the codon partitioning model of Yang (1996), we assume that sites corresponding to all first codon positions evolve according to a standard HKY model with generator $Q_1 = Q_{\mathrm{HKY}}(\kappa_1, \pi_1)$, where $\kappa_1$ is the transition–transversion ratio and $\pi_1$ is the stationary distribution, both just for the first codon position appropriately constrained. Similarly, we define CTMC generators $Q_2$ and $Q_3$ for the other two codon positions with independent parameters. Assuming that all *L* nucleotide sites in **D** evolve independently, the three codon position HKY models together induce a product Markov chain model on the space of codons {AAA, AAC, …, TTT}, where codons are arranged in lexicographic order with respect to the nucleotide order (A, C, G, T). The generator of this product CTMC is

$$Q_{\mathrm{codon}} = Q_1 \oplus Q_2 \oplus Q_3 = Q_1 \otimes I_4 \otimes I_4 + I_4 \otimes Q_2 \otimes I_4 + I_4 \otimes I_4 \otimes Q_3. \quad (4.8)$$

With this Markov chain on the codon space, we define a labelling $\mathcal{L}^{(s)}$ that contains all possible pairs of codons that translate into the same amino acid. All other codon pairs are collected into a labelling set $\mathcal{L}^{(n)}$. Clearly, transitions between elements of $\mathcal{L}^{(s)}$ constitute synonymous mutations, and non-synonymous mutations are represented by transitions between elements of $\mathcal{L}^{(n)}$. In this manner, counting processes $N_{\mathcal{L}^{(s)}}$ and $N_{\mathcal{L}^{(n)}}$ map synonymous and non-synonymous mutations onto specific branches of *τ*. We consider HIV sequences from patient 1 of Shankarappa *et al*. (1999) and approximate the posterior distribution of our DNA model parameters using MCMC sampling implemented in the software package BEAST (Drummond & Rambaut 2007). The serially sampled HIV sequences permit us to estimate the branch lengths **T** in units of clock time, months in this case. For each saved MCMC sample, we compute branch-specific rates of synonymous and non-synonymous mutations,

$$\rho_b^{(s)} = \frac{1}{t_b} \sum_{c=1}^C \mathrm{E}\left[N_{\mathcal{L}^{(s)}}(M_b) \mid \mathbf{D}^{(c)}\right] \quad \text{and} \quad \rho_b^{(n)} = \frac{1}{t_b} \sum_{c=1}^C \mathrm{E}\left[N_{\mathcal{L}^{(n)}}(M_b) \mid \mathbf{D}^{(c)}\right], \quad (4.9)$$

where we denote the data at codon site *c* by $\mathbf{D}^{(c)}$. We also record the fraction of non-synonymous mutations, $\rho_b^{(n)}/(\rho_b^{(s)} + \rho_b^{(n)})$. Similar to Lemey *et al*. (2007), we summarize these measurements by projecting them on the time axis. To this end, we form a finite time grid and produce a density profile of the synonymous and non-synonymous rates, and of the non-synonymous mutation fractions for each time interval between grid points (figure 4). Both synonymous and non-synonymous rate density profiles are consistently bimodal across time. Interestingly, the modes also stay appreciably constant. The density profile of the non-synonymous mutation fractions is multimodal and fairly complex. There are a considerable number of branches that exhibit strong negative and positive selection. In general, when comparing branches that occur earlier in the tree with those that occur later, the non-synonymous mutation fraction has first a modest upward trend through time and then descends to lower values, consistent with other patterns of evolutionary diversity reported by Shankarappa *et al*. (1999).
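Equation (4.8) can be assembled directly with Kronecker products. The sketch below (hypothetical HKY parameter values, chosen only for illustration) builds the 64 × 64 codon generator and verifies that the product chain changes exactly one codon position per transition, the structural fact underlying the synonymous/non-synonymous labelling:

```python
import numpy as np
from itertools import product

def hky_generator(kappa, pi):
    # HKY generator in nucleotide order (A, C, G, T); transitions are A<->G, C<->T
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    Q = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if i != j:
                Q[i, j] = pi[j] * (kappa if (i, j) in transitions else 1.0)
        Q[i, i] = -Q[i].sum()
    return Q

# hypothetical position-specific parameters for illustration only
pis = [np.array([0.3, 0.2, 0.3, 0.2]), np.full(4, 0.25), np.array([0.1, 0.4, 0.4, 0.1])]
kappas = (2.0, 3.0, 2.5)
Q1, Q2, Q3 = (hky_generator(k, p) for k, p in zip(kappas, pis))
I4 = np.eye(4)

# equation (4.8): Q_codon = Q1 (+) Q2 (+) Q3 on 64 lexicographically ordered codons
Q_codon = (np.kron(np.kron(Q1, I4), I4)
           + np.kron(np.kron(I4, Q2), I4)
           + np.kron(np.kron(I4, I4), Q3))
assert np.allclose(Q_codon.sum(axis=1), 0.0)  # rows of a generator sum to zero

# the product chain only moves between codons differing at exactly one position
codons = list(product(range(4), repeat=3))    # lexicographic codon order
for a in range(64):
    for b in range(64):
        if a != b:
            diff = sum(x != y for x, y in zip(codons[a], codons[b]))
            if diff == 1:
                assert Q_codon[a, b] > 0.0
            else:
                assert abs(Q_codon[a, b]) < 1e-12
```

The labelling sets $\mathcal{L}^{(s)}$ and $\mathcal{L}^{(n)}$ would then partition the single-position codon pairs according to a genetic code table, after which the counting machinery of §2(a) applies unchanged on this 64-state space.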

Intrigued by the multimodality observed in figure 4, we investigate this issue further. Lemey *et al*. (2007) consider several branch categories in their analysis, e.g. internal and terminal. We decide to test whether differences exist between the selection forces acting on internal and terminal branches of *τ*. We define the fractions of non-synonymous mutations on internal and terminal branches in equations (4.10) and (4.11) and plot their posterior histograms in figure 5. These histograms do not overlap, suggesting different fractions of non-synonymous mutations for internal and external branches. To test this hypothesis more formally, we form a discrepancy measure in equation (4.12). As in our previous example, we compare the observed discrepancy with the expected discrepancy, where *D*^{rep} is a multiple sequence alignment simulated under the codon partitioning model with parameters *Q*_{codon}, *τ* and **T**. Invoking approximation (3.16) and recalling that our model assumes the same substitution rates for each branch of *τ*, we deduce that equation (4.13) holds for all parameter values *Q*_{codon}, *τ* and **T** and replicated data *D*^{rep}. Plugging our new discrepancy measures into equation (4.7), we find that ppp < 0.001. Therefore, our posterior predictive test suggests that there is significant heterogeneity in selective pressure among the branches of *τ*.
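The posterior predictive p-value computed above reduces, given per-draw discrepancies, to a simple tail fraction. A generic sketch (the exact discrepancy of equation (4.12) is not reproduced here; the function name and the toy inputs are ours):

```python
import numpy as np

def posterior_predictive_pvalue(d_obs, d_rep):
    """Posterior predictive p-value: the fraction of posterior draws in
    which the discrepancy of the replicated data meets or exceeds the
    discrepancy of the observed data.  d_obs and d_rep are matched
    per-draw arrays of D(D) and D(D^rep)."""
    d_obs = np.asarray(d_obs)
    d_rep = np.asarray(d_rep)
    return float(np.mean(d_rep >= d_obs))

# toy example: observed discrepancies sit far above the replicated ones,
# so the ppp is extreme, as in the internal/terminal branch test
rng = np.random.default_rng(0)
ppp = posterior_predictive_pvalue(rng.normal(0.3, 0.02, 1000),
                                  rng.normal(0.0, 0.02, 1000))
```

An extreme ppp (near 0 or 1) indicates that the model cannot reproduce the observed discrepancy, which is the basis for rejecting branch homogeneity here.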

## 5. Discussion

In this paper, we develop a computationally efficient framework for mapping evolutionary trajectories onto phylogenies. Although we aim to keep this mathematical framework fairly general, our main interest with evolutionary mappings lies in computing the mean number of labelled trait transitions and the mean evolutionary reward that depends linearly on the time a trait occupies each of its states. These two mapping summaries have been the most promising building blocks for constructing statistical tests.

We build upon our earlier work involving single branch calculations for Markov-induced counting processes (Minin & Suchard 2008). In our extension, we introduce single branch calculations for evolutionary reward processes and devise algorithms that extend single branch calculations to mapping expectations of counting and reward processes onto branches across an entire phylogeny. Our main result generalizes Felsenstein's pruning algorithm, the workhorse of modern phylogenetic computation. The generalized pruning algorithm warrants two comments about its efficiency for performing simulation-free stochastic mapping. First, when the infinitesimal rates are unknown, a traditionally slow component of phylogenetic inference is the eigendecomposition of their matrix. Fortunately, this decomposition finds immediate reuse in our algorithm to calculate posterior expectations of mappings. Second, the algorithm requires only two traversals of the phylogenetic tree, and is therefore at most two times slower than the standard likelihood evaluation algorithm. In practice, we find our algorithm approximately 1.5 times slower than the likelihood calculation. We achieve this advantage because during the second traversal *n* terminal branches are not visited. Finally, the analogy with Felsenstein's algorithm yields a pulley principle for stochastic mapping and a reduction in computation for prior expectations.
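The reuse of the eigendecomposition can be illustrated on a single branch. The key quantity is the integral J(t) = ∫₀ᵗ e^{Qs} (Q ∘ L) e^{Q(t−s)} ds, whose (i, j) entry combines the probability of starting in state i and ending in state j with the expected number of labelled transitions along the way. A minimal NumPy sketch after Minin & Suchard (2008), assuming a diagonalizable rate matrix (the name `expected_counts_matrix` is ours):

```python
import numpy as np

def expected_counts_matrix(Q, L, t):
    """J(t) = integral_0^t exp(Qs) (Q * L) exp(Q(t - s)) ds, computed by
    reusing the eigendecomposition Q = U diag(lam) U^{-1}.  L is a 0/1
    matrix labelling the transitions to be counted.  Sketch after
    Minin & Suchard (2008); assumes Q is diagonalizable."""
    lam, U = np.linalg.eig(Q)
    Uinv = np.linalg.inv(U)
    B = Uinv @ (Q * L) @ U                       # labelled rates, eigenbasis
    dk = lam[:, None] - lam[None, :]             # lam_k - lam_l
    same = np.isclose(dk, 0.0)
    # auxiliary integrals I_kl = int_0^t e^{lam_k s} e^{lam_l (t-s)} ds,
    # with the limiting value t e^{lam_k t} when lam_k = lam_l
    I = np.where(same,
                 t * np.exp(lam[:, None] * t),
                 (np.exp(lam[:, None] * t) - np.exp(lam[None, :] * t))
                 / np.where(same, 1.0, dk))
    return U @ (B * I) @ Uinv

# toy check: a symmetric two-state chain with unit rates; counting all
# transitions, the expected number over a branch of length t is exactly t
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
L = np.array([[0.0, 1.0], [1.0, 0.0]])
J = expected_counts_matrix(Q, L, 2.0)
```

Dividing J(t) entrywise by the transition probabilities e^{Qt} gives counts conditional on the branch endpoints, and replacing Q ∘ L by a diagonal matrix of state-specific weights yields the analogous reward integral; the pruning recursion then propagates these single-branch quantities over the tree.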

Our examples demonstrate how our novel algorithm facilitates phylogenetic exploratory analysis and hypothesis testing. First, we use simulation-free stochastic mapping of occupancy times to re-implement Huelsenbeck *et al*.'s (2003) posterior predictive test of independent evolution. In our second example, we attempt to recover synonymous and non-synonymous mutation rates without resorting to codon models, using an independent codon partitioning model instead. We overcome this gross model misspecification with stochastic mapping, find intriguing multimodality of synonymous and non-synonymous rates, and use a posterior predictive model check to test differences in selection pressure between terminal and internal branches. We stress that our predictions are only as good as the model we use. For example, the terminal/internal branch differences may be due to a generally poor fit of our purposely misspecified model to the intrahost HIV data. However, we find this scenario unlikely in light of the recent demonstration of the excellent performance of codon partitioning models in analyses of protein coding regions (Shapiro *et al*. 2006).

Our examples illustrate the importance of hypothesis testing in statistical phylogenetics. In recent years, it has become clear that an evolutionary analysis almost never ends with tree estimation. Importantly, phylogenetic inference enables evolutionary biologists to tackle scientific hypotheses while appropriately accounting for ancestry-induced correlation in observed trait values (Huelsenbeck *et al*. 2000; Pagel & Lutzoni 2002). Several authors demonstrate that mapping evolutionary histories onto inferred phylogenies provides a convenient and probabilistically grounded basis for designing statistically rigorous tests of evolutionary hypotheses (Nielsen 2002; Huelsenbeck *et al*. 2003; Dimmic *et al*. 2005). Unfortunately, this important statistical technique has been hampered by the high computational cost of stochastic mapping. Our general mathematical framework and fast algorithms should secure a central place for stochastic mapping in the statistical toolbox of evolutionary biologists.

## Acknowledgments

We would like to thank Philippe Lemey for sharing his HIV codon alignments. We are also grateful to Andrew Meade for his help with the BayesTraits software, and to Ziheng Yang and two anonymous reviewers for their constructive criticism. M.A.S. is supported by an Alfred P. Sloan Research Fellowship, a John Simon Guggenheim Memorial Fellowship and NIH R01 GM086887.

## Footnotes

One contribution of 17 to a Discussion Meeting Issue ‘Statistical and computational challenges in molecular phylogenetics and evolution’.

- © 2008 The Royal Society