## Abstract

Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth–death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized datasets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high with credible intervals seldom excluding the true age and median relative error in the two datasets of 5.7% and 13.2%, respectively. The median relative standard error (RSD) was 9.2% and 7.2%, respectively, suggesting good precision, although with some outliers. In fact, in the two datasets we analyse, the phylogenetic estimate of fossil age is on average less than 2 Myr from the mid-point age of the geological strata from which it was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the ‘morphological clock', and for uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses.

This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.

## 1. Introduction

Contention between palaeontologists and molecular biologists over which data provides the most accurate inferences about evolutionary history has previously fostered an adversarial relationship between the two fields [1]. Although there has indeed been much controversy surrounding apparent discrepancies between palaeontological and molecular phylogenetic inferences [2], it is also clear that fossil and molecular data both produce broadly concordant views of evolutionary history [3]. The continual improvement of models and methods for statistical phylogenetic inference from molecular sequence data is well documented [4,5], and in recent years, it is arguably the case that molecular phylogenetics has taken primacy over the fossil record in providing a timescale for evolutionary history [1]. Nevertheless, molecular phylogenetic inference of evolutionary timescales relies critically on calibration by the fossil record [1].

Traditionally, the practice has been to use one or more fossils as ‘node calibrations' by associating their geologically derived age to a particular divergence in a molecular phylogeny. The age of the fossil is determined either by radiometric ageing of strata above and/or below the fossil, or more commonly by biostratigraphy. The difficulty lies in determining the appropriate ancestral divergence in the molecular phylogeny to associate the fossil with and the details of how this should be achieved within a full statistical inference framework [6–8]. Once achieved, node calibration confers age estimates to the remaining ancestral divergences in the phylogenetic tree by the assumption of a strict or relaxed molecular clock [9–13].

It may be less widely appreciated by molecular evolutionary biologists that the statistical phylogenetic revolution in molecular evolution has also been mirrored in the increasing application of statistical phylogenetic reasoning in macroevolutionary and systematic studies of the fossil record [14–17]. Here, we extend this tradition of applying phylogenetic reasoning to the fossil record by focusing on the question of what phylogenetic inference techniques can tell us about the age of a fossil, based solely on its morphological characteristics and through them, its phylogenetic and temporal relationships with a set of reference fossils.

The phylogenetic estimation of the age of a taxon based on its molecular sequence has been previously described [18,19] and applied to both ancient subfossil remains and rapidly evolving viral taxa. For example, this technique has been successfully employed to estimate the age of human subfossil remains based on an ancient mitochondrial genome sequence [20]. The same technique has also been used to estimate the age of viral samples based on molecular sequence data (e.g. [21]).

We extend this approach into the realm of discrete morphological evolution by presenting a statistical model of evolution that generates an expectation on the distribution of fossils, their morphological characters. This model has been previously presented in the context of divergence time dating [22–24].

In order to use discrete morphological comparative data to estimate fossil ages, it is necessary to assume a (relaxed) morphological clock. There is a long history of the study of the evolutionary rates of phenotypic characters [25–29], going at least back to Darwin's *Origin of Species* [30]. Darwin noted that ‘Species of different genera and classes have not changed at the same rate’ and illustrated this point with examples of ‘living fossils' such as the Silurian mollusc *Lingula* [30, p 313]. However in the same chapter Darwin goes on to say ‘In members of the same class the average amount of change, during long and equal periods of time, may, perhaps, be nearly the same’ (p 315). Nevertheless, phenotypic evolution has more typically been characterized as not evolving in a clock-like manner, especially when compared to molecular evolution [31]. While there are many examples of extremely slow and fast rates of phenotypic evolution in the literature, we would argue that this is also true for molecular rates. We are not aware of a comprehensive and systematic comparison of variation in evolutionary rates at the phenotypic and molecular levels. Regardless, for the datasets that we analyse, we adopt the point of view that variation in the rate of phenotypic evolution across the phylogeny can be accommodated with a relaxed morphological clock.

Our approach is distinct from alternative divergence time dating approaches in that it provides an explicit treatment of the temporal information contained in fossil remains, whether or not related molecular sequence data are available. This leads to an estimate of the age of the most recent common ancestor of a group of fossil and extant taxa. A key difference between this approach and earlier approaches to tip-calibrated ‘total-evidence’ dating [32] is the admission of a probability that each fossil taxon may represent a sampled ancestor of one or more taxa in the tree [22]. We exploit this framework to attempt the estimation of the age of individual fossils based solely on morphological data and their phylogenetic affinities to related taxa of known age. The method is applied to two rich and well-characterized morphological datasets: (i) 19 extant penguins and 36 fossil relatives [33,34] and (ii) a sample of nine extant canids and 116 fossil relatives [35].

## 2. Material and methods

Gavryushkina *et al.* [23] described a ‘total-evidence’ approach implemented in BEAST2 [36] for phylogenetic estimation of time-trees that employs both morphological data from fossils and extant taxa and molecular sequence data as equal partners under the rule of probability for estimating a time-tree. An equivalent method [24] is introduced within MrBayes [37]. The model of time-tree phylogeny employed is the so-called fossilized birth–death process [38], which forms a prior probability distribution on the space of sampled-ancestor trees [22,39].

We extend the approach in Gavryushkina *et al.* [23] further by investigating the consistency between the phylogenetic estimate of the age of a fossil and the corresponding fossil age range determined by geological and biostratigraphic evidence. This allows for the age of some of the fossils to be estimated solely based on their morphological characters and the phylogenetic affinities of their morphology to other fossils with known ages in the time-tree. We refer to this as the phylogenetic estimate of the fossil's age. In phylogenetically estimating the age of each of the fossils in turn, two questions can be answered: (i) how much information about an individual fossil's age is available from phylogenetic analysis of morphological data alone and (ii) what is the level of phylogenetic evidence in support of the palaeontological age range for a fossil? These two questions are investigated using two morphological datasets: one of 36 fossil penguins and their extant relatives [23,33,34] and one of 116 canid fossils and their extant relatives [35].

### (a) Phylogenetic estimates of the ages of penguin fossils

We used a dataset originally published by Ksepka *et al*. [34] consisting of morphological data from fossil and living penguin species. We used the same subset of the morphological data as Gavryushkina *et al.* [23], but we did not use the molecular sequence data from the living species. The morphological data matrix we used contains 36 fossil species, 19 extant species and 202 characters (ranging from binary to *k* = 7). The majority of these characters (more than 95%) have fewer than four states and 48 of the binary characters were encoded as presence/absence. The fossil age intervals had median values ranging from 5.55 to 61.05 Myr. As did Gavryushkina *et al.* [23], we treat 34 characters that were ordered in Ksepka *et al.* [34] as unordered. (See [23] for further details of data selection.)

For each of the 36 penguin fossils we performed a separate Bayesian phylogenetic analysis in which the focal fossil's palaeontological age constraints were replaced by the fossilized birth–death process prior, and thus we obtained a phylogenetic estimate of the fossil's age.

### (b) Phylogenetic estimates of the ages of canid fossils

The second dataset that we investigated was a morphological data matrix of 125 canid species (nine extant and 116 fossil; [35]) with 122 characters (ranging from binary to *k* = 5). The nine extant species represent about 25% of the extant canid species and include representatives of four genera (six *Canis*, one *Cuon*, one *Lycaon* and one *Urocyon*) and both tribes (eight Canini and one Vulpini). We had stratigraphic ranges based on palaeontological data for all 116 fossils (Graham Slater 2016, personal communications).

As with the penguin dataset, we performed an analysis for each of the 116 canid fossils in turn. Unlike the original study [35], we did not apply any other constraints or priors on ancestral divergence times beyond the ages of the fossils.

### (c) Phylogenetic analyses

Given the morphological data *D* and a stratigraphic age range for each fossil (with *l _{i}* being the lower age bound for fossil

*i*, and

*u*being the upper age bound for fossil

_{i}*i*), we sample phylogenetic trees with the fossils being tips or sampled ancestors, and each fossil having a specified age in the phylogenetic tree within its stratigraphic age range. The parameters of the fossilized birth–death model are summarized in

*η*, and the parameters of the model for morphological character evolution are summarized in

*θ*. More formally, we sample from where if each fossil age is within its stratigraphic age range specified in

*a*, and else. When we replaced the focal palaeontological age constraints of fossil

*i*by the fossilized birth–death process prior, we simply set

*l*= 0 and

_{i}*u*=

_{i}*T*(where 0 is present time and

*T*is the total height of the tree) to estimate the phylogenetic age of the focal fossil.

The fossilized birth–death model is defined by the following parameters *η* = (*T*, *d*, *r*, *s*): the time of the start of the process *T* prior to present time 0, the net diversification rate *d* (= speciation rate – extinction rate), the turnover *r* (=extinction rate/speciation rate) and the sampling probability with which a fossil is observed *s* (=sampling rate/(extinction rate + sampling rate)).

Following Gavryushkina *et al.* [23], we apply the Lewis-Mk model [40] for discrete morphological character evolution, which assumes a character can take *k* states, and the transition rates from one state to another are equal for all states.

We applied two phylogenetic models to the penguin dataset, Mk-1 and Mk-8, and we applied Mk-1 to the canid dataset. Mk-1 assumed a strict morphological clock (*μ*) and no gamma-distributed rate heterogeneity among sites [40]. Model Mk-8 [23] partitioned the alignment into partitions (six for the penguins), with the *i*th partition containing all characters that had *k* = *i* + 1 character states across the sampled taxa. Model Mk-8 also uses an uncorrelated lognormally distributed relaxed molecular clock [7] with parameters *μ* and *S* for the mean rate and log standard deviation of the rates, and an additional parameter *α* governing the shape for gamma-distributed rate variation across sites [41]. The prior distribution for *α* was uniform in the interval (0, 10).

We followed Yang & Rannala [11] in having a broad lognormal (*M* = −5.5, *S* = 2) prior on *μ* for all analyses and a gamma (*α* = 0.5396, *β* = 0.3819) prior on *S* for the relaxed clock analyses. For the penguin analyses, the parameters of the fossilized birth–death model tree prior were specified as described in the section ‘Computing the phylogenetic evidence for an age range’. Since we did not perform Bayes factor (BF) analyses for the canid dataset, we used the standard parametrization *η* = (*T*, *d*, *r*, *s*), with the following priors: uniform prior in the interval (0, 120) million years for origin *T*, lognormal (*M* = −3.5, *S* = 1.5) prior for diversification rate *d*, and unit uniform prior (0, 1) for turnover *r* and sampling proportion *s*.

### (d) Computing the phylogenetic evidence for an age range

The BF computes the evidence for one hypothesis (*H*_{1}) over another (*H*_{2}) as the ratio of the marginal probability of the data under each of the two hypotheses and a model *M*
2.1

We are interested in computing the BF that quantifies the amount of phylogenetic evidence in support of the palaeontological age range for each fossil. In this case, *H*_{1} is the hypothesis that the true fossil age is within the given palaeontological age range, and *H*_{2} is the alternative hypothesis that the true fossil age is outside the palaeontological range. A indicates strong support for *H*_{1}, given the model *M* is appropriate for the considered data.

The model *M* consists of two parts, . The model specifies the tree generation process giving rise to the number of observed samples and sampling times. The model *M*_{m} specifies the morphological evolution along the tree giving rise to the morphological characters for the samples. The data are the number of samples together with the sampling times () and the morphological characters for each sample (*D*_{m}).

For calculating the BF, the probabilities *p*(*H*_{1}|*D*, *M*) and *p*(*H*_{2}|*D*, *M*) are obtained directly from the Markov chain Monte Carlo (MCMC) output. It remains to calculate the probabilities *p*(*H*_{1}|*M*) and *p*(*H*_{2}|*M*). Since *H*_{1} and *H*_{2} are independent of *M*_{m}, we have

One way to determine would be to simulate trees under the model and record the fraction of sampling times within a given palaeontological age range. However, such a simulation approach turns out to be very time-consuming, and the procedure below provides a much faster evaluation of .

We derive some analytic results for evaluating . The model is the fossilized birth–death process with priors on its parameters *η* = (*T*, *d*, *r*, *s*). We derive the probability density of sampling a fossil at time *t* in the past, given the model . This probability density will allow us to directly determine .

For a given *T*, *d*, *r* and *s*, the probability density of sampling a fossil at time *t*, given the process does not go extinct for time *T*, is
with being the sampling rate, and *p _{i}*(

*t*;

*d*,

*r*) being the probability of a single lineage producing

*i*surviving lineages at time

*t*. The equation above calculates the required probability and the left term in that probability conditions on survival of the process (1 −

*p*

_{0}(

*T*;

*d*,

*r*)). Then we calculate the probability to have

*k*lineages at time

*t*before the present (

*p*(

_{k}*T*−

*t*;

*d*,

*r*)), multiply by the sampling rate

*kψ*, and weight by the probability that at least one lineage of the

*k*lineages survives to the present (1 −

*p*

_{0}(

*t*;

*d*,

*r*)

*). This expression is then summed over*

^{k}*k*= 1, … , ∞.

We simplify, using the equations for *p _{i}*(

*t*;

*d*,

*r*) given in Kendall [42], to obtain, with .

Next, we need to evaluate with *p*(*T*), *p*(*d*), *p*(*r*) and *p*(*s*) being the prior distributions for the parameters. This is done by sampling parameters from the prior distributions and then evaluating *p*(*t*|*T*, *d*, *r*, *s*).

We determined for the prior distributions as in Gavryushkina *et al*. [23],

This prior specification leads to a distribution of sampling time with almost all probability mass close to the present (figure 1, dashed-dotted line). Thus, is essentially zero, which leads to a huge BF. This means we always reject *H*_{2}, not because we necessarily agree with the palaeontological age range, but because our model has no prior weight for the palaeontological age range.

Inspection of our prior identifies two problems: (i) if we draw a large *T* and large *d*, we obtain very large trees with arbitrarily many species close to the present, thus we have most of the sampling times close to the present; and (ii) if we draw *r* and *s* close to 1, then we obtain a very large per-lineage sampling rate . Thus, these parameter combinations govern the probability density curve and cause again most prior weight to be close to the present.

We therefore assumed new prior distributions. The net diversification rate *d* : lognormal(*M* = −3.5, S = 0.5) was chosen with a smaller standard deviation which avoids too much weight on very fast growing trees. The turnover *r* : Uniform (0, 1) was set as before.

For *s*, we assume an implicit prior: we assume lognormal(−2, 1) for *ψ*, and
(with extinction rate *μ* = *rd*/(1 − *r*)). This avoids very high sampling rates.

For *T*, we also assume an implicit prior. We assume a uniform distribution on [1, 100] for the number of present day species, *N*. In expectation, we have species after time *T*. This leads to
Overall, this prior produces a sampling time distribution where old sampling times have a non-negligible weight (figure 1, solid line). The choice of an implicit prior for both *T* and *s* was important: only specifying the implicit prior on *T* yields the dashed line in figure 1, while only specifying the implicit prior on *s* yields the dotted line in figure 1. We used this new prior for our analyses and the BF calculation.

Changing to our new prior has immense impact on the BF analysis, but in our case has a minor effect on the posterior distribution of trees/parameters compared to using the prior in Gavryushkina *et al*. [23]. This investigation of the prior distribution on trees and sampling times highlights that whenever using BFs to test a hypothesis, we have to first investigate what our prior on the hypothesis is. In our example, the prior from Gavryushkina *et al*. [23] seemed reasonable for the parameters specified, however, this prior puts a negligible weight on hypothesis *H*_{1} for older fossils.

We want to note that the stepping-stone sampling approach [43] to calculate BFs would not have been directly applicable in our case. In stepping stone, sampling the is treated as part of the model, not part of the data. However, using a birth–death model, the sampling times are part of the data. The approach is valid when choosing a coalescent tree prior, as in that case sampling times are conditioned upon (and thus can be seen as part of the model assumptions) rather than being modelled (and thus are a realization of the model which means they are data). It is not clear if the stepping stone approach can be directly applied for models with number of tips being part of the data. In general, even if stepping stone approaches are appropriate, we recommend inspection of *P*(*H*_{1}|*M*) to ensure that the prior on the hypothesis to be tested is sensible. Such an investigation reveals if the cause of a high (or low) BF is due to the prior or due to signal in the data.

## 3. Results

### (a) Penguins conform well to a morphological clock

Although Mk-1 is a very simple model, the phylogenetic estimates of the ages of the penguin fossils were remarkably consistent with their palaeontological age ranges. Figure 2*a* plots the geological age and range against the phylogenetic estimates of fossil age. The points in this plot have *R*^{2} = 0.903. The median error (where the *error* is the difference between the phylogenetic median and the geological median) is 1.96 Myr. The median relative error (where the *relative error* is the *error* divided by the geological median) was 5.7% and the median relative standard deviation (RSD; defined as the standard deviation of the marginal posterior divided by the posterior median estimate) was 9.2%. A summary of the individual estimates are tabulated in table 1.

As judged by BFs, only one fossil exhibited strong evidence (i.e. log BF <− 3.0) that the phylogenetic estimate of fossil age was inconsistent with the geological age range. The log BF for *Paraptenodytes antarcticus* was −3.4. In fact, the majority of the fossils (23/36 = 64%) had strong positive evidence for the geological age range (i.e. log BF > 3.0). Likewise, if we consider only the posterior probability that the fossil is in the geological age range, then three of the 36 fossils have a posterior probability less than 0.05, suggesting low posterior support for the phylogenetic estimate of fossil age being within the palaeontological age range. These three fossils were *Madrynornis mirandus*, *Paraptenodytes antarcticus* and *Sphenicus muizoni*, with posterior probabilities that the phylogenetic estimate of fossil age is in the palaeontological range of 0.007, 0.001 and 0.001, respectively. All other fossils have posterior probabilities of more than 0.05 of their age being in the palaeontological range. It is worth noting that the absolute discrepancy in the ages is still quite moderate for the three fossils with low posterior probabilities, with *M. mirandus*: 6.3 Myr versus 10 Myr (phylogenetic estimate of fossil age versus palaeontological age), *P. antarcticus*: 29.9 versus 22, and *S. muizoni*: 5.2 versus 9.1. The small posterior probabilities are partially caused in these cases because the corresponding palaeontological age range is narrow, apparently suggesting very precise geological knowledge of the ages of these three fossils.

### (b) Relaxing the clock, site partitions, rate variation among sites

Mk-8 was the best-fitting model for the penguin dataset according to the analysis of Gavryushkina *et al.* [23]. As with Mk-1, this model produced phylogenetic estimates of fossil age that were very concordant with the geological age ranges of the fossils (figure 2*b*), with an overall *R*^{2} = 0.924. The median error was 2.05 Myr across all 36 fossils. In this analysis, none of the fossils exhibited any evidence (i.e. log BF < 0.0) that the phylogenetic estimate of fossil age was inconsistent with the geological age range. However, if we consider the posterior probability that the fossil is in the geological age range, then five of the 36 fossils had a posterior probability less than 0.05 for Mk-8, suggesting low posterior support for the phylogenetic estimate of fossil age being within the palaeontological age range. These five fossils were *Madrynornis mirandus*, *Paraptenodytes antarcticus*, *Perudyptes devriesi*, *Sphenicus muizoni* and *Waimanu manneringi*, with posterior probabilities that the phylogenetic estimate of fossil age is in the palaeontological range of 0.035, 0.018, 0.046, 0.004 and 0.037, respectively. All other fossils have posterior probabilities of more than 0.05 of their age being in the palaeontological range. Again the absolute discrepancy in the ages are quite moderate for the five fossils with low posterior probabilities, with *M. mirandus*: 6.7 versus 10 Myr (phylogenetic estimate of fossil age versus palaeontological age), *P. antarcticus*: 28.0 versus 22, *P. devriesi*: 49.0 versus 40, *S. muizoni*: 5.1 versus 9.1 and *W. manneringi*: 56.7 versus 61.05. A summary of all the individual estimates are tabulated in table 2. The individual marginal posterior distributions of phylogenetic estimates of fossil age under Mk-8 and the corresponding geological range are shown in figures 3 and 4.

### (c) Comparison of simple and complex model results

Overall the results of analysing the penguin dataset with the Mk-1 and Mk-8 models were strikingly concordant. Figure 5 shows four regressions between the two models: (i) regression of estimated phylogenetic estimates of fossil ages of Mk-1 against Mk-8, (ii) regression of the error in the phylogenetic estimates of fossil ages of Mk-1 against Mk-8, (iii) regression of posterior probability of palaeontological range of Mk-1 against Mk-8 and (iv) regression of BF for palaeontological range of Mk-1 against Mk-8. Under Mk-8, all fossils have positive evidence for their geological age range, whereas under Mk-1 there are a handful of fossils with negative evidence for the corresponding geological age range. Furthermore, assuming the median geological age is the truth, the variance in the phylogenetic estimation error of the fossil ages is larger under Mk-1 than under Mk-8. This evidence, along with the previous result that Mk-8 has a higher marginal likelihood than Mk-1 [23], suggests that the relaxed model is overall a better fit to the data. Under both models, there is a positive correlation between the precision of the age estimate and the number of non-ambiguous characters coded for the fossil taxon (figure 6).

### (d) Canids conform well to a morphological clock

The canid dataset shows remarkable consistency between stratigraphic age ranges and phylogenetic estimates of fossil ages, even with the simple strict morphological clock model (Mk-1). The *R*^{2} is 0.897 between the phylogenetic and stratigraphic age ranges (figure 7). Only 13 out of 116 fossils (11%) do not have the mean stratigraphic age in the 95% credible interval of the phylogenetic estimate of fossil age and there are no extreme outliers. The median error is 1.56 Myr, which in absolute terms is more accurate than the age estimates for the penguin dataset. However, the median relative error was 13.2%, more than twice that for the penguin fossils.

This dataset contains half as many morphological characters as does the penguin dataset (122 versus 245); nevertheless, the individual age estimates are much more precise in absolute terms (median HPD range = 4.2 Myr for canids as opposed to 9.6 Myr for penguins). However, this is mainly due to the fact that the average age of the penguin fossils is considerably larger and the median relative precision (i.e. RSD) was 7.2%, only slightly better than the value for the penguin fossils of 9.2%.

Figure 8 shows a sample from the posterior distribution of the analysis of the canid dataset. The tree has three main clades: one clade with extant representatives and two extinct clades (*Hesperocyoninae* and *Borophaginae*).

Table 3 shows that the rate of morphological evolution in canids is faster than that estimated in the penguins; however, this could be a simple reflection of the shorter geological time scale (and shorter average branch lengths) over which the rate has been estimated [29].

## 4. Discussion

In this paper, we have demonstrated that even a small number of morphological characters (some of the fossils had as few as seven morphological traits coded) can be used in the context of a rich fossil reference dataset to provide an accurate age of the fossil based on a phylogenetic model. In all cases, we used the new fossilized birth–death tree prior, which is a crucial ingredient in allowing for the estimating of fossil ages under a birth–death tree prior.

We found that although a strict morphological clock does a surprisingly good job of estimating fossil ages, there is evidence that phylogenetic estimation of penguin fossil ages is improved by a model that includes a variation in rates of morphological evolution among lineages. However, in the penguin dataset the variation in evolutionary rates was not too extreme and the estimated log standard deviation of the relaxed morphological clock (*S* = 0.69; refer to table 3) is comparable to values obtained for molecular clocks. The median error in age estimates for the two datasets investigated were 2 Myr and 1.6 Myr, respectively, using either a very simple or more complex models of discrete morphological change.

In absolute terms, the fossil estimates were both slightly more accurate and more precise on average in the canid dataset. One might think that the larger reference set of fossils in the canid dataset (115 versus 35) makes up for the smaller number of characters (122 versus 245) with regards to accuracy and precision of fossil age estimates. However, since the average age of the canid fossils is considerably younger than that for the penguin fossils, a more appropriate comparison uses relative error and relative precision. By these measures, the penguin dataset actually provides the more accurate estimates, whereas relative precision is overall slightly better for the canid dataset. Future work is needed to investigate in a more systematic manner how the amount of morphological data available for a new fossil and the number of related reference fossils of known age affect the accuracy and precision of the phylogenetic estimate of a fossil's age.

Another difference between the two datasets analysed here is that the penguin fossils were largely single specimens, or at least single localities, so that the age range specified for the fossil represents uncertainty in the geological age of the horizon the fossil was associated with (for example, uncertainty in radiometric dates from the volcanic layers above or below the fossil-carrying horizon and uncertainty about the age difference between the volcanic layers and the horizon the fossil is in). On the other hand, most of the canid species were assigned stratigraphic age ranges based on multiple specimens from multiple localities spanning a substantial time range. For example, there are thousands of specimens of *Hesperocyon gregarious* from multiple sites in North America spanning more than 5 Myr (Graham J. Slater, personal communication). In this context, it is interesting to note the canids that fall off the *x* = *y* line in figure 7 are mostly (but not exclusively) taxa represented by singletons and therefore those with relatively short stratigraphic ranges. This raises the question of whether multiple specimens of a single species that span a significant time frame and/or different localities should be coded as separate taxa as input for the fossilized birth–death method. Even if not coded as separate taxa it may be possible to extend the method used here to explicitly account for multiple specimens and associated ages when a fossil species is represented by more than one fossil. We leave these considerations for future work.

There are diverse potential applications for this methodology. The most obvious is the estimation of the date of a fossil that is temporally unconstrained, due to poor knowledge of the age of the sediments in which it was found, or because the fossil was not associated with a horizon of known age (e.g. [44]) or because of a complete lack of provenance data (e.g. a recent fossil described as a ‘four-legged snake’ has excited controversy for a lack of provenance; http://news.sciencemag.org/paleontology/2015/07/four-legged-snake-fossil-stuns-scientists-and-ignites-controversy). It can also be used as a way of testing the ‘morphological clock’ and to discover potential problems in the data by identifying outlier fossils with respect to model fit. Overall, we anticipate that this approach will help to promote the application of a consistent probabilistic framework to consider both molecular and fossil evidence. Our results are encouraging in suggesting that the statistical models presented are adequate for inference of phylogenetic time-trees from morphological fossil data.

## Data accessibility

All BEAST2 xml input files and R analysis scripts required to reproduce the results in this paper are available at https://github.com/alexeid/fossilDating.

## Competing interests

We declare we have no competing interests.

## Funding

A.J.D. and T.S. were partially funded by Marsden grant UOA1324 from the Royal Society of New Zealand (http://www.royalsociety.org.nz/programme?s/funds/marsden/awards/2013-awards/). A.J.D. was funded by a Rutherford Discovery Fellowship from the Royal Society of New Zealand (http://www.royalsociety.org.nz). T.S. was supported in part by the European Research Council under the 7th Framework Programme of the European Commission (PhyPD; grant agreement 335529). A.J.D. and T.S. also thank ETH Zürich for funding.

## Acknowledgements

The authors would like to thank Daniel Ksepka and Graeme T. Lloyd for insightful comments and suggestions on early drafts of this manuscript. We are also very grateful to Graham J. Slater for comments on the phylogenetic analysis of canids and for providing stratigraphic age ranges for the canid fossils. We thank Mario dos Reis and one anonymous reviewer for very helpful comments during revision.

## Footnotes

One contribution of 15 to a discussion meeting issue ‘Dating species divergences using rocks and clocks’.

- Accepted May 3, 2016.

- © 2016 The Authors.

Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.