## Abstract

One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation from data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks, often using coalescent-based simulation coupled with approximate Bayesian computation (ABC), have provided robust inferences on the human past, with no reliance on *a priori* assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies.

This article is part of the theme issue ‘Bridging cultural gaps: interdisciplinary studies in human cultural evolution’.

## 1. Introduction

Understanding how human populations acquire, and use, social information is one of the central challenges of cultural evolution and the focus of a highly active, interdisciplinary debate [1]. Social learning, or cultural transmission, is defined as learning that is facilitated by observations of, or interactions with, another individual or their cultural products [2,3]. It supports the spread of adaptive information, accumulated over generations [4–6], yet also bears the risk of transmitting outdated, misleading or inappropriate information, especially in changing environmental conditions [7]. But there is no unique way in which social information can be acquired; in fact, a large number of social learning processes have been identified in the literature (e.g. [5,6,8]). Research aimed at identifying learning processes in human populations can be roughly divided into two groups: experimental laboratory-based and theoretical modelling-based approaches. Laboratory-based experiments, in particular ‘microsocieties’ (e.g. [9–12]) and diffusion chain experiments (e.g. [13–16]), have focused on uncovering the variety and subtlety of human social learning strategies, providing a powerful framework for studying cultural evolution empirically (see [1,17] for comprehensive reviews).

In the following, we focus on theoretical modelling-based approaches. These evolutionary models of learning have mainly focused on understanding which individual and social learning strategies are expected to have evolved in spatially and temporally changing environments (see [18] for a comprehensive review of this literature). These models provide an elegant characterization of the long-term outcomes of evolution through natural selection, as well as their associated evolutionary trajectories, and therefore produce predictions of which learning processes are expected to be present in the population.

However, in order to verify those predictions, social learning processes would need to be observed directly so that fine-grained individual-level data detailing who learns from whom can be generated. But outside of controlled experimental conditions, large longitudinal datasets of this kind are difficult to obtain, especially in historical or anthropological contexts (e.g. [19], but see [20,21] for two cultural evolutionary case studies and [22] for a research program dedicated to addressing this issue). This is not to say that no such data exist, but in many case studies of interest the available data are in the form of frequencies of different variants of a cultural trait in the population at one or several points in time. While many modern datasets possess a rich temporal resolution (e.g. those describing the choice of first names in modern populations, which record the number of instances of a specific name each year), prehistoric or anthropological datasets—the focus of this paper—typically describe the frequencies of different cultural variants in sparse samples from the whole population.

So if we want to infer social learning processes from available data, we face a classical inverse problem: we can only observe aggregated, population-level frequency data but aim at identifying the underlying individual-level learning processes that gave rise to them. Recent approaches to address this inverse problem have focused on, among other things: the shape of adoption curves (e.g. [4,23,24]); the comparison between observed levels of cultural diversity or cultural accumulation and the ones expected under various processes of social learning, in particular unbiased transmission or neutral evolution (e.g. [25–28]); the shape of rank-abundance distributions (e.g. [29–31]); the comparison between observed turnover rates and the ones expected under unbiased transmission (e.g. [32,33]); and phylogenetic analyses (e.g. [34]). This research has clearly shown that robustly inferring the underlying processes of social learning from population-level frequency data is a challenging task, especially in the light of equifinality, i.e. in situations where various learning processes can result in very similar population-level characteristics (e.g. [35,36]).

However, the inverse problem described above is of course not unique to cultural evolution. In fact, other scientific fields have successfully overcome similar challenges, in particular population genetics, which aims to understand the evolutionary mechanisms that produced the allele frequency distributions observed both now and in the past. Here, recent developments have provided elegant means for building complex evolutionary models and have allowed the application of efficient generative inference frameworks, which made possible the statistical testing of increasingly realistic demographic hypotheses. In general, the generative approach proceeds by building a fully specified probabilistic model, in which the hypothesized causal mechanisms are explicitly defined. This model is then used to repeatedly simulate pseudo-datasets under known parameter values, such that their expected distribution can be statistically compared with observed data, through techniques such as approximate Bayesian computation (ABC). This comparison allows certain hypothesized mechanisms to be rejected as inconsistent with the empirical data, and the estimation of model parameters that provide the best fit.

Our goal in this paper is to demonstrate how the generative inference approach can help answer the question of how human populations use social information, based on observable empirical evidence. We note that the general idea of generative modelling has already been applied to socio-cultural evolution. Significant early examples include Schelling's segregation model [37] and the influential agent-based economic modelling framework of *Sugarscape* [38]. These approaches and subsequent generalizations (see e.g. [39]) investigated the effects of explicitly defined individual-level causal mechanisms on population-level outcomes, which could then be compared with observed data. While one of the major advantages of this line of work is that the complex nature of the models considered allowed for more realistic expected outcomes, the principal limitation has been the lack of a robust statistical methodology capable of comparing these outcomes to empirical data. However, careful application of techniques like ABC, as mentioned above, is beginning to remove this limitation to inference (e.g. [40]).

We believe that the generative inference approach reviewed in this paper may link theoretical and empirical work in cultural evolution closer together by providing a framework that is able to evaluate the consistency between different individual-level processes and observed population-level patterns; in our case, between different processes of social learning and observed patterns of cultural change. Similarly to population genetics, such an inference framework consists of building a generative model that establishes a causal link between individual-level learning processes and observable population-level frequency data that then are evaluated for statistical consistency. The outcome of this approach is not only the identification of the most likely underlying learning process given the empirical data but a description of the breadth of processes that could have produced these data equally well, which in turn can be interpreted as an informal measure of the level of equifinality. Additionally, the inference framework may provide insight into the temporal and/or spatial resolution of the population-level frequency data that are needed to reliably distinguish between different processes of social learning (see [41] for a related discussion). In §1a, we briefly review some of the relevant key developments in population genetics, before exploring the applicability of the generative inference approach to cultural frequency data in §2.

### (a) Population genetics and generative inference

Classical population genetics—with its *prospective* approach [42]—provided many important theoretical insights into how the processes of mutation, drift, selection, migration and demographic change may shape the genetic variation expected in a population at equilibrium (e.g. [43–45]). But the development of coalescent theory in the early 1980s [46] (see also [47–49]) offered an alternative *retrospective* view of genetic evolution, providing a statistical model for the genealogical relationships between just a sample of individuals rather than the entire population. One major advantage of this coalescent framework is that, given an explicit model of demographic history and a mutation model, it allows for very efficient simulation of genetic—or genomic-scale—data for an observed sample with no *a priori* assumption of equilibrium. This has proved very useful in inferring population history, and while there is a wide array of other methodological approaches (e.g. [50–55]), the generative approach—in which simulated genetic data are statistically compared to the observed data—is growing in popularity, with the models of demographic history becoming increasingly complex and realistic (e.g. [56–58]).

However, generative inference crucially relies on the ability to make an evaluation of the quality of the model used. Rather than simply rejecting those demographic models or hypotheses that generate genetic variation inconsistent with what we observe (as in [59,60]), there exists a large and growing body of statistical techniques that allow for the explicit comparison of competing scenarios and the estimation of their underlying parameters. One such approach, ABC [61,62], was developed by statistical and population geneticists to circumvent the difficulty, or impossibility, of specifying the likelihood functions for complex models. ABC relies on repeatedly simulating pseudo-data under an explicitly specified model and, by retaining just those parameter values that generate data ‘close’ to the observed data, allows estimation of their posterior distributions (full details are given in §2b). A number of researchers have used this pairing of coalescent-based simulation and ABC to answer diverse questions about human demographic history, from early population differentiation in sub-Saharan Africa [63], to the global expansion of modern humans during the Late Pleistocene [58], to hunter–gatherer population replacement in Europe [64] and the initial colonization of the Americas [57] at the end of the last Ice Age.

## 2. Generative inference for cultural evolution

In the following, we demonstrate how generative inference procedures can be constructed and used to infer social learning processes from cultural data in the form of time-series detailing the usage or occurrence frequencies of different cultural variants. Similar to the population genetic applications, the inference procedure consists of two steps. First, we develop a non-equilibrium generative model capturing the main cultural and demographic dynamics of the considered system. This model describes the frequency evolution of different cultural variants present in a population at given time points under an assumed social learning hypothesis. Second, ABC techniques are used to derive conclusions about which (mixtures of) learning strategies are consistent with the observable frequency data and which are not. The aim of this framework is to allow researchers to ‘reverse engineer’ which learning strategies are likely to have been used in current or past populations, given knowledge of how frequencies have changed over time, independent of optimality or equilibrium assumptions. Figure 1 summarizes the steps of the generative inference framework described in this section.

We stress that this particular inference framework is designed to analyse the temporal dynamic of cultural change, defined as the change in frequency of different variants of cultural traits. If the observed data are of a different nature, e.g. describing the continuous variation of certain attributes of cultural artefacts, such as the dimensions of an arrowhead, then researchers have to first construct a hypothesis about the relationship between temporal variation of the attribute and the social learning processes considered in order to apply a similar inference procedure.

In §2a,b we describe the two steps of the inference framework and discuss in §2c the theoretical limits to inference; specifically, we ask how much information about underlying social learning processes we should expect to infer from population-level frequency data of a given temporal resolution. In §2d(ii) we show how the generative approach has been applied to cultural case studies. Lastly, in §2e we discuss some issues researchers should consider before applying the proposed, or a similar, inference framework.

### (a) Generative model

As mentioned above, the generative model aims at capturing the main cultural and demographic dynamics of the cultural system. Importantly, the generative model has to produce pseudo-data—in our case, population-level frequencies of different variants of a cultural trait at different points in time conditioned on the assumed social learning process—so that theoretical predictions can be compared to empirical observations. Thereby different learning processes are expressed by different model parameterizations; the model parameters are denoted by *θ* = (*θ*_{1}, …, *θ*_{k}) in the following. In other words, the generative model establishes an explicit causal relationship between the assumed processes of social learning defined by *θ* and observable population-level patterns of cultural change.

We note that there are no restrictions on the type of generative model used. Models ranging from systems of partial differential equations to agent-based simulations have also been used successfully; in fact, a number of the models mentioned in §1 could, with an appropriate choice of generative model, feasibly be adapted for use within this inference framework. As we want to generate frequency data at different time points, we advocate the use of non-equilibrium models, which can also account for temporal changes in demographic properties of the cultural system (e.g. variations in the total size of the population of cultural variants). This modelling choice aims at reducing the risk of misinterpreting non-equilibrium dynamics as evidence for the presence or absence of particular social learning processes (see [65] for a detailed discussion). For instance, the rejection of the hypothesis of neutral cultural evolution, based on empirical data, has usually been interpreted as evidence for the existence of selective biases in the population. But it has been pointed out that such a rejection can also be indicative of non-equilibrium dynamics or simply violations of the inherent assumptions of the neutral model (e.g. [28,66]). We note, however, that the relaxing of the equilibrium assumption requires accurate knowledge about e.g. the time points at which the observed frequencies are recorded. We return to this issue in §2e.

### (b) Statistical inference

To infer which learning strategies are consistent with the observed data we would ideally determine the likelihood function of the generative model. However, in many, if not most, realistic cases the likelihood function cannot be determined easily. As introduced in §1a, ABC [61,62] was developed to circumvent this difficulty. Given observed data *D*, this likelihood-free approach directly approximates the joint posterior density of the model parameters *P*(*θ* | *D*). It does this through repeatedly simulating data *D*^{⋆} under a generative model with parameter values drawn from their prior distributions *P*(*θ*). These prior distributions describe the possible values that the parameter can assume or summarize all prior knowledge researchers may have. Retaining those parameter sets that generate data sufficiently ‘close’ to the observed data *D*, and rejecting the rest, results in a random sample from the distribution *P*(*θ* | *d*(*D*, *D*^{⋆}) ≤ *ɛ*), where *d*( · , · ) is a distance metric between the observed and simulated data, and *ɛ* is a tolerance level determining the approximation to the true posterior *P*(*θ* | *D*). Modal values and credible intervals for each model parameter can then be obtained from this approximate joint posterior.
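The rejection scheme described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in any of the cited studies: the toy generative model (a binomial count with a single parameter), the uniform prior, the distance and the names `rejection_abc`, `toy_simulate`, `toy_prior` and `abs_distance` are all assumptions made for the example.

```python
import random

def rejection_abc(observed, simulate, sample_prior, distance, epsilon, n_draws, seed=0):
    """Basic rejection ABC: keep prior draws whose pseudo-data lie within epsilon."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)          # draw theta from the prior P(theta)
        pseudo = simulate(theta, rng)      # simulate pseudo-data D* under theta
        if distance(observed, pseudo) <= epsilon:
            accepted.append(theta)         # retain theta because d(D, D*) <= epsilon
    return accepted

# Hypothetical toy model: theta is the probability that a variant is copied,
# and the data are the number of copies observed in 100 trials.
def toy_simulate(theta, rng, n_trials=100):
    return sum(rng.random() < theta for _ in range(n_trials))

def toy_prior(rng):
    return rng.uniform(0.0, 1.0)           # uniform, uninformative prior

def abs_distance(d_obs, d_sim):
    return abs(d_obs - d_sim)

posterior_sample = rejection_abc(
    observed=30, simulate=toy_simulate, sample_prior=toy_prior,
    distance=abs_distance, epsilon=2, n_draws=20000,
)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), round(posterior_mean, 3))
```

The retained values in `posterior_sample` approximate the posterior for the chosen tolerance. Shrinking `epsilon` tightens the approximation at the cost of fewer accepted draws.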

Due to the high dimensionality of most real-world datasets, the data *D* are often reduced to a summary statistic (or a set of summary statistics) *S*, so that we are really sampling from *P*(*θ* | *d*(*S*, *S*^{⋆}) ≤ *ɛ*) to approximate the posterior *P*(*θ* | *S*). The choice of appropriate summary statistics to maximize sufficiency (i.e. such that *P*(*θ* | *S*) = *P*(*θ* | *D*)) is not straightforward, and is an active area of statistical research (e.g. [67] and see also §2e). There have been many extensions to this initial basic (and inefficient) rejection algorithm, including weighting the retained parameter sets dependent on their exact distances *d*( · , · ) through regression methods (e.g. [61,68]) or increasing the efficiency of sampling from the prior distributions (e.g. [69,70]).

The output of any ABC procedure is the joint posterior distribution of the model parameters *θ* = (*θ*_{1}, …, *θ*_{k}) (and derived from that the marginal posterior distributions), indicating the range of the parameter space that is able to produce frequency data within a given tolerance level *ɛ* of the observed data, and consequently the learning strategies that are consistent with the data. We stress that the obtained posterior distribution is only a good approximation of the ‘true’ posterior distribution for small tolerance levels *ɛ*. Therefore, if the obtained *ɛ* is large and cannot be improved upon, the inferred parameter spaces are likely not meaningful. This situation may point to an inadequacy of the model, and therefore the assumed social learning processes, to explain the data. The explanatory value of the obtained posterior distribution can be investigated by posterior predictive checks [71]. These assess how well the parameter ranges specified by the posterior distribution explain the observed data (see [65] for further detail and §2d(i)). Additionally, cross validation tests or coverage plots have been developed to further investigate the accuracy of the results of the ABC analysis [40,72,73]. In practice, performing ABC analyses has been made relatively straightforward since the release of software such as DIY-ABC [74], ABCtoolbox [75], and R packages abc [72], abctools [76] and EasyABC [77].

Finally, we note that as well as estimating parameters, ABC has also been used to test between multiple competing models, by estimating Bayes factors from the relative proportions of simulations accepted from each model (e.g. [62]). While it has been shown that this approach is not theoretically justified [78] when reducing the data *D* to summary statistics *S*—as owing to the loss of information this approximation does not necessarily converge on the true Bayes factors—a number of authors have successfully applied various simulation-based power analyses to mitigate this problem (see for example [63,64,79]). And more recently another approach utilizing machine learning algorithms—and in particular random forests—has begun to prove successful for complex ABC model selection [80,81].
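The acceptance-proportion idea can be illustrated as follows. This is a sketch under stated assumptions only: two hypothetical toy models with equal prior weight (an ‘unbiased’ model with a fixed copying probability of 0.5 and a ‘biased’ model with that probability drawn uniformly from [0.5, 1]), a binomial count as data, and the invented names `abc_model_choice` and `simulate_count`. As noted above, the resulting ratio need not converge on the true Bayes factor once data are reduced to summary statistics.

```python
import random

def abc_model_choice(observed, models, distance, epsilon, n_draws, seed=1):
    """Approximate posterior model probabilities by acceptance proportions."""
    rng = random.Random(seed)
    names = list(models)
    accepted = {name: 0 for name in names}
    for _ in range(n_draws):
        name = rng.choice(names)                   # equal prior weight on every model
        sample_prior, simulate = models[name]
        pseudo = simulate(sample_prior(rng), rng)  # pseudo-data under the chosen model
        if distance(observed, pseudo) <= epsilon:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulation accepted; increase epsilon or n_draws")
    return {name: count / total for name, count in accepted.items()}

def simulate_count(theta, rng, n_trials=100):
    return sum(rng.random() < theta for _ in range(n_trials))

# Each model is a (prior sampler, simulator) pair; both are hypothetical.
models = {
    "unbiased": (lambda rng: 0.5, simulate_count),
    "biased": (lambda rng: rng.uniform(0.5, 1.0), simulate_count),
}
probs = abc_model_choice(52, models, lambda a, b: abs(a - b), epsilon=2, n_draws=10000)
bayes_factor = probs["unbiased"] / probs["biased"]  # ratio of acceptance proportions
print(probs, round(bayes_factor, 2))
```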

### (c) Limits to inference

It is well known that efforts to understand learning processes based on population-level data may be confounded by equifinality (see e.g. [35,36] for a recent discussion). The inference framework introduced above generates posterior distributions of the model parameters describing the learning strategies that are consistent with the observed data. Therefore, the widths of these distributions, or their credible intervals, may provide an informal measure of the level of equifinality [82]. If the posterior distributions are narrow then only a small region of the parameter space is consistent with the data and therefore a large number of learning processes are *not* able to produce the observed frequency changes. In this case, the data carry a relatively strong signature of the underlying processes of social learning. By contrast, if the distributions are wide, a large region of the parameter space is consistent with the data and therefore many social learning processes are able to generate very similar population-level frequency patterns.

In this way the inference framework itself provides a way of exploring the inferential limits of population-level data of a given temporal resolution. For this, the generative model is used to simulate frequency data with a specific parameterization *θ*, i.e. under a *known* process of social learning. Applying the inference procedure to these data produces posterior distributions, and while we know that the data have been generated with a specific parameter value, these distributions indicate all other values (and therefore learning processes) that could produce the ‘observed’ frequency changes equally well. Wide posterior distributions then mean that researchers should not expect cultural data with a similar temporal resolution, which are likely to be noisier than pseudo-data produced by the generative model, to provide much information about underlying mechanisms.

But when do we consider a marginal posterior distribution narrow? One possibility is to compare the widths of prior and posterior distributions of the parameter in question. As mentioned above, the prior distribution describes the possible values that the parameter can assume or summarizes all prior knowledge researchers may have (see blue, solid line in figure 2 for an example of a uniform, uninformative prior distribution). If the parameter range covered by the posterior distribution is smaller compared to the range covered by the prior distribution (see the red, dashed line in figure 2 as an example) then the inference procedure led to the exclusion of some learning hypotheses: social learning processes described by parameter values not covered by the posterior distribution cannot generate theoretical data sufficiently close to the observed data and are consequently not considered to be consistent with the observations. Naturally, the smaller the credible interval the more the pool of potential hypotheses can be reduced, and the stronger the signature of underlying social learning processes in the observed population-level data. If, however, the parameter ranges covered by prior and posterior distributions are almost identical (see the red, dotted line in figure 2 as an example), then *a priori* knowledge of the researchers cannot be improved by analysing such data at the given resolution.
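One informal way to quantify this comparison is the ratio between the width of the posterior credible interval and the width of the corresponding central interval of the prior. The sketch below assumes a uniform prior on a known range and an empirical-quantile credible interval. The function names and the two synthetic posterior samples are illustrative assumptions, not taken from the analyses discussed here.

```python
def credible_interval(sample, level=0.95):
    """Central credible interval from empirical quantiles of a posterior sample."""
    ordered = sorted(sample)
    n = len(ordered)
    lower_index = int((1 - level) / 2 * (n - 1))
    upper_index = int((1 + level) / 2 * (n - 1))
    return ordered[lower_index], ordered[upper_index]

def width_ratio(posterior_sample, prior_low, prior_high, level=0.95):
    """Posterior interval width relative to the same central interval of a uniform prior."""
    low, high = credible_interval(posterior_sample, level)
    prior_width = level * (prior_high - prior_low)
    return (high - low) / prior_width

# Synthetic examples on a uniform prior over [0, 1]:
narrow_posterior = [0.4 + 0.001 * i for i in range(201)]   # concentrated on [0.4, 0.6]
wide_posterior = [i / 200 for i in range(201)]             # spread over the whole prior range

print(round(width_ratio(narrow_posterior, 0.0, 1.0), 2))   # well below 1: hypotheses excluded
print(round(width_ratio(wide_posterior, 0.0, 1.0), 2))     # close to 1: little learned
```

A ratio near 1 corresponds to the case where prior and posterior cover almost the same range, while a ratio well below 1 indicates that part of the hypothesis space has been excluded.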

Additionally, cross validation analyses as suggested in [72] provide an alternative way of demonstrating how informative the data are about underlying social learning processes. In this context, we showed in [83] that we should not expect to be able to distinguish between unbiased transmission and moderately strong frequency-dependent selection based on frequency information of a population of cultural variants at two different points in time.

### (d) Application to studies of cultural evolution

#### (i) Cultural change in the linearbandkeramik (LBK) period

To demonstrate the applicability and utility of the generative inference framework described above, we summarize in the following the analysis of a cultural dataset from the earliest-known farming population in Central Europe, the so-called linearbandkeramik (LBK) from approximately 7500–7000 years ago (see [84] for the complete analysis). The dataset records the frequencies of different types of decorated vessels at seven different points in time, denoted by *t*_{j}, *j* = 1, … , 7, defining six phases of cultural change that vary in duration. The aim of this study was to explore whether observed frequency changes in different types of pottery between the beginning and the end of each of the six phases are consistent with a specific hypothesis about the underlying social learning processes, in particular unbiased transmission, frequency-dependent selection and pro-novelty selection. For the sake of brevity, we consider in the following unbiased transmission and frequency-dependent selection only.

The first step of the inference framework is the development of the *generative model*. To make use of all available archaeological information, we used a simulation approach that accounted for the fact that the observed frequencies describe a sample and not the population of pottery types. Starting from observed data, the absolute frequencies *D*(*t*_{j}) = [*n*_{1}, …, *n*_{k}] of *k* different variant types in the sample of size *n*(*t*_{j}) at the beginning of the phase, *t*_{j}, we generated a population of cultural variants **P**(*t*_{j}) = [*R*_{1}, …, *R*_{k}, *R*_{k+1}] from which the sample could have been drawn at random using the Dirichlet distribution approach [71]. The variables *R*_{i} represent the absolute frequency of variant type *i* in the population. Importantly, the population consists of *k* + 1 variant types, where the type *k* + 1 contains all variants of types not observed in the sample at *t*_{j}.
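A population compatible with an observed sample might be generated along the following lines. The details are assumptions made for illustration, since they are not spelled out here: a symmetric Dirichlet prior weight `alpha` for every type (including the pooled type *k* + 1), a Dirichlet draw built from gamma variates, and a multinomial draw of absolute population frequencies. The function name `draw_population` is hypothetical.

```python
import random

def draw_population(sample_counts, population_size, alpha=1.0, seed=2):
    """Draw population counts [R_1, ..., R_k, R_{k+1}] compatible with a sample."""
    rng = random.Random(seed)
    # Dirichlet parameters: observed counts plus prior weight alpha (assumed);
    # the extra last type pools all variant types not seen in the sample.
    shapes = [n + alpha for n in sample_counts] + [alpha]
    gammas = [rng.gammavariate(shape, 1.0) for shape in shapes]
    total = sum(gammas)
    proportions = [g / total for g in gammas]   # one Dirichlet draw
    # Multinomial draw of absolute frequencies in a population of the given size.
    population = [0] * len(proportions)
    drawn = rng.choices(range(len(proportions)), weights=proportions, k=population_size)
    for type_index in drawn:
        population[type_index] += 1
    return population

# Hypothetical sample with k = 3 observed types and an assumed population size of 500.
population = draw_population([12, 7, 1], population_size=500)
print(population)
```

Repeating the draw with different seeds gives a set of plausible starting populations, so that uncertainty about the population composition is carried into the simulation.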

Based on this population **P**(*t*_{j}) = [*R*_{1}, … , *R*_{k}, *R*_{k+1}] and an estimate of the population size *N*(*t*_{j}) at time *t*_{j} (if no other information is available the population size *N*(*t*_{j}) at time *t*_{j} is inferred from the size of the sample at this time), we generated population-level frequencies of the *k* + 1 variant types conditioned on a specific process of social learning at each time step *t* = 1, … , *t*_{j+1} − *t*_{j}. For that, we assumed that in each time step a fraction *r* of the population of cultural variants is removed and new variants are subsequently added (in this way the framework can accommodate temporal changes in population size). While the removal process is random, the replacement process is defined by the assumed process of social learning. In detail, a variant type *i*, *i* = 1, … , *k* is chosen to be added to the population according to the probability given in equation (2.1), where *N*(*t*) denotes the population size at time *t*, *N*_{i}(*t*) is the number of variants of type *i*, *u* is the total number of variants removed at this time step, *u*_{i} is the number of variants removed of type *i* and *b*_{freq} controls the strength of frequency-dependent selection; the probability also depends on the number of variant types present at time *t*. Importantly, choosing *b*_{freq} = 0 in equation (2.1) models unbiased transmission, whereas *b*_{freq} > 0 describes the selective advantage for high-frequency variant types and *b*_{freq} < 0 for low-frequency types. Further, the variable *μ* defines the probability with which a novel variant type not previously seen in the population is introduced into the system. A similar probability as in equation (2.1) is defined for variant type *k* + 1, containing all variant types not observed in the sample at *t*_{j} and, per definition, all subsequent innovations.
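To make the removal and replacement step concrete, the sketch below uses an assumed functional form, which should not be read as the published equation (2.1). With *p*_{i} = (*N*_{i}(*t*) − *u*_{i})/(*N*(*t*) − *u*) and *k*_{t} variant types present, the probability of adding type *i* is taken to be (1 − *μ*)[*p*_{i} + *b*_{freq}(*p*_{i} − 1/*k*_{t})], clipped at zero and renormalized. The form is chosen only because it reduces to unbiased copying for *b*_{freq} = 0 and favours common (rare) types for positive (negative) *b*_{freq}. Treating the last entry of the count list as the pooled type that also receives all innovations is likewise an assumption.

```python
import random

def replacement_probabilities(counts, b_freq, mu):
    """Assumed form: (1 - mu) * [p_i + b_freq * (p_i - 1/k_t)], clipped at zero."""
    total = sum(counts)                          # N(t) - u, population after removal
    k_present = sum(1 for c in counts if c > 0)  # number of variant types present
    raw = []
    for c in counts:
        p = c / total                            # (N_i(t) - u_i) / (N(t) - u)
        raw.append(max(0.0, p + b_freq * (p - 1.0 / k_present)) if c > 0 else 0.0)
    norm = sum(raw)
    return [(1.0 - mu) * w / norm for w in raw]

def time_step(counts, r, b_freq, mu, rng):
    """Remove a fraction r (0 < r < 1) at random, then refill by the learning rule."""
    counts = list(counts)
    n_replace = round(r * sum(counts))
    # Random removal: every individual variant is equally likely to be removed.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    for i in rng.sample(pool, n_replace):
        counts[i] -= 1
    probs = replacement_probabilities(counts, b_freq, mu)
    probs[-1] += mu                              # innovations enter the pooled last type
    for i in rng.choices(range(len(counts)), weights=probs, k=n_replace):
        counts[i] += 1
    return counts

rng = random.Random(0)
population = [50, 30, 20, 0]                     # three observed types plus the pooled type
for _ in range(10):
    population = time_step(population, r=0.2, b_freq=0.5, mu=0.01, rng=rng)
print(population)
```

In this sketch the population size is held constant within a phase. Adding a different number of variants than were removed would accommodate changes in population size.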

Lastly, to generate theoretical samples at the end of the phase *t*_{j+1}, we randomly drew *n*(*t*_{j+1}) cultural variants from the (theoretical) populations **P**(*t*_{j} + *t*), *t* = 1, … , *t*_{j+1} − *t*_{j}.

In summary, the output of this framework is sample frequencies of the variant types that were present at the beginning of the phase, *t*_{j}, and an additional type containing all unobserved variants at the end of the phase, *t*_{j+1}, conditioned on the social learning process specified by the parameter *b*_{freq} in equation (2.1), which controls the strength of the frequency-dependent selection.

To infer the learning processes consistent with the observed changes in frequency between the beginning and the end of the phases, we applied an ABC procedure—specifically SMC ABC (e.g. [70])—and determined the joint posterior distributions of (*b*_{freq}, *r*). The replacement fraction *r* cannot be estimated from external sources and therefore has to be inferred from the data as well. Thereby the comparison between empirical and theoretical patterns was based on the absolute difference of the theoretical and observed frequencies of the *k* variant types present at the beginning of the simulation. Additionally, we required the same number of initially present variant types to have gone extinct at the end of the phase. The general scheme of the proposed generative inference framework is illustrated in figure 3.
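The comparison between observed and simulated samples could be coded as below. The sketch assumes that the distance is the sum of absolute differences in counts over the *k* initially present types, and that a mismatch in the number of extinct types is handled by returning an infinite distance so that the simulation is always rejected. Both choices, and the name `sample_distance`, are illustrative assumptions.

```python
def sample_distance(observed, simulated):
    """Sum of absolute count differences; infinite if extinction counts differ."""
    if len(observed) != len(simulated):
        raise ValueError("both samples must list the same k variant types")
    extinct_observed = sum(1 for n in observed if n == 0)
    extinct_simulated = sum(1 for n in simulated if n == 0)
    if extinct_observed != extinct_simulated:
        return float("inf")                      # extinction requirement not met
    return sum(abs(a - b) for a, b in zip(observed, simulated))

print(sample_distance([5, 3, 0], [4, 4, 0]))     # same number of extinct types
print(sample_distance([5, 3, 0], [4, 3, 1]))     # different number of extinct types
```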

Applying this analysis to all six phases, we concluded that

(i) frequency-dependent selection does not describe the cultural dataset from the earliest farming population in Central Europe better than unbiased transmission. In fact, the credible intervals of all six marginal posterior distributions for *b*_{freq} contained the value 0, which means that unbiased transmission cannot be excluded as a potential explanation of the data by this analysis (see figure 4*a,b* for an example);

(ii) frequency-dependent selection and unbiased transmission may not be the best models to explain the observed data, as the achieved tolerance levels (i.e. the ‘distance’ between theoretical and observed patterns) of the ABC analysis were relatively large.

Point (ii) suggests that the social learning hypotheses considered are not consistent with the data, which requires a re-evaluation of the generative model. Indeed, we showed in [84] that pro-novelty selection, which captures the preference for ‘young’, or recently introduced, cultural variant types, is able to replicate the observed frequency changes between the different phases and is therefore *a possible* explanation of the data.

Posterior predictive checks further highlighted the problem raised in point (ii). To perform these checks, we sampled values of the model parameters from the joint posterior distribution, inserted these into the generative model and produced theoretical frequencies at the end of each phase. Repeating this procedure generated theoretical expectations of the frequency ranges for each individual variant type based on the joint posterior distribution. The comparison of the observed frequencies of each variant type with these frequency ranges allowed the explanatory power of the derived posterior distribution to be assessed. If observations are outside the theoretical expectations then the inferred social learning processes cannot replicate all aspects of the dynamic of cultural change, indicating a mismatch between theory and data. This analysis also has the potential to reveal single variant types whose temporal frequency patterns deviate from the general population trend (see [65] for more details). Applying the posterior predictive check to the case study showed that a number of observations were outside their (theoretical) frequency ranges as determined by the joint posterior distributions (see figure 4*c* for an example).
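The procedure just described can be sketched generically. The toy simulator, the three posterior draws and the 95% range below are assumptions chosen for illustration, and the function names are hypothetical. In an application, `simulate` would be the generative model of §2a and `posterior_draws` the output of the ABC analysis.

```python
import random

def posterior_predictive_ranges(posterior_draws, simulate, n_reps, level=0.95, seed=3):
    """Per-type frequency ranges from simulations run at posterior parameter draws."""
    rng = random.Random(seed)
    sims = [simulate(rng.choice(posterior_draws), rng) for _ in range(n_reps)]
    ranges = []
    for type_index in range(len(sims[0])):
        values = sorted(sim[type_index] for sim in sims)
        low = values[int((1 - level) / 2 * (n_reps - 1))]
        high = values[int((1 + level) / 2 * (n_reps - 1))]
        ranges.append((low, high))
    return ranges

def flag_outliers(observed, ranges):
    """Indices of variant types whose observed frequency lies outside its range."""
    return [i for i, (x, (low, high)) in enumerate(zip(observed, ranges))
            if not low <= x <= high]

# Hypothetical two-type simulator: theta is the probability of drawing type 0.
def toy_simulate(theta, rng, sample_size=50):
    hits = sum(rng.random() < theta for _ in range(sample_size))
    return [hits, sample_size - hits]

ranges = posterior_predictive_ranges([0.48, 0.50, 0.52], toy_simulate, n_reps=2000)
print(ranges)
print(flag_outliers([25, 25], ranges))   # observations inside the expected ranges
print(flag_outliers([49, 1], ranges))    # observations the posterior cannot reproduce
```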

#### (ii) Further applications

In the last section, we demonstrated the application of a generative inference framework to a specific archaeological dataset. Traditionally, Bayesian inference in archaeology has been largely limited to age estimation via ^{14}C analyses (e.g. [85,86]), but recently the scope of inference techniques has been vastly broadened, with ABC approaches enjoying increasing popularity (e.g. [65,82,87–90]). In one of the first archaeological applications, Crema *et al.* [87] studied frequency changes of weaponry types in the Jura region of southeast France. The dataset comprises arrowheads of 20 types attributed to 9 chronologically distinct phases. The aim of this study was to analyse whether the temporal frequency change of the different arrowhead types contained evidence for, or against, unbiased transmission or frequency-dependent selection. Using an agent-based simulation as their generative model, the authors produced frequency change patterns under different hypotheses of social learning and under the assumption that the cultural system is at equilibrium. They compared these theoretical patterns to the observed data by measuring the dissimilarity between assemblages. Applying an ABC model selection framework, they concluded that both unbiased transmission and negative frequency-dependent selection could have generated the observed frequency differences within the phases and therefore excluded positive frequency-dependent selection as a possible mechanism of cultural evolution.

But ABC frameworks have not been exclusively used to infer underlying social learning strategies. Porčić & Nikolić [88] analysed the demographic properties of the Mesolithic–Neolithic transition in the Central Balkan region, in particular growth rates and population size estimates for the Lepenski Vir population. Their model generated the expected number of accumulated houses for a large range of demographic scenarios which could then be compared to that observed in the archaeological record. The analysis revealed higher initial growth rates compared to other populations undergoing the Neolithic demographic transition and an increase in population size over time.

In order to highlight the breadth of questions that can be addressed within a generative inference framework, we outline two further applications, one to historical studies (a field with no strong tradition of quantitative treatments) and one to linguistics. Rubio-Campillo [91] investigated the evolution of combat. He explored the validity of different versions of Lanchester's law predicting the casualties of two enemy forces engaged in a land battle, with a dataset comprising the total number of combatants and casualties from 1080 land battles spanning from the middle of the seventeenth to the beginning of the twentieth century. The three most common formulations of Lanchester's law (linear, squared and logarithmic) can be operationalized using difference equations, and iterating these until one of the forces has suffered as many casualties as recorded in the historical record allowed for the comparison between theoretical and observed data. Besides confirming well-known results, the ABC framework pointed to a gradual decrease in the relevance of individual fighting abilities, suggesting that the plausibility of the models is not constant over the different periods.
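To make the iteration scheme concrete, the following Python sketch implements the three laws as difference equations in their commonly cited generalized form, in which the loss of one force per step is proportional to (enemy size)^p × (own size)^q. The exponent pairs, the effectiveness parameters `red_eff` and `blue_eff` and the stopping rule as coded here are assumptions for illustration and need not match the exact formulation used in [91].

```python
# Exponents (p, q) of the generalized form: loss = eff * enemy**p * own**q.
# These are the commonly cited textbook forms, taken here as an assumption.
LAWS = {
    "linear": (1, 1),       # losses depend on both force sizes
    "squared": (1, 0),      # losses depend on the enemy force size only
    "logarithmic": (0, 1),  # losses depend on the own force size only
}


def simulate_battle(law, red0, blue0, red_eff, blue_eff,
                    red_cas_obs, blue_cas_obs, max_steps=100_000):
    """Iterate until one force has suffered its historically recorded casualties.

    Returns the simulated casualties (red, blue) at the stopping point.
    """
    p, q = LAWS[law]
    red, blue = float(red0), float(blue0)
    for _ in range(max_steps):
        red_loss = blue_eff * blue ** p * red ** q
        blue_loss = red_eff * red ** p * blue ** q
        red = max(red - red_loss, 0.0)
        blue = max(blue - blue_loss, 0.0)
        if red0 - red >= red_cas_obs or blue0 - blue >= blue_cas_obs:
            break
    return red0 - red, blue0 - blue


def casualty_mismatch(simulated, observed):
    """Absolute difference between simulated and recorded casualties, summed over both forces."""
    return abs(simulated[0] - observed[0]) + abs(simulated[1] - observed[1])
```

Within an ABC analysis, `casualty_mismatch` would play the role of the distance between theoretical and observed data, evaluated for effectiveness parameters drawn from their prior distributions.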

Lastly, Thouzeau *et al.* [81] investigated the coevolution between genes and languages at a regional scale. They simulated population genetic and cognate data under various historical models encompassing divergences and multiple borrowings and admixture events between linguistic groups. They applied an ABC framework using linguistic and genetic data from across Central Asia, and were able to reconstruct the partly differing evolutionary scenarios underlying linguistic and genetic differentiation in the region.

### (e) Cautionary notes

Naturally, the application of the generative inference framework presented here has to proceed with caution. It is, after all, an analysis based on an underlying model of cultural change. If this model does not capture the main cultural and demographic processes contributing to the observed temporal frequency changes, the inferences obtained will likely be misleading. In the following, we outline some issues researchers should consider before applying this, or a similar, inference framework.

In this paper, we advocate the use of non-equilibrium frameworks. While this modelling choice allows us to include knowledge about, for example, temporal changes in demographic properties and to initialize the model with observed variant frequencies, it also introduces a time-dependency. The inference framework evaluates whether frequency changes between different time points are consistent with the changes expected under a specific learning process (instead of evaluating whether statistics such as the level of cultural diversity at each point in time are consistent with the equilibrium diversity prediction). Misspecifications of time points, and consequently of the duration of the period over which the frequency changes are measured, can therefore produce erroneous theoretical expectations. Crema *et al.* [65] argue that the equilibrium assumption should serve as a hypothesis to be tested, rather than simply held *a priori*. They applied equilibrium and non-equilibrium versions of the generative model of cultural change to a dataset similar to the one described in §2d(i). They concluded that the cultural system was likely not at equilibrium and found indications of shifts between negative and positive frequency-dependent selection for different phases of the archaeological record.

In the archaeological case study described in §2d(i), the temporal change in population size between the beginning and the end of each archaeological phase has been inferred from the change in sample size, and any increase or decrease was assumed to occur in a linear fashion over the relevant time interval. While the assumption of linear change seems plausible, especially in the absence of other information, drastic, unobserved demographic events such as population bottlenecks may be an alternative scenario. Similar to the discussion about equilibrium versus non-equilibrium models, such hidden demographic events have the potential to influence the dynamic of cultural change (e.g. [92]). As they are not included in the generative model, their influences may be mistakenly attributed to social learning processes that are able to produce a similar effect at the population level. But this potential pitfall is also itself amenable to testing with the generative inference framework. Researchers can at least evaluate the extent to which posterior distributions change when assuming a population bottleneck between the beginning and the end of the phase.
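Such a sensitivity analysis requires alternative population size trajectories that share the same start and end points. A minimal Python sketch of the two scenarios is given below; the function names and the piecewise-linear shape of the bottleneck are assumptions chosen for illustration, not the demographic model of the case study.

```python
def linear_trajectory(n_start, n_end, n_steps):
    """Population size at steps 0..n_steps, changing linearly from n_start to n_end."""
    return [
        round(n_start + (n_end - n_start) * t / n_steps)
        for t in range(n_steps + 1)
    ]


def bottleneck_trajectory(n_start, n_end, n_steps, t_crash, n_min):
    """Same endpoints, but the population falls to n_min at step t_crash.

    Assumes 0 < t_crash < n_steps; decline and recovery are both linear.
    """
    decline = linear_trajectory(n_start, n_min, t_crash)
    recovery = linear_trajectory(n_min, n_end, n_steps - t_crash)
    return decline + recovery[1:]  # drop the duplicated crash point
```

Running the inference once with each trajectory as input to the generative model and comparing the resulting posterior distributions indicates how strongly the conclusions depend on the assumed demography.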

The accuracy of ABC inference depends partly on how the difference between observed and simulated data is calculated and on the achieved tolerance level. Calculating the difference based on summary statistics *S* instead of the full data *D* results in discarding likely useful information [93]. If a summary statistic (or set of statistics) is not sufficient, as is generally the case in practice, the resulting posterior distribution will not be equal to that computed with the full data [94] (see also [93] for a review of strategies dealing with this issue). While the impact of using insufficient statistics on inference results can be mitigated by careful application, we note that using the actual frequencies to calculate the difference between observed and simulated data circumvents this problem entirely. Further, a posterior distribution obtained with large tolerance levels does not approximate the ‘true’ posterior distribution well and should be treated with caution; in this case, the generative model may not be able to produce data that are sufficiently close to the observed data. Procedures such as posterior predictive checks, cross-validation tests or coverage plots offer additional insights into the accuracy of the inference results.
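The roles of the distance measure and the tolerance level can be illustrated with a simple ABC rejection sampler written in Python. In this sketch the distance is the Euclidean distance between the full vectors of variant frequencies, and the tolerance is set implicitly by retaining a fixed fraction of the closest simulations. The one-generation toy model, the uniform prior on `b` and all names are assumptions made for the example.

```python
import math
import random


def simulate_freqs(initial_freqs, b, n, rng):
    """Toy one-generation model (an assumption): n individuals copy variant i
    with probability proportional to initial_freqs[i] ** (1 + b)."""
    k = len(initial_freqs)
    weights = [f ** (1.0 + b) for f in initial_freqs]
    draws = rng.choices(range(k), weights=weights, k=n)
    return [draws.count(i) / n for i in range(k)]


def euclidean(x, y):
    """Distance computed on the full frequency vectors, not on a summary statistic."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(x, y)))


def abc_rejection(observed, initial_freqs, n, n_sims, accept_frac, rng):
    """Keep the accept_frac closest simulations; return them and the achieved tolerance."""
    results = []
    for _ in range(n_sims):
        b = rng.uniform(-0.5, 0.5)  # hypothetical prior on the bias parameter
        simulated = simulate_freqs(initial_freqs, b, n, rng)
        results.append((euclidean(simulated, observed), b))
    results.sort()
    kept = results[: max(1, int(accept_frac * n_sims))]
    tolerance = kept[-1][0]  # largest accepted distance
    return [b for _, b in kept], tolerance
```

Reporting the achieved tolerance alongside the posterior sample makes it possible to judge whether the accepted simulations are in fact close to the observed frequencies, or merely the least distant among poor fits.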

Lastly, we point to the relationship between data quality or completeness and inferential accuracy. A recent study [31] revealed the importance of rare variants for inferring underlying processes. Using the progeny distribution (which records the frequencies of cultural variant types that produce *k* new variants over a fixed period of time) as a statistic, the authors showed that analyses based on only the most popular variants, as is often necessarily the case in cultural evolutionary studies, can provide misleading evidence for underlying transmission hypotheses. Especially in archaeological case studies, the observed frequencies describe the composition of often relatively small samples of cultural variants, and consequently rare variant types are likely *not* sampled and therefore absent from the data. Even though statistical techniques such as the Dirichlet distribution approach mentioned in §2d(i) are available, the number of rare variant types, i.e. types that are not contained in the observed sample, is likely to be misspecified (e.g. [95]) and future work is needed to understand the influence of missing data on the accuracy of the generative inference frameworks presented in this paper.
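The effect of restricting an analysis to the most popular variants can be illustrated with a short Python sketch. It reads the progeny distribution as the fraction of variant types that produced exactly *k* new copies in the observation window; this reading, the input format (a list with one variant label per new copy) and the function names are assumptions made for the example.

```python
from collections import Counter


def progeny_distribution(new_copies):
    """Map k -> fraction of variant types that produced exactly k new copies."""
    copies_per_type = Counter(new_copies)             # variant type -> k
    types_per_k = Counter(copies_per_type.values())   # k -> number of types
    n_types = len(copies_per_type)
    return {k: count / n_types for k, count in sorted(types_per_k.items())}


def top_n_only(new_copies, n):
    """Discard every copy that does not belong to one of the n most popular types."""
    popular = {variant for variant, _ in Counter(new_copies).most_common(n)}
    return [variant for variant in new_copies if variant in popular]
```

Comparing `progeny_distribution(data)` with `progeny_distribution(top_n_only(data, n))` shows how the low-*k* end of the distribution, which is dominated by rare variant types, disappears once only popular types are retained.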

## 3. Conclusion

Relatively recent developments in population genetics—namely coalescent modelling and ABC—have made generative inference possible, and shown it to be a powerful inferential framework for understanding the human past (e.g. [57,58,63,64]). Cultural evolutionary theory has been greatly advanced by adopting concepts and modelling paradigms originating in population genetics. In this spirit, the aim of this paper was to demonstrate how analogous generative inference frameworks can be applied to cultural frequency data, potentially allowing us to close the gap between theoretical modelling work and empirical work in cultural evolution.

In particular, we focused on the topic of inferring how human populations use social information based on the available empirical evidence. In many case studies of interest, the available data are in the form of frequencies of different variants of a cultural trait in the population at one or several points in time, which means that we face a classical inverse problem. Naturally, attempting to address this problem leads to the question of how much information about underlying processes of social learning can in fact be extracted from cultural frequency data of a given resolution. The framework outlined here allows us to address this equifinality problem. At the heart of this framework is a generative model, which captures the main cultural and demographic properties of the system considered. As noted, there are no restrictions on the type of model used, with the one described in §2d(i) simply an example tailored specifically to the observed population-level frequency data. Whatever their form, these models establish a causal link between model parameters controlling the strengths of underlying evolutionary processes and observable population-level patterns; in our case, between parameters controlling the strengths of social learning processes and population-level frequencies of cultural variant types. Bayesian inference techniques, such as ABC, can then evaluate whether this specific process of social learning is able to produce frequency patterns consistent with the observed ones.

The outcome of this inference approach is posterior distributions of the model parameters describing the learning processes that are consistent with the observed data. As discussed in §2e, while there are a number of important factors potentially influencing the accuracy of the analysis to consider, the widths of the posterior distributions may be indicative of the amount of information about the underlying social learning processes contained in the data. Narrow posterior distributions indicate that the data carry a relatively strong signature of these processes, while wider distributions suggest that the data are largely uninformative or that the models considered do not provide an adequate description of the cultural system. Therefore, this approach allows not only for the identification of the most likely underlying learning process given the empirical data, but also for a description of the breadth of processes that could have produced these data equally well.

Revealing the presence of equifinality may appear to be a negative result, but we stress that one should not expect a unique mapping between (sparse) population-level frequency data and underlying processes of cultural evolution [4,36]. Nevertheless, the analysis of such data will help in excluding social learning processes that could not have produced the observed data. In this way inference frameworks will lead to a reduction in the pool of potential hypotheses (even though the level of reduction might vary from case study to case study) and to an understanding of which kinds of scientific questions can be answered by which kinds of data. Additionally, we note that generative inference frameworks inform about the consistency of a limited set of possible underlying mechanisms with the data while not excluding the possibility that other mechanisms may be consistent as well. However, this should not necessarily be seen as a weakness, and as pointed out by Csilléry *et al.* [93, p. 413], ‘in reality scientific arguments often revolve around a limited number of hypotheses or scenarios without the need to consider an infinite set of alternative models. Models can always be improved and refined by other authors, allowing an open discussion that can greatly increase our understanding of the problem being studied.’

Undoubtedly, more research is needed to further develop and improve the statistical tools and to explore the influence of e.g. unobserved changes in the demographic properties of the system considered or of the quality of the observed data on the accuracy of generative inference frameworks, but we believe this is an exciting and promising new direction in cultural evolution that has already begun to produce interesting results.

## Data accessibility

This article has no additional data.

## Competing interests

We declare we have no competing interests.

## Funding

No funding has been received for this article.

## Acknowledgments

We thank Nicole Creanza, Oren Kolodny and Mark Feldman for inviting us to contribute to this special issue. Further, we thank two anonymous reviewers for their constructive comments and criticisms, which helped us improve this manuscript, and members of the Department of Human Behavior, Ecology and Culture at the Max Planck Institute for Evolutionary Anthropology for helpful comments on an earlier version of the manuscript.

## Footnotes

One contribution of 16 to a theme issue ‘Bridging cultural gaps: interdisciplinary studies in human cultural evolution’.

- Accepted December 21, 2017.

- © 2018 The Author(s)

Published by the Royal Society. All rights reserved.