## Abstract

*Behavioural social choice* has been proposed as a social choice parallel to seminal developments in other decision sciences, such as *behavioural decision theory*, *behavioural economics*, *behavioural finance* and *behavioural game theory*. Behavioural paradigms compare how rational actors *should* make certain types of decisions with how real decision makers behave *empirically*. We highlight that important theoretical predictions in social choice theory change dramatically under even minute violations of standard assumptions. Empirical data violate those critical assumptions. We argue that the nature of preference distributions in electorates is ultimately an empirical question, which social choice theory has often neglected. We also emphasize important insights for research on decision making by individuals. When researchers aggregate individual choice behaviour in laboratory experiments to report summary statistics, they are implicitly applying social choice rules. Thus, they should be aware of the potential for aggregation paradoxes. We hypothesize that such problems may substantially mar the conclusions of a number of (sometimes seminal) papers in behavioural decision research.

## 1. Reconciling the segregated decision sciences

The decision sciences are currently segregated into nearly disparate research areas. On the one hand, researchers study *individual choice*; that is, decision making at the level of the individual decision maker. On the other hand, another research community studies *social choice*; that is, aggregate decision making at the level of groups or societies, especially in the form of voting. These two research communities, individual and social choice researchers, by and large associate with different scientific societies and publish in different journals. Another important distinction is that between *normative*, i.e. *rational* theories of choice, which satisfy certain theoretically motivated optimality criteria, and *behavioural*, i.e. *descriptive* theories that describe or explain empirically observed choice behaviour. Figure 1 shows these conceptual distinctions along two major axes. Different research paradigms fall into different sections of the implied 2×2 table. While many paradigms in the decision sciences do not fit squarely in a single spot, we have nonetheless attempted to place several important paradigms in their most pertinent locations in the table. For instance, utility theory, such as expected utility theory (von Neumann & Morgenstern 1947; Savage 1954), is the normative theory of rational individual decision making under uncertainty or risk.

There has been limited cross-fertilization between individual and social choice research areas. The most important exception is the fact that social choice theory has systematically incorporated rational utility theory as a theoretical primitive. Another noteworthy exception is the literature on justice and fair division (Balinski & Young 1982; Schokkaert & Lagrou 1983; Kahneman *et al*. 1986; Brams & Taylor 1996; Schokkaert & Devooght 2003; Konow in press). As Regenwetter *et al*. (2007*a*) have emphasized, major progress in the decision sciences may hinge on the ability of the various disparate communities to integrate their collective wisdoms and develop new synergies. The need for a unified framework for the decision sciences is indicated by the large dotted box in figure 1.

Several major movements have arisen, which respond to the largely normative tone of prior theory in the decision sciences. Behavioural counterparts of normative theories (e.g. behavioural decision research, experimental economics, behavioural finance, behavioural game theory and behavioural social choice, see also the glossary at the end of the paper) have a decades-long tradition of contrasting normative proposals, such as expected utility theory, against empirical human choice data.

We provide a status report on behavioural social choice research, and discuss the facilitating role this paradigm can play in establishing a broader and more unified decision sciences research programme. The paper is organized as follows. In §2, we review past work showing that the famous Condorcet paradox of majority cycles may have limited behavioural support. In §3, we discuss the fact that behavioural decision research in the individual choice domain routinely uses aggregation and therefore must become attuned to social choice paradoxes in order to avoid artefacts caused by unsound data aggregation. Section 4 provides a new result on the sampling properties of social choice rules when dealing with preference distributions other than the symmetric distributions we have coined ‘cultures of indifference’ (see glossary for a definition). Section 5 illustrates a behavioural social choice analysis. In §6, we propose a research paradigm that expands the notions of ‘Condorcet efficiency’ and ‘Borda efficiency’ (see glossary for definitions).

## 2. The empirical rarity of the Condorcet paradox

Social choice theory (Arrow 1951; Black 1958; Sen 1970; Gehrlein & Fishburn 1976; Riker 1982; Tangiane 1991; Saari 1995; Mueller 2003) has had as its principal concerns the axiomatic structure (i.e. the abstract mathematical properties) of voting rules and social welfare functions, including impossibility results (see glossary for definitions). A major concern in social choice theory has been the problem of *intransitivity*, i.e. a cyclical situation where there exist three choice alternatives such that the first is socially preferred to the second, the second is socially preferred to the third, yet the third is socially preferred to the first. Intransitive cycles are often labelled a ‘social choice paradox’. Social choice theory has generated many estimates of the degree of intransitivity created by, or inherent in, the aggregation of individual preferences into collective decisions. At the heart of much of this work is the Condorcet paradox of cyclical majorities (see glossary for a simple example). In a cyclical majority, no matter which candidate is elected, a majority of voters will be disappointed because they would prefer someone else to be the chosen candidate. Perhaps even more importantly, cyclical majorities seem to cast into doubt the very notion of meaningful majority decision making (Riker 1982).
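The notion of a cyclical majority can be made concrete with a small sketch. The three-voter profile below is a standard textbook illustration, not data from any study cited here, and the function names are our own hypothetical choices.

```python
from itertools import combinations

def majority_relation(profile, candidates):
    """Return the ordered pairs (x, y) for which a strict majority ranks x above y."""
    wins = set()
    for x, y in combinations(candidates, 2):
        x_over_y = sum(1 for ranking in profile if ranking.index(x) < ranking.index(y))
        y_over_x = len(profile) - x_over_y
        if x_over_y > y_over_x:
            wins.add((x, y))
        elif y_over_x > x_over_y:
            wins.add((y, x))
    return wins

def has_majority_cycle(profile, candidates):
    """True if some triple x, y, z has x beating y, y beating z and z beating x."""
    wins = majority_relation(profile, candidates)
    return any((x, y) in wins and (y, z) in wins and (z, x) in wins
               for x in candidates for y in candidates for z in candidates)

# Each voter holds a transitive linear order, listed from best to worst.
cyclic_profile = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
unanimous_profile = [("A", "B", "C")] * 3

print(sorted(majority_relation(cyclic_profile, "ABC")))
print(has_majority_cycle(cyclic_profile, "ABC"))
print(has_majority_cycle(unanimous_profile, "ABC"))
```

Every voter in the cyclic profile is individually transitive, so the intransitivity arises purely from aggregation.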

There are two important literatures dealing directly with the Condorcet paradox. The first is based on analytic or simulation results that look at theoretical distributions. The most common assumption is the *impartial culture* (see glossary), a distribution in which all (weakly or) linearly ordered actor preferences among some set of objects are taken to be equally likely. This literature asks how often we must expect to find intransitive social preferences and concludes that the paradox should be ubiquitous (DeMeyer & Plott 1970; Gehrlein & Fishburn 1976; Riker 1982; Gehrlein 1983; Lepelley 1993; Jones *et al*. 1995; Van Deemen 1999; Mueller 2003).
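As a rough illustration of the kind of calculation this literature performs, the following sketch enumerates every equally likely profile of linear orders under the impartial culture and counts those without a Condorcet winner. It is restricted, by assumption, to three candidates and a small odd number of voters so that exhaustive enumeration stays cheap; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

CANDIDATES = ("A", "B", "C")
ORDERS = list(permutations(CANDIDATES))  # the 3! = 6 linear orders

def beats(profile, x, y):
    """True if a strict majority of the profile ranks x above y."""
    return 2 * sum(r.index(x) < r.index(y) for r in profile) > len(profile)

def has_condorcet_winner(profile):
    """True if some candidate beats every other candidate by strict majority."""
    return any(all(beats(profile, x, y) for y in CANDIDATES if y != x)
               for x in CANDIDATES)

def paradox_probability(n_voters):
    """Exact share of equally likely profiles that have no Condorcet winner."""
    profiles = list(product(ORDERS, repeat=n_voters))
    missing = sum(1 for p in profiles if not has_condorcet_winner(p))
    return Fraction(missing, len(profiles))

print(paradox_probability(3))  # 1/18
```

For three voters, 12 of the 216 profiles are cyclic, which gives the exact value 1/18.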

The second major literature on the Condorcet paradox provides theorems that show sufficient conditions to avoid cycles and reveals that these conditions appear to be highly restrictive (Sen 1966, 1970). Two major themes of both these literatures are (i) the theoretical prediction that majority elections should be plagued with cycles and (ii) the broadly advertised policy recommendation (Shepsle & Bonchek 1997) to beware the use of majority rule in real elections because a majority winner is unlikely to even exist.

Regenwetter & Grofman (1998*a*) and Regenwetter *et al*. (2002*b*) found virtually no empirical evidence for the Condorcet paradox in survey or ballot data. Therefore, in a series of publications (Regenwetter & Grofman 1998*b*; Regenwetter *et al*. 2002*a*,*c*, 2003; Tsetlin & Regenwetter 2003; Tsetlin *et al*. 2003) that culminated in a Cambridge University Press book ‘*Behavioral social choice*’ (Regenwetter *et al*. 2006), members of the present team of authors, and others, re-examined the arguments leading to the belief that the Condorcet paradox should be an inevitable concomitant of any majority rule voting process. They showed that the existing results, while mathematically correct, were nonetheless misleading. For example, they found that simulation results were based on ‘knife-edge’ theoretical assumptions, where even minuscule deviations from the theoretical assumptions lead to dramatic changes in predictions. Similarly, they found that the theoretical sufficiency conditions for avoiding the Condorcet paradox primarily tell us what ‘cannot be ruled out under all possible circumstances’ rather than providing realistic evaluations of the threat posed by the Condorcet paradox. Regenwetter *et al*. (2003) stated abstract and yet empirically plausible sufficient conditions to avoid the paradox, and found empirical evidence in survey and ballot data that the conditions they had identified as sufficient to avoid the Condorcet paradox were satisfied (or sufficiently nearly satisfied). Recent experimental work on deliberative polls suggests potential explanations of how deliberative democracy may avoid the Condorcet paradox (List *et al*. 2007).

## 3. The importance of behavioural and normative social choice theory for behavioural individual decision research

We now turn to the role that social choice theory should (but currently does not) play in individual behavioural decision research. Consider three binary non-negative gambles, A, B and C, given in (3.1)–(3.3). Imagine that individual decision makers in a laboratory experiment make pairwise choices among these gambles. Suppose that most participants choose A over B, most choose B over C and yet, most also choose C over A. This kind of cyclical pattern of choices poses a challenge to standard decision theories and it has motivated recent prominent developments of *heuristic* decision theories, i.e. theories of decision making by ‘computationally simple rules of thumb’.

For illustrative purposes consider cumulative prospect theory (CPT, see glossary; Tversky & Kahneman 1992; Wakker & Tversky 1993). For non-negative gambles such as A, B, C in (3.1)–(3.3), CPT transforms each probability *p* of an outcome via a probability weighting function *w*, say,

$$w(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}}, \tag{3.4}$$

and each gamble outcome *x* via a utility function *v*, say, of the form

$$v(x) = x^{\alpha}. \tag{3.5}$$

The probability weighting function *w*, depending on the value of *γ*, inflates low probabilities (i.e. predicts risk-seeking behaviour when probabilities are low, such as in a lottery) and deflates high probabilities (i.e. predicts risk avoidance when probabilities are high). The utility function *v*, depending on *α*, inflates relatively small gains and deflates relatively large gains. The biasing of probabilities and utilities in CPT is based on a large empirical literature that has reported cognitive biases and limitations in humans when dealing with choice under conditions of risk or uncertainty.

For a binary non-negative gamble *f*, writing *p*_{1} for the probability of winning the smaller amount *x*_{1}, and *p*_{2} for the probability of winning the larger amount *x*_{2}, let $f = (x_1, p_1; x_2, p_2)$ and $V(f) = w(p_2)\,v(x_2) + \left(1 - w(p_2)\right) v(x_1)$. In CPT, for binary non-negative gambles *g*, *h*,

$$g \text{ is preferred to } h \iff V(g) > V(h). \tag{3.6}$$

The two functions *w* and *v* depend on two parameters, *γ* and *α*. Table 1 gives examples of parameter values, implied values *V*(*f*) for the above three prospects A, B and C, and the implied preference order among gambles, from the best to the worst. Regardless of the choice of parameters *γ* and *α*, we will not be able to accommodate the empirical cycle using equation (3.6). This is because (3.6) ranks gambles by a single real number *V*(*f*) and therefore implies transitive preferences. Thus, it seems as if CPT could not explain the hypothetical cycle in our example.
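The sketch below shows how such CPT values and the implied preference order could be computed. It assumes the Tversky & Kahneman (1992) functional forms, namely the weighting function $w(p) = p^{\gamma}/(p^{\gamma} + (1-p)^{\gamma})^{1/\gamma}$ and the power utility $v(x) = x^{\alpha}$. The gambles `G1`, `G2` and `G3` are hypothetical and are not the gambles A, B and C of (3.1)–(3.3); the parameter values are likewise illustrative.

```python
def w(p, gamma):
    """Probability weighting function (Tversky & Kahneman 1992 form, assumed here)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def v(x, alpha):
    """Power utility for non-negative outcomes (assumed form)."""
    return x ** alpha

def cpt_value(gamble, gamma, alpha):
    """CPT value of a binary non-negative gamble (x1, p1, x2, p2) with x1 <= x2."""
    x1, p1, x2, p2 = gamble
    weight_high = w(p2, gamma)  # decision weight of the larger outcome
    return weight_high * v(x2, alpha) + (1 - weight_high) * v(x1, alpha)

def cpt_ranking(gambles, gamma, alpha):
    """Gamble labels ordered from best to worst by CPT value."""
    return sorted(gambles, key=lambda name: cpt_value(gambles[name], gamma, alpha),
                  reverse=True)

# Hypothetical gambles, for illustration only.
example = {
    "G1": (0.0, 0.5, 100.0, 0.5),
    "G2": (40.0, 0.5, 50.0, 0.5),
    "G3": (10.0, 0.9, 200.0, 0.1),
}

print(cpt_ranking(example, gamma=1.0, alpha=1.0))   # reduces to expected value
print(cpt_ranking(example, gamma=0.61, alpha=0.88))
```

Because each gamble is mapped to a single number before sorting, any parameter pair yields a transitive ranking, which is the point made in the text.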

A prominent recently proposed decision heuristic tackles this problem. The priority heuristic (PH, see glossary) of Brandstätter *et al*. (2006) theorizes that decision makers compare gambles via a process that induces a lexicographic interval order (see glossary for a definition). In a nutshell, decision makers sequentially (‘lexicographically’) consider three attributes (the so-called ‘reasons’) and an aspiration level for each reason. They visit the reasons in a specific order and stop their decision process whenever an aspiration level is met for the given reason currently under consideration (Brandstätter *et al*. 2006). For the three gambles above, the PH predicts that the decision maker chooses A over B (by reason 1). The PH also predicts that the decision maker will choose B over C (by reason 3). However, the PH predicts that the decision maker chooses C over A (by reason 3). Clearly, the PH accounts for 100 per cent (all three) of the pairwise majority choices.

In the terminology of Brandstätter *et al*. (2006), the PH is able to capture perfectly ‘the process’ by which the decision makers arrived at their final choices. Using a similar approach, Brandstätter *et al*. (2006) argued that the PH is superior to several leading decision theories because it models the cognitive process of decision making and, compared with these competing theories, it correctly predicts the largest number of modal pairwise choices in several datasets from the literature (Kahneman & Tversky 1979; Tversky & Kahneman 1992; Lopes & Oden 1999; I. Erev *et al*. 2002, unpublished manuscript). Because the modal choice among a pair of gambles is also the majority choice, these conclusions are, in fact, based on descriptive analyses within a pairwise majority aggregation approach.

Now, let us return to the imaginary decision experiment. Before, we reported the data in the usual aggregated fashion that one often finds in the behavioural decision literature that studies individual choice. However, if we consider the data in more detail, an interesting new picture emerges. Suppose the data came from three decision makers (DM 1, DM 2 and DM 3) who made the combinations of choices shown in table 2. Note that each decision maker acted in accordance with CPT using (3.6) and using one of the parameter choices in table 1. Most importantly, not a single decision maker chose in accordance with the PH. But majority aggregation, a popular method for summarizing choice data (Tversky 1969; Kahneman & Tversky 1979; Tversky & Kahneman 1981, 1986; Birnbaum 2004; Brandstätter *et al*. 2006), obscures this fact. By majority, the PH alone is able to accommodate 100 per cent of the (majority choice) ‘data’, while CPT, the theory according to which we computed each choice, accommodates at best two-thirds of majority choices. We are facing a Condorcet paradox, where the pattern of majority choices does not match the choice pattern of even one individual decision maker.
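A minimal sketch of this aggregation artefact follows. Table 2 is not reproduced in this section, so the three transitive preference orders below are assumed for illustration; they are chosen only so that the pairwise majorities match the cycle described in the text.

```python
from itertools import permutations

GAMBLES = ("A", "B", "C")
PAIRS = [("A", "B"), ("B", "C"), ("A", "C")]

def pairwise_choices(order):
    """Choices implied by a transitive preference order, listed best to worst."""
    return {(x, y): (x if order.index(x) < order.index(y) else y) for x, y in PAIRS}

def majority_choices(all_choices):
    """Modal (majority) choice for each pair across decision makers."""
    result = {}
    for pair in PAIRS:
        picks = [choices[pair] for choices in all_choices]
        result[pair] = max(pair, key=picks.count)
    return result

# Hypothetical transitive decision makers (assumed, not taken from table 2).
dm_orders = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
individual = [pairwise_choices(order) for order in dm_orders]
majority = majority_choices(individual)

def share_matched(order):
    """Fraction of majority choices reproduced by a transitive order."""
    implied = pairwise_choices(order)
    return sum(implied[p] == majority[p] for p in PAIRS) / len(PAIRS)

best_transitive = max(share_matched(order) for order in permutations(GAMBLES))

print(majority)                # A over B, B over C, C over A
print(majority in individual)  # no decision maker shows the majority pattern
print(best_transitive)         # best any transitive theory can do
```

Any transitive theory, CPT included, can reproduce at most two of the three majority choices here, even though every simulated decision maker is transitive.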

Any aggregation of choice data could create artefacts in the analysis of decision-making behaviour. Aggregation, especially by majority, is common in individual behavioural decision-making research, including in seminal papers (Tversky 1969; Kahneman & Tversky 1979; Tversky & Kahneman 1981, 1986; Brandstätter *et al*. 2006). This means that much past research in behavioural decision research is susceptible to aggregation paradoxes such as the Condorcet paradox. This is a reason why behavioural decision researchers should systematically incorporate social choice theoretical considerations into their work.

## 4. Sample social choice outcomes as estimators of population social choice outcomes

We now proceed to a new, yet simple, result in social choice theory. We explain the behaviour of scoring rules in samples from nearly any conceivable kind of culture (population distribution) with the only caveat that it deviates in one crucial way from the cultures that have dominated the discussion historically.

While statistics has played a major role in social choice theory in the guise of sampling distributions derived from various theoretical distributions, such as the impartial culture (see glossary for the definition), inferential statistics does not seem to be used systematically. Traditionally, social choice theorists have rarely considered the need to draw statistical inferences from empirical data about underlying population properties of social choice functions in a given electorate. Yet, ever since the close call in the 2000 US presidential election, it has become clear that published ballot counts are not a deterministic function of the distribution of preferences in a population. Rather there are many probabilistic components that affect turnout, ballot casting and ballot counting. Two of the present authors have studied this notion for more than a decade. Regenwetter *et al*. (2006) and the component papers that were published earlier, as well as Regenwetter & Tsetlin (2004) and Regenwetter & Rykhlevskaia (2007) have promoted the need to consider social choice data from an inferential statistical point of view.

In this section, we show that the sampling approaches to studying the behaviour of scoring rules in large electorates are extremely dependent on underlying theoretical assumptions. This provides additional motivation as to why behavioural analyses of social choice procedures are critical for our understanding of their real-world performance. Recall that the most famous distributional assumption in social choice theory is the impartial culture assumption, according to which an electorate can be considered to be a random sample with replacement from a uniform distribution over linear (or weak) orders on the set of candidates. According to this assumption, if one randomly samples a voter from the population and if there are *n* candidates, then all of the *n*! possible orders of candidates are equally likely to match the preference of that voter.

Consider the impartial culture from a statistical viewpoint. When computing Condorcet's majority rule, or any scoring rule (see glossary for definitions), such as plurality or Borda, at the distribution level of the impartial culture, we obtain a perfect tie among all candidates. Yet, random samples of any size, if they contain an odd number of voters, will reproduce that majority tie among any two candidates with probability zero, when voters are assumed to have linear-order preferences. Condorcet's majority relationship is not a consistent estimator of the population majority relationship when samples may originate from knife-edge distributions (see glossary for a definition) similar in nature to the impartial culture.

Consider any social choice method, such as Condorcet's majority rule, or a scoring rule, aka positional voting method. A *culture of indifference* (*with regard to that procedure*) is a probability distribution over preference relationships (linear orders, weak orders (WOs) and partial orders) with the property that one or more pairs of candidates are tied at the distribution level, according to that social choice method. When considering majority rule, for example, a *culture of indifference* is any probability distribution over binary relationships of any kind, such that for at least some distinct pair of candidates, A, B, the total probability of those preference relationships in which A is preferred to B equals the total probability of those preference relationships in which B is preferred to A. Regenwetter *et al*. (2006) have shown that the majority rule outcomes of large electorates (drawn from an underlying theoretical culture) will converge (with increasing size of the electorate) to the majority preferences in the underlying culture *as long as* that culture is *not* a culture of indifference. The purpose of this section is to show the analogous result for scoring rules.
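For majority rule, the definition can be checked mechanically. The sketch below assumes, for simplicity, that the culture is a distribution over linear orders only, and it uses exact rational arithmetic so that ties are detected reliably. The perturbed culture is a hypothetical example.

```python
from fractions import Fraction
from itertools import combinations, permutations

def is_majority_culture_of_indifference(culture, candidates):
    """culture maps linear orders (tuples, best first) to exact probabilities."""
    for a, b in combinations(candidates, 2):
        p_ab = sum(p for order, p in culture.items() if order.index(a) < order.index(b))
        p_ba = sum(p for order, p in culture.items() if order.index(b) < order.index(a))
        if p_ab == p_ba:  # a and b are tied at the distribution level
            return True
    return False

candidates = ("A", "B", "C")
orders = list(permutations(candidates))

# Impartial culture: all six linear orders are equally likely.
impartial = {order: Fraction(1, 6) for order in orders}

# Hypothetical perturbation: move a tiny probability mass between two orders.
eps = Fraction(1, 1000)
perturbed = dict(impartial)
perturbed[("A", "B", "C")] += eps
perturbed[("C", "B", "A")] -= eps

print(is_majority_culture_of_indifference(impartial, candidates))
print(is_majority_culture_of_indifference(perturbed, candidates))
```

A shift of one part in a thousand is enough to leave the class of cultures of indifference, which illustrates why these assumptions are described as knife-edge.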

Cultures of indifference are knife-edge assumptions that lead to chronically paradoxical behaviour (in the statistical and social choice sense) of social choice rules in random samples. Unfortunately, much work on the likelihood of voting paradoxes has hinged on cultures of indifference. This has created a common perception that voting paradoxes are extremely likely. We consider this a profound and far-reaching artefact of unrealistic theoretical modelling assumptions.

Let $\mathcal{C}$ be a finite set of candidates. We define a probability space $(\mathcal{R}, 2^{\mathcal{R}}, P)$, where $\mathcal{R}$ is a family of order relationships on $\mathcal{C}$ (e.g. linear orders, WOs or, say, asymmetric and acyclic binary relationships), $2^{\mathcal{R}}$ is the power set of $\mathcal{R}$ and $P$ is a culture, and, in particular, a population probability distribution over $\mathcal{R}$. A *scoring rule* takes any preference relation from $\mathcal{R}$ and gives a numerical *score* to each candidate in $\mathcal{C}$. Regenwetter & Rykhlevskaia (2007) have developed general scoring rules for a general class of binary relationships, using the notion of a generalized rank of Regenwetter & Rykhlevskaia (2004).

More formally, a scoring rule is a set of functions $\{f_A : A \in \mathcal{C}\}$, where $f_A : \mathcal{R} \to \mathbb{R}$ and $f_A(R)$ is the score given to candidate A for the preference relationship $R$. For a given scoring rule, we define the random variable $X_A^{i}$ to be the score of candidate A for the $i$th draw in a sequence of independent and identically distributed (i.i.d.) draws from a population with distribution $P$. That is, for $R \in \mathcal{R}$,

$$X_A^{i} = f_A(R) \quad \text{whenever the } i\text{th draw is } R.$$

The *sample score* for candidate A is the average score of A in a sample of size $n$. It is denoted by

$$\bar{S}_A^{n} = \frac{1}{n} \sum_{i=1}^{n} X_A^{i}.$$

For a sample of size $n$, whenever $\bar{S}_A^{n} > \bar{S}_B^{n}$, we say that A is socially ordered ahead of B in the sample, by the given scoring rule. The resulting order by any given scoring rule for a given sample is called the *sample social order* according to that scoring rule. The *population score* of a candidate A is the expectation of $X_A^{i}$. We write

$$S_A = E\left[X_A^{i}\right] = \sum_{R \in \mathcal{R}} f_A(R)\,P(R).$$

Whenever $S_A > S_B$, we say that A is socially ordered ahead of B in the population, by the given scoring rule. The order prescribed by a scoring rule for the entire population is called the *population social order* of that scoring rule.
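To make these definitions concrete, here is a minimal sketch of sample scores and the sample social order for two familiar scoring rules. It assumes complete linear-order ballots; the generalized scoring rules of Regenwetter & Rykhlevskaia for other binary relationships are not implemented. The four-ballot sample is hypothetical.

```python
def borda_score(candidate, ranking):
    """Borda score: number of candidates ranked below `candidate`."""
    return len(ranking) - 1 - ranking.index(candidate)

def plurality_score(candidate, ranking):
    """Plurality score: 1 for the top-ranked candidate, 0 otherwise."""
    return 1 if ranking[0] == candidate else 0

def sample_scores(sample, candidates, score):
    """Average score of each candidate over the sampled preference relations."""
    n = len(sample)
    return {c: sum(score(c, r) for r in sample) / n for c in candidates}

def sample_social_order(sample, candidates, score):
    """Candidates sorted by decreasing sample score (ties keep input order)."""
    scores = sample_scores(sample, candidates, score)
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Hypothetical sample of four linear-order ballots, best candidate first.
sample = [("A", "B", "C"), ("A", "C", "B"), ("B", "A", "C"), ("C", "B", "A")]
candidates = ["A", "B", "C"]

print(sample_scores(sample, candidates, borda_score))
print(sample_social_order(sample, candidates, borda_score))
print(sample_scores(sample, candidates, plurality_score))
```

Note that the plurality scores of B and C tie in this sample; the sort keeps them in input order, which is a convenience of the sketch rather than a substantive tie-breaking rule.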

The following result shows that if the population social order has no ties, then the sample score $\bar{S}_A^{n}$ converges in probability to the population score $S_A$ as $n$ grows arbitrarily large. Hence, $\bar{S}_A^{n}$ is a consistent estimator of $S_A$. We conclude, as a consequence, that the sample social order converges to (is a consistent estimator of) the population social order whenever the population is not a culture of indifference.

*If the population social order of a given scoring rule has no ties*, *then the sample social order of that scoring rule converges to the population social order as the sample size increases*.

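*Proof.* Since the random variables $X_A^{i}$ are i.i.d. with finite mean and variance, the weak law of large numbers implies that $\bar{S}_A^{n}$ converges in probability to $S_A$. Hence, for any $\epsilon, \delta > 0$ there exists $N$ such that, for all $n > N$ and all $A \in \mathcal{C}$,

$$P\left(\left|\bar{S}_A^{n} - S_A\right| < \epsilon\right) > 1 - \delta.$$

Pick any pair of candidates $A, B \in \mathcal{C}$. Since we assume that there are no ties in the population social order, we assume without loss of generality that $S_A > S_B$. Pick $\delta > 0$ and $\epsilon > 0$, such that $\epsilon < (S_A - S_B)/2$. Then, for all $n > N$,

$$P\left(\bar{S}_A^{n} > \bar{S}_B^{n}\right) \ge P\left(\left|\bar{S}_A^{n} - S_A\right| < \epsilon \ \text{and} \ \left|\bar{S}_B^{n} - S_B\right| < \epsilon\right) > 1 - 2\delta.$$

Thus, the sample social order converges to the population social order. ▪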
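The weak law only asserts that a suitable sample size exists. As a hedged supplement, Chebyshev's inequality gives an explicit, if crude, bound. Write $\bar{S}_A^{n}$ for the sample score of A in a sample of size $n$, $S_A$ for its population score, and $\sigma_A^2$ for the variance of the score of A in a single draw (this variance notation is our own assumption, not notation from the result above). With $S_A > S_B$ and $\epsilon = (S_A - S_B)/2$:

```latex
\begin{align*}
P\left(\left|\bar{S}_A^{n} - S_A\right| \ge \epsilon\right)
  &\le \frac{\sigma_A^2}{n\,\epsilon^2}, \\
P\left(\bar{S}_A^{n} \le \bar{S}_B^{n}\right)
  &\le P\left(\left|\bar{S}_A^{n} - S_A\right| \ge \epsilon\right)
     + P\left(\left|\bar{S}_B^{n} - S_B\right| \ge \epsilon\right) \\
  &\le \frac{\sigma_A^2 + \sigma_B^2}{n\,\epsilon^2}
   = \frac{4\left(\sigma_A^2 + \sigma_B^2\right)}{n\,(S_A - S_B)^2}.
\end{align*}
```

The bound vanishes as $n$ grows for any fixed positive margin $S_A - S_B$, but it becomes vacuous as the margin approaches zero, which is exactly the situation in a culture of indifference.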

In summary, first, we have seen that, in cultures of indifference, social choice rules display highly irregular behaviour in the sense that they need not be consistent estimators of culture social orders. Yet, second, in cultures that are not cultures of indifference (i.e. nearly any culture one could think of), large electorates will display exactly the same social choice behaviour as has been theoretically assumed at the level of the culture to begin with. This holds for majority rule (Regenwetter *et al*. 2006) and, as we have proved here, for scoring rules. Note also that we have allowed individual preferences to be binary relationships of any kind, not just weak or linear orders.

## 5. Eight behavioural social choice analyses

We now expand recent developments in behavioural social choice along the lines of Regenwetter *et al*. (2007*b*). Those authors analysed four large sets of empirical ballot data from the 1998–2001 annual presidential election ballots of the American Psychological Association (APA), which were collected under the *Hare* system. Each election featured on the order of 20 000 voters and a rather politicized electorate. Regenwetter *et al*. (2007*b*) analysed these ballots using a series of different models and using bootstrap methods for statistical inference.

Table 3 illustrates some key points of such a behavioural social choice analysis, similar to that of Regenwetter *et al*. (2007*b*). First, since social choice outcomes can depend very heavily on theoretical assumptions, we carry out empirical analysis using at least two sets of fundamentally different assumptions about the nature of preferences and about the vote casting process. Second, we evaluate the statistical replicability of our findings. Third, we contrast the famed theoretical incompatibility of competing social choice procedures with a high degree of agreement among methods in empirical data.

Regenwetter *et al*. (2007*b*) included the WO analysis of the 1998–2001 data. The WO model assumes that all ranked candidates are preferred to all non-ranked candidates on the ballot, and that the voter is indifferent among all candidates s/he does not rank. In the table, we have added four new datasets (2002–2005) and a new model which we call the *Zwicker* model (Dr W. Zwicker 2006, personal communication). The ballot data are partial ranking counts. Zwicker suggested interpreting the data as follows: count A as strictly preferred to B if and only if both options have been ranked and A has been ranked as preferable to B. The Zwicker model (ZW) does not assume any preference among any pair of candidates of which one or both were left unranked. Thus, ZW translates partial rankings into strict partial orders. For scoring rules, the general results of Regenwetter & Rykhlevskaia (2004, 2007) allow us to assign appropriate scores to all candidates from every ballot, even to those candidates that have not been listed in the voter's partial ranking.

We have drawn 10 000 random samples, with replacement, of sample size equal to the original ballot count, from a hypothetical population distribution (culture) estimated via either the WO model or the Zwicker model. In each sampled set of ballots, we have computed the Condorcet, Borda and plurality outcomes. This is a non-parametric bootstrap of the confidence we can have in the empirical social welfare outcomes under the three rules. The bootstrap is a way to simulate possible sources of uncertainty in election outcomes, such as unreliabilities in turnout, ballot casting and ballot counting. Intuitively speaking, it shows how sensitive the final tally is to small perturbations in the ballot distribution.
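The bootstrap procedure just described can be sketched as follows. This is a simplified illustration, not the analysis code behind table 3: it assumes complete linear-order ballots (the APA data are partial rankings that require the WO or ZW model), it shows only the Borda rule, it breaks ties by candidate input order, and it defaults to 1000 resamples for speed. All function names and the ballot counts are hypothetical.

```python
import random
from collections import Counter

def borda_order(ballots, candidates):
    """Social order by total Borda score (ties keep candidate input order)."""
    totals = {c: sum(len(b) - 1 - b.index(c) for b in ballots) for c in candidates}
    return tuple(sorted(candidates, key=lambda c: totals[c], reverse=True))

def bootstrap_confidence(ballots, candidates, rule, n_boot=1000, seed=0):
    """Modal social order over resamples and the share of resamples yielding it."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_boot):
        # Resample with replacement, keeping the original ballot count.
        resample = rng.choices(ballots, k=len(ballots))
        tally[rule(resample, candidates)] += 1
    modal_order, count = tally.most_common(1)[0]
    return modal_order, count / n_boot

# Hypothetical ballot set: 60 voters ABC, 30 voters BCA, 10 voters CAB.
ballots = [("A", "B", "C")] * 60 + [("B", "C", "A")] * 30 + [("C", "A", "B")] * 10
candidates = ("A", "B", "C")

order, confidence = bootstrap_confidence(ballots, candidates, borda_order)
print(order, confidence)
```

The same function accepts any rule that maps a ballot list to a social order, so Condorcet or plurality orders could be bootstrapped by passing a different `rule`.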

For each dataset, and for each model, we report the modal social welfare order, as well as its approximate bootstrapped confidence. When the confidence exceeds 98 per cent, we leave out the value, and simply display that social order in italics. First, in all eight elections, and independently of the model, we avoid the Condorcet paradox with confidence near 100 per cent (the table omits some details). Second, we find some degree of model dependence regarding the exact nature of the social orders. For example, the Condorcet orders by WO and ZW differ in 1998, 2001 and 2002, due to tight pairwise margins for plurality. Nonetheless, this model dependence has no bearing on the empirical absence of a cycle. Third, note that plurality, which uses the least information in the ballots, comes with sometimes extremely low statistical confidence, i.e. even small changes in the ballot distribution can affect the social order. Fourth, table 3 suggests that there is a fair degree of agreement among the three voting rules. In particular, in nearly every case where the three social orders can be estimated with high confidence, they yield identical winners and identical losers. This stands in direct contrast with the literature that predicts very substantial disagreements among the three rules. Besides the absence of a cycle, this is another important divergence from common wisdom in social choice theory. We discuss this more directly next, with a special focus on the agreement about the winner.

## 6. Generalizations of Condorcet efficiency and Borda efficiency

Building on research about the Condorcet paradox, a highly sophisticated, and often quite technical, literature is concerned with the Borda and Condorcet efficiencies of voting methods. For instance, the Condorcet efficiency of a voting method is the conditional probability that the election winner matches the Condorcet winner in a random sample of ballots (from some theoretical distribution), provided that there exists a Condorcet winner. More generally, this literature studies the interrelationship among social choice rules (Chamberlin & Cohen 1978; Gehrlein & Fishburn 1978; Riker 1982; Merrill 1984, 1985; Bordley 1985; Gehrlein 1985, 1992; Merrill & Nagel 1987; Nurmi 1988, 1992; Adams 1997; Gehrlein & Lepelley 2000; Merlin *et al*. 2000; Mueller 2003). A large part of this literature concentrates on cultures of indifference. This literature predicts that many standard and competing voting procedures disagree with one another a substantial part of the time. A related theoretical and empirical literature studying variants of the Condorcet jury theorem (Grofman 1981; Miller 1986; List & Goodin 2001), however, avoids cultures of indifference. The empirical literature that compares social choice procedures against each other (Yaari & Bar-Hillel 1984; Felsenthal *et al*. 1986, 1993; Rapoport *et al*. 1988; Leining 1993; Felsenthal & Machover 1995; Hastie & Kameda 2005; Tideman 2006) is small; by and large, it avoids considerations of statistical inference, and it usually considers sparse datasets.
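The definition of Condorcet efficiency can be illustrated with an exact calculation for a very small case. The sketch below enumerates all equally likely profiles of three voters over three candidates (an impartial culture on linear orders) instead of drawing random samples, and it treats a tied plurality outcome as a failure to match the Condorcet winner. That tie convention, like the function names, is our own assumption.

```python
from fractions import Fraction
from itertools import permutations, product

CANDIDATES = ("A", "B", "C")
ORDERS = list(permutations(CANDIDATES))

def condorcet_winner(profile):
    """Candidate beating every other candidate by strict majority, or None."""
    for x in CANDIDATES:
        if all(2 * sum(r.index(x) < r.index(y) for r in profile) > len(profile)
               for y in CANDIDATES if y != x):
            return x
    return None

def plurality_winner(profile):
    """Unique candidate with the most first-place votes, or None on a tie."""
    counts = {c: sum(r[0] == c for r in profile) for c in CANDIDATES}
    top = max(counts.values())
    leaders = [c for c in CANDIDATES if counts[c] == top]
    return leaders[0] if len(leaders) == 1 else None

def condorcet_efficiency_of_plurality(n_voters):
    """P(plurality winner = Condorcet winner | a Condorcet winner exists)."""
    with_winner = agree = 0
    for profile in product(ORDERS, repeat=n_voters):
        cw = condorcet_winner(profile)
        if cw is not None:
            with_winner += 1
            agree += plurality_winner(profile) == cw
    return Fraction(agree, with_winner)

print(condorcet_efficiency_of_plurality(3))  # 14/17
```

Of the 216 profiles, 204 have a Condorcet winner and plurality selects it uniquely in 168 of them; the remaining 36 are three-way plurality ties.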

Arrow's (1951) famous impossibility theorem can be interpreted to mean that any choice of a consensus method comes at the cost of giving up principles that underlie other, competing, and mathematically not universally compatible voting methods. Saari (1994, 1995, 1999) has shown that one can create distributions that yield virtually any combination of differences in results across voting methods. He has developed an algorithm to specify such distributions precisely. We propose a straightforward extension of the study of Condorcet efficiency: what is the probability in random samples from known cultures, and what is the inferred (e.g. bootstrapped) population confidence based on empirical data, that any two or more social choice procedures, e.g. Condorcet and some scoring rules, agree on (i) the winner, (ii) the loser, (iii) the entire social order? Which social choice rules appear to be in heavy empirical disagreement, and which appear to be highly consistent in most empirical settings? What characteristics of the empirical distribution appear to drive the agreement and/or disagreement among competing social welfare functions?

While much more work is needed to support any general claims, we have some early indication that this approach will reveal more puzzles about social choice. Table 4 shows, as benchmarks, our simulated agreement probabilities for the impartial culture on WOs over five candidates (WO5) and for the uniform distribution on partial rankings of five candidates (PR5). These benchmarks suggest that one should not have high hopes of two or even all three among Condorcet, Borda and plurality yielding the same winner for five candidates. The table compares these benchmarks to agreement probabilities we derived by bootstrapping from the empirical ballot data. Our analysis in table 4 suggests that in six out of the eight ballots-based distributions, the corresponding probabilities virtually equal 100 per cent.

A large proportion of the theoretical literature is based on cultures of indifference, where sample social choice functions are not consistent estimators of the population social orders. Once we move away from cultures of indifference, large samples will have high agreement among social choice rules if and only if the cultures (population distributions) themselves have social orders that agree. This is a direct consequence of the fact that the Condorcet procedure and all scoring rules are consistent estimators of the corresponding population social orders, whenever we are not drawing from a culture of indifference. Clearly, we face another situation where theoretical predictions are direct functions of the underlying theoretical assumptions: in cultures other than cultures of indifference, if the culture features agreement among Condorcet and/or scoring rules, then large samples will replicate that agreement with probability converging to 1. Even though margins may be small in some empirical elections that involve heavy campaigning, we do not consider cultures of indifference to be realistic representations of real-world electorates.

Our findings show how crucial empirical work will be in untangling the puzzle surrounding the question of agreement or disagreement among social choice procedures. Axiomatic theory is, by and large, mute about population distributions (i.e. about what are or are not suitable assumptions to make about underlying cultures that generate ballot frequencies). The nature of preference distributions in electorates is ultimately an empirical question. We hope that this paper will encourage the social choice community to augment their traditionally normative theoretical work with a behavioural analysis component. However, we would like to emphasize that empirical work should be carried out in a fashion that is statistically sound. Published ballot counts are not deterministic functions of the underlying preference distributions and must be subjected to adequate statistical inferential methods.

## 7. Conclusion and discussion

Behavioural social choice research has collected evidence that Condorcet cycles are surprisingly rare in empirical survey and ballot data. This work has leveraged statistical inference as a major methodological tool. Behavioural social choice theory has also highlighted the role of model dependence, i.e. the fact that conclusions about social choice procedures can hinge on the theoretical assumptions that enter theorems, simulations or statistical analyses of empirical data. Nonetheless, while inferred distributions of preferences in electorates often depend on theoretical assumptions in the analysis, the empirical *absence* of majority cycles has been extremely robust across a range of modelling assumptions. The same holds for the agreement among competing voting methods.

In this paper, we have reviewed how various branches of the decision sciences are nearly completely disparate. Furthermore, we firmly believe that major advances could be possible if the different ‘constituencies’ of the decision sciences were consolidated. For example, we have highlighted how the standard research paradigm of individual behavioural decision research routinely relies on social choice aggregation of individual choice data, often without regard to possible social choice paradoxes. While it is too early to tell, this practice could permeate the empirical literature with artefacts.

In the social choice domain, we have discussed a new but straightforward result about the statistical nature of scoring rules. When sampling from a culture that is not a culture of indifference, (sufficiently) large electorates' social order, by any scoring rule, will match (with probability arbitrarily close to one) the social order by the same scoring rule found in the underlying culture. Our previous work has shown that the same is true for the Condorcet criterion. This has major implications for the famed disagreement among social choice rules. Whether Condorcet and/or various scoring rules, such as Borda, agree or disagree with each other in large samples will completely depend on the assumptions made about the underlying culture. In cultures of indifference, which have received a disproportionate amount of attention, all of these rules display pathological sampling behaviours because, in this case, the sample social orders are not consistent estimators of the population social orders. For example, with just two candidates and an impartial culture, the social order in the impartial culture is a two-way tie by Condorcet and by every scoring rule. Yet, for asymmetric and weakly complete pairwise individual preferences (i.e. when each decision maker strictly prefers one or the other among the two choice alternatives), if the sample size is odd, the probability that the sample social order matches the social order of the underlying population is zero, however large the sample.
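The two-candidate example can be written out explicitly. The notation here is introduced only for this illustration: let $n$ be the sample size and $K$ the number of sampled voters who strictly prefer candidate $a$ to candidate $b$. Under the impartial culture, $K \sim \mathrm{Binomial}(n, 1/2)$, and the sample reproduces the population tie only if $K = n/2$:

$$
P\left(K = \tfrac{n}{2}\right) =
\begin{cases}
0, & n \text{ odd},\\[4pt]
\dbinom{n}{n/2}\, 2^{-n} \approx \sqrt{\dfrac{2}{\pi n}}, & n \text{ even}.
\end{cases}
$$

The approximation for even $n$ follows from Stirling's formula. It shows that the matching probability also tends to zero along even sample sizes, so the problem is not specific to odd $n$.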

Behavioural social choice analyses have now revealed in a number of datasets that competing social orders appear to be in nearly perfect agreement with each other, often with high statistical confidence. This suggests that realistic cultures should not be cultures of indifference, and that the theoretical literature may promote overly pessimistic views about the likelihood of consensus among consensus methods. Axiomatics highlight that competing methods cannot universally agree with each other. Simulation results, as we have shown, will completely hinge on the assumptions made about the generating distribution that underlies the ballot counts. Ultimately, it falls upon empirical researchers to discover the properties of real-world distributions of preferences in real populations, and to characterize the conditions under which competing social choice rules agree or disagree with each other.

Here, we have highlighted a behavioural social choice approach to understanding the empirical and statistical properties of preference aggregation and voting methods. We think that this paradigm can be usefully extended to other domains. For example, there is a growing literature on statistical properties of belief aggregation methods that builds on the Condorcet jury theorem in much the same way that early social choice work built on the Condorcet criterion for aggregating preferences (Black 1958; Grofman *et al*. 1983). Much early work in this area uses very strong assumptions about statistical independence, while some recent work uses Nash–Bayesian assumptions that seem to us behaviourally implausible (List & Goodin 2001; Dryzek & List 2003).

Science often proceeds as an iterative process in which empirical work and theoretical work go hand in glove, each inspiring the other in an upward spiral of knowledge. This is our hope for behavioural social choice.

## Acknowledgments

This material is based upon work supported by the Air Force Office of Scientific Research, *Cognition and Decision Program*, under Award no. FA9550-05-1-0356 entitled ‘Testing Transitivity and Related Axioms of Preference for Individuals and Small Groups’ (to M.R., PI) and by the National Institute of Mental Health under *Training Grant* Award no. PHS 2 T32 MH014257 entitled ‘Quantitative methods for behavioral research’ (to M.R., PI). Grofman's contributions to this research were supported by the Institute for Mathematical Behavioral Sciences and the Center for the Study of Democracy at the University of California, Irvine. D. Cavagnaro carried out this work while an NIH postdoctoral trainee at the University of Illinois at Urbana-Champaign. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the Air Force Office of Scientific Research, the National Institute of Mental Health, the Institute for Mathematical Behavioral Sciences, or the Center for the Study of Democracy.

## Footnotes

One contribution of 11 to a Theme Issue ‘Group decision making in humans and animals’.

- © 2008 The Royal Society