We expect that natural selection should result in behavioural rules which perform well; however, animals (including humans) sometimes make bad decisions. Researchers account for these with a variety of explanations; we concentrate on two of them. One explanation is that the outcome is a side effect; what matters is how a rule performs (in terms of reproductive success). Several rules may perform well in the environment in which they have evolved, but their performance may differ in a ‘new’ environment (e.g. the laboratory). Some rules may perform very badly in this environment. We use the debate about whether animals follow the matching law rather than maximizing their gains as an illustration. Another possibility is that we were wrong about what is optimal. Here, the general idea is that the setting in which optimal decisions are investigated is too simple and may not include elements that add extra degrees of freedom to the situation.
In this paper, we are concerned with modelling the action of natural selection. Selecting the best action when making a decision can be of great importance to an individual's fitness; we often term making the right choice as being ‘rational’. Ultimately, however, decision-making processes are products of an individual's evolutionary history, endowed via natural selection. In this paper, we are concerned with whether we should expect rational behaviour to be a product of naturally selected systems. In a general sense, rationality involves thinking and behaving reasonably and logically, but the term holds different meanings for researchers in different intellectual fields. The meaning and implications of the term ‘rationality’ have been discussed at great length. We do not intend to try and review all of these here, but use the categorizations of Kacelnik (2006) as a guide (for further introductions to the debate, see Wilson 1974; Moser 1990; Manktelow & Over 1993). Kacelnik (2006) adeptly introduces and summarizes three categorizations, representing the different disciplines for which rationality has been of central interest: philosophy and psychology (PP-rationality); economics (E-rationality); and behavioural ecology/evolutionary biology (B-rationality).
Psychologists have traditionally been more interested in the internal mechanisms of behavioural processes, rather than the behavioural outcomes per se. Rational behaviour is distinguished from irrational as a function of the process by which the behaviour became manifest, not by the behaviour itself (akin to Simon's (1976) procedural rationality). An important consequence is that PP-rationality is not understood in terms of observable behaviours, but in terms of internally consistent thoughts and beliefs (Kacelnik 2006). As a result, it is difficult to carry out experimental analyses of PP-rationality, since it involves examination of the cognitive processes involved in producing behaviour. Human subjects can be questioned about their reasons for choosing particular courses of action, enabling a limited level of investigation into beliefs; such investigation is virtually impossible in the study of non-human behaviour (Kacelnik 2006).
In contrast to PP-rationality, E-rationality is predominantly a goal-led concept, within certain bounds. The goal is the maximization of expected utility. E-rationality can therefore be used to predict patterns of observable behaviour (Kacelnik 2006). In studies of human rationality, utility is most often taken as a financial gain, although there are some notable exceptions (e.g. Silberberg et al. 1991); in non-humans, utility is most often assumed to be linked to food acquisition, but could involve access to water, mates or conspecifics. However, the plasticity of utility as a term brings its own problems. A forager that seems to be failing to maximize the obvious utility may not be behaving irrationally but maximizing a different utility. As long as observed choices can be shown to maximize some form of utility, no matter how bizarre, then a decision maker can be classed as acting rationally. E-rationality suffers from another problem when its underlying axioms do not hold true. Violations of the axioms of transitivity and independence have been repeatedly reported from a variety of literatures (e.g. Busemeyer & Townsend 1993; Rieskamp et al. 2006; and references therein). Some examples are discussed in more detail later. A rigorous logician might argue that violations of the axioms of E-rationality can only mean one of two things: either that animals, including humans, cannot be rational or that E-rationality cannot be a reasonable description of behaviour. In response, it might be argued that E-rationality, on the whole, can provide a good account of behaviour and the underlying assumptions are violated only in extreme circumstances. It is indeed true that economic concepts of rationality have proved to be effective and useful in predicting behaviour, but the recent boom in experimental economics (e.g. Hey 2002; Kahneman 2003; Smith 2003), which studies how economic agents actually behave without disregarding deviations from how they ought to behave, illustrates that it has not been totally successful (Hammerstein & Hagen 2005; Kacelnik 2006).
The differences between PP- and E-rationality are relatively straightforward. PP-rationality is a process-based concept dealing with predominantly internal beliefs and not their outcome. On the other hand, E-rationality considers the outcome to be of primary importance, in that utility should be maximized, but the process by which this occurs is not taken to be of great interest. B-rationality, springing mainly from the evolutionary literature, has yet another approach.
Steer & Cuthill (2003) proposed that a direct analogy can be drawn between optimal behaviour in animals and rational behaviour in humans, and that lessons learned by the study of the former can be applied to the latter. Studies in both biological and psychological literatures have tested the premises of E-rationality in non-humans (e.g. Navarick & Fantino 1972, 1974, 1975; Shafir 1994; Hurly & Oseen 1999; Waite 2001a, b; Bateson 2002; Bateson et al. 2002, 2003; Shafir et al. 2002; Waite & Passino 2006). In one sense, B-rationality, also known as ecological rationality (Todd & Gigerenzer 2000; Stephens et al. 2004; Hutchinson & Gigerenzer 2005), can be thought of as a subset of E-rationality, in that it defines a desired outcome, with utility being replaced by fitness, which is a more specific concept. An important difference between fitness and utility is that fitness functions can be measured, in terms of reproductive success, independently of the decisions an agent makes, whereas utility functions are derived from the decisions themselves and therefore are not independent of the choice procedure (Luce & Raiffa 1957; Houston & Staddon 1981; Kacelnik 2006). The processes by which an animal reaches a decision are not of primary importance to B-rationality, again similar to E-rationality. However, there is an added caveat: B-rationality assumes that agents are products of naturally selected processes which have shaped the cognitive and emotional machinery of the decision-maker to behave in a manner such as to maximize fitness. We cannot expect natural selection, having no foresight, to shape organisms to act rationally in all circumstances, but only in those circumstances which it encounters in its natural setting. Therefore, B-rational behaviour (i.e. fitness-maximizing behaviour) might not appear when animals are placed in a novel context. As an example, take an anecdote concerning the ultimate father of B-rationality, Charles Darwin. 
While on the Galapagos, Darwin noted that a frightened marine iguana (Amblyrhynchus cristatus) could not be induced to enter the ocean by any means other than picking the animal up and tossing it into the waves (something he did a number of times to one unfortunate individual). He wrote ‘perhaps this singular piece of apparent stupidity may be accounted for by the circumstance, that this reptile has no enemy whatever on shore, whereas at sea it must often fall a prey to the numerous sharks. Hence, probably urged by a fixed and hereditary instinct that the shore is its place of safety, whatever the emergency may be, it there takes its refuge.’ (Darwin 1839, ch. 17). Only when the context of the decision changed with the appearance of a land-based agitator did the iguana's behaviour—heading for land given any danger—appear to be irrational. It is clear to see then that B-rationality can be consistent with the use of simple rules of thumb (heuristics; Todd & Gigerenzer 2000) to determine behaviour (McNamara & Houston 1980).
To summarize, PP-rationality focuses on how decisions or beliefs are arrived at, but not necessarily on what the decisions or beliefs actually are. Conversely, the focal point of E-rationality is the decision itself, not the process by which it is achieved. E-rationality assumes that an agent will attempt to act in such a way as to maximize utility, utility being an undefined entity. B-rationality, similar to E-rationality, is also most concerned with the endpoint of a decision-making process, but assumes that an animal will maximize fitness when in a relevant context (Kacelnik 2006). From here on, we concentrate on the interplay of E- and B-rationality.
E-rational preferences are generally assumed to obey a series of conditions, including the following.
Transitivity. In its simplest form this states that preferences are hierarchical, so if option a is preferred to option b and option b preferred to option c, then a will be preferred to c.
Independence from irrelevant alternatives (I.I.A.; Arrow 1951; Tversky & Simonson 1993). The basic idea is that relative preference for one option over another is unaffected by adding or removing options from the choice set (see also Luce 1959, 1977; Luce & Suppes 1965; Rieskamp et al. 2006).
If either of these conditions is violated, behaviour is classed as irrational.
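The two conditions can be made concrete with a minimal Python sketch. The data representation (strict preferences as ordered pairs, choice counts per option), the tolerance and the function names are illustrative assumptions, not part of any of the cited models.

```python
from itertools import permutations


def is_transitive(prefers):
    """Return True if a strict-preference relation is transitive.

    `prefers` is a set of ordered pairs (x, y) meaning "x is preferred to y".
    """
    options = {x for pair in prefers for x in pair}
    for a, b, c in permutations(options, 3):
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            return False
    return True


def satisfies_iia(counts_without, counts_with, x, y, tol=1e-9):
    """Check I.I.A. for options x and y.

    Each argument maps an option name to the number of times it was chosen,
    without and with the extra option(s) in the choice set. The ratio of
    choices x:y should be the same in both cases (up to `tol`).
    """
    ratio_without = counts_without[x] / counts_without[y]
    ratio_with = counts_with[x] / counts_with[y]
    return abs(ratio_without - ratio_with) <= tol
```

With real data an exact equality of ratios would not be expected, so a statistical test would replace the fixed tolerance used here.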
As we will show, existing models of choice may predict seemingly irrational behaviours in particular circumstances. Broadly speaking, models of choice are descriptive (they describe the observed behaviour) or normative (they specify the behaviour that ought to be observed). Normative models are based on the evaluation of behaviour in terms of some measure (e.g. money or reproductive success). These approaches are linked; if rules have been shaped by natural selection, then rules that provide a good description should also make sense in terms of performance. In this paper, we review examples of behaviour that at first sight do not conform to what we might expect from a rational decision maker. We also present new results on a decision principle known as the delay-reduction hypothesis (DRH) and bring out general patterns in the behaviour of humans and other animals.
2. Descriptive models
The wealth of descriptive models that have been put forward is too great for us to review them all here. Instead, we focus on a few descriptive models from operant psychology that have been influential in the study of choice behaviour.
(a) The matching law
The matching law (Herrnstein 1961, 1970) emerged as a description of how animals in the laboratory choose between options that provide food. The matching law states that, for two options, the ratio of responses which a decision-maker makes on the options equals the ratio of rewards that it has previously received from the options, i.e.

$$\frac{B_1}{B_2} = \frac{R_1}{R_2}, \tag{2.1}$$

where $R_i$ is the number of rewards previously received from option $i$ and $B_i$ is either the number of responses previously made or the amount of time previously spent responding on that option. $R_i/B_i$ is the local rate from option $i$, so the matching law means that local rates are equal. A proportional form of the law is often used, which is as follows:

$$\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}. \tag{2.2}$$
Baum (1974) gave the following generalization of the matching law:

$$\frac{B_1}{B_2} = b\left(\frac{R_1}{R_2}\right)^{s}, \tag{2.3}$$

where $s$ and $b$ are fitted parameters, often referred to as sensitivity and bias, respectively. Houston and colleagues (Houston & McNamara 1981; Houston & Sumida 1987) caution that these parameters should be treated simply as fitted parameters, since their relevance to actual choice mechanisms is not clear. If both $s$ and $b$ equal 1, then equation (2.3) reduces to equation (2.1), i.e. basic matching holds and Herrnstein's original formulation of the matching law adequately describes the data. The generalized matching law, however, allows many behavioural patterns that deviate from basic matching to be accommodated, but only by fitting the values of $s$ and $b$ a posteriori.
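As an illustration of what fitting $s$ and $b$ a posteriori involves, the sketch below assumes the generalized matching law in the form B1/B2 = b(R1/R2)^s and fits it by ordinary least squares on logarithms, a common way of estimating such power-law parameters. The function names and the synthetic data are hypothetical.

```python
import math


def generalized_matching_ratio(r1, r2, s=1.0, b=1.0):
    """Predicted response ratio B1/B2 = b * (R1/R2)**s."""
    return b * (r1 / r2) ** s


def fit_sensitivity_and_bias(reward_ratios, response_ratios):
    """Least-squares fit of log(B1/B2) = s*log(R1/R2) + log(b).

    Returns (s, b). Needs at least two distinct reward ratios.
    """
    xs = [math.log(r) for r in reward_ratios]
    ys = [math.log(r) for r in response_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = sxy / sxx
    b = math.exp(mean_y - s * mean_x)
    return s, b


# Synthetic, noise-free data generated with s = 0.8 and b = 1.2.
reward_ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
response_ratios = [generalized_matching_ratio(r, 1.0, s=0.8, b=1.2)
                   for r in reward_ratios]
s_hat, b_hat = fit_sensitivity_and_bias(reward_ratios, response_ratios)
```

Because two free parameters can absorb many departures from basic matching, a good fit of this kind says little by itself about the underlying choice mechanism, which is the caution raised above.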
Since its inception, the matching law, especially in its generalized form, has been a successful and popular tool for describing behaviour (e.g. Myerson & Hale 1984; McDowell 1989; Pierce & Epling 1995; Spiga et al. 2005). It has been used to explain behaviours as diverse as wagtail foraging (Houston 1986a) and self-harm in humans (McDowell 1981; however, see Fuqua 1984 who highlights a range of inconsistencies between controlled investigations of matching behaviour and tests of matching in more applied settings). The relationship between matching and neurophysiology has also been discussed (e.g. Sugrue et al. 2004; Soltani & Wang 2006). However, it is important to note that basic matching (equation (2.1) or (2.2)) may not uniquely specify behaviour; in some settings, matching can be produced by a wide range of different behavioural allocations (Houston & McNamara 1981).
Many studies of matching behaviour have investigated the behaviour of animals (including humans) on concurrent variable interval (VI) schedules. These schedules are often used in experimental psychology to test choice behaviour; well over 50 studies using them have been published in the last 10 years. A VI schedule supplies a reward (or stimulus of some sort) following a subject's first response after a given, but variable, delay has elapsed. For example, a pigeon might be confronted with an illuminated disc (or ‘key’) that it can peck. During an initial random delay (the variable interval), any pecks that the pigeon makes on the key are unrewarded. However, once the delay has elapsed, the first peck that the pigeon makes on the key results in a reward being delivered (see Ferster & Skinner 1957 for further information). If two VI schedules are available to a subject at any one time, then this is termed a concurrent VI–VI procedure.
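The logic of a single VI schedule can be sketched in a few lines of Python. The class name, the injectable interval sampler and the default exponential distribution of intervals are assumptions made for illustration, not a description of any particular apparatus.

```python
import random


class VISchedule:
    """A single variable-interval schedule.

    A reward is 'armed' once a random interval has elapsed and is then held
    until the next response collects it; a new interval starts at collection.
    """

    def __init__(self, mean_interval, sample_interval=None):
        # Default: exponentially distributed intervals (an assumption).
        self._sample = sample_interval or (
            lambda: random.expovariate(1.0 / mean_interval)
        )
        self._armed_at = self._sample()

    def respond(self, now):
        """Register a response at time `now`; return True if it is rewarded."""
        if now >= self._armed_at:
            self._armed_at = now + self._sample()
            return True
        return False
```

A concurrent VI–VI procedure then corresponds to two independent `VISchedule` objects, with the subject deciding at each moment which one to respond on.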
Optimal foraging theory is a normative approach that attempts to explain behaviour in terms of the maximization of fitness (see Stephens & Krebs 1986 for a review). One simple assumption is that maximizing the rate of energetic gain will maximize fitness. Several papers have discussed the relationship between the matching law and rate maximization. It has been shown that maximizing the rate of gain does not necessarily result in matching (e.g. Heyman & Luce 1979; Houston & McNamara 1981; Houston 1983; Heyman & Herrnstein 1986). Given that matching does not necessarily specify behaviour uniquely, it is not possible to say whether matching behaviour maximizes rate of gain. On concurrent VI schedules, an infinite number of behavioural allocations can satisfy matching. Some of these allocations will give rates that are close to optimal (Houston & McNamara 1981). When faced with a VI schedule and a schedule that has a constant probability of giving a reward, matching results in a rate of reward that is well below the optimal (Houston 1983; Heyman & Herrnstein 1986). Behaviour on such schedules can be described by the generalized matching law (Heyman & Herrnstein 1986). This behaviour and the resulting loss in reward rate can be viewed as side effects of a decision rule that evolved in other circumstances.
VI schedules often use a negative exponential distribution of times between rewards. This means that whether a response is rewarded or not gives no information about the time until the next reward becomes available. The form of the best strategy when choosing between two such schedules is to repeat a cycle comprising a fixed time t1 on side 1 and a fixed time t2 on side 2 (Houston & McNamara 1981). In other words, if the animal knows the parameters and is sure that they will not change, it should ignore rewards and get an accurate clock so that it can measure the optimal times t1 and t2. Houston & McNamara (1981) found an exact solution to the problem of maximizing rate of gain, given a choice between two VI schedules when the mean interval of each schedule is known (see also Belinsky et al. 2004). Assuming such complete and unchanging knowledge is not realistic. If we wish to understand the evolution of foraging behaviour, we should be looking for rules that perform well under the range of conditions that an animal is likely to experience (cf. Seth 2007). It should not be assumed that the animal has full knowledge of current conditions. Instead, the animal both learns about and exploits its environment. There are three general features of foraging environments that are relevant here.
Rewards may give information about future rewards. This possibility has been thoroughly investigated in the context of how long to stay in a patch that contains a random number of food items (e.g. Iwasa et al. 1981; McNamara 1982; McNamara & Houston 1987c).
The environment may contain other foragers. A rule that works well for an isolated forager might not work well when that animal has to compete with others. Similarly, a forager that uses a rule that is successful in group situations might perform poorly when in isolation (Seth 2001, 2007).
The environment may change. If a set of environmental parameters is constant, then it is often possible to evolve fixed optimal behaviours. However, these behaviours will become suboptimal, given any change in the environment (McNamara & Houston 1985, 1987b; McNamara 1996; Dall et al. 1999).
To cope with these aspects of the real world, what is needed is an approach that is based on rules that use information from rewards obtained to decide between options. A process called melioration uses previous information and results in outcomes that satisfy the matching law. The idea behind melioration is that an animal increases its allocation to the alternative that gives it the highest local rate. At the stable outcome with both options chosen, local rates are equal, i.e. matching holds. This is really a framework rather than a detailed model; there are lots of ways in which melioration can be implemented. There are also rules that result in matching without using the principle of melioration (e.g. Harley 1981; Houston & Sumida 1987; Seth 1999).
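One of the many possible implementations of melioration is sketched below. It is only an illustration: the feedback function `local_rate`, its constant `c`, the step size and the clamping bounds are all hypothetical choices, made so that the local rate on an option falls as more time is allocated to it.

```python
def local_rate(p, mean_interval, c=1.0):
    """Hypothetical local reward rate (rewards per unit time spent on an
    option) when a proportion p of time is allocated to that option."""
    return 1.0 / (mean_interval * p + c)


def meliorate(interval_1, interval_2, p=0.5, step=1.0, n_steps=2000):
    """Repeatedly shift allocation towards the option with the higher
    local rate; p is the proportion of time allocated to option 1."""
    for _ in range(n_steps):
        diff = local_rate(p, interval_1) - local_rate(1.0 - p, interval_2)
        p = min(0.99, max(0.01, p + step * diff))
    return p


p = meliorate(30.0, 60.0)
local_1 = local_rate(p, 30.0)
local_2 = local_rate(1.0 - p, 60.0)
# Overall reward rates obtained from each option at the stable allocation.
reward_1 = p * local_1
reward_2 = (1.0 - p) * local_2
```

At the stable allocation the two local rates coincide, so the ratio of rewards obtained equals the ratio of time allocated, which is the matching relation.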
(b) The delay-reduction hypothesis
Like the matching law, the DRH (Fantino et al. 1993) was developed as a description of choice behaviour in the laboratory. The DRH can be used to predict choice on what is known as the concurrent chains procedure. In the simplest case, an animal can respond to one of two alternatives, known as initial links. Each alternative has an associated VI schedule but, in contrast to a standard concurrent VI–VI procedure in which the VIs provide rewards, here, the VIs provide access to terminal links that result in a reward after a delay has elapsed. This access is indicated by a cue that signals the availability of a reward after a certain delay. During the initial link phase, the animal can choose between the two alternatives, but once a terminal link becomes available, the animal must wait until this link ends in a reward. After this has occurred, the animal can again choose between the initial links. There is some resemblance between the chains procedure and an animal that is searching for two food types at once. From time to time, it encounters food items that result in energy after the handling time has elapsed. These items can be thought of as being analogous to the terminal links. Let T be the overall average time to a reward on the concurrent chains procedure. This time depends on both the initial links and the terminal links. To illustrate, assume that each initial link is a VI, with the time to its terminal link having an exponential distribution with a mean of 60 s. Then, the average time for the first terminal link to become available is 30 s. If one terminal link has a delay of 10 s and the other has a delay of 30 s, then the average time on terminal links is 20 s and hence T=50 s (see equation (2.5) for the general equation for T). The start of the terminal link on side i means that food will be available after a delay Di. Thus, the start of the terminal link is associated with a reduction in the expected delay to reward of T−Di. 
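The worked example above can be reproduced with a short Python sketch. It assumes equal, independent, exponential initial links, so that each terminal link is equally likely to be the one entered; the function name is illustrative.

```python
def overall_time_to_reward(mean_initial_interval, terminal_delays):
    """Average time to reward T on a concurrent chains procedure.

    With n equal exponential initial links of the given mean, the first
    terminal link is reached after mean/n on average; the mean terminal
    delay is then added.
    """
    n = len(terminal_delays)
    mean_wait = mean_initial_interval / n
    mean_delay = sum(terminal_delays) / n
    return mean_wait + mean_delay


T = overall_time_to_reward(60.0, [10.0, 30.0])
# Reduction in expected delay to reward signalled by each terminal link.
delay_reductions = [T - d for d in (10.0, 30.0)]
```

Here `T` is 50 s, and the onsets of the 10 s and 30 s terminal links signal delay reductions of 40 s and 20 s, respectively.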
Define $\rho$ to be the proportion of responses to alternative 1, i.e. the number of responses made on the initial link for side 1 divided by the total number of responses made on both the initial links. The DRH for the two alternatives states that

$$\rho = \frac{T - D_1}{(T - D_1) + (T - D_2)}. \tag{2.4}$$

This is the equation suggested by Fantino (1969). (For modifications in the case of unequal initial links, see Squires & Fantino 1971; Fantino & Davison 1983.)
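A minimal sketch of the two-alternative DRH, assuming the form in which preference for alternative 1 equals its delay reduction divided by the sum of the two delay reductions. The function name and the error raised outside the region where both reductions are positive are illustrative choices.

```python
def drh_preference(T, d1, d2):
    """Proportion of initial-link responses to alternative 1 under the DRH.

    T is the overall average time to reward; d1 and d2 are the terminal-link
    delays. The form used here requires both delay reductions to be positive.
    """
    if T <= d1 or T <= d2:
        raise ValueError("this DRH form requires T > d1 and T > d2")
    return (T - d1) / ((T - d1) + (T - d2))


# Example with T = 50 s and terminal delays of 10 s and 30 s.
rho = drh_preference(50.0, 10.0, 30.0)
```

In this example the delay reductions are 40 s and 20 s, so two-thirds of initial-link responses are predicted on the side leading to the shorter delay.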
Fantino & Dunn (1983) point out that the DRH predicts the violation of the principle of independence from irrelevant alternatives (I.I.A.). Fantino & Dunn (1983) and Mazur (2000) found that adding a third option in a concurrent chains procedure could change the preference of pigeons for the initial pair of options. We now give a formal analysis based on the version of the DRH given in equation (2.4). Consider a general case with $n$ initial links, each with an exponential VI schedule with mean interval $I$. Then, the mean wait until a terminal link first becomes available is $W_n = I/n$. The DRH states that an animal's preference for an alternative is given by the relative reduction in delay to reward associated with the alternative. If initial link $i$ leads to a terminal link with delay to reward $D_i$, then the overall time to reward is

$$T = W_n + \frac{1}{n}\sum_{i=1}^{n} D_i = \frac{I}{n} + \frac{1}{n}\sum_{i=1}^{n} D_i. \tag{2.5}$$
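Now, compare the ratio of allocations to options 1 and 2 with and without option 3 being present. Under the DRH, the allocation to option $i$ is proportional to its delay reduction $T - D_i$, so the ratio of allocations to options 1 and 2 is $(T - D_1)/(T - D_2)$. In the case of just two options being available ($n = 2$ in equation (2.5)), this ratio is

$$\frac{T - D_1}{T - D_2} = \frac{I + D_2 - D_1}{I + D_1 - D_2}.$$

When a third option is added ($n = 3$), the allocation is

$$\frac{T - D_1}{T - D_2} = \frac{I + D_2 + D_3 - 2D_1}{I + D_1 + D_3 - 2D_2}.$$

It follows from these equations that adding a third alternative can change the allocation of responses to option 1 relative to option 2. In other words, we have a violation of I.I.A. The relative allocation may either increase or decrease; for $D_1 \neq D_2$, the critical value for a third alternative to produce no change is

$$D_3 = \frac{I + D_1 + D_2}{2},$$

which is the overall time to reward when only options 1 and 2 are present. Otherwise, adding a third alternative does have an effect.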
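The effect of a third alternative can be checked numerically. The sketch below assumes that the allocation to each option is proportional to its delay reduction T − D_i, with T computed over all options present and equal exponential initial links; the function name and parameter values are illustrative.

```python
def allocation_ratio_1_to_2(mean_initial_interval, delays):
    """Ratio of DRH allocations to options 1 and 2, (T - D1)/(T - D2).

    T = I/n + mean(D) is computed over all n options present;
    delays[0] is D1, delays[1] is D2, and so on.
    """
    n = len(delays)
    T = mean_initial_interval / n + sum(delays) / n
    return (T - delays[0]) / (T - delays[1])


I, D1, D2 = 60.0, 10.0, 30.0
two_options = allocation_ratio_1_to_2(I, [D1, D2])
with_short_third = allocation_ratio_1_to_2(I, [D1, D2, 5.0])
with_long_third = allocation_ratio_1_to_2(I, [D1, D2, 110.0])
# A third delay of (I + D1 + D2)/2 leaves the ratio unchanged.
with_critical_third = allocation_ratio_1_to_2(I, [D1, D2, (I + D1 + D2) / 2])
```

With these values the ratio is 2 for the original pair, rises to 5 when a third option with a 5 s delay is added, falls to 1.5 with a 110 s delay, and stays at 2 when the third delay is 50 s.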
DRH and optimal foraging theory. We now explore the relationship between the DRH (a descriptive account) and the optimal foraging theory (a normative account) and present a new result relating the DRH to the costs of deviating from optimality. Consider a foraging animal that searches for food and encounters two types of prey item. The DRH equation for two options (equation (2.4)) applies to cases in which the animal responds on both the initial links. This corresponds to the region of parameter space in which the maximization of rate of energetic gain predicts that both the prey types should be accepted (Fantino & Abarca 1985; Houston 1991). This means that the optimal rate of energetic gain, $\gamma$, is the rate resulting from taking both the types. Assuming equal energy content (which can be set equal to 1 without loss of generality) and denoting the overall time to reward by $T$,

$$\gamma = \frac{1}{T}.$$
Now, consider an animal that encounters items that can differ in energy and handling time. An item of type $i$ has energy $e_i$ and handling time $D_i$. McNamara & Houston (1987a) show that the energetic value of accepting a type $i$ item is

$$H_i = e_i - \gamma D_i. \tag{2.6}$$

As Houston & McNamara (1999) point out, $e_i$ is the energy gained by accepting a type $i$ item and $\gamma D_i$ is the energy that could have been obtained by foraging at rate $\gamma$ for time $D_i$ rather than spending this time handling the item. Thus, $H_i$ is the energetic value of accepting the item. It is optimal to accept all item types for which $H_i > 0$, i.e. for which $e_i/D_i > \gamma$ (Houston & McNamara 1999). As we have already said, the DRH is concerned with parameter values for which both ‘types’ should be accepted. This means that $H_i$ is positive for both the alternatives and is the energy lost if the type is rejected rather than accepted. In other words, $H_i$ is the cost associated with making the error of rejecting type $i$.
Animals typically make errors of decision and errors are more probable if the cost is low (Houston 1987, 1997; McNamara & Houston 1987a). One possible simple equation that captures this property is

$$\rho = \frac{H_1}{H_1 + H_2},$$

where $\rho$ is the probability of choosing option 1. If we use equation (2.6) to substitute for $H_1$ and $H_2$ in this equation, with $e_1 = e_2 = 1$ and $\gamma = 1/T$ so that $H_i = (T - D_i)/T$, then we obtain equation (2.4).
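The link between error costs and the DRH can be verified numerically. The sketch assumes unit energy content, γ = 1/T, H = e − γD, and a choice probability proportional to the cost of wrongly rejecting each option; the function names and parameter values are illustrative.

```python
def acceptance_value(energy, handling_time, gamma):
    """Energetic value of accepting an item: H = e - gamma * D."""
    return energy - gamma * handling_time


def error_based_choice(h1, h2):
    """Probability of choosing option 1 when choice is proportional to the
    cost of wrongly rejecting each option: H1 / (H1 + H2)."""
    return h1 / (h1 + h2)


T, D1, D2 = 50.0, 10.0, 30.0
gamma = 1.0 / T  # optimal rate of gain with unit energy content
H1 = acceptance_value(1.0, D1, gamma)
H2 = acceptance_value(1.0, D2, gamma)
rho_from_costs = error_based_choice(H1, H2)
# Two-alternative DRH written directly in terms of delay reductions.
rho_from_drh = (T - D1) / ((T - D1) + (T - D2))
```

The two quantities agree because each H_i is the delay reduction T − D_i divided by T, and the common factor 1/T cancels in the ratio.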
Thus, we have shown that the DRH is linked to optimality and the costs of making errors. This might suggest that it is a good rule—even though it was proposed as, and is still primarily, a descriptive model of choice, it has a normative basis.
So far, we have discussed the DRH when all rewards have the same magnitude. Data from experiments in which terminal links differ in delay and reward magnitude violate a form of transitivity known as strong stochastic transitivity (Navarick & Fantino 1972, 1974, 1975). Houston (1991) shows that a generalization of the DRH to include different reward magnitudes can produce such violations. Houston et al. (2005) extend the analysis to include terminal links in which the delay to reward is variable and show that stochastic transitivity can still be violated when reward magnitudes are constant, but the terminal link durations are variable.
3. Normative models
We expect rules to be ‘good’, i.e. natural selection has resulted in rules that perform well. But animals (including humans) sometimes make bad decisions (‘humans are not rational’—see Sutherland 1994 for a review). Such findings have been explained in a number of ways; we concentrate on two of these explanations.
(a) The outcome is a side effect
What matters is how a rule performs (in terms of reproductive success). Several rules may perform well in the environment in which they have evolved. Their performance may differ in a ‘new’ environment (e.g. the laboratory); some rules may perform very badly in this environment. The debate about whether animals follow the matching law rather than maximizing their rate of energetic gain can be used as an illustration. It is possible that what has been favoured by natural selection is a rule that performs well in environments that do not resemble experimental procedures, and hence, when a forager is presented with such procedures, the behavioural outcome is not what we might expect from an E-rational standpoint. It is important to distinguish between rules and outcomes when considering whether a particular behaviour is rational or not. We may well be able to liken some behavioural outcomes to spandrels (sensu Gould & Lewontin 1979), in that they are non-selected by-products of a decision-making mechanism (rule). Take risk sensitivity for example; although there are various normative explanations for the appearance of risk-sensitive responses (e.g. McNamara & Houston 1992; Houston & McNamara 1999), it has also been argued that risk sensitivity can appear as a side effect of a forager using simple learning rules (e.g. Kacelnik & Bateson 1996; March 1996). Arkes & Ayton (1999) suggest that some errors in human reasoning result from overgeneralizing a rule that is reasonable in many contexts. (For a general discussion of rules and side effects, see McNamara & Houston 1980.)
Decision framing is another important factor. There are lots of ways to provide an animal with a particular set of options, all of which have the same mathematical characterization, but which may result in systematic differences in behaviour. The general point is that the details of the experimental procedure can be important (e.g. Shettleworth 1989; Savastano & Fantino 1994; Heyman & Tanz 1995). Shettleworth & Jordan (1986) found that rats preferred receiving sunflower seeds in the husk and removing the husks themselves to simply waiting for a ‘handling time’ to elapse before being presented with a dehusked seed. Similarly, different time periods within an experimental situation seem to be treated with different degrees of importance by foragers; for example, inter-trial intervals are often found to be unimportant for guiding choice behaviour, whereas the delay between making a response and receiving food is extremely important (e.g. Kacelnik & Bateson 1996; Stephens et al. 2004). When seemingly irrational behaviours appear, especially in experimental situations, we always need to ask whether it is the outcome of a well-adapted rule misfiring in a novel environment.
(b) We were wrong about what is optimal
The idea here is that the context in which optimal decisions are viewed is too simple and may ignore elements that add extra degrees of freedom to the situation. We now present a range of examples.
(i) Uncontrolled variation in state
Studies of human irrationality tend to concern one-shot decisions; there can therefore be no differential accumulation of rewards between treatments. This is not always the case for non-humans. Schuck-Paim et al. (2004) highlight several cases where analogies have been drawn between similar ‘irrational’ behaviours in humans and non-humans; however, they claim that the underlying choice mechanisms are fundamentally different. Their findings hold considerable implications for the comparison of choice behaviour across species. Schuck-Paim et al. (2004) show that, at least in some cases, seemingly irrational behaviour in animals can be explained purely as a function of state-dependent preferences. As examples, we discuss work on grey jays (Waite 2001a) and rufous hummingbirds (Bateson et al. 2002).
Grey jays (Perisoreus canadensis) were trained in one of two contexts, both involving choosing between two foraging patches. The patches consisted of a tube in which raisins were placed at different distances from the entrance; increased distance was assumed to correlate with increased perceived predation risk. In context A, the jays were offered a series of choices between one and three raisins placed 0.5 m along separate tubes. Birds in context B were offered a series of choices between two identical options: two tubes with a single raisin placed 0.5 m from the entrance. All birds were subsequently offered a choice between three raisins placed 0.7 m along a tube and one raisin just 0.3 m from the entrance of a second tube. Preference for the larger, but riskier, reward was higher among birds which had been trained in context B. The findings were taken as indicating departures from value maximization as a result of cognitive biases arising in consequence of the choice context. The results were seen as mirroring framing effects (Tversky & Kahneman 1981), in that the same decision was presented to different individuals but in a different scenario, the scenario affecting the final decision. Waite (2001a) compared the results with that of the trade-off contrast hypothesis (Tversky & Simonson 1993). This predicts that an individual will be more likely to choose a low-quality, cheap item over a high-quality, expensive one if the individual has already experienced a choice between items of similar quality, but with a smaller difference in cost. However, as Schuck-Paim et al. (2004) pointed out, it was not just the previous context in which decisions had been made that differed between the two treatment groups. During the initial phase of the experiment, the jays in context A had gained more than twice as many rewards as those individuals in context B. 
The problem can therefore be thought of in terms of energy–predation trade-offs of the kind that have been extensively discussed in behavioural ecology (e.g. Houston et al. 1993; Houston & McNamara 1999; Cuthill et al. 2000).
Many species of animal can only increase their rate of energetic gain by also increasing their probability of being killed by a predator (see Lima 1998 for a review). To predict choice, it is necessary to know the value of energy (gain from foraging) and the value of life (lost if killed) (Houston & McNamara 1988, 1989). These will typically depend on the animal's state. In general, an animal should accept a risk in order to obtain energy when reserves are low (Houston & McNamara 1988; McNamara 1990; Clark 1994). The lower energetic state of the individuals in context B means that they should have been more prepared to take the greater risk to achieve the higher pay-off than the context-A individuals that were more sated. Similarly, seemingly irrational behaviour was reported from rufous hummingbirds (Selasphorus rufus; Bateson et al. 2002). The hummingbirds changed their relative preferences for two options in the absence or presence of a decoy option, which provided a lower rate of gain than either of the other two options. Once again, however, the rate of energy gain differed between the two conditions, which could well have led to alterations in choice behaviour consistent with rational theory (Schuck-Paim et al. 2004). Giving weight to their argument, Schuck-Paim and her colleagues showed experimentally how seemingly irrational decisions in European starlings (Sturnus vulgaris) disappeared when energetic rates were equalized across treatments.
(ii) Future expectations
The best action to choose at present depends on future expectations (McNamara & Houston 1986; Houston & McNamara 1999 and references therein). For example, whether or not an animal should take risks in terms of predation in order to obtain food depends on whether food is likely to be plentiful and easy to obtain in the future. Adding an option to the set of available options changes what is possible in the future, and hence can change future expectations, even if it is not optimal to choose the additional option now. Thus, even if the new option is not chosen when added, its presence can change the current optimal choice. We give two examples of this effect.
Errors. Suppose that there are errors in decision making, with costly errors being rare. Then, future gains depend not only on the preferred option, but also on other options that are mistakenly chosen. Adding a suboptimal option now will thus affect future expectations, because this option is likely to be wrongly chosen in the future. This means that the value of an option depends on the context (i.e. on the other options that are available). As a result, violations of transitivity may occur (Houston 1997) and violations of I.I.A. may also occur.
Possible future states. Schuck-Paim et al. (2004) show that uncontrolled variation in state can produce behaviour that appears to be irrational. Houston et al. (2007) show how state-dependent effects can produce apparently irrational behaviour even when an animal's choice is measured in the same state. Consider an animal choosing at discrete times between foraging options that differ in terms of expected energetic gain and risk of predation. The animal's state is its level of energy reserves. The animal dies of starvation if its reserves fall to zero. Option A provides little food but has no associated risk of predation. Option B provides slightly better (but still not good) food and involves a risk of predation. Option C provides good food and involves the same risk of predation as option B. Houston et al. (2007) consider three environments. In the first environment, options A and B are available. In the second environment, options B and C are available. In the third environment, options A and C are available. In each case, Houston et al. (2007) use dynamic programming to find the strategy that maximizes long-term survival (cf. McNamara & Houston 1990b). As we would expect, this strategy is state-dependent, i.e. the optimal decision depends on the level of energy reserves. When options A and B are available, it is optimal to choose A when reserves are very high and B otherwise. When options A and C are available, it is optimal to choose A unless reserves are low, in which case C is chosen. There is an intuitive explanation for these results. In this example, option A is safe but has a low yield. Thus, the option is used when energy reserves are high. When energy reserves are low, the animal should take risks in order to obtain a higher rate of energy gain.
When option A is present with option C, because C has a higher yield than B, the animal can afford to delay using this risky option until its reserves fall to a lower threshold than if B had been present rather than C. Finally, when options B and C are available, option C is always chosen because it yields more food than B and has the same predation risk. From these three results, we see that at intermediate levels of reserves, B is preferred to A when these are the two options available, C is preferred to B and A is preferred to C. In other words, transitivity is violated.
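The structure of such a model can be sketched with a few lines of backward induction. The code below is only an illustration under stated assumptions: the parameter values, the finite horizon, the terminal reward (alive or dead at the final time) and the names `OPTIONS`, `action_values` and `solve` are all hypothetical and are not taken from Houston et al. (2007). Whether the printed policies show the intransitive pattern at intermediate reserves depends on the parameter values chosen.

```python
# Illustrative state-dependent foraging model solved by backward induction.
# All parameter values are hypothetical, not those of Houston et al. (2007).

MAX_RESERVES = 20   # cap on energy reserves (state runs 0..MAX_RESERVES)
HORIZON = 200       # number of decision epochs
FOOD = 3            # energy gained when food is found
COST = 1            # energy spent per time step

# option -> (probability of finding food, probability of being killed)
OPTIONS = {
    "A": (0.30, 0.000),  # poor food, no predation risk
    "B": (0.45, 0.002),  # slightly better food, risky
    "C": (0.70, 0.002),  # good food, same risk as B
}


def action_values(x, available, v_next):
    """Survival probability from reserves x (x >= 1) under each option."""
    values = {}
    for name in available:
        p_food, p_killed = OPTIONS[name]
        fed = min(x - COST + FOOD, MAX_RESERVES)
        unfed = x - COST
        values[name] = (1.0 - p_killed) * (
            p_food * v_next[fed] + (1.0 - p_food) * v_next[unfed]
        )
    return values


def solve(available):
    """Return (value, policy) at the first decision epoch."""
    # Terminal reward: 1 if alive (reserves > 0) at the end, else 0.
    v = [0.0] + [1.0] * MAX_RESERVES
    policy = [None] * (MAX_RESERVES + 1)
    for _ in range(HORIZON):
        new_v = [0.0] * (MAX_RESERVES + 1)  # reserves 0 means starvation
        for x in range(1, MAX_RESERVES + 1):
            q = action_values(x, available, v)
            best = max(q, key=q.get)
            new_v[x] = q[best]
            policy[x] = best
        v = new_v
    return v, policy


if __name__ == "__main__":
    # One character per reserve level 1..MAX_RESERVES for each environment.
    for env in (("A", "B"), ("B", "C"), ("A", "C")):
        _, policy = solve(env)
        print(env, "".join(policy[1:]))
```

Comparing the three printed policies at the same reserve level is the analogue of asking which option is preferred in each pairwise environment when the animal is in the same state.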
We have given two examples of violations of transitivity, one based on errors (Houston 1997) and the other based on state (Houston et al. 2007). The common principle is that all available options influence choice because they have an effect on future expectations. It is important to emphasize that these examples are based on the assumption that all options currently available will also be available in the future. Options are linked to the future either because the animal makes mistakes and hence may choose a suboptimal option or because stochastic changes in state may take it to a state in which an option that is not currently chosen should be used. If this sort of analysis is to be relevant, animals must expect some degree of persistence of options into the future. Whether we might predict animals to have this view will depend on the sort of environment in which they have evolved.
(iii) There may be more freedom in behaviour than originally anticipated
Activities often have positive and negative effects. As a result, an animal may be faced with a situation in which it has to trade off these effects. We have mentioned above that getting food may expose an animal to predators. This may mean that there is a trade-off between energy and predation. We are now interested in the consequences of changing some aspect of an animal's environment. We draw attention to the contrast between results if behaviour is fixed and results if the animal is free to change its behaviour. The same basic principles can be seen in a range of examples from humans and other animals. Improvements in safety, such as the introduction of seatbelts or airbags in cars, provide an illustration. If the behaviour of road users does not change, then there should be a reduction in injury level. The changes may, however, lead to people changing their behaviour, typically by behaving in a more dangerous way. This change may be strong enough to result in an increased level of injury (for data and discussion, see Peltzman 1975; Keeler 1994; Peterson & Hoffer 1994). Similarly, the obvious response to traffic congestion, building more roads, may cause an increase in traffic (e.g. Noland & Lem 2002; Cervero 2003; Goodwin & Noland 2003) to the extent that traffic jams are worse than they were before. Thus, when animals are free to change their behaviour, the consequences may be an effect on performance that is the opposite of what was expected (and desired).
A physiological example concerns whether a bit of dirt is a good thing. Dirt may have a negative direct effect on disease, but a positive indirect effect through changes in the immune system. The ‘hygiene hypothesis’ (Strachan 1989) states that improved hygiene (and reduced family size) have reduced the extent to which children are exposed to infectious agents. The result is a change in the immune system that renders it more likely to give rise to allergic responses such as asthma (see Yazdanbakhsh et al. 2002; Romagnani 2004; Christen & von Herrath 2005 for further discussions). Tenner (1996) calls effects like these revenge effects.
In analysing such examples, two approaches can be adopted. In one, behaviour in response to the change is ‘given’ (i.e. we adopt a descriptive approach). An alternative is to derive the behavioural response from considerations of optimality (i.e. we adopt a normative approach). This second approach is adopted in models that derive the behaviour of humans from the maximization of utility. Blomquist (1986) and Janssen & Tenkink (1988) use utility maximization to derive the dependence of a driver's behaviour on a parameter that corresponds to the level of a safety measure. They show that the effect of an improvement in safety may be substantially reduced by changes in behaviour. We now demonstrate the normative approach in other contexts, starting with examples based on the trade-off between obtaining energy and avoiding predators.
There are two ways of looking at optimal response to a change: (i) change in optimal behaviour (e.g. driving speed in the model of Janssen & Tenkink (1988)) and (ii) change in some aspect of performance, such as mortality. We show that the relationship between various environmental factors and both the optimal response and the resulting performance may not always be obvious.
Changes in optimal behaviour. Consider the following example (see also McNamara & Houston 1994): a forager has to reach a critical size in order to reproduce. How should it respond to a permanent change in the predation level? The answer depends on how behaviour and predation interact to determine mortality. Assume that the animal has the choice of how hard it works to obtain food. Denote this rate of work by u. The animal's rate of intake of food is proportional to u. Its predation rate M is given by

$$M = m_0 N(u) + \mu,$$

where the function N(u) is increasing and accelerating and determines how predation rate changes with u for a given density of predators (m0) and μ is a background mortality rate. There is thus a trade-off between gaining food and predation risk. We denote the optimal rate of working for food by u*.
If a change in predation is a result of an increase in m0, u* decreases. This is because there will be a marked increase in predation if foraging intensity is high; therefore, the best response is to adopt a less dangerous foraging behaviour. In contrast, if μ increases, u* increases. This is because the same increase in predation rate is imposed on all foraging options. Therefore, the best response is to reduce the time exposed to predation by growing faster (i.e. by working harder for food). This shows that the effect of an increase in danger may result in either an increase or a decrease in how hard the animal works for food.
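The two opposite responses can be checked numerically. The sketch below rests on assumptions that are ours rather than the source's: a predation rate of the form M(u) = m0 N(u) + μ with the hypothetical choice N(u) = u², and a forager that maximizes its probability of surviving to the critical size. Because growth time is inversely proportional to u, that criterion amounts to minimizing M(u)/u. The function names are illustrative.

```python
def mortality_per_unit_growth(u, m0, mu):
    """Mortality incurred per unit of food gained, M(u) / u.

    Assumes M(u) = m0 * N(u) + mu with the hypothetical N(u) = u**2
    and an intake rate proportional to u.
    """
    return (m0 * u ** 2 + mu) / u


def optimal_effort(m0, mu, lo=1e-3, hi=10.0, steps=100_000):
    """Grid search for the work rate u* that minimizes M(u) / u."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return min(grid, key=lambda u: mortality_per_unit_growth(u, m0, mu))


if __name__ == "__main__":
    baseline = optimal_effort(m0=1.0, mu=1.0)
    more_predators = optimal_effort(m0=2.0, mu=1.0)   # m0 doubled
    more_background = optimal_effort(m0=1.0, mu=2.0)  # mu doubled
    print(baseline, more_predators, more_background)
```

With this particular N(u), the minimum of m0 u + μ/u lies at u* = √(μ/m0), so raising m0 lowers u* and raising μ raises it, in line with the argument above. Other increasing, accelerating choices of N(u) would give different values of u*.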
Changes in performance. When animals can trade off energetic gain against predation risk, the effect of a change in the environment may be counter-intuitive (e.g. McNamara & Houston 1990b; Abrams 1993). For example, an increase in food availability can lead to a decrease in food intake or an increase in starvation (McNamara & Houston 1987d, 1990a; see McNamara & Houston 1994, figs 2 and 3; also Houston & McNamara 1999). The effects arise because the animals change their behaviour in adaptive ways (see McNamara & Houston 1994 for a review). We now give some examples of changes in performance as a consequence of adaptive behaviour.
McNamara & Buchanan (2005) model a situation in which an animal is exposed to a stressor, such as cold or high predation risk, for a period of time. During this period, the animal can choose the level of available resources to direct against the stressor. The more the animal diverts to combating the stressor, the less likely the stressor is to kill the animal. However, diverting resources from essential maintenance reduces the condition of the animal. As condition decreases, the probability of death from disease increases. Thus, the animal faces a trade-off between dying from the stressor and dying from the effects of poor condition. McNamara & Buchanan (2005) find the allocation of resources to combating the stressor that maximizes the probability that the animal will survive. They show that if the likelihood of death from disease at a given level of condition is decreased, the animal allows condition to deteriorate much more. The result is to increase the likelihood that the animal will die from disease. In this model, much of the mortality from disease occurs during recovery of condition after the stressor disappears.
Failure to take account of the fact that behaviour is flexible may make it difficult to detect important costs. If we vary a factor, an animal may respond by changing its behaviour or morphology in a way that we have not anticipated. This makes it hard to detect the direct effect of the change. We give an example based on the diving behaviour of animals that hunt for food underwater and return to the surface to breathe (e.g. puffins, otters). A dive cycle starts with the animal at the surface. The animal travels to the foraging area at a particular depth, forages there for a time t and then returns to the surface where it spends a time s gaining oxygen. The amount of oxygen gained is a decelerating function of time at the surface. The total time travelling between the surface and the foraging area is τ. A simple approach assumes that the animal should maximize the proportion of time spent in the foraging area, subject to the constraint that the diver balances its oxygen over the cycle (Kramer 1988; Houston & Carbone 1992). The rates of oxygen use are m_t and m_τ during foraging and travelling, respectively. If τ is increased with t fixed, then the oxygen constraint means that s is an accelerating function of time underwater. But if the animal is free to adopt the behaviour that maximizes the proportion of time spent in foraging, then as τ increases, s may be approximately proportional to time underwater (Houston & Carbone 1992; McNamara et al. 2001). Thus, the cost (the effect of a unit increase in τ on s as τ increases) is not apparent when the animal is able to adjust its time budget. Houston et al. (2003) investigate the behaviour of a diving animal that can catch only a single item when hunting underwater. They show that if the diver maximizes its rate of energetic gain while hunting for items of two types, then the success of a dive (i.e. the probability of returning to the surface with an item) is not a good indicator of the quality of the environment.
For example, as the probability of finding the better type of item increases, the success of the dive may first increase, then decrease and then increase again.
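The oxygen-balance dive cycle described above can be explored with a small numerical sketch. Everything specific in it is an assumption made for illustration: the saturating oxygen-gain curve K(1 − e^(−as)), the parameter values and the function names are hypothetical and are not taken from Kramer (1988) or Houston & Carbone (1992). For a given travel time τ, the code searches over surface times s, sets the foraging time t so that oxygen used equals oxygen gained, and keeps the cycle with the largest proportion of time spent foraging.

```python
import math

K = 100.0     # hypothetical maximum oxygen store
RATE = 0.05   # hypothetical rate constant of oxygen uptake at the surface
M_T = 1.0     # oxygen use per unit time while foraging (m_t)
M_TAU = 1.5   # oxygen use per unit time while travelling (m_tau)


def oxygen_gained(s):
    """Decelerating oxygen gain after s time units at the surface."""
    return K * (1.0 - math.exp(-RATE * s))


def foraging_time(s, tau):
    """Foraging time t that exactly balances oxygen over the cycle."""
    return (oxygen_gained(s) - M_TAU * tau) / M_T


def best_cycle(tau, s_max=200.0, steps=20_000):
    """Grid search for the surface time maximizing time spent foraging.

    Returns (surface time, foraging time, proportion of cycle foraging),
    or None if no surface time on the grid allows any foraging.
    """
    best = None
    for i in range(1, steps + 1):
        s = s_max * i / steps
        t = foraging_time(s, tau)
        if t <= 0:
            continue  # not enough oxygen to forage at all
        proportion = t / (t + tau + s)
        if best is None or proportion > best[2]:
            best = (s, t, proportion)
    return best


if __name__ == "__main__":
    for tau in (5.0, 10.0, 20.0, 40.0):
        s, t, proportion = best_cycle(tau)
        print(f"tau={tau:5.1f}  s={s:6.2f}  t={t:6.2f}  "
              f"underwater={t + tau:6.2f}  foraging={proportion:.3f}")
```

Plotting s against τ + t from this output, once with t held fixed and once with t optimized, is one way to see how an adjustable time budget can mask the cost of extra travel time.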
It is often assumed that predation risk in birds depends on the fat load because heavy birds will be less agile. It is, however, hard to detect the effect of fat on predation because a bird may change both its behaviour and body composition. As fat loads increase, a bird may adopt safer foraging options. As a result, predation may decrease with mass (Welton & Houston 2001). Another response to an increased fat load is an increase in muscle mass, allowing greater agility and thus preventing an increase in predation.
These examples show that care is needed in choosing the variables that will be measured. If an important variable is not measured, then results may be misleading. For example, in the case of the diver, a better understanding of costs can be obtained if time at the surface is related to time travelling, τ, and time foraging, t, rather than to total time underwater, τ+t.
(iv) Fluctuating environments and biased probabilities
It might seem obvious that natural selection should always result in organisms having an accurate view of the world. Models based on evolution in a certain kind of stochastic environment show that this is not the case. Chance acts on many scales in the natural world. At the finest scale, demographic stochasticity describes the good and bad luck that affects individual population members, independently of other population members. At the other extreme, environmental stochasticity concerns fluctuations in the environment as a whole. These fluctuations, which might be due to weather or changes in population size, affect all population members in a similar manner.
Consider first the situation where there is demographic stochasticity, but no environmental stochasticity. Demographic stochasticity will affect the lifetime reproductive success (LRS) of individuals in the population. Let p(x) be the probability that the LRS of a particular individual is x. Then, the mean LRS of this individual is

$$r = \sum_x x\,p(x),$$

where this mean is an average over demographic stochasticity. The quantity r is the standard fitness measure in this situation and tends to be maximized by the action of natural selection.
Now suppose that there is also environmental stochasticity. The LRS of an individual will then depend on both demographic good and bad luck and on the state of the environment. Let p(x|s) denote the probability that the LRS of the individual is x when the environmental state is s. The mean LRS when the environmental state is s is thus

$$r(s) = \sum_x x\,p(x \mid s).$$
In this situation, the standard measure of fitness is the geometric mean, G, of r(s), where the mean is an average over the environmental state s. Equivalently, fitness can be taken to be g=log G, the logarithm of G. This fitness measure can be expressed as

$$g = \sum_s f(s)\,\log r(s),$$

where f(s) is the probability that the environmental state is s. The quantity g tends to be maximized by the action of natural selection.
Now suppose that the above population is at evolutionary stability, so that population members are maximizing the fitness measure g. Let r*(s) denote the expected LRS of population members when the environmental state is s. Then, it can be shown that population members are also maximizing

$$\sum_s f^*(s)\,r(s),$$

where f* is a certain probability distribution of environmental states. Under this distribution, the probability, f*(s), of state s is proportional to f(s)/r*(s). As this formula shows, the distribution f* distorts the true probability distribution f, giving extra weight to environmental states for which population members do badly and reducing the probability of environmental states for which population members do well. Thus, population members are maximizing their average LRS (averaged over environmental stochasticity), but the average is based on biased probabilities (McNamara 1995; cf. Haccou & Iwasa 1995; Sasaki & Ellner 1995). For the link between this approach and a general account of optimization under the action of natural selection, see Grafen (1999).
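A small numerical example may make the biased probabilities concrete. It takes g to be the f-weighted mean of log r(s) and the biased objective to be the f*-weighted mean of r(s), with f*(s) proportional to f(s)/r*(s). The two-state environment, the square-root payoff functions and all names in the code are hypothetical choices made for illustration and do not come from McNamara (1995).

```python
import math

F = (0.8, 0.2)       # true probabilities f(s) of two environmental states
SCALE = (3.0, 1.0)   # hypothetical payoff scale in each state


def mean_lrs(q):
    """Hypothetical r(s): a fraction q of effort suits state 0, 1 - q state 1."""
    return (SCALE[0] * math.sqrt(q), SCALE[1] * math.sqrt(1.0 - q))


def log_geometric_fitness(q):
    """g = sum over s of f(s) * log r(s)."""
    return sum(f * math.log(r) for f, r in zip(F, mean_lrs(q)))


def true_mean_lrs(q):
    """Arithmetic mean LRS under the true probabilities f(s)."""
    return sum(f * r for f, r in zip(F, mean_lrs(q)))


GRID = [i / 1000 for i in range(1, 1000)]

# Strategy favoured by selection: the maximizer of g.
q_star = max(GRID, key=log_geometric_fitness)

# Biased distribution f*(s), proportional to f(s) / r*(s).
weights = [f / r for f, r in zip(F, mean_lrs(q_star))]
F_STAR = [w / sum(weights) for w in weights]


def biased_mean_lrs(q):
    """Arithmetic mean LRS under the biased probabilities f*(s)."""
    return sum(fs * r for fs, r in zip(F_STAR, mean_lrs(q)))


q_biased = max(GRID, key=biased_mean_lrs)  # maximizer under f*
q_naive = max(GRID, key=true_mean_lrs)     # maximizer under the true f

if __name__ == "__main__":
    print("q* (max g):", q_star)
    print("f*:", F_STAR)
    print("argmax under f*:", q_biased, " argmax under f:", q_naive)
```

In this example, the strategy that maximizes g also maximizes the arithmetic mean under f*, whereas maximizing the arithmetic mean under the true distribution f selects a different strategy. The state in which population members do badly (state 1) receives more weight under f* than under f.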
Cooper (2001) argues that evolution results in rational choice as summarized by the laws of logical thought. A limitation of Cooper's general line of argument is that it is based on choices made by rules that are optimal for particular conditions. We have stressed, however, that problems arise when animals are faced with novel environments (Darwin's iguana, see also McNamara & Houston 1980; Shettleworth 1985). Given that animals follow robust rules, side effects (often referred to as spandrels in evolutionary psychology, e.g. Buss et al. 1998; Hampton 2004) will be ubiquitous. A related point is that we cannot just determine optimal behaviour for the environment of the laboratory (Houston & McNamara 1989, 1999). An animal in the laboratory may be safe from starvation and predation, but it does not ‘know’ this. It presumably follows rules that evolved to cope with these threats and deal with competition and changes in the environment. Matching is not optimal in some of the procedures that are used in laboratory experiments; it may be that matching is a side effect of decision rules that perform well in a broader context (see also Seth 2007). We have shown that a version of the DRH can be related to the maximization of rate of energetic gain, given that errors occur but costly errors are rare. Our result suggests that although the DRH may not be strictly optimal, it is likely to be a good principle, given that errors occur. The interaction between options that is captured by the DRH can be understood in terms of decisions that are subject to error. From this view, some aspects of the DRH may have appeared to be irrational because we had a limited conception of optimality.
Previous work (e.g. Tversky & Simonson 1993) has presented models based on plausible psychological principles that can describe irrational behaviour. In this paper, we have attempted to construct links between descriptive and normative accounts. In addition to showing that the DRH emerges from optimal decision-making subject to errors, we have pointed out that intransitive choice can result from optimal behavioural mechanisms when decisions depend on state and options persist into the future. This result does not rely on uncontrolled variation in state. It emerges owing to the effect that options have on future expectations. This general principle deserves further investigation.
We have drawn attention to common themes that arise in the study of humans and other animals, but many analogies have not been explored. For example, the model of driving speed investigated by Janssen & Tenkink (1988) is analogous to models of optimal flight speed (e.g. Norberg 1981; Houston 1986b). Whether this resemblance is productive remains to be seen. An area in which a unified account might be useful is optimal defence. In the context of military history, we might be interested in how the builder of a castle should allocate resources to structures that improve the strength of the castle and to features that improve its appearance and hence the prestige of the builder, and consequently the number of descendants that he leaves. Analogous issues arise in several biological contexts, including the interactions between predators and prey (e.g. Abrams 1986; McNamara et al. 2005), the way in which plants defend themselves against herbivores (e.g. Adler & Karban 1994; VanDam et al. 1996), the evolution of diseases and the defences against them (e.g. Frank 1996; Shudo & Iwasa 2001, 2004; Medley 2002; Day & Proulx 2004; van Boven & Weissing 2004) and the defence of a social insect colony against attack (e.g. Oster & Wilson 1978; Aoki & Kurosu 2004). Adler & Karban (1994) make the military analogy explicit and Jokela et al. (2000) present ‘steps towards a unified defence theory’, but we suspect that further work on a synthesis of these areas would be instructive.
We also think that an approach to decision making adopted by Fawcett & Johnstone (2003) could be extended. They investigate a model of optimal choice when an animal chooses between objects on the basis of more than one cue. These cues can differ in reliability of the information that they provide and in the cost of assessing them. This sort of approach may have broad implications for the understanding of apparently irrational behaviour.
The area that we have addressed is vast and our coverage has been highly selective; many issues have not been considered. For an entry to some of the topics that we have not discussed, see Bernardo & Welch (2001) and Robson (2002, 2003).
We thank Alex Kacelnik, Anil Seth and two anonymous referees for their comments on previous versions of this manuscript. A.I.H. and J.M.McN. were supported by Leverhulme Trust fellowships and M.D.S. by a BBSRC studentship.
One contribution of 15 to a Theme Issue ‘Modelling natural action selection’.
© 2007 The Royal Society