## Abstract

Metabolic rate in animals and power consumption in computers are analogous quantities that scale similarly with size. We analyse vascular systems of mammals and on-chip networks of microprocessors, where natural selection and human engineering, respectively, have produced systems that minimize both energy dissipation and delivery times. Using a simple network model that simultaneously minimizes energy and time, our analysis explains empirically observed trends in the scaling of metabolic rate in mammals and power consumption and performance in microprocessors across several orders of magnitude in size. Just as the evolutionary transitions from unicellular to multicellular animals in biology are associated with shifts in metabolic scaling, our model suggests that the scaling of power and performance will change as computer designs transition to decentralized multi-core and distributed cyber-physical systems. More generally, a single energy–time minimization principle may govern the design of many complex systems that process energy, materials and information.

This article is part of the themed issue ‘The major synthetic evolutionary transitions’.

## 1. Introduction

Both organisms and computers have evolved from relatively simple beginnings into complex systems that vary by orders of magnitude in size and number of components. Evolution, by natural selection in organisms and by human engineering in computers, required critical features of architecture and function to be scaled up as size and complexity increased. In biology, Kleiber's Law describes empirically how metabolic rate and many other traits, such as lifespan, heart rate and number of offspring, scale with body size [1]. Similarly, computer architecture has Moore's Law to describe scaling of transistor density and performance [2], Koomey's Law for the energy cost per computation [3], and Rent's rule for the external communication per logic block [4].

We posit that these empirical patterns originate from a common principle: networks that deliver resources are optimized to reduce energy dissipation and increase flow rates, expressed here as minimizing the energy–time product. That is, both living systems and computer chips are designed to maximize the rate at which resources are delivered to terminal nodes of a network and to minimize the energy dissipated as it is delivered and processed. For example, in biology the vascular network of mammals supplies oxygen and nutrients to every cell, fuelling metabolism for maintenance, growth and reproduction. Since energy is a limited resource, we assume that mammals are selected to minimize the time spent and energy dissipated as oxygen is delivered through the network [5] and processed to produce ATP in the mitochondria. Similarly, computation in microprocessors relies on a network of microscopic wires that transmits bits of information between transistors on a chip. This network is designed to deliver the maximum information flow at the lowest possible energy cost.

Here, we model mammals as composed of nodes (regions of tissue) that process oxygen delivered via a hierarchical vascular network, and we model microprocessors as composed of nodes (transistors that perform computation) that communicate bits over a network of wires. As each system scales up in size, our model identifies network designs that minimize (i) the time for resources to be delivered by the network and processed in the nodes, and (ii) the energy dissipated during these processes. Despite the obvious differences between animals and chips, we present a general model and derive energy and time-scaling relations from physical principles applicable to each system. Using these relations, we express the optimal network design as a trade-off between energy cost and processing speed. This energy–time minimization model is consistent with shifts across the major evolutionary transitions, such as the transition from protists to multicellular animals and the transition from single- to multi-core computer chips. It also points to likely future trajectories of the evolution of computer architecture and to possible extensions of metabolic scaling theory to account for sociality.

Previous biological scaling models have sought either to minimize energy dissipation, e.g. [5], or to maximize resource delivery rate [6], but they have not formalized the trade-offs between these goals. By simultaneously considering energy and time minimization, our analysis helps to explain how nature and engineering are able to produce designs that approach Pareto-optimality along the energy–time trade-off, a question investigated extensively in computer architecture (e.g. [7,8]). Thus, biological evolution has produced mammals ranging in size from mice to elephants, rather than converging on a single optimal size, and computer engineers have designed processors with thousands to billions of transistors, each of which fills a specific computational niche.

In the rest of the paper, we present the unified energy–time minimization model (§2) and its assumptions (§2a). We then use the model to derive a series of predictions about how time and energy scale with system size, first for mammals (§3a,b) and then for microprocessors (§3c). We discuss new insights into previously analysed scaling relationships in biology that we gain from the time–energy minimization framework, and we test our scaling predictions with empirical power and performance data on computer chips. Finally, in §4, we discuss the implications of these results for evolutionary transitions in nature and engineering.

## 2. Unified model of network scaling

Vascular systems are hierarchical branching networks where blood vessels (pipes) become thicker and longer through the hierarchy from the capillaries to the aorta. Similarly, microprocessor chips are organized hierarchically into a nested structure of modules and submodules, where wires become longer and thicker as the hierarchical level of a module increases (figure 1). These wires are organized into metal layers, where short, thin wires are routed on the lowest layers, and long, thick wires are placed on the top layers. We model the scaling of length (*l*) and thickness (*r*) of both pipes and wires as
$$l_i = l_0\,\lambda^{i/D_l} \tag{2.1}$$

and

$$r_i = r_0\,\lambda^{i/D_r}, \tag{2.2}$$

where *i* is the hierarchical level of a branch or module, *λ* is the branching factor and *D*_{l} and *D*_{r} are the length and thickness dimensions, respectively. This model resembles the hierarchical pipe model of vascular systems proposed in [5]: the ratios $\lambda^{-1/D_r}$ and $\lambda^{-1/D_l}$ correspond to *β* and *γ*, respectively, in that model (note that in [5], the aorta or top of the network is labelled as level 0, while here the smallest branches, the capillaries, are labelled as level 0).

In vascular networks, *r* represents the radius of cylindrical pipes, and in computer interconnect, *r* represents the width of wires with aspect ratio 1. *D*_{r} describes the relative radius of pipes between successive hierarchical levels. The smallest edges occur at *i* = 0, and have constant radius, *r*_{0}, but length, *l*_{0}, that scales with system size [6].

The length parameter *D*_{l} is determined by the spatial dimension occupied by the nodes of the network [9]. For chips, *D*_{l} = 2, since transistors are placed on a single two-dimensional layer [10]; for three-dimensional organisms, *D*_{l} = 3. Because the length of a vessel defines the radius of a three-dimensional volume of tissue supplied by that vessel, each successive vessel in the hierarchy also scales according to equation (2.1) with *D*_{l} = 3 [5,6]. Similarly, the length of each successive wire on a two-dimensional chip defines the area to which that wire delivers signals [11]. Thus, in the simplest networks that efficiently deliver resources homogeneously throughout a volume or area, *D*_{l} describes both the relative length of pipe between successive hierarchical levels and the physical dimension of the system. For example in figure 1*c*, where *λ* = 2 and *D*_{l} = 2, wires are 2^{1/2} = 1.41 times longer when they connect to successively higher modules in the hierarchy.
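The geometric scaling described above can be checked numerically. The following is a minimal Python sketch, assuming that equations (2.1) and (2.2) take the form size_i = size_0 · λ^(i/D), which is the form implied by the figure 1*c* example. The function name `level_sizes` and the unit base length are illustrative choices, not part of the model.

```python
def level_sizes(base, branching, dimension, levels):
    """Sizes at hierarchical levels 0..levels, assuming geometric scaling.

    Assumed form: size_i = base * branching ** (i / dimension).
    """
    return [base * branching ** (i / dimension) for i in range(levels + 1)]


# Figure 1c example from the text: branching factor 2 on a 2D chip.
lengths = level_sizes(base=1.0, branching=2, dimension=2, levels=4)

# Ratio between successive levels; each ratio equals 2 ** (1/2).
ratios = [upper / lower for lower, upper in zip(lengths, lengths[1:])]
```

Under these assumptions every entry of `ratios` is 2^{1/2} ≈ 1.41, matching the factor quoted in the text. Setting `dimension=3` gives the corresponding vessel-length ratio for a three-dimensional organism.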

Digital circuits scale in a third way beyond length and radius, which has no direct analogue in mammalian cardiovascular networks. Digital circuits are partially *decentralized*, with networks that connect multiple sources and destinations, while vascular networks are centralized, with blood flowing from a single heart. In vascular networks, each pipe branches at each hierarchical level forming a tree structure (in the simplest case with *λ* = 2 forming a binary tree). Chips, however, have many connections within each level of the network, and the number of these connections varies systematically with the hierarchical level. To account for this difference, we introduce a new equation, in which the communication (or number of wires) per module increases with the hierarchical level as
$$w_i = w_0\,\lambda^{i/D_w}, \tag{2.3}$$

where *D*_{w} is the communication dimension and *w*_{0} is the average number of wires per node. This hierarchical scaling of communication is a well-known pattern in circuit design called Rent's rule [4], where *p* = 1/*D*_{w} is Rent's exponent.^{1} This pattern is not unique to circuits and has been shown to occur in many biological networks [12–15]. Vascular systems correspond to a special case, where *w*_{i} = 1 for all *i*.
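The link between the communication dimension and Rent's exponent can be illustrated with a short Python sketch. It assumes that equation (2.3) has the form w_i = w_0 · λ^(i/D_w) and that a module at level *i* contains λ^i nodes; both function names are hypothetical and used only for illustration.

```python
def wires_per_module(w0, branching, d_w, level):
    """Assumed form of equation (2.3): w_i = w0 * branching ** (level / d_w)."""
    return w0 * branching ** (level / d_w)


def rent_terminals(w0, nodes_in_module, rent_exponent):
    """Rent's rule: external connections = w0 * (nodes in module) ** p."""
    return w0 * nodes_in_module ** rent_exponent


branching, d_w, level, w0 = 2, 2.0, 6, 3.0
nodes_in_module = branching ** level  # a level-6 module holds 2**6 = 64 nodes

hierarchical = wires_per_module(w0, branching, d_w, level)
rent = rent_terminals(w0, nodes_in_module, rent_exponent=1 / d_w)
```

With these inputs both expressions give the same wire count, showing that the hierarchical form and Rent's rule with *p* = 1/*D*_{w} describe the same scaling.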

### (a) Assumptions of the unified model

Before presenting the model and deriving scaling predictions, we state the model's assumptions and how they relate to earlier models, both in computation and biology:

(1) *Time and energy are equally important constraints*. System designs seek to deliver the maximum quantity of resource per unit time for the minimum quantity of energy expended. In computer architecture, this relationship is expressed as the ‘energy-delay product’, which formalizes the insight that a chip that is 10 times faster or 10 times more energy efficient is 10 times better [16]. In synchronous systems, clock speed (delay between clock ticks) determines the maximum rate at which the system can compute.

(2) *Steady state*. Resource supply matches processing demand [6,17]. That is, the network supplies resources continually to the nodes and is always filled to capacity. This avoids network delays and the need to store resources in the system. Specifically,

(a) System designs balance network delivery rates with node-processing speeds, so that resources are delivered at exactly the same rate that they are processed.

(b) Pipelining: a concept from computer architecture in which resources, e.g. computer instructions, leave the source at the same rate that they are delivered to the terminal nodes and the network is always full. Consequently, resources (oxygen molecules or bits) flow through the network continually without bottlenecks, and they do not accumulate at the source, sink or intermediate locations.

(3) *Terminal units and service volumes*. We follow previous scaling models of biology, which posit that the service volume (the volume of tissue that is supplied by a single terminal unit of the network) increases with system size and has a fixed metabolic rate [5,6]. In contrast to [5], we do not assume that terminal branches of the vascular network have fixed size. Following [6], we assume that the length (*l*_{0}) of the terminal branches of the network (e.g. capillaries) is proportional to the radius of the service volume. We also follow the assumptions in [6] that the capillaries have fixed radius, and that the speed of flow (*u*_{0}) through the service volume is proportional to its length, so that the rate of arrival of oxygen molecules to mitochondria in the service volume is constant across mammals. In chips, transistor size has shrunk over many orders of magnitude over the past 50 years. Similar to the length scaling of the service volume in mammals, the radius of the isochronic region (the service area) for chips scales proportionally with decreasing transistor size [11]. Thus, service regions are *smaller* in more powerful chips (which have more transistors), but they are *larger* in larger animals. We refer to the service volumes in mammals and the service regions on chips as *nodes*.

In addition to these general assumptions, we make the following refinements to accommodate salient differences between biology and computer architecture.

(a) In biology, the energy processed by a node (*E*_{node}) is invariant with system size. That is, as the size of a service volume increases with body size, the total amount of energy it processes remains constant. We do not make this assumption for chips.

(b) Component packing: in chips, we assume that total chip area is constant, and the number of transistors (*N*) is inversely proportional to the square of the process size, i.e. the length of one side of a transistor.

In biology, it is known that blood flow slows by several orders of magnitude as it travels from the aorta to the capillaries [5]. Earlier scaling models have generally not characterized this slowing [5,6], but our equations include velocity as an explicit term to highlight where it affects time and energy scaling. Here, we model *D*_{r} as constant within an organism so that blood slows continuously from the heart to the capillaries. We also model *D*_{w} and *D*_{l} as constant. Because rates of blood flow, oxygen delivery and ATP synthesis can be converted one to another by a simple conversion constant, we treat them interchangeably in our scaling model.

## 3. Model predictions for mammals and microprocessors

We define *E*_{net} and *T*_{net}, respectively, to be the energy dissipated and the time taken by the network to deliver a fundamental unit of resource to each node. For mammals, the resource is oxygen (carried by a unit volume of blood), and for computers, the resource is a bit of information. Similarly, we define *E*_{node} and *T*_{node} as the energy dissipated and the time taken by the nodes to process that resource. For mammals, the node is the service volume corresponding to a region of tissue supplied by a single capillary [6], which corresponds to a volume of tissue containing a constant number of mitochondria [18], the organelles that process oxygen molecules to generate biologically useful energy in the form of ATP. A node is defined as having a constant rate of delivery and processing of oxygen, but the volume of a node varies with organism size.

*E*_{net} is the energy required to deliver oxygen to the cells (as analysed in [5]), and *E*_{node} is the energy dissipated by cells processing incoming oxygen. *T*_{net} is the time delay between delivering each oxygen molecule to the cell, and *T*_{node} is the time taken for the cell to process each oxygen molecule. From the steady-state assumption, *T*_{net} = *T*_{node}, i.e. supply matches demand as in [6].

In microprocessors, the nodes are transistors, and *E*_{net} and *E*_{node} represent the energy dissipated as bits are delivered to transistors and the energy required to process the bits at the node. *T*_{net} and *T*_{node} are the times required to deliver and process a bit at the node (i.e. network and transistor switching delay). In computers, the time taken to deliver and process bits is bounded by max(*T*_{net}, *T*_{node}), i.e. a node cannot process another bit until the bit is delivered, and a node cannot process a new bit until the node has finished processing the previous bit. For both mammals and microprocessors, we define the total energy as the sum of energy dissipated in the network plus the energy dissipated in the nodes: *E*_{sys} = *E*_{net} + *E*_{node}.^{2}

In the following, we derive general scaling relationships between *E*_{net}, *T*_{net}, *E*_{node} and *T*_{node}, and the number of nodes *N*, under the assumption that the energy–time product is minimized. *N* is our measure of system size (number of capillaries or number of transistors). In mammals, larger *N* implies larger organism volume and mass. For computer chips, *N* increases by shrinking components, and so increasing *N* does not imply increasing chip area, which we assume to be constant.

The hypothesis that mammals and computers minimize the energy–time product predicts that optimized system designs will achieve the highest performance per cost, where performance is given by flow and cost by energy expended. To show this mathematically, we express the optimal network design as a constrained optimization problem in which the whole system's energy–time product is minimized as

$$\min\,\bigl(E_{\mathrm{sys}} \times T_{\mathrm{sys}}\bigr). \tag{3.1}$$

We derive expressions for *E*_{sys} and *T*_{sys} for mammals (§3a) and microprocessors (§3c) in terms of the dimensions *D*_{r}, *D*_{w} and *D*_{l}, where *D*_{l} is fixed by the external dimensions of the system.

### (a) Mammalian cardiovascular network

In this section, we derive general energy- and time-scaling relations for the cardiovascular network and nodes, and then use them to minimize equation (3.1). We first define scaling relationships for the four key quantities: (i) *E*_{net}, (ii) *E*_{node}, (iii) *T*_{net}, and (iv) *T*_{node}, and then show how they scale with *N* when equation (3.1) is minimized. In contrast to computer scaling, several theoretical scaling models have been proposed for animals over the last century (e.g. [5,6,19–21]). The influential West *et al*. [5] model predicted scaling relationships by minimizing energy dissipation, whereas an alternative model [6] maximized metabolic rate by minimizing the time to deliver oxygen. Not surprisingly, scaling models that assume different optimization principles make different predictions [22]. Our model combines both energy and time constraints into a single framework.

(i) *E*_{net}. From basic principles of hydraulics, the energy dissipated to transport a constant volume of blood through the network is given by the loss in pressure from the aorta to the capillaries multiplied by the volume being transported. The loss in pressure is the product of hydraulic resistance (*R*) and flow (*Q*), so Δ*P* = *RQ*. Thus,

$$E_{\mathrm{net}} \propto RQ.$$

(ii) *E*_{node}. Following [5,11], we assume that the quantity of energy dissipated to metabolize a fixed quantity of oxygen in each node is constant. Therefore, the energy summed over all nodes is

$$E_{\mathrm{node}} \propto N.$$

(iii) *T*_{net}. The time to deliver a fixed number of oxygen molecules to the nodes is given by the volume of blood being transported divided by the flow (*Q*). Since a constant volume is delivered to each node in parallel, we consider the volume being distributed per unit time to all nodes, giving

There is no distance term in the *T*_{net} equation. This is because *T*_{net} is defined as the time to deliver the ‘next’ oxygen molecule from a capillary, consistent with the steady-state assumption. It is not the time it takes a single molecule to traverse the network (i.e. it is not *τ* in [6]), but rather the inverse of the rate at which oxygen molecules are delivered to the nodes, analogous to the inverse of clock speed in computer chips.

(iv) *T*_{node}. From the steady-state assumption, *T*_{node} = *T*_{net}.

Substituting these relationships into equation (3.1) (where *E*_{sys} = *RQ* + *N*, and ) gives
3.2

We now show how *R* and *Q* scale with *N*. The resistance of a pipe is given by the well-known Hagen–Poiseuille equation, where *R* at hierarchical level *i* is $R_i = 8\mu l_i/(\pi r_i^4)$ and *μ* is the viscosity constant. The total network resistance *R* is given by [5]

$$R = \sum_{i=0}^{H} \frac{R_i}{n_i} = \sum_{i=0}^{H} \frac{8\mu\, l_i}{\pi\, r_i^{4}\, n_i}, \tag{3.3}$$

where there are *H* + 1 hierarchical levels, and *n*_{i} = *λ*^{*H*−*i*} is the total number of pipes at hierarchical level *i*.

Next, we consider upper and lower bounds for *D*_{r} given the objective of minimizing the energy–time product (equation (3.2)). Recalling that *λ*^{−H} = *N*^{−1}, in the case where *D*_{r} ≤ 4*D*_{l}/(1 + *D*_{l}), the summation in equation (3.3) converges to a constant (log(*N*) in the case of equality), and
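$$R \propto l_0 N^{-1}. \tag{3.4}$$

As *D*_{r} increases above 4*D*_{l}/(1 + *D*_{l}), the highest levels of the hierarchy dominate the sum, so that $R \propto l_0 N^{1/D_l - 4/D_r}$, and the scaling of *R* increases from $l_0 N^{-1}$ towards $l_0 N^{1/D_l}$ (see Appendix A in the electronic supplementary material for details of the calculation).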
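The convergence condition on *D*_{r} can be illustrated numerically. The sketch below is not part of the original analysis. It assumes Poiseuille resistance per pipe, *λ*^{*H*−*i*} parallel pipes at level *i*, and geometric scaling of pipe length and radius with exponents 1/*D*_{l} and 1/*D*_{r}, so that, up to constant factors, *R* equals (*l*_{0}/*N*) times the sum computed here. The function names are hypothetical.

```python
def normalized_resistance_sum(d_r, d_l, levels, branching=2):
    """Sum over levels i of branching ** (i * (1 + 1/d_l - 4/d_r)).

    Under the stated assumptions, network resistance is proportional
    to (l_0 / N) multiplied by this sum.
    """
    exponent = 1 + 1 / d_l - 4 / d_r
    return sum(branching ** (i * exponent) for i in range(levels + 1))


def critical_radius_dimension(d_l):
    """Value of D_r at which the exponent in the sum changes sign."""
    return 4 * d_l / (1 + d_l)


# Three-dimensional organism (D_l = 3): the threshold is D_r = 3.
threshold = critical_radius_dimension(3)

# Below the threshold the sum settles to a constant as the network grows.
converged_20 = normalized_resistance_sum(d_r=2, d_l=3, levels=20)
converged_30 = normalized_resistance_sum(d_r=2, d_l=3, levels=30)

# Above the threshold the sum keeps growing with network size.
growing_20 = normalized_resistance_sum(d_r=4, d_l=3, levels=20)
growing_30 = normalized_resistance_sum(d_r=4, d_l=3, levels=30)
```

Comparing the values at 20 and 30 levels shows the two regimes: for *D*_{r} = 2 the sum barely changes, while for *D*_{r} = 4 it increases by roughly an order of magnitude.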

Flow through a pipe is defined as *Q* = *uπr*^{2}, where *u* is the fluid velocity. Therefore, flow through the aorta equals $Q = u_H \pi r_H^2$ and substituting from equation (2.2), $Q = u_H \pi r_0^2 N^{2/D_r}$. Since we do not assume that *u*_{H} is independent of *N*, *u*_{0} appears in the equations. If *Q* is equal at all levels of the network (steady-state assumption) then:

(3.5)

With *R* and *Q* in hand, we now substitute these relationships into the equations for *E*_{net}, *E*_{node}, *T*_{net} and *T*_{node}, obtaining the scaling predictions shown in the first column of table 1. It is evident that the scaling behaviour of *E*_{net} depends on the value of *D*_{r}:

**Case 1:**

**Case 2:**

Given that *D*_{l} = 3 for three-dimensional animals, and that *D*_{r} must be greater than 2 to accommodate the necessary slowing of blood as it flows towards the capillaries [5], then Case 1 applies for 2 ≤ *D*_{r} ≤ 3, and Case 2 applies for *D*_{r} > 3.

Appendix A (in the electronic supplementary material) gives the derivations for *E*_{net} for all values of *D*_{r}. Here we show the case (*D*_{r} ≤ 3) that minimizes the scaling of the energy–time product (equation (3.2)):
3.6

The energy–time product is dominated by the second term in equation (3.6), which is minimized by setting *D*_{r} to its minimum possible value. Thus, minimizing the energy–time product requires *D*_{r} = 2 (Case 1), and
3.7

### (b) Biological scaling predictions from the energy–time minimization model

Earlier scaling models showed that area-preserving branching (*D*_{r} = 2) leads to the 3/4 power scaling of metabolic rate with body size known as Kleiber's Law (e.g. [5,6]). However, in animal circulatory networks blood must slow before reaching capillaries in order to reduce pressure on the walls of small vessels and to allow oxygen to be dissociated from haemoglobin in the capillaries. Under this circumstance, perfect area-preserving branching is not feasible, and *D*_{r} must be greater than 2.

We make a specific prediction for the value of *D*_{r} that minimizes the energy–time product while both slowing the flow of blood to the capillaries and matching the supply and demand for oxygen in the nodes. By our definition of a node as the volume of tissue that processes oxygen at a fixed rate, *T*_{node} must be invariant. Table 1 gives the model prediction for *T*_{node}.

Following [6], in the optimal case *u*_{0} increases with organism mass, and therefore with *N*; see electronic supplementary material, §6.1 for the derivation of this scaling. Substituting the resulting expression for *u*_{0} into the equation for *T*_{node} in table 1, we find that *T*_{node} is invariant with respect to *N* when *D*_{r} = 24/11 ≈ 2.18. The last column of table 1 lists the scaling predictions given this value of *D*_{r}.

We test the prediction that *D*_{r} = 24/11 using data from [23]. This influential Kolokotrones *et al*. paper showed that metabolic rate is elevated in both small and very large mammals, indicating systematic deviations from a simple power-law relationship between metabolism and mass. Although the deviation appears only as a slight curvature in the canonical log–log plots, as shown in figure 2, it is important because it calls into question prior scaling models that purport to explain a universal scaling exponent.

We derive the equation relating metabolism (*B*) to mass (*M*), following the approach used in [6], but we relax the assumption that *D*_{r} = 2 giving^{3} and

3.8

See electronic supplementary material, §6.1 for details of the calculations.

Although this prediction for *B* is not as simple as the 3/4 scaling predicted by West *et al*. [5] or the alternative models proposed by Kolokotrones *et al*. [23], the exponents in equation (3.8) arise naturally by combining two scaling relationships: that of the metabolic rate of the nodes and the metabolic power required to drive the network.

By considering blood slowing through the network due to *D*_{r} > 2 and by including energy dissipated in both the network and the nodes, each with different scaling exponents, the model naturally generates the curvature observed in the data. Intuitively, in smaller animals a greater fraction of energy is consumed by *E*_{node}, a term that is linear in the number of nodes.

We tested the predicted value of *D*_{r} = 24/11, which minimizes the energy–time product, and found a marginally better fit (solid line in figure 2) than the alternative models in [23]. The m.s.e. for our model is 0.0271 versus 0.0287 for the extended West *et al*. model (red dotted line in figure 2). The alternative models in [23] that were specifically designed to account for curvature have m.s.e. 0.274 and 0.0277. We also calculated the value of *D*_{r} that gives the best statistical fit to the data. Following [23], we use least-squares regression, exclude the orca, which is an outlier, and choose scaling constants to best fit the data. We find that *D*_{r} = 2.50 gives the best statistical fit (dashed line in figure 2). Alternative fitting methods and inclusion of the outlier have negligible effect on the best-fit value of *D*_{r}.

The energy–time minimization model is the only model proposed thus far that naturally generates curvature accounting for the elevated metabolic rate of the largest mammals as well as the smallest. The predicted value of *D*_{r} between 2 and 3 is also consistent with the idea that the upper region of the network is area preserving with *D*_{r} = 2, while *D*_{r} = 3 in the lower region as proposed by West *et al*. [5], and it is consistent with the empirical radius scaling reported in [22].

### (c) Microprocessor model

We now apply the same reasoning to computer chips. In computers, unlike biology, nodes (transistors) are not constant size but have shrunk by many orders of magnitude over 40 years of microarchitecture evolution. During this time, total chip area has grown much more slowly, and we assume it to be constant for our calculations. In addition, the total area of all transistors on the chip is a fixed fraction of the area of the chip [11]. Putting these two constraints together, the linear dimensions of transistors decrease with transistor count as *N*^{−1/2} (more generally, $N^{-1/D_l}$). The width of the smallest wires is $r_0 \propto N^{-1/2}$ because minimum transistor size and wire width are both determined by the process size. Similarly, $l_0 \propto N^{-1/2}$ because transistor linear density increases as *N*^{1/2}. Intuitively, this means that the number of nodes increases as smaller transistors are placed closer together and connected with smaller and shorter wires. In the following, we assume that all wires carry the same flow and that information is transferred synchronously. We now calculate how *E*_{net}, *E*_{node}, *T*_{net} and *T*_{node} scale with the number of transistors, *N*, and the three scaling dimensions, *D*_{l}, *D*_{r} and *D*_{w}.

*E*_{net} can be calculated from basic principles of electronics as the energy dissipated to transmit a bit over a wire: *CV*^{2}/2, where *C* is capacitance and *V* is voltage. Because *V* has remained approximately constant over the last four decades (decreasing only by a factor of five while transistor count increased by six orders of magnitude [24]), we estimate that the total energy to transmit all bits over the network scales as *C* [25]. Ignoring fringe effects and for an aspect ratio of 1, wire capacitance is proportional to wire length, $C \propto \epsilon l$ [26], where $\epsilon$ is the dielectric constant. Thus, the network capacitance is the sum of the capacitances of all wires, which is proportional to the total wire length of the network [27]:
$$E_{\mathrm{net}} \propto C \propto \sum_{i=0}^{H} l_i\, w_i\, n_i, \tag{3.9}$$

where at all levels *i*, *l*_{i} is the length of wire, *w*_{i} is the number of wires per module, and *n*_{i} is the number of modules. Recalling that $n_i = \lambda^{H-i}$ and $l_0 \propto N^{-1/D_l}$ gives

$$E_{\mathrm{net}} \propto N^{1-1/D_l} \sum_{i=0}^{H} \lambda^{\,i\,(1/D_l + 1/D_w - 1)}. \tag{3.10}$$

Note that the scaling of *E*_{net} with *N* depends on *D*_{l} and *D*_{w}, but not on *D*_{r}. Similar to energy scaling in mammals, how *E*_{net} scales depends on whether the exponent 1/*D*_{l} + 1/*D*_{w} − 1 in equation (3.10) is positive or negative. If *D*_{w} ≥ *D*_{l}/(*D*_{l} − 1) the exponent is non-positive and the sum converges to a constant (log(*N*) in the case of exact equality), leaving $E_{\mathrm{net}} \propto N^{1-1/D_l}$. When *D*_{w} < *D*_{l}/(*D*_{l} − 1), $E_{\mathrm{net}} \propto N^{1/D_w}$. Given *D*_{l} = 2 for two-dimensional chips, *E*_{net} is minimized when *D*_{w} ≥ 2. See Appendix B (in the electronic supplementary material) for details.
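The two regimes of network energy scaling can be checked with a direct numerical sum. The Python sketch below is illustrative only. It assumes that network energy is proportional to total wire length, that wire length and wires per module scale geometrically with exponents 1/*D*_{l} and 1/*D*_{w}, that level *i* contains *λ*^{*H*−*i*} modules, and that the shortest wire shrinks as *N*^{−1/*D*_{l}} at fixed chip area. The function names and the choice of 20 and 30 levels are hypothetical.

```python
import math


def total_wire_length(levels, d_l, d_w, branching=2, w0=1.0):
    """Total wire length summed over all hierarchical levels."""
    n_nodes = branching ** levels
    l0 = n_nodes ** (-1.0 / d_l)  # shortest wire shrinks at fixed chip area
    return sum(
        l0 * branching ** (i / d_l)    # wire length at level i
        * w0 * branching ** (i / d_w)  # wires per module at level i
        * branching ** (levels - i)    # number of modules at level i
        for i in range(levels + 1)
    )


def scaling_exponent(d_l, d_w, small=20, large=30, branching=2):
    """Log-log slope of total wire length against node count N."""
    rise = math.log(total_wire_length(large, d_l, d_w, branching))
    rise -= math.log(total_wire_length(small, d_l, d_w, branching))
    run = (large - small) * math.log(branching)
    return rise / run


# Two-dimensional chip (D_l = 2), on either side of the D_w = 2 threshold.
local_exponent = scaling_exponent(d_l=2, d_w=4)      # close to 1 - 1/D_l
global_exponent = scaling_exponent(d_l=2, d_w=1.25)  # close to 1/D_w
```

Under these assumptions the estimated slope is close to 1/2 when *D*_{w} exceeds 2 and close to 1/*D*_{w} when it is below 2. At exact equality the logarithmic correction makes a finite-size slope estimate somewhat larger than 1/2.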

We now calculate the scaling of *E*_{node} ignoring leakage power.^{4} For a single node, computation energy is given by the transistor's (dynamic) energy dissipation as *CV*^{2}/2. Again assuming constant *V* and the capacitance of a transistor proportional to its length (*l*_{0}), *E*_{node} is obtained by summing the capacitance across all *N* nodes giving

$$E_{\mathrm{node}} \propto N\,l_0 \propto N^{1/2}.$$

We calculate *T*_{net} as the time to transmit a bit over the last wire in the network that connects to each transistor. This assumes perfect pipelining so there is no delay in signal arriving at the last wire (electronic supplementary material, Appendix B shows that perfect pipelining requires *D*_{r} = 2). Thus, *T*_{net} is equivalent to the wire latency, which equals the resistance multiplied by the capacitance of the wire (*RC*). For wires with aspect ratio 1, $R = \rho l/r^2$, where *ρ* is the resistivity of the material, and $C \propto \epsilon l$ as above. Thus,

$$T_{\mathrm{net}} \propto \rho\,\epsilon\,\frac{l_0^{2}}{r_0^{2}}, \tag{3.11}$$

where $l_0/r_0$ is constant, because in chips $l_0 \propto r_0$ and both are determined by process size.

Computation time for each node, *T*_{node}, is calculated as the transistor delay, *CV*/*I* [28], where again *V* is constant and *C* is proportional to transistor length:

Before calculating the energy–time product, we observe that *T*_{net} is the only term that depends on *D*_{r}, so we set *D*_{r} = 2 to minimize *T*_{net}. Similarly, *E*_{net} is the only term that depends on *D*_{w}, and we set *D*_{w} to minimize *E*_{net}. In summary, given *D*_{l} = 2, the terms of the energy–time product are minimized when *D*_{r} = 2 and *D*_{w} ≥ 2. Although the energy–time product is also minimized for values of *D*_{w} greater than 2, such values would entail greater communication locality, which is challenging to engineer and does not improve the energy–time product. Thus, the model predicts that *D*_{w} = 2, which is consistent with observed Rent's exponents that approach 1/2 [15,29]. The scaling relations for various quantities are summarized in table 1.

### (d) Predictions for microprocessors

Summarizing the results from the previous section, the energy–time product for chips is minimized when *D*_{l} = *D*_{r} = *D*_{w} = 2. This result corresponds to ideal scaling, as suggested by Dennard [30], where the linear dimensions of transistors and wires scale at the same rate, wire delay is constant, and Rent's exponent is 1/2.

The final energy–time product scales as *N*^{1/2} (table 1), showing that, unlike mammals, as size increases, the energy-delay product per node decreases systematically. Thus, chips have become faster and they consume less energy per transistor as more transistors are packed onto a chip. Of course, this trend arises from the remarkable miniaturization of transistors and wires described by Moore's Law. It is not surprising that transistors are faster (*T*_{node}) and require less energy (*E*_{node}) as they become smaller. It also makes sense that *E*_{net} grows sublinearly with the number of transistors, because as *N* increases the distance between nodes is reduced. Additionally, *D*_{w} = 2, means that most bits move locally, so the distance between nearest nodes affects the average distance that bits are transmitted. The only term in the energy–time product that does not decrease with increased *N* and decreased process size is *T*_{net}, which remains constant under Dennard scaling where wire radius and length scale proportionally to each other.

These scaling models make two testable predictions. First, power consumption (*P*) in chips (total energy dissipated per unit of time) scales as
$$P \propto N^{1/2}. \tag{3.12}$$

Second, performance, measured as computations executed per unit of time, or throughput (*Tp*), is predicted to scale linearly with *N*, i.e.

*Tp* ∝ *N*. (3.13)
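As a short worked consequence, the predicted exponents of 1/2 for power and 1 for throughput can be combined to give per-transistor and per-computation quantities. The ratios below are derived here for illustration from those two exponents alone; they are not numbered equations from the analysis.

```latex
\begin{align*}
P &\propto N^{1/2}, & Tp &\propto N, \\
\frac{P}{N} &\propto N^{-1/2}, & \frac{Tp}{N} &\propto N^{0}, \\
\frac{P}{Tp} &\propto \frac{N^{1/2}}{N} = N^{-1/2}.
\end{align*}
```

Under these exponents, power per transistor falls as *N* grows, throughput per transistor stays constant, and the energy dissipated per computation (*P*/*Tp*) falls as *N*^{-1/2}.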

We compared our theoretical predictions for active power consumption (ignoring leakage power) with data obtained for 523 different microprocessors over a range of approximately 6 orders of magnitude in transistor count (see the electronic supplementary material, §7.3 for details of the data collection). The data are shown in figure 3, where the measured exponent was 0.495 (95% confidence interval = 0.46–0.53), which agrees closely with our prediction of 0.5. Consistent data on performance across many technology generations are difficult to obtain because reporting standards have changed over the years and their adoption by different vendors is not uniform. We obtained normalized performance data for 100 different Intel chips, measured with Dhrystone Millions of Instructions per Second (DMIPS), from a variety of sources (see the electronic supplementary material, §7.3). These sources included a variety of published third-party performance comparisons from different generations over a range of 6 orders of magnitude in transistor count. The best-fit exponent for these data is 1.11 (95% confidence interval = 1.07–1.15), as shown in figure 4. This is close to our predicted exponent of 1, suggesting that engineered designs slightly outperform the theoretical optimum defined by the model. Power consumption and performance were fitted using least-squares regression, assuming that there are no significant errors in the reported count of the number of transistors [31].
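The kind of fit described above can be sketched as an ordinary least-squares regression on log-transformed values, where the slope is the scaling exponent. This is a minimal illustration under stated assumptions: the data are synthetic stand-ins generated with a known exponent of 0.5 and arbitrary noise, not the microprocessor data set, and the function name `fit_power_law` is hypothetical.

```python
import math
import random


def fit_power_law(sizes, values):
    """Least-squares fit of log10(value) = b * log10(size) + a.

    Returns (b, a), where b is the scaling exponent. Errors are assumed
    to lie in the values only, not in the sizes.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic stand-in data (NOT the measured chips): transistor counts
# spanning 6 orders of magnitude, power ~ N**0.5 with log-normal noise.
rng = random.Random(0)
transistor_counts = [10 ** (3 + 6 * i / 49) for i in range(50)]
synthetic_power = [
    2.0 * n ** 0.5 * 10 ** rng.gauss(0.0, 0.1) for n in transistor_counts
]

exponent, log_prefactor = fit_power_law(transistor_counts, synthetic_power)
print(f"fitted exponent: {exponent:.3f}")
```

Fitting in log space treats multiplicative scatter symmetrically across the full size range, which matters when the data span several orders of magnitude.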

It is somewhat counterintuitive that performance increases only linearly with the number of transistors. Given that transistor switching times have decreased dramatically as size has decreased, one might expect performance to increase as the product of clock speed and transistor number (*N*). However, this is not the case: the dotted line in figure 4 shows the performance that would be expected if time were simply the inverse of clock speed. Some performance increases are achieved by increasing clock speed for a given manufacturing process, which may account for the higher-than-predicted scaling exponent.^{5} This analysis confirms that the network is indeed the bottleneck. The network delivers bits to transistors at a constant rate per transistor (equation (3.11)), so performance has increased only linearly with transistor number even though, in principle, smaller transistors could process information more quickly. As in biology, performance cannot be understood without considering the constraints of the network.

Our model provides a simple theoretical explanation for the scaling of power and performance in computers over 40 years of microprocessor technology improvements. The excellent agreement between the theoretical optimum and experimental data suggests that through successive generations of trial and error, innovation and optimization, engineered designs are highly successful, approaching and sometimes exceeding the theoretical optimum predicted by the model.

## 4. Discussion

### (a) Summary of scaling predictions

Scaling analyses provide a framework for understanding critical parameters and constraints on the design of both biological and computational systems spanning an enormous range of sizes. We have presented a unified model which predicts scaling relationships for both mammals and microprocessors by simultaneously minimizing energy dissipation and delivery time. The energy–time minimization model highlights the similarities and differences between biological networks that deliver oxygen and computational networks that deliver information. Earlier scaling models focus either on minimizing energy dissipation or on minimizing delivery time (e.g. [5,6]). Here we extend that work by considering minimization of energy and time simultaneously, and investigating the trade-offs between them.

This theoretical model makes testable scaling predictions for biological metabolism and for the power and performance of computers. In biology, the energy–time model explains the observed curvature in metabolic scaling of mammals (figure 2). Other studies have interpreted the deviation from linear scaling as indicating that there is no single unified metabolic scaling theory, for example, as imperfect matching of supply and demand [17]. The framework presented here accounts for curvature in the optimization model by including time and energy minimization in both the network and the nodes. In computation, the unified model accurately predicts Rent's exponents, active power consumption and chip performance in over 40 years of chip design. Thus, the model provides evidence of strong convergence between natural and engineered designs due to physical constraints despite the obvious differences between them.

The model presented here is, of course, a simplification of the more complex reality. For example, our analysis assumes that *D*_{l}, *D*_{r} and *D*_{w} are fixed constants throughout the network both within and across systems. In reality, each of these may vary. For example, Newberry *et al*. [22] did not find evidence for a constant *D*_{l} = 3 in mouse vasculature, suggesting that the network does not deliver resources uniformly throughout the body volume. This is not surprising given that different tissues and organs have different metabolic requirements. *D*_{r} may vary within the vascular network with area-preserving branching closer to the heart and area-increasing branching slowing blood velocity in smaller vessels, but Newberry *et al*. [22] find values for *D*_{r} consistent with our predictions. Similarly, there is evidence that *D*_{w} varies across hierarchical levels in computer chips [32]. Including these factors in the model would allow more accurate predictions, but they are unlikely to substantively change the order-of-magnitude predictions of our simple unified model.

Our model makes novel predictions both for mammals and microprocessors. For mammals, we give the first quantitative prediction for *D*_{r} that accounts both for blood slowing through the network and for the empirically observed curvature in scaling relations that causes small and very large mammals to deviate from 3/4 scaling predictions. Additionally, this prediction (*D*_{r} = 24/11) gives an energy–time product that is approximately linear with *N* (table 1). Highlighting the inherent trade-off between energy dissipation and delivery times has important implications for understanding the energetic basis of fitness. Some have proposed that biological fitness maximizes metabolic power (energy/time) [33,34], whereas others have proposed that it minimizes biological times (e.g. generation times, which is equivalent to maximizing vital rates) [35,36]. The invariance of the energy–time product on a per-node basis is consistent with the idea that organism fitness is largely independent of body mass. Mammals of all sizes, from small, fast mice to large, slow elephants, coexist and, therefore, are probably nearly equally fit. This implies a direct trade-off between maximizing metabolic power and minimizing generation times, which holds over the many orders-of-magnitude variation in body mass. The energy–time product reflects powerful geometric, physical and biological constraints on the evolution of organism designs.

In computation, the model accurately predicts power consumption and performance of computer chips as simple functions of the number of transistors. These order-of-magnitude performance predictions highlight that delivery of bits through the network, rather than processing bits at the transistors, is the rate-limiting step that constrains performance. More precise predictions may be obtained by incorporating additional factors, for example, leakage power, which comprises an increasing fraction of the power budget of computer chips [7].

### (b) Implications for evolutionary transitions

The similarities between biological and computational scaling suggest future trajectories in computing based on how the fundamental structural and functional properties of organisms from bacteria to mammals have changed over evolutionary time. Work by Delong *et al*. [37] demonstrated that the slopes and intercepts of metabolic scaling relations change at the evolutionary transitions: prokaryote (bacteria) metabolic rate varies *superlinearly* with size, unicellular protist rate varies *linearly*, and whole-organism metabolic rate of multicellular animals scales *sublinearly*, converging to the canonical 3/4 exponent that approximates the mammalian scaling described above. The authors hypothesize that these discontinuous scaling shifts arise from body plans overcoming pre-existing constraints, and then accommodating to new constraints, as body size and complexity increase.

Delong *et al*. hypothesize the following: larger bacteria have higher metabolic rates because their larger genomes allow increased use of metabolic substrates, but eventually cell surface area limits metabolic processing. Unicellular protists overcome this constraint by internalizing the metabolic machinery into respiratory organelles (i.e. mitochondria that convert oxygen into ATP). The number of mitochondria increases linearly with cell size until intracellular transport constraints begin to limit the rate of metabolic processing. Next, multicellular animals have effectively invariant cell size and intracellular transport, but as body size and number of cells increased, vascular networks evolved to rapidly and efficiently deliver metabolites. However, vascular networks introduce the sublinear network scaling constraints characterized above.

Delong *et al*. highlight the importance of both time and energy constraints, and these change at each evolutionary transition, with the consequence that the absolute time and quantity of energy required to deliver each molecule of oxygen increase across the major evolutionary transitions. This suggests that the energy–time minimization framework that we have used to predict the curvature in metabolic scaling in mammals may apply across the range of living organisms, with different constraints on time and energy emerging at each evolutionary transition. The explanations that the authors hypothesize are also directly relevant to understanding how energy–time minimization affects the ongoing evolution of computer hardware.

#### (i) Innovations in chip design mimic innovation in the evolution of bacteria

The chip scaling described above shows how time and energy dissipation have decreased while performance increased as larger numbers of smaller transistors have been packed onto each chip. During this era, technological innovations in chips have emerged that optimize against physical constraints. Just as bacteria have evolved larger genomes and used the new genes to exploit new metabolic niches, new materials, switching methods, etching processes and cooling technologies have pushed physical boundaries, allowing transistors to shrink and more of them to be packed onto each chip. Like bacteria, however, there are limits to this process. There are no elephant-sized bacteria, and there will be no silicon-based single-core chips with quadrillions of transistors.

#### (ii) Single-core chip scaling mimics unicellular protists

Historical chip scaling mimics the linear relationship between performance and size (figure 4) seen in protists. Unicellular protists show linear increases in metabolic rate with size (fig. 1 of [37]) as more energy-processing nodes (mitochondria) are packed into larger cells. As size continues to increase, however, this design strategy also reaches physical limits. Our analysis suggests that the internal transport network already constrains processing speeds (*T*_{net} constrains *T*_{sys}). Further, the requirement to dissipate heat over a fixed surface area constrains both cells and chips.

#### (iii) Multi-core chips echo the transition to multicellularity

Computer chips are currently undergoing the evolutionary transition to multi-core, resembling the biological transition to multicellularity. Our unified scaling framework suggests some future scenarios. As the era of transistor miniaturization wanes, additional transistors will require increased physical area and, therefore, networks that span greater distances. Similar to multicellular organisms, we expect that as the number of cores grows, an increasing fraction of chip power will be devoted to these ever-larger ‘networks on chip’ (NoCs) connecting more cores. Larger networks will consume more power and take more time to traverse, and ultimately the energy–time minimization will be increasingly difficult to sustain as chips increase in size. Clock speeds have already levelled off as power, footprint and cooling requirements dominate chip-design considerations [38]. If chips follow biology, we can expect that the most important future advances in chip design will increase network efficiency, for example, by using optical networks.

#### (iv) Computer scaling deviates from biological scaling in important ways

There are also important differences between scaling of oxygen delivery in biology and information delivery in computation, which play an important role in evolutionary transitions. In particular, on-chip computer networks have two advantages not available to cardiovascular networks. First, the shrinking of ‘process’ size (smaller transistors and wires) reduces both energy and delay in the nodes as the number of nodes increases. This reduction in process size will ultimately end as physical limits are reached [38]. Second, the locality of network traffic, characterized by Rent's exponent and *D*_{w}, reduces long-distance communication over computer networks. As shown above, this effect reduces *E*_{net} and leads to a smaller wire footprint as *N* increases on single-core chips. This advantage will probably continue for multi-core chips, where communication and, therefore, network bandwidth, footprint and energy consumption of NoCs can be reduced by keeping communication primarily local [39,40]. Communication locality has the potential to produce more favourable scaling in multi-core computation than is achievable in multicellular biology.

#### (v) Decentralized designs in the transition to sociality

We now consider how the lessons learned from computer architecture may lend insights into an important biological evolutionary transition, the transition to social-animal societies. Understanding and improving the flow of energy, materials and information through human societies is one of the greatest challenges facing science and engineering, and scaling analyses lend an important perspective on this problem [41]. Sociality is an important evolutionary transition, reflected in the ecological dominance of humans and ants, whose networked systems transport both energy and information. These social species have experienced great success, dispersing over vast territories across the globe and capturing a large fraction of available energy [42,43]. Recent evidence suggests that ant colonies and human societies follow scaling relationships similar to those of individual organisms [44–48].

In social-animal systems and networked computer systems, networks are at least partially decentralized, e.g. traffic flow within cities [49] and among ant nests [50]. Tainter *et al*. [51] argue that complex human and ant societies are able to exploit ‘low-gain’ energy systems—those that provide low concentrations of dispersed energy, but that are ubiquitous and therefore can be exploited by complex systems capable of processing and storing vast quantities of energy. Understanding the forces that have driven the tremendous power and performance scaling in computing may lend insights into how other technologies exploit similar scaling relationships [52]. In particular, communication locality in computation suggests an important strategy in the transition to sociality: animal societies can escape the constraints of the centralized distribution network by evolving systems for decentralized transportation and modular communication. Indeed, the transition to solar energy is capitalizing on the same kind of dramatic technological performance improvements that computer technology experienced as Moore's Law [53]. The history of computing suggests large gains in the efficiency of energy delivery if increasingly powerful solar cells use dispersed solar energy locally to escape the centralized distribution overhead of the fossil fuel-based economy.

Moreover, power laws as a function of size are not unique to organisms and computers but are observed across a wide variety of complex systems in nature, society and technology. The scaling of white and grey matter [54] and communication modularity [14] in the brain, of flow through river networks that minimize transportation costs [55], of energy use and GDP in countries [56], and the pace of life and population in cities [45] are all additional examples that a unifying scaling theory might explain. Because cost and performance, i.e. energy and time, impose universal constraints, we suggest that a common design principle may govern the scaling of many evolved and engineered complex systems that process energy, materials and information.

## 5. Conclusion

Our analysis provides a unifying explanation for the origin of scaling laws in biology and computing. Despite obvious differences in form and function, the scaling of organisms and computers is governed by the same simple principle: minimizing the energy and time to deliver and process resources. Both natural selection and human engineering have evolved designs that manage the trade-off between cost and performance to minimize energy dissipation and time to deliver resources, resulting in general scaling laws that predict metabolic rate, and microprocessor power and performance over several orders-of-magnitude variation in system size.

Engineering ingenuity and economic pressures have created increasingly fast and powerful computers through a series of innovations, including integrated circuits, innovations in materials and other technological tricks, synchronizing clock trees, multi-core chips and networked and distributed computation. Today, technology is undergoing another major evolutionary transition as distributed computing changes the metabolic landscape of technology that is becoming more tightly coupled with the environment. As computers are embedded in more physical devices, physical proximity and energy concerns for low-power devices may drive computational scaling to more closely resemble biological scaling. In computation, dramatic changes have emerged over the last 35 years, but to a surprising extent, their trajectories mimic the biological transitions that took billions of years to evolve simple unicellular bacteria into the largest and most powerful animals and societies on the Earth.

## Data accessibility

The datasets supporting this article have been uploaded as part of the electronic supplementary material.

## Authors' contributions

All authors made substantial contributions to this paper's conception and design, acquisition and analysis of data, and drafting and revisions.

## Competing interests

We have no competing interests.

## Funding

We gratefully acknowledge funding from NSF 0621900 and 1413947, DARPA FA8750-15-C-0118, AFOSR FA8750-15-2-0075, the UNM PIBBs programme through NIH T32EB009414, and a James S. McDonnell Foundation Complex Systems Scholar Award.

## Acknowledgements

The authors would like to thank Ricard Solé, Van Savage and Doyne Farmer for insightful discussions that improved this paper, as well as the very helpful comments of anonymous reviewers.

## Footnotes

One contribution of 13 to a theme issue ‘The major synthetic evolutionary transitions’.

^{1} Rent's rule is typically expressed as *C*(*n*) = *kn*^{p}, where *C*(*n*) is the external communication of a module, *n* is the size of the module (number of nodes), *k* is the average external communication of a module with size 1, and *p* is Rent's exponent. For a hierarchy with branching factor of *λ*, the size of a module is given as *n* = *λ*^{i}, where *i* is the hierarchical level. Therefore, we can rewrite Rent's rule as *c*_{i} = *c*_{0} × *λ*^{ip}, where *c*_{0} = *w*_{0} and *p* = 1/*D*_{w}.

^{2} For computers, it is intuitive that these quantities can be treated independently. In biology, this is less obvious because the heart that powers the vascular network is itself composed of cells (nodes) that require oxygen delivery, an apparent circularity. However, the metabolic power of the heart (*E*_{net}) is supplied by oxygen delivered directly to the heart by the coronary artery, bypassing the rest of the vascular network. Thus, we treat *E*_{net} independently from *E*_{node}.

^{3} These expressions are consistent with those in [6], specifically when *D*_{r} = 2 and when *D*_{r} = 3.

^{4} Transistors and other devices conduct a small amount of current even when they are not being used. This energy loss is referred to as ‘leakage power’ and is a significant issue in modern microprocessor design not explicitly addressed by our model.

^{5} Additionally, higher-end chips are more likely to be benchmarked, potentially leading to a bias in the data towards higher-performing chips.

- Accepted April 15, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.