The idea of sex differences in the brain both fascinates and inflames the public. As a result, the communication and public discussion of new findings is particularly vulnerable to logical leaps and pseudoscience. A new US National Institutes of Health policy to consider both sexes in almost all preclinical research will increase the number of reported sex differences and thus the risk that research in this important area will be misinterpreted and misrepresented. In this article, I consider ways in which we might reduce that risk, for example, by (i) employing statistical tests that reveal the extent to which sex explains variation, rather than whether or not the sexes ‘differ’, (ii) properly characterizing the frequency distributions of scores or dependent measures, which nearly always overlap, and (iii) avoiding speculative functional or evolutionary explanations for sex-based variation, which usually invoke logical fallacies and perpetuate sex stereotypes. Ultimately, the factor of sex should be viewed as an imperfect, temporary proxy for yet-unknown factors, such as hormones or sex-linked genes, that explain variation better than sex. As scientists, we should be interested in discovering and understanding the true sources of variation, which will be more informative in the development of clinical treatments.
Sex differences in the brain have made headlines for more than a century. In 1912, James Crichton-Browne, a prominent neuropsychologist and collaborator of Darwin, explained in a New York Times article why ‘women think quickly’ and ‘men are originators’:
In woman, Sir James said, the posterior region of the brain receives a richer flow of arterial blood, in men the anterior region. The work of the two regions of the brain is different. The posterior region is mainly sensory and concerned with seeing and hearing. The anterior region includes the speech centre, the higher inhibitory centres, which are concerned with will, and the association centres, concerned with appetites and desires based upon internal sensations.
There is, Sir James thinks, a correspondence between the richer blood supply of the posterior region of the brain in women and their delicate powers of sensuous perception, rapidity of thought and emotional sensibility, and between the richer blood supply of the anterior region in men and their greater originality on higher levels of intellectual work, their calmer judgment and their stronger will [1, p. 4].
Although we may find such revelations archaic and even a bit offensive, the same type of thinking remains prevalent today. News reports and information-based websites such as Wikipedia, WebMD and HowStuffWorks.com contain an alarming amount of pseudoscience. It is commonly asserted, for example, that women listen with both sides of the brain, whereas men use only the left side [2,3] and that women use white matter to think, whereas men use grey [4,5]. Women allegedly have 10 times as much white matter as do men, whereas men have 6.5 times as much grey matter as do women [6,7]. Whereas women navigate using the cerebral cortex, men use ‘an entirely different area’ that is ‘not activated in women's brains’. Such assertions, although inaccurate, are easy to find on the Internet and in the popular press.
The misrepresentation of sex differences is likely to become even more commonplace. Partly because of increasing availability of imaging technologies, the percentage of journal articles that refer to sex differences and the brain has more than doubled in the past two decades (figure 1a). Over the same period, media reporting on the topic has risen by about fivefold (figure 1b). These increases are already impressive, but the amount of research on sex differences is about to increase even further, far beyond what figure 1 could foretell. This year, the US National Institutes of Health (NIH) mandated the inclusion of both sexes in most research with animals, tissues or cells . The new policy requires NIH-funded researchers to disaggregate data by sex and, when possible, compare the sexes. The goal is to ‘transform how science is done’ . Research on sex differences is thus set to expand from a small percentage of studies to nearly all studies funded by NIH. If this goal is achieved, the side effects will almost certainly include a fresh onslaught of questionable interpretations and claims.
In this article, I outline some traps that researchers face as they test for and report sex differences. These pitfalls are largely related to the interpretation of statistical tests, choice of wording and the use of inference. I suggest below a number of strategies that may help researchers avoid common problems and therefore minimize misinterpretation and misrepresentation of their work.
2. Three fallacies of sex differences
Miscommunication of the nature and meaning of sex differences can be traced to many causes [11–14]. Here, I outline three problematic ways of thinking, or fallacies, that have impeded the communication of findings. First, we usually present our conclusions about sex differences as the answer to a yes-or-no question when the real answer lies somewhere in between ‘yes’ and ‘no’. Second, we often attempt to infer the behavioural or evolutionary function of a sex difference in the brain without sufficient evidence to do so. Third, we tend to assume that sex differences are caused by genetic or hormonal influences rather than by experience. At the root of all three of these points is a fourth issue, which perhaps could be regarded as a fourth fallacy: the notion that sex acts as an independent variable that measurably affects other variables. Sex is merely a label; defining it in biological terms has proven tricky . Sex is at best a proxy for the more important and interesting factors that covary with sex .
(a) Fallacy 1: with respect to any trait, the sexes are either fundamentally different or they are the same
Sex has been called a basic biological variable that splits the population into two halves ; the categories of male and female are regarded as rigidly discrete [18,19], forming ‘taxa’ . When these two populations are compared, however, measures of most traits overlap extensively [14,19–21]. The conceptualization and communication of that overlap are impeded by our natural urge to dichotomize  and by language choices that emphasize difference [12,23]. For example, if a statistical test returns a low p-value, we are likely to make statements such as, ‘females outperform males on the memory task’ or ‘women are more susceptible than men to the side effects’. Taken literally, these statements imply that with respect to the trait measured, males and females constitute distinct groups . In nearly all cases, however, that interpretation is wildly incorrect. Conversely, when the p-value is not low enough to reject the null hypothesis of sameness, we often conclude that the sexes are the same even when sex could explain some important variation [25–28]. The problem here is that we are asking a yes-or-no question when both ‘yes’ and ‘no’ are the wrong answer. To truly understand the nature of most sex differences, which arguably are not actual ‘differences’, we need to ask how much the sexes differ, not whether or not they do [25,26].
(b) Fallacy 2: the cause of a sex difference in behaviour or ability can be inferred from functional neuroanatomy
It is a longstanding tradition to invoke sex differences in neuroanatomy to explain alleged sex differences in behaviour, intellect or other traits. Just as the New York Times printed in the early twentieth century that male-like patterns of blood flow allow more original thinking, in this century more white matter in women is said to confer greater language skills and ability to multitask. The larger hippocampus of women is said to support better memory, language skills, learning skills and processing of emotive information. The inferior parietal lobule, larger on the left in men and on the right in women, apparently underlies differences in math ability and sensitivity to crying babies. Testosterone-induced lateralization of brain function is claimed to increase men's interest in machines, while oestradiol increases women's attention to emotions and communication. Each of these anatomical or hormonal differences has been invoked to explain why men and women tend to enter different types of professions [39–41].
The above assertions are based on the following logic: (i) a structure (or hormone) we'll call ‘X’ differs between men and women; (ii) X is related to a behaviour we'll call ‘Y’; (iii) men and women differ in Y; therefore, the sex difference in X causes the sex difference in Y. This argument is invalid because it invokes the false cause fallacy: a sex difference in Y cannot be deduced to depend on X. In addition to being invalid, the argument is also often unsound in that rarely are all three premises supported. Evidence that structure X plays a role in behaviour Y, for example, is usually scarce. Even in animal models, in which lesions and other manipulations can be performed, the behavioural functions of sexually differentiated brain regions are, for the most part, unclear. Evidence of a sex difference in behaviour Y is also sometimes lacking, and instead a stereotype is offered. Consider the following popular inference: (i) the hemispheres of the brain are more heavily interconnected in women than in men; (ii) greater hemispheric interconnectedness allows better multitasking; (iii) women are better multitaskers than men; therefore, the anatomical difference explains the difference in ability [7,44]. First, the evidence that variation in interhemispheric connections actually contributes to variation in human abilities is practically non-existent. Second, studies of multitasking have shown no female advantage [46,47]. The argument pervades popular culture nonetheless, probably because it appears to confirm stereotypes [8,11,48].
(c) Fallacy 3: sex differences in the brain must be preprogrammed and fixed
A third type of fallacious thinking, which pervades news stories and scholarly articles alike, is signalled by words such as ‘hardwired’, ‘natural’ and ‘genetic’. These terms are nearly always used to argue that sex stereotypes are rooted in biology. They make sex differences sound predetermined and inevitable, untouched by experience or culture [8,11,23,49,50]. Readers are easily convinced, particularly when the explanations appear to support their own biases [8,48]. Such arguments, however, ignore the exquisite plasticity of the brain; the effects of sex-linked genes and sex hormones on neuroanatomy are irreducibly entangled with the effects of sex-specific experience . Suggesting otherwise leads to logical leaps known as the appeal to nature and deterministic fallacies—that sex-typical behaviour is natural, predetermined and out of our control. The cost of these fallacies is high: readers exposed to such arguments are more likely to endorse stereotypes and engage in stereotype-consistent behaviour (reviewed in [12,51,52]), and may feel powerless to change their own trajectories [48,50].
In §§3 and 4, I will focus primarily on Fallacy 1 and how to avoid it. The remaining two fallacies are important in the context of communicating findings in research papers as well as press releases and are revisited in §5.
3. Pink hippocampus, blue hippocampus? Most are purple
Becker et al.  defined a sex difference as a dimorphism, in other words a trait that ‘occurs in two forms, one form typical of males and the other typical of females.’ (p. 1651). The trouble with such definitions is that, except for sex chromosomes, gonads and external genitalia , the two sexes rarely take two distinct forms. The vast majority of sex differences in neuroanatomy and physiology are characterized by overlapping distributions. Variation that is attributable to sex can often require large sample sizes to detect. It is almost never the case that the sexes can be distinguished by a single structure in the brain that is said to ‘differ’. In other words, we typically cannot identify the sex of an individual by measuring any one thing in the brain; the majority of values fall into a grey area [14,19–21]. Conversely, values cannot be predicted accurately from the sex of an individual [21,54]. The overlap between the sexes is usually not represented clearly by scientists and conflicts with public perception of sex differences in the brain [8,11,49].
Figure 2 illustrates often-cited sex differences in humans. The graphs, which were made using the reported means and standard deviations (table S2), show the frequency distributions, or the number of individuals of each sex with any given measure or score. Note that Reis & Carothers  have shown that for many sex differences, actual data do not cluster according to sex as they do in figure 2, but rather fall onto a single continuum for both sexes.
Because most readers are familiar with the sex difference in human height, I have depicted that first , for comparison (figure 2a). This sex difference is relatively large, as is the difference in total brain volume (figure 2b) . When brain volume is controlled, sex differences in individual brain regions are smaller or disappear completely . One of the most often-cited sex differences in the brain is that of the hippocampus, which some authors have reported is larger in women than men . A recent meta-analysis showed no sex difference in this structure , and even when such a difference has been detected [21,57] there is a good deal of overlap. In the dataset depicted in figure 2c, hippocampal size is more typical of the opposite sex, i.e. it is larger than average in men or smaller in women, in about a third of the population. Thus, despite a statistically significant sex difference (p < 0.0001) we cannot say that the hippocampus occurs in two forms.
The claim of two forms is commonly made, even when the extent of overlap is quite large. In the paper on interhemispheric connectivity in humans cited in §2, Ingalhalikar et al. concluded that there are ‘fundamental differences’ between male and female brains. The authors did not report the degree of overlap, but the reported t statistics and degrees of freedom indicate that the sexes overlapped by nearly 90% (figure 2d,e). Nonetheless, the study was hailed by both scientists and the media as strong evidence that male and female brains take two distinct forms [13,69]. News stories announced that ‘men's brains go back to front, women's go side to side’ and that the structural differences are ‘so profound that men and women might almost be separate species’.
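The step from a reported t statistic to an estimate of overlap can be sketched as follows. This is a minimal illustration, not the calculation used for figure 2: it assumes an independent-samples t-test with roughly equal group sizes, normally distributed scores with equal variances, and the overlapping coefficient as the measure of overlap. The function names and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def d_from_t(t, df):
    """Approximate Cohen's d from an independent-samples t statistic.

    Assumes the two groups are of roughly equal size, so d = 2t / sqrt(df).
    """
    return 2.0 * t / sqrt(df)


def overlap_coefficient(d):
    """Shared area under two equal-variance normal curves whose means differ by d SDs."""
    return 2.0 * NormalDist().cdf(-abs(d) / 2.0)


# Hypothetical inputs, chosen only for illustration (not values from any paper).
t_value, degrees_of_freedom = 4.0, 900
d = d_from_t(t_value, degrees_of_freedom)
print(f"d = {d:.2f}, overlap = {100 * overlap_coefficient(d):.0f}%")
```

Under these assumptions, a highly significant t statistic from a large sample can still correspond to distributions that overlap almost entirely.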
Sex differences are sometimes interpreted as evidence that, despite overlap, an entire sex is somehow deficient. For example, early findings of a lower rate of serotonin synthesis in men (e.g. ) (figure 2f) have been used in the popular press to argue that a serotonin deficit makes men impulsive and ‘stupid’ . This sort of interpretation of a sex difference, in other words that one sex exhibits a deficit, can lead to singling out of one sex for interventions. The alleged serotonin deficit in men, for example, has been argued by educators to warrant specialized educational strategies for boys . In 1997, a different study  suggested that serotonin synthesis may actually be higher in men (figure 2g). This finding eventually led to a reversal in the popular press such that women became the abnormal sex. In a book on women’s mental health, Dr David Edelberg [73, p. 14] wrote, ‘When doctors discovered the relationship between low serotonin and emotional disorders, they started comparing the serotonin levels between sexes and found that women simply were not making enough’. As more clinically relevant biomarkers are discovered to vary according to sex, presenting and emphasizing overlap between the sexes should help prevent the impression that an entire sex is atypical.
The main goal of the new NIH policy is to balance our approach to understanding diseases and clinical conditions, the aetiology of which may vary according to sex. An often-highlighted example of such a condition is pain. Of the dozens of reported sex differences in pain, two of those with the largest sample sizes are shown in figure 2. Figure 2h shows lower pain thresholds for women than men, which is typical not only for pressure, but also thermal, electrical and other types of pain. Sex differences in the response to analgesia, on the other hand (figure 2i), are characterized by greater overlap and less agreement about which sex exhibits the greater response. Some authors have reported that opioid analgesics are more effective in women than men, some the other way around. Although animal research has suggested that mechanisms of morphine analgesia differ between males and females, sex differences in morphine efficacy have been difficult to detect in humans; they may be masked by differences in side effects or pain thresholds [62,63,74].
Perhaps the most popular example of a drug with differential effects in men and women is zolpidem, the sleep aid in Ambien. Even after controlling for body mass, the clearance rate of this drug is lower in women than in men (figure 2j), which has clear consequences. The morning after taking zolpidem, for example, women deviated more from a straight line while driving; in other words, they were more impaired (figure 2k). The US Food and Drug Administration (FDA) recently issued new guidelines reducing the dosage for women, which was hailed by advocates of personalized medicine as a huge step in the right direction. Cahill pointed out that ‘millions of women had been overdosing on Ambien’. That is most certainly the case. Note, however, that a sizeable proportion of the men fall into the female range for clearance rate (figure 2j). If most of the women taking Ambien were overdosing, then about a third of the men were doing the same. In their 2013 announcement, the FDA actually recommended lower doses for both sexes. The guidelines stated, ‘These lower doses of zolpidem will be effective in most women and many men’ (p. 3; italics added). Despite the overlap acknowledged by the FDA, the change in guidelines for zolpidem remains by far the most-cited example of the need for sex-specific medicine.
A statistically significant sex difference does not necessarily indicate a meaningful separation between the sexes. Figure 3 shows hypothetical data for a fictional drug I will call ‘Dimorphinil’. In this fictive sample of 40 men and 40 women, the sex difference in clearance rate is both significant (p < 0.01) and of medium size (d = 0.60). The effect size exceeds that for some real drugs, for example, cyclosporine and nifedipine, for which clearance rate differs significantly between the sexes [77,78]. The low p-value and medium effect size for Dimorphinil suggest a non-trivial, clinically relevant sex difference. Yet, the percentage of males and females above and below the overall mean is about the same: 22 of the males and 18 of the females cleared Dimorphinil faster than the overall average and 18 of the males and 22 of the females cleared it more slowly. If we were to recommend different dosages of Dimorphinil for men and women, only a handful of patients would benefit, namely those in the tails of the distributions, which consist mostly of one sex. The majority of patients, for whom sex does not reliably predict clearance rate, could be harmed by too much or too little drug. Sex-based dosages, even in this case, may be preferable to a one-size-fits-all approach, if only because benefiting a small minority of patients is nonetheless a benefit. It is critical to remember, however, that when significant sex differences exist, they usually indicate that sex explains some small portion of the variation, not that the sexes are ‘different’. Treatments that are ‘personalized’ for each sex will benefit everyone only when the effect size is astronomical. Although some patients will certainly benefit from sex-specific medicine, it comes with a high price tag: it over-emphasizes difference and strengthens false notions that men and women fall into dichotomous categories of patients.
Overlap between the sexes indicates that other factors, in addition to sex, contribute to variation in a trait. Because sex is itself not a mechanism  and cannot be absolutely defined in biological terms , it is at best a proxy for these other variables. Many of them are not yet known. Most sex differences, if they have not already, will eventually be completely explained by some other factor that covaries with sex. These explanatory factors will, I believe in every case, contribute more to our understanding of mechanism than does the label ‘sex’. Take, for example, a study that showed a male advantage in multitasking . The results conflicted with the popular notion that women are better multitaskers than men . The more interesting result, however, was that the sex difference was completely explained by a much larger sex difference in video game experience. In other words, multitasking ability was actually predicted by video game experience, not sex. In this case, investigating the potentially explanatory correlates of sex was more informative and satisfying than simply reporting a sex difference. In the case of drug development, a better strategy than dividing patients by sex, or an important next step, would be to discover and study the covariates that explain sex differences. For example, let us imagine that unbeknownst to researchers, the sex difference in Dimorphinil clearance rate (figure 3) is explained by physical activity, which also depends on sex [80,81]. If we did not know about the effect of activity on Dimorphinil clearance, we might tailor dosage according to sex instead—female athletes and male couch potatoes would receive the wrong dose.
The list of known variables that can covary with sex is long , and includes some rather uninteresting, non-biological factors such as housing arrangements in animal facilities . Perhaps for that reason, it is commonly argued that sex differences driven by obvious or uninteresting covariates, for example, body mass, are not ‘true’ sex differences . If they are not caused by biological factors that covary with sex, what, then, are true sex differences? Those caused by sex hormones? Levels of sex steroids can overlap extensively, depending on the species and stage of development. Are true sex differences caused, then, by the sex-determining region of the Y chromosome? Some women have that gene . As we peel away and discard each of the mechanisms that actually do explain sex differences, we are left only with the ‘essence’ of male and female; in other words, slippery concepts that gender scholars identify as the basis of ‘essentialist’ thinking [45,85]. Essentialism is not useful to us as neuroscientists [23,86]. In neuroscience, sex can be a predictor but never a cause . Sex differences are all caused by knowable factors that covary with sex—those factors are not likely to be defined by sex. Our task is to discover and understand those factors, not simply to demonstrate that the sexes are different. The mechanisms underlying sex-based variation are incredibly complex, so for now, using sex as a proxy for the more interesting variables will have to suffice. Because sex is highly politicized and poorly communicated to the public, however, it is not a good stopping point.
4. Is there a difference? Both ‘yes’ and ‘no’ are wrong answers
When we compare the sexes, we divide our sample into two categories. Thus, the very nature of the scientific question encourages us to think dichotomously. To make matters worse, the most commonly used statistical tests encourage dichotomous thinking about the results . Our decision to declare the sexes ‘different’ or the ‘same’ is usually based on whether a p-value is above or below 0.05. But p = 0.04 and p = 0.06 represent essentially the same result and cannot lead logically to opposite conclusions. Further, no matter what our decision, we are almost certainly wrong. As noted above, finding a statistically significant difference does not mean that the sexes are substantively ‘different’—the differences are almost always characterized by important overlap (figures 2 and 3). Even a strikingly low p-value may not indicate a meaningful difference; if the sample size is large enough, a statistically significant sex difference could be detected in any measure simply owing to noise [26,27].
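The dependence of the p-value on sample size can be illustrated with a short calculation. The sketch below holds a trivially small standardized difference fixed and varies only the number of individuals per sex; it uses a large-sample z approximation rather than an exact t-test, and the effect size and sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for an observed standardized difference d
    between two groups of equal size (large-sample z approximation)."""
    z = abs(d) * sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# The same tiny difference (d = 0.05) evaluated at three hypothetical sample sizes.
for n in (100, 1_000, 10_000):
    print(f"n per sex = {n:>6}: p = {two_sided_p(0.05, n):.4f}")
```

In this sketch the difference between the sexes never changes; only the sample size does, yet the verdict moves from ‘no difference’ to ‘highly significant’.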
Conversely, a p-value above the threshold for a significant difference (usually p > 0.05) does not indicate that the sexes are the same [25–27]. In other words, failure to reject the null does not give license to accept the null. In so doing, we would be accepting absence of evidence as evidence of absence. Cumming has called this logical error the ‘fallacy of the slippery slope of non-significance’ . The error is particularly common in psychology and neuroscience, fields that rely heavily on null hypothesis significance testing . Hoekstra et al.  found that in the field of psychology, 60% of authors concluded ‘no difference’ when p > 0.05. If one's goal is to confirm that sex does not matter for the measurement at hand, the t-test would seem a rather poor choice to test for sameness.
Rather than asking a yes-or-no question about whether the sexes differ, it is more informative to quantify the extent to which, or how much, sex contributes to variation . The p-value obtained from null hypothesis significance tests, e.g. t-tests, does not answer this question. Lower p-values do not indicate larger effects. Measures of effect size, such as Cohen's d, are more useful (figure 2). When d is less than about 0.5, regardless of statistical significance, sex is unlikely to explain important variation and the finding should probably not be emphasized without good reason . Even if no significant difference is detected, reporting the effect size is helpful to determine next steps . Estimates of confidence intervals [25,27] or Bayesian approaches  represent other alternatives to null hypothesis significance testing. Overlap or similarity between groups can be estimated [88,89]. The companion web page of this article, www.sexdifference.org, is an online tool that calculates effect size and percentage overlap from user-entered descriptive statistics. It can be used to visualize distributions of the user's own data or, as was done in figure 2, those of published sex differences.
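As a rough illustration of the quantities discussed above, the following sketch computes Cohen's d, percentage overlap and the share of variance explained by sex from descriptive statistics. It is not the implementation behind www.sexdifference.org; it assumes normally distributed scores with equal variances, uses the overlapping coefficient as one of several possible overlap measures, and applies the conversion d²/(d² + 4), which holds for equal group sizes. All function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (group b minus group a) using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / sqrt(pooled_var)


def percent_overlap(d):
    """Overlapping coefficient, in percent, for two equal-variance normal curves."""
    return 100.0 * 2.0 * NormalDist().cdf(-abs(d) / 2.0)


def variance_explained(d):
    """Proportion of total variance attributable to group (assumes equal group sizes)."""
    return d**2 / (d**2 + 4.0)


# Hypothetical descriptive statistics, not taken from any study.
d = cohens_d(100.0, 15.0, 50, 106.0, 15.0, 50)
print(f"d = {d:.2f}")
print(f"overlap = {percent_overlap(d):.0f}%")
print(f"variance explained by sex = {100 * variance_explained(d):.1f}%")
```

With these invented inputs, d is 0.40, the distributions overlap by roughly 84% and sex accounts for about 4% of the variance, which is the kind of summary that a bare p-value cannot provide.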
Whether we are calculating p-values or effect sizes, to detect a sex difference we must compare the sexes directly. Disaggregation of data by sex, now mandated by NIH, does not involve an actual comparison. We can simply test for an effect of treatment in each sex independently. When the p-value is below alpha for one sex but not the other, we typically claim a ‘sex-specific effect’. Such conclusions are problematic for many reasons. As critics of the NIH policy have pointed out [87,90], dividing a sample into subgroups lessens power and therefore the ability to detect effects. In a famous illustration of this phenomenon, Sleight  recounted the analysis of data from the International Study of Infarct Survival, which showed a clear benefit of daily aspirin. When the population was divided by astrological sign, resulting in 12 separate subgroups, the beneficial effect of aspirin was lost in the Libras and Geminis. Testing a hypothesis within each sex incurs a similar risk that an effect will be detected in one sex but not the other, when in fact both sexes are responding.
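The cost of testing each sex separately can be approximated with a simple power calculation. The sketch below assumes that both sexes respond identically to treatment, that the two within-sex tests are independent, and that a two-sided z-test adequately approximates the test actually used; the effect size and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a true effect d."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(d) * sqrt(n_per_group / 2.0)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical design: 64 treated and 64 control animals, half of each sex,
# with the same true treatment effect (d = 0.5) in males and females.
true_d = 0.5
pooled = power_two_group(true_d, 64)       # both sexes analysed together
within_sex = power_two_group(true_d, 32)   # each sex analysed on its own
one_sex_only = 2.0 * within_sex * (1.0 - within_sex)

print(f"power, sexes pooled:        {pooled:.2f}")
print(f"power, within each sex:     {within_sex:.2f}")
print(f"P(significant in one only): {one_sex_only:.2f}")
```

Under these assumptions, splitting the sample lowers power from about 0.8 to about 0.5, and an apparent ‘sex-specific effect’ would be expected in roughly half of such experiments even though both sexes respond in the same way.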
The problem with conducting independent tests in males and females goes beyond the loss of statistical power. Such a design does not actually allow us to test whether the effect of treatment depends on sex. To answer that question, we must test for interactions between sex and treatment. Nonetheless, the practice of testing two groups independently is quite common in neuroscience. In an analysis of articles in top-ranking journals such as Nature Neuroscience and Neuron, Nieuwenhuis et al.  found that authors tested for an interaction in only half of the cases in which two experimental effects were compared. In the rest, the authors tested for effects in the two groups independently. When p < 0.05 for one group and p > 0.05 for the other, they concluded that the effect of treatment depended on the group. This conclusion is simply another case of the slippery slope of non-significance . Upon failing to reject the null for one group, the authors accepted the null—and worse, contrasted the result with that of the other group. But when the p-values of two tests differ, the outcomes themselves cannot be said to differ [27,92,93].
Testing whether experimental effects differ between two groups requires a test for interactions, for example, factorial analysis of variance (ANOVA). An advantage of ANOVA is that the main hypothesis can be tested using the entire sample of males and females together, thus avoiding the Libra/Gemini problem described above . Although power to detect an interaction is notoriously low in ANOVA , a low-powered test is perhaps acceptable in the context of developing sex-specific medical treatments because we are interested in detecting only robust, clinically relevant interactions. Note, however, that if we fail to detect an interaction, particularly with a low-powered test, we cannot say that the sexes respond in the same way to treatment—in so doing we once again slide down the slippery slope .
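The distinction between two separate tests and a direct comparison can be made concrete with a small numerical sketch. It assumes that a treatment effect and its standard error have been estimated independently in each sex, and it uses a large-sample z-test on the difference between the two effects as a stand-in for the interaction term of a factorial ANOVA. The function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def compare_effects(effect_m, se_m, effect_f, se_f):
    """Return p-values for each sex tested alone and for the male-female difference."""
    p_m = two_sided_p(effect_m / se_m)
    p_f = two_sided_p(effect_f / se_f)
    diff_se = sqrt(se_m**2 + se_f**2)  # SE of the difference between independent estimates
    p_diff = two_sided_p((effect_m - effect_f) / diff_se)
    return p_m, p_f, p_diff


# Hypothetical treatment effects (in arbitrary units) and their standard errors.
p_m, p_f, p_diff = compare_effects(effect_m=0.50, se_m=0.20, effect_f=0.25, se_f=0.20)
print(f"males alone:        p = {p_m:.3f}")
print(f"females alone:      p = {p_f:.3f}")
print(f"sex-by-treatment:   p = {p_diff:.3f}")
```

In this example the effect is significant in males and not in females, yet the direct comparison gives no support for the claim that the sexes respond differently.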
5. Communicating sex differences
What is the best way to plot a sex difference? The most valuable representations of our data will accurately depict overlap. Plotting the distributions (figure 2; see www.sexdifference.org) allows the reader to assess the effect size. Other options to depict overlap include graphing confidence intervals using error bars or ‘cat's eye pictures’ . Alternatively, plotting individual data points, for example, in dot plots (figure 3), allows the reader to see exactly the extent of overlap as well as the percentages of each sex in the tails of the distributions. Many readers, particularly those outside the field, may look only at the figures; thus depicting overlap graphically likely pays off even more than adjustments to language in the paper.
Once the graphs are made and the paper accepted, some of the most important work is yet to be done. Publication of newsworthy results is generally accompanied by a press release, in most cases written by staff in the public relations department of the home institution. This press release, much more than the paper itself, sets the tone of media coverage and dictates the information contained therein. Even journalists who specialize in science writing may skip the journal article and read only the press release . Although scientists may blame the media for misrepresenting and sensationalizing their findings [97–100], press releases often contain the same oversimplification, omission of information and misinterpretations as do news stories [13,51,52,97,101]. The process of packaging our findings into sound bites naturally leads us into the three traps outlined above in §2. Difference is more popular than sameness [11,102], which may compel us to downplay overlap and announce striking sex differences—thus invoking Fallacy 1.
In order to make newly discovered sex differences more meaningful to the public, researchers are prone to speculate about the functions of those differences, thus invoking Fallacy 2. In an analysis of a highly covered study on hemispheric connectivity (mentioned above; figure 2d,e), O'Connor & Joffe found that dubious claims in the news stories were actually present in the press release. Some of the claims could be traced to the journal article itself. Although the study contained no behavioural data, the authors listed a number of sex differences in domains such as memory, social cognition and sensorimotor skills that their data might explain. The press release highlighted these domains and introduced new ones, such as multitasking. All of these were picked up by the popular media, which added even more alleged sex differences supposedly explained by the findings. The small sex effects on connectivity detected in the study (figure 2d,e) were said to explain sex differences in emotional intelligence, intuition, athleticism, hunting, cleaning the house and endless other so-called gendered behaviours [13,71]. A typical headline stated, ‘The hardwired difference between male and female brains could explain why men are “better at map reading”’.
By using the term ‘hardwired’, that headline also invoked Fallacy 3: the idea that sex differences are predetermined and immutable. The issue of predeterminism is particularly relevant to results from animal models, which are often extrapolated to humans more hastily than is warranted. We are told by both the media and researchers that because a sex difference appears in a non-human animal, it must be genetic, shaped by evolution and free of sociocultural influence. Take, for example, a recent paper by Farmer et al. The authors showed that in mice, experiencing pain caused females to spend less time with males. By contrast, males in pain continued to pursue females. In their paper, the authors argued that the findings ‘suggest that the well-known context sensitivity of the human female libido can be explained by evolutionary rather than sociocultural factors.’ (p. 5747; italics added). The press release, which led with ‘Not tonight honey…’, proved irresistible; the study received worldwide coverage. The fact that it was conducted in mice, not humans, was sometimes lost, however. Headlines announced, ‘Women's low sex drive due to biological reaction to pain’ and ‘It's science: why your headache excuse is actually legit but his isn't’. The coverage of this study shows clearly that animal studies are not immune to widespread media attention or potential over-interpretation.
Although the authors of a study do not typically write the press release, they are often quoted and may even be given opportunities to edit. Thoughtful attention, particularly to how the work will be interpreted by the public, is important at this stage. The press release should be regarded as a ‘point of no return’; once unleashed, misinformation evolves on its own and can be difficult, if not impossible, to rein in [13,98]. Rather than offering questionable functional or evolutionary explanations for the results, it may be more effective to point out what the research does not indicate. In a recent press release from Northwestern University [108,109], the senior author emphasized that research on sex differences in the brain ‘is not about things such as who is better at reading a map or why more men than women choose to enter certain professions’. Rather, the author emphasized that it is ‘about making biology and medicine relevant to everyone, to both men and women’. The ensuing media coverage was both widespread and higher in quality than usual. Clearly, close collaboration with the PR department can pay off. Authors may even want to take the lead in communicating findings to the public; more and more scientists use social media and blogs for this purpose [96,100]. Ultimately, although we all want to share our findings widely and ponder what they mean, the most effective communications (those that enhance public understanding) will stick to the facts and avoid speculations about evolutionary function or hardwiring. The topic of sex differences is too volatile and easily sensationalized to risk doing otherwise.
6. Future steps and an alternative ending
The inclusion of both sexes in biomedical research is a necessary and important step forward. Comparing the sexes, and their responses to potential treatments, will inform not only the development of inclusive medicine but also our understanding of mechanisms that covary with sex. The new NIH policy will advance these goals, but carries with it a high risk of collateral damage. As more and more sex differences are discovered, the number of misinterpretations will also increase. It would be best to be prepared, ideally by providing training to researchers and journalists. The NIH Office of Research on Women's Health already offers helpful online courses that cover the importance of studying both sexes, the biology of sexual differentiation and disorders that affect one sex disproportionately. This training does not, however, cover how to recognize and avoid contributing to pseudoscience. In fact, the training materials currently state that ‘women have larger left cortical language receptors than men’ (course 2, lesson 4, p. 6) but the source cited does not mention language. According to the same course, men mount a ‘fight or flight’ response when faced with a crisis whereas women ‘tend and befriend’; this difference is said to be caused by a sex difference in oxytocin. One of the cited sources does not mention sex differences; another states emphatically that the role of oxytocin in human behaviour is not well understood. Certainly, if these sorts of misrepresentations can creep into NIH training materials, we can expect them to pop up practically anywhere, including in our own work [13,52]. Training in the interpretation and communication of sex differences should be a priority.
In this article, I have painted a rather grim picture of fallacious headlines appearing in the news every day. Whether the new NIH guidelines actually increase the number of such headlines depends, of course, not only on the manner in which results are communicated, but also on the extent to which the guidelines are followed. Depending on how stringently they are enforced, there could be a very different but equally disappointing outcome. Research is expensive, and not all researchers are interested in sex differences. When asked to increase sample sizes and perform additional analyses, a majority of researchers may be motivated to rule out sex differences as quickly as possible. In such cases, demonstrating that the sexes are the same becomes highly incentivized. After a cursory t-test showing ‘no difference’, researchers may feel free to go back to business as usual. Thus, whereas dubious interpretations of positive findings threaten public scientific literacy, false negatives may turn out to be much more detrimental to the mission of the NIH. For every sex difference that makes headlines, numerous others may go undiscovered as they slip down the slippery slope of non-significance and out of sight.
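The false-negative concern can be made concrete with a rough power calculation. The sketch below is illustrative only: it uses a normal approximation to a two-sided, two-sample comparison rather than an exact t-test calculation, and the effect size (Cohen's d = 0.5) and group sizes are hypothetical values chosen for the example, not figures from any study discussed here. The function names are likewise assumptions made for illustration.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison.

    Uses a normal approximation (an assumption of this sketch); for very
    small groups it slightly overstates the power of a real t-test.
    d is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def false_negative_rate(d, n_per_group, alpha=0.05):
    """Probability of reporting 'no difference' when a true effect d exists."""
    return 1 - approx_power(d, n_per_group, alpha)


if __name__ == "__main__":
    # Hypothetical scenario: a moderate true sex effect (d = 0.5) tested
    # with increasingly large groups of each sex.
    for n in (8, 20, 64, 200):
        rate = false_negative_rate(0.5, n)
        print(f"n per sex = {n:3d}: approx. false-negative rate = {rate:.2f}")
```

Under these assumptions, a moderate effect tested with only a handful of animals per sex is missed far more often than it is detected, which is why a non-significant cursory test is weak evidence that the sexes are the same.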
The datasets supporting the figures have been uploaded as the electronic supplementary material.
I have no competing interests.
My effort devoted to writing this article was supported by Emory University.
I am grateful to Evan Goode, who designed and implemented the interactive tool at sexdifference.org. I also thank Chris Goode, Kim Wallen and Catherine Woolley for helpful comments on the manuscript.
One contribution of 16 to a theme issue ‘Multifaceted origins of sex differences in the brain’.
- Accepted November 15, 2015.
- © 2016 The Author(s)
Published by the Royal Society. All rights reserved.