Royal Society Publishing

Towards writing the encyclopaedia of life: an introduction to DNA barcoding

Vincent Savolainen, Robyn S Cowan, Alfried P Vogler, George K Roderick, Richard Lane

Abstract

An international consortium of major natural history museums, herbaria and other organizations has launched an ambitious project, the ‘Barcode of Life Initiative’, to promote a process enabling the rapid and inexpensive identification of the estimated 10 million species on Earth. DNA barcoding is a diagnostic technique in which short DNA sequence(s) can be used for species identification. The first international scientific conference on Barcoding of Life was held at the Natural History Museum in London in February 2005, and here we review the scientific challenges discussed during this conference and in previous publications. Although still controversial, the scientific benefits of DNA barcoding include: (i) enabling species identification, including any life stage or fragment, (ii) facilitating species discoveries based on cluster analyses of gene sequences (e.g. cox1=CO1, in animals), (iii) promoting development of handheld DNA sequencing technology that can be applied in the field for biodiversity inventories and (iv) providing insight into the diversity of life.

1. ‘Star Trek's tricorder’ coming to reality

In the early 1960s, World War II veteran Gene Roddenberry brought to the air a now famous science fiction drama, Star Trek, in which a handheld ‘tricorder’ device was used to scan and identify alien life forms (www.startrek.com). Four decades later, the first international conference on ‘Barcoding Life’ was held at the Natural History Museum in London (UK), attended by over 200 participants from about 50 countries, and a portable DNA sequencing device to identify all life was claimed to now be within reach (Marshall 2005). Of course, the London conference had nothing to do with Star Trek, but there is a parallel many of us will make. Will we indeed build a DNA-based identifier the size of a mobile phone? How will this new technology be useful in biology, is it truly revolutionary and what does the DNA barcoding approach entail?

It took over two centuries for taxonomists to describe 1.7 million species, but we know this figure might be a gross under-estimate of the true biological diversity on Earth (Blaxter 2003; Wilson 2003). Although taxonomists can identify most organisms with which they are familiar, an ever-growing community requires taxonomic information for a broad range of taxa. The build-up of DNA databases has great potential for the identification and classification of organisms and for supporting ecological and biodiversity research programmes.

One of the first conferences exploring these issues was the DNA Taxonomy Workshop at the Deutsche Staatssammlung in Munich in April 2002, funded by the German Science Association (DFG) with the participation of some 100 scientists mainly from European countries (Tautz et al. 2002). At this early stage, the issues much in focus were the most useful markers for the so-called DNA taxonomy (i.e. a universal DNA-based classification system across all organismal groups), the difficulties of linking established names to entities within a DNA-based system (Tautz et al. 2003), and the implications for nomenclature (Minelli 2003).

With a different viewpoint from the German meeting, a group of scientists led by Paul Hebert at the University of Guelph in Canada developed the use of part of one mitochondrial gene as a universal ‘identification’ marker for animal species (Hebert et al. 2003a,b). Building upon the idea of the ‘universal product code’, known as ‘barcodes’ in the retail industry (Brown 1997), a few DNA nucleotides (i.e. the sequence of a short DNA fragment) may well provide an immediate diagnosis for species. As with commercial barcodes, the use of these ‘species barcodes’ first requires the assembly of a comprehensive library that links barcodes and organisms. Recognizing the potential of this approach, the Alfred P. Sloan Foundation funded two meetings at Cold Spring Harbor, the first in March 2003, the second in September the same year. From these meetings came the idea that major natural history museums should take the lead in connecting diagnostic DNA sequences both to specimen vouchers in collections and to the existing taxonomic system, the so-called Linnean system. In spring 2004, the Sloan Foundation provided another substantial award to establish a secretariat for the ‘Barcode of Life’, based at the Smithsonian's National Museum of Natural History in Washington, DC, USA. The Consortium for the Barcode of Life (CBOL) was also created and joined by many natural history museums and herbaria, research organizations and private partners (www.barcoding.si.edu).

Barcoding life has quickly attracted much attention and received considerable media coverage. Last autumn the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov) sealed a partnership with CBOL whereby barcode standard DNA sequences and relevant supporting data can now be archived in GenBank. This is an important step forward because, despite the greatest efforts of the curatorial teams at GenBank (www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/) and elsewhere, much DNA sequencing work has hitherto been done without adhering to standards in taxonomy and data quality, and systematic coverage remains sketchy, precluding a wider use of these molecular tools in taxonomy. Much debate has also been generated; for example, at the Partnerships for Enhancing Expertise in Taxonomy's fifth biennial conference, Vincent Smith, Kipling Will and Paul Hebert participated in a lively debate on ‘Genetic Barcoding’ (available for viewing at www.conferences.uiuc.edu/peet/video.html), which will be published soon in Systematic Biology (Hebert & Gregory in press; Smith in press; Will et al. in press). The advocates of DNA barcoding say that it will revitalize biological collections and speed up species identification and inventories (Gregory 2005; Schindel & Miller 2005), whereas its opponents argue that it will destroy traditional systematics and turn it into a service industry (Ebach & Holdrege 2005); several papers have provided balanced analyses of the pros and cons (Moritz & Cicero 2004; Marshall 2005).

Last February, in contrast, the spirit of the CBOL conference in London was to provide a scientific and technological forum in which the prospects and limitations of DNA barcoding could be examined objectively, deliberately standing apart from the often tedious and rather naïve polemics that have surrounded the barcoding initiative. The main scientific issues debated were: (i) is it possible to distinguish a large number of species using short DNA sequence data? (ii) can closely related or fast-evolving species be distinguished with this technique? (iii) what are the appropriate DNA sequences for barcoding various taxa, i.e. will the partial cox1 sequence be useful in groups other than those tested so far, or will we have to use multiple markers from different genomes to identify all life? The technological challenges also included the building of both a simple portable DNA sequencing device and a centralized, and appropriately curated, barcoding-specific database. This themed issue compiles some of the best contributions from the London conference. Here we introduce the broad scope of papers and views that helped in making this meeting a success.

2. DNA-based biodiversity inventories

The direct benefits of DNA barcoding undoubtedly include the ability to:

  1. make the outputs of systematics available to the largest possible community of end-users by providing standardized and high-tech identification tools, e.g. for biomedicine (parasites and vectors), agriculture (pests), environmental assays and customs (trade in endangered species);

  2. relieve the enormous burden of identifications from taxonomists, so they can focus on more pertinent duties such as delimiting taxa, resolving their relationships and discovering and describing new species;

  3. pair up various life stages of the same species (e.g. seedlings, larvae);

  4. provide a bio-literacy tool for the general public.

Perhaps another relatively uncontroversial aspect of DNA barcoding is that it will also facilitate basic biodiversity inventories. Indeed, from the early days of molecular phylogenetics to the assembly of the tree of life (Blaxter 2003; Cracraft & Donoghue 2004), DNA has proved useful in identifying clades and evolutionary relationships. Whether or not actual species can be identified with DNA (see below), the number of distinct DNA sequences in environmental samples, and the reconstruction of phylogenetic trees to place these sequences into an evolutionary context, have been used in several inventories of cryptic biodiversity (e.g. soil bacteria or marine/freshwater micro-organisms). The DNA barcoding initiative has taken this approach, initially referred to as DNA typing or profiling, a step forward, and several taxa have now been surveyed in their natural habitats using this technique.

Such an approach has been particularly useful for marine organisms (Shander & Willassen 2005), including fishes (Mason 2003; Ward et al. 2005), soil meiofauna (Blaxter et al. 1998, 2004), freshwater meiobenthos (Markmann & Tautz 2005) and even extinct birds (Lambert et al. 2005). In the rainforests, rapid DNA-based entomological inventories have been performed so efficiently (Janzen et al. 2005; Monaghan et al. 2005; Smith et al. 2005) that tropical ecologists have been among the most active advocates of DNA barcoding (Janzen 2004).

More pragmatically, DNA barcodes have proved useful in biosecurity, e.g. for surveillance of disease vectors (Besansky et al. 2003) and invasive insects (Armstrong & Ball 2005), as well as for law enforcement and primatology (Lorenz et al. 2005). Barcoding efforts have also recently received the attention of conservation agencies. For example, the UK Darwin Initiative for the Survival of Species (www.darwin.gov.uk) has funded two projects this year that include DNA barcoding activities to support conservation priorities, capacity building and trade surveillance in meso-American orchids and cacti.

3. Beyond a universal cox1 barcode

The core idea of DNA barcoding is based on the fact that short pieces of DNA can be found that vary only to a very minor degree within species, such that this variation is much less than between species (www.barcodinglife.org). Simplistically, a threshold of variation could even possibly be characterized for each taxonomic group (ca 2–12%) above which groups of individuals do not belong to the same species but instead form a supra-specific taxon. Clustering analyses based on DNA sequences could therefore reveal species groups and assign unknown individuals to species (Hebert et al. 2003a; figure 1). One such piece of DNA, the mitochondrial cytochrome oxidase subunit 1 (cox1, usually referred to as COI in barcoding studies; see White et al. (1998) for a discussion on gene nomenclature), was proposed to be a good candidate for barcoding animal species (Hebert et al. 2003a).

Figure 1

Hypothetical clustering analysis of DNA ‘barcoded’ individuals reveals at least three species. Sequence cluster 1 corresponds to a traditionally recognized species based on morphology (species 1). Clusters 2 and 3 can be two cryptic species revealed by DNA barcoding, which were previously embedded within species 2. Bold numbers indicate six unknown individuals sampled for a biodiversity inventory and assigned to their respective species using DNA barcoding. Individual 21 is unplaced and illustrates potential problems with barcoding: the non-assignment of 21 to clusters 1–3 could be due to problems with the barcode marker, the clustering algorithm used, and/or to biological phenomena such as hybridization and introgression; 21 could also be considered to represent a separate entity sampled only once. Some taxonomists prefer to rely on DNA-based clusters whereas others will preferentially consider morphology-based species recognition; most often both approaches converge to the same solution (as for cluster 1 and species 1). A difficulty in tree-based approaches is that the topology may not be well supported based on short mitochondrial DNA sequences, in particular at deeper nodes (indicated by asterisks). Alternative approaches do not use the tree structure but the character states unique to the individuals from particular sets of populations, e.g. see DeSalle et al. (2005).
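The threshold-and-clustering idea can be made concrete with a minimal sketch. Everything here is an illustrative assumption rather than the method of any cited study: the 20-site sequences and individual names are invented toy data, the distance is a simple uncorrected proportion of differing sites, the clustering is single-linkage, and the 10% cut-off is an arbitrary value chosen from within the ca 2–12% range mentioned above.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

# Hypothetical aligned barcode fragments (toy data, not real cox1 sequences).
REFERENCE: Dict[str, str] = {
    "ind1": "ACGTACGTACGTACGTACGT",
    "ind2": "ACGTACGTACGTACGTACGA",
    "ind3": "TTGTACCTACGAACGTTCGT",
    "ind4": "TTGTACCTACGAACGTTCGA",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_by_threshold(seqs: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Single-linkage clustering: join any two individuals within the threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Sort for a stable, reproducible cluster order.
    return sorted(clusters.values(), key=lambda c: sorted(c))


def assign(query: str, seqs: Dict[str, str], clusters: List[Set[str]],
           threshold: float) -> Optional[int]:
    """Index of the cluster holding the nearest reference, or None if unplaced."""
    nearest = min(seqs, key=lambda name: p_distance(query, seqs[name]))
    if p_distance(query, seqs[nearest]) > threshold:
        return None  # analogous to the unplaced individual 21 in figure 1
    for index, members in enumerate(clusters):
        if nearest in members:
            return index
    return None


if __name__ == "__main__":
    found = cluster_by_threshold(REFERENCE, threshold=0.10)
    print(found)
    print(assign("ACGTACGTACGTACGTACGT", REFERENCE, found, threshold=0.10))
    print(assign("G" * 20, REFERENCE, found, threshold=0.10))
```

With these toy data the four reference individuals fall into two clusters, a query identical to a reference sequence is assigned to that sequence's cluster, and a highly divergent query is left unplaced. Real analyses would normally use corrected distances or tree-based methods and far longer fragments; the sketch only illustrates the logic of a fixed cut-off.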

With several early successes of using cox1 (Hebert et al. 2003b; Remigio & Hebert 2003; Hogg & Hebert 2004), larger sequencing programmes were set up (e.g. for fishes and birds, see CBOL website), with concerted massive data production rapidly differentiating the barcoding movement from previous DNA-based taxonomic identification attempts. However, although partial cox1 sequences continue to be used for barcoding (e.g. Armstrong & Ball 2005; Blaxter et al. 2005; Janzen et al. 2005; Lorenz et al. 2005; Smith et al. 2005), several other markers have also been proposed as putative barcodes. For example, given the potential problems with mitochondrial genes at the species boundaries in some groups (Moritz & Cicero 2004), nuclear ribosomal regions have been used as well in various animals (Markmann & Tautz 2005; Monaghan et al. 2005).

In plants, because of the limited variation in mitochondrial DNA generally, cox1 is useful only in some algae (Saunders 2005). In flowering plants another approach has been put forward. On one hand several plastid loci do discriminate between species, e.g. the trnH-psbA intergenic spacer (Kress et al. 2005) and some more typical phylogenetic markers such as rbcL and trnL-F (Chase et al. 2005), but on the other hand multiple genetic loci might be necessary to account for the common hybridization and polyploidy events in angiosperms. Ribosomal DNA (e.g. ITS in orchids) could be used to complement plastid genes, and shorter low-copy nuclear markers are being discovered that might in the future be used to provide a more sophisticated multiple component barcode for species diagnosis and delimitation, the ‘gold standard’ according to several botanists (Chase et al. 2005). This approach, whereby a few micro-barcodes would be used in combination, has also received considerable interest from mycologists (Summerbell et al. 2005).

4. Exploring species limits

Broadly speaking, taxonomy is concerned with the identity of organisms and their relationships. The discipline certainly faces many challenges in this new century (Godfray 2002; Godfray & Knapp 2004; Smith in press), and DNA barcodes are likely to play a major role in the future of taxonomy. In its strictest sense, DNA barcoding addresses only a limited aspect of the taxonomic process, by matching DNA sequences to ‘known’ species, the latter being delimited with traditional (e.g. morphological) methodologies. In this context, the role of barcodes is to provide a tool to assign unidentified specimens to already characterized species (Hebert et al. 2003a). This is of great utility to the end users of taxonomy, and will help make more rapid progress in traditional taxonomic work (Gregory 2005). As DNA barcodes are applicable to all life stages, they are also especially useful in cases where larval stages are difficult to identify with traditional methods, e.g. butterflies (Janzen et al. 2005) or amphibians (Vences et al. 2005), and in social insects in which several castes have different ‘unrelated’ morphologies (Smith et al. 2005). In all of these cases, DNA barcoding is applied only in conjunction with classical approaches. Where species are simply unknown or no attempts have been made to delimit them, the barcode approach as originally intended would be limited in its applicability.

However, it is a widely accepted fact that species, however defined, are variable for most DNA markers including the widely used cox1 gene. Hence, the analogy to commercial barcodes presumes that the variation within these species is smaller than between them (www.barcodinglife.org). Therefore, an obvious contribution that barcoding is making to taxonomy is helping to discover cryptic species (Hebert et al. 2004). Using DNA to discover such morphologically similar but genetically differentiated species is not new or contentious (Moritz & Cicero 2004); even cryptic elephant species have been described based largely on genetic distances and clustering analyses (Roca et al. 2001). However, in these cases the reference to established species no longer needs to be strict, and species delimitation is at least partially relying on DNA data. This is a challenging problem that requires the characterization of appropriate markers and analytical tools, i.e. to discriminate clusters of interbreeding individuals versus those that have experienced an interruption to gene flow for a long enough period of time that species recognition is appropriate.
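The presumption that within-species variation is smaller than between-species variation can be checked directly on a reference library by comparing the largest intraspecific distance with the smallest interspecific one. The sketch below is an assumption-laden illustration: the species labels and 20-site sequences are invented toy data, the function name `barcoding_gap` is hypothetical, and an uncorrected proportion of differing sites stands in for whatever distance measure a real study would use.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical aligned fragments grouped by morphologically defined species.
GROUPS: Dict[str, List[str]] = {
    "species_A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "species_B": ["TTGTACCTACGAACGTTCGT", "TTGTACCTACGAACGTTCGA"],
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gap(groups: Dict[str, List[str]]) -> Tuple[float, float]:
    """Return (largest within-species distance, smallest between-species distance)."""
    if len(groups) < 2:
        raise ValueError("at least two species are needed for a comparison")
    within = [
        p_distance(a, b)
        for seqs in groups.values()
        for a, b in combinations(seqs, 2)
    ]
    between = [
        p_distance(a, b)
        for (_, first), (_, second) in combinations(groups.items(), 2)
        for a in first
        for b in second
    ]
    # Species represented by a single individual contribute no within-species pairs.
    return max(within, default=0.0), min(between)


if __name__ == "__main__":
    max_within, min_between = barcoding_gap(GROUPS)
    print(f"max within-species distance:  {max_within:.2f}")
    print(f"min between-species distance: {min_between:.2f}")
    print("gap present" if max_within < min_between else "distances overlap")
```

When the largest within-species distance approaches or exceeds the smallest between-species distance, a single cut-off cannot separate the species, which is the situation the closing sentences of the paragraph above describe as requiring better markers and analytical tools.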

Recent barcoding papers have advocated criteria on sequence similarity, assuming a cut-off value for maximum within-species variation (e.g. Lambert et al. 2005). Within a parsimony framework, however, ‘barcoders’ have looked for unique combinations of autapomorphies in populations (DeSalle et al. 2005). Others have suggested that, because of within-species variation at potentially every nucleotide, more sophisticated methods of species assignment are necessary (Matz & Nielsen 2005). Such approaches are an active area of research and are being implemented in new user-friendly software (e.g. Steinke et al. 2005). Phenetic approaches have also been used where species are not so easily conceptualized as biological entities, as in micro-organisms, or where these entities are difficult to define based on morphology, as in nematodes (Blaxter et al. 2005) and other meiofauna (Markmann & Tautz 2005). This has led to the concept of the ‘molecular operational taxonomic unit’, which refers to clusters of individuals that are recognized based on such analyses of sequence similarity (Blaxter et al. 2005).
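As a contrast to distance cut-offs, the character-based idea can be illustrated with a much simplified sketch. It is not the procedure of DeSalle et al. (2005): it looks only for single alignment positions whose state is fixed within one group and absent from every other group, whereas the text above refers to unique combinations of characters. The group labels, sequences and the function name `diagnostic_sites` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical aligned fragments grouped by population or putative species.
GROUPS: Dict[str, List[str]] = {
    "species_A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "species_B": ["TTGTACCTACGAACGTTCGT", "TTGTACCTACGAACGTTCGA"],
}


def diagnostic_sites(groups: Dict[str, List[str]]) -> Dict[str, List[Tuple[int, str]]]:
    """For each group, list (position, state) pairs that are fixed within the
    group and never observed at that position in any other group."""
    result: Dict[str, List[Tuple[int, str]]] = {}
    for name, seqs in groups.items():
        others = [s for other, ss in groups.items() if other != name for s in ss]
        sites: List[Tuple[int, str]] = []
        for pos in range(len(seqs[0])):
            states = {s[pos] for s in seqs}
            if len(states) != 1:
                continue  # variable within the group, so not diagnostic
            state = next(iter(states))
            if all(o[pos] != state for o in others):
                sites.append((pos, state))
        result[name] = sites
    return result


if __name__ == "__main__":
    for group, sites in diagnostic_sites(GROUPS).items():
        print(group, sites)
```

In the toy data the last position varies within both groups and is therefore not diagnostic, even though it contributes to pairwise distances; this is one way in which character-based and distance-based criteria can weigh the same data differently.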

Some researchers have argued that ultimately the short DNA sequences themselves could potentially provide the basis for a taxonomic system, in which a set of sequences is used to delimit a cohesive group of organisms and the sequence itself represents the species diagnosis, or even the ‘type’. This idea of DNA taxonomy (Tautz et al. 2003) assumes that evolutionary entities in nature are recognizable as well from DNA sequences as from any other evidence in traditional taxonomy (see Markmann & Tautz 2005). Preliminary analyses are presented in this issue for beetles (Monaghan et al. 2005), meiofauna (Markmann & Tautz 2005) and nematodes (Blaxter et al. 2005). This use of DNA barcoding, however, remains among the most contentious (Will et al. in press).

Despite the firm commitment of the barcoding community to collection-based taxonomic research, criticisms from systematists have continued (Lipscomb et al. 2003; Wheeler 2004; Ebach & Holdrege 2005; Will et al. in press). CBOL has responded to this by noting that barcoding is neither a substitute for alpha-taxonomy nor about inferring phylogenies (Schindel & Miller 2005). However, we must keep open the possibility that the barcode sequences per se and their ever-increasing taxonomic coverage could become an unprecedented resource for taxonomy and systematics studies in addition to being a diagnostic tool. As little as ten years ago, a standard paper in the top-ranked journals of molecular systematics was likely to be based on no more sequence information per taxon than the barcodes of today and with less dense taxon sampling. Although phylogenetic support levels were frequently low in these studies, it would be incongruous to ignore the phylogenetic information content of short mitochondrial DNA sequences at appropriate levels of divergence (see Rubinoff & Holland in press, for a critique), especially if in the future these could be supplemented with sequences from a standardized set of nuclear markers. In plants, multiple markers are likely to be necessary and are already being explored (Chase et al. 2005; Kress et al. 2005), but this approach may be equally useful for most other groups (Monaghan et al. 2005). With sampling of multiple individuals in populations and across geographic ranges, the power of barcodes could well also help resolve several taxonomic problems and assist in establishing the extent of species entities, as several papers in this volume discuss (DeSalle et al. 2005; Janzen et al. 2005; Smith et al. 2005). It is also possible that some taxa can be established from the sequence variation alone and re-identified unequivocally in future collections while awaiting morphological analysis and formal species description, i.e. the ‘reverse taxonomy’ of Markmann & Tautz (2005; see also Monaghan et al. 2005).

5. A life barcoder for conservation

As the technical aspects of large-scale production of molecular barcodes are becoming more refined (Hajibabaei et al. 2005), and the value of the resulting database is increasingly apparent, barcoding of life has now developed into a more complex tool with uses at the interface between population genetics, phylogenetics and taxonomy. This is not new in essence, but perhaps what makes the barcoding of life unique is the large scale of its technological and societal ambitions. Another important factor is the aim of barcoding for standardization of the markers, DNA banking and proper taxonomic vouchering. The urgency of creating DNA and tissue banks has been well recognized (Savolainen & Reeves 2004; Lorenz et al. 2005; Savolainen et al. in press), and solutions for linking DNA samples with taxonomic vouchers are being developed for all sorts of organisms, for example for those ‘barcoded’ nematodes (Blaxter et al. 2005) that at first did not seem to exhibit morphological variation at the species level (De Ley et al. 2005).

Barcoding of life will have to be both integrative and integrated with other worldwide taxonomic initiatives such as the Global Taxonomic Initiative of the Convention on Biological Diversity (www.biodiv.org) or the Global Biodiversity Information Facility (www.gbif.org). Perhaps within three years a handheld DNA sequencer will become available (Rita Colwell, personal communication, Smithsonian Botanical Symposium 2005, see http://persoon.si.edu/sbs/index.cfm), and one can now envision the time when automated DNA barcoding and wireless communications technology will be combined in a portable device. Not only will a ‘Life Barcoder’ be used to identify species, but it will also be linked via the World Wide Web to other kinds of biodiversity data such as images, uses, conservation status or biology. Surely if every child, politician and scientist has such direct access to life form information, then the importance of preserving biodiversity can only be formidably enhanced. This is not to deny naively the complexity of the problem of biological conservation, but given its urgency, we should welcome and help develop new initiatives that hold promise in that direction, which is the case for the barcoding of life.

Acknowledgments

We thank Mark Chase, Jim Mallet, Craig Moritz, David Schindel and Quentin Wheeler for useful comments and discussions, as well as the Royal Society, especially James Joseph, Editor, for having agreed to publish this themed issue. We are also grateful to the Consortium for the Barcode of Life, the Alfred P. Sloan and Gordon and Betty Moore Foundations and the Darwin Initiative for funding.

Footnotes

  • One contribution of 18 to a Theme Issue ‘DNA barcoding of life’.
