Royal Society Publishing

DNA barcodes for biosecurity: invasive species identification

K.F Armstrong, S.L Ball


Biosecurity encompasses protecting against any risk through ‘biological harm’, not least being the economic impact from the spread of pest insects. Molecular diagnostic tools provide valuable support for the rapid and accurate identification of morphologically indistinct alien species. However, these tools currently lack standardization. They are not conducive to adaptation by multiple sectors or countries, or to coping with changing pest priorities. The data presented here identify DNA barcodes as a very promising opportunity to address this. DNA of tussock moth and fruit fly specimens intercepted at the New Zealand border over the last decade was reanalysed using the cox1 sequence barcode approach. Species identifications were compared with the historical dataset obtained by PCR–RFLP of nuclear rDNA. There was 90 and 96% agreement between the methods for these species, respectively. Improvements included previous tussock moth ‘unknowns’ being placed to family, genera or species and further resolution within fruit fly species complexes. The analyses highlight several advantages of DNA barcodes, especially their adaptability and predictive value. This approach is a realistic platform on which to build a much more flexible system, with the potential to be adopted globally for the rapid and accurate identification of invasive alien species.

1. Introduction

Biosecurity is emerging as one of the most important issues facing the international community. Traditionally it has been associated with risks from infectious diseases, living modified organisms and biological weapons, but in the very broadest sense it encompasses minimizing risk through ‘biological harm’ (Meyerson et al. 2002). Not least is the economic risk from invasive alien species (IAS) that threaten ecosystem stability, producer livelihoods and consumer confidence (Cock et al. 2003). That risk is facilitated by the movement of exotic species around the world through increasing international tourism and trade, and is influenced by changes in climate and land use. Of those species introduced to novel environments, an estimated one percent are anticipated to become invasive, with serious economic impacts (Williamson 1996). An example relevant to the following discussion is provided by Japan, where on average four exotic insect species have become established each year for the last 50 years. Of these, 74% were economic pests, but just two, the Oriental fruit fly, Bactrocera dorsalis, and the melon fly, Bactrocera cucurbitae, have cost the equivalent of more than EUR200 million to eradicate (Kiritani 1998). Also, in the USA, the potential cumulative economic losses from Asian gypsy moth (Lymantria dispar) and nun moth (Lymantria monacha) establishment between 1990 and 2004 were estimated to be in the range equivalent to EUR28–46 billion (Cock et al. 2003).

New Zealand is very sensitive to the potential impact that such pests could have on the primary industries and natural ecosystems that underpin its economy. This is apparent in its having, internationally, the most comprehensive biosecurity approach, based on its Biosecurity Act of 1993 (Meyerson & Reaser 2002). Nevertheless, one of the main weaknesses recognized with this is the difficulty of predicting new IAS, which limits the implementation of appropriate risk management strategies (Parliamentary Commissioner for the Environment 2000). A critical aspect of prediction, and also monitoring, is the ability to accurately identify any intercepted specimen to the species level. This is essential for support of early detection systems. It is also a means of collecting complete and accurate data about which species are actually entering for the assessment of risk. However, development of a comprehensive identification capability is hindered by the growing imbalance worldwide between diagnostic needs and the availability of trained taxonomic experts. Long-term research strategies are also required to address the deficiencies in existing taxonomic keys to deal with morphologically indistinct immature life stages, cryptic species and damaged specimens. For a few of the most economically significant and global pests, morphotaxonomic keys are now supported by molecular diagnostic technology, e.g. fruit flies (Tephritidae; Armstrong et al. 1997a), tussock moths (Lymantriidae; Armstrong et al. 2003), leafroller moths (Tortricidae; Dugdale et al. 2002) and thrips (Thripidae; Toda & Komazaki 2002). However, such methods are developed on an ad hoc and often reactive basis with immediate local needs in mind and little or no coordination between institutes, regions or taxa. The following discussion illustrates how DNA barcodes can provide a very realistic, practical and flexible framework for species identification in the context of biosecurity.
Specific examples are given from a New Zealand perspective for two global economically significant agricultural pest insect groups, the fruit flies and tussock moths. However the principles could equally apply to other taxa, sectors and countries. We propose that adoption of the method would enable the international IAS diagnostic community to better cope with changing and localized species priorities, to capitalize on the efforts of others and to address the international standardization of technologies that has been recommended for a more effective and coherent diagnostic effort (e.g. Klijn 2004).

2. Limits of previous molecular diagnostics for biosecurity

A variety of stand-alone molecular methods exist for identification of regulated pest species. Immunological (e.g. Symondson et al. 1999; Trowell et al. 2000) or protein-based (e.g. Miles 1979; Soares et al. 2000) methods are not widely used being highly taxon-specific, difficult to adapt, vulnerable to environmental factors and reliant on good quality, fresh tissue. The majority of molecular diagnostic methods are instead polymerase chain reaction (PCR) based DNA analyses which are not limited in these ways. However, a decade or more into the application of molecular diagnostics for biosecurity and other identification purposes, two major hurdles exist that preclude the building of a smarter, more co-ordinated and anticipatory IAS identification system.

(a) Finite range of taxa

The number of taxa that can be accommodated by any one method is predetermined and limited to various degrees from one to around fifty species. Due initially to the relative expense of method development, and for other pragmatic reasons, the approach has been to develop tests for those taxa predicted to be the most likely invaders, i.e. for species known to be invasive and spreading elsewhere (Cock et al. 2003). Examples are for species within the fruit flies (Haymer et al. 1994; Armstrong et al. 1997a), tussock moths (Armstrong et al. 2003), leafroller moths (Dugdale et al. 2002) and thrips (Toda & Komazaki 2002). Taxa within these and others are prioritized differently amongst countries according to matching host and climate, access to existing pathways of entry and anticipated economic impact. Modification of protocols to accommodate additional species as the need arises, or to adapt for different sectors or countries with differing taxa priorities may not be practical. This can be especially difficult when diagnosis is reliant on a few single nucleotide polymorphisms that form the basis of primers required for specific PCR or for polymorphic restriction sites or in the design of oligonucleotide arrays. Finding additional informative polymorphisms within such methods can be problematic.

Methods designed out of necessity for a predetermined range of taxa also undermine the potential to cope with unpredicted arrivals. The entry of species found on inanimate objects, such as vehicles (Armstrong et al. 2003) or solid wood packaging (Wittenburg & Cock 2001), is more difficult to predict than that of species closely associated with their host material, such as fruit flies (Armstrong et al. 1997a). Others may not be predicted because they are innocuous or minor pests in their native range. However, they can become a significant pest in a new environment with no specific natural enemies or competition. This has been a significant issue in, for example, North America where of the six most devastating forestry pests introduced only the European strain of the gypsy moth was known as a pest in its indigenous range (Cock et al. 2003). Similarly in New Zealand the unanticipated arrival in 1999 of painted apple moth Teia anartoides (Lepidoptera: Lymantriidae) from Australia, where it is a minor localized pest, was predicted to have a significant impact and so an eradication programme was initiated. The cost to New Zealand if it is not eradicated is anticipated to be equivalent to EUR33–205 million over the next 20 years (Case study 3 2002). A significantly more flexible and anticipatory diagnostic system is required to provide timely support for management of these events.

(b) Diverse methodologies

There has been little or no consistency in the PCR-based technologies used. A number of different methods have been designed, such as species-specific PCR (e.g. Kohlmayr et al. 2002; Lu et al. 2002; Liu 2004), PCR restriction fragment length polymorphism (PCR–RFLP; e.g. Armstrong et al. 1997a,b; Brunner et al. 2002), multiplex PCR (Kumar et al. 1999; Kengne et al. 2001), DNA sequencing (e.g. Brown et al. 2002; Dugdale et al. 2002) and oligonucleotide array analyses (Naeole & Haymer 2003). Even the idiosyncrasies of similar methods mean that they are rarely directly transferable between laboratories or for use with different taxa, and data cannot be shared.

There is also no consistency in the gene or parts of genes used to identify species. For insect identification examples of mitochondrial DNA (mtDNA) used are cytochrome oxidase subunit I (cox1; e.g. Brunner et al. 2002; Kohlmayr et al. 2002), non-transcribed region between cox1 and tRNAleu (Stauffer 1997), 16S rDNA (Brown et al. 2002) and cytochrome B (Khemakhem et al. 2001, 2002). For nuclear gene regions the rDNA internal transcribed spacer regions ITS1 plus ITS2 (Armstrong et al. 1997a,b), ITS1 only (Chiu et al. 2001) and ITS2 only (Pfeifer et al. 1995) have been used, as well as an actin gene intron (He & Haymer 1997) and randomly amplified polymorphic DNA (RAPD; Kengne et al. 2001). To a certain extent choice is dependent on the taxonomic range involved and appropriate evolutionary rate of the gene, but there may also be elements of convenience regarding primers available and in-house experience. The consequences of this disparity have been recognized to be an issue of much broader dimensions across the field of phylogenetics (Caterino et al. 2000).

In essence therefore, molecular diagnostic tests, which are more and more being accepted as an inevitable and essential component of the biosecurity toolbox (Martin et al. 2000), remain very limited. They are not flexible enough to accommodate the growing number of IAS, to identify unanticipated arrivals or to capitalize on the efforts of others that collectively work across a very diverse taxonomic range.

3. Identification using DNA barcodes

The emergence of DNA barcoding as a means of species identification (Hebert et al. 2003a) has the potential to address the shortcomings outlined above. In contrast to the molecular diagnostic methods available to date, DNA barcoding proposes to use information within a single gene region common across all taxa and to access that information by DNA sequencing under universal conditions. These features lend themselves well to standardization across species and laboratories, thus providing a platform for global exchange of homologous data and capitalizing on the efforts of others to build a more flexible system.

There is a growing literature demonstrating that cox1 will reliably discriminate a diverse range of taxa at the species level (e.g. Hebert et al. 2003a,b; Hogg & Hebert 2004; Whiteman et al. 2004; Ball et al. 2005; Schander & Willassen 2005). This gene, along with 16S, 18S, and elongation factor-1α genes, has also been encouraged as a standard target for insect phylogenetics (Caterino et al. 2000). Of enormous benefit to the international diagnostics community is the very large amount of cox1 sequence information that already exists in the literature for a diverse range of insect taxa. However, from a biosecurity perspective, where accuracy is critical, the robustness of identifications and genetic limits of this gene need to be established. Potential complications arising from discordance with morphologically established species limits, species sequence overlap or divergence across intra-specific geographic ranges also need to be examined. Even if the concept can be verified to operational agencies as sound, it still needs to be demonstrated that DNA sequencing is a practical and rapid alternative to the current technologically more accessible methods.

4. Testing cox1 DNA barcodes for exotic insect identification

To examine the suitability of cox1 sequence as a diagnostic tool for biosecurity, two datasets that exist for molecular identifications of specimens intercepted at the New Zealand border over the past decade have been revisited. The datasets are for the tussock moths (Lepidoptera: Lymantriidae) and fruit flies (Diptera: Tephritidae). Several species within these groups are considered internationally to be significant economic pests. They are not established in New Zealand, but are considered high risk to New Zealand's forestry and horticultural industries, respectively. DNA from a random selection of specimens that have been intercepted at the border was used. To test accuracy of the barcode method, identifications so determined were compared to those that, to all intents and purposes, had been successfully identified to species using the previously designed PCR–RFLP and specific-PCR methods. To test for improvement on the previous methods, all specimens that were previously unidentifiable by those methods were also included. The latter had been unidentifiable because of failure to PCR amplify the 1.5 kb ITS rDNA region for subsequent RFLP analysis, ambiguous RFLP patterns, RFLP patterns that were not recognized amongst those established for the target list of species, or failure to amplify a nested species-specific PCR product.

DNA previously obtained for border specimens, and for specimens of the morphologically identified species contributing to the ‘profile’ data (see Electronic Appendix, table 3), was PCR amplified and sequenced for the cox1 Folmer region according to established procedure (Hebert et al. 2003a,b). The only variation was use of the Expand High Fidelity (Roche Diagnostics) polymerase system instead of Taq in the PCR. For use with the tussock moth profile dataset, sequence data for other lepidopteran species were also included from the Barcode of Life Database (BOLD) and GenBank (see Electronic Appendix, table 2). Sequences were aligned and truncated to a ca 650 bp homologous region using Sequencher (Gene Codes Corp.). A profile neighbour-joining (NJ) tree of Kimura-2-parameter (K2P) distances was constructed from the sequence data using Mega v.2.1. The K2P model provides a suitable metric when genetic distances are low (Nei & Kumar 2000), as anticipated with many of the species here. The simple NJ algorithm was considered at this juncture to be an appropriate starting point for the analyses, given that specimen identification is based entirely on sequence similarity, rather than on strictly phylogenetic relationships, and given the speed of analysis that is necessary for biosecurity diagnostic purposes.
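For readers unfamiliar with the K2P metric, the following is a minimal illustrative sketch of how such a pairwise distance can be computed, using the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is not the Mega implementation used in this study. The function name `k2p_distance`, the skipping of gaps and ambiguous bases, and the toy sequences in the usage note are all assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Assumes both sequences come from the same alignment (equal length).
    Sites with gaps or ambiguous bases are skipped (an assumption here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)
```

As a usage example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` compares two invented 10-base sequences differing by a single transition and returns a value slightly above the uncorrected 10% difference, reflecting the correction for multiple substitutions.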

(a) Case study 1: tussock moths

Background: around 30 species of tussock moths have been determined to be unwanted organisms under the Biosecurity Act (MAF Biosecurity, Unwanted Organisms Register). Based on their pest status, polyphagous nature and invasive potential, seven northern hemisphere species are considered to present the greatest risk to New Zealand forestry. These are Asian and European gypsy moth (L. dispar), nun moth (L. monacha), pink or rose gypsy moth (L. mathura), vapourer or rusty tussock moth (Orgyia antiqua), white marked tussock moth (Orgyia leucostigma), Douglas fir tussock moth (Orgyia pseudotsugata) and white spotted tussock moth (Orgyia thyellina; Armstrong et al. 2003). Specific life history strategies, such as long overwintering phases in the egg stage and indiscriminate oviposition on inanimate surfaces such as containers, ship superstructures, forestry equipment and used vehicles, could enable them to arrive in New Zealand.

Of these, the Asian gypsy moth is particularly well equipped to invade as the females are capable of sustained flight (Keena et al. 2001) and are attracted to lights of vehicles and ports (Wallner et al. 1995). The species also has variable eclosion cues enabling hatching to coincide with favourable environmental conditions (Walsh 1993) and has a proven invasive ability (Savotikov et al. 1995). Consequently the tussock moth egg masses commonly intercepted on imported used vehicles had previously been assumed by quarantine officers to be Asian gypsy moth. Unfortunately, while this species can be readily distinguished from the others based on adult morphology, the early life stages cannot. This is compounded by their arrival on inanimate objects of unknown origin, providing limited host or geographic information to indicate their likely identity. Consequently there has been no accurate record of which species actually arrive in New Zealand. This has serious implications for the suitability of the post-border quarantine systems that are in place.

To improve interception records, a molecular diagnostic method was developed based on PCR–RFLP of ca 1.5 kb nuclear ribosomal DNA (rDNA) incorporating partial 18S plus complete ITS1, 5.8S and ITS2 regions. This has since been used to routinely identify the egg masses intercepted on imported used vehicles (Armstrong et al. 2003). Samples also arrive in very poor condition with potentially degraded DNA. Although not a risk in themselves, their accurate identification is necessary for a comprehensive risk analysis. Consequently a species-specific PCR method was designed to supplement the main RFLP diagnoses for specimens failing to amplify the 1.5 kb nuclear rDNA region (unpublished). Specific PCR primers were designed to amplify a 150–300 bp nested region of the ITS1 for the Asian species, L. dispar, L. mathura, L. monacha and O. thyellina. Used together with a control amplicon of 350 bp of the 18S rDNA, positive amplification indicated a positive identification. Incorporating these original methods into operational procedures the large majority of specimens were confirmed to be gypsy moth, plus two other high risk species, L. monacha and O. thyellina. Of concern however were the specimens that could not be identified. Some failed to PCR amplify. Others produced novel RFLP patterns for which no species or even genus could be inferred.

Recently, cox1 barcodes have been demonstrated to hold great potential for tussock moth species identification (Ball & Armstrong 2005). In that study 81 ‘test’ specimens were used to interrogate a cox1 sequence profile composed of 18 lymantriid species across four genera. 100% of the cox1 identifications agreed with their prior morphological identification, i.e. in all cases test sequences grouped more closely with their conspecifics than with any other species. This result is consistent with previous DNA barcoding studies of Lepidoptera (Hebert et al. 2003a). Testing this further as a biosecurity tool, new data are presented here for specimens intercepted at the border that had previously been identified to species, or otherwise, by PCR–RFLP or specific PCR.

Results: of the 57 border interception specimens analysed here, 49 had previously been identified to species by RFLP or by L. dispar-specific primers. Eight others had been unidentifiable (table 1). Of the latter, five could be placed with confidence (80–99% bootstrap support) by their cox1 sequence to a genus or species within the profile tree (table 1). This improved the previous RFLP identification rate from 86% to 93%.

Table 1

Comparisons of previous molecular species identifications with DNA barcode identifications for New Zealand border specimens.

Interestingly, four of the five additional identifications were not tussock moths. Specimen MAF812 associated closely with two species of Spodoptera and MAF773 with two species of Clostera (Electronic Appendix, figure 1). While these appear to be the most likely congeners, the interspecific divergences suggest that the actual species are not represented in the profile dataset, e.g. 7.3% between Clostera albostigma and Clostera apicalis is of the same order as between MAF773 and each of those species (6.4% and 8.4%, respectively). A third specimen, MAF775, previously unamplifiable for subsequent RFLP or by L. dispar-specific primers, was amplified with the universal cox1 primers. The sequence identified it as possibly a species of Dasychira, although there was only one species in the profile dataset (Dasychira dorsipennata) representing this genus (Electronic Appendix, figure 1). A fourth specimen, MAF913, produced an ambiguous RFLP haplotype, but was identified here as Hyphantria cunea (Electronic Appendix, figure 1) with a mean sequence divergence of 1.4% from three H. cunea profile sequences. A fifth specimen, MAF891, appears to be a divergent form of L. dispar with 2.4% sequence divergence from all other non-Hokkaidoensis L. dispar specimens. The remaining three of the eight previous unidentifiables, MAF816, MAF851 and MAF912, could not be placed with confidence in the profile dataset. They grouped most closely with Hypena humuli, Luehdorfia japonica and Calophasia lunula, respectively (Electronic Appendix, figure 1), but the bootstrap supports were weak (12, 34 and 33%, respectively). The cox1 sequences suggest that these specimens belong to taxa not represented in this dataset, ranging from other species to other families. It is no surprise therefore that they were outside the diagnostic scope of the original ‘tussock moth specific’ RFLP method, which explains why they were difficult to analyse.
It also suggests very positively for barcoding that, with greater representation of lepidopteran taxa in the profile dataset, this method could achieve a 100% identification rate which is not possible by other current molecular methods.

Of the 49 previously identified specimens, the barcode identification disagreed with five of them (table 1). Each had been diagnosed as L. dispar. Four of them were identified using the L. dispar-specific PCR, as they had been difficult to analyse by the PCR–RFLP. Three of those were indicated by barcoding to belong to species not represented in the profile dataset. MAF915 and MAF729 associated with Spilosoma sp. (99% bootstrap support) and Hyphantria sp. (identical sequence) respectively, belonging to another lepidopteran family, the Arctiidae, and MAF839 came out on a long branch within the Lymantriidae, between L. xylina and L. dispar (80% bootstrap support). The fourth, MAF914, associated with high bootstrap support to the Korean haplotype of L. mathura, which is genetically divergent from those in Japan (Electronic Appendix, figure 1). This implies the potential use of the barcode data to provide useful geographic origin information that was not possible previously. Data for the four specimens also indicate that the original species-specific primer test (unpublished) was not as broad as it had needed to be and again demonstrate the limitation of previous methods in dealing with species outside the anticipated taxonomic range. The fifth specimen amongst those that disagreed, MAF795, was previously identified as L. dispar by RFLP, but 100% cox1 homology to H. cunea suggests otherwise. This specimen has been flagged as one that needs further analysis to determine its true identity.

(b) Case study 2: fruit flies

Background: of some 4000 species of fruit fly, around 250 are considered economic pests (White & Elson-Harris 1992). New Zealand remains the only major fruit producing country in the world that is free from them and significant investment has been placed in monitoring and surveillance systems to ensure their early detection and to minimize pathway risk (Cowley & Frampton 1989; Frampton 2000; Stephenson et al. 2003). The species, however, present different degrees of risk to New Zealand based on their host and climatic preferences and differing quarantine actions can result, i.e. to treat, re-ship or destroy the imported produce. Distinction of regulated and non-regulated species groups is the minimum diagnostic requirement, but identification to species is essential for accurate interception data and assessment of pathway risk. As for the tussock moths, the majority of these species can be readily distinguished based on their adult morphology (White & Elson-Harris 1992), and late instar larval morphological keys are also becoming available for an increasing number of species (Carroll et al. 2004). Unfortunately it is usually the early instar larvae or eggs that are intercepted at the border, in fruit of commercial consignments or accompanying overseas travellers, and these cannot be identified morphologically beyond the family level.

Since 1994 a molecular diagnostic technique based on PCR–RFLP of the ITS1, 5.8S plus ITS2 rDNA regions (Armstrong et al. 1997b) has been used routinely to rapidly identify fruit flies intercepted at the New Zealand border to the species level. Also, in a similar manner to the tussock moth method, a species-specific PCR amplifying a 200 bp nested region has been used for the identification of degraded DNA associated with eggs found in cooked breadfruit (unpublished). Again, while these do not present a threat themselves, accurate identification is necessary for comprehensive risk assessment. These methods effectively replaced the need to rear immature stages through to adults for identification, which was often unsuccessful or at best too slow for making timely biosecurity management decisions (Armstrong et al. 1997a). However, this approach has evolved from use with 19 original species to 49 (unpublished) and relies heavily on the host and geographic origin information to limit the list of likely species. Continuing to add more species has increasingly compromised the diagnostic sensitivity of the method as overlapping RFLP patterns become more common. Using DNA barcodes was therefore considered here as a method that might enable accurate identifications amongst a large number of species. The first data towards a fruit fly cox1 species profile, including the only two tephritid barcode region sequences available in GenBank at the time (see Electronic Appendix, table 3), are presented here and used to re-identify specimens intercepted at the border.

Results: one hundred and ninety three sequences, representing 60 species, were used to create a tephritid cox1 reference profile (Electronic Appendix, figure 2). Forty one species were represented by 2–14 specimens taken from across their geographic range where possible. Nineteen species were represented by only one specimen. The profile NJ analysis generally resolved the taxa according to their morphologically derived taxonomy as genera, subgenera, species and species complexes (White & Elson-Harris 1992). Bootstrap support was high at the nodes (greater than 80%) for species that were not part of a species complex (discussed below). There were two exceptions. Bactrocera psidii ‘clustered’ weakly with Bactrocera trilineola; sequence divergence between them was 2.9%, compared to intra-specific 1.8% and 2.0%, respectively. Also, Bactrocera curvipennis was placed within the Bactrocera tryoni complex, but with weak bootstrap support. Interestingly, the latter are also difficult to distinguish by the PCR–RFLP method (Armstrong & Cameron 2000) although the adults are morphologically distinct. The placement of others on long branches, namely Bactrocera minuta, Bactrocera arecae, Bactrocera distincta and Dacus demmerezi, suggests that there is insufficient taxonomic representation around these species within the current dataset. There were also two discrepancies that warrant further investigation. One is the Bactrocera cognata specimen 1009, which did not cluster with its conspecifics (specimens 975, 976 and 1011) that were correctly located within the B. dorsalis species complex (Drew & Hancock 1994; figure 2 in the Electronic Appendix, here considered to include the Asian and Australian species within the branch that includes B. dorsalis (specimen 726) through to B. endiandrae (specimen 789)). The second is B. arecae, which is part of the B. dorsalis complex but was placed distantly from it. These specimens are flagged for morphological and/or molecular re-examination.
Importantly, besides some exceptions within the species complexes, there were no sequences that were shared by different species. This is in contrast to the current nuclear rDNA method where some RFLP haplotypes are common to several species.

The cox1 sequences appear to be limited in their ability to distinguish taxa within the species complexes of B. dorsalis, B. tryoni (Morrow et al. 2000; figure 2 in the Electronic Appendix, species B. tryoni, B. neohumeralis and B. aquilonis) and A. fraterculus (Norrbom et al. 1999; figure 2 in the Electronic Appendix, all species within the branch starting from A. ludens through to A. fraterculus). This is interesting given that there were cases here, as for the tussock moth data set, where known subspecies could be distinguished. For example, the B. cucurbitae strains A and B (3.5% divergence between them here, versus 0.0% and 0.8% respectively within each) are anecdotally separated by host range (unpublished), and Bactrocera xanthodes and Bactrocera paraxanthodes (7.2% divergence between them here, versus 0.0% within each, respectively) are also separated by host (Drew et al. 1997). Each of these and other aspects of the reference profile data, such as the placement of B. xanthodes and B. paraxanthodes etc, are the subject of a more in-depth barcoding treatment in a separate publication in preparation.

Eighty one border intercepted specimens were identified by appending their cox1 sequences to the profile dataset (Electronic Appendix, figure 2). Based on the closest species with which they associated in the NJ tree, 73 (94%) of these identifications were in agreement with the previous RFLP method (table 1) with high bootstrap support (84–100%). The mean sequence divergence between the intercepted specimens and the profile sequences they grouped with was 0.9% (range = 0.1–5.3%). This is consistent with intraspecific cox1 divergences observed for a variety of insect taxa (Hebert et al. 2003a,b). Four identifications were in disagreement. Three of these had high bootstrap support (Electronic Appendix, figure 2). Specifically, MAF274 was identified within the B. tryoni species complex by PCR–RFLP but as B. facialis here, MAF665 was B. passiflorae but B. facialis here and MAF940 was within the B. tryoni complex but Dirioxa pornia here. The fourth, MAF144, was weakly supported as being within the B. dorsalis species complex. The previous PCR–RFLP method had identified it as either Bactrocera kirki, Bactrocera trilineola or Bactrocera frauenfeldi, which share common restriction profiles. However, bootstrap support for the entire B. dorsalis complex node was generally low in the profile dataset, indicating little confidence in the ability of these sequences to identify ‘unknowns’ within the complex. Further evaluation of these aberrant results as well as the groupings within the species complexes is underway. Finally, of the 81 border intercepted specimens analysed, two were previously classified as ‘unknown’, due to novel RFLP patterns. The cox1 analysis clearly placed them within the B. (B.) dorsalis complex, but interestingly not within the clusters containing the only four species (B. dorsalis, Bactrocera philippinensis, Bactrocera papayae and Bactrocera carambolae) for which RFLP profiles had been determined.
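The idea of identifying an intercepted specimen by its closest association within a reference profile can be sketched in simplified form as follows. This is a deliberately reduced stand-in for the analysis described above: it ranks profile sequences by uncorrected pairwise difference rather than building an NJ tree with bootstrap support. The function names, the 2% flagging threshold and the toy sequences are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"  # skip gaps and ambiguity codes
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def closest_profile_match(
    query: str,
    profile: Dict[str, List[str]],
    max_divergence: float = 0.02,  # hypothetical threshold, not from the study
) -> Tuple[str, float, bool]:
    """Return (closest species, distance, whether distance is within threshold).

    `profile` maps a species label to its aligned reference sequences.
    A match beyond `max_divergence` hints that the true species may be
    missing from the profile and the result needs closer scrutiny.
    """
    best_species, best_distance = None, float("inf")
    for species, sequences in profile.items():
        for reference in sequences:
            d = p_distance(query, reference)
            if d < best_distance:
                best_species, best_distance = species, d
    if best_species is None:
        raise ValueError("profile contains no sequences")
    return best_species, best_distance, best_distance <= max_divergence


# Toy, invented 10-base 'profile' purely to show the call pattern
toy_profile = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["TTGTACCTAC"],
}
exact_hit = closest_profile_match("ACGTACGTAC", toy_profile)
distant_hit = closest_profile_match("TTGTACCTAA", toy_profile)
```

In this sketch a query that matches nothing closely is still assigned its nearest label but is flagged as outside the threshold, which mirrors the situation described above where specimens associate with a profile taxon while probably belonging to a species not yet represented in the dataset.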

In conclusion, as the analysis stands the fruit fly cox1 sequences provide slightly better resolution, and also quantitative support in terms of bootstraps and divergence values, for species-level identification than was previously possible. The dataset, however, highlights the limit of cox1 to provide confident identifications within species complexes. For those outside the complexes, the cox1 data sort the species well in terms of their morphologically-based taxonomy, from genus to species. Ambiguity within the complexes may be a consequence of insufficient variation in cox1 to accurately reconstruct such recent divergences, but that reasoning does not appear to hold for other subspecies separations, such as that within B. cucurbitae. The disparity may be a function of the status of the alpha taxonomy of this genus, which is still under scrutiny (Smith et al. 2003), and also the amount of systematic interest that certain taxa have received. In this case, in contrast to B. cucurbitae, the B. dorsalis complex has been extensively studied taxonomically. Until recently B. dorsalis was considered the single most significant fruit fly pest species in Asia. A recent revision by Drew & Hancock (1994) now recognizes it as part of a complex of 52 sibling species of which eight are economically important, and more species continue to be described (see Clarke et al. 2005). In addition, with the highly specialized taxonomic expertise required to distinguish these species with any confidence, their apparent ‘poor resolution’ in the current analysis may in part be a consequence of mistaken specimen identification by the suppliers.

Does this limitation at the level of species complexes invalidate the ability of this method to correctly identify unknowns? As the method stands, it appears to be no less accurate than the existing PCR–RFLP method. For the B. dorsalis complex, interspersion of species such as B. dorsalis sensu stricto, B. papayae and B. philippinensis is no less informative than before, and from a biosecurity point of view this is not an issue as they are all regulated species. In fact, no molecular study to date has been able to satisfactorily distinguish these three species (Clarke et al. 2005). The method does, however, promise to be more informative for other species within the complex such as Bactrocera kandiensis and Bactrocera caryeae, which form a distinct group; interestingly, this identified two border-intercepted specimens that previously could only be resolved as far as the species complex. B. carambolae also forms a discrete group, although it is not well supported. This is consistent with the PCR–RFLP method, although it was very confidently distinguished from the others in the complex with that method. Additional notable improvements on the previous method are the separation of the species B. kirki, B. trilineola and B. frauenfeldi, which previously shared a common haplotype. Also, specimens that previously gave novel (unknown) PCR–RFLP haplotypes are now identifiable, associating with reasonable bootstrap support to particular profile taxa and providing clues as to the species gaps that need filling.

5. Discussion

(a) Comparative utility of DNA barcodes

In contrast to the other molecular diagnostic methods referred to here, DNA barcoding has some significant advantages: (i) it provides a more accurate and robust approach to diagnosis by using all of the targeted genetic data. Species-specific PCR and PCR–RFLP diagnoses, on the other hand, utilize small windows of the data at priming or restriction sites, ignoring most of the genetic information. (ii) incorporating the range of intra-specific polymorphism by adding as many reference sequences as possible clearly enhances the robustness of any key and assignment of species identification. In contrast, our PCR–RFLP procedures actively avoided using restriction enzymes that detected sub-specific polymorphism because of the ambiguity in interpreting it as species- or population-level variation. (iii) using a tree-based approach enables all the data to be observed at a glance; this is very cumbersome to manage with PCR–RFLP. (iv) the NJ analysis also provides quantitative data with sequence divergences and bootstrap values that give a measure of confidence in the identifications. In general terms, our conclusions about the relative benefits of sequence ‘tree’-based species identification methods, concur with those of other similar molecular identification keys for thrips (Brunner et al. 2002), nematodes (Floyd et al. 2002) and whales (Ross et al. 2003).
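The idea behind the bootstrap values mentioned in point (iv) can be sketched as follows: alignment columns are resampled with replacement, the assignment is repeated on each resampled alignment, and support is the percentage of replicates that reproduce the original result. The sketch applies this to a simple nearest-reference assignment rather than to nodes of an NJ tree, so it illustrates the resampling principle only. It assumes ungapped, pre-aligned uppercase sequences; all names, the replicate count and the seed are illustrative assumptions.

```python
import random


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(query, references):
    """Label of the closest reference; min() keeps the first of any tied references."""
    return min(references, key=lambda item: p_distance(query, item[1]))[0]


def bootstrap_support(query, references, replicates=200, seed=0):
    """Return (label, percent support) for the nearest-reference assignment.

    Columns of the alignment are resampled with replacement in each replicate.
    """
    rng = random.Random(seed)
    n = len(query)
    original = nearest(query, references)
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(n) for _ in range(n)]
        resampled_query = "".join(query[i] for i in cols)
        resampled_refs = [
            (label, "".join(seq[i] for i in cols)) for label, seq in references
        ]
        if nearest(resampled_query, resampled_refs) == original:
            hits += 1
    return original, 100.0 * hits / replicates


# Toy data (hypothetical): the query is identical to species_A.
profile = [
    ("species_A", "ACGTACGTAC"),
    ("species_B", "TTGTACGAAC"),
]
print(bootstrap_support("ACGTACGTAC", profile))
```

With real barcodes of several hundred sites, low support would flag assignments that rest on only a few informative positions.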

From a practical perspective, identifications can be achieved on a par with RFLP analyses, within a 24 h period from extraction to NJ analysis and for around the same cost, if not less. This is given in-house sequencing facilities and an appropriate reference dataset. However, even if the latter is not available, it is much easier to build this up over a relatively short period of time compared to the same for a diagnostic suite of restriction patterns. Assuming that the same care is taken over sequence quality and interpretation as it would be for quality and interpretation of electrophoretic gels, DNA barcodes provide a robust alternative to PCR–RFLP. With the exception of speed, this is also the case with species-specific primer methods. However, that appeared here to be the least reliable method which is not unexpected given the potential ambiguity associated with presence/absence of a PCR product and assumed ‘specificity’ of PCR primers. Brunner et al. (2002) decided not to dispose of the PCR–RFLP profiles for the benefit of other laboratories that do not have convenient DNA sequencing facilities. However, this is unlikely to present a barrier in the future. Even now, DNA sequencing technology is becoming more accessible through a number of dedicated local and offshore commercial sequencing facilities structured for competitive pricing and rapid turnarounds.

(b) Test cases

The two groups of taxa analysed here presented different challenges to using cox1 barcodes for species identification. For the tussock moths there had been difficulty in placing previous RFLP ‘unknowns’ to species. This obstacle was overcome here to a certain extent with subsequent inclusion of a much broader taxonomic range of lepidopteran cox1 sequence data, available through BOLD and to a lesser extent in GenBank. That enabled several unknowns to be assigned to likely genus and species within the Noctuidae and Arctiidae. In retrospect it is not unrealistic to expect that other, non-lymantriid moth species were being intercepted. Females of other Lepidoptera, besides gypsy moth (Leonard 1981), oviposit indiscriminately, including on inanimate objects during population outbreaks or when attracted to lights of human settlements. As arrival of these species cannot be easily predicted, the design of a more comprehensive PCR–RFLP method was not possible. In contrast, placing sequences of the unknowns within a broader taxonomic context was made possible by publicly available lepidopteran cox1 data. This now provides a guide as to how best to target further collections and fill the reference species gaps.

Of the previously unknown ‘tussock moth’ species, four were identified as the fall webworm, Hyphantria cunea. This evidence of multiple entry events could elevate this species in terms of risk. In fact, around the same time in 2003, this species was found to have established a localized population in New Zealand. Fortunately, this was eradicated. An active surveillance campaign has operated since that time and, interestingly, two more finds have been made recently. Fall webworm had never been found in New Zealand or Australia prior to 2003. It is native to North America and Mexico, but since establishing in Europe and parts of Asia in the 1940s and 1950s, it has become a significant pest of trees in these continents. The identification here of immature life stages on imported used vehicles is evidence that a pathway of entry exists. It also highlights that species other than gypsy moth enter New Zealand via the imported used vehicle pathway more frequently than was originally thought.

In contrast to the tussock moths, the species of immature fruit flies entering and the pathways involved are more predictable due to their close host association. Consequently it has been easier to target species that should be included in an appropriately comprehensive cox1 reference dataset. This was reflected here with considerable confidence in the dataset for making identifications within the Tephritidae. The challenge for barcoding instead was recent evolutionary divergences, at the level of species complexes where blurry species boundaries exist and undermine confidence in identifications. In spite of this, for some cases it was an improvement over the PCR–RFLP method. For example, it was possible to place specimens previously identified as ‘B. dorsalis complex’ with reasonable bootstrap support and minimal sequence divergence (see figure 2 in the Electronic Appendix) to a likely species, B. caryeae.

Distinguishing recently diverged taxa is no less an issue for morphologically-based identifications. The reasons for this are varied, but not least is the status of the alpha taxonomy. Confusing species limits may exist through species being ‘oversplit’ or ‘overlumped’ and the phenotypic boundaries not adequately reflecting the speciation process (Funk & Omland 2003). Nevertheless precise species identification remains very important for biosecurity. Firstly, some species within these complexes are more aggressive in their ability to invade and reach pest status than others. B. dorsalis sensu stricto and B. carambolae are very successful invaders, but they are difficult to distinguish from, and have overlapping host and geographic ranges with B. verbascifoliae, which is not a recognized pest. In reality many fruit fly species, especially tropical ones, are unlikely to be serious pests per se in New Zealand, but accurate diagnosis of regulated pests is still necessary to avoid trade restrictions of significant impact. Secondly, some morphologically indistinct regulated species, such as B. philippinensis and B. papayae, have different host and geographic ranges, which is important information for assessing the specific risk and pathway involved. In that situation, extension of the barcode system to include a second (or more) gene region for resolving what is not possible with the cox1 Folmer region is justified. This might involve a nested approach to include regions such as 3′ cox1 or the hypervariable 16S rDNA that may evolve at a slightly faster rate. It also warrants considerable effort to include specimens from across the host and geographic range.

(c) Barcodes for biosecurity in the future

Access to ‘historical’ datasets and the same specimens has provided a valuable opportunity to comparatively test the appropriateness and power of the barcoding approach for identifying unknown organisms. The test cases here support the view that cox1 barcodes offer the best opportunity to date to form the foundation of a flexible and accurate identification system for invasive insect species. Given the potential universality of the application, this approach promises to address the standardization and efficiency needs that are severely lacking at the moment for the international biosecurity community. The approach is even more appealing with the ease of technical transfer between laboratories, enabling consistency of the process from PCR to the alignment of homologous sequence data and sharing of sequence data from diverse and unrelated sources. It lends itself well to automation and as a basis for development of the next generation of diagnostic tools such as micro-array technology (Kochzius et al. 2004). DNA barcoding may also contribute to digitized collections more easily than that proposed for morphologically based identifications (Gaston & O'Neil 2004).

Nevertheless, there are a number of issues that will need to be taken into account. There continues to be debate as to the reliability of a cox1 barcode as a species identifier, given issues of diversity related to mtDNA phylogeography and the rules determining ‘species’ limits (Moritz & Cicero 2004). Suggesting that just one gene can supply the diagnostic needs for all would be rather naive. Even with our data set there are inconsistencies. For example, with the fruit flies, cox1 does not confidently discriminate some of the species within the B. dorsalis complex, for which an additional gene region may be appropriate. However, it clearly separates races within other species. This implies issues with the alpha taxonomy. It also highlights the fundamental need for continued participation of taxonomists to provide accurately identified reference material and to describe new species potentially discovered during the process.

Employing DNA sequences from known adult specimens to identify their morphologically indistinct immature life stages, as has been considered here, illustrates the power of molecular data to complement (and enhance) the morphological approach to insect diagnoses. Using barcode data in this context, i.e. matching sequences within taxa, should not suffer from the same potential for misidentification as in other diagnostic situations where incongruence with a priori predictions based on morphology requires a more comprehensive taxonomic approach (Paquin & Hedin 2004). However as with other molecular diagnostic methods DNA barcode-based identifications will clearly depend heavily on the availability of appropriate reference taxa to avoid problems associated with inadequate taxon sampling (Moritz & Cicero 2004). Despite this, given the largely unique nature of cox1 sequences at the species level demonstrated here (outside of species complexes) and in other studies (e.g. Hebert et al. 2003a,b; Hogg & Hebert 2004; Whiteman et al. 2004; Ball et al. 2005; Janzen et al. 2005; Shander & Willassen 2005), missing taxa are likely to lead to non-identification, not misidentification. Non-identification will be recognized by the ‘unknown’ appearing on a relatively long branch with low bootstrap support. Misidentification on the other hand could result from close proximity to a congener and absence of a conspecific reference sequence. This is most likely to occur for recently diverged taxa, and is no less an issue for any other diagnostic method. Robust diagnostic procedures should however be able to pre-empt this in part or completely by making every effort to understand reference taxa from a ‘whole organism’ point of view, i.e. using common sense to incorporate relevant biological aspects into identification of unknowns.
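The distinction drawn above between non-identification and a candidate identification can be expressed as a simple triage rule. The sketch below is an assumption-laden illustration: the threshold values are placeholders chosen for the example, not criteria from this study, and the function name and return labels are hypothetical. It also cannot detect the misidentification case described above (a close congener with no conspecific reference), which depends on knowing what is missing from the reference set.

```python
def triage(divergence, bootstrap, max_divergence=0.03, min_bootstrap=70.0):
    """Classify a barcode match from its divergence and bootstrap support.

    divergence     -- distance to the closest reference, as a proportion (0.01 = 1%)
    bootstrap      -- bootstrap support for the grouping, in percent
    max_divergence -- placeholder cut-off for a 'long branch' (assumption)
    min_bootstrap  -- placeholder cut-off for 'low support' (assumption)
    """
    long_branch = divergence > max_divergence
    weak_support = bootstrap < min_bootstrap
    if long_branch and weak_support:
        # Long branch with low support: likely a taxon missing from the references.
        return "non-identification"
    if long_branch or weak_support:
        # Only one warning sign: needs a closer look before reporting.
        return "review"
    return "candidate identification"


print(triage(0.009, 95.0))
print(triage(0.120, 40.0))
print(triage(0.053, 90.0))
```

In practice any such cut-offs would need to be calibrated per taxonomic group and combined with the biological context of the interception, as the text emphasises.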

Accuracy of identifications is also dependent on reliability of the simple sequence-similarity approach. In this case using the NJ model the analysis is not constrained by potential rate variation and different base composition amongst taxa as are phylogenetic treatments of the data such as the use of maximum likelihood. In fact the need for rapid identifications in biosecurity would preclude the use of a phylogenetic analysis with the computation of such large datasets taking several days. Conversely, occasional phylogenetic treatment of the data could provide a useful strategy for determining the addition of species when taxon gaps are inferred by anomalous placings and long branches. Indeed, the most important improvement of the barcode method over all other diagnostic methods is the ability to continually add in more species. This introduces a highly desirable, more anticipatory diagnostic approach that will enable the international IAS diagnostic community to better cope with changing and local species priorities, to capitalize on the efforts of others and to standardize technologies. There is, nevertheless, unlikely to be a ‘one size fits all’ molecular diagnostic approach to biosecurity which benefits from the use of tools on a case-by-case basis. For example DNA barcoding would not replace routine species-specific portable tests such as the serological Lateral Flow Devices (LFDs) used in the field for confirming plant pathogen identification (Hughes et al. 2005). However inclusion of DNA barcoding in the molecular diagnostics ‘toolbox’ does offer a level of transparency for species identification across countries, and sectors within countries, that is not possible with the current uncoordinated adoption of numerous different diagnostic methods.

Aspects that need to be seriously considered towards adopting this on a global scale will be the ability to obtain validated specimens for building robust reference profiles, an agreed framework for the sharing and quality control of sequence data, confidence in the current user-friendly bioinformatic analyses and rules or standardized criteria for interpreting barcode results. Importantly, the rigorous assessment of these will now be possible through the Barcode of Life initiative, assisted by the Consortium for the Barcode of Life. The significant international momentum gathering for this programme will facilitate the emergence of a globally collaborative and less fragmented approach to molecular diagnostics and is entirely appropriate for international biosecurity.


We thank the organizers of the inaugural meeting of the Consortium for the Barcoding of Life (London 2005) for inviting presentation of this work. We thank Robyn Cowan, Ruth Frampton and Barney Stephenson for constructive reviews of the manuscript, Lalitha Karunaratne for technical assistance with sample preparation and DNA sequencing, the many collaborators that have generously made time to provide us with specimens and the University of Guelph for BOLD DNA sequences. This work was supported by the Tertiary Education Commission of New Zealand through the Centre of Research Excellence Fund.


