To date, most global approaches to functional genomics have centred on genomics, transcriptomics and proteomics. However, following a number of high-profile publications, interest in metabolomics (the global profiling of metabolites in a cell, tissue or organism) has increased rapidly. A range of analytical techniques, including 1H NMR spectroscopy, gas chromatography–mass spectrometry (GC–MS), liquid chromatography–mass spectrometry (LC–MS), Fourier transform mass spectrometry (FT–MS), high performance liquid chromatography (HPLC) and electrochemical array (EC-array), are required to maximize the number of metabolites that can be identified in a matrix. Applications have included phenotyping of yeast, mice and plants, understanding drug toxicity in pharmaceutical drug safety assessment, monitoring tumour treatment regimes and disease diagnosis in human populations. These successes are likely to be built on as other analytical and bioinformatic approaches are developed to fully exploit the information contained in metabolic profiles. To assist in this process, databases of metabolomic data will be necessary to allow the exchange of information between laboratories. In this prospective review, the capabilities of metabolomics in the field of medicine will be assessed in an attempt to predict the impact this ‘Cinderella approach’ will have at the ‘functional genomic ball’.
The term metabolomics (and the related term metabonomics) was coined at the end of the 1990s to describe approaches that aim to measure all the metabolites present within a cell, tissue or organism in response to a genetic modification or physiological stimulus (Oliver et al. 1998; Nicholson et al. 1999; figure 1). In this respect, there are obvious parallels with transcriptomics and proteomics, the tools for measuring global mRNA and protein expression, respectively. The major technologies used for this purpose include 1H NMR spectroscopy, gas chromatography–mass spectrometry (GC–MS) and liquid chromatography–mass spectrometry (LC–MS), in conjunction with pattern recognition approaches (Griffin et al. 2003; figure 2). However, because of the chemical diversity of the metabolites found within the cell and the large dynamic range of their concentrations, none of these technologies, even when used in combination, can be relied upon to generate a complete metabolomic description of even the simplest organism. To date, metabolite identification and quantification extends at most to the order of hundreds of metabolites (Fiehn 2002), compared with the thousands of proteins that can be analysed by proteomic approaches and the capability of measuring mRNA expression across the complete genome of many organisms. In part for this reason, and in part because metabolomic approaches were described relatively recently, interest in metabolomics has lagged behind that in transcriptomics and proteomics.
However, despite the immense technical challenges of analysing a wide range of metabolites present at widely varying concentrations, the approaches being developed have several distinct advantages. They are high throughput and cheap on a per-sample basis, making them ideal as rapid screening tools. This has led to the development of a number of screening assays in human populations for diseases such as coronary artery disease (Brindle et al. 2002) and diabetes, as well as to the use of metabolomics in toxicology as part of the drug safety assessment process (Nicholson et al. 2002; Griffin & Bollard 2004). Metabolomics has also proven sensitive enough to detect a number of subtle genetic modifications, including silent mutations in yeast (Raamsdonk et al. 2001; Allen et al. 2003), and to distinguish ecotypes in Arabidopsis (Fiehn et al. 2000). In this prospective review, the capabilities of metabolomics in the field of medicine will be assessed in an attempt to predict the impact this technology will make on the fields of functional genomics and systems biology, and in particular how the approach may compete with, and complement, the more widespread approaches of transcriptomics and proteomics. To this end, a brief review of the key publications in the field is given.
2. Recent success stories in metabolomics
(a) Comparing silent phenotypes in yeast
Yeast was the first eukaryote to be sequenced, and cell banks exist for all of the ∼6000 mutants that are viable in this organism (for example, the EU-funded project EUROFAN; Oliver et al. 1998). While the challenge of defining a complete metabolome for an organism is confounded by the need to define gene–environment interactions for every physiological or pathological stimulus, the existence of such cell banks suggests that it may be possible to phenotype every viable mutant in yeast. This could have a significant impact on the understanding of human disease, through comparison of gene sequences found in yeast and man, or by correlating phenotypes and metabolic profiles found in both.
To carry out a comprehensive genomic screen of all the viable mutants of an organism, a rapid, high-throughput phenotyping tool is required. The standard way to phenotype yeast strains is to examine how rapidly a strain grows on given substrate mixtures, for example comparing growth on normal media with growth on carbon- or nitrogen-limited media. However, if a mutation does not alter the rate of growth it is said to be silent, and even mutations in genes involved in energy metabolism may be compensated for by the network of metabolic pathways in yeast, producing a silent phenotype. The underlying metabolic perturbations should nonetheless be detectable by metabolomics, and both NMR and mass spectrometry have been used to distinguish various silent phenotypes in yeast (Raamsdonk et al. 2001; Allen et al. 2003; Steuer et al. 2003). Furthermore, the metabolic profiles can be used to cluster genes of similar function according to the perturbations that mutations in these genes induce.
In a proof-of-concept paper, Raamsdonk et al. (2001) used 1H NMR spectroscopy to study the metabolic changes induced in a range of yeast mutants, including two mutants of 6-phosphofructo-2-kinase. Not only did these two mutants cluster together following principal components analysis (PCA) and discriminant function analysis, but mutants affecting oxidative phosphorylation also clustered together. This phenotyping approach can also be applied to the media in which the yeast is grown, a variation referred to as ‘metabolic footprinting’ (Allen et al. 2003). Metabolic footprinting has several advantages over conventional analysis of cell extracts, including easy sampling without destroying the cells and the analysis of excreted metabolites, which may accumulate in the media to far higher concentrations than would be viable within the cell.
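How PCA groups mutants by their metabolic profiles can be sketched with NumPy's singular value decomposition. The data below are simulated and hypothetical (three groups, two of which perturb different blocks of metabolites), and the function names are illustrative; real NMR data would first be binned, normalised and often scaled (for example, Pareto or unit-variance scaling), steps that are omitted here.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA scores and explained-variance ratios for a samples x metabolites matrix."""
    Xc = X - X.mean(axis=0)                        # mean-centre each metabolite
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def simulate_profiles(seed=0, n_rep=5, n_metab=30, noise=0.1):
    """Hypothetical profiles: group B perturbs metabolites 0-4, group C metabolites 5-9."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(1.0, 2.0, n_metab)          # arbitrary baseline intensities
    profiles, labels = [], []
    for label, block in [("A", None), ("B", slice(0, 5)), ("C", slice(5, 10))]:
        mean = base.copy()
        if block is not None:
            mean[block] += 1.0
        profiles.append(mean + noise * rng.standard_normal((n_rep, n_metab)))
        labels += [label] * n_rep
    return np.vstack(profiles), np.array(labels)


X, labels = simulate_profiles()
scores, explained = pca_scores(X)

# Assign each sample to the nearest group centroid in PC1/PC2 space.
centroids = {g: scores[labels == g].mean(axis=0) for g in np.unique(labels)}
nearest = np.array([min(centroids, key=lambda g: np.linalg.norm(sc - centroids[g]))
                    for sc in scores])
print("explained variance (PC1, PC2):", explained.round(2))
print("all samples cluster with their group:", bool(np.all(nearest == labels)))
```

PCA itself is unsupervised: the group labels are used only afterwards, to check whether samples from the same simulated mutant land together in score space.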
(b) Genetic modification and environmental interactions in Arabidopsis thaliana
The challenge of metabolomics in the plant kingdom is potentially larger than that in animal and microbial metabolism. While plant genomes typically contain 20 000–50 000 genes, some 50 000 metabolites have already been identified in plants, and the total is estimated to rise to ∼200 000 (De Luca & St Pierre 2000).
In one of the landmark papers in plant metabolomics, Fiehn et al. (2000) used GC–MS to identify and quantify 326 distinct metabolites in Arabidopsis leaf extracts, and assigned chemical structures to half of these metabolites. PCA of the data separated four genotypes representing the interactions between a gene mutation and different wild-type backgrounds associated with a given ecotype. Since this paper, the researchers have expanded their approach to include GC–time-of-flight–MS to detect and characterize ∼1000 metabolites (Fiehn 2001, 2002). They have also used the large datasets acquired from different mutants to identify metabolites that appear correlated during a given biological intervention, referring to these as ‘metabolic cliques’ (Steuer et al. 2004). It is hoped that studying these metabolic cliques will clarify how metabolites interact to form a complete metabolic network in an organism and how genetic modifications perturb this network. Since then, a number of researchers have exploited the sensitivity and reproducibility of GC–MS-based metabolomics for studying plant metabolism (see, for example, Gullberg et al. 2004 for a discussion of optimizing the process).
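One generic way to make the idea of groups of correlated metabolites concrete is to build a correlation network, linking two metabolites when the absolute Pearson correlation of their levels across samples exceeds a threshold, and then to enumerate the maximal cliques of that network with the Bron–Kerbosch algorithm. This is a sketch under stated assumptions: the 0.8 threshold, the simulated data and the function names are hypothetical, and this is not claimed to be the procedure used by Steuer et al.

```python
import numpy as np


def correlation_graph(X, threshold=0.8):
    """Link metabolites (columns of X) whose |Pearson r| across samples >= threshold."""
    r = np.corrcoef(X, rowvar=False)               # variables are the columns
    n = r.shape[0]
    return {i: {j for j in range(n) if j != i and abs(r[i, j]) >= threshold}
            for i in range(n)}


def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        # Pivot on the vertex with most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda v: len(adj[v] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical data: two latent factors each drive three metabolites; one is independent.
rng = np.random.default_rng(1)
n_samples = 40
f1, f2 = rng.standard_normal(n_samples), rng.standard_normal(n_samples)
columns = ([f1 + 0.2 * rng.standard_normal(n_samples) for _ in range(3)]
           + [f2 + 0.2 * rng.standard_normal(n_samples) for _ in range(3)]
           + [rng.standard_normal(n_samples)])
X = np.column_stack(columns)

# Keep only cliques of two or more metabolites.
cliques = [c for c in maximal_cliques(correlation_graph(X)) if len(c) > 1]
print(sorted(cliques))
```

Correlation alone does not imply a direct biochemical link; correlated groups are better read as hypotheses about shared regulation or shared pathways.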
(c) A high throughput diagnosis of coronary artery disease
A diagnostic tool for any common human disease must be capable of high throughput without significant per-sample costs, so that it could potentially be used to screen a large population. With the advent of flow-probe technology, NMR spectroscopy is potentially one such approach for screening for high-concentration metabolite biomarkers, and it has allowed a number of research groups to screen human urine for metabolic disorders.
One such study used pattern recognition approaches to analyse 1H NMR spectra of blood plasma in order to determine whether patients had coronary artery disease and to predict its severity (Brindle et al. 2002). The data analysis combined a data-filtering procedure called orthogonal signal correction (OSC) with subsequent PCA or partial least squares discriminant analysis (PLS-DA). OSC removes variation in the data that is orthogonal to (uncorrelated with) the variable describing the process of interest, and hence removes variation that is not related to the disease, toxic insult or other manipulation being examined in a metabolomic study.
The resulting pattern recognition models predicted the occurrence and severity of the disease, reported as one-, two- or three-vessel disease, with over 90% accuracy. The analysis could also be performed on samples from both males and females, and from patients on statins, drugs used to treat high cholesterol and blood pressure. However, the current gold standard for detecting coronary artery disease is angiography, which is ∼99% accurate. Since the publication of this paper, the researchers have begun a study entitled MAGE-CAD that combines metabolomics and transcriptomics, based on 1H NMR spectroscopy and mRNA microarrays. It is hoped that the additional data from the DNA microarrays will further increase the predictive power of the pattern recognition models, so that a biomarker approach becomes as diagnostically robust as angiography. Furthermore, 1H NMR/pattern recognition analysis of blood plasma has also correlated metabolite changes with measured blood pressure (Brindle et al. 2003). The prospect of NMR spectroscopy playing a role in hospital diagnosis is further strengthened by the low per-sample cost of such analyses (typically less than £1), the increased automation and simplification of spectrometer operation, and the introduction of shielded magnets, which can be housed in much smaller rooms.
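The OSC step described above can be sketched in a few lines of NumPy. This is a simplified, single-pass variant written for illustration (published OSC algorithms, such as Wold's, iterate and differ in detail); the function name `osc_filter`, the simulated class labels and the nuisance factor are hypothetical. Each pass takes the dominant principal component score of X, removes the part explained by y, and subtracts the resulting y-orthogonal component from X.

```python
import numpy as np


def osc_filter(X, y, n_components=1):
    """Simplified, non-iterative OSC: strip components of X that are orthogonal to y.

    Assumes more variables than samples (typical of spectra), so a centred y lies
    in the column space of the centred X and the removed scores are exactly
    orthogonal to y.
    """
    Xf = X - X.mean(axis=0)
    Y = np.asarray(y, dtype=float).reshape(len(Xf), -1)
    Y = Y - Y.mean(axis=0)
    removed = []
    for _ in range(n_components):
        # Start from the dominant principal component score of the current X.
        U, s, _ = np.linalg.svd(Xf, full_matrices=False)
        t = U[:, 0] * s[0]
        # Orthogonalise the score against y by removing its least-squares fit on y.
        t = t - Y @ np.linalg.lstsq(Y, t, rcond=None)[0]
        # Re-express the orthogonal score as a linear combination of X columns.
        w = np.linalg.pinv(Xf) @ t
        t = Xf @ w
        p = Xf.T @ t / (t @ t)                     # loading of the removed component
        Xf = Xf - np.outer(t, p)                   # deflate X
        removed.append(t)
    return Xf, np.column_stack(removed)


# Hypothetical data: 20 samples x 50 variables, a strong nuisance factor plus a class effect.
rng = np.random.default_rng(2)
n, p = 20, 50
y = np.repeat([0.0, 1.0], n // 2)                  # e.g. control vs disease
nuisance = rng.standard_normal(n)                  # e.g. dietary or instrumental drift
X = (3.0 * np.outer(nuisance, rng.standard_normal(p))
     + np.outer(y, rng.standard_normal(p))
     + 0.1 * rng.standard_normal((n, p)))

X_filtered, t_orth = osc_filter(X, y)
```

Because the removed score is orthogonal to y, the covariance between the filtered X and y is unchanged; only variance unrelated to y is stripped out. A PLS-DA model (for example, scikit-learn's `PLSRegression` fitted to class indicators) would then be built on `X_filtered`.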
(d) A marker of peroxisomal proliferation in rats and man
There are three types of peroxisome proliferator-activated receptor (PPAR), referred to as PPAR-α, PPAR-γ and PPAR-δ. Dysfunction of these receptors has been linked to a number of metabolic disorders, including type II diabetes and dyslipidaemia. Furthermore, drugs that stimulate these receptors are among the most effective treatments for these diseases, acting by stimulating the expression of enzymes involved in β-oxidation and fatty acid degradation. However, one potential concern with PPAR agonists is that some of these drugs, as well as changing enzyme expression, have produced large-scale peroxisome proliferation, which leads to liver dysfunction and hepatic cancer in rats. This serious potential side effect is compounded by the fact that peroxisome proliferation can only be monitored by electron microscopy of liver tissue. Thus, unless biomarkers for peroxisome proliferation can be identified, this side effect cannot be monitored in humans during clinical trials.
To address this issue, Ringeissen et al. (2004) examined the urinary metabolites of rats following exposure to two PPAR ligands, using a combination of 1H NMR spectroscopy and PCA. They identified two biomarkers, N-methylnicotinamide and N-methyl-4-pyridone-3-carboxamide (4PY), both metabolites of the tryptophan–NAD+ pathway. Furthermore, the concentrations of both metabolites correlated with peroxisome proliferation as measured by electron microscopy of liver tissue at post-mortem. Real-time PCR confirmed transcriptional changes in the tryptophan–NAD+ pathway, showing that metabolomics and transcriptomics agreed on the mechanism of action of these drugs. Since then, an LC–MS-based assay has been developed for the two biomarkers in order to monitor potential peroxisome proliferation during stimulation of PPARs.
(e) Monitoring gene therapy in Duchenne muscular dystrophy
Duchenne muscular dystrophy (DMD) is an X-linked disorder affecting 1 in 3500 male births (Hoffman et al. 1987). The disease is characterized by progressive muscle wasting as well as a non-progressive neurological deficit (Hoffman et al. 1987; Emery 1989). This reflects the expression of dystrophin as three full-length isoforms: M-dystrophin in muscle tissue, C-dystrophin in the cortex and P-dystrophin in the cerebellum. While the exact role of dystrophin in the brain is still largely undetermined, in muscle tissue dystrophin stabilizes the sarcolemmal membrane.
The mdx mouse, a mouse model of Duchenne muscular dystrophy, lacks dystrophin because of a mutation in the dystrophin gene (Tanabe et al. 1986). While the disease in humans is characterized by severe muscle wasting culminating in death within the second decade (Hoffman et al. 1987; Emery 1989), in the mdx mouse the loss of dystrophin produces a mild pathology, with only a small reduction in longevity (Tinsley et al. 1996), a pronounced hump and lower muscle regenerative capacity (McIntosh et al. 1998a,b).
A number of studies have already demonstrated that mdx mice are characterized by both an increased concentration of taurine and a decreased concentration of creatine in heart and muscle tissue in general (McIntosh et al. 1998a,b; Griffin et al. 2001). These included one study which also identified metabolic fingerprints associated with the failure to express dystrophin in the cortex and cerebellum (Griffin et al. 2001). In these studies, the concentration of taurine was correlated with the ability of muscle tissue to regenerate (McIntosh et al. 1998a). Using a combination of 1H NMR spectroscopy and pattern recognition techniques, Griffin et al. (2002) investigated a mouse model based on the mdx mouse in which the lack of dystrophin in muscle tissue, but not cardiac tissue, was compensated for by the upregulation of utrophin, a dystrophin-related protein. Increasing utrophin expression has raised much interest as a potential therapy for DMD, as muscle wasting only becomes apparent in sufferers of DMD after utrophin expression decreases shortly after birth. Furthermore, the dystrophic phenotype normally observed in mdx mice is absent when muscles overexpress utrophin (Tinsley et al. 1998), and utrophin-based gene therapy has been suggested as one potential treatment for Duchenne muscular dystrophy (Burton & Davies 2002).
The concentration of taurine in skeletal muscle of mdx mice expressing the utrophin transgene (Tgfulllength/Dmdmdx) was intermediate between that of mdx and control mice, suggesting that metabolic profiles determined by 1H NMR spectroscopy could potentially be used to follow gene therapy. One role for the increased taurine in dystrophic tissue could be to compensate for increased intracellular Ca²⁺ (Huxtable 1992).
Following on from these studies of DMD, Jones et al. (2005; figure 3) simultaneously modelled the metabolic profiles obtained by 1H NMR spectroscopy of heart tissue extracts from the mdx mouse, a mouse model of cardiac hypertrophy (the muscle LIM protein knockout mouse) and two mouse models of cardiac arrhythmia. The metabolic profiles demonstrated that strain background is an important component of the global metabolic phenotype of a mouse, providing insight into how a given gene deletion may result in very different responses in diverse populations. Despite these strain-associated differences, multivariate statistics separated each mouse model from its control strain, demonstrating that a distinct metabolic profile could be generated for each disease. Thus, metabolomics provides a rapid method of phenotyping mouse models of disease.
(f) Understanding apoptosis in tumours
While the use of NMR spectroscopy as a tool for metabolomics is sometimes limited by its sensitivity, one very important aspect of the approach is that it can be applied in vivo (where it is referred to as magnetic resonance spectroscopy, MRS). Perhaps the area in which this has been most clearly demonstrated is cancer research. In vivo MRS studies have distinguished a number of tumour types (for example, Tate et al. 1996a, 1998), despite the limited sensitivity and broad metabolite resonances of in vivo spectroscopy, while in vitro studies of tumours and cell lines have suggested how these tumours differ in terms of biochemical pathways (Florian et al. 1995a,b). In addition, high-resolution magic angle spinning (HRMAS) 1H NMR spectroscopy, which produces high-resolution spectra from intact tissue, has been widely used to follow tumour metabolism (Cheng et al. 1996; Tomlins et al. 1998; Millis et al. 1999; Chen et al. 2001; Tate et al. 2003).
One of the first applications of metabolomics both in vivo and in vitro involved the investigation of hypoxia inducible factor (HIF)-1β in tumour metabolism and growth (Griffiths et al. 2002; Griffiths & Stubbs 2003). The expression of HIF-1β is increased in a number of cancers, resulting in the upregulation of proteins in a number of pathways, including glucose transporters, glycolytic enzymes and vascular endothelial growth factor, and thus increasing the flux through glycolysis in these tumours. Hepatoma cells deficient in HIF-1β grown as solid tumours in mice had reduced growth rates, but magnetic resonance imaging indicated no decrease in vascularity in HIF-1β-deficient tumours, despite marked metabolic dysfunction in terms of ATP content as measured by 31P NMR spectroscopy.
In vitro NMR spectroscopy indicated a significant decrease in phosphocholine, choline, betaine and glycine, and suggested the cause of the reduced ATP content of the mutant cells. Glycine is formed from the glycolytic intermediate 3-phosphoglycerate via serine, and is an important source of one-carbon units for the synthesis of nucleotides. Thus, if HIF-1β-deficient tumours had reduced glycolytic flux, and hence reduced glycine content, nucleotide synthesis would be impaired, producing a decrease in ATP. As glycine can also be produced from choline via betaine, this also accounts for the decreases in phosphocholine, choline and betaine detected by NMR-based metabolomics.
A combination of in vivo, in vitro and HRMAS 1H NMR spectroscopy has also been used in conjunction with PCA to examine the polyunsaturated fatty acids (PUFAs) that accumulate in glioma during gene therapy-induced programmed cell death (PCD) (Griffin et al. 2003; Lehtimaki et al. 2003; Valonen et al. in press). Metabolomics demonstrated that the concentration of 1H NMR-detectable PUFAs increased threefold, with pattern recognition identifying the CH=CH and CH=CHCH2CH=CH resonances as the most significant in monitoring the dynamics of PCD. Furthermore, biophysical data from NMR measurements of relaxation and diffusion rates indicated that the lipids that increased during apoptosis were most likely located in cytoplasmic lipid vesicles. The most likely source of these 1H NMR-detectable lipids is the breakdown of cell constituents in dying cells, forming cytoplasmic lipid vesicles.
One of the problems in interpreting NMR spectra obtained in vivo and by HRMAS 1H NMR is co-resonance, where a number of chemically related metabolites have overlapping resonances. This suggests that these studies may benefit from further examination of tissue extracts using GC–MS and LC–MS, or LC–NMR, which can separate many chemically similar metabolites prior to analysis. However, Griffin et al. (2003) have shown that HRMAS 1H NMR spectroscopy detects the metabolites that are metabolically active within a tissue, as opposed to components of the cell membrane and other compartments isolated from normal metabolism, whereas techniques relying on tissue extracts tend to sample the entire cell or tissue.
3. How many metabolites do we need to measure?
There have been various estimates of the total number of metabolites found within a cell. Of course, the metabolite complement will depend on the exact cell type, with cells from multicellular organisms expected to contain fewer metabolites than unicellular organisms because of differentiation and specialization. A simple consideration of the metabolites that appear in the metabolic pathways of mammalian cells puts the metabolite complement of a mammalian cell at ∼650, although this number does not reflect the diversity of possible lipid modifications. For example, if there are assumed to be about 40 commonly occurring fatty acids within the cell, then triglycerides, each containing three fatty acids esterified to a glycerol backbone, could represent a group of 40 × 40 × 40, or 64 000, metabolites. Applying a similar analysis to fatty acids, fatty acyl-CoAs, monoacylglycerides, diacylglycerides and triglycerides produces a total of 69 000 metabolites (Berger 2004). This simple calculation does not take into account the preference of certain fatty acids for particular positions on the glycerol backbone, but it does demonstrate the potential diversity of metabolites. Furthermore, metabolite concentrations are likely to span a dynamic range of the order of 10⁹, with a wide range of polarities. However, do we need to measure all of these metabolites in order to understand metabolism?
High-resolution 1H NMR spectroscopy is a relatively insensitive technique that measures only the high-concentration metabolites found within a tissue extract, biofluid or intact tissue. Typically, using a simple one-dimensional pulse sequence, 30–100 metabolites are observable in urine, 20–30 in blood plasma or serum and 10–30 in tissue extracts. Despite this insensitivity, 1H NMR-based metabolomics (often termed metabonomics) has been successfully used to distinguish a range of liver and kidney toxins in the rat (Beckwith-Hall et al. 1998; Holmes et al. 1998; Mortishire-Smith et al. 2004), a number of cardiac diseases in mouse models (Griffin et al. 2002; Jones et al. 2005) and silent phenotypes in yeast (Raamsdonk et al. 2001). This apparent paradox can be understood by considering the metabolites that 1H NMR spectroscopy detects. Many of these high-concentration metabolites participate in several metabolic pathways and, in terms of the metabolic network of the cell, represent central hubs of metabolism (Brindle 2003; Griffin et al. 2003). A perturbation at one point in the metabolic network can therefore be rapidly transmitted to other metabolic pathways through these highly connected metabolites. We would thus expect the high-concentration metabolites to vary considerably during a perturbation, making 1H NMR spectroscopy sensitive to a wide range of genetic modifications, toxic insults and physiological stimuli.
However, there is a downside to detecting high-concentration metabolites. If these metabolites vary across a wide range of modifications, it becomes difficult to use them as unique biomarkers for disease processes or to deduce which pathways have been stimulated during a modification. This can be further confounded by the detection of non-specific effects that accompany a genetic modification or toxicological insult. For example, Connor et al. (2004) observed that many so-called biomarkers of liver and kidney toxicity, including hippurate, taurine, citrate and 2-oxoglutarate, in fact arise from the reduced food intake of animals that are unwell following a toxic insult. This illustrates the importance of a pair-fed control group in investigative toxicology studies if the metabolic markers identified are to be used reliably to deduce which metabolic pathways are perturbed. It also highlights the sensitivity of metabolomic studies to changes associated with general stress, which should be considered when carrying out toxicology tests.
An alternative to measuring metabolic fingerprints of high-concentration metabolites is to use approaches capable of detecting unique biomarkers. Because of the concentrating power of the kidneys, it is possible to detect such biomarkers in 1H NMR spectra of urine, and biomarkers have been postulated for testicular toxicity (Nicholson et al. 1989), phospholipidosis (Nicholls et al. in press) and peroxisome proliferation (Ringeissen et al. 2004). However, for subtle genetic modifications or mild drug interventions, discrete biomarkers are generally not detectable among the high-concentration, low-molecular-weight metabolites observed by conventional 1H NMR spectroscopy. An alternative is to use more sensitive techniques, such as GC–MS or LC–MS, to detect lower-concentration metabolites. These metabolites should be far more discriminatory than metabolic fingerprints because, on theoretical grounds, the majority of metabolites within the cell are involved in only a single pathway. However, despite the potential of mass spectrometry-based approaches to identify unique biomarkers of disease and toxic insult, none has yet been reported in the literature.
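The lipid arithmetic can be checked directly. The text does not say how the 69 000 total divides among the lipid classes; the sketch below assumes (our assumption, not necessarily Berger's) that the three glycerol positions are distinguishable and that any fatty acid can occupy any position. A monoacylglyceride then carries its chain at one of three positions, a diacylglyceride at one of three position pairs, and a triglyceride has 40³ ordered combinations. Under that assumption the classes sum to exactly 69 000.

```python
def lipid_species_count(n_fatty_acids=40):
    """Count acyl-lipid species, treating each glycerol sn-position as distinguishable.

    Assumption (illustrative): no positional preference, so every position can carry
    any of the n fatty acids.
    """
    n = n_fatty_acids
    counts = {
        "fatty acids": n,
        "fatty acyl-CoAs": n,
        "monoacylglycerides": 3 * n,       # one chain at sn-1, sn-2 or sn-3
        "diacylglycerides": 3 * n ** 2,    # chains at sn-1/2, sn-1/3 or sn-2/3
        "triglycerides": n ** 3,           # chains at all three positions
    }
    return counts, sum(counts.values())


counts, total = lipid_species_count()
for lipid_class, count in counts.items():
    print(f"{lipid_class}: {count}")
print("total:", total)                     # total: 69000
```

Triglycerides dominate the count (64 000 of 69 000), which is why the cubic term drives the estimate of lipid diversity.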
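The hub argument can be illustrated on a small graph. The sketch below uses a deliberately simplified, hand-picked fragment of central carbon metabolism (several steps are lumped together, and the edge list is illustrative rather than a curated network). It computes each metabolite's degree and, by breadth-first search, counts how many metabolites lie within two reaction steps of a hub compared with a more peripheral metabolite.

```python
from collections import deque

# Illustrative fragment of central metabolism (not a complete network); each pair
# links two metabolites interconverted by one reaction or by a lumped sequence.
REACTIONS = [
    ("glucose", "pyruvate"),           # glycolysis, lumped into one step
    ("pyruvate", "lactate"),
    ("pyruvate", "alanine"),
    ("pyruvate", "acetyl-CoA"),
    ("pyruvate", "oxaloacetate"),
    ("acetyl-CoA", "citrate"),
    ("oxaloacetate", "citrate"),
    ("citrate", "2-oxoglutarate"),     # lumped TCA cycle steps
    ("2-oxoglutarate", "glutamate"),
    ("acetyl-CoA", "fatty acids"),     # lumped fatty acid synthesis
]


def build_graph(edges):
    """Undirected adjacency sets from a list of metabolite pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reach(graph, start, max_steps):
    """Metabolites reachable from `start` within `max_steps` reactions (breadth-first search)."""
    seen, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_steps:
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return set(seen) - {start}


graph = build_graph(REACTIONS)
degree = {m: len(nbrs) for m, nbrs in graph.items()}
hub = max(degree, key=degree.get)
print(hub, degree[hub])                                              # pyruvate 5
print(len(reach(graph, "pyruvate", 2)), len(reach(graph, "lactate", 2)))  # 7 5
```

In this toy network a disturbance at pyruvate touches 7 of the other 9 metabolites within two steps, whereas one at lactate touches 5, only because it is routed through pyruvate.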
4. ‘Poly-omics’: cross correlation of metabolomics and other ‘-omic’ technologies
(a) Genomics and metabolomics
One of the first uses of metabolomics was as a rapid phenotyping tool for interesting mutants produced in plants, animals and microbes (Fiehn et al. 2000; Gavaghan et al. 2000; Raamsdonk et al. 2001; Jones et al. 2005). This has produced some surprises in terms of our understanding of functional genomics. For example, when Fiehn et al. (2000) examined mutants of Arabidopsis using GC–MS they found that the dominant effects in the samples they examined were related to ‘ecotypes’ (the genetic background associated with a particular ecological niche) rather than the mutants per se.
Similarly, in mice, Gavaghan et al. (2000) found that healthy mice of the C57BL10J and Alpk:ApfCD strains could readily be separated by the metabolic profiles obtained from 1H NMR spectroscopic analysis of urine (figure 4). This separation largely resulted from TCA cycle intermediates and methylamine pathway products, demonstrating that some of the key metabolic intermediates were highly variable across strains. Furthermore, they postulated that one of the strains may be predisposed to certain metabolic disorders as a result of the metabolic phenotype, or ‘metabotype’, associated with that strain. The importance of strain background had previously been noted in a number of physiological studies. Bendall et al. (2002), investigating vascular responses to nitric oxide (NO) in the murine heart, found distinct strain dependences: comparing MF1, C57BL/6J and 129SV mice, hearts from the three strains differed in their responses to endothelium-derived vasodilators and exogenous NO, in eNOS expression and in superoxide production. The authors concluded that, because the generation of reactive oxygen species is a feature of many pathologies, including cardiac hypertrophy, hypertension and heart failure, strain background might have a profound effect on the development of disease in a given murine model. Indeed, Linder (2001) has suggested that all the available control strains should be used as wild-type controls for a transgenic or knockout mouse model. Furthermore, the background of many disease models may be dynamic. Many transgenic and knockout mice are generated on a mixed background, and over subsequent generations any number of alleles, all potentially interacting with the gene under investigation, could become fixed in the final genome (Hintze & Shesely 2002). Thus, rapid, high-throughput processes for phenotyping mouse models are set to become increasingly important in addressing such issues.
Many of the approaches used to generate a metabolic profile are relatively cheap on a per-sample basis. For example, excluding the significant investment required to purchase the analytical equipment, the cost of preparing a sample for NMR spectroscopy is less than £1, and for metabolic profiling by GC–MS and LC–MS this cost is probably only three to four times larger. This allows the analysis of a wide range of tissues, organs or biofluids during a phenotyping study of a given model of disease. This was illustrated by Griffin et al. (2004a,b) when investigating a mouse model of Machado–Joseph disease (spinocerebellar ataxia): as well as metabolic deficits in the cerebellum, metabolic abnormalities were identified in the cortex, a region of the brain not previously thought to be affected by the disease. Furthermore, the low sample costs allow the profiling of large populations, as illustrated by Lenz et al. (2004), who investigated dietary differences between healthy Swedish and British individuals in a total population of 180.
The relative ease with which metabolic biomarkers can be examined across species has allowed the simultaneous comparison of several species exposed to a common physiological or environmental stimulus or lesion. For example, Bundy et al. (2005) investigated the impact of metal contamination on a range of earthworm species, including Lumbricus rubellus, Lumbricus terrestris and Eisenia andrei. Metabolic profiles obtained by 1H NMR spectroscopy and analysed by PCA demonstrated that L. rubellus responded to a greater extent than E. andrei, showing how metabolic phenotypes can be used to identify which species are most affected by a given ecotoxicological lesion. Similarly, Bundy et al. (2004) used metabolomics to separate different Bacillus cereus strains according to ecotype. Unlike the genomic data, metabolomic analysis using 1H NMR spectroscopy successfully separated the strains into laboratory and clinical strains. This approach could be widely applicable in general microbiology for identifying different functional and physiological ecotypes of bacteria.
(b) Transcriptomics and metabolomics
(i) Fatty liver disease
Supplementation of orotic acid to normal food intake is known to induce fatty liver disease in a number of mammalian species including the rat. While this accumulation is similar to that induced by alcohol, and it is known that orotic acid supplementation disrupts the production of various Apo proteins, the exact mechanism from orotic acid exposure to disrupted lipid transport is unknown. To address this Griffin et al. (2004b) applied a combined genomic, transcriptomic and metabolomic approach to the problem.
While classically the out bred Wistar rat strain has been used to investigate orotic acid induced fatty liver disease, the in bred Kyoto strain was found to be more predisposed both at the transcriptional and metabolic level following a combined analysis of the liver tissue. After exposure to orotic acid, as well as the reduction in circulating LDL and VLDL lipids following reduced production of ApoB and ApoC previously reported in the literature, NMR-based metabolomics also detected increases in the concentration of β-hydroxybutyrate, as occurs during type II diabetes, and a decrease in phosphatidylcholine. Supplementation of the diet with adenosine completely reversed this phenotype.
To compare the solution- and solid-state NMR spectra of the liver with the transcriptional profiles measured using DNA microarrays, the pattern recognition method projection to latent structures by means of partial least squares (PLS) was used to cross-correlate the metabolite changes (X matrix) with the transcriptional dataset (Y matrix) (figure 5). This combined analysis identified metabolic changes involving uridine metabolism and choline turnover, and transcriptional changes such as the expression of stress proteins. As well as identifying candidate targets for drug intervention, such as stearoyl-CoA desaturase, the approach was able, in part, to model Kyoto rats as genomic perturbations of the Wistar rat. Furthermore, the definition of a ‘metabotype’ by NMR spectroscopy allowed the transcriptional data to be analysed in terms of the dynamic responses induced by the drug intervention.
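The core of the PLS cross-correlation can be illustrated with a sketch of the first latent variable only. It assumes the metabolite matrix X and the transcript matrix Y have rows for the same samples in the same order and uses the fact that the first PLS weight vectors are the dominant singular vectors of the cross-covariance X^T Y. A full PLS model, as used in the study, would extract further components by deflation; the function names and simulated data here are hypothetical.

```python
import numpy as np

def autoscale(M):
    """Mean-centre each column and scale it to unit variance."""
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)

def first_pls_component(X, Y):
    """First PLS latent variable linking X (metabolites) to Y (transcripts).

    The first X- and Y-weight vectors of PLS maximise the covariance between
    the X and Y scores, which makes them the dominant left and right singular
    vectors of the cross-covariance matrix X^T Y.
    """
    Xs, Ys = autoscale(X), autoscale(Y)
    cross_cov = Xs.T @ Ys / (Xs.shape[0] - 1)
    U, S, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    w, c = U[:, 0], Vt[0]   # metabolite weights, transcript weights
    t, u = Xs @ w, Ys @ c   # paired sample scores on the latent variable
    return w, c, t, u

# Hypothetical simulation: one hidden perturbation z drives metabolite 0
# and transcript 1; all other columns are unrelated noise.
rng = np.random.default_rng(1)
z = rng.normal(size=50)
X = rng.normal(size=(50, 4))   # 4 metabolites
Y = rng.normal(size=(50, 3))   # 3 transcripts
X[:, 0] = z + 0.1 * rng.normal(size=50)
Y[:, 1] = z + 0.1 * rng.normal(size=50)
w, c, t, u = first_pls_component(X, Y)
print(np.round(w, 2), np.round(c, 2))
```

The largest absolute weights flag the metabolite and transcript that co-vary, which is the kind of metabolite–transcript pairing such a combined analysis is designed to expose. Autoscaling is assumed so that abundant metabolites and highly expressed genes do not dominate; it fails for constant columns, which would need to be removed first.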
(ii) Hepatotoxicity induced by acetaminophen
A combined transcriptional and metabolomic approach has been used to investigate acetaminophen-induced hepatotoxicity (Coen et al. 2004). In this study, the transcriptional profile of liver tissue was supplemented by 1H NMR metabolic profiles of liver tissue (both extracts and intact tissue), blood plasma and urine. These profiles identified decreases in glucose and glycogen and increases in lipids in liver tissue, alongside increases in glucose, pyruvate, alanine and lactate in blood plasma. These changes were indicative of an increased glycolytic flux, which was also confirmed at the transcriptional level, demonstrating that the two approaches provided complementary information in the drug safety assessment process.
(c) Transcriptomics, proteomics and metabolomics
A number of studies are under way to examine complex disease processes using the three major ‘omic’ approaches for profiling tissue. Such poly-omic approaches are particularly appropriate for diseases where conventional approaches have drawn a blank. One example is a combined transcriptomic, proteomic and metabolomic analysis of brain tissue from patients with schizophrenia, patients with bipolar disorder and controls (Prabakaran et al. 2004; figure 6). This study made use of a brain bank managed by the Stanley Foundation, which ensured access to control tissue from age-matched individuals. Just as importantly, the control tissue had a similar post-mortem delay before sampling, ensuring that any biochemical changes detected were not artefacts of tissue storage. All three tiers of biological intermediates indicated a metabolic disorder associated with schizophrenia. Intriguingly, the changes detected were more apparent in white matter than in grey matter. However, one criticism of the interpretation of the results was that proteomics and metabolomics were often used only in a confirmatory role, to support changes detected transcriptionally. For studies to benefit from such global approaches to profiling mRNA, proteins and metabolites, researchers will need to treat the results from each platform equally, rather than reflect prior biases towards a particular technology.
This systems biology approach has also been applied to understanding atherosclerosis in an ApoE transgenic mouse (Clish et al. 2004). In this mouse model, human ApoE, a protein involved in the transport of fats in blood plasma, is overexpressed, impairing the clearance of lipids from the blood. While the animals are phenotypically normal at 9 weeks, atherosclerotic plaques are present by 25 weeks. Using LC–MS to profile lipids, Clish and colleagues demonstrated altered fatty acid metabolism in the presymptomatic mice at 9 weeks, with an increase in triglycerides and a decrease in lysophosphocholines in the transgenic mice. To examine these perturbations further, they investigated alterations in the liver at the transcriptomic, proteomic and metabolomic levels. LC–MS identified a significant increase in certain lysophosphocholines, diacylglycerides, phosphocholine and triglycerides. From a total of 21 000 measured components, they also identified a range of transcripts, proteins and metabolites that were changed in the transgenic mouse. Finally, these changes were used to construct a correlation network highlighting which metabolic pathways were most perturbed.
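A correlation network of this kind can be sketched as follows, assuming each measured component (transcript, protein or metabolite) has been quantified across the same ordered set of samples. Edges join pairs whose Pearson correlation exceeds a chosen magnitude; the threshold of 0.8 and the component names in the test are hypothetical, and the network construction in Clish et al. may differ.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length measurement vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sa * sb)

def correlation_network(profiles, threshold=0.8):
    """Return edges (name_a, name_b, r) for pairs with |r| >= threshold.

    profiles maps a component name to its measurements across samples.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges
```

Both positive and negative correlations are kept as edges. Densely connected groups of metabolites and transcripts then point to the pathways that are most perturbed.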
5. Metabolomic databases
To a degree, databases already exist that contain metabolic profiles or fingerprints in the form of large-scale clinical chemistry assays or in vivo magnetic resonance spectra. This approach has been particularly useful for diagnosing and distinguishing different types of cancer. Pattern recognition has been used to simplify large sets of human MRS spectra for over a decade (Howells et al. 1992; Tate et al. 1996b; Usenius et al. 1996; Hagberg 1998; Preul et al. 1998; Gerstle et al. 2000). Furthermore, these approaches have also been used to identify metabolic fingerprints for given tumour types (Gray et al. 1998; Gribbestad et al. 1999). These intelligent systems are now being used in cross-centre studies, such as those centred on the Interpret NMR database, developed through a collaboration between a number of European hospitals (Underwood et al. 2001; Howe et al. 2003). An automated pattern recognition approach has been implemented to help radiologists categorize magnetic resonance spectroscopy data from brain tumours according to histological type and grade. It can successfully distinguish meningiomas, low-grade astrocytomas and ‘aggressive tumours’ (glioblastomas and metastases). This pattern recognition process was even successful at predicting tumour types using data acquired at another site on a different magnetic resonance instrument, indicating that the pattern recognition algorithms were less sensitive to acquisition parameters than had been expected.
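The idea of assigning a new spectrum to a tumour class can be illustrated with a deliberately simple classifier. This is a minimal sketch using a nearest-centroid rule, a swapped-in technique rather than the classifier used in the Interpret system. It assumes spectra have already been binned and normalised onto a common set of bins; the three-bin spectra in the test are hypothetical.

```python
import math

def centroid(rows):
    """Mean spectrum of a list of equal-length binned spectra."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(labelled_spectra):
    """labelled_spectra: dict mapping tumour class -> list of binned spectra."""
    return {label: centroid(rows) for label, rows in labelled_spectra.items()}

def classify(model, spectrum):
    """Assign the class whose mean spectrum is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], spectrum))
```

Because each class is summarised by its mean spectrum, such a model can be trained at one centre and applied to data from another, provided that the binning and normalisation are harmonised.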
The development of central databases for high-resolution, multivariate metabolomic data that draw on information from several different sites will require a markup language and an ontology that completely describe a metabolomic experiment. Comparable reporting standards have already been described for transcriptomics in the form of MIAME and for proteomics in the form of MIAPE (Brazma et al. 2001; Taylor et al. 2003). Jenkins and colleagues have described a data model for plant metabolomics referred to as ArMet (architecture for metabolomics) (Jenkins et al. 2004; www.armet.org). This model in part reflects the authors' emphasis on GC–MS and plant metabolomics. In contrast, a standard metabonomic reporting structure (SMRS; www.smrs.org) has been proposed by researchers largely involved in toxicology and NMR-based metabolomics. There is, of course, much common ground between these two approaches, and it is hoped that a common formal description can be produced to allow the rapid comparison and exchange of metabolomic data in the future.
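To make the idea of a formal experiment description concrete, the sketch below models a minimal metabolomic experiment record and checks it for missing descriptors. Every class and field name is hypothetical and chosen for illustration; this is neither the ArMet nor the SMRS schema.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SampleInfo:
    sample_id: str
    organism: str
    tissue_or_biofluid: str
    treatment: str

@dataclass
class AcquisitionInfo:
    technique: str                                   # e.g. "1H NMR", "GC-MS", "LC-MS"
    instrument: str
    parameters: dict = field(default_factory=dict)   # hypothetical free-form settings

@dataclass
class MetabolomicExperiment:
    experiment_id: str
    samples: list                                    # list of SampleInfo
    acquisition: AcquisitionInfo
    processing_steps: list = field(default_factory=list)

    def missing_fields(self):
        """List required descriptors that are empty (a minimal reporting check)."""
        missing = []
        if not self.samples:
            missing.append("samples")
        if not self.acquisition.instrument:
            missing.append("acquisition.instrument")
        if not self.processing_steps:
            missing.append("processing_steps")
        return missing

    def to_json(self):
        """Serialise the whole record, including nested samples, for exchange."""
        return json.dumps(asdict(self), indent=2)
```

A shared schema of this sort, whatever its final form, is what would let a database reject incompletely described submissions and let other groups interpret the data.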
6. Preparing for the ball: looking to the future
(a) New technology
If metabolomics is to realize its primary objective of providing complete coverage of the metabolism of a cell, tissue, organ or organism, one of the key areas for improvement is metabolite coverage. With cellular metabolites spread across a concentration range of 10⁹ and a log polarity range of ∼−6 to 14, this is a huge challenge. To address it, a wide range of technological approaches is currently being examined. In addition to NMR- and mass spectrometry-based approaches, researchers are using thin layer chromatography, coulometric arrays and metabolite arrays. For example, Kristal (2004) is using liquid chromatography in conjunction with coulometric arrays to identify discrete serotypes in mice undergoing dietary restriction compared with mice fed a normal diet.
The popularity of LC–MS and GC–MS as metabolic profiling tools is set to increase as the chromatography becomes more reliable and the software for matching mass fragmentation patterns improves. Indeed, a number of manufacturers are developing software for confident retention time matching in LC–MS data, allowing data to be imported directly into pattern recognition software. As well as GC–MS and LC–MS, researchers are examining the advantages of other types of mass spectrometry, including Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS; Brown et al. 2005), matrix-assisted laser desorption ionisation (MALDI)–MS and capillary electrophoresis–mass spectrometry (Sato et al. 2004).
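The matching step can be sketched as a tolerance search against a reference list. This is a minimal illustration, assuming each detected feature is a (retention time in minutes, m/z) pair. The tolerances and the library entries in the test are hypothetical values, not taken from any vendor's software.

```python
def match_features(features, library, rt_tol=0.1, ppm_tol=10.0):
    """Annotate (rt_min, mz) features using a reference list of (name, rt_min, mz).

    A feature matches a library entry when its retention time lies within
    rt_tol minutes and its m/z lies within ppm_tol parts per million.
    Returns one list of candidate names per feature (empty if unmatched).
    """
    annotations = []
    for rt, mz in features:
        hits = [name for name, ref_rt, ref_mz in library
                if abs(rt - ref_rt) <= rt_tol
                and abs(mz - ref_mz) / ref_mz * 1e6 <= ppm_tol]
        annotations.append(hits)
    return annotations
```

Requiring agreement in both dimensions is what makes retention time reproducibility so important: a drifting chromatography run either loses true matches or, if the tolerance is widened, admits false ones.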
Technologies are also being developed to make NMR spectroscopy more sensitive, both in terms of the sample volume that can be examined and the number of metabolites that can be identified. Miniaturization has allowed the analysis of very small samples, including mouse CSF (∼1–3 μl of fluid; Griffin et al. 2003). Similarly, Khandelwal et al. (2004) investigated tetrodotoxin toxicity in the rat frontal cortex using microdialysis and microprobe NMR analysis. Tetrodotoxin-induced decreases in glutamate, isoleucine, valine, alanine and α- and β-hydroxybutyrate suggested that these metabolites are normally released by neurons. In addition, a number of metabolic studies have been carried out on tissue obtained by laser capture microdissection, further demonstrating that NMR spectroscopy can examine very small samples.
To date, NMR-based techniques have focused on 1H NMR spectroscopy, but this approach suffers from a small chemical shift range, producing significant overlap between the resonances of different metabolites. While 13C NMR spectroscopy has a much larger chemical shift range, allowing a wider range of metabolites to be resolved, it is intrinsically less sensitive than 1H NMR spectroscopy, because the 13C nucleus has a lower gyromagnetic ratio than 1H and a natural abundance of only about 1.1%. However, cryoprobes, NMR probes in which the radiofrequency coil is cooled with helium to cryogenic temperatures, reduce thermal noise and so give a significant improvement in sensitivity, allowing the rapid acquisition of 13C NMR spectra. Keun et al. (2002) have already applied this approach to studying hydrazine toxicity by 13C NMR spectroscopy of biofluids. Although the biomarkers of hydrazine toxicity were already largely known in this study, 13C spectroscopy allowed these metabolites to be identified largely from one-dimensional spectra, without the need to assign the key resonances using a series of two-dimensional experiments. Furthermore, this approach may be particularly good for quantifying metabolites that produce only singlets in 1H NMR spectra: while 1H NMR may not conclusively identify such a metabolite, the extra chemical shift range of 13C NMR is usually enough to allow unambiguous assignment of a given singlet.
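The size of the sensitivity penalty can be estimated with a rough worked calculation (an illustration, not taken from the studies cited). At a fixed field, receptivity is commonly approximated as proportional to the cube of the gyromagnetic ratio multiplied by the natural abundance. Using standard tabulated gyromagnetic ratios (in rad s⁻¹ T⁻¹) and natural abundances:

```latex
\frac{R_{^{13}\mathrm{C}}}{R_{^{1}\mathrm{H}}}
\approx \left(\frac{\gamma_{^{13}\mathrm{C}}}{\gamma_{^{1}\mathrm{H}}}\right)^{3}
        \frac{N_{^{13}\mathrm{C}}}{N_{^{1}\mathrm{H}}}
= \left(\frac{6.728\times10^{7}}{26.752\times10^{7}}\right)^{3}
  \times\frac{0.0107}{0.99985}
\approx (0.2515)^{3}\times 0.0107
\approx 1.7\times10^{-4}
```

On this estimate, a natural-abundance 13C signal is roughly 6000 times weaker than the corresponding 1H signal. This is why the noise reduction provided by cryoprobes makes such a difference for 13C work.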
One study that shows the potential of 13C NMR spectroscopy was carried out by Boros et al. (2003), describing a technique for stable isotope-based dynamic metabolic profiling (SIDMAP). The approach uses [1,2-13C2]glucose to label a proportion of the metabolome in order to identify which metabolic pathways are most active. Proliferating tumours have a distinct metabolic phenotype, characterized by increased and preferential use of glucose through the non-oxidative branch of the pentose phosphate pathway for nucleic acid synthesis, and by reduced rates of de novo fatty acid synthesis and TCA cycle glucose oxidation. By using the steady-state labelling of pentose phosphate pathway intermediates as biomarkers of cell proliferation, it is hoped that such approaches could be extended to assessing treatments targeted at cellular proliferation, and in particular to understanding why some tumours are resistant to treatment by drugs such as Gleevec.
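The logic behind [1,2-13C2]glucose labelling can be illustrated with a simplified calculation. In glycolysis the labelled C1–C2 pair stays together, so the labelled half of the molecule gives lactate carrying two 13C atoms (m2). The oxidative branch of the pentose phosphate pathway removes C1 as CO2, so carbon recycled through it gives lactate carrying a single 13C atom (m1). The sketch computes m1/(m1 + m2) as a crude index of oxidative pentose phosphate activity. It is an illustrative assumption, not the SIDMAP calculation of Boros et al. It also ignores natural-abundance correction, and the intensities in the test are hypothetical.

```python
def isotopomer_fractions(intensities):
    """Convert raw ion intensities [m0, m1, m2, ...] into fractional abundances."""
    total = float(sum(intensities))
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return [x / total for x in intensities]

def oxidative_ppp_index(lactate_intensities):
    """Crude index m1 / (m1 + m2) for lactate derived from [1,2-13C2]glucose.

    m2 lactate is taken to come from direct glycolysis (C1-C2 pair kept
    together); m1 lactate from carbon recycled after the oxidative pentose
    phosphate pathway removes C1 as CO2. No natural-abundance correction.
    """
    m = isotopomer_fractions(lactate_intensities)
    m1, m2 = m[1], m[2]
    if m1 + m2 == 0:
        raise ValueError("no singly or doubly labelled lactate detected")
    return m1 / (m1 + m2)
```

Comparing such an index before and after treatment is the sort of readout that could indicate whether a drug has shifted glucose use between pathways.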
Another approach to improving the sensitivity of the NMR experiment is to hyphenate NMR spectroscopy with liquid chromatography. This improves sensitivity by two mechanisms. First, high- and low-concentration metabolites are separated by the chromatography, reducing the likelihood of co-resonant peaks and improving the dynamic range of the NMR experiment for the low-concentration metabolites. Second, metabolites are concentrated by the chromatography, further aiding the detection of low-concentration metabolites. Bailey et al. (2002, 2003) have used LC–NMR spectroscopy to metabolically profile a number of plants. This may be a particularly useful approach if further hyphenated with cryoprobe technology to give cryo-LC–NMR.
The development of in vivo MRS continues: with improved coil design and better localization pulse sequences, spectra can now be obtained with resonance line widths comparable to those of 1H MAS NMR spectroscopy, particularly for fatty tissue. Even for cerebral tissue, a large number of metabolites can now be detected and quantified in vivo; Pfeuffer et al. (1999) detected 18 metabolites in the rat brain. Thus, the ultimate future of the technique may be as an in vivo functional genomic tool.
Perhaps the biggest challenge will be the data integration needed to cross-correlate information obtained from all these different approaches. However, only by combining technologies will we be able to expand our coverage of an organism's metabolome. Thus, there is an urgent need for improved pattern recognition methods for integrating the information produced by a variety of analytical approaches. Van der Greef (2004) is currently applying such global analytical approaches, using a combination of NMR, GC–MS, LC–MS and FT–MS to profile the maximum number of metabolites in a tissue or biofluid. The techniques include both untargeted and class-specific analysis of the sample, including specific analysis of fatty acids, amino acids, steroids, peptides and trace elements. The strategy is to identify metabolic profiles associated with the disease before identifying unique biomarkers. Similarly, Gamache et al. (2004) have used a combination of LC–MS and HPLC with electrochemical array (EC-array) detection to examine xenobiotic toxicity in rats through urinary analysis. This approach allowed metabolites to be quantified at the femtomole level. They also suggest using EC reactor cells in series with MS to facilitate structural elucidation.
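One simple way to let platforms with very different numbers of variables contribute comparably to a single multivariate model is block scaling. Each platform's matrix is autoscaled and then divided by the square root of its number of variables, so that every block carries the same total variance. The sketch assumes that all blocks share the same samples in the same row order. Block scaling is a common convention in multiblock analysis, but the source does not say whether van der Greef and colleagues used it, and the random blocks in the test are hypothetical.

```python
import numpy as np

def block_scale(block):
    """Autoscale a data block, then down-weight it by sqrt(number of variables)."""
    B = np.asarray(block, dtype=float)
    std = B.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables unscaled to avoid division by zero
    Z = (B - B.mean(axis=0)) / std
    return Z / np.sqrt(B.shape[1])

def fuse_blocks(*blocks):
    """Concatenate platform-specific matrices (same samples, same row order)."""
    n_rows = {np.asarray(b).shape[0] for b in blocks}
    if len(n_rows) != 1:
        raise ValueError("all blocks must describe the same samples")
    return np.hstack([block_scale(b) for b in blocks])
```

Without this step, a 200-variable NMR block would outweigh a 10-variable MS block twentyfold in a subsequent PCA or PLS model simply because it has more columns.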
(b) New software for metabolomics
Software development for metabolomics is occurring on two fronts. Improved pattern recognition and multivariate statistical approaches are needed to generate biomarkers and metabolite profiles associated with a process. There is also a need for improved peak alignment software for identifying metabolites in chromatograms and in mass and NMR spectra. One such algorithm increased the number of variables obtained from 1H NMR spectra of urine from ∼240 fixed-width spectral regions to ∼1000 individual peaks (Torgrip et al. 2003). Similarly, Cloarec et al. (2005) have developed an approach that takes into account the variability in peak position associated with pH and the chelation of cations, improving classification in PCA models of NMR spectra of urine from drug toxicity studies.
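The basic idea of alignment can be sketched with the simplest possible case: a single integer shift of a whole spectrum that maximises its overlap (cross-correlation) with a reference. This is a swapped-in illustration, not the algorithm of Torgrip et al. or Cloarec et al. Real pH-induced misalignment in NMR spectra is local and peak-specific, which is why those methods work peak by peak. The toy spectra in the test are hypothetical.

```python
def best_shift(reference, spectrum, max_shift=5):
    """Integer shift s (in data points) maximising sum(reference[i] * spectrum[i - s])."""
    def overlap(shift):
        return sum(reference[i] * spectrum[i - shift]
                   for i in range(len(reference))
                   if 0 <= i - shift < len(spectrum))
    return max(range(-max_shift, max_shift + 1), key=overlap)

def align(reference, spectrum, max_shift=5):
    """Shift spectrum onto the reference, padding vacated points with zeros."""
    s = best_shift(reference, spectrum, max_shift)
    n = len(spectrum)
    aligned = [spectrum[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]
    return aligned, s
```

Once peaks are aligned, each can be treated as its own variable instead of being averaged into a fixed-width bin, which is how peak-based approaches recover many more variables from the same spectra.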
There is also a need to integrate metabolomic data with more classical approaches to metabolism. Metabolic pathways have been at the centre of our understanding of metabolism, and changes in discrete metabolite levels need to be related to global changes across metabolic pathways and networks, so that the causes of a perturbation can be separated from the collateral damage. Lange & Ghassemian (2005) have developed a visual interface that combines biochemical pathway information with a gene-function database and could be combined with transcriptional, proteomic and metabolomic data for Arabidopsis thaliana. Similarly, Romero et al. (2005) have used computational analysis of the human genome to identify possible unidentified enzymes in human metabolism. These ‘pathway holes’ were calculated to number 203, of which 25 were assigned to putative genes.
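One common way to relate a list of changed metabolites to pathways, though not one described in the source, is an over-representation test. For each pathway, a one-sided hypergeometric probability is computed for seeing at least the observed number of changed metabolites by chance. The sketch assumes a defined universe of measurable metabolites and hypothetical pathway memberships; it is not the method of Lange & Ghassemian or Romero et al.

```python
from math import comb

def enrichment_p_value(n_total, n_changed, pathway_size, changed_in_pathway):
    """One-sided hypergeometric P(X >= changed_in_pathway)."""
    upper = min(n_changed, pathway_size)
    denom = comb(n_total, n_changed)
    return sum(comb(pathway_size, k) * comb(n_total - pathway_size, n_changed - k)
               for k in range(changed_in_pathway, upper + 1)) / denom

def rank_pathways(pathways, changed, universe):
    """pathways: dict name -> set of metabolites; returns (name, hits, p) sorted by p."""
    universe = set(universe)
    changed = set(changed) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        hits = len(members & changed)
        p = enrichment_p_value(len(universe), len(changed), len(members), hits)
        results.append((name, hits, p))
    return sorted(results, key=lambda r: r[2])
```

A pathway in which many members change together ranks above one with a single altered metabolite. This is one way of separating a coordinated perturbation from scattered collateral changes, though the result depends heavily on how the universe and pathway memberships are defined.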
While metabolomic approaches have figured less prominently in functional genomics than transcriptomics and proteomics, some recent successes using global profiling tools for metabolism indicate that metabolomics will be increasingly used, both in its own right and as a means of cross-correlating information from genomics, transcriptomics and proteomics. Furthermore, metabolomics may define the time points at which metabolism is most perturbed, highlighting where other functional genomic tools should focus. Mathematical tools for data fusion across the ‘omic’ platforms will be needed to achieve this. Metabolomics is also set to be used increasingly for phenotyping organisms, especially following the proliferation of techniques for genetically modifying organisms, such as gene knockouts, gene knock-ins and RNAi. This increased interest will necessitate metabolomic databases if information is to be passed between research groups, in turn necessitating a MIAME-like scheme for metabolomics. Currently, there is no consensus on what information should be reported alongside a metabolomic study so that other researchers can interpret the data. Such metadata (i.e. data about the data) will be vital if researchers are to build databases akin to those being produced by the microarray and proteomic communities. If metabolomics is to attend the functional genomic ball, it will need to ‘dress’ in finery similar to that worn by the other omic approaches.
- Received April 4, 2005.
- Accepted August 12, 2005.
- © 2005 The Royal Society