Phosphorylation sites are formed by protein kinases (‘writers’), frequently exert their effects following recognition by phospho-binding proteins (‘readers’) and are removed by protein phosphatases (‘erasers’). This writer–reader–eraser toolkit allows phosphorylation events to control a broad range of regulatory processes, and has been pivotal in the evolution of new functions required for the development of multi-cellular animals. The proteins that comprise this system of protein kinases, phospho-binding targets and phosphatases are typically modular in organization, in the sense that they are composed of multiple globular domains and smaller peptide motifs with binding or catalytic properties. The linkage of these binding and catalytic modules in new ways through genetic recombination, and the selection of particular domain combinations, has promoted the evolution of novel, biologically useful processes. Conversely, the joining of domains in aberrant combinations can subvert cell signalling and be causative in diseases such as cancer. Major inventions such as phosphotyrosine (pTyr)-mediated signalling that flourished in the first multi-cellular animals and their immediate predecessors resulted from stepwise evolutionary progression. This involved changes in the binding properties of interaction domains such as SH2 and their linkage to new domain types, and alterations in the catalytic specificities of kinases and phosphatases. This review will focus on the modular aspects of signalling networks and the mechanism by which they may have evolved.
1. Introduction: a writer–reader–eraser toolkit for phosphorylation
A driving force in the evolution of single-celled organisms to metazoan species has been the adaptation of reversible protein phosphorylation to allow for increasingly complex modes of intercellular communication [1,2]. Unlike many other forms of post-translational modification (PTM), such as methylation, acetylation and ubiquitylation, phosphorylation on a single residue is unitary, as only one phosphate can be added to an acceptor amino acid. Protein phosphorylation in eukaryotes occurs primarily on the hydroxyamino acids serine, threonine and tyrosine, and also infrequently on histidines and cysteines, as well as on lysines and arginines. Most bacteria have phosphotransferase systems that phosphorylate histidine, aspartate, glutamate and cysteine residues, and also encode proteins with a protein-kinase-like fold that probably gave rise to the eukaryotic protein kinases (ePKs). There is also a distinct class of bacterial tyrosine kinases (TKs), but these are not directly related to the eukaryotic TKs that only appeared later in the tree of life. Here, we will focus on aspects of serine/threonine and tyrosine phosphorylation, and the recognition of phosphosites, in eukaryotic evolution.
Protein phosphorylation and the resulting cellular response commonly require a three-part toolkit in which the kinases that phosphorylate substrate proteins can be viewed as ‘writers’, phosphatases that dephosphorylate phosphoproteins act as ‘erasers’ and modular protein domains that recognize phosphorylated motifs to deliver a downstream signal function as ‘readers’ (figure 1a) . In eukaryotes, the writers are principally composed of serine/threonine kinases (STKs), TKs, and dual-specificity kinases (DSKs), which are similar to STKs but can phosphorylate tyrosine as well as serine/threonine. The catalytic domains of these kinases are related in primary sequence and share a common structural fold [10,11].
Phosphoserine and phosphothreonine (pSer/pThr) sites are primarily dephosphorylated by members of the phosphoprotein phosphatase (PPP) family, which includes PP1, PP2A, PP2B and PP4–7 in human cells, and the metallo-dependent protein phosphatase (PPM) family (as represented by PP2C) . The PPP and PPM families are unrelated in sequence and probably evolved from two unique ancestral genes (figure 1b), but remarkably have converged to possess highly related structures at their catalytic centres . The principal protein-tyrosine phosphatase (PTP) family is different yet again; although its members share a common class of catalytic domain , they are very diverse in their substrate preferences, with some family enzymes being known to dephosphorylate non-protein targets, including carbohydrate, mRNA and phosphoinositides [15,16].
One means by which phosphorylation can modify a protein's function is to induce a conformational change (allostery), which in the case of enzymes can lead to altered catalytic activity. Protein kinases themselves provide intensely studied examples of allosteric regulation, in which phosphorylation within the activation loop of the catalytic domain or in regions flanking the kinase domain causes structural rearrangements that enhance or suppress enzymatic activity [17,18]. A second device through which phosphorylation alters protein activity involves the inducible binding of modular interaction domains to phosphorylated residues and their flanking amino acids, resulting in a phospho-dependent protein–protein interaction. This was originally identified in the context of the Src homology 2 (SH2) domain and its binding to phosphotyrosine (pTyr) sites, such as those found on activated growth factor receptor tyrosine kinases (RTKs) [19,20]. This set the stage for the characterization of a remarkable number of domains that recognize not only phosphorylated sites, but many other forms of PTM. These include other classes of pTyr recognition modules, such as pTyr-binding (PTB) domains [22,23], the C2 domain of PKCδ and the HYB domain from the E3 protein-ubiquitin ligase Hakai. In addition, there is a growing list of at least 14 unrelated domain types that selectively bind pSer/pThr sites (figure 1c). The presence of such a large number of structurally distinct classes of pSer/pThr-binding modules indicates that pSer/pThr recognition evolved independently on numerous occasions, which underscores its importance in cellular function. The relative abundance of pSer/pThr-binding domains when compared with pTyr-recognition modules reflects the fact that the phosphorylation of serine/threonine sites is more ancient and more prevalent than tyrosine phosphorylation. A particular phosphopeptide-binding domain is often found within a sizeable group of proteins.
For example, there are 121 SH2 domains in 111 human proteins that regulate such varied activities as protein phosphorylation, lipid metabolism, Ras GTPase superfamily activity, cytoskeletal organization, protein ubiquitylation and transcription . SH2 domains therefore provide a common mechanism for proteins with entirely different biochemical properties to couple to upstream pTyr signals.
The transmission of information in signalling pathways is further enhanced by many different classes of interaction domains that recognize unmodified peptide motifs (the paradigmatic example being the binding of SH3 domains to proline-rich motifs), or that bind non-protein ligands such as specific phospholipids [28,29]. Phosphoinositide-binding was originally discovered for Pleckstrin homology (PH) domains ; these have the same structural fold as PTB domains, although they are not obviously related by sequence , suggesting that this modular structure is particularly fitted for the recognition of both protein and non-protein ligands. The formation of signalling networks is frequently aided by proteins (termed ‘adaptors’ or ‘scaffolds’) that are composed exclusively of interaction domains and binding motifs, and thereby assemble multi-protein complexes through which signals are coordinated and transmitted .
There are many excellent reviews that address the evolution of kinases, phosphatases and individual phospho-binding domains [12,33–35]. Here, we expand on the evolutionary forces that shaped phosphorylation-based systems as a whole.
2. A biological rationale for the evolution and expansion of phospho-binding domains
As noted, phosphorylation alters protein function in one of two principal ways—by allostery and by binding to interaction domains. In allosteric regulation, phosphorylation modifies the conformation of the relevant protein at distant sites, and allows it to toggle between inactive and active states, whereas formation of a docking site for a phospho-binding domain creates a protein–protein interaction. Allostery is an elegant mode of biological regulation, and one might therefore wonder at the molecular rationale for the rise and prominence of phospho-binding domains in cell signalling. This can best be understood from an evolutionary point of view. Direct allosteric regulation is a sophisticated molecular device, but for this very reason cannot be extrapolated from one type of enzyme to another. Thus, relying on allostery for all aspects of phospho-dependent protein regulation may have been incompatible with the evolution of signalling pathways required for multi-cellular life within a reasonable time frame, because the precise mode of allosteric regulation would have to evolve independently for different classes of proteins. In contrast, once a phospho-binding module such as an SH2 domain had evolved, it could be duplicated and incorporated into new families of proteins, immediately endowing them with the ability to recognize phosphorylated motifs, such as those found on activated, autophosphorylated RTKs. This simple manoeuvre would be effective for proteins with entirely unrelated structures and functions. Although in the initial instance the resulting signal might not be especially strong or specific, it could be of sufficient utility to be retained, and then improved by further rounds of mutation and selection.
3. Cytoplasmic tyrosine kinases as a paradigm for combined phosphoregulation by protein interactions and allostery
Allostery and regulated protein–protein interactions are by no means mutually exclusive ways through which phosphorylation regulates protein function. In the c-Src cytoplasmic TK, for example, the kinase domain is preceded by SH2 and SH3 domains. Phosphorylation of c-Src at Tyr527, located within a short C-terminal tail, by the Csk TK creates an internal binding site for c-Src's own SH2 domain. The resulting intramolecular pTyr–SH2 interaction causes a conformational change in the kinase domain that inhibits c-Src enzymatic activity [36,37] (figure 2c). It is likely that during the evolution of this system, the c-Src SH2 and kinase domains were initially joined like beads-on-a-string to target the kinase to its substrates, and promote processive substrate phosphorylation. The inhibitory allosteric regulation imparted by the pTyr527–SH2 intramolecular interaction then evolved subsequently to provide tighter control of c-Src's potentially oncogenic activity. This timeline is supported by the observations that the single-cell choanoflagellate Monosiga brevicollis has an ancestral c-Src TK (MbSrc) with a Tyr527 site, and also has a Csk kinase, but that phosphorylation of Tyr527 in MbSrc by MbCsk does not significantly affect MbSrc TK activity. This indicates that at this early stage of pTyr signalling the Src SH2 and kinase domains were not functionally coupled for allosteric regulation .
A similar type of evolutionary process may have given rise to the mammalian Fes cytoplasmic TK, but in this case the intramolecular interaction between the SH2 and kinase domains has an exclusively positive effect on kinase activity. In the Fes active state, the SH2 domain interacts with the tip of the kinase N-terminal lobe through both electrostatic and packing interactions, involving SH2 surfaces that are distant from the conserved pTyr-binding pocket . This interface stabilizes the conformation of the kinase αC-helix in a fashion that is critical for catalytic activity. The stimulatory SH2–kinase interaction is apparently promoted by binding of a pTyr ligand to the SH2 domain, because in the absence of a phosphopeptide ligand the SH2 domain is relatively disordered. Such data have led to a model in which binding to a tyrosine phosphorylated (‘primed’) substrate protein stabilizes the Fes SH2 domain and favours its interaction with the kinase N-lobe, thereby stimulating kinase activity. At the same time, engagement of the substrate with the SH2 domain positions an appropriately spaced substrate tyrosine at the active site of the kinase for phosphorylation. Fes kinase activity is further stimulated by a short anti-parallel β-sheet formed between the substrate peptide and the activation loop, and by autophosphorylation within the activation loop. In this regard, the region of Fes N-terminal to the SH2 domain is composed of an F-BAR domain that can dimerize, which could promote intermolecular autophosphorylation . This scheme raises the possibility that the Fes kinase is only fully active when localized to membranes through the F-BAR domain, and when juxtaposed to a primed substrate that interacts with both the SH2 domain and the kinase activation loop. The SH2 and F-BAR interaction domains thereby impose specificity on a TK domain that is otherwise rather non-selective for substrate recognition. 
The intricate series of regulatory allosteric interactions involving the Fes SH2 and kinase domains, and substrate, presumably evolved following an initial fusion of the Fes SH2 and catalytic domains (figure 2c).
Fes has some historic interest, because it was the protein in which the SH2 domain and F-BAR domain (initially termed N-Fps) were originally identified in a mutagenesis screen involving a retroviral oncoprotein in which the entire avian Fes polypeptide (termed Fps) is fused to an N-terminal retroviral coat protein element (Gag). Gag apparently stimulates Fps/Fes transforming activity by enhancing membrane binding, autophosphorylation and substrate recruitment, eliciting a constitutive pTyr signal and aberrant mitogenic signalling. In this screen, dipeptide or tetrapeptide insertions were made at numerous sites throughout the length of the Gag-Fps oncoprotein, with the idea that they might interfere with transforming activity if located at functionally critical sites, especially in the context of folded domains. Not surprisingly, insertions in the kinase domain were inhibitory to both kinase and transforming activities, whereas insertions in the Gag region had only a modest effect. However, insertion mutations in the sequence of approximately 100 amino acids located immediately N-terminal to the kinase domain led to a loss of kinase activity, substrate recruitment and transforming capacity, prompting the identification of the SH2 domain and the realization that this module is conserved among other cytoplasmic TKs. Serendipitously, two of the most detrimental insertions in the viral Fps/Fes SH2 domain were located at the critical sites of packing and electrostatic interactions with the kinase domain that were subsequently observed in the structure of the human SH2-kinase Fes cassette.
Therefore, in the c-Src and Fes kinases, the SH2 and kinase domains have undergone a mutual coevolution resulting in intramolecular interactions that stabilize either the inactive (c-Src) or active (Fes) states. The situation in the Abl TK is yet more complex, because the SH2 domain participates in two entirely different intramolecular interactions with the kinase domain that are inhibitory on the one hand, but stimulatory on the other [44,45]. In the inactive Abl conformation, the SH2 and preceding SH3 domains pack against the backside of the kinase away from the active site, but through allosteric interactions promote an inactive kinase conformation, in part by positioning an N-terminal myristate group for an inhibitory interaction with the large lobe of the kinase domain (figure 2c). This autoinhibitory interaction is also mutually exclusive with binding of a pTyr-containing peptide to the SH2 domain; thus association of the Abl SH2 and SH3 domains with external pTyr or proline-rich ligands, respectively, disrupts this inhibitory configuration and frees the kinase domain to adopt an active state. In the active conformation, the SH2 domain moves to interact with the N-lobe of the kinase domain to stimulate catalytic activity. In Abl, this positive SH2–kinase interaction is centred on an Ile in the SH2 domain, and is different in detail from that observed in Fes, but has the same consequence of stimulating kinase activity and promoting substrate phosphorylation. The stimulatory SH2–kinase interface has therefore apparently evolved independently in Fes and Abl. 
With respect to Abl, the interaction between the SH2 domain and kinase N-lobe has therapeutic implications. Its disruption, either by mutating the key SH2 Ile residue or by masking the relevant SH2 surface with a fibronectin type III monobody, inhibits the pro-oncogenic properties of the human Bcr-Abl oncoprotein, which causes chronic myelogenous leukaemia (CML), both in an animal model and in primary human CML cells. Interestingly, a T212R mutation in the SH2 domain of Bcr-Abl promotes drug resistance in CML patients treated with the kinase inhibitor imatinib, potentially by stabilizing the active SH2–kinase interface.
4. The evolutionary trajectories of kinases, phosphatases and phospho-binding proteins
The human genome encodes at least 518 protein kinases, over 150 catalytic protein phosphatases, and several hundred proteins with domains that recognize pSer/pThr or pTyr residues [23,26,50]. Each enzyme and recognition module has evolved a degree of selectivity for its substrate or binding motifs, respectively, that allows coordination among specific writers, readers and erasers to ensure proper functioning of the cell. Therefore, selective forces may have acted on the system as a whole during evolution. To address this point, we first discuss the evolutionary histories of these three intertwining components separately.
(a) Most eukaryotic protein kinases share a common ancestral catalytic domain
On the basis of sequence and structural information, the origin of eukaryotic protein kinases can be traced back to a common ancestor. It is possible to place kinases on an evolutionary tree based on their inter-relatedness both within species and across phyla [33,48]. Many eukaryotic kinases possess a common protein kinase-like fold that is shared with proteins in bacteria, archaea and viruses that often phosphorylate small molecules, and were probably the ancestors of the eukaryotic serine/threonine/tyrosine protein kinases. The 518 human protein kinases can be further subclassified into 10 groups based on the sequences and functions of their catalytic domains. Most human protein kinases are categorized as STKs, with the rest being either bona fide TKs or DSKs . The group of 90 TKs apparently evolved most recently, coincident with the emergence of multi-cellular animals and their precursors .
The catalytic domains of protein kinases are often covalently linked to non-catalytic interaction domains that can dictate the subcellular localization, substrate specificity and activity of the associated enzymatic domain, as discussed earlier for cytoplasmic TKs . Indeed, about half of human protein kinases possess one or more interaction domains. Different kinase sub-families, such as the TK, TKL (tyrosine kinase-like) and RGC groups, are linked to distinct types of binding modules, which can then target the various kinases to different subcellular sites, substrates or regulators. In contrast, some kinase groups, such as CK and CMGC (including cyclin-dependent kinases (CDKs), mitogen-activated protein kinases, glycogen synthase kinases and CDK-like kinases), do not possess intrinsic non-catalytic domains, but instead often function in association with designated regulatory subunit proteins (e.g. CDKs) that serve a similar purpose. Clustering kinases by comparing either the sequence homologies of their kinase domains or the composition of their interaction domains gives similar kinase subgroups. This suggests that the linked catalytic and interaction domains of particular protein kinases have coevolved, yielding mechanisms of kinase regulation and substrate selection that are distinct for each kinase subclass .
(b) The major classes of protein phosphatases probably evolved from different genes
In contrast to the kinases, the major classes of protein phosphatases have distinct evolutionary origins, although some of the core features of their catalytic domains may have converged during evolutionary development. The human protein phosphatases that dephosphorylate pSer/pThr or pTyr represent at least four structurally different groups, on the basis of their phosphatase domain sequences and substrate preference. The PPP and PPM groups are serine/threonine-specific phosphatases that appear to have evolved independently of each other [13,55]. The majority of PTPs belong to the same class, but can be assigned to distinct subfamilies on the basis of their selectivity for proteins phosphorylated on tyrosine or tyrosine/serine/threonine, or for non-protein substrates. A further group of tyrosine phosphatases is composed of aspartate-based phosphatases, which include the FCP/SCP (small CTD phosphatase) [56,57] and HAD (haloacid dehalogenase) family enzymes.
It is common for these phosphatases to have catalytic activities towards non-protein substrates, or to share homology with non-protein phosphatases, or both. For example, the PTP family member Laforin possesses a dual specificity (DSP)-type PTP domain located C-terminal to a carbohydrate-binding module (CBM20). A mutation of the EPM2A gene that encodes human Laforin is associated with Lafora disease, an autosomal recessive disorder characterized by long-strand glycogen [59,61,62]. Consistent with this observation, Laforin is now recognized as a glycogen phosphatase found in all vertebrates and several protists, although not in any invertebrate animals. Laforin provides an example of convergent evolution, in the sense that other classes of carbohydrate phosphatases have quite different types of catalytic domains, and of divergent evolution with respect to the PTP domain, which has acquired the ability to dephosphorylate several distinct types of phosphorylated biomolecules (proteins, carbohydrates, lipids, among others). A related glycogen phosphatase, SEX4 (starch excess 4), in plants mediates accumulation and appropriate mobilization of starch in leaves [64,65]. The Arabidopsis SEX4 also possesses DSP and CBM20 domains, but in this case the domain order is reversed, suggesting that SEX4 and Laforin may have independently evolved the same functions through equivalent domain linkages [59,66]. The evolution of proteins such as Laforin may therefore have involved an interplay between the carbohydrate-binding and PTP domains; as one scenario, the carbohydrate-binding domain may have directed a DSP PTP domain to dephosphorylate glycogen, and subsequent mutations in the catalytic domain may then have increased activity and specificity for the carbohydrate substrate.
A common feature of the protein phosphatases is the presence of either covalently linked domains or associated non-catalytic subunits that provide a means of subcellular targeting and substrate recruitment. For example, the human genome encodes nearly 100 regulatory proteins that associate with the PPP phosphatases, whereas PPM, PTP and aspartate-based phosphatase family members often have interaction domains covalently fused with their catalytic domains. This underlines the importance of considering phosphatase evolution in the context of their regulatory and modular components.
(c) The evolution of phospho-binding domains
As noted, one function of protein phosphorylation is to create binding sites for phospho-dependent interaction domains, resulting in the formation of protein complexes that regulate signalling pathways. Typically, these domain families have some members that bind specific phosphorylated sites, and others that do not require phosphorylation for ligand-binding. This raises the possibility that existing phospho-binding domains may have evolved from modules with different binding properties, and in some cases it is possible to infer a family's ancient origins by comparing the gene products of modern day organisms. In the case of the pTyr-binding SH2 domains, all 121 human family members are predicted to adopt a general structural fold consisting of a large seven-strand β-sheet flanked by two α-helices, and indeed there are solved structures for more than 60 SH2 domain family members [68,69]. Humans and yeast share only one related protein that possesses a sequence corresponding to an SH2 domain (Supt6H/Spt6), and this is the only evident SH2-containing protein in yeast. Structural analysis has shown that yeast Spt6 actually has two rigidly linked tandem SH2 domains, of which only the N-terminal domain (SH2N) was identified by sequence [70,71]. The SH2N domain has a binding pocket that shows significant homology to the pTyr-binding pocket of conventional SH2 domains, although it binds pSer as well as pTyr. The C-terminal Spt6 SH2 domain (SH2C) represents a new branch of the SH2 family, as it is not obviously related to other SH2 domains by sequence, and lacks the conserved phosphoamino acid-binding pocket, although it has an atypical and very weak pSer-binding surface at a distinct location. The tandem SH2 domains of Spt6 selectively bind the phosphorylated C-terminal tail domain (CTD) of RNA polymerase II, which contains dozens of repeats of a heptad-peptide motif, Y1-S2-P3-T4-S5-P6-S7, with several sites of phosphorylation, of which pSer2, pSer5 and pSer7 all appear important.
The tandem SH2 domains of Spt6 apparently bind CTD heptad repeats containing multiple sites of Ser phosphorylation, and may act to sense the density of CTD phosphorylation. The SH2 domains of Spt6 are linked to S1 and YqgFc domains, both commonly found in ribosomal and RNA-associated proteins, and Spt6 is indeed important for transcriptional regulation and yeast viability. These findings have raised the possibility that the SH2N domain of Spt6 represents an ancestral SH2 module with rather non-specific phosphopeptide-binding properties that was subsequently coopted to form the selective pTyr-binding domains required for TK signalling. The features of the SH2C domain of Spt6 underline the remarkable plasticity of interaction domain folds. Such variability is also displayed by the PTB and C2 domain families, which possess only selected members that bind pTyr sites, while the majority of these domain types engage different ligands [24,73]. The fact that Spt6 has been conserved through eukaryotic evolution implies that its SH2N phospho-binding function has been preserved because its relaxed phosphate-binding specificity is important.
Recent data suggest that a phospho-binding domain can evolve from a kinase domain, albeit a kinase for a small molecule with a different fold from the protein kinases. In this example, the MAGUK adaptor proteins, which mediate protein–protein interactions and thereby regulate critical functions in cell polarity and at neuronal synapses, have a C-terminal region that resembles a guanylate kinase domain but cannot convert GMP to GDP. Recent data indicate that the guanylate kinase-like domain of MAGUK proteins binds its targets through recognition of specific pSer/pThr-containing motifs, and that its pSer/pThr-binding pocket evolved from the GMP-binding pocket of a catalytically active guanylate kinase . Therefore, in the case of the MAGUK proteins, a pSer/pThr-binding domain probably evolved from an enzyme.
5. 14-3-3 proteins: the exception that proves the rule?
The pSer/pThr-binding 14-3-3 proteins, of which there are seven isoforms, are exceptions, in the sense that they are entirely composed of a single domain that is never linked in a physiological protein to other sequences [75–77]. 14-3-3 proteins have an all-helical structure, and likely evolved from the tetratricopeptide repeat (TPR) superfamily. They usually form homo- or hetero-dimers with a horseshoe-shaped structure that can bind simultaneously to two pSer/pThr-containing peptides within the same protein, provided that these have an appropriate consensus sequence (most commonly RSXpS/TXP or RXXXpSXP). Through this mechanism, 14-3-3 dimers can regulate the conformation and catalytic activity of multi-phosphorylated proteins, such as the Raf-1 STK. They can also control the subcellular localization of their phosphorylated targets, one example being the YAP/TAZ transcription factors, which are phosphorylated as an endpoint of the Hippo signalling pathway that controls organ size, and therefore bind 14-3-3 and are relocated to the cytoplasm. In addition, they can regulate the protein–protein interactions of their phosphorylated protein ligands by displacing them from alternate binding partners [79,80].
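The two consensus sequences quoted above can be written as simple patterns to flag candidate 14-3-3 docking sites in an unmodified protein sequence. The following Python sketch is illustrative only: the function name and the toy peptides are hypothetical, X is treated as any residue, and a match says nothing about whether the site is actually phosphorylated or bound in vivo.

```python
import re

# Consensus motifs from the text, written for an unmodified sequence.
# The phospho-acceptor residue is captured so its position can be reported.
# Lookaheads are used so that overlapping candidates are not missed.
MOTIFS = {
    "RSXpS/TXP": re.compile(r"(?=RS.([ST]).P)"),
    "RXXXpSXP": re.compile(r"(?=R...(S).P)"),
}


def find_candidate_sites(seq):
    """Return (motif name, 1-based acceptor position, residue) tuples."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq):
            acceptor = match.start(1)
            hits.append((name, acceptor + 1, seq[acceptor]))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy peptides, not taken from any real protein.
for peptide in ("AARSTSTPAA", "GGRAAASAPGG"):
    print(peptide, find_candidate_sites(peptide))
```

Because a 14-3-3 dimer can engage two sites in the same protein, a natural extension would be to look for pairs of candidate sites in one sequence, although the spacing requirements are not specified here and would have to be assumed.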
We have proposed that domains such as 14-3-3 that are never normally joined to other modules are maintained in this isolated state owing to their central role in numerous aspects of cellular regulation and organization, which might be corrupted by incorporation into a larger polypeptide . An alternative possibility is that the joining of a 14-3-3 domain to another sequence might interfere with its structure and function on biophysical grounds. Arguing against this latter notion, the 30 kDa 14-3-3ε protein (also known as YWHAE) was recently found fused to FAM22A or FAM22B as a consequence of gene rearrangements in high-grade endometrial stromal sarcoma, yielding oncoproteins of 110 and 140 kDa, respectively. 14-3-3ε (YWHAE) probably retains its phosphopeptide-binding and dimerizing properties in the context of the YWHAE–FAM22 fusion proteins, but is relocalized from the cytoplasm to the nucleus by FAM22, where it potentially engages a new set of targets . These data suggest that 14-3-3 domains can indeed be linked to other protein sequences, but that such fusions have been selected against in the course of evolution owing to their pathological potential.
6. Modular domains as units of normal and pathologic evolution
Protein domains with catalytic or binding properties are both functional and evolutionary units. They can be shuffled and recombined in new combinations, generating new connections within the cell, or following duplication they can diverge through sequence permutations, yielding new intrinsic activities; these provide powerful mechanisms contributing to the evolution of organismal complexity [82,83]. In other words, selective forces will constantly test the adaptivity of these modules in the context of their local subcellular networks for the fitness advantage of the organism [53,84]. As an example, yeast have SH3 domains but, as we have discussed, no conventional SH2 domains. However, once they evolved, SH2 domains with pTyr-binding activity were duplicated and linked to SH3 domains, yielding adaptor proteins such as Grb2 (with the domain architecture SH3–SH2–SH3) or Nck (SH3–SH3–SH3–SH2) [85–92]. These adaptors provided a means of coupling pTyr signals to proteins with proline-rich SH3-binding motifs, such as the Sos Ras guanine nucleotide exchange factor (GEF) in the case of Grb2, which regulates the Ras-Erk MAP kinase pathway and thus mitogenic signalling, and regulators of the actin cytoskeleton such as N-Wasp for Nck. Thus, the juxtaposition of domains in new combinations during the course of evolution was a simple way to forge new signalling pathways. This evolutionary process can be mimicked by artificially linking domains that are never normally fused to one another; for example, joining an SH2 or PTB domain to a death effector domain directs pTyr signals to an apoptotic pathway, so that cells die in response to a growth factor stimulus that would normally be mitogenic and pro-survival.
The linking of domains in abnormal combinations with aberrant effects on cellular behaviour is commonly seen in cancer cells, as in the YWHAE–FAM22 fusion noted earlier; in the context of the evolving cancer cell, these are selected because they provide an advantage to its growth and survival, although to the detriment of the organism.
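The idea that an adaptor's function follows from its domain composition can be made concrete with a small data model. The Python sketch below is a deliberately simplified illustration: the dictionary and function names are hypothetical, each domain type is assigned a single ligand class, and real domains vary in their specificity (for instance, only some PTB domains bind pTyr).

```python
# Ligand class assumed for each domain type (a simplification of the text).
DOMAIN_LIGANDS = {
    "SH2": "pTyr motif",
    "SH3": "proline-rich motif",
    "PTB": "pTyr motif",
}

# Domain architectures as given in the text, N- to C-terminus.
ADAPTORS = {
    "Grb2": ["SH3", "SH2", "SH3"],
    "Nck": ["SH3", "SH3", "SH3", "SH2"],
}


def ligand_classes(architecture):
    """Return the set of ligand classes an ordered domain list can engage."""
    return {DOMAIN_LIGANDS[domain] for domain in architecture}


def bridges_ptyr_to_proline(architecture):
    """True if the architecture can couple a pTyr site to a proline-rich motif."""
    return {"pTyr motif", "proline-rich motif"} <= ligand_classes(architecture)


for name, architecture in ADAPTORS.items():
    print(name, "-".join(architecture), bridges_ptyr_to_proline(architecture))
```

In this toy model, an SH3-only protein cannot couple to pTyr signals, whereas appending a single SH2 domain to the list is enough to create the link, which mirrors the argument that domain juxtaposition is a simple route to new pathways.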
Once an SH2/SH3 adaptor had been generated, mutation could then work to modify the affinities of the domains for particular motifs. For example, the SH2 domain of Grb2 binds preferentially to pY-X-N-X motifs, where X is generally a hydrophobic residue, the selection for Asn at the +2 position being imposed by a Trp in the Grb2 SH2 domain that forces the peptide into a β-turn. The selectivity and biological activity of the Src SH2 domain can be experimentally changed to that of Grb2 simply by substituting a Thr with Trp at the relevant position, and it is therefore straightforward to imagine how such mutations could fine-tune the binding selectivity of existing domains.
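The pY-X-N-X preference of the Grb2 SH2 domain can be expressed as a simple pattern check. The following is a minimal sketch rather than a predictor of real binding: it treats a candidate site purely as a four-letter string that starts at the phosphotyrosine and requires Asn at the +2 position. The function name `fits_pyxnx` and the four-residue window convention are assumptions made for illustration. It does not model the preference for hydrophobic residues at X, nor whether the tyrosine is actually phosphorylated.

```python
import re

# Grb2 SH2 preference as described in the text:
# pTyr at position 0, any residue at +1, Asn at +2, any residue at +3.
GRB2_SH2_PATTERN = re.compile(r"Y.N.")


def fits_pyxnx(window: str) -> bool:
    """Return True if a four-residue window starting at the tyrosine fits pY-X-N-X."""
    return GRB2_SH2_PATTERN.fullmatch(window.upper()) is not None


if __name__ == "__main__":
    # YINQ, YVNV and YYND are the EGFR and Shc1 windows quoted in this review;
    # YEEI is included only as a window that lacks Asn at +2.
    for window in ("YINQ", "YVNV", "YYND", "YEEI"):
        print(window, fits_pyxnx(window))
```

In this toy representation, a Thr-to-Trp substitution of the kind described for Src would correspond to swapping one pattern for another, which is one way to picture how a single mutation can retune the selectivity of a domain.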
7. Phosphorylation-based signalling pathways require kinases, phosphatases, binding modules and other forms of post-translational modification to work in concert
Writers, readers and erasers are also features of pathways involving other forms of PTM. For example, ubiquitylation sites are laid down by E3 protein-ubiquitin ligases, exert their effects by recruiting proteins with ubiquitin-binding domains and are removed by deubiquitinases, all involving polypeptides with a modular architecture. Here, we discuss how similar and fundamentally simple principles of modularity and protein modification are used by the cell to respond to two quite different cues, namely extracellular stimulation by epidermal growth factor (EGF) or intrinsic DNA damage (figure 3).
Binding of EGF to the EGF receptor (EGFR) TK induces receptor dimerization, which triggers autoactivation and transphosphorylation of the receptor on several tyrosine sites within its C-terminal tail (figure 3a). This leads to the recruitment of several proteins with SH2 domains, notably the Grb2 adaptor that couples the receptor to the Ras-Erk MAPK pathway through the Sos RasGEF. Grb2 can bind the receptor directly at a pY1068INQ site, as well as indirectly through the Shc1 scaffold. Shc1 recognizes an NXXpY1172 motif in the activated EGFR through its PTB domain, and once bound to the receptor is phosphorylated on three tyrosine residues (Y239Y240ND and Y313VNV), which creates two further pYXNX binding sites for Grb2. Src family kinases also bind the autophosphorylated EGFR through their SH2 domains and phosphorylate the receptor on additional tyrosines to augment signal strength; conversely, receptor and membrane-associated non-receptor-type PTPs ultimately reduce the level of EGFR phosphorylation. This brief snapshot touches on the complex interplay of enzymes, adaptors and scaffolds in EGFR signalling.
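The direct and indirect routes from the activated receptor to Sos can be made explicit by writing the interactions above as a small directed graph. This is an illustrative sketch only: the node labels, the `EDGES` dictionary and the `all_routes` helper are names invented here, and the graph includes only the interactions named in this paragraph, ignoring Src family kinases and PTPs.

```python
from collections import deque

# Directed "recruits or activates" edges taken from the paragraph above.
# Node labels are informal names chosen for this sketch.
EDGES = {
    "EGF": ["EGFR"],
    "EGFR": ["EGFR pY1068", "EGFR pY1172"],
    "EGFR pY1068": ["Grb2"],        # Grb2 SH2 domain binds pY1068INQ directly
    "EGFR pY1172": ["Shc1"],        # Shc1 PTB domain binds NXXpY1172
    "Shc1": ["Shc1 pYXNX sites"],   # Shc1 is phosphorylated once bound
    "Shc1 pYXNX sites": ["Grb2"],   # indirect route to Grb2
    "Grb2": ["Sos"],
    "Sos": ["Ras-Erk MAPK pathway"],
}


def all_routes(graph, start, goal):
    """Enumerate every acyclic route from start to goal, shortest first."""
    routes = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            routes.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting a node within one route
                queue.append(path + [nxt])
    return routes


if __name__ == "__main__":
    for route in all_routes(EDGES, "EGFR", "Sos"):
        print(" -> ".join(route))
```

Run on this graph, the search returns two routes to Sos, one through the direct Grb2-binding site and one through Shc1, which mirrors the redundancy described in the text.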
Signalling initiated by double-strand DNA breaks in the nucleus also depends on a series of inducible protein–protein interactions, but in this case mediated by serine/threonine phosphorylation and ubiquitylation (as schematized in figure 3b). DNA double-strand breaks are highly cytotoxic, and their repair is induced by the recruitment of successive waves of proteins to the surrounding chromatin, through a hierarchy of phosphorylation- and ubiquitin-dependent protein–protein interactions that form microscopically visible foci. The double-strand breaks are recognized by the MRN (MRE11/RAD50/NBS1) complex together with the Ku70/80 proteins that bind and stimulate members of the PIKK family of serine/threonine kinases, including ATM (ataxia telangiectasia mutated) and DNA-PKcs (DNA-dependent protein kinase catalytic subunit) [99,100]. These kinases selectively phosphorylate S/T-Q motifs, with an early substrate being the histone variant H2AX, yielding γH2AX with the phosphorylated motif pS139QEY at its C-terminus. This phosphosite recruits the tandem BRCT domains of the scaffold protein MDC1, which is essential for the DNA damage checkpoint [101–103], in a fashion that depends not only on recognition of the pSXY sequence but also on the C-terminal carboxylate ion of γH2AX. Interestingly, the adjacent BRCT domains of MDC1 fold together into a structure that binds a single γH2AX phosphopeptide. MDC1 is then itself phosphorylated by DNA damage-activated PIKKs on a repeated pTQF motif that binds the FHA domain of the Ring finger E3 ubiquitin ligase RNF8 [104,105]. This provides a fascinating example of convergent evolution, as the BRCT domains of MDC1 and the FHA domain of RNF8 have both evolved to recognize pS/T-Q-Y/F motifs phosphorylated by ATM/ATR, although the BRCT domains favour pSer sites whereas FHA domains bind only pThr, and the glutamine is not required for BRCT or FHA domain binding.
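The S/T-Q consensus and the division of labour between BRCT and FHA readers can be illustrated with a short scan over a peptide string. This is a sketch under stated assumptions, not a site-prediction tool: `find_stq_sites` and `favoured_reader` are hypothetical helper names, the inputs are the short motifs quoted above rather than full protein sequences, and the code ignores both the Y/F residue at +3 and the C-terminal carboxylate that the text notes are important for recognition.

```python
import re

# Ser or Thr followed by Gln: the S/T-Q consensus described for the PIKKs.
STQ_PATTERN = re.compile(r"[ST]Q")


def find_stq_sites(seq: str):
    """Return (zero-based index, residue) for every S/T-Q site in seq."""
    return [(m.start(), m.group()[0]) for m in STQ_PATTERN.finditer(seq.upper())]


def favoured_reader(residue: str) -> str:
    """Map the phosphoacceptor residue to the reader class favoured in the text."""
    readers = {
        "S": "tandem BRCT domains (favour pSer)",
        "T": "FHA domain (binds pThr only)",
    }
    return readers[residue]


if __name__ == "__main__":
    # SQEY is the gamma-H2AX C-terminal motif; TQF is the repeated MDC1 motif.
    for peptide in ("SQEY", "TQF"):
        for index, residue in find_stq_sites(peptide):
            print(peptide, index, residue, favoured_reader(residue))
```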
Once recruited to DNA damage foci through the phosphopeptide-binding properties of its FHA domain, RNF8 together with the E2 ubiquitin-conjugating enzyme Ubc13 adds K63-linked ubiquitin to histones, resulting through several more steps in the recruitment of BRCA1 and 53BP1, both of which stimulate DNA repair. RNF8 therefore converts a phosphorylation signal to a ubiquitylation output. The ubiquitylated histone sites induced by RNF8 are recognized by tandem ubiquitin-binding motifs termed MIUs (motifs interacting with ubiquitin) of another E3 ubiquitin ligase, RNF168, which then collaborates with RNF8 in building UbK63 chains on histones that act as recruitment sites for additional components of the DNA damage response machinery [106,107]. Notably, the UbK63 chains are recognized by the ubiquitin interaction motifs (UIM) of RAP-80, an adaptor that associates with the protein Abraxas; Abraxas is phosphorylated at a pSXXF motif that recruits the tandem BRCT domains of BRCA1. Recruitment of 53BP1 to DNA damage foci also requires ubiquitylation, but in addition 53BP1 employs a Tudor domain to bind a methylated lysine on histone H4, revealing yet another class of PTM-dependent protein–protein interaction required for the DNA damage response. The basis for the ubiquitin-dependence of 53BP1 recruitment remains enigmatic, although the most parsimonious interpretation would be direct recognition of a ubiquitylated site by 53BP1.
Although the DNA damage response is much more complex, the pathways outlined here reveal several points. (i) The proteins involved in the DNA damage response are highly modular, with multiple binding and catalytic domains and motifs. (ii) The DNA damage response depends on a hierarchy of protein–protein interactions regulated by several different types of PTM and their corresponding interaction domains, which induce a remarkable signal amplification yielding a cytologically visible structure in a short period of time. (iii) The response involves a real branching network, rather than an isolated linear pathway, that controls passage through the cell cycle on the one hand and repair of the DNA lesion on the other. There is still much to be learned about how the DNA damage response signal is actually coupled to the repair process.
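The hierarchy of recruitment outlined in this section can be sketched as a dependency table, in which each factor lists what must already be present at the break before it can be recruited. This is a deliberately simplified illustration: the `REQUIRES` table, the node labels and the `wave` and `recruitment_waves` helpers are assumptions made here. In particular, the dependence of 53BP1 on UbK63 chains stands in for the text's statement that its recruitment requires ubiquitylation by a mechanism that remains unclear, and its additional Tudor domain interaction with methylated histone H4 is not modelled.

```python
# Each factor maps to the factors or marks that must already be at the break.
# The table is a simplification of the steps described in this section.
REQUIRES = {
    "gamma-H2AX": [],
    "MDC1": ["gamma-H2AX"],               # tandem BRCT domains bind pSQEY
    "RNF8": ["MDC1"],                     # FHA domain binds pTQF on MDC1
    "RNF168": ["RNF8"],                   # MIUs bind RNF8-ubiquitylated histones
    "UbK63 chains": ["RNF8", "RNF168"],   # built by the two E3 ligases together
    "RAP80-Abraxas": ["UbK63 chains"],    # UIMs bind UbK63 chains
    "BRCA1": ["RAP80-Abraxas"],           # tandem BRCT domains bind pSXXF
    "53BP1": ["UbK63 chains"],            # assumption: ubiquitin dependence only
}


def wave(factor, requires=REQUIRES):
    """Return the earliest recruitment wave in which a factor can appear."""
    prerequisites = requires[factor]
    if not prerequisites:
        return 0
    return 1 + max(wave(p, requires) for p in prerequisites)


def recruitment_waves(requires=REQUIRES):
    """Group factors by wave, earliest wave first."""
    grouped = {}
    for factor in requires:
        grouped.setdefault(wave(factor, requires), []).append(factor)
    return [grouped[level] for level in sorted(grouped)]


if __name__ == "__main__":
    for level, factors in enumerate(recruitment_waves()):
        print(level, ", ".join(factors))
```

Grouping by wave makes the branching visible: in this toy table, RAP80-Abraxas and 53BP1 fall into the same wave because both depend on the same ubiquitin mark.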
8. The evolution of the phosphotyrosine-based signalling system in metazoan evolution
It is relatively straightforward to imagine how a protein family with new properties might have evolved through stepwise alterations in domain combinations and functions (figure 4a,b), but much more puzzling to decipher how a whole system of writers, readers and erasers might have coordinately emerged to sustain a major evolutionary transition, such as the appearance of multi-cellular animals. Analysing the eukaryotic pTyr signalling system is useful from this point of view, because it is a relatively recent innovation, essential for the intercellular communication required to form and maintain complex tissues, and has been intensively studied biochemically.
A theoretical history of the emergence and subsequent expansion of the pTyr signalling system has been reconstructed, based on the proteomes of fungi, slime mould, unicellular choanoflagellates and multicellular metazoa (figure 4c). In this scheme, tyrosine phosphorylation arose as a sporadic property of the ancient serine/threonine kinases, for example in the phosphorylation and activation of MAP kinases. The first bona fide member of the modern-day pTyr system was probably a PTP, of which there are approximately five in fungi such as Saccharomyces cerevisiae. These probably arose from dual specificity serine/threonine/tyrosine phosphatases, and in yeast have a very simple domain organization compared with the much more elaborate architecture of metazoan PTPs. Thus, they probably had a limited number of substrates, but were of sufficient utility to be retained. The next step in the evolution of pTyr signalling appears to have been the selection of pTyr-binding SH2 domains. The slime mould Dictyostelium discoideum has a simple repertoire of 13 SH2 domains that are known or predicted to bind pTyr, among which are proteins resembling the metazoan STAT (signal transducer and activator of transcription) transcription factors and the Cbl E3 protein-ubiquitin ligase. STATs dimerize through intermolecular pTyr–SH2 interactions, and translocate to the nucleus where they bind DNA and regulate the expression of specific genes. Dictyostelium cells aggregate and differentiate in response to cAMP chemoattractant secretion, elicited by poor nutrient conditions, to ultimately yield a stalk topped by spore cells that can be dispersed in search of a more hospitable environment, a process that is controlled in part by the diffusible morphogen differentiation-inducing factor (DIF) that specifies the development of pre-stalk cells. DIF controls gene expression through the tyrosine phosphorylation and activation of Dictyostelium STATc.
The identity of the relevant kinase is unclear, because Dictyostelium lacks any conventional TKs, including JAKs that are the usual kinases for metazoan STAT proteins. Dictyostelium does have an expansion of the TKL family with the potential for dual specificity serine/threonine/tyrosine phosphorylation; these include two Shk kinases with a dual specificity kinase domain followed by an SH2 domain, which may represent a waystation on the path to the cytoplasmic TKs. Rather, current data suggest that the Dictyostelium STATs may be constitutively tyrosine phosphorylated, and that DIF inhibits the PTP3 tyrosine phosphatase, thereby stimulating STATc phosphorylation and resulting gene expression. At this stage, the pTyr–SH2 system probably provided a limited benefit, but its full utility was unleashed by the evolution of dedicated TKs, as seen in M. brevicollis and metazoans. This was accompanied by a marked expansion in genes encoding SH2 domain proteins and PTPs, and in the ability of cells to communicate with their environment and one another.
A feature of the proteins involved in pTyr signalling, be they TKs, tyrosine phosphatases or downstream targets, is that they frequently have common sets of interaction domains, which potentially co-localize the different components of the pTyr network in space and time (figure 3a). For example, the majority of the approximately 90 human TKs have one or more non-catalytic domains that can be found in the group of approximately 40 classical type 1 PTPs, and vice versa, including protein interaction modules such as SH2, FERM, FN3 and Ig domains (figures 2 and 3d: results derived from the study in ). This domain-sharing may facilitate the convergence of specific kinases and phosphatases on common substrates, aided by the presence of similar domains in the targets themselves. That is to say, shared binding domains provide the different proteins in the pTyr signalling network with a common currency through which they interact and are co-regulated.
It is plausible that the principles underlying the evolution of the pTyr signalling network apply to other types of cell signalling systems. Anecdotally, this can be seen in the DNA damage repair network, where many of the proteins share BRCT, FHA and ubiquitin-binding domains. We have also tested this concept for GEFs and GTPase-activating proteins (GAPs) for the Rho GTPase family that control many aspects of cell shape and movement. In this system, the GEFs are writers that activate Rho GTPases by inducing the exchange of GDP for GTP, the proteins that interact with Rho GTPases in the GTP-bound form are readers, and the GAPs that accelerate the hydrolysis of GTP are erasers. Human RhoGEFs and RhoGAPs are large proteins with multiple interaction domains, each family being encoded by approximately 70 genes. They can be grouped into subfamilies of GEFs and GAPs with related protein and phospholipid interaction domains, which may then co-localize specific GEFs and GAPs to the same Rho GTPase family members, of which there are 23 that are targeted to different intracellular membranes. Statistical data indicate that the level of domain sharing observed for the RhoGEFs and GAPs cannot be attributed to random shuffling in the genome, suggesting that it is of evolutionary significance. With respect to pTyr signalling, three RhoGEFs (Vav1–3) and two RhoGAPs (Chimaerin 1 and 2) have SH2 domains that allow them to couple to pTyr sites, and thus to be directly controlled by TKs.
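One way to ask whether domain sharing between a writer family and an eraser family exceeds what random shuffling would give is a randomization test. The sketch below is not the statistical analysis referred to above, whose details are not given here. It is a generic illustration on invented toy data, in which each protein keeps its number of domains but draws them at random from a genome-wide vocabulary. All protein names, domain labels and function names are hypothetical.

```python
import random

# Toy data, invented for illustration only.
VOCABULARY = [f"dom{i:02d}" for i in range(40)]
GEFS = {"GEF_a": {"dom00", "dom01"}, "GEF_b": {"dom01", "dom02"}, "GEF_c": {"dom00", "dom02"}}
GAPS = {"GAP_a": {"dom00", "dom01"}, "GAP_b": {"dom02"}, "GAP_c": {"dom01", "dom02"}}


def shared_types(family_a, family_b):
    """Number of distinct domain types present in both families."""
    types_a = set().union(*family_a.values())
    types_b = set().union(*family_b.values())
    return len(types_a & types_b)


def randomise(family, vocabulary, rng):
    """Keep each protein's domain count but draw its domains at random."""
    return {name: set(rng.sample(vocabulary, len(domains)))
            for name, domains in family.items()}


def randomisation_p_value(family_a, family_b, vocabulary, n_iter=2000, seed=0):
    """Return (observed sharing, empirical P of sharing at least that much by chance)."""
    rng = random.Random(seed)
    observed = shared_types(family_a, family_b)
    hits = 0
    for _ in range(n_iter):
        random_a = randomise(family_a, vocabulary, rng)
        random_b = randomise(family_b, vocabulary, rng)
        if shared_types(random_a, random_b) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    print(randomisation_p_value(GEFS, GAPS, VOCABULARY))
```

The choice of null model matters: drawing from a genome-wide vocabulary asks whether the two families share more domain types than unrelated proteins would, which is a stronger question than shuffling domains only within the two families.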
9. Coevolution of interdependent phosphorylation subsystems
Signalling systems clearly do not function in isolation. One type of signal can be converted into another class of molecular information, as we discussed earlier for protein phosphorylation and ubiquitylation, and this can lead to the divergent branching of signalling networks. Conversely, biochemically different types of receptors can converge on the same signalling outputs; in addition, there can be cross-talk between distinct signalling pathways. As an example, we summarize the mechanisms by which modular domains and motifs have allowed different forms of protein and lipid phosphorylation to become interconnected in coupling pTyr signals to Ser/Thr kinases via phosphatidylinositol 3′-kinase (PI3K). RTKs and G protein-coupled receptors (GPCRs) both stimulate class I PI3Ks, which selectively phosphorylate PI(4,5)P2 to yield PI(3,4,5)P3 [112–115], which is specifically bound by the PH domains of a number of proteins, notably the serine/threonine kinase Akt/PKB that stimulates cell survival, growth and metabolism (figure 5c), frequently by creating 14-3-3-binding sites on its substrates [112,116]. The ability of the p110 catalytic subunits of class I PI3Ks to couple to RTKs (class IA) or GPCRs (class IB) depends on the nature of their adaptor subunits. The regulation of PI3K by RTKs emerged through the evolution of an SH2-containing adaptor subunit (figure 5). All such PI3K pTyr adaptors have two C-terminal SH2 domains with an intervening inter-SH2 domain that binds and regulates the p110 catalytic subunit; these elements comprise the majority of the p50/p55/p60 adaptors found in Caenorhabditis elegans and Drosophila [117,118]. The SH2 domains of class IA adaptors bind pYXXM motifs on many RTKs and their associated scaffolds such as IRS-1/2 that mediate insulin-receptor signalling, and this stimulates p110 enzymatic activity and thus PIP3 production at the plasma membrane.
In addition to p50/p55 adaptors, mammalian cells also express p85 adaptors with an N-terminal SH3 domain and a RhoGAP homology (or BH) domain flanked by proline-rich motifs. Thus, metazoans first evolved a minimal p50/p55 SH2-containing subunit for pTyr regulation of PI3K, and subsequently a more extended p85 subunit that can receive additional inputs and is important for aspects of insulin signalling in mammals such as glucose homeostasis. p110 has a C-terminal catalytic domain closely related to the ATM/ATR/DNA-PK family atypical kinases, preceded by a PIK domain, a Ras-GTP binding domain and a region that binds the adaptor subunits. Its association with Ras-GTP stimulates p110 activity, both in vitro and in vivo, and this provides a mechanism for positive cross-talk between the Ras-MAPK and PI3K pathways [113–115,120].
The heterodimer of a p101 adaptor and p110 catalytic subunit is responsive to the Gβγ subunits of heterotrimeric G proteins liberated by activated GPCRs, in part through binding of p101 to both Gβ and Gγ at the membrane. The class IB enzymes predate the emergence of class IA PI3K proteins, consistent with GPCRs being more ancient signalling receptors than RTKs (figure 5b,c). Interestingly, yeast have no class I PI3K family members, do not synthesize PIP3 and do not have any PH domains that bind PIP3 (figure 5a). The Dictyostelium version of PI3K (termed PIKA and B, or PI3K1 and 2) is activated by heterotrimeric G-proteins and RasC, but the slime mould lacks SH2-containing p85-related regulatory subunits (figure 5b; and reviewed by [115,122]). There is therefore an evolutionary trajectory in which the ability to make and respond to PIP3 appeared as multi-cellular animals were emerging, initially to link GPCRs to PIP3 production, and subsequently through the evolution of adaptors with SH2 domains to couple RTKs to PI3K, and thus to a network of Ser/Thr phosphorylation via the Akt protein kinase (figures 3 and 5c).
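The stepwise trajectory described here can be summarized as a presence/absence table from which the receptor inputs to PIP3 production follow. The sketch below encodes only the statements made in this section; the dictionary keys, the organism labels and the `pip3_inputs` helper are names chosen for illustration, and the table is not a survey of the underlying genomes.

```python
# Presence (True) or absence (False) of each component, as stated in the text.
PI3K_FEATURES = {
    "yeast": {"class_I_PI3K": False, "SH2_adaptor_subunit": False},
    "Dictyostelium": {"class_I_PI3K": True, "SH2_adaptor_subunit": False},
    "metazoa": {"class_I_PI3K": True, "SH2_adaptor_subunit": True},
}


def pip3_inputs(organism: str) -> list:
    """Return the receptor classes that can drive PIP3 production in this sketch."""
    features = PI3K_FEATURES[organism]
    if not features["class_I_PI3K"]:
        return []  # no class I enzyme, so no PIP3 synthesis
    inputs = ["GPCR"]  # G-protein-responsive PI3K is the earlier arrangement
    if features["SH2_adaptor_subunit"]:
        inputs.append("RTK")  # SH2 adaptors couple pTyr sites to p110
    return inputs


if __name__ == "__main__":
    for organism in PI3K_FEATURES:
        print(organism, pip3_inputs(organism))
```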
10. The model: compartmentalization and reciprocal linkage between systems and modules
Gene duplication provides a mechanism to generate a reservoir of protein domains that can undergo alterations that are subsequently selected, should they prove beneficial. Genetic alterations that change a domain's intrinsic biochemical properties or that link it to new domain types, or both, may confer an improved biological fitness and, therefore, be fixed by positive selection. This can be seen in several of the examples we have discussed, notably in the broad range of catalytic specificities exhibited by members of the kinase and phosphatase families, the variety of interaction domains with which the kinase and phosphatase catalytic domains are associated and the diverse ligand-binding preferences of these domains. Although proteins with any combination of domains could in principle be generated by genetic rearrangements, computational analysis of existing polypeptides suggests that only a very limited number of domain combinations have actually been selected, and that particular subsets of domains appear largely dedicated to specific aspects of cellular function, such as signal transduction in the case of SH2 and SH3 domains or chromatin organization for bromo- and chromodomains. Functional subsystems within cells (such as signal transduction, chromatin regulation, vesicle trafficking and so forth) may therefore be marked out and maintained by molecular interactions mediated by a selected collection of ‘resident’ domains, often found in adaptor/scaffold proteins. (A proposed evolutionary model based on domain recombination is schematized in figure 6a.) Proteins with related sets of mutually compatible domains would thereby be targeted to the same interaction networks, and thus compartmentalized in a way that would restrict their functions and promote the evolution of subsystems such as the writer–reader–eraser toolkit for phosphorylation.
However, domain shuffling at the genomic level could form a novel domain combination that targets a domain into a foreign subcellular environment through the activity of its new partner domain. This would normally be detrimental to cellular function and be selected against. The joining of a domain to a partner normally resident in a different subsystem might, however, recruit the domain into this new network, in a way that alters the properties of that network in a beneficial fashion (figure 6). In rare instances, these changes could mark a major evolutionary advance, as in the acquisition of pTyr-binding activity by SH2 domains and their joining to a new set of domains involved in signal transduction.
The authors are indebted to Tony Hunter for editorial input. Work in the authors' laboratory is supported by the Canadian Institutes for Health Research (T.P.: MOP-6849, MOP-57793, MOP-13466; J.J.: Postdoctoral Fellowship), Genome Canada (T.P.), the Ontario Research Fund and the Canadian Cancer Society Research Institute (T.P.).
One contribution of 13 to a Theme Issue ‘The evolution of protein phosphorylation’.
- This journal is © 2012 The Royal Society