l‐Arginine recognition by yeast arginyl‐tRNA synthetase

Jean Cavarelli, Bénédicte Delagoutte, Gilbert Eriani, Jean Gangloff, Dino Moras

Author Affiliations

  1. Jean Cavarelli*,1,
  2. Bénédicte Delagoutte1,
  3. Gilbert Eriani2,
  4. Jean Gangloff2 and
  5. Dino Moras1
  1. 1 UPR 9004 Biologie Structurale, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, BP 163, 67404, Illkirch Cedex, France
  2. 2 UPR 9002 Structure des Macromolécules Biologiques et Mécanismes de Reconnaissance, Institut de Biologie Moléculaire et Cellulaire du CNRS, 15 rue René Descartes, 67084, Strasbourg, Cedex, France
  1. *Corresponding author. E-mail: cava{at}


The crystal structure of arginyl‐tRNA synthetase (ArgRS) from Saccharomyces cerevisiae, a class I aminoacyl‐tRNA synthetase (aaRS), with l‐Arginine bound to the active site has been solved at 2.75 Å resolution and refined to a crystallographic R‐factor of 19.7%. ArgRS is composed predominantly of α‐helices and can be divided into five domains, including the class I‐specific active site. The N‐terminal domain shows striking similarity to some completely unrelated proteins and defines a module which should participate in specific tRNA recognition. The C‐terminal domain, which is the putative anticodon‐binding module, displays an all‐α‐helix fold highly similar to that of Escherichia coli methionyl‐tRNA synthetase. While ArgRS requires tRNAArg for the first step of the aminoacylation reaction, the results show that its presence is not a prerequisite for l‐Arginine binding. All H‐bond‐forming capability of l‐Arginine is used by the protein for the specific recognition. The guanidinium group forms two salt bridge interactions with two acidic residues, and one H‐bond with a tyrosine residue; these three residues are strictly conserved in all ArgRS sequences. This tyrosine is also conserved in other class I aaRS active sites but plays several functional roles. The ArgRS structure allows the definition of a new framework for sequence alignments and subclass definition in class I aaRSs.


Aminoacyl‐tRNA synthetases (aaRSs) catalyse the esterification of an amino acid to the 3′‐terminal adenosine of a tRNA. Each aaRS is specific for one amino acid and several cognate isoacceptor tRNA species. Aminoacylation of a tRNA with its cognate amino acid is achieved by a two‐step reaction. The first step produces a stable enzyme‐bound intermediate, called aminoacyl‐adenylate, from ATP and amino acid. The second step is the transfer of the aminoacyl moiety to one of the hydroxyl groups of the 3′‐terminal adenosine of the tRNA. AaRSs can be divided into two classes of 10 members each, which correspond to two architectures of the active site core characterized by conserved amino acid residues (Eriani et al., 1990). The class I aaRSs, whose active sites contain a canonical dinucleotide‐binding fold [four‐ or five‐stranded parallel β‐sheet, Rossmann fold (Rossmann et al., 1974)], display two signature amino acid sequences ‘HIGH’ (Barker and Winter, 1982) and ‘KMSKS’ (Webster et al., 1984). Class II aaRSs are built around an antiparallel β‐sheet partly closed by helices (Cusack et al., 1990; Ruff et al., 1991) and contain three characteristic homologous motifs. The topology of each active site is the structural constraint governing the conformation of the ATP molecule and conferring a regiospecificity to the second step of the reaction. Class I aaRSs aminoacylate the 2′‐OH of the 3′‐terminal adenosine of the tRNA, while class II aaRSs aminoacylate the 3′‐OH.

Crystallographic studies of 12 aaRSs revealed the textbook modularity of these enzymes, whereby various insertions and additional domains are appended to the active site. Most of these domains play a functional role either in tRNA recognition or in proofreading mechanisms. The diversity of the appended domains outside the catalytic core, which range from all α‐helical folds to all β‐barrel structures, allows specialized patterns of tRNA recognition that are characteristic of each aminoacylation system (for a recent review, see Francklyn et al., 1997).

Amino acid or aminoacyl‐adenylate recognition has been examined in structural detail for several class I and II aaRSs: TyrRS (Brick et al., 1989), TrpRS (Doublié et al., 1995), GlnRS (Rath et al., 1998), AspRS (Cavarelli et al., 1994; Poterszman et al., 1994), LysRS (Onesti et al., 1995), HisRS (Arnez et al., 1995), SerRS (Belrhali et al., 1995) and, recently, IleRS (Nureki et al., 1998). The large amount of information collected over the past years has produced a unified vision of the aaRS world, while also revealing the complexity of the fine tuning necessary for the aminoacylation reaction.

ArgRS deserves a special place among aaRSs. Like GlnRS and GluRS, ArgRS requires its cognate tRNA for the first step of the aminoacylation reaction, the amino acid activation (Mehler and Mitra, 1967; Mitra and Smith, 1969; Kern and Lapointe, 1980). In spite of extensive investigations performed by several groups, no definitive answer has been found favouring one of the two proposed mechanisms accounting for this peculiar behaviour: either a two‐step mechanism, where tRNA binding is a prerequisite for adenylate formation, or a mechanism where ATP, arginine and tRNA co‐react in a concerted way leading to a one‐step aminoacylation of tRNA (Mehler and Mitra, 1967; Loftfield, 1972). However, results of kinetic investigations using non‐chargeable tRNAArg argue in favour of the classical two‐step mechanism (Fersht et al., 1978; Gerlo et al., 1982). Yeast ArgRS also displays an original feature in its ability to aminoacylate, in addition to its cognate tRNA, a non‐cognate molecule, namely a yeast tRNAAsp deprived of post‐transcriptional modifications (Ebel et al., 1973; Gangloff et al., 1973). Here we describe the first step towards structural comprehension of this peculiar system, the crystal structure of ArgRS from the yeast Saccharomyces cerevisiae with l‐arginine bound to the active site.


Structure determination

Yeast ArgRS is a monomeric protein of 607 residues (mol. wt 69.5 kDa). The enzyme was expressed at high levels from an overproducing Escherichia coli strain. Protein purification involved two chromatographic steps: an anion exchanger followed by a hydrophobic interaction column. The structure of ArgRS was solved using crystals grown from ammonium sulfate at pH 7.0 in the presence of l‐arginine and ATP. Phases to 3.8 Å resolution were obtained by the multiple isomorphous replacement method and anomalous scattering from two heavy atom derivatives (Table I). The 3.8 Å map was improved by solvent flattening, solvent flipping and histogram matching, and then the phases were extended to 3.5 Å. The modified map allowed chain tracing for most parts of the polypeptide chain. The final model has been refined at 2.75 Å resolution to a crystallographic R‐factor of 19.7% (Rfree = 26.9%) with good stereochemistry (see Table I for refinement statistics). The crystallographic asymmetric unit contains one molecule of ArgRS, the l‐arginine substrate and 280 water molecules. The first four residues at the N‐terminus of ArgRS are not visible in the electron density map and are presumed to be disordered.
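For readers less familiar with the refinement statistics quoted above, the conventional definition of the crystallographic R‐factor is recalled here. This is the standard textbook expression, not a description of the specific refinement protocol used for ArgRS; the symbols F_o and F_c (observed and calculated structure factor amplitudes, the latter assumed to be already scaled to the former) are introduced only for this illustration.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{o}}(hkl)| - |F_{\mathrm{c}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{o}}(hkl)|}
```

Rfree is computed with the same expression, but only over a test set of reflections that is excluded from refinement, which is why it is usually higher than R (here 26.9% versus 19.7%).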

Table 1. Crystallographic data

Anatomy of class I ArgRS

The structure of ArgRS can be divided schematically into five domains: a catalytic domain that contains the class I active site, to which four structurally defined domains are appended (Figure 1). Two of them, called additional domains 1 and 2 (Add‐1 and Add‐2), are attached to the N‐ and C‐terminal sides of the active site, respectively. Two domains (Ins‐1 and Ins‐2) are inserted into the catalytic core. The structure is composed predominantly of α‐helices, with 55% of the residues adopting a helical conformation and 9% forming β‐strands. An alignment of ArgRS sequences from different organisms is shown in Figure 2.

Figure 1.

The structure of cytoplasmic ArgRS from the yeast S.cerevisiae. (A) Schematic drawing (overall dimension 85×65×40 Å) (drawn with SETOR, Evans, 1993). (B) Topology diagram of the secondary structure elements. Add‐1 is in orange, the catalytic domain in red, Ins‐1 in green, Ins‐2 (CP1) in blue and Add‐2 in yellow.

Figure 2.

Sequence alignment for ArgRSs from different organisms: yRrs = cytoplasmic S.cerevisiae, mRrs = mitochondrial S.cerevisiae, jRrs = Methanococcus jannaschii, hRrs = Homo sapiens, eRrs = E.coli. The secondary structure, determined with PROMOTIF (Hutchinson and Thornton, 1996), is also displayed for yRrs. Elements of secondary structure are indicated by: H for α‐helix, S for strand of β‐sheet, ξ for 3₁₀ helix. Yeast numbering is shown above the sequence of yRrs. The colour code defined in Figure 1 is also used here. The residues of the class I signature sequences are highlighted in red and the residues involved in l‐arginine binding are highlighted in pink. This sequence alignment was extracted from an alignment built with the program CLUSTAL W (Thompson et al., 1994) using 29 sequences of ArgRS from different organisms. This figure was produced with the program ALSCRIPT (Barton, 1993).

The active site of ArgRS (colour coded red in Figures 1 and 2), which forms the scaffold of the Rossmann fold, is composed of two halves assembled from three peptides; the first half includes residues 143–194 and residues 266–293, while the second half includes residues 345–410. The first signature sequence of class I aaRSs (His159–Ala160–Gly161–His162) belongs to the first half of the nucleotide‐binding fold, while the second motif, ‘MSK’ (Met408–Ser409–Thr410), belongs to the second half. Superimposing the five class I aaRSs for which coordinates are available gives root‐mean‐square (r.m.s.) deviations between corresponding Cα atoms of ∼2.5 Å. The number of spatially equivalent residues ranges from 119 (ArgRS/TyrRS) to 222 (ArgRS/MetRS). This reveals a high structural similarity between ArgRS and MetRS and also significant similarities with GluRS and GlnRS which extend beyond the anticipated correspondence of their respective nucleotide‐binding domains (Figures 3 and 4).
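The r.m.s. deviations quoted above are computed over pairs of spatially equivalent Cα atoms once the two structures have been superposed. The following is a minimal sketch of that last step only, assuming the superposition and the pairing of equivalent residues have already been done (in this work by SARF); the function name and the coordinates are hypothetical and are not taken from any PDB entry.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points (same unit as input, e.g. angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical, already-superposed C-alpha positions (angstroms), for illustration only.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mobile = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 1.0, 0.0)]
print(round(rmsd(reference, mobile), 2))
```

Because the value depends on which residues are counted as equivalent, an r.m.s. deviation is only meaningful together with the number of matched residues, which is why both are reported for each pair of enzymes.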

Figure 3.

Superposition of the Cα atoms of the class I aaRSs onto ArgRS. (A) GlnRS (Rould et al., 1989), (B) GluRS (Nureki et al., 1995), (C) MetRS (Brunie et al., 1990) and (D) TrpRS (Doublié et al., 1995). ArgRS is traced in red, GlnRS in blue, GluRS in cyan, MetRS in green and TrpRS in black. Superimposing the six class I aaRS structures gives r.m.s. deviations between corresponding Cα atoms of 2.56 Å (GluRS/ArgRS over 182 residues), 2.61 Å (GlnRS/ArgRS over 175 residues), 2.47 Å (TyrRS/ArgRS over 119 residues), 2.64 Å (TrpRS/ArgRS over 129 residues) and 2.53 Å (ArgRS/ecMetRS over 222 residues). The structure superpositions have been made with the program SARF (Alexandrov, 1996). This figure was drawn with MOLSCRIPT (Kraulis, 1991).

Figure 4.

Schematic representation of the six class I aaRSs highlighting the positions of the Rossmann fold moieties (RF1, RF2 colour coded red). The anticodon‐binding module whose fold is similar to that of MetRS and ArgRS is colour coded in yellow. Another inserted peptide, called connecting peptide 2 (CP2), has been identified by sequence analysis in five class I aaRSs (LeuRS, ValRS, IleRS, CysRS and MetRS) (Starzyk et al., 1987; Burbaum and Schimmel, 1991). The CP2 domain of MetRS is highlighted in grey. In ArgRS, as in GluRS, GlnRS, TyrRS and TrpRS, there is no insertion peptide corresponding to CP2.

Additional domain 1

ArgRS is first characterized by its N‐terminal domain (Add‐1, 142 residues, colour coded brown in Figures 1 and 2). Its topology, a βαβ single split motif (S2–H4–S3–S4) to which another strand (S1) and three helices (H1–H2–H3) are added, is unique among the known aaRSs. An alignment of 29 sequences of ArgRS (five archaea, seven eukaryotes and 17 eubacteria; data not shown) shows that this extension is conserved among all species. The core of this domain, encompassing the β‐sheet and two helices (H2 and H4), is strikingly similar in topology to several other proteins: (i) two Zn‐binding proteins, muramoyl‐pentapeptide carboxypeptidase from Streptomyces albus G (PDB code 1lbu) (Dideberg et al., 1982) and the N‐terminal signal domain of Sonic hedgehog (PDB code 1vhh) (Tanaka Hall et al., 1995); (ii) the oligomerization and l‐arginine‐binding domain of the arginine repressor of E.coli (PDB code 1xxa) (Van Duyne et al., 1996); (iii) a nucleotidyl transferase, the palm domain of human DNA polymerase β (PDB code 1bpy) (Sawaya et al., 1994); and (iv) the C‐terminal domain of glyceraldehyde‐3‐phosphate dehydrogenase (GAPDH, PDB code 1cer). The structural similarities were found by the programs SARF (Alexandrov, 1996) and DALI (Holm and Sander, 1994). Superimposition of the different structures gives r.m.s. deviations between corresponding Cα atoms of 2.8 Å (ArgRS/1lbu over 79 residues), 2.2 Å (ArgRS/1bpy over 69 residues), 2.1 Å (ArgRS/1xxa over 48 residues), 2.1 Å (ArgRS/1vhh over 74 residues) and 2.75 Å (ArgRS/1cer over 75 residues). The origin of the structural similarity between these proteins is unclear. Given the diversity of the functions involved, thermodynamic stability may be the common root. The single split motif is a recurrent protein structure and has been found as a building block for several domains (Orengo and Thornton, 1993).
However, the common topology between the N‐terminal domain of ArgRS and the palm domain of DNA polymerase β indicates that the face of the β‐sheet is a platform which can be used for interactions with nucleic acids. A model for tRNA docking on ArgRS suggests that Add‐1 of ArgRS is most probably involved in tRNA recognition, interacting with the D loop side of the tRNA (see Discussion).

Insertion domain 1

ArgRS possesses an insertion domain in the first half of the Rossmann fold between the second strand (S6) and the second helix (H11) (Ins‐1, residues 195–265, colour coded green in Figures 1 and 2). This connection, which is a short loop in GlnRS, GluRS, MetRS and TrpRS, and a slightly longer one in TyrRS (14 residues), is 80 residues long in ArgRS and contains four helices. This domain closes one side of the active site. The angle between the axes of two consecutive helices is ∼120°. Therefore, helix H11, which is part of the nucleotide‐binding fold, has a different orientation with respect to the β‐sheet and is much longer in ArgRS (17 residues).

Insertion domain 2

Domain Ins‐2 of ArgRS, which ranges from residues 294 to 344 (colour coded blue in Figure 1), corresponds to the so‐called connecting peptide 1 (CP1), which in each class I aaRS links the two halves of the Rossmann fold and is specific to each system (Starzyk et al., 1987; Burbaum and Schimmel, 1991). In ArgRS, CP1 contains a helix (H12), a two‐stranded antiparallel β‐sheet (S8 and S9) and a strand S10, and sits on the top of the second half of the nucleotide‐binding fold.

Though relatively short in size in TyrRS and TrpRS (25 residues, mainly two short helices), CP1 contains >100 residues in GlnRS (residues 100–210) and is folded into a five‐stranded antiparallel β‐sheet flanked by three helices. This domain has been called acceptor‐binding domain in GlnRS because it has been shown to stabilize the hairpin conformation of the 3′ end of the tRNAGln.

Despite an overall structure which is specific for each system, striking similarities between CP1s of GluRS, GlnRS, MetRS and ArgRS can be pointed out. These enzymes share two common substructures: (i) a helix–loop–strand–loop–helix motif (from helix H11 to helix H12) and (ii) a two‐stranded antiparallel β‐sheet (S8 and S9). The first common substructure is very similar in all four enzymes; GluRS possesses an extra long loop (residues 71–80 of GluRS). This common feature has already been mentioned for GlnRS, GluRS and MetRS (Landès et al., 1995); it closes the back of the active site and may play a structural role in stabilizing the overall structure.

For the common β‐sheet substructure, system specificity is expressed in the lengths and conformations of the connection between helix H12 and the first strand of the β‐sheet (S8), the peptide linking the two strands (S8 and S9) and the peptide linking strand S9 to helix H13. In GlnRS, this last peptide, which spatially overlaps Ins‐1 of ArgRS, has been shown to provide a binding pocket for the looped out base C74 of tRNAGln.

Anticodon‐binding module

ArgRS possesses an α‐helical C‐terminal domain (Add‐2, residues 411–607, colour coded yellow in Figures 1 and 2) which contains 10 helices and one strand (S13). By spatial analogy with other aaRSs, this module should be the putative anticodon‐binding domain. Add‐1 is spatially adjacent to Add‐2, which could imply that both modules will cooperate in specific tRNA recognition.
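The residue ranges given for the five domains in the sections above can be collected into a small lookup table, which makes it easy to see which module a given residue belongs to. This is only an illustrative sketch: the names `DOMAINS` and `domain_of` are hypothetical, and the range 1–142 for Add‐1 is inferred from the statement that this N‐terminal domain has 142 residues.

```python
# Domain boundaries as quoted in the text (yeast ArgRS numbering).
# Add-1 is described as the 142 N-terminal residues; 1-142 is inferred from that.
DOMAINS = {
    "Add-1": [(1, 142)],
    "catalytic": [(143, 194), (266, 293), (345, 410)],
    "Ins-1": [(195, 265)],
    "Ins-2 (CP1)": [(294, 344)],
    "Add-2": [(411, 607)],
}

def domain_of(residue):
    """Return the name of the domain containing a residue number, or None if outside 1-607."""
    for name, segments in DOMAINS.items():
        for start, end in segments:
            if start <= residue <= end:
                return name
    return None

# Residues mentioned in the text, mapped to their domains.
for label, number in [("His159", 159), ("Tyr347", 347), ("Gly483", 483)]:
    print(label, domain_of(number))
```

The three catalytic segments are kept as separate ranges because the Rossmann fold is interrupted by the two insertion domains; together the seven ranges tile residues 1–607 without gaps.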

Helices H20 and H21 can be considered as a single helix kinked in the middle by a 3₁₀ helix turn. Helices H18, H20, H22 and H23 constitute a four‐helix bundle which has a topology similar to the C‐terminal domain of MetRS (Brunie et al., 1990). Superimposition of the corresponding Cα atoms of the two domains (113 residues) yields an r.m.s. deviation of 2.42 Å (ArgRS, residues 455–600; MetRS, residues 340–491). System specificity is expressed in the length of the helices. H19 is only present in ArgRS whereas MetRS has an extra C‐terminal extension peptide.

Helices H17 (residues 475–479) and H18 (residues 486–500) present a striking structural feature. Their axes align with an interaxial angle of 0.8°, so they could make a single helix (which is observed in MetRS). However, a small contiguous peptide of six amino acids (Ser480–Phe481–Glu482–Gly483–Asp484–Thr485) joins these two helices and creates a protruding Ω loop at the surface of the protein. This exposed Ω loop, located just after the tRNA‐anchoring platform (see below), may be used as a molecular switch by ArgRS for specific recognition. In vivo experiments have shown that a mutation of Gly483 to a serine is lethal for cell growth, which gives support for the functional role of this residue (Eriani et al., in preparation).
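The interaxial angle mentioned above is the angle between the direction vectors of the two helix axes. A minimal sketch of that calculation is given below; it assumes the axis directions have already been determined (the fitting of an axis to the Cα atoms is not shown), and the vectors used are hypothetical values chosen only to illustrate two nearly collinear helices, not coordinates from the ArgRS structure.

```python
import math

def interaxial_angle(axis_a, axis_b):
    """Angle in degrees (0-180) between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))

# Hypothetical axis directions for two almost perfectly aligned helices.
h17_axis = (1.0, 0.0, 0.0)
h18_axis = (1.0, 0.014, 0.0)
print(round(interaxial_angle(h17_axis, h18_axis), 1))
```

An angle this close to zero means the two helices point in the same direction, which is why they could in principle form a single continuous helix if the intervening six residues did not loop out.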

The similar structures and functions of the anticodon‐binding domains of ArgRS and MetRS (Figure 3C) suggest that they may have evolved from the same ancestor. The original subclass definition of class I aaRSs should, therefore, be revised. Our results would place ArgRS and MetRS in the same subgroup and confirm the subgroup classification proposed by Landès et al. (1995). Sequence alignments based on structure superposition of the two anticodon‐binding domains highlight amino acid residues which are involved mainly in the packing of the helices. Detailed sequence comparisons between MetRS and ArgRS will be published elsewhere (in preparation).

Helices H21, H22 and H23 interact underneath the floor of the active site with helix H6, which carries the ‘HIGH’ signature motif. H6 seems to be a crucial node of communication between the different modules of ArgRS. It interacts (i) with helix H11 anchoring domain Ins‐1, (ii) with helix H5 linking domain Add‐1 to the active site, and also (iii) with helix H16. One can therefore easily imagine how any structural events caused by substrate binding can be communicated all over the structure via those long helices (H23 is 37.3 Å long, H22 is 32.0 Å long, H16 is 30.3 Å long, H6 is 30.7 Å long and H11 is 29.8 Å long).

l‐Arginine binding and recognition

A well‐ordered arginine molecule was found bound to the active site (Figure 5A). l‐Arginine binds at the C‐terminal end of the β‐strands of the Rossmann fold in a crevice formed between the two symmetrical halves of the fold (Figure 5B). Arginine recognition involves amino acid residues from strands S5 and S11, helices H6 and H13 and the loop between S5 and H6. Arginine is bound specifically to the protein by a network of H‐bonds and salt bridge interactions (Figure 5C). All H‐bonding capability of the substrate is used by the protein for the specific recognition. The α‐amino group of the arginine molecule forms H‐bonds to the amide oxygen of Asn153 and the main chain carbonyl of Ser151, while the α‐carboxylate interacts with the amide nitrogens of Asn153 and Gln375. Note that strand S5 presents a β‐bulge at residues 149–150. This peculiar conformation may be important for the correct recognition of the α‐ammonium and α‐carboxy groups of the l‐arginine substrate by Ser151 and Asn153. The two histidine residues of the conserved signature of class I aaRSs also interact with the carboxylate atoms of l‐arginine, either by a direct H‐bond (His162) or by a water‐mediated interaction (His159). The guanidinium moiety forms two salt bridge interactions with two carboxylate residues, Glu148 on one side, and Asp351 on the other. The phenolic oxygen of Tyr347 plays a dual role, accepting an H‐bond from the η‐nitrogen atoms of the substrate and donating one to the amide oxygen of Gln375. This residue links the specificity of side chain recognition of the substrate to the correct positioning of the main chain atoms. An alignment of 29 ArgRS sequences shows that Asp351, Asn153 and Tyr347 are strictly conserved, while Glu148 is sometimes replaced by an aspartate. The mode of binding of l‐Arginine can thus be generalized to all ArgRSs. Like GlnRS and GluRS, ArgRS requires its cognate tRNA for the first step of the aminoacylation reaction, the amino acid activation.
Our structure shows that tRNAArg binding is not a prerequisite for l‐arginine binding.

Figure 5.

l‐Arginine‐binding site. (A) Final (2Fo − Fc) cross‐validated σ‐weighted omit map (resolution limits 20–2.75 Å, all data used, contour level 1.2σ). (B) Interaction between the active site and l‐Arginine. The two histidines of the HIGH motif are also shown. (C) l‐Arginine recognition: interactions between the substrate and the protein. The figure was drawn with SETOR (Evans, 1993).

Despite our best efforts, no crystal has been obtained either for ArgRS alone or for ArgRS in the presence of ATP alone. Once the initial conditions of crystal growth had been established in the presence of ATP and l‐Arginine, we were able to reproduce crystals in the presence of l‐arginine alone. Therefore, only l‐Arginine is essential for crystal growth. One may postulate that l‐Arginine binding produces a conformational change in different loops around the active site which promotes the intermolecular contacts necessary for crystal packing. One cannot exclude that ArgRS may also require the binding of tRNA to form the ATP‐binding site, as seen in the GlnRS–tRNAGln structure, where the terminal adenosine of tRNAGln interacts with the α‐phosphate of ATP.

ATP‐binding site

While ArgRS crystals grow in the presence of l‐Arginine and ATP, no ATP molecule is visible in the electron density map. Packing effects can explain this absence. Previous results on class I aaRSs have shown that the second signature motif ‘KMSKS’ is part of a mobile loop that is involved in the stabilization of the transition state, the second lysine forming a salt bridge interaction with the α‐phosphate of ATP (Brick et al., 1989). In our crystal form, this loop (G407–M408–S409–T410–R411) is involved in a strong crystal packing interaction with the domain Add‐1 of another molecule. The C‐terminal side of the Rossmann fold, together with strands S13 and S14, presents an open platform on which the domain I of a crystallographically equivalent molecule sits, interacting on the helix side. The ‘KMSKS’ loop is therefore located below helices H2 and H3 of the other molecule. Several H‐bond interactions anchor the two molecules together with an intimate salt bridge between Glu294 and Lys79 of the crystallographically equivalent molecule. The crevice which should receive the AMP moiety is accessible, but there is no room to accommodate the β‐ and γ‐phosphates.


Amino acid recognition by class I aaRSs

A structural comparison of the active sites of class I aaRSs reveals that each enzyme has its own solution for the binding of the main chain atoms of the amino acid substrate. There are no strictly conserved residues of the protein dedicated for this purpose. The main chain atoms of residues at the C‐terminal end of strands S5 or S11 are also involved in the recognition of the common peptidic part of the substrate. The local conformation of these strands is different in each enzyme, and irregularities such as β‐bulges are often found. The system‐specific conformation of S5 or the loop between S5 and H6 may explain the presence of at least one proline residue but at a different position in all class I aaRSs.

The variability within class I enzymes is apparent in the orientation of the peptidic atoms of the amino acid substrate, which is different in ArgRS compared with TyrRS and TrpRS. While the α‐amino group of l‐arginine points toward strand S5, those of Tyr and Trp are rotated 90° away and are directed toward helix H13. An orientation of the peptidic portion of the substrate similar to that found in ArgRS seems to be present in the structure of the complex between GlnRS–tRNAGln and a glutaminyl‐adenylate analogue (Figure 3 in Rath et al., 1998).

Only two residues are conserved in the amino acid‐binding crevice of class I aaRSs, a tyrosine residue (Tyr347 in ArgRS) and an acidic residue (Asp191 in ArgRS). Tyr347 is special since it is strictly conserved in GlnRS, GluRS, TyrRS, TrpRS and ArgRS. In ArgRS, it cooperates in the recognition of the η‐nitrogen atom of the substrate. In TyrRS (Tyr169) and TrpRS (Tyr125), this residue makes an H‐bond to the α‐amino group. In the recent crystal structure of GlnRS–tRNAGln in the presence of a stable glutaminyl‐adenylate analogue (Rath et al., 1998), this tyrosine residue (Tyr211) together with a water molecule has been shown to be responsible for the specific recognition of the two amide hydrogens of the substrate glutamine. By analogy, this tyrosine is likely to be in a key position for the recognition of l‐glutamic acid by GluRS. It is astonishing that such a highly conserved residue may play several roles in different aaRSs.

The only other invariant residue, an acidic residue (Asp191 in ArgRS), is highly conserved in class I aaRSs (all but TyrRS and TrpRS) at the C‐terminal end of strand S5 (Landès et al., 1995). Our structure shows that this residue is not involved in substrate binding but has a structural role. It stabilizes the interactions of strand S5 with strand S6 and the surrounding secondary structures.

ATP‐binding site

Crystal structures of class I aaRSs and extensive solution experiments linked the two signature peptides ‘HIGH’ and ‘KMSKS’ to the ATP‐binding site (for a recent review, see Meinnel et al., 1995). The absolutely invariant glycine of the first signature sequence forms a flat platform at the N‐terminus of helix H6 on which the adenine base of the ATP sits. The two histidines and the lysine have been shown to be involved in the stabilization of the first transition state to the aminoacyl‐adenylate formation. In the GlnRS–tRNAGln complex, the two histidines make H‐bonds with the negatively charged ATP phosphates. The ArgRS structure illustrates the dual functional role played by these two histidines; the first (His162) forms a direct H‐bond with the carboxylate moiety of the l‐Arginine substrate and the second interacts through a water molecule. The structures of GlnRS, TyrRS and TrpRS have shown that the adenylate is formed with little displacement of the AMP and amino acid moieties, compared with their initial binding conformations. Model building suggests (data not shown) that His159 of ArgRS can easily interact with the α‐phosphate of ATP in the initial conformation.

Based on sequence alignment of class I aaRSs, the precise location of this second signature motif ‘KMSKS’ in ArgRS was a subject of controversy (Landès et al., 1995). Our structure clearly identifies G407–M408–S409–T410–R411 as the correct motif where the essential lysine is not present. Only 10 out of 29 ArgRS sequences known to date (data not shown) display a lysine residue at the third position of this signature sequence. The variability of the 29 ArgRS sequences at this location can be described by the following pattern M(F,I,L)S(K,R)T(G,K)R K(R,S,A,T,E)G(A). A single lysine residue is always present, either in position 2, 3 or 5. This lysine can play the role of the second lysine of the canonical ‘KMSKS’. Position 4 is always occupied by an arginine residue, which is also present at a similar position in GlnRS and GluRS. In these three structures, this residue is directed toward the solvent and does not seem to be implicated directly in initial substrate binding (tRNA or ATP).
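The variability pattern quoted above can be expressed as a regular expression, which makes the statements about lysine and arginine positions easy to check against a candidate sequence. The sketch below rests on one reading of the pattern, namely six positions with the alternatives listed in parentheses after each residue; that reading, the names `MOTIF` and `check_motif`, and the example peptides are all assumptions made for illustration, and the peptides are not taken from a sequence database.

```python
import re

# Assumed reading of the pattern in the text: six positions, each allowing the
# first residue named plus the alternatives in parentheses. Position 4 is always R.
MOTIF = re.compile(r"[MFIL][SKR][TGK]R[KRSATE][GA]")

def check_motif(hexapeptide):
    """Return (matches_pattern, lysine_positions) for a six-residue string (1-based positions)."""
    matches = MOTIF.fullmatch(hexapeptide) is not None
    lysines = [i + 1 for i, aa in enumerate(hexapeptide) if aa == "K"]
    return matches, lysines

# Hypothetical peptides: lysine at position 5, 2 or 3, then one violating position 4.
for peptide in ["MSTRKG", "MKTRSG", "MSKRAG", "MSTAKG"]:
    print(peptide, check_motif(peptide))
```

Note that the regular expression alone does not enforce the observation that exactly one lysine is present at position 2, 3 or 5; the returned list of lysine positions is what allows that additional check.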

tRNA recognition and binding

The structural similarities between ArgRS and GlnRS mentioned above allow a model to be built by docking a tRNA molecule to the ArgRS structure, based on the crystal structure of the GlnRS–tRNAGln complex. The resulting model shows that (i) the amino acid acceptor stem is clamped on one side by Ins‐1, and by the second half of the Rossmann fold on the other, and that (ii) domains Add‐1 and Add‐2 together recognize the anticodon stem and arm of tRNAArg.

A tRNA‐anchoring platform

ArgRS, GlnRS, GluRS and MetRS share a common motif made of two strands (S13 and S14), located after the second half of the Rossmann fold, which are linked by a left‐handed crossover connection. This connection is vital for class I aaRSs because it brings the two signature motifs onto the same side of the active site. Neither TrpRS nor TyrRS has the second strand. This feature has already been mentioned for GlnRS, GluRS and MetRS (Landès et al., 1995). This motif is involved in the anchoring of the tRNA molecule to the synthetase platform, as seen in the crystal structure of the GlnRS–tRNAGln complex. It interacts with the inside of the L‐corner of tRNAGln at the junction between the two helical domains. This feature clearly suggests a similar tRNA positioning by these four enzymes.

An RNA‐binding domain at the N‐terminus of ArgRS

Our model for tRNA docking on ArgRS indicates that domain Add‐1 of ArgRS will most probably be involved in tRNA recognition, interacting with the D loop side of the tRNA. Add‐1, particularly the exposed face of the β‐sheet, possesses the characteristics which have been pointed out in many nucleic acid‐binding proteins: an exposed aromatic or aliphatic residue (Phe109 and Leu70) which may be involved in van der Waals and hydrophobic interactions; positively charged residues (Arg66, Arg75 and Lys103) which may interact with the sugar–phosphate backbone; and polar side chains (Asn62, Asn106 and Gln111) which may be involved in direct or water‐mediated interactions with the nucleic acid (for a review, see Arnez and Cavarelli, 1997).

Structural homology between Add‐1 of ArgRS and the C‐terminal domain of GAPDH was only revealed by the program SARF (Alexandrov, 1996). A monomer of GAPDH can be divided schematically into two domains, an N‐terminal domain which contains the Rossmann fold (residues 1–140) and binds the cofactor nicotinamide adenine dinucleotide, and a C‐terminal domain (residues 149–331) built around a twisted six‐stranded antiparallel β‐sheet. Structural superposition reveals that residues 155–177 and residues 237–317 of GAPDH have the same topology as Add‐1 of ArgRS (r.m.s. deviation over 67 Cα residues of 2.8 Å). This correlation gives a bona fide class I aaRS structure to monomeric GAPDH: a nucleotide‐binding site and a putative RNA‐binding domain. This may explain the observation that GAPDH was characterized as a tRNA‐binding protein which may participate in tRNA export (Singh and Green, 1993).

The spatial arrangement of the secondary structural elements of Add‐1, but with different connectivities, is also similar to that of two other RNA‐binding proteins: the small nuclear ribonucleoprotein U1A protein (Nagai et al., 1990; Oubridge et al., 1994) and ribosomal protein S6 (Lindahl et al., 1994).

Anticodon binding

According to our model, the protein–tRNA interface would involve residues from helices H21 and H22, the loops surrounding those two helices and the exposed face of the antiparallel β‐sheet of the Add‐1 domain. Add‐1 would contribute to the recognition of the D loop and the anticodon arm. This is supported by the distribution of the electrostatic potential on the solvent‐accessible surface of ArgRS, which is predominantly positive for domain Add‐1 (Figure 6). It also agrees with solution studies which have shown that A20 and C35 are the nucleotides responsible for the arginine identity in E.coli tRNAArg (McClain and Foss, 1988; Schulman and Pelka, 1989; McClain et al., 1990; Tamura et al., 1992; Saks et al., 1998).

Figure 6.

Docking model of tRNA binding to ArgRS. The molecular surface of the protein showing the electrostatic potential was calculated with GRASP (Nicholls and Honig, 1991). Negatively charged regions are in red and positively charged areas in blue. The tRNA is rendered as a green coil. The orientation of the ArgRS molecule corresponds to that of Figure 1A.

Superposing the Rossmann folds of ArgRS, GlnRS and GluRS clearly emphasizes differences in the spatial position of the anticodon‐binding domain of ArgRS relative to the two other enzymes. Looking in a direction parallel to the plane of the Rossmann fold and perpendicular to the strand direction, with the C‐termini of the strands on the right hand side, the anticodon‐binding domains of GluRS and GlnRS are located on the right hand side, while the anticodon‐binding domain of ArgRS is located on the left hand side (Figure 3). The two positions are separated by ∼38 Å. The structural feature determining these alternative locations seems to be related to the orientation of helix H17–H18 relative to strand S14. This different positioning can be accommodated easily by a rotation of the tRNA along an axis parallel to the helical axis of the acceptor arm of the tRNA.

It is remarkable that MetRS, with a related anticodon‐binding domain, has not retained a domain similar to Add‐1 for helping in tRNA selection. The extra lock used by ArgRS may also be correlated to the numerous anticodons recognized by ArgRS (six codons) in comparison with MetRS (one codon). The three anticodon bases of tRNAMet are identity determinants for the aminoacylation reaction by MetRS, which means that MetRS strongly discriminates its cognate tRNA at the anticodon level (Muramatsu et al., 1988; Schulman and Pelka, 1988; Meinnel et al., 1991a,b). In contrast, arginine identity at the anticodon level depends mainly on nucleotide C35. Several residues exposed at the surface of the helices seem to be good candidates for nucleic acid recognition and will be probed by site‐directed mutagenesis. The crucial role of helices H17–H18 and H22 in tRNA recognition has already been probed by in vivo selection of enzyme mutations lethal for cell growth; seven out of 29 such mutations map to those two helices. A detailed structural analysis of these mutations will be published elsewhere.

Materials and methods

Gene expression and protein purification

The gene encoding yeast cytoplasmic ArgRS was isolated from genomic DNA by PCR amplification. The two oligonucleotides used for the experiment were designed in order to clone the gene (termed RRS1) into the NcoI site of the pTrc99 vector behind the strong trc promoter (Aman et al., 1988). This construct led to the expression of a non‐fusion protein with the authentic amino acid sequence. The E.coli strain TBI [F araΔ(lac‐proAB) hsdR (rk‐ mk+) rpsL(Strr) (φ80, dlacΔ(lacZ)M15] was transformed by the resulting vector pTrc99‐RRS1. The moderate level of RRS1 expression was improved by co‐transformation with pSBETa, a vector carrying the E.coli argU gene (Schenk et al., 1995). This gene encodes the rare tRNA4Arg decoding the AGA and AGG arginine codons rarely used in E.coli, but used at much higher frequencies in eukaryotic coding sequences (72% of the arginine codons are AGG or AGA codons in RRS1). The presence of the ‘helper’ tRNA in the E.coli cells increases the level of expression of RRS1 and the proportion of soluble ArgRS. Gene expression and protein purification then followed a protocol already published (Sissler et al., 1997).
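The proportion of AGA and AGG codons among the arginine codons of a coding sequence can be counted as in the following minimal sketch. It assumes an in‐frame DNA (or RNA) sequence supplied as a plain string; the function name and the toy input used to test it are illustrative and are not the RRS1 sequence.

```python
# The six arginine codons of the standard genetic code (DNA alphabet).
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
# The two arginine codons decoded by the rare tRNA encoded by argU.
RARE_IN_ECOLI = {"AGA", "AGG"}


def rare_arg_fraction(cds):
    """Fraction of arginine codons in an in-frame sequence that are AGA/AGG."""
    cds = cds.upper().replace("U", "T")
    # Split into codons, ignoring any incomplete codon at the 3' end.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    arg = [c for c in codons if c in ARG_CODONS]
    if not arg:
        return 0.0
    return sum(c in RARE_IN_ECOLI for c in arg) / len(arg)
```

Applied to a full open reading frame, a high fraction flags a gene whose expression in E.coli may benefit from a helper tRNA of the kind described above.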

Crystallization and data collection

Initial crystallization conditions were found by screening different precipitating agents for ArgRS alone and in the presence of the other substrates (l‐arginine and ATP). Crystals initially were only obtained in the presence of l‐arginine and ATP. The best crystals were grown at 17°C by the hanging drop vapour diffusion method against a reservoir containing 2.45 M ammonium sulfate pH 7, 100 mM Tris–HCl buffer pH 7.0. For the final setup, 4 μl of reservoir solution were mixed with 4 μl of the protein solution. This protein solution contained 13.5 mg/ml ArgRS, 5 mM l‐arginine, 5 mM ATP, 10 mM MgSO4 in 50 mM Tris–HCl buffer pH 7. Crystals grew after a few days. They belong to space group P43212 with unit cell constants a = b = 100.347 Å, c = 204.34 Å and with one molecule of ArgRS in the asymmetric unit, resulting in a solvent content of 63%. X‐ray diffraction data were collected at 100 K on a 30 cm Mar‐Research image plate at the station BW7B of the EMBL Hamburg outstation at DESY (λ = 1.1024 Å). The data collection statistics are presented in Table I. Each data set was collected from a single frozen crystal. The crystals were prepared for cryocooling by transferring them first into solutions of mother liquor containing 20% glycerol for 2 min and then plunging them into liquid ethane; they were inserted just before data collection into a nitrogen gas flow at 100 K. The native data set between 30 and 2.75 Å was obtained from 215 frames of 0.5° oscillations, split into two runs in order to collect a complete data set at low and high resolution (125 frames with a crystal–image plate distance of 350 mm, and 90 frames with a crystal–image plate distance of 450 mm). The native data set was 99.4% complete in the resolution range 30–2.75 Å. Data for two heavy atom derivatives, PCMBS (p‐chloromercuribenzene sulfonate) and K2Hg(CN)4 (potassium tetracyano mercurate), were collected and used for the phasing process. For PCMBS, the anomalous signal was measured by collecting two sets of 100 frames (0.5° oscillation) separated by a rotation angle of 180°. Data were processed with the programs DENZO and SCALEPACK (Otwinowski and Minor, 1997).
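A solvent content of this kind is commonly estimated from the unit cell through the Matthews coefficient. The sketch below is a minimal illustration, not the authors' calculation. The molecular mass is not given in this section and is therefore passed in as an assumed parameter, the constant 1.23 is the value commonly used for an average protein, and Z = 8 follows from one molecule per asymmetric unit in P43212, which has eight general positions.

```python
def solvent_fraction(a, b, c, z, mass_da, protein_constant=1.23):
    """Matthews coefficient (A^3/Da) and estimated solvent fraction.

    a, b, c: cell edges in angstroms for a cell with orthogonal axes
    (tetragonal here); z: molecules per unit cell; mass_da: molecular
    mass in daltons (an assumed input, not taken from the text).
    """
    cell_volume = a * b * c            # valid only for orthogonal axes
    vm = cell_volume / (z * mass_da)   # Matthews coefficient
    return vm, 1.0 - protein_constant / vm
```

With the cell constants given above, `solvent_fraction(100.347, 100.347, 204.34, 8, mass_da)` returns the Matthews coefficient and the estimated solvent fraction. The result depends on the value supplied for `mass_da` and on the protein constant, so it need not match the published 63% exactly.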


Heavy atom‐binding sites were determined by difference Patterson and difference Fourier maps. The two derivatives have three heavy atom sites which correspond to the three cysteine residues of yeast ArgRS. Refinement of the heavy atom parameters was performed by the maximum likelihood approach as coded in the program MLPHARE (CCP4, 1994). The calculated phases gave an overall figure of merit of 0.51 for data between 30 and 3.75 Å resolution. The anomalous signal of the PCMBS derivative allowed the determination of the correct space group (P43212 versus P41212) and the correct hand of the heavy atom derivatives. An initial 3.8 Å map was improved by solvent flattening, using a solvent content of 60%, solvent flipping and histogram matching using the programs SOLOMON (CCP4, 1994) and DM (Cowtan, 1994). The calculated phases were then extended to 3.5 Å resolution using SOLOMON. All crystallographic calculations were carried out with the CCP4 package (CCP4, 1994).

Model building and refinement

The modified 3.5 Å MIRAS map allowed chain tracing for most parts of the polypeptide chain, and a polyalanine model of the ArgRS molecule was built using the program O (Jones et al., 1991). Phases obtained by combining the MIRAS phases with those derived from the polyalanine model using the program SIGMAA (Read, 1986) were then modified at 3.5 Å by solvent flattening. This phase modification process combined with model building was repeated several times. The amino acid sequence could be fitted for most of the chain to give an initial model which included 603 residues. The model was refined with the program CNS (Brünger et al., 1998), using the Engh and Huber stereochemical parameters (Engh and Huber, 1991). Initially, only 3.2 Å resolution data were included for torsion angle molecular dynamics refinement using a cross‐validated maximum likelihood crystallographic target. Inclusion of the bulk solvent correction, as implemented in CNS, allowed the use of all low resolution collected data. The crystallographic R‐factor for the starting model was 38.8%. A random sample containing 7.5% of the data was excluded from the refinement and used for monitoring the course of the refinement (Brünger, 1992). Torsion angle refinement at 3.2 Å was followed by rounds of model building in cross‐validated σ‐weighted maps with coefficients (2Fo−Fc) and (3Fo−2Fc) (Kleywegt and Jones, 1994; Kleywegt and Brünger, 1996) and by torsion angle refinement in which the resolution of the data was increased gradually. In the final stage, cartesian coordinate refinement was followed by individual B‐factor refinement. All rebuilding and graphics operations were done with O and related Uppsala programs. At every stage, models resulting from a refinement round were subjected to critical quality analyses, using the programs O, OOPS (Kleywegt and Jones, 1996), PROCHECK (Laskowski et al., 1993) and WHATIF (Vriend, 1990).
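The cross‐validation scheme described above can be sketched as follows: a random subset of reflections is set aside, and the conventional R‐factor is evaluated separately on the working and free sets. This is a simplified illustration and not the CNS implementation. It assumes structure factor amplitudes that are already on a common scale, and the function names and the random seed are illustrative choices.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes the two amplitude lists are paired and already scaled.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.075, seed=0):
    """Return (work_indices, free_indices) for a random cross-validation split."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    # The first n_free shuffled indices form the free (test) set.
    return sorted(indices[n_free:]), sorted(indices[:n_free])
```

Evaluating `r_factor` on the working indices gives the crystallographic R‐factor, and evaluating it on the free indices, which are never used in refinement, gives the free R value used to monitor the course of the refinement.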

The refined model

The current model contains one ArgRS molecule of 603 residues, one l‐arginine substrate and 280 water molecules. The quality of the refined structure was assessed using the Biotech validation suite for protein structures (Wodak et al., 1995). The crystallographic R‐factor is 19.7% using all reflections between 20 and 2.75 Å with no sigma cut off (Rfree = 26.9%). The stereochemistry of the model was inspected by PROCHECK (see Table I for detailed analysis). The average B‐factor of the model is 53.3 Å2 except for two surface loops (residues 235–242 and residues 252–261) where B‐factors exceed 100 Å2. This is in agreement with the overall B‐factor determined by Wilson plot on the collected data (B = 43 Å2). The high B‐factor may be explained by the high solvent content in the crystal and the few packing interactions between the molecules in the crystal lattice. The atomic coordinates have been deposited at the Brookhaven Protein Data Bank.


We thank Professor Axel T.Brünger (Yale University) for allowing us to use a pre‐release version of the CNS software system and Dr John G.Arnez for careful reading of the manuscript. We thank the staff of the EMBL Outstation at DESY for use of their synchrotron instrumentation (beam line BW7B). This work was supported by grants from CNRS and by EEC contracts.

