Crystal structure of HIV‐1 reverse transcriptase in complex with a polypurine tract RNA:DNA

Stefan G. Sarafianos1, Kalyan Das1, Chris Tantillo1, Arthur D. Clark Jr1, Jianping Ding1, Jeannette M. Whitcomb2, Paul L. Boyer3, Stephen H. Hughes3 and Edward Arnold*,1

1 Center for Advanced Biotechnology and Medicine (CABM) and Rutgers University Chemistry Department, 679 Hoes Lane, Piscataway, NJ, 08854‐5638, USA
2 ViroLogic, Inc., 270 E. Grand Avenue, S. San Francisco, CA, 94080, USA
3 HIV Drug Resistance Program, NCI‐Frederick Cancer Research and Development Center, PO Box B, Frederick, MD, 21702‐1201, USA
*Corresponding author. E-mail: arnold{at}


We have determined the 3.0 Å resolution structure of wild‐type HIV‐1 reverse transcriptase in complex with an RNA:DNA oligonucleotide whose sequence includes a purine‐rich segment from the HIV‐1 genome called the polypurine tract (PPT). The PPT is resistant to ribonuclease H (RNase H) cleavage and is used as a primer for second DNA strand synthesis. The ‘RNase H primer grip’, consisting of amino acids that interact with the DNA primer strand, may contribute to RNase H catalysis and cleavage specificity. Cleavage specificity is also controlled by the width of the minor groove and the trajectory of the RNA:DNA, both of which are sequence dependent. An unusual ‘unzipping’ of 7 bp occurs in the adenine stretch of the PPT: an unpaired base on the template strand takes the base pairing out of register and then, following two offset base pairs, an unpaired base on the primer strand re‐establishes the normal register. The structural aberration extends to the RNase H active site and may play a role in the resistance of PPT to RNase H cleavage.


HIV‐1 reverse transcriptase (RT) is a multifunctional enzyme that is responsible for copying the single‐stranded viral RNA genome into double‐stranded DNA (Telesnitsky and Goff, 1997). RT contains a DNA polymerase that can copy either an RNA or DNA template and a ribonuclease H (RNase H) activity that cleaves the RNA strand in RNA:DNA hybrids. In addition to degrading the RNA genome after it has been copied to DNA, the RNase H cleavages define the ends of the double‐stranded genome that are the substrates for integration into the host genome. In vivo studies demonstrate that inactivation of RNase H results in non‐infectious virus particles (Tanese and Goff, 1988; Schatz et al., 1989).

HIV‐1 RT is a heterodimer consisting of p66 and p51 subunits. Both subunits are derived from a gag‐pol polyprotein, which is cleaved by the viral protease. The two subunits have a common N‐terminus; p51 lacks the C‐terminal RNase H domain present in p66. Crystal structures of HIV‐1 RT complexes with DNA:DNA have been determined in the presence (Huang et al., 1998) and absence (Jacobo‐Molina et al., 1993; Ding et al., 1998) of a bound dNTP. While both subunits contain fingers, palm, thumb and connection subdomains, the arrangement of the subdomains in the two subunits is very different (Kohlstaedt et al., 1992; Jacobo‐Molina et al., 1993). The crystal structure of the binary complex of HIV‐1 RT and a DNA:DNA substrate showed that the DNA:DNA substrate is bent by ∼40° (Jacobo‐Molina et al., 1993; Ding et al., 1998). Near the polymerase active site the duplex adopts A‐form geometry; near the RNase H active site the duplex adopts B‐form geometry (Ding et al., 1997). The structure of the isolated RNase H domain of HIV‐1 RT was solved (Davies et al., 1991) before the structure of the intact HIV‐1 RT was known. The HIV‐1 RNase H domain has a structure that is very similar to the RNase HI of Escherichia coli (Katayanagi et al., 1990; Yang et al., 1990) and of Thermus thermophilus (Ishikawa et al., 1993), but none of the RNases H has been cocrystallized with RNA:DNA, their natural substrate, and there are no published structures of HIV‐1 RT in complex with an RNA:DNA duplex.

Polypurine tract in retroviral replication

Viral DNA synthesis (see Figure 1) is initiated from a cellular tRNA base paired to the genome of HIV‐1 at the primer binding site (PBS). As the minus (−) DNA strand is synthesized, the RNA strand is digested by RNase H. The PBS is near the 5′ end of the genome. Degradation of the RNA by RNase H allows DNA synthesis to be transferred to the 3′ end of the RNA. After strand transfer, (−) strand synthesis can continue, accompanied by RNase H degradation of the RNA genome, but the degradation is not complete. The purine‐rich polypurine tract (PPT) is resistant to RNase H cleavage and serves as the primer for plus (+) strand synthesis (reviewed in Telesnitsky and Goff, 1997). The PPT sequence is just 5′ of U3 (Figure 1). Removal of the PPT primer by RNase H defines the left end of the upstream long terminal repeat (LTR) (Figure 1E), which, together with the downstream LTR, is the substrate for the viral integrase enzyme that inserts the linear viral DNA in the host genome. Unlike many retroviruses that have only one PPT sequence, HIV‐1 has a second copy of the PPT (central PPT) located near the center of the genome (Charneau et al., 1992). Mutations replacing purines by pyrimidines in the HIV‐1 and TY1 central PPTs, which do not modify amino acid sequence, slow down viral growth (Charneau et al., 1992; Hungnes et al., 1992; Heyman et al., 1995), suggesting that the central PPT is important, but not necessary for replication of retroviruses. While the sequence of multiple copies of PPTs is identical, it is likely that the relative importance of a PPT is determined by the neighboring sequences that flank different copies of PPT.

Figure 1.

Process of reverse transcription of the HIV‐1 genome. (A) Minus strand DNA synthesis (DNA strand in red) is initiated using a cellular tRNA annealed to the PBS. The RNA strand of the RNA:DNA duplex is degraded by RNase H of HIV‐1 RT. (B) First strand transfer allows annealing of the newly formed DNA to the 3′ end of the viral genome. Transfer is mediated by identical repeated (R) sequences. (C) Minus strand DNA synthesis resumes, accompanied by RNase H digestion of all template RNA except PPT. (D) PPT is used as a primer for second strand DNA synthesis. (E) RNase H removes the tRNA and the PPT. In HIV‐1, a single RNA nucleotide (from tRNA) is left by RNase H at the RNA/DNA PBS junction. (F) During second strand transfer (not shown) the newly formed PBS DNA (second strand) anneals to the PBS DNA from the first strand. Completion of second strand synthesis results in a linear DNA duplex with LTRs at both ends.

There are at least three requirements for the end of the viral genome to be synthesized correctly. First, the PPT RNA must be resistant to cleavage by RNase H during (−) strand DNA synthesis (Figure 1C). Secondly, RNase H cleavage must occur precisely at the end of the PPT to generate the correct primer for the proper initiation of (+) strand DNA synthesis (Figure 1C). Thirdly, after the PPT primer has been used to initiate DNA synthesis, it must be precisely removed from the end of the viral DNA (Figure 1E).

These three requirements have been extensively studied using biochemical methods in the HIV‐1, avian sarcoma leukosis virus (ASLV) and Moloney murine leukemia virus (MuLV) systems (reviewed in Telesnitsky and Goff, 1997). The PPT is relatively resistant to RNase H degradation in vitro, although cleavages within the PPT can occur (Champoux et al., 1984; Resnick et al., 1984; Rattray and Champoux, 1989; Wöhrl and Moelling, 1990; Fuentes et al., 1995; Gao et al., 1998, 1999). There is evidence to suggest that both RT and the PPT are important for determining the specificity of cleavage and for controlling plus strand priming: (i) mutations at the primer grip or thumb subdomain of RT dramatically affect the ability of the enzyme to cleave specifically at the 3′ end of the PPT (Ghosh et al., 1997; Palaniappan et al., 1997; Powell et al., 1997, 1999; Gao et al., 1998); (ii) the isolated RNase H domain of MuLV RT exhibits different cleavage specificity of PPT than the intact MuLV RT (Zhan and Crouch, 1997); (iii) with the exception of a few changes at the G‐rich 3′ end of PPT, most single base mutations of the PPT sequence can be tolerated without altering the specificity of cleavage and plus strand priming by HIV‐1 RT (Huber et al., 1989; Rattray and Champoux, 1989; Luo et al., 1990; Wöhrl and Moelling, 1990; Pullen et al., 1993; Powell and Levin, 1996); and finally, (iv) the NMR structure of an 8 bp RNA:DNA oligonucleotide containing the last four residues of PPT (with a mutation of one G to A) and the first 4 bp of U3, shows that the width and shape of the major groove is unusual (Fedoroff et al., 1997). Despite the considerable body of experimental data, however, the molecular details of PPT recognition remain unclear both in terms of the generation of the primer and its removal.

Mechanism and specificity of RNase H cleavage

NMR structural studies (Fedoroff et al., 1993; Lane et al., 1993) have suggested that RNA:DNA duplexes are neither A‐ nor B‐form structures in solution. This led to the hypothesis that RNase H distinguishes DNA:RNA and RNA:RNA duplexes by recognizing differences in the width of the minor groove, and suggested that a minor groove width of ∼9–10 Å should be optimal for efficient recognition by RNase H (Fedoroff et al., 1993, 1997; Salazar et al., 1994; Zhu et al., 1995; Horton and Finzel, 1996; Han et al., 1997; Bachelin et al., 1998; Szyperski et al., 1999). In some cases, however, RNase H can cleave single‐stranded RNA adjacent to the RNA:DNA duplex region, albeit with low efficiency (Gao et al., 1997; Lima and Crooke, 1997). Furthermore, under certain conditions the RNases H of both MuLV RT and HIV‐1 RT can cleave an RNA:RNA substrate (Ben‐Artzi et al., 1992; Blain and Goff, 1993; Smith and Roth, 1993; Hostomsky et al., 1994; Götte et al., 1995). Finally, HIV‐1 RT can cleave chimeric hybrid duplexes (RNA‐DNA annealed to DNA) at the RNA/DNA junction where the minor grooves tend to be very narrow [as small as 4.5–5.5 Å (Szyperski et al., 1999)] and bent (Salazar et al., 1994; Szyperski et al., 1999). These data suggest that the specificity of RNase H cleavage does not depend solely on the width of the minor groove (Szyperski et al., 1999).

In an effort to study the interactions of HIV‐1 RT with RNA:DNA, to discern the molecular details of PPT recognition and to understand better the mechanism and specificity of RNase H cleavage, we determined the 3.0 Å crystal structure of HIV‐1 RT in complex with a PPT‐containing RNA:DNA substrate and the Fab fragment of a monoclonal antibody. The PPT‐containing RNA:DNA oligonucleotide (r31:d29) was bound to HIV‐1 RT with the 3′ end of DNA at the polymerase active site and the middle part of the PPT (defined as a stretch of rAs) near the RNase H active site (Figure 2). This binding mode was designed to investigate the inefficient cleavage of the PPT by HIV RNase H. This complex has extensive protein–nucleic acid interactions, and the nucleic acid has unusual structural features that provide a basis for understanding why the PPT is resistant to degradation by HIV‐1 RNase H.

Figure 2.

Top: HIV genome sequence at the PPT (underlined) and U3 region. The minus strand synthesis initiation site is marked with an asterisk; +1 is the first nucleotide of U3. Bottom: sequence of the RNA:DNA oligonucleotide in our RT–RNA:DNA complex.

Results and discussion

Overall structure of the HIV‐1 RT–RNA:DNA complex

The overall conformation of the protein in the HIV‐1 RT–RNA:DNA complex (Figure 3) is similar to that in the HIV‐1 RT–DNA:DNA complex. There are extensive interactions between the nucleic acid and amino acids of all subdomains of the p66 subunit; there are also interactions between the nucleic acid and the fingers and connection subdomains of the p51 subunit (Figures 3 and 4). The nucleic acid binding cleft is ∼60 Å in length, extending from the polymerase active site to the RNase H active site. The distance in nucleotides between the polymerase and RNase H catalytic sites in this structure is 18 bp, which agrees with past biochemical studies with HIV‐1 RT (Schatz et al., 1990; Wöhrl and Moelling, 1990; Gopalakrishnan et al., 1992; Ghosh et al., 1995; Götte et al., 1995; Gao et al., 1998, 1999). In HIV‐1 RT complexes containing double‐stranded DNA template‐primers, the distance between the polymerase and RNase H active sites is 17 bp (Jacobo‐Molina et al., 1993; Huang et al., 1998). This nucleic acid‐dependent difference in the distance between the polymerase and RNase H active sites agrees with previous biochemical studies (Götte et al., 1998). The final model of RNA:DNA template‐primer includes 45 nucleotides (23 template, 22 primer) encompassing more than two full helical turns of RNA:DNA. The major groove is exposed to the solvent, as might be expected for a non‐specific DNA binding protein. The electron density of the nucleic acid is stronger in regions where there are extensive interactions with the enzyme, including the polymerase and RNase H active sites. Simulated annealing omit maps for nucleotides in key locations are shown in Figure 5.

Figure 3.

Stereo view of a ribbon representation of the structure of HIV‐1 RT in complex with the polypurine RNA:DNA. The fingers, palm, thumb, connection and RNase H subdomains of p66 are colored blue, red, green, yellow and orange, respectively. The p51 subunit is colored gray. The RNA template and DNA primer strands are shown in magenta and blue, respectively.

Figure 4.

The sequence and numbering scheme of the RNA:DNA PPT and the interactions between the nucleic acid and amino acid residues of HIV‐1 RT (≤3.8 Å). The RNA (orange) and DNA (cyan) strands are designated Tem and Pri, respectively. The nucleotide site positions are labeled with ascending numbers from the polymerase domain toward the RNase H domain. Amino acids of the p51 subunit are designated by an asterisk following the residue number; all others are in p66. RNase H nucleotide site positions are designated positive (+1 to +4) for positions 3′ to, and negative (−1 to −9) for positions 5′ to, the scissile phosphate, where the 3′ and 5′ orientations are for the RNA strand. Hydrogen bonds are shown in red dashed lines and other types of interaction are shown in solid black lines. 2′‐OH groups of RNA and phosphate groups are shown in red and gray spheres. Weakly paired (distance ≥3.6 Å), mismatched and unpaired bases are shown filled with stripes, spheres and empty, respectively. Residues Gly359 and Ala360 of the RNase H primer grip interact with the nucleic acid through their main‐chain atoms. Arg284 was modeled as Ala because of weak density for the side chain. N474 interacts with Pri15‐Thy through a water molecule (not shown).
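The ≤3.8 Å criterion used in this legend amounts to a simple distance filter over protein and nucleic acid atom pairs. The following is a minimal sketch of such a filter, not the procedure actually used to prepare Figure 4; the function name, atom labels and coordinates are hypothetical and were invented purely for illustration.

```python
import math

CONTACT_CUTOFF = 3.8  # Å, the threshold quoted in the Figure 4 legend


def find_contacts(protein_atoms, nucleic_atoms, cutoff=CONTACT_CUTOFF):
    """Return (protein_label, nucleic_label, distance) for atom pairs within cutoff.

    Each atom is a (label, (x, y, z)) tuple with coordinates in Å.
    """
    found = []
    for p_label, p_xyz in protein_atoms:
        for n_label, n_xyz in nucleic_atoms:
            d = math.dist(p_xyz, n_xyz)  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                found.append((p_label, n_label, round(d, 2)))
    return found


# Hypothetical atoms and coordinates (NOT taken from the crystal structure)
protein = [("protein atom P1", (0.0, 0.0, 0.0)),
           ("protein atom P2", (10.0, 0.0, 0.0))]
nucleic = [("nucleic atom N1", (3.0, 0.0, 0.0)),
           ("nucleic atom N2", (10.0, 5.0, 0.0))]

result = find_contacts(protein, nucleic)
print(result)
```

With these made-up coordinates only the P1/N1 pair (3.0 Å apart) passes the cutoff; the other pairs are 5 Å or more apart.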

Figure 5.

Simulated annealing (Fo − Fc) omit electron density maps contoured at the 2σ level at the polymerase active site (1) (omitting nucleic acid) and of the unpaired residue of template (2) (omitting unpaired residue Tem‐15‐Ade).

RT has similar interactions with the DNA primer strand in RT–RNA:DNA and RT–DNA:DNA structures

While the contacts between RT and the DNA primer strands are very similar in the RT complexes with RNA:DNA and DNA:DNA template‐primers, the contacts with the RNA template (Figures 3 and 4) are different from those with the DNA template. However, there are relatively modest changes in the protein structure.

As seen in the RT–DNA:DNA structure, many of the RT contacts with the nucleic acid in the RT–RNA:DNA complex involve the sugar–phosphate backbone, consistent with the fact that RT can copy a wide variety of different templates. RT has numerous interactions with 2′‐OH groups of the RNA template in the RT–RNA:DNA complex. Such interactions (indicated in magenta in Figure 6) include residues 280 and 284 of helix I of the p66 thumb, residues of the template grip including 89 and 91 of the p66 palm and residues of the RNase H domain (Figures 4 and 6). The more extensive contacts between RT and RNA:DNA versus DNA:DNA may account for the increased polymerization activity and processivity of the enzyme with RNA templates.

Figure 6.

Molecular surface representation of HIV‐1 RT showing the nucleic acid binding cleft and the RNase H primer grip. Residues colored in cyan or magenta are amino acids within 3.8 Å of the 2′‐OH of RNA template nucleotides (magenta) or any other part of nucleic acid (cyan). The RNA template is shown as red ribbon and the DNA primer in blue ribbon. Minor groove widths proximal to the thumb area or at the RNase H active site are indicated (∼10 and ∼8 Å, respectively). The trajectory and minor groove width of a hypothetical RNA strand that can be cleaved efficiently by RNase H are shown in red. The RNase H primer grip region is shown in ball and stick representation in the figure inset.

There is also an increased involvement of the p51 subunit in binding the RNA:DNA relative to the DNA:DNA template‐primer. Although residues Lys395 and Glu396 of the p51 subunit interact with both the DNA:DNA and RNA:DNA duplexes (with Pri10‐Ade and Pri11‐Ade, Figure 4), residues Lys22 and Lys390 of p51 interact only with the RNA:DNA duplex (with the phosphates of Tem4‐Gua and Tem16‐Ade).

Nucleic acid geometry

The nucleic acid geometry was analyzed using CURVES (Lavery and Sklenar, 1988) and SCHNAP (Lu et al., 1997). Both programs yielded similar results. Despite the difference in the nature of the duplex (RNA:DNA versus DNA:DNA), the sequence (PPT versus PBS) and the length (31:29 versus 19:18), the overall conformations of these two nucleic acids are remarkably similar. Both have a bend of ∼40° with the helical curvature occurring smoothly over bp 5–9 from the polymerase active site (Jacobo‐Molina et al., 1993; Ding et al., 1998) (Figure 7). This bend is a hallmark of nucleic acids bound to a variety of polymerases and is associated with a transition from A‐ to B‐form geometry.

Figure 7.

Stereo view of structures of the nucleic acid template‐primers in the RT–DNA:DNA (Ding et al., 1998) and RT–RNA:DNA complexes. The 19mer DNA and 31mer RNA templates are shown in yellow and magenta, respectively. The 18mer DNA and 29mer DNA primers are cyan and blue. Region I contains the 4 bp near the polymerase active site. Region II consists of the next 4 bp at the bend of the nucleic acid. The next 5 bp compose region III, followed by region IV that contains residues of the ‘unzipped’ part of PPT.

The helical parameters of the RNA:DNA duplex bound to HIV‐1 RT were compared with values for canonical A‐ and B‐form DNA:DNA and A‐form RNA:RNA duplexes and with other RNA:DNA hybrids (Table I). The values obtained for the RNA:DNA in complex with RT were averaged within four separate regions of the nucleic acid. Region I contains the 4 bp near the polymerase active site. Region II consists of the next 4 bp at the bend of the nucleic acid. Region III includes the next 5 bp, followed by region IV, which contains residues of the ‘unzipped’ portion of the PPT (see below). The parameters that define nucleic acid conformation (Table I) suggest that none of the four regions has canonical A‐ or B‐type geometry. However, the geometry of region I is significantly closer to that of A‐form than the other regions. Regions II–IV have a conformation closer to an intermediate between A‐ and B‐form, consistent with the suggestion of Arnott et al. (1986) that RNA:DNA duplexes adopt ‘H‐form’ geometry. The H‐form conformation is characterized by values for the inclination of the base pairs with respect to the helical axis, the dislocation of the base pairs from the helix axis (Xdisp) and the helical rise, which are intermediate between those of canonical A‐ and B‐form helices (Table I). NMR studies of RNA:DNA duplexes in solution showed that unliganded RNA:DNA duplexes adopt H‐form geometry (Fedoroff et al., 1993; Lane et al., 1993). The width of the minor groove of the RNA:DNA (PPT) varies between canonical A‐ and B‐forms (Figure 8), however, and it is closer to B‐form in the stretch of rA:dTs (region IV, average minor groove width 7 Å). 
This is similar to the minor groove width in B‐DNA (6 Å) determined by X‐ray diffraction of DNA fibers (Chandrasekaran et al., 1989), and markedly lower than the average minor groove width in the RNA:DNA duplexes whose structures have been solved by NMR [∼9 Å, (Arnott et al., 1986; Fedoroff et al., 1993; Lane et al., 1993; Han et al., 1997)] or crystallography (8.7–10.5 Å) (Horton and Finzel, 1996), as shown in Figure 8. A purine‐rich sequence with only two consecutive As (RNA:DNA 5′‐GAAGAAGAA:CTTCTTCTT) had a considerably wider minor groove (9.4–10.1 Å) (Xiong and Sundaralingam, 1998), suggesting that the narrowing of the minor groove in 5′‐oligo(rA): 3′‐oligo(dT) tracts of RNA:DNA follows the same rules as in DNA:DNA [5′‐oligo(dA):3′‐oligo(dT) tracts], i.e. maximal narrowing of the minor groove requires at least four consecutive As. The narrow minor groove we report here for the rA:dT tract of the PPT is remarkably close to the minor groove narrowing predicted by Dickerson and coworkers (Han et al., 1997).

Figure 8.

Variation in minor groove width of the RNA:DNA template‐primer. The four regions of nucleic acid are defined in the legend of Figure 7. The minor groove width values for canonical A‐ and B‐type DNA are 11 and 6 Å, respectively.
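As a rough aid to reading these values, a measured minor groove width can be placed on a scale between the canonical A‐form (11 Å) and B‐form (6 Å) widths quoted in this legend. The sketch below does this by linear interpolation; the linear scale and the function names are assumptions made for illustration and are not part of the analysis performed with CURVES or SCHNAP.

```python
A_FORM_WIDTH = 11.0  # Å, canonical A-form minor groove width (Figure 8 legend)
B_FORM_WIDTH = 6.0   # Å, canonical B-form minor groove width (Figure 8 legend)


def closer_form(width):
    """Report which canonical width a measured value is nearer to."""
    to_a = abs(width - A_FORM_WIDTH)
    to_b = abs(width - B_FORM_WIDTH)
    if to_a == to_b:
        return "intermediate"
    return "A" if to_a < to_b else "B"


def b_form_fraction(width):
    """Linear position between A-form (0.0) and B-form (1.0); an assumed scale."""
    return (A_FORM_WIDTH - width) / (A_FORM_WIDTH - B_FORM_WIDTH)


# 7 Å is the average width reported in the text for region IV (the rA:dT stretch);
# 9 Å is the approximate width reported for RNA:DNA duplexes solved by NMR.
for label, width in [("region IV", 7.0), ("NMR hybrids", 9.0)]:
    print(label, closer_form(width), round(b_form_fraction(width), 2))
```

On this scale the 7 Å average of region IV lies four‐fifths of the way from A‐form towards B‐form, whereas a ∼9 Å groove lies nearer the A‐form value.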

Table I. Nucleic acid parameters

The geometry of the RNA:DNA hybrid in complex with RT is considerably less uniform than the geometry of unliganded RNA:DNA duplexes, either in solution or in crystals. The variation in the width of the minor groove of the protein‐bound RNA:DNA complex (up to 4.5 Å) (Figure 8) is considerably larger than that of free RNA:DNA hybrids (typically 0.5–2 Å), suggesting that the structure of the RNA:DNA duplex is significantly affected by the contacts with the enzyme. The effects of RT are manifested in several ways: (i) the RNA:DNA template‐primer has a bend near the RT polymerase active site that is similar to the bend in complexes of RT and a DNA:DNA duplex; (ii) both the RNA:DNA (PPT) and DNA:DNA (PBS) duplexes have similar A‐form geometry near the RT polymerase active site, where the majority of the protein–nucleic acid contacts occur; and (iii) there are structural irregularities (discussed below in more detail) in the RNA:DNA (PPT) that may be either caused or stabilized by specific contacts with the enzyme (Figure 4).

Role of A‐tracts in the structure of PPT

A‐tracts [in this case, stretches of four or more consecutive 5′‐oligo(dA):3′‐oligo(dT) (dA:dT) or 5′‐oligo(rA): 3′‐oligo(dT) (rA:dT)] have long been known to display several unusual features: they are straight (Han et al., 1997) and have a narrow minor groove and a large propeller twist, presumably because there are only two rather than three hydrogen bonds between the bases. A‐tracts (dA:dT) are highly resistant to reconstitution around nucleosome cores (Rhodes, 1979) and interact poorly with TATA binding protein, which requires an ∼80° bend at the binding site (Kim et al., 1993). Integration host factor (IHF), a small protein that specifically recognizes an A‐tract (dA:dT) by binding the phosphates of the narrow minor groove, appears to recognize an A‐rich (dA:dT) DNA duplex by structure rather than by base‐specific contacts (Rice et al., 1996).

It is possible that special structural features of A‐tracts (rA:dT) are recognized by HIV RT and affect the specificity of RNase H cleavage. While A‐tracts in DNA (dA:dT) are unbent, bends are commonplace at junctions between A‐tracts (dA:dT) and a dG:dC base pair (Dickerson et al., 1996). A dG:dC/dA:dT junction has been reported to be a flexible hinge, capable of adopting either a straight or a bent conformation under the influence of local forces (Dickerson et al., 1994). While this information is based primarily on crystallographic studies with DNA:DNA oligonucleotides, the same factors, i.e. large propeller twist and narrow minor groove, exist in an rA:dT RNA:DNA duplex (Table I). Therefore, the PPT sequence may have a natural propensity for (i) bending at the rG:dC/rA:dT junction and for (ii) ‘stiffness’ in the stretches that contain multiple As. These two properties, combined with the extensive protein–nucleic acid interactions in this region (Figure 4; Table II), may contribute to an unusual structural deformation that we term ‘unzipping of the PPT’, which is centered at the rG:dC/rA:dT junction and is discussed below. Furthermore, they are likely to affect the trajectory of the template‐primer and its positioning at the RNase H active site.

Table II. Template‐primer contacts affecting RNase H activity

‘Unzipping’ in the PPT

By far the most striking structural feature of the PPT structure is a departure from Watson–Crick base pairing involving several nucleotide pairs. We refer to this as ‘unzipping of the PPT’. The structural aberration starts at the 5′ end of the PPT with the ‘melting’ of the first 2 bp of the PPT (bp 13 and 14, Figure 4), leaving an unpaired template base (Tem15‐Ade, Figures 4 and 5). The unpaired template nucleotide is followed by a frame‐shifted A‐T base pair (Tem16‐Ade:Pri15‐Thy) and a G‐T mismatch (Tem17‐Gua:Pri16‐Thy). The next primer nucleotide (Pri17‐Cyt) is unpaired, compensating for the unpaired residue of the template (Tem15‐Ade), bringing the nucleotide pairs back into frame. The next 4 bp are located adjacent to the RNase H active site and linked by what would normally be the scissile phosphate. For these 4 bp, although the nucleic acid duplex is realigned, the separations between the bases are relatively large. This means that the bases of the RNA template positioned at the RNase H active site are loosely bound to the corresponding bases of the DNA primer and held in place mostly by stacking interactions. Finally, the nucleic acid duplex appears to ‘heal’ and restore normal register in the remaining base pairs immediately following the RNase H active site.
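The register shift described above can be followed with a small bookkeeping sketch: walking along the template, an unpaired template base leaves the primer counter one position behind, and a later unpaired primer base is skipped, which restores the normal register. The function below is a hypothetical illustration of that bookkeeping only; it uses the nucleotide numbering of Figure 4 for positions 15–19 and says nothing about base‐pair geometry, hydrogen bonding or the ‘melted’ bp 13 and 14.

```python
def pairing_register(template_positions, first_primer,
                     unpaired_template, unpaired_primer):
    """Assign a primer partner to each template position.

    Returns a dict: template position -> primer position (None if unpaired).
    """
    partners = {}
    primer = first_primer
    for tem in template_positions:
        if tem in unpaired_template:
            partners[tem] = None      # template base bulges out; primer does not advance
            continue
        while primer in unpaired_primer:
            primer += 1               # skip the unpaired primer base
        partners[tem] = primer
        primer += 1
    return partners


# Positions as described in the text: Tem15 and Pri17 are unpaired
register = pairing_register(range(15, 20), first_primer=15,
                            unpaired_template={15}, unpaired_primer={17})

for tem, pri in register.items():
    offset = None if pri is None else tem - pri
    print(f"Tem{tem} -> {'unpaired' if pri is None else f'Pri{pri}'} (offset {offset})")
```

The output reproduces the pattern in the text: Tem16 and Tem17 pair with Pri15 and Pri16 (offset 1), and from Tem18 onwards the offset returns to 0.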

Throughout the length of this structural aberration there are numerous contacts between amino acids of RT and phosphate groups of the ‘stretched’ or ‘slipped’ base pairs (Figures 4 and 6). The protein–nucleic acid interactions may contribute to the fact that the unusual nucleic acid structure in this region is well defined. It is not clear whether the unzipped PPT structure results from the protein–RNA:DNA contacts or is an intrinsic property of the PPT sequence. However, it seems plausible that the unusual structure depends on three factors: (i) the stiffness of A‐tracts; (ii) the propensity to bend at the end of A‐tracts (in particular at an AT/GC junction); and (iii) the large number of contacts with the amino acids of the RNase H domain.

Contacts of RNase H with template‐primer: the RNase H primer grip

In the RT–RNA:DNA complex, as well as in RT complexes with DNA:DNA, a network of amino acids interacts with the DNA primer strand near the RNase H active site (Table II). We propose that these amino acids form an element of the RT structure (the ‘RNase H primer grip’) that serves to position the DNA primer strand near the RNase H active site and helps to determine the trajectory of the template strand in relation to the RNase H active site. The RNase H primer grip makes contacts with nucleic acid site positions −4 to −9 relative to the scissile phosphate base pair (see Figure 4 for a definition of nucleic acid site position). The RNase H primer grip, by interacting with the DNA primer strand, may also control the interactions of the RNA template substrate and the RNase H catalytic site in a nucleic acid sequence‐dependent and structure‐dependent manner. The interactions between the PPT and the RNase H primer grip of HIV‐1 RT include residues Lys395 and Glu396 of the p51 subunit, Gly359 (main chain), Ala360 (main chain) and His361 of the p66 connection subdomain, and Thr473, Asn474 (mediated by a water molecule), Gln475, Lys476, Tyr501 and Ile505 of the RNase H domain of HIV‐1 RT (Table II). HIV‐1 RT residues that contact the RNA template strand (positions −2 to +2, centered around the scissile phosphate) include the side chain of Lys390 of p51, and the side chains of Arg448, Asn474, Gln475, Gln500 and His539 of p66.

Sequence alignment of the RNase H domain of HIV‐1 RT and the RNase H domains of HIV‐2 RT, MuLV RT and E. coli RNase HI suggests a significant conservation of the residues of the RNase H primer grip in all these enzymes. Many of the residues of HIV‐1 RT RNase H that contact the RNA:DNA template‐primer have conserved, functionally important counterparts in E. coli RNase HI. Specifically, mutations Thr43Cys, Asn44Ala, Asn45Ala, Gln72Ala, Tyr73Leu and Gly77Ala at residues of E. coli RNase HI, which are the functional equivalents of Thr473, Asn474, Gln475, Gln500, Tyr501 and Ile505, respectively, of HIV‐1 RT, resulted in decreased efficiency of RNA:DNA cleavage (as judged by changes in the efficiency ratio kcat/Km in the wild type compared with the mutant enzymes) by 140‐, 8‐, 10‐, 3‐, 3‐ and 5‐fold, respectively (Kanaya, 1997 and references therein). Similarly, mutations of Gln72 and His124 of E. coli RNase HI, the equivalents of Gln500 and His539 of HIV‐1 RT (both of which contact the RNA template at the RNase H active site), substantially decrease the catalytic efficiency of the enzyme (3‐ and 70‐fold for Gln72Ala and His124Ala, respectively). Finally, the Gln475Glu, His539Phe and His539Asp HIV‐1 RT mutants have decreased RNase H activity and defective PPT cleavage (Tisdale et al., 1991; Volkmann et al., 1993).
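Because the correspondence between the E. coli RNase HI mutations, their HIV‐1 RT equivalents and the reported decreases in kcat/Km is given as three parallel ‘respectively’ lists, it may be easier to follow when tabulated. The short sketch below simply pairs up the values quoted in the paragraph above and orders them by the size of the effect; the variable names are illustrative and no data beyond those in the text are used.

```python
# Values transcribed from the text (Kanaya, 1997 and references therein)
ecoli_mutations = ["Thr43Cys", "Asn44Ala", "Asn45Ala",
                   "Gln72Ala", "Tyr73Leu", "Gly77Ala"]
hiv1_equivalents = ["Thr473", "Asn474", "Gln475",
                    "Gln500", "Tyr501", "Ile505"]
fold_decrease = [140, 8, 10, 3, 3, 5]  # decrease in kcat/Km, mutant vs wild type

# Pair the three lists position by position, then sort by effect size
# (Python's sort is stable, so ties keep their order in the text)
rows = sorted(zip(ecoli_mutations, hiv1_equivalents, fold_decrease),
              key=lambda row: row[2], reverse=True)

print(f"{'E. coli RNase HI':<18}{'HIV-1 RT':<10}{'fold decrease':>14}")
for mutation, equivalent, fold in rows:
    print(f"{mutation:<18}{equivalent:<10}{fold:>14}")
```

Ordered this way, the Thr43Cys substitution (equivalent to Thr473 of HIV‐1 RT) stands out as having by far the largest effect among the six listed.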

The RNase H primer grip together with minor groove width and trajectory determines retroviral RNase H cleavage specificity

It has been suggested that minor groove width determines susceptibility of nucleic acids to RNase H cleavage (Fedoroff et al., 1997; Han et al., 1997; Xiong and Sundaralingam, 1998). However, the ability of RNase H to cleave nucleic acids with diverse minor grooves (Ben‐Artzi et al., 1992; Blain and Goff, 1993; Smith and Roth, 1993; Hostomsky et al., 1994; Salazar et al., 1994; Götte et al., 1995; Gao et al., 1997; Lima and Crooke, 1997; Szyperski et al., 1999) suggests that additional factors are likely to determine the specificity of RNase H.

We have identified the RNase H primer grip as a structural element of HIV RT that is involved in the control of RNase H cleavage specificity (Figure 6). Since the RNase H primer grip makes numerous contacts with the DNA primer strand, appropriate access to the RNase H active site can only occur with substrates that have the correct minor groove width. The RNase H primer grip also places additional constraints on the structure of nucleic acids leading up to the RNase H active site: our model suggests that curvatures associated with particular RNA:DNA sequences could also affect cleavage specificity. This model allows for the possibility that some RNA:DNA substrates that have a minor groove width appropriate for cleavage may have an inappropriate trajectory and may not be able to interact properly with the RNase H active site.

Despite the fact that the T‐rich primer interacts strongly with the RNase H primer grip, the rA‐stretch of the PPT is a poor substrate for RNase H cleavage. There is an unusually narrow minor groove that might prevent the rA‐rich RNA template strand from reaching the catalytic residues of RNase H. Interestingly, the approximate distance between the active site residues of RNase H and what would normally be the scissile phosphate of the A‐tract (rA:dT) in our structure is ∼3 Å, similar to the reduction in minor groove width, presumably caused by the A‐tract (rA:dT) on the RNA. It is possible that RNase H of HIV‐1 RT has evolved to interact unproductively with A‐tracts to prevent cleavage within the PPT.

The trajectory model may also account for the inability of HIV‐1 RT to cleave within the (dC:rG)6 sequence at the end of PPT. Precise trajectory might also be a determining factor in certain rare cleavage events such as cutting of RNA:RNA or RNA:RNA‐DNA.

Effect of PPT structure on the cleavage specificity of RNase H at the U3/PPT junction; a biological role for the nucleic acid bend near the polymerase active site

The specificity of cleavage at the 3′ end of the PPT (after the G‐stretch) is critical since it helps to determine one end of the viral DNA. If the PPT/U3 junction is modeled at the RNase H active site using our structure, the A‐tract (rA:dT), including the unzipped region, would be in position to contact residues of the H and I helices of the p66 thumb. Biochemical studies have shown that mutation of several residues of this region of the p66 thumb dramatically affects cleavage at the PPT/U3 junction (Powell et al., 1997, 1999; Gao et al., 1998). Hence our structural and modeling data suggest that the p66 thumb may contribute to precise cleavage at the end of PPT by interacting with the A‐tract (rA:dT) region at the opposite end of the PPT. The stiffness of the A‐tract (rA:dT) sequences may cause pausing of polymerization at the point where the A‐tract (rA:dT) at the 5′ end of PPT must be bent to create the 40–45° bend, normally found 5–9 bp from the polymerase active site, in the vicinity of the p66 thumb. Pausing at this site could promote RNase H cleavage at the PPT/U3 junction, which is concomitantly aligned at the RNase H active site. Thus, the bend may have a biological role in ensuring precise cleavage at the PPT/U3 junction, which is required for retroviral replication.

The sequence features important for positioning RNase H for the cleavage reaction that generates the plus strand primer of HIV‐1 and MuLV have been studied extensively (Rattray and Champoux, 1989; Pullen et al., 1993; Powell and Levin, 1996). For HIV‐1, it has been reported that the −2G at the 3′ end of HIV‐1 PPT is crucial for specific cleavage at the PPT/U3 junction (between −1G and +1A in Figure 1). In our complex, a G residue is also positioned two registers 3′ of the scissile phosphate. This G makes a specific interaction with Gln475 (hydrogen bond between 2‐amino of G and Oϵ1 of Gln475) that would be lost if there were another base present at this position. The contact of Gln475 with the −2G base at the PPT 3′ end may also contribute to precise cleavage at the PPT/U3 junction.

The HIV PPT as a target for drug design

The unusual structural features of HIV‐1 PPT bound to HIV‐1 RT may provide attractive targets for drug design. Specific inhibitors designed and/or selected to recognize preferentially the structural characteristics of PPT could become anti‐AIDS drugs. Several studies have shown that the minor groove of DNA:DNA and RNA:DNA can be targeted by minor groove binding drugs. Furthermore, substantial progress has been made in making such inhibitors sequence specific (White et al., 1998). Binding of drugs that target the minor groove of DNA results in expansion of the minor groove and narrowing of the major groove; it is possible that such changes could render PPT susceptible to HIV‐1 RT cleavage. Alternatively, the PPT can be targeted to give a PPT–drug complex with poor ability to be used as a primer for second strand synthesis. Dickerson and colleagues have specifically suggested (Han et al., 1997) that inhibitors such as a bis‐linked distamycin–anthramycin target the HIV‐1 RT–RNA:DNA PPT; our results support that strategy. The present study has revealed a new potential target and highlights the importance of structure in drug development efforts.

Materials and methods

Crystallization and data collection

Wild‐type HIV‐1 RT was purified as described previously (Clark et al., 1995). Nucleic acids were purified and annealed as described before, using diethyl pyrocarbonate‐treated aqueous solutions (Sarafianos et al., 1999). The crystallization protocols for the RT–RNA:DNA–Fab complex were similar to those used to prepare the corresponding crystals of HIV‐1 RT complexed with DNA:DNA (Ding et al., 1998; Sarafianos et al., 1999). Specifically, purified RT was mixed with Fab at a 1:0.8 mass ratio and with template‐primer to final concentrations of 25 mg/ml protein and 0.2 mM nucleic acid. Hanging drops were prepared by mixing equal volumes of the complex and crystallization solutions [100 mM cacodylate pH 5.6, 29–31% saturated ammonium sulfate (SAS)] at 4°C. Before freezing, crystals were transferred into soaking solutions containing 36% SAS, 18% glucose (w/v) and 18% glycerol (v/v), reaching the final concentration in three steps of 20–30 min each (Ding et al., 1998). Soaking solutions contained 0.2 mM RNA:DNA. Several diffraction datasets were collected at the Brookhaven National Laboratory Synchrotron Light Source X25 beamline, and at the Cornell High‐Energy Synchrotron Source (CHESS) F1 beamline. The main dataset was collected from one crystal frozen at −165°C and recorded on image plates (Table III). Diffraction data were processed and scaled with DENZO and SCALEPACK, respectively (Otwinowski and Minor, 1996). The diffraction data statistics are summarized in Table III.

Table III. Summary of data collection and refinement statistics

Structure determination

The structure was solved by molecular replacement [X‐PLOR (Brunger, 1993)] using the wild‐type HIV RT–DNA:DNA–Fab structure (Ding et al., 1998) as a starting model. To avoid bias, no DNA:DNA or RNA:DNA model was included in the initial stages of structure determination. To improve phasing quality at the early stages of structure refinement, we used the program RAVE to average the electron density maps from several frozen and unfrozen datasets at 3.5 Å resolution. Rounds of model building were guided by both normal Fourier maps and averaged maps. In the early steps of refinement, simulated annealing omit maps and averaged electron density maps computed for the RT–RNA:DNA–Fab complex clearly indicated the position and conformation of nucleic acid (Figure 4). These maps showed a departure from canonical base pairing for several nucleotides of PPT. Even when the nucleic acid was included in the phasing and was restrained to a canonical Watson–Crick base‐paired form, the density clearly showed the unusual structure. The parameters used for refinement of nucleic acid were from Parkinson et al. (1996). The aberrant region of nucleic acid (region IV) was refined without any base‐pairing or sugar‐puckering restraints. The structure was refined using the torsion‐restrained slow‐cooling protocol in the program X‐PLOR (Rice and Brunger, 1994), followed by positional and individual isotropic thermal parameter refinement. Later stages of refinement incorporated resolution‐dependent normalization and bulk‐solvent correction of structure factors (Brunger, 1993). The final cycle of refinement using data between 8.0 and 3.0 Å resolution converged with a conventional crystallographic R factor of 27.4% and a free R factor of 31.6%. Coordinates and structure factors are available from the Protein Data Bank (entry 1HYS).
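For readers less familiar with the two agreement statistics quoted above, the following minimal sketch illustrates the standard textbook definition, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|, evaluated separately over the reflections used in refinement (the conventional R factor) and over a held‐out test set (the free R factor). It is not the X‐PLOR implementation used in this work; the function names, the `scale` parameter and the toy amplitudes are hypothetical and serve only as an illustration.

```python
from typing import Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             scale: float = 1.0) -> float:
    """Standard R factor: sum(| |Fo| - k|Fc| |) / sum(|Fo|).

    `scale` (k) is an assumed overall scale factor between observed
    and calculated amplitudes; refinement programs determine it
    themselves, here it is simply passed in.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - scale * abs(fc))
                    for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    free_flags: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by a test-set flag and return (R_work, R_free).

    Reflections flagged True are treated as the held-out test set
    that is excluded from refinement.
    """
    if not (len(f_obs) == len(f_calc) == len(free_flags)):
        raise ValueError("all inputs must have the same length")
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags)
            if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags)
            if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical values, not data from this study).
toy_f_obs = [100.0, 80.0, 60.0, 40.0]
toy_f_calc = [90.0, 85.0, 50.0, 44.0]
toy_free = [False, False, False, True]
toy_r_work, toy_r_free = r_work_and_free(toy_f_obs, toy_f_calc, toy_free)
```

Because the test‐set reflections are never used to adjust the model, the free R factor is generally somewhat higher than the conventional one, as is the case for the values reported here.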


We thank Michael Gait and colleagues at the MRC Laboratory of Molecular Biology, Cambridge, for instruction in synthesizing RNA oligonucleotides, and the staff at BNLS and CHESS and other members of E.A.'s group for help with data collection. S.G.S. was supported by an NIH‐NIAID NRSA fellowship (AI 09578). The research in E.A.'s laboratory has been supported by NIH MERIT award AI 27690. S.H.H.'s laboratory is sponsored in part by the National Cancer Institute, DHHS, under contract with ABL, and by NIGMS.