
Results

Ad Hoc Model Building

The best homology scores, the percent exact sequence match, and a description of each protein are given in Table 1. The top seven unique structures (all members of the serine protease family) are shown sorted by the calculated homology score. Each structure is referenced by its PDB identifier: 3RP2 (Remington et al, 1988), 1HNE (Navia et al, 1989), 1EST (Sawyer et al, 1978), 1TON (Fujinaga and James, 1987), 1CHG (Freer et al, 1970), 2TRM (Sprang et al, 1987), 1TGB (Fehlhammer et al, 1977). The results of the alignment are shown in Figure 1.

Table 1. Sequence Homology with Factor D

The proteins are sorted by their IDEAS homology score. References to the PDB files are given in the text.

 PDB    homology   % exact   protein
 code    score      match    description
 ----   --------   -------   ---------------------------
 3RP2     305       33.9     rat mast cell protease
 1HNE     283       33.5     human neutrophil elastase
 1EST     282       33.5     porcine pancreas elastase
 1TON     238       34.4     rat submaxillary gland tonin
 1CHG     178       33.3     bovine chymotrypsinogen-A
 2TRM     173       33.8     rat trypsin mutant
 1TGB     164       35.5     bovine pancreas trypsinogen

The sequences show a high degree of homology. The three-dimensional similarity is even more striking. The C-alpha tracing of 3RP2 served as the reference. The resulting superpositions are shown in Figure 2. The major differences in backbone structure occur in the loops connecting secondary structural elements. These loop regions are also where the "insertions" and "deletions" in the alignment occur.

The Factor D model was created using 3RP2 as the basic template. Each of the 228 residues was examined sequentially from the N- to the C-terminus. The atomic models of 3RP2 and the two or three next best sequence matches for each stretch of approximately twenty residues were displayed. There was always at least one other protein with the same number of residues as Factor D in regions where insertions/deletions were made in 3RP2. The regions of Factor D where 3RP2 was not the primary template are given in Table 2. Approximately fifty residues were shifted to align the mainchain (including carbonyls) with the alternate protein. All mainchain conformations were set to allowed phi/psi values. All sidechain conformations were set to favored rotamers.

Table 2. Alternative Factor D model templates

The 3RP2 structure was employed for the remainder of the 228 residue sequence in constructing the manual model.

  range   protein 
 -------  -------
  20--26    1HNE 
  43--50    1HNE 
 155--170   1EST 
 171--178   1CHG 
 198--208   1HNE 

The potential energy of the model was calculated with X-PLOR. The geometry was nearly ideal. Two pairs of residues elicited warnings of atoms being less than 1.5 Å apart. These positions were re-examined with graphics and deemed inconsequential: one formed a hydrogen bond and the other a nonpolar interaction. Energy refinement by 250 cycles of conjugate gradient minimization with X-PLOR was performed. The rms changes from the model built with Atom to the energy minimized model were 0.60 Å for the 228 C-alpha atoms and 0.91 Å for all 2113 atoms. This produced the model fdm_6jun90.pdb, which is referred to as FDM (Factor D from Manual modeling).
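The rms changes quoted above can be illustrated with a minimal sketch. It assumes the two coordinate sets are already in the same frame and list the same atoms in the same order; the function name `rmsd` and the two-atom example coordinates are illustrative and are not taken from the actual models.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared displacement of one atom between the two models.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates: each atom is displaced by exactly 1 Angstrom.
before = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
after = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
print(rmsd(before, after))
```

Restricting the input lists to C-alpha atoms or passing all atoms would give the two kinds of figure reported in the text.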

Automated Model Building

The PDB was not loaded on this trial system. The first step involving a complete search of the PDB was omitted as the method also uses FASTA for alignment. The seven previously identified proteins of Table 1 were imported into this modeling package's environment.

A multiple sequence alignment (Feng and Doolittle, 1987) aligned all sequences to that of Factor D. The resulting alignments were excellent, correlating well with secondary structure. Examination of the alignment scores and tutorial recommendations indicated that Factor D should be built from 1TON, 3RP2, 1HNE, and 1EST. A two residue deletion exists in Factor D between residues 174 and 175; otherwise, there was always an alignment match. Table 3 gives the residue ranges from those proteins used to create the homology model.

The coordinates of the known structures were copied onto the sequence of Factor D. Exact sequence matches were copied directly; otherwise, only atoms in common between the two residue types were used. The regularization routine added the missing atoms. Residues that were joined together from different protein fragments (which may introduce gaps) were examined and deemed to have reasonable geometries. The two residue deletion mentioned above was in a turn extending into the solvent far from the active site. The regularization routine annealed these gaps.
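The copying rule described above (keep only atoms shared by the template and target residue types, and leave the rest for regularization) can be sketched as follows. This is not the modeling package's own routine; the atom tables use standard PDB atom names for a few residue types only, and `copy_common_atoms` and the serine coordinates are hypothetical.

```python
# Heavy-atom names for a few residue types (standard PDB naming).
RESIDUE_ATOMS = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "O", "CB"],
    "SER": ["N", "CA", "C", "O", "CB", "OG"],
    "THR": ["N", "CA", "C", "O", "CB", "OG1", "CG2"],
}

def copy_common_atoms(template_name, template_coords, target_name):
    """Return (copied, missing) for a target residue built from a template residue.

    copied  maps atom name -> coordinates taken from the template.
    missing lists target atoms that must be added later (e.g. by regularization).
    """
    template_atoms = set(RESIDUE_ATOMS[template_name])
    copied = {}
    missing = []
    for atom in RESIDUE_ATOMS[target_name]:
        if atom in template_atoms and atom in template_coords:
            copied[atom] = template_coords[atom]
        else:
            missing.append(atom)
    return copied, missing

# Made-up serine coordinates used as the template for a threonine.
ser = {
    "N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
    "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2), "OG": (3.4, -0.9, 1.2),
}
copied, missing = copy_common_atoms("SER", ser, "THR")
print(sorted(copied), missing)
```

When the template and target residue types are identical, every atom is copied and the missing list is empty, which corresponds to the exact-match case in the text.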

Table 3. Protein Design Factor D model templates.

The automated model was built piecewise from the fragments shown. Note that Factor D has a two residue deletion at position 174 compared to the 1TON sequence.

      range    protein
     ------    -------
      1--24     1HNE
     25--42     1TON
     43--77     1HNE
     78--132    3RP2
    133--167    1EST
    168--174    1TON
    175--209    1TON
    210--228    3RP2

The side chain spin routine checked all side chain conformations for bad contacts. Energy minimization of the entire structure produced the model fdq_24oct91.pdb. This model is referred to as FDQ (Factor D from Quanta).

Crystallographic Results

Molecular Replacement Solution

Factor D crystallizes in the triclinic space group P1 with two independent molecules per asymmetric unit. The original homology model, FDM, was used to solve the crystal structure of Factor D by molecular replacement (Rossmann, 1972). The various functions gave unique peaks considerably above background (Figure 3).

Initial Structure Refinement

Rigid body refinement of the two independent monomers produced little change. Examination on graphics revealed no interpenetration due to crystal packing, although some residues were too close. Refinement continued with no manual adjustment of the coordinates or non-crystallographic symmetry restraints. All refinements included reflections greater than 2 sigma from 7.5 Å to the high resolution limit and an overall temperature factor of 15 Å². The recommended X-PLOR slow-cooling simulated annealing (SA) protocol was followed. The R-factor decreased from 0.455 to 0.218 at 3.0 Å resolution (5657 reflections) producing the initial molecular replacement model, fdx_29jun90.
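For readers unfamiliar with the quantity, the conventional crystallographic R-factor is R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|). The sketch below computes it with a 2 sigma cutoff on the observed amplitudes. The single linear scale factor k and the function name `r_factor` are simplifying assumptions; X-PLOR's own scaling is more elaborate and is not reproduced here.

```python
def r_factor(f_obs, f_calc, sigmas, cutoff=2.0):
    """Conventional R-factor over reflections with Fobs > cutoff * sigma."""
    kept = [(fo, fc) for fo, fc, s in zip(f_obs, f_calc, sigmas) if fo > cutoff * s]
    if not kept:
        raise ValueError("no reflections pass the sigma cutoff")
    sum_obs = sum(fo for fo, _ in kept)
    sum_calc = sum(fc for _, fc in kept)
    k = sum_obs / sum_calc  # assumed: one overall linear scale factor
    residual = sum(abs(fo - k * fc) for fo, fc in kept)
    return residual / sum_obs

# Made-up amplitudes; the third reflection fails the 2 sigma cutoff.
print(r_factor([100.0, 50.0, 10.0], [90.0, 60.0, 10.0], [5.0, 5.0, 20.0]))
```

A resolution filter (7.5 Å to the high resolution limit in the text) would be applied in the same way as the sigma filter, before the sums are formed.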

Additional rounds of SA refinement were undertaken as improved native data sets became available. The 2.5 Å data set (11220 reflections) gave a model (fdx_24oct90) with an R-factor of 0.242. These coordinates established the origin and proved very useful in locating heavy atom derivative positions. The 2.4 Å data (13032 reflections) yielded a model (fdx_19jan91) with an R-factor of 0.249. This was the starting point for interpreting the first MIR maps.

Initial Error Analysis

Comparison of the intermediate fdx_24oct90 and the fdx_19jan91 coordinates indicated 36 mainchain and 39 sidechains of the 456 independent residues had moved significantly (>1 Å for mainchain, >2 Å for sidechain). A 2Fobs-Fcalc map was created with calculated phases from the latter model. The real-space fit residual (Jones, 1991) was computed for the mainchain and sidechain of each residue. Geometric analysis revealed residues with unfavorable dihedral angles.

Strong correlations existed between the residues flagged with bad dihedrals, poor real-space fit, and significant shifts during refinement. The central core of the molecule had few problems; essentially all of the troublesome regions were in peripheral loops. Unfortunately, several of these loops are involved in the substrate binding critical to the catalytic action of Factor D.

Initial MIR Map

The initial MIR map was computed at 3.2 Å using three heavy atom derivatives (PtCl4, PCMBS, K2HgI4) and symmetry averaged (SVLN, 30jan91). Preliminary inspection indicated that approximately 80% of the model coordinates fit the observed density very well. An additional 10% of the residues required only minor adjustment. The remaining 10% of the residues had problems; about half required major adjustment (a few angstroms shift) and the rest were uninterpretable in this map. These "problem" residues corresponded well to those identified by the initial error analysis. These preliminary results were published in a conference proceedings (Carson et al, 1991).

Crystallographic Refinement

The details of the crystallographic refinement of Factor D to 2.0 Å are given by Narayana et al (1994). We stress that phases from the homology model were never used (except in aiding the location of heavy atom positions). We believe the refined model is of high quality. These coordinates are thus assumed to be correct and therefore provide the basis for the comparisons that follow.

Model Nomenclature

The final refined crystal structure models of the two independent Factor D monomers are referred to as "FDA" and "FDB". The initial homology models are referred to as "FDM" and "FDQ" for the manually (M) constructed and the Quanta (Q) generated coordinates, respectively.

The molecular replacement model (fdx_19jan91) was subjected to SA and temperature factor refinement against the final 2.0 Å data set of 23249 reflections. Thus, this model has now undergone five iterations (I) of X-PLOR refinement after its initial placement in the unit cell. The individual subunits are denoted as FDIAX and FDIBX.

The original FDM and FDQ models were placed in the unit cell by a least-squares fit with the corresponding C-alpha atoms of FDA and FDB. These models were then subjected to the SA protocol of X-PLOR (X) described previously. Additionally, individual atomic temperature factors were refined. This produced the models FDMAX, FDMBX, FDQAX, and FDQBX for the two monomers refined from each starting homology model. (This assumes that the molecular replacement solutions would be found exactly. The ease of solution with the initial FDM model using incomplete data makes this plausible.) Even though both the FDMX and FDIX models used FDM as the starting point, the rms difference between the closest pair, FDIBX/FDMBX, is 0.8 Å for mainchain and 1.7 Å for all atoms. Thus they are considered distinct models. None of the models refined by X-PLOR had any manual intervention. A flow chart of the models' creation and nomenclature is given by Figure 4.

R-factors and Estimated Error

The final R-factors, deviations from ideal values, and estimated coordinate errors for the models are given in Table 4. The models are sorted by R-factor. The experimental crystal structure clearly provides the best model, but the differences between it and the homology models are not dramatic. The estimated errors are all less than a third of an angstrom.

Table 4. R-factors of Final and Molecular Replacement Models

All molecular replacement models have undergone X-PLOR SA refinement as described in the text. R-factors are based on 7.5 to 2.0 Å native data with a 2 sigma cutoff. The bonds (Å) and angles (degrees) columns are the rms deviations from ideal geometric values. The errors (Å) are estimated from Luzzati plots (not shown).

 Model   R-factor   Bonds   Angles   Error   Description
 -----   --------   -----   ------   -----   -----------------
  FD      0.188    0.010    1.65    0.23     Final crystal structure
  FD'     0.219    0.010    1.65    0.27     Final, 69 waters omitted  
  FDI     0.246    0.017    2.23    0.30     Iterated manual model
  FDQ     0.255    0.017    2.29    0.32     Quanta model
  FDM     0.259    0.018    2.28    0.32     Manual model

Comparison of the two independent crystallographic monomers

The two subunits in the crystal were refined independently, with some interesting conformational differences noted between FDA and FDB (Narayana et al, 1994). Analysis of the crystal structure reveals that the FDB monomer more closely fits the experimental data, in particular the range of residues from 41 to 48. These residues are disordered in the final FDA structure. The refined temperature factors and real-space fit per residue are shown in Figure 5.

The mainchain dihedral differences of 202 of the 228 residues agree within 30 degrees. Only 16 residues have differences over 60 degrees. There are 159 sidechain residues with dihedral differences of less than 30 degrees, and 57 residues with differences greater than 60 degrees (implying a different rotamer). The dihedral differences between FDA and FDB and the rms differences between the superimposed monomers are plotted in Figure 5.

A C-alpha tracing of the superposition is also shown.
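The 30 and 60 degree cutoffs used for the dihedral comparison require angle differences that respect the 360 degree periodicity. A minimal sketch follows; taking the largest difference over a residue's dihedrals as the per-residue value is an assumption, since the text does not say how several angles are combined, and the function names are illustrative.

```python
def dihedral_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)

def classify_residue(dihedrals_a, dihedrals_b, similar=30.0, different=60.0):
    """Label a residue by the largest difference among its dihedral angles."""
    worst = max(dihedral_difference(a, b) for a, b in zip(dihedrals_a, dihedrals_b))
    if worst < similar:
        return "similar"
    if worst > different:
        return "different"
    return "intermediate"

# Made-up (phi, psi) pairs in degrees.
print(dihedral_difference(175.0, -175.0))
print(classify_residue((-60.0, -45.0), (-65.0, -40.0)))
print(classify_residue((-60.0, 140.0), (-60.0, -40.0)))
```

The wrap-around matters for angles near +/-180 degrees, where a naive subtraction would report a difference of 350 degrees for two nearly identical conformations.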

Slightly more than half of the 228 residues show mainchain deviations less than the 0.23 Å error suggested by Luzzati plot analysis. Slightly more than three quarters have deviations no greater than twice that amount (0.46 Å). The corresponding results for sidechain residues are 29% and 53%, respectively. There are 27 mainchain and 76 sidechain residues with rms differences over 1.0 Å. From visual inspection, there are significant mainchain conformational differences in the ranges 41-48, 77-89, and 198-201. Twenty of the sidechains that differ significantly have similar mainchain conformations. These are primarily surface Arg, Lys, and Glu influenced by crystal packing.

Comparison of the two different homology models

The FDM and FDQ models were created independently, but are based on the same set of protein templates. However, comparison of Table 2 and Table 3 indicates that less than half of the residues (43-50 1HNE, 78-132 3RP2, 155-167 1EST, 210-228 3RP2) had an identical protein as the primary template. Figure 6 plots the dihedral differences between FDM and FDQ and the rms differences between FDM and the superimposed FDQ. A C-alpha tracing of the superpositions on the crystal structure FDB is also shown. There are only 71 mainchain and 28 sidechain residues with rms deviations less than the 0.46 Å cutoff. There are 61 mainchain and 125 sidechain residues with deviations over 1.0 Å. The overall average rms difference is 1.2 Å for the mainchain atoms and 2.6 Å for all atoms. Comparison of dihedral angle differences using a 30 degree cutoff shows similar conformations for 145 mainchain and 135 sidechain residues. A 60 degree cutoff yields significant differences for 38 mainchain and 77 sidechain conformations.

Comparison of the crystal structure to models

Comparisons with the appropriate FDA or FDB are presented in the upper half of Table 5 and in Figure 7. There is little difference between the degree of fit of the FDM and FDQ models to the crystal structure. Both fit slightly better to FDB than to FDA. Less than 20% of the mainchain and less than 10% of the sidechains may be considered correct based on the difference in coordinates. However, examination of the differences in dihedral angles indicates that approximately 60% of the mainchain and 50% of the sidechain conformations are nearly correct. The differences are not uniformly distributed; about half of the large differences occur in the regions where FDA differs from FDB.

Table 5. Deviations of models from crystal structure

The mainchain (mc) and sidechain (sc) rms deviations over the entire structure (Delta), and the median value (med.) of the deviations over all 228 residues are given in Å. The mc-ok and sc-ok count residues having deviations within twice the Luzzati limit. The phi/psi-ok and chis-ok count dihedral differences of less than 30 degrees.


  model   mc-Delta  sc-Delta  mc-med.  sc-med.  mc-ok  sc-ok  phi/psi-ok  chis-ok
  -----   --------  --------  -------  -------  -----  -----  ----------  -------
  FDMA      1.73      3.55     0.75     1.51      37     15      131        114
  FDMB      1.56      3.22     0.74     1.45      41     14      136        116
  FDQA      1.60      3.40     0.80     1.76      52     25      142        106
  FDQB      1.44      3.02     0.78     1.57      54     21      153        110
  FDMAX     1.44      3.12     0.30     0.69     165     85      166        126
  FDMBX     1.17      2.81     0.28     0.53     184    111      187        132
  FDQAX     1.27      2.88     0.26     0.65     174     96      177        130
  FDQBX     1.27      2.84     0.26     0.66     177     96      175        130
  FDIAX     1.22      2.70     0.44     0.59     124     74      179        136
  FDIBX     1.14      2.68     0.43     0.62     139     68      189        132
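The median and "ok" columns of Table 5 are simple summaries of a per-residue deviation list. A minimal sketch, assuming the per-residue rms deviations have already been computed and using the 0.23 Å Luzzati estimate of the final structure as the limit; `summarize_deviations` and the sample values are illustrative, not data from the models.

```python
import statistics

def summarize_deviations(per_residue_rms, luzzati=0.23):
    """Return (median deviation, number of residues within twice the Luzzati limit)."""
    median = statistics.median(per_residue_rms)
    ok = sum(1 for d in per_residue_rms if d <= 2.0 * luzzati)
    return median, ok

# Made-up per-residue deviations in Angstroms.
sample = [0.10, 0.20, 0.30, 0.45, 0.80, 1.50]
median, ok = summarize_deviations(sample)
print(median, ok)
```

The median is far less sensitive than the overall rms to a few badly placed loops, which is why the two columns can tell different stories for the same model.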

Comparison of crystal structure to refined models

Comparison of the homology models after SA refinement is given in the lower portion of Table 5 and in Figure 7. The X-PLOR refinement moved the coordinates significantly closer to the crystal structure for almost every residue. These results are summarized in Table 6.

Table 6. X-PLOR shifts of models

The number better counts residues that moved closer to the refined crystal structure after X-PLOR SA refinement. The maximum shifts for any residue are given in Å. The number worse counts residues that moved more than twice the Luzzati limit away from the crystal structure. The number bad counts residues with rms deviations greater than 1.0 Å.

 model    #-mc    #-sc    max-mc    max-sc    #-mc   #-sc   #-mc  #-sc
          better  better  shift(Å)  shift(Å)  worse  worse  bad   bad
 -----    ------  ------  --------  --------  -----  -----  ----  ----
 FDMAX     201     186      3.06      5.41       9     20    42   101
 FDMBX     208     187      3.09      4.59       2     17    36    98
 FDQAX     210     191      2.83      6.33       4     13    33    85
 FDQBX     201     183      1.91      4.01      10     28    25    86
 FDIAX     187     184      3.32      4.76       4      8    34    96
 FDIBX     184     171      2.78      4.43       1      2    23    85
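The better, worse, and bad counts of Table 6 can be sketched from two lists of per-residue deviations from the crystal structure, one before and one after refinement. The reading of "worse" as an increase in deviation of more than twice the Luzzati limit is an interpretation of the caption, and the function name and sample values are hypothetical.

```python
def tally_refinement(dev_before, dev_after, luzzati=0.23, bad_cutoff=1.0):
    """Count residues that improved, worsened markedly, or remain poorly placed."""
    better = worse = bad = 0
    for before, after in zip(dev_before, dev_after):
        if after < before:
            better += 1  # moved closer to the crystal structure
        elif after - before > 2.0 * luzzati:
            worse += 1   # moved away by more than twice the Luzzati limit
        if after > bad_cutoff:
            bad += 1     # still more than 1.0 Angstrom from the crystal structure
    return better, worse, bad

# Made-up deviations in Angstroms for four residues.
dev_before = [1.2, 0.8, 0.3, 2.0]
dev_after = [0.3, 0.2, 0.9, 1.5]
print(tally_refinement(dev_before, dev_after))
```

Note that a residue can be counted as both better and bad, as the last residue in the example is: it improved but is still more than 1.0 Å from its crystallographic position.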

Comparison Summary

Both model structures fail to accurately reproduce the crystal structure, especially around the active site and substrate binding loops. Only about 10% of the molecular replacement model mainchain was grossly in error after the SA refinement. Almost all of these errors were located in the active site and substrate binding regions of the enzyme. These are precisely the residues that must be known accurately to understand the structure/function relationship for this enzyme. However, these loops are likely rather flexible. The temperature factors of the refined crystal structures of Factor D and the serine proteases used for homology modeling are generally higher in these loops. The results are summarized in Figure 8.