The best homology scores, the percent exact sequence match, and a description of each protein are given in Table~1. The top seven unique structures (all members of the serine protease family) are sorted by calculated homology score. Each structure is referenced by its PDB identifier: 3RP2 (Remington et al, 1988), 1HNE (Navia et al, 1989), 1EST (Sawyer et al, 1978), 1TON (Fujinaga and James, 1987), 1CHG (Freer et al, 1970), 2TRM (Sprang et al, 1987), 1TGB (Fehlhammer et al, 1977). The results of the alignment are shown in Figure~1.
Table 1. Sequence Homology with Factor D
The proteins are sorted by their IDEAS homology score. References to the PDB files are given in the text.
PDB   homology  % exact  protein
code  score     match    description
----  --------  -------  ----------------------------
3RP2  305       33.9     rat mast cell protease
1HNE  283       33.5     human neutrophil elastase
1EST  282       33.5     porcine pancreas elastase
1TON  238       34.4     rat submaxillary gland tonin
1CHG  178       33.3     bovine chymotrypsinogen-A
2TRM  173       33.8     rat trypsin mutant
1TGB  164       35.5     bovine pancreas trypsinogen
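The "% exact match" column is a percent sequence identity over an alignment. The text does not say how the percentage was normalized. The short Python sketch below assumes it is the fraction of aligned, non-gap positions at which the two residues are identical. The function name and the gap symbol are illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned positions (gaps excluded) with identical residues.

    Assumption: columns containing a gap are dropped from the denominator;
    other conventions divide by the length of the shorter sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy aligned fragments (not real Factor D data): 4 of the 5 non-gap columns match.
print(percent_identity("IVGG-E", "IVGGRD"))  # 80.0
```

Changing the denominator convention changes the absolute percentages, so values computed this way are comparable only with each other.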
The sequences show a high degree of homology, and the three-dimensional similarity is even more striking. The structures were superimposed using the C-alpha tracing of 3RP2 as the reference; the resulting superpositions are shown in Figure~2. The major differences in backbone structure occur in the loops connecting secondary structural elements. These loop regions are also where the ``insertions'' and ``deletions'' in the alignment occur.
The Factor D model was created using 3RP2 as the basic template. Each of the 228 residues was examined sequentially from the N- to the C-terminus. For each stretch of approximately twenty residues, the atomic models of 3RP2 and of the two or three next-best sequence matches were displayed. Wherever insertions or deletions were needed relative to 3RP2, there was always at least one other protein with the same number of residues as Factor D in that region. The regions of Factor D where 3RP2 was not the primary template are given in Table~2. In these regions, approximately fifty residues were shifted to align their mainchain (including carbonyls) with that of the alternate protein. All mainchain conformations were set to allowed phi/psi values, and all sidechain conformations were set to favored rotamers.
Table 2. Alternative Factor D model templates
The 3RP2 structure was employed for the remainder of the 228 residue sequence in constructing the manual model.
range     protein
--------  -------
20--26    1HNE
43--50    1HNE
155--170  1EST
171--178  1CHG
198--208  1HNE
The potential energy of the model was calculated with X-PLOR, and the geometry was nearly ideal. Two pairs of residues elicited warnings of atoms less than 1.5 Å apart. These positions were re-examined with graphics and deemed inconsequential: one pair formed a hydrogen bond and the other a nonpolar interaction. The model was then energy-refined by 250 cycles of conjugate gradient minimization with X-PLOR. The rms changes from the model built with Atom to the energy-minimized model were 0.60 Å for the 228 C-alpha atoms and 0.91 Å for all 2113 atoms. The resulting model, fdm_6jun90.pdb, is referred to as FDM (Factor D from Manual modeling).
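The rms changes quoted above are root-mean-square displacements over matched atoms. The minimal Python sketch below assumes that both models share atom naming and the same coordinate frame, so no refitting is applied; energy minimization does not move the frame. The dictionary layout keyed by (residue number, atom name) is an assumption of this example and is not a PDB-reader API.

```python
import math


def rms_change(model_a, model_b, atom_names=None):
    """RMS displacement between two models in the same coordinate frame.

    model_a, model_b: dict mapping (residue_number, atom_name) -> (x, y, z).
    atom_names: optional set restricting the comparison, e.g. {"CA"}.
    No superposition is done: only atoms present in both models are compared.
    """
    keys = [k for k in model_a
            if k in model_b and (atom_names is None or k[1] in atom_names)]
    if not keys:
        raise ValueError("no atoms in common")
    total = 0.0
    for k in keys:
        total += sum((p - q) ** 2 for p, q in zip(model_a[k], model_b[k]))
    return math.sqrt(total / len(keys))


# Toy coordinates: only the C-alpha moves, by 0.6 Å along z.
before = {(1, "CA"): (0.0, 0.0, 0.0), (1, "CB"): (1.0, 0.0, 0.0)}
after = {(1, "CA"): (0.0, 0.0, 0.6), (1, "CB"): (1.0, 0.0, 0.0)}
print(rms_change(before, after, {"CA"}))  # C-alpha atoms only
print(rms_change(before, after))          # all atoms
```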
The PDB was not loaded on this trial system. The first step, a complete search of the PDB, was omitted because the method also uses FASTA for alignment. The seven previously identified proteins of Table~1 were imported into the modeling package's environment.
A multiple sequence alignment (Feng and Doolittle, 1987) aligned all sequences to that of Factor D. The resulting alignments were excellent and correlated well with secondary structure. Examination of the alignment scores and the tutorial recommendations indicated that Factor D should be built from 1TON, 3RP2, 1HNE, and 1EST. Apart from a two-residue deletion in Factor D between residues 174 and 175, there was always an alignment match. Table~3 gives the residue ranges from those proteins used to create the homology model.
The coordinates of the known structures were copied onto the sequence of Factor D. Where the sequences matched exactly, the coordinates were copied directly. Otherwise, only the atoms common to the two residue types were used, and the regularization routine added the missing atoms. Residues joined together from different protein fragments (which may introduce gaps) were examined and deemed to have reasonable geometries. The two-residue deletion mentioned above lies in a turn that extends into the solvent, far from the active site. The regularization routine annealed these gaps.
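The copying rule described above can be sketched one residue at a time. In this Python sketch the atom-name table is hypothetical and abbreviated. The returned "missing" list simply records the atoms that a regularization step would have to add. This illustrates the rule and is not Quanta's implementation.

```python
# Hypothetical, abbreviated heavy-atom name sets; a real table covers all 20 residue types.
ATOM_NAMES = {
    "ALA": {"N", "CA", "C", "O", "CB"},
    "GLY": {"N", "CA", "C", "O"},
    "SER": {"N", "CA", "C", "O", "CB", "OG"},
    "THR": {"N", "CA", "C", "O", "CB", "OG1", "CG2"},
}


def transfer_residue(template_type, template_atoms, target_type):
    """Copy template coordinates onto one target residue.

    template_atoms: dict atom_name -> (x, y, z) for the template residue.
    Returns (copied, missing): the coordinates kept for the target residue and
    the sorted target atom names still to be built (e.g. by regularization).
    """
    wanted = ATOM_NAMES[target_type]
    if template_type == target_type:
        # Exact sequence match: copy every atom of the template residue.
        copied = dict(template_atoms)
    else:
        # Otherwise keep only atoms whose names the two residue types share.
        copied = {name: xyz for name, xyz in template_atoms.items() if name in wanted}
    missing = sorted(wanted - set(copied))
    return copied, missing


ser = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0), "C": (2.0, 1.4, 0.0),
       "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2), "OG": (3.4, -0.8, 1.2)}
copied, missing = transfer_residue("SER", ser, "THR")
print(sorted(copied))  # ['C', 'CA', 'CB', 'N', 'O']
print(missing)         # ['CG2', 'OG1']
```

Matching is by atom name only, so chemically analogous atoms with different names (SER OG versus THR OG1) are not transferred and are left for regularization.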
Table 3. Protein Design Factor D model templates.
The automated model was built piecewise from the fragments shown. Note that Factor D has a two-residue deletion at position 174 compared with the 1TON sequence.
range protein
------ -------
1--24 1HNE
25--42 1TON
43--77 1HNE
78--132 3RP2
133--167 1EST
168--174 1TON
175--209 1TON
210--228 3RP2
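Because the model is assembled piecewise, one quick consistency check is that the fragment ranges cover residues 1 to 228 exactly once. The minimal Python sketch below transcribes the ranges of Table~3; the function names are illustrative.

```python
# Fragment ranges for the automated (Quanta) model, transcribed from Table 3.
TABLE_3 = [
    (1, 24, "1HNE"), (25, 42, "1TON"), (43, 77, "1HNE"), (78, 132, "3RP2"),
    (133, 167, "1EST"), (168, 174, "1TON"), (175, 209, "1TON"), (210, 228, "3RP2"),
]


def check_coverage(ranges, n_residues):
    """Residue numbers 1..n_residues covered zero times or more than once."""
    counts = [0] * (n_residues + 1)
    for start, end, _ in ranges:
        for res in range(start, end + 1):
            counts[res] += 1
    return [res for res in range(1, n_residues + 1) if counts[res] != 1]


def template_for_residue(ranges, res):
    """Template protein whose fragment supplied residue `res`."""
    for start, end, protein in ranges:
        if start <= res <= end:
            return protein
    raise KeyError(res)


print(check_coverage(TABLE_3, 228))        # [] : every residue comes from one fragment
print(template_for_residue(TABLE_3, 100))  # 3RP2
```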
The side chain spin routine checked all side chain conformations for bad contacts. Energy minimization of the entire structure produced the model fdq_24oct91.pdb. This model is referred to as FDQ (Factor D from Quanta).
Factor D crystallizes in the triclinic space group P1 with two independent molecules per asymmetric unit. The original homology model, FDM, was used to solve the crystal structure of Factor D by molecular replacement (Rossmann, 1972). The molecular replacement search functions gave unique peaks considerably above background (Figure~3).
Rigid-body refinement of the two independent monomers produced little change. Examination with graphics revealed no interpenetration due to crystal packing, although some residues were in overly close contact. Refinement continued without manual adjustment of the coordinates and without non-crystallographic symmetry restraints. All refinements included reflections greater than 2 sigma from 7.5 Å to the high-resolution limit and used an overall temperature factor of 15 Å². The recommended X-PLOR slow-cooling simulated annealing (SA) protocol was followed. The R-factor decreased from 0.455 to 0.218 at 3.0 Å resolution (5657 reflections), producing the initial molecular replacement model, fdx_29jun90.
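For reference, the R-factor quoted throughout is the usual crystallographic residual between observed and calculated structure-factor amplitudes. The Python sketch below applies the same reflection selection as the text: the resolution shell from 7.5 Å to the high-resolution limit, and a 2 sigma cutoff, taken here to apply to amplitudes. The tuple layout and the single linear scale factor are assumptions of this sketch and do not reproduce X-PLOR's procedure.

```python
def r_factor(reflections, d_min, d_max=7.5, sigma_cutoff=2.0):
    """Crystallographic R = sum(|Fo - k*Fc|) / sum(Fo).

    reflections: iterable of (d_spacing, f_obs, sigma_f_obs, f_calc), with
    amplitudes rather than intensities (assumption). Only reflections with
    d_min <= d <= d_max and f_obs > sigma_cutoff * sigma_f_obs are used.
    Assumption: one least-squares scale k puts Fc on the Fo scale; refinement
    programs use more elaborate scaling. Returns (R, reflections_used).
    """
    used = [(fo, fc) for d, fo, sig, fc in reflections
            if d_min <= d <= d_max and fo > sigma_cutoff * sig]
    if not used:
        raise ValueError("no reflections pass the cuts")
    k = sum(fo * fc for fo, fc in used) / sum(fc * fc for _, fc in used)
    num = sum(abs(fo - k * fc) for fo, fc in used)
    den = sum(fo for fo, _ in used)
    return num / den, len(used)


# Toy data (d in Å, Fo, sigma(Fo), Fc): the 8.0 Å and the weak reflection are excluded.
data = [(3.0, 100.0, 5.0, 90.0), (4.0, 50.0, 5.0, 55.0),
        (8.0, 200.0, 5.0, 10.0), (3.5, 8.0, 5.0, 20.0)]
r, n = r_factor(data, d_min=3.0)
print(n, round(r, 3))  # 2 0.087
```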
Additional rounds of SA refinement were undertaken as improved native data sets became available. The 2.5 Å data set (11220 reflections) gave a model (fdx_24oct90) with an R-factor of 0.242. These coordinates established the origin and proved very useful in locating heavy atom derivative positions. The 2.4 Å data set (13032 reflections) yielded a model (fdx_19jan91) with an R-factor of 0.249, which was the starting point for interpreting the first MIR maps.
Comparison of the intermediate fdx_24oct90 and the fdx_19jan91 coordinates indicated that, of the 456 independent residues, 36 had moved significantly in the mainchain (>1 Å) and 39 in the sidechain (>2 Å). A 2Fobs-Fcalc map was computed with phases calculated from the latter model, and the real-space fit residual (Jones, 1991) was computed for the mainchain and sidechain of each residue. Geometric analysis identified residues with unfavorable dihedral angles.
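The shift criterion can be written out directly. This Python sketch computes a per-residue rms shift separately for mainchain and sidechain atoms and applies the 1 Å and 2 Å cutoffs. Two choices are assumptions rather than details given in the text: measuring the shift as an rms over the residue's atoms (rather than, say, the largest single-atom shift), and the mainchain atom set N, CA, C, O.

```python
import math

MAINCHAIN = {"N", "CA", "C", "O"}  # assumption: backbone atoms only; CB counts as sidechain


def _rms(pairs):
    """RMS distance over a list of ((x, y, z), (x, y, z)) atom pairs."""
    total = sum(sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in pairs)
    return math.sqrt(total / len(pairs))


def flag_moved_residues(model_a, model_b, mc_cut=1.0, sc_cut=2.0):
    """Residues whose mainchain or sidechain rms shift (Å) exceeds the cutoffs.

    model_a, model_b: dict residue_number -> {atom_name: (x, y, z)}, both in the
    same crystal frame, so no superposition is applied.
    """
    mc_flagged, sc_flagged = [], []
    for res in sorted(set(model_a) & set(model_b)):
        a, b = model_a[res], model_b[res]
        common = set(a) & set(b)
        mc = [(a[n], b[n]) for n in common if n in MAINCHAIN]
        sc = [(a[n], b[n]) for n in common if n not in MAINCHAIN]
        if mc and _rms(mc) > mc_cut:
            mc_flagged.append(res)
        if sc and _rms(sc) > sc_cut:
            sc_flagged.append(res)
    return mc_flagged, sc_flagged


old = {
    1: {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "C": (2.0, 1.4, 0.0), "O": (1.3, 2.4, 0.0)},
    2: {"N": (5.0, 0.0, 0.0), "CA": (6.5, 0.0, 0.0), "C": (7.0, 1.4, 0.0),
        "O": (6.3, 2.4, 0.0), "CB": (7.0, -0.8, 1.2)},
}
new = {
    1: {name: (x, y, z + 1.2) for name, (x, y, z) in old[1].items()},  # backbone moved 1.2 Å
    2: dict(old[2], CB=(9.5, -0.8, 1.2)),                               # CB moved 2.5 Å
}
print(flag_moved_residues(old, new))  # ([1], [2])
```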
Strong correlations existed among the residues flagged for bad dihedrals, poor real-space fit, and significant shifts during refinement. The central core of the molecule had few problems; essentially all of the troublesome regions were in peripheral loops. Unfortunately, several of these loops are involved in substrate binding, which is critical to the catalytic action of Factor D.
The initial MIR map was computed at 3.2 Å using three heavy atom derivatives (PtCl4, PCMBS, K2HgI4) and was symmetry averaged (SVLN, 30jan91). Preliminary inspection indicated that approximately 80\% of the model coordinates fit the observed density very well, and a further 10\% of the residues required only minor adjustment. The remaining 10\% of the residues had problems: about half required major adjustment (shifts of a few angstroms), and the rest were uninterpretable in this map. These ``problem'' residues corresponded well to those identified by the initial error analysis. These preliminary results were published in conference proceedings (Carson et al, 1991).
The details of the crystallographic refinement of Factor D to 2.0 Å are given by Narayana et al (1994). We stress that phases from the homology model were never used, except to aid the location of heavy atom positions. We believe the refined model is of high quality. These coordinates are therefore assumed to be correct and provide the basis for the comparisons that follow.
The final refined crystal structure models of the two independent Factor D monomers are referred to as ``FDA'' and ``FDB''. The initial homology models are referred to as ``FDM'' and ``FDQ'' for the manually (M) constructed and the Quanta (Q) generated coordinates, respectively.
The molecular replacement model (fdx_19jan91) was subjected to SA and temperature factor refinement against the final 2.0 Å data set of 23249 reflections. This model has thus undergone five iterations (I) of X-PLOR refinement since its initial placement in the unit cell. The individual subunits are denoted FDIAX and FDIBX.
The original FDM and FDQ models were placed in the unit cell by a least-squares fit to the corresponding C-alpha atoms of FDA and FDB. These models were then subjected to the SA protocol of X-PLOR (X) described previously, and their individual atomic temperature factors were also refined. This produced the models FDMAX, FDMBX, FDQAX, and FDQBX for the two monomers refined from each starting homology model. (This assumes that the molecular replacement solutions would be found exactly. The ease of solution with the initial FDM model using incomplete data makes this plausible.) Both the FDMX and FDIX models used FDM as the starting point. Nevertheless, the rms difference between the closest pair, FDIBX/FDMBX, is 0.8 Å for the mainchain and 1.7 Å for all atoms, so they are considered distinct models. None of the models refined by X-PLOR had any manual intervention. A flow chart of the models' creation and nomenclature is given in Figure~4.
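The text does not say which least-squares fitting algorithm placed FDM and FDQ on the C-alpha atoms of FDA and FDB. As one standard choice, the Python sketch below implements Horn's closed-form unit-quaternion method: the optimal rotation is the eigenvector of the largest eigenvalue of a 4x4 matrix built from the correlation of the centered coordinates. A small Jacobi eigen-solver stands in for a linear-algebra library. All function names are illustrative.

```python
import math


def _jacobi_eigen(m, sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Jacobi rotation chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p, q of a and of v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # update rows p, q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def superpose(moving, target):
    """Least-squares fit of `moving` onto `target` (equal-length lists of (x, y, z)).

    Returns (rot, c_moving, c_target, rmsd); a fitted point is
    rot @ (p - c_moving) + c_target.
    """
    n = len(moving)
    cm = [sum(p[k] for p in moving) / n for k in range(3)]
    ct = [sum(p[k] for p in target) / n for k in range(3)]
    a = [[p[k] - cm[k] for k in range(3)] for p in moving]
    b = [[p[k] - ct[k] for k in range(3)] for p in target]
    # Correlation matrix S[i][j] = sum over atoms of a_i * b_j.
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = [
        [sum(u[i] * w[j] for u, w in zip(a, b)) for j in range(3)] for i in range(3)]
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(nmat)
    best = max(range(4), key=lambda i: vals[i])
    q0, q1, q2, q3 = (vecs[k][best] for k in range(4))  # unit quaternion of the rotation
    rot = [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]
    # Residual sum of squares = sum|a|^2 + sum|b|^2 - 2 * largest eigenvalue.
    resid = sum(x * x for u in a for x in u) + sum(x * x for w in b for x in w) - 2.0 * vals[best]
    return rot, cm, ct, math.sqrt(max(resid, 0.0) / n)


def apply_fit(points, rot, cm, ct):
    """Move points with the transformation returned by superpose()."""
    return [[sum(rot[i][k] * (p[k] - cm[k]) for k in range(3)) + ct[i] for i in range(3)]
            for p in points]
```

Usage: fit the homology-model C-alpha list onto the crystal-structure C-alpha list with `superpose`, then move every atom of the model with `apply_fit`.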
The final R-factors, deviations from ideal values, and estimated coordinate errors for the models are given in Table~4. The models are sorted by R-factor. The experimental crystal structure clearly provides the best model, but the differences between it and the homology models are not dramatic. The estimated errors are all less than a third of an angstrom.
Table 4. R-factors of Final and Molecular Replacement Models
All molecular replacement models have undergone X-PLOR SA refinement as described in the text. R-factors are based on 7.5 to 2.0 Å native data with a 2 sigma cutoff. The bonds (Å) and angles (degrees) columns are the rms deviations from ideal geometric values. The errors (Å) are estimated from Luzzati plots (not shown).
Model  R-factor  Bonds  Angles  Error  Description
-----  --------  -----  ------  -----  ------------------------
FD     0.188     0.010  1.65    0.23   Final crystal structure
FD'    0.219     0.010  1.65    0.27   Final, 69 waters omitted
FDI    0.246     0.017  2.23    0.30   Iterated manual model
FDQ    0.255     0.017  2.29    0.32   Quanta model
FDM    0.259     0.018  2.28    0.32   Manual model
The two subunits in the crystal were refined independently, and some interesting conformational differences were noted between FDA and FDB (Narayana et al, 1994). The FDB monomer fits the experimental data more closely, in particular for residues 41 to 48, which are disordered in the final FDA structure. The refined temperature factors and real-space fit per residue are shown in Figure~5.
For 202 of the 228 residues, the mainchain dihedral angles of FDA and FDB agree within 30 degrees; only 16 residues differ by more than 60 degrees. For the sidechains, 159 residues have dihedral differences of less than 30 degrees, and 57 residues have differences greater than 60 degrees (implying a different rotamer). The dihedral differences between FDA and FDB and the rms differences between the superimposed monomers are plotted in Figure~5, together with a C-alpha tracing of the superposition.
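Dihedral comparisons of this kind depend on two details. Torsion angles must be computed from coordinates, and differences must be taken on the circle, so that 170 and -170 degrees differ by 20 degrees rather than 340. The Python sketch below handles both. Treating a residue's difference as the largest of its per-angle differences is an assumption, because the text does not say how phi and psi (or several chi angles) were combined.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) defined by four points, IUPAC sign convention."""
    def sub(u, w): return [u[i] - w[i] for i in range(3)]
    def dot(u, w): return sum(u[i] * w[i] for i in range(3))
    def cross(u, w):
        return [u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0]]
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def count_agreement(angles_a, angles_b, close=30.0, far=60.0):
    """Count residues whose dihedrals agree within `close` or differ by more than `far`.

    angles_*: dict residue -> tuple of dihedrals, e.g. (phi, psi) or (chi1, chi2, ...),
    with None for undefined angles. A residue's difference is its largest per-angle
    difference (assumption). Returns (number within close, number beyond far).
    """
    n_close = n_far = 0
    for res in set(angles_a) & set(angles_b):
        diffs = [angle_diff(x, y) for x, y in zip(angles_a[res], angles_b[res])
                 if x is not None and y is not None]
        if not diffs:
            continue
        worst = max(diffs)
        if worst < close:
            n_close += 1
        if worst > far:
            n_far += 1
    return n_close, n_far


print(angle_diff(170.0, -170.0))  # 20.0
a = {1: (-60.0, -45.0), 2: (-60.0, 140.0), 3: (None, 120.0)}
b = {1: (-70.0, -40.0), 2: (-150.0, 150.0), 3: (-80.0, 100.0)}
print(count_agreement(a, b))  # (2, 1)
```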
Slightly more than half of the 228 residues show mainchain deviations less than the 0.23 Å error suggested by Luzzati plot analysis, and slightly more than three quarters have deviations no greater than twice that amount (0.46 Å). The corresponding figures for the sidechains are 29\% and 53\%, respectively. The rms differences exceed 1.0 Å for 27 mainchain and 76 sidechain residues. Visual inspection shows significant mainchain conformational differences in the ranges 41-48, 77-89, and 198-201. Twenty of the significantly different sidechains belong to residues whose mainchain conformations are similar; these are primarily surface Arg, Lys, and Glu residues influenced by crystal packing.
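The fractions quoted above, like the "ok" and median columns of Table~5, are simple counts over per-residue deviations. The minimal Python sketch below uses illustrative names and toy data, not the Factor D values. It follows the text's thresholds: strictly below the Luzzati error, and no greater than twice that error.

```python
import statistics


def deviation_summary(deviations, luzzati_error, big=1.0):
    """Summarize per-residue rms deviations (Å) against a Luzzati coordinate error."""
    n = len(deviations)
    return {
        "frac_below_error": sum(d < luzzati_error for d in deviations) / n,
        "frac_within_2x": sum(d <= 2 * luzzati_error for d in deviations) / n,
        "n_over_big": sum(d > big for d in deviations),
        "median": statistics.median(deviations),
    }


print(deviation_summary([0.1, 0.2, 0.3, 0.5, 1.2], luzzati_error=0.23))
# {'frac_below_error': 0.4, 'frac_within_2x': 0.6, 'n_over_big': 1, 'median': 0.3}
```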
The FDM and FDQ models were created independently but from the same set of protein templates. However, comparison of Table~2 and Table~3 shows that less than half of the residues (43-50 1HNE, 78-132 3RP2, 155-167 1EST, 210-228 3RP2) had the same primary template in both models. Figure~6 plots the dihedral differences between FDM and FDQ and the rms differences between FDM and the superimposed FDQ. A C-alpha tracing of both models superimposed on the crystal structure FDB is also shown. Only 71 mainchain and 28 sidechain residues have rms deviations below the 0.46 Å cutoff, while 61 mainchain and 125 sidechain residues deviate by more than 1.0 Å. The overall average rms difference is 1.2 Å for the mainchain atoms and 2.6 Å for all atoms. With a 30 degree cutoff, the dihedral angle differences show similar conformations for 145 mainchain and 135 sidechain residues. A 60 degree cutoff yields significant differences for 38 mainchain and 77 sidechain conformations.
Comparisons of each model with the corresponding crystal monomer, FDA or FDB, are presented in the upper half of Table~5 and in Figure~7. The FDM and FDQ models differ little in their fit to the crystal structure, and both fit slightly better to FDB than to FDA. Based on coordinate differences, less than 20\% of the mainchain and less than 10\% of the sidechains may be considered correct. However, the differences in dihedral angles indicate that approximately 60\% of the mainchain and 50\% of the sidechain conformations are nearly correct. The differences are not uniformly distributed: about half of the large differences occur in the regions where FDA differs from FDB.
Table 5. Deviations of models from crystal structure
The mainchain (mc) and sidechain (sc) rms deviations over the entire structure (Delta), and the median value (med.) of the deviations over all 228 residues, are given in Å. The mc-ok and sc-ok columns count residues with deviations within twice the Luzzati limit. The phi/psi-ok and chis-ok columns count residues with dihedral differences of less than 30 degrees.
model   mc-Delta  sc-Delta  mc-med.  sc-med.  mc-ok  sc-ok  phi/psi-ok  chis-ok
-----   --------  --------  -------  -------  -----  -----  ----------  -------
FDMA    1.73      3.55      0.75     1.51     37     15     131         114
FDMB    1.56      3.22      0.74     1.45     41     14     136         116
FDQA    1.60      3.40      0.80     1.76     52     25     142         106
FDQB    1.44      3.02      0.78     1.57     54     21     153         110
FDMAX   1.44      3.12      0.30     0.69     165    85     166         126
FDMBX   1.17      2.81      0.28     0.53     184    111    187         132
FDQAX   1.27      2.88      0.26     0.65     174    96     177         130
FDQBX   1.27      2.84      0.26     0.66     177    96     175         130
FDIAX   1.22      2.70      0.44     0.59     124    74     179         136
FDIBX   1.14      2.68      0.43     0.62     139    68     189         132
Comparison of the homology models after SA refinement is given in the lower portion of Table~5 and in Figure~7. The X-PLOR refinement moved the coordinates significantly closer to the crystal structure for almost every residue. These results are summarized in Table~6.
Table 6. X-PLOR shifts of models
The #-better columns count residues that moved closer to the refined crystal structure after X-PLOR SA refinement. The maximum shifts for any residue are given in Å. The #-worse columns count residues that moved more than twice the Luzzati limit away from the crystal structure. The #-bad columns count residues with rms deviations greater than 1.0 Å.
model    #-mc    #-sc    max-mc     max-sc     #-mc   #-sc   #-mc  #-sc
protein  better  better  shift (Å)  shift (Å)  worse  worse  bad   bad
-------  ------  ------  ---------  ---------  -----  -----  ----  ----
FDMAX    201     186     3.06       5.41       9      20     42    101
FDMBX    208     187     3.09       4.59       2      17     36    98
FDQAX    210     191     2.83       6.33       4      13     33    85
FDQBX    201     183     1.91       4.01       10     28     25    86
FDIAX    187     184     3.32       4.76       4      8      34    96
FDIBX    184     171     2.78       4.43       1      2      23    85
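The three kinds of counts in Table~6 can be sketched from per-residue deviations measured before and after refinement. In this Python sketch, "worse" is read as the deviation from the crystal structure growing by more than twice the Luzzati error. That reading is an assumption, as are the function name and the toy numbers.

```python
def shift_summary(before, after, luzzati_error, bad_cutoff=1.0):
    """Per-residue deviations (Å) from the crystal structure before and after refinement.

    Returns (better, worse, bad):
      better: residues whose deviation decreased,
      worse:  residues whose deviation grew by more than twice the Luzzati error
              (interpretation assumed here),
      bad:    residues still deviating by more than bad_cutoff after refinement.
    """
    if len(before) != len(after):
        raise ValueError("before and after must cover the same residues")
    better = sum(1 for b, a in zip(before, after) if a < b)
    worse = sum(1 for b, a in zip(before, after) if a - b > 2 * luzzati_error)
    bad = sum(1 for a in after if a > bad_cutoff)
    return better, worse, bad


print(shift_summary([1.0, 0.5, 0.2, 2.0], [0.4, 0.6, 0.9, 1.5], luzzati_error=0.23))  # (2, 1, 1)
```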
Both model structures fail to reproduce the crystal structure accurately, especially around the active site and substrate binding loops. Only about 10\% of the molecular replacement model mainchain was grossly in error after SA refinement, but almost all of these errors were located in the active site and substrate binding regions of the enzyme. These are precisely the residues that must be known accurately to understand the structure/function relationship of this enzyme. However, these loops are probably rather flexible: the temperature factors of the refined crystal structures of Factor D and of the serine proteases used for homology modeling are generally higher in these loops. The results are summarized in Figure~8.
