Results

R-factors and Luzzati Plots

The final R-factors and estimated coordinate errors for the models are given in Table~4 of the accompanying paper. The experimental crystal structure clearly provides the best model, but the differences between it and the homology models are not dramatic. All estimated errors are less than 0.32 A.

Correlation of errors with various criteria

Figure~2 presents plots of the criteria on a per residue basis for the FDIBX and FDQBX models. The top graph of each figure is the observed rms difference between the crystal structure and the model. This is taken as the measure of error for each residue. Inset are diagrams of the secondary structure and of the regions where FDA and FDB differ.

Table~1 lists the correlation coefficient between each criterion and the rms coordinate error per residue for FDIBX and FDQBX. Figure~3 presents the mean values of the correlation coefficients and their standard deviations calculated with the six models. Both Table~1 and Figure~3 present the values for the data set with all residues and for the restricted data set (only those residues that are the same in both FDA and FDB).

Table 1. Correlations of Deviations to Criteria

The correlation coefficients (x1000) are given for the criteria discussed in Methods. The results are for the data plotted in Figure~2. For geom, the rsr measures, bf, surf, and rms, the criteria were evaluated separately for all atoms, mainchain (mc) atoms, and sidechain (sc) atoms. The remaining criteria are not so divided. FDIBX (I) and FDQBX (Q) data for all 228 residues are given in the upper portion of the table. The lower portion of the table is for the 124 residues (all), 175 residues (mc), and 120 residues (sc) where the FDB subunit superimposes within 0.46 A upon FDA.

 criteria       I.all  Q.all   I.mc   Q.mc   I.sc   Q.sc
 ------------    ----   ----   ----   ----   ----   ----
 (all 228 residues)
 geom             605    570    467    416    582    546
 rsr              667    570    645    615    544    530
 orsr             669    600    638    592    573    559
 rwin             446    464    536    539    455    459
 bf               692    692    596    673    616    665
 surf             431    437    181    298    411    405
 rms              580    516    649    473    504    392
 omega            -65     -3    -81    -92    -73     -9
 phi/psi          368    384    432    458    351    371
 chi-1,2          366    442     29    136    369    459
 db-C-alpha       376    336    410    420    363    311
 db-O             275    339    382    487    244    342
 3D               -19     55   -109    -23    -36     55
 (restricted set: residues where FDB superimposes on FDA within 0.46 A)
 geom             559    579    498    610    529    441
 rsr              646    583    683    698    518    553
 orsr             645    687    671    709    556    602
 rwin             279    295    413    417    273    280
 bf               673    686    565    681    562    635
 surf             227    338    289    412    236    266
 rms              644    446    682    356    612    251
 omega            -43    -13   -131    -86    -80    -22
 phi/psi          256    276    294    382    189    225
 chi-1,2          525    513    111    224    543    586
 db-C-alpha       183    271    398    372    139    184
 db-O             120    265    388    532     80    241
 3D                79    120     48     74     81    169
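
As an illustration of how an entry of this kind can be computed, the following minimal sketch correlates one per-residue criterion with the per-residue rms error, optionally restricted to a subset of residues. It assumes the ordinary (Pearson) correlation coefficient, which the text does not state explicitly. The function names and the numerical values are hypothetical and are not data from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def table_entry(criterion, rms_error, keep=None):
    """Correlation x1000, optionally restricted to residues where keep is True."""
    if keep is not None:
        criterion = [c for c, k in zip(criterion, keep) if k]
        rms_error = [e for e, k in zip(rms_error, keep) if k]
    return round(1000 * pearson(criterion, rms_error))

# Hypothetical per-residue values (illustration only, not data from this study).
bf = [12.0, 15.5, 30.2, 44.1, 18.3, 25.0]            # mean B-factor per residue
err = [0.20, 0.35, 0.90, 1.60, 0.30, 0.55]           # rms error per residue (A)
same_in_both = [True, True, False, False, True, True]  # restricted-set mask

print(table_entry(bf, err))                # all residues
print(table_entry(bf, err, same_in_both))  # restricted set only
```

The mask argument mirrors the distinction between the full and restricted data sets: the same criterion is correlated twice, once over every residue and once over the residues common to both subunits.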

The temperature factor generally has the greatest correlation with model error. This is true for both test sets of data. The correlation with surface area is higher using all the data, as many of the differences between the data sets involve surface residues. The correlation between bf and surf for the sidechains of the various data sets averaged about 0.5. The bf criterion is chosen as a test for error.

Comparing the measures of real-space fit, those using OMITMAPS have a slightly, but not significantly, higher value than those based on 2Fo-Fc maps. (The OMITMAP procedure produced an average phase angle change of only about 12 degrees for the models.) The R-window method gives similar correlations considering all the data, but drops significantly when evaluated against the restricted data. The free R value evaluated in the same fashion (data not shown) gave a slightly worse correlation than the R-window method. (A random 10% of the reflections was used as the free R test set.) These sliding R factor methods are computationally expensive. The real-space fit residual based on the 2Fo-Fc maps, rsr, is adopted as the best criterion in this group, due to the computational simplicity of the process.

The geometric strain energy is the only other criterion with a correlation coefficient greater than 0.5. The rms shift shows the highest correlation with the iteratively refined FDI model. This would be the more common situation. The FDM and FDQ models had only been subjected to one cycle of SA refinement, so they have not had a chance to ``settle in.'' (The rms differences based on a subsequent SA run on each gave significantly higher correlations.) Both these criteria, geom and rms, are adopted for use.

Considering mainchain atoms, the phi/psi dihedral values, C-alpha fit to database, and C=O fit to database show correlations of about 0.5 with all the data. These values all drop (the first two significantly) when compared to the restricted data. This implies Ramachandran plot agreement alone may not be a sufficient criterion. Comparison of the carbonyl direction to the database appears to be a slightly, but not significantly, better measure of these three. However, this measure may give false negatives when glycines or prolines are involved. There is essentially no correlation with the sidechain dihedrals.

Considering sidechain atoms, there is a fair correlation (about 0.35) with the mainchain values cited above considering all the data. However, these correlations fall dramatically when using only the restricted set. The correlation with the chi1/chi2 dihedral values increases to 0.45 with the restricted data. As the dihedral criterion, dihe, works well for both mainchain and sidechain and involves only a table lookup, it is adopted for subsequent analysis.

A comparison of the tables reveals, somewhat surprisingly, that the peptide plane omega angle is anti-correlated with the error. This may be a consequence of our using special ``PROLSQ''-like potentials for the peptide planes, which result in substantially more planar peptides than the default X-PLOR parameters. The 3D folding test is not correlated with the error, but this method is intended primarily for the identification of grossly misfit models. These criteria will not be considered further.

While the del-dihe method is a good way to monitor differences between structures (see accompanying paper), the rms measure is generally better as an error function. The dihe and db criteria correlations with del-dihe were all higher than their correlations with rms. The del-dihe error function is significantly less correlated with most of the evaluation criteria. This was especially true for the sidechain correlations. The rms deviation is thus used as the error measure for the remainder of the paper.

Correlation of the various criteria to each other

The following five criteria previously described were adopted on the basis of their correlation to error in the model: bf, rsr, geom, rms, and dihe. These may all be applied separately to the mainchain and the sidechain atoms of each residue. These criteria were correlated with one another (Table~2) to determine their independence.

Table 2. Average Correlation between Criteria

The averages over the six models of the correlation coefficients (x1000) of each final criterion with one another are shown. The upper half of the table is for all 228 residues. The lower half is for the restricted data, as in Table~1. The upper right triangle gives mainchain-mainchain values and the lower left triangle gives sidechain-sidechain values.


 (all 228 residues)
 MC\SC     bf    rsr   geom    rms   dihe
 ----------------------------------------
  bf      ---    329    361    383   280
  rsr     563    ---    261    151   139
  geom    412    432    ---    329   426
  rms     421    344    339    ---   297
  dihe    407    380    461    297   ---

 (restricted data)
 MC\SC     bf    rsr   geom    rms   dihe
 ----------------------------------------
  bf      ---    196    203    317   167
  rsr     367    ---    169    147    55
  geom    334    335    ---    351   425
  rms     357    224    312    ---   278
  dihe    290    269    361    223   ---

Table~2 reveals positive correlation between all the criteria, but these correlations are much less than the correlations of the criteria with the error. The only value greater than 0.5 was for bf to rsr, computed over all data. The average criteria-criteria correlation is 0.28 for the restricted data. The average deviation-criteria correlation is 0.49 for the same data.
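
The average quoted for the restricted data can be checked approximately from the tabulated values. The sketch below simply averages the twenty off-diagonal entries of the restricted half of Table~2. It assumes that the quoted figure is an unweighted mean over both triangles, which the text does not state.

```python
# Off-diagonal entries (x1000) of the restricted-data half of Table 2.
upper_right = [196, 203, 317, 167, 169, 147, 55, 351, 425, 278]
lower_left = [367, 334, 335, 357, 224, 312, 290, 269, 361, 223]

def mean_correlation(entries_x1000):
    """Unweighted mean of tabulated correlations, converted back from x1000."""
    return sum(entries_x1000) / len(entries_x1000) / 1000.0

overall = mean_correlation(upper_right + lower_left)
print(f"{overall:.3f}")  # prints 0.269
```

This gives about 0.27, close to the 0.28 quoted above. The small difference presumably reflects rounding or a slightly different averaging scheme.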

Combination of criteria

To further test the independence of these five criteria, linear models were constructed and subjected to singular value decomposition tests. The routine ``svdfit'' of Press et al. (1992) was employed for multiple-regression analysis. The best linear models were obtained using the log of the rms deviation as the error function. Individual t-tests suggest a high probability that each individual criterion is required for modeling the error (the largest value was P = 0.0017 for sidechain rms). This was true with the analysis based on all six models, on only the three B-chain models, and on the restricted set of the six models.
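
A regression of this kind can be sketched as follows. This is not the ``svdfit'' routine itself. It is a minimal illustration, using NumPy's `numpy.linalg.svd`, of a least-squares fit of the log rms deviation to five criteria with a t statistic for each coefficient. The function name, the singular-value threshold `tol`, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def svd_regression(X, y, tol=1e-5):
    """Least-squares fit of y to [1, X] via SVD.

    Returns the coefficients (intercept first) and their t statistics.
    Singular values below tol * max are dropped (an assumed threshold).
    """
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    keep = w > tol * w.max()                       # discard near-zero singular values
    w_inv = np.where(keep, 1.0 / np.where(keep, w, 1.0), 0.0)
    coef = Vt.T @ (w_inv * (U.T @ y))              # pseudo-inverse solution
    resid = y - A @ coef
    dof = len(y) - int(keep.sum())                 # residual degrees of freedom
    s2 = resid @ resid / dof                       # residual variance estimate
    cov = (Vt.T * w_inv**2) @ Vt * s2              # coefficient covariance matrix
    t_stat = coef / np.sqrt(np.diag(cov))
    return coef, t_stat

# Synthetic data standing in for five per-residue criteria (not from this study).
rng = np.random.default_rng(0)
criteria = rng.normal(size=(200, 5))
true_coef = np.array([0.5, 0.3, 0.2, 0.4, 0.1])
log_rms = -1.0 + criteria @ true_coef + 0.05 * rng.normal(size=200)

coef, t_stat = svd_regression(criteria, log_rms)
print(np.round(coef, 2))
print(np.round(t_stat, 1))
```

A large absolute t statistic for a coefficient corresponds to a small P value, which is the sense in which each criterion is judged to be required in the model.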

The individual coefficients vary considerably, though they are all of the same order of magnitude. For example, the coefficient for rsr is about threefold that for rms with the restricted sidechain data, while they are about equal for the restricted mainchain data. We do not wish to suggest that the coefficients obtained from this particular study should be used, only that these criteria are independent predictors of error.

We will attempt to combine them to give an overall score. Each correlation and model analysis thus far has been against the raw data. In order to put the disparate data (such as B-factor and R-factor) on the same scale, they are converted into standard deviation units relative to the mean.

Correlation of combined criteria

The rms difference between FDB and FDIBX (again, assumed to be the true error) is presented in Figure~4. The sum of the five criteria and each individual criterion, each in standard deviation units, are also shown. Data are presented separately for mainchain and sidechain atoms. The sum of the five independent criteria in standard deviation units was chosen as our final error criterion due to its consistency, as explained below.
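
The conversion to standard deviation units and the summation of the criteria can be sketched as below. The sketch assumes that each criterion is oriented so that larger values indicate a more suspect residue and that the sample standard deviation is used. Neither detail is specified in the text, and the function names and values are hypothetical.

```python
from statistics import mean, stdev

def to_sigma_units(values):
    """Express each value as its deviation from the mean in standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]

def combined_score(criteria):
    """Per-residue sum of the criteria after conversion to sigma units.

    criteria maps a criterion name to its list of per-residue values.
    """
    scaled = [to_sigma_units(vals) for vals in criteria.values()]
    return [sum(column) for column in zip(*scaled)]

# Hypothetical values for four residues (illustration only).
example = {
    "bf": [12.0, 15.0, 40.0, 18.0],
    "rsr": [0.10, 0.12, 0.35, 0.15],
    "geom": [1.0, 1.5, 6.0, 2.0],
}
print([round(s, 2) for s in combined_score(example)])
```

Scaling each criterion before summation keeps a quantity with a large numerical range, such as the B-factor, from dominating one with a small range, such as a real-space residual.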

The desired error detection method must identify ``incorrect'' residues and give no false positives. An arbitrary testing cutoff must be selected for this criterion. Another arbitrary cutoff is used to select residues in error based on the rms deviation of the model from the crystal structure. Table~3 gives results for mainchain and sidechain atoms with a variety of error (A) and criterion (sigma) cutoffs. The values are the averages over all 228 residues of the three models corresponding to FDB. The number of incorrect residues is noted, along with the percentage correctly identified by application of the criterion cutoff. The false positives are residues flagged as being in error that nevertheless agree with the crystal structure to within twice the Luzzati error.

Table 3. Identification of gross errors

The A-error gives the rms deviation cutoff that defines a gross error. The results are averages over all residues in the three models corresponding to the FDB crystal structure. The sigma-cut gives the sum-of-criteria cutoff used to flag errors. The #bad column counts residues deviating by more than the given error. The %hits column gives the percentage of bad residues correctly flagged. The #false column counts correct residues (within twice the Luzzati error) flagged as being in error.

 atoms  A-error  sigma-cut  #bad  %hits  #false
 -----  -------  ---------  ----  -----  ------
 mc         1.0       1.00    27     89       3
 mc         1.0       0.67    27     92       7
 mc         1.5       1.00    21     97       3
 sc         1.0       1.00    89     42       0
 sc         1.0       0.67    89     53       1
 sc         1.5       1.00    74     48       0
 sc         1.5       0.67    74     60       1
 sc         2.0       0.67    53     69       1
 sc         2.5       0.67    39     82       1
 sc         3.0       0.67    32     88       1
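
The bookkeeping behind the three columns can be sketched as follows. The sketch assumes that a residue is flagged when its combined score exceeds the sigma cutoff and is counted as bad when its rms deviation exceeds the error cutoff. The function name, the strictness of the inequalities, and the example values (including the Luzzati error of 0.3 A) are hypothetical.

```python
def evaluate_cutoffs(scores, errors, error_cut, sigma_cut, luzzati_error):
    """Return (#bad, %hits, #false) for one choice of cutoffs.

    scores: combined criterion per residue, in sigma units
    errors: rms deviation from the crystal structure per residue, in A
    """
    bad = [e > error_cut for e in errors]
    flagged = [s > sigma_cut for s in scores]
    n_bad = sum(bad)
    hits = sum(b and f for b, f in zip(bad, flagged))
    # False positives: flagged residues within twice the Luzzati error.
    n_false = sum(f and e <= 2 * luzzati_error for f, e in zip(flagged, errors))
    pct_hits = 100.0 * hits / n_bad if n_bad else 0.0
    return n_bad, pct_hits, n_false

# Hypothetical values for five residues (illustration only).
scores = [2.1, 0.3, 1.5, -0.4, 1.2]
errors = [1.8, 0.2, 1.2, 0.1, 0.3]
print(evaluate_cutoffs(scores, errors, error_cut=1.0, sigma_cut=1.0,
                       luzzati_error=0.3))  # prints (2, 100.0, 1)
```

Flagged residues whose deviation lies between twice the Luzzati error and the error cutoff fall into neither the hit nor the false count, so they can be tallied separately as less severe errors.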

These analyses were carried out against each individual criterion and various combinations. The five combined criteria consistently located the greatest percentage of residues in error, and were even more impressive in producing the smallest number of false positives. The method works very well for identifying problems in the mainchain. Nearly 10% of the residues in the molecular replacement models are grossly in error. Using 1.0 A as the cutoff for errors and 1.0 sigma as the cutoff for the criterion test, the method identifies nearly 90% of the problem residues. Only about one percent of the residues are true false positives. An additional two percent of residues having errors greater than twice the Luzzati limit, but less than the cutoff, were also flagged. These are taken as true, but less severe, errors.

These same cutoffs applied to sidechain atoms give a less impressive result. Nearly 40% of the residues are in error by the 1.0 A distance cutoff, but only 42% are identified as such. However, no true false positives were reported. Using 2.0 A as the error cutoff, nearly a quarter of the sidechains are grossly in error. With a 0.67 sigma criterion cutoff, the method identifies nearly 70% of the problem residues and flags only 1 residue as a false positive. An additional seven percent of residues with less severe errors are also flagged.