The final R-factors and estimated coordinate errors for the models are given in Table~4 of the accompanying paper. The experimental crystal structure clearly provides the best model, but the differences between it and the homology models are not dramatic. All estimated errors are less than 0.32 A.
Figure~2 presents plots of the criteria on a per residue basis for the FDIBX and FDQBX models. The top graph of each figure is the observed rms difference between the crystal structure and the model. This is taken as the measure of error for each residue. Inset are diagrams of the secondary structure and of the regions where FDA and FDB differ.
Table~1 lists the correlation coefficient between each criterion and the rms coordinate error per residue for FDIBX and FDQBX. Figure~3 presents the mean values of the correlation coefficients and their standard deviations calculated with the six models. Both Table~1 and Figure~3 present the values for the data set with all residues and for the restricted data set (only those residues that are the same in both FDA and FDB).
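As an illustration of how such per-residue correlations can be computed, the following minimal sketch evaluates a Pearson correlation coefficient between one criterion and the rms coordinate error, scaled by 1000 as in Table~1. The function name, the optional `keep` mask for the restricted data set, and the input layout (one value per residue) are assumptions made for this example, not the code used in the study.

```python
import math

def pearson_x1000(criterion, error, keep=None):
    """Pearson correlation (x1000, rounded) between a per-residue criterion
    and the per-residue rms error.

    keep is an optional list of booleans selecting a restricted residue set
    (hypothetical stand-in for "residues that are the same in FDA and FDB").
    """
    if keep is not None:
        criterion = [c for c, k in zip(criterion, keep) if k]
        error = [e for e, k in zip(error, keep) if k]
    n = len(criterion)
    mean_c = sum(criterion) / n
    mean_e = sum(error) / n
    cov = sum((c - mean_c) * (e - mean_e) for c, e in zip(criterion, error))
    var_c = sum((c - mean_c) ** 2 for c in criterion)
    var_e = sum((e - mean_e) ** 2 for e in error)
    return round(1000 * cov / math.sqrt(var_c * var_e))
```

Passing a mask restricts the calculation to the selected residues, which mirrors the distinction between the upper and lower portions of the table.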
Table 1. Correlations of Deviations to Criteria
The correlation coefficients (x1000) are given for the criteria discussed in Methods. The results are for the data plotted in Figure~2. For geom, rsr's, bf, surf, and rms, data were compared against the respective data for all atoms, mainchain (mc) atoms, and sidechain (sc) atoms. The remaining criteria are not so divided. FDIBX (I) and FDQBX (Q) data for all 228 residues are given in the upper portion of the table. The lower portion of the table is for the 124 residues (all), 175 residues (mc), and 120 residues (sc) where the FDB subunit superimposes within 0.46 A upon FDA.
criteria I.all Q.all I.mc Q.mc I.sc Q.sc
------------ ---- ---- ---- ---- ---- ----
geom 605 570 467 416 582 546
rsr 667 570 645 615 544 530
orsr 669 600 638 592 573 559
rwin 446 464 536 539 455 459
bf 692 692 596 673 616 665
surf 431 437 181 298 411 405
rms 580 516 649 473 504 392
omega -65 -3 -81 -92 -73 -9
phi/psi 368 384 432 458 351 371
chi-1,2 366 442 29 136 369 459
db-C-alpha 376 336 410 420 363 311
db-O 275 339 382 487 244 342
3D -19 55 -109 -23 -36 55
------------ ---- ---- ---- ---- ---- ----
geom 559 579 498 610 529 441
rsr 646 583 683 698 518 553
orsr 645 687 671 709 556 602
rwin 279 295 413 417 273 280
bf 673 686 565 681 562 635
surf 227 338 289 412 236 266
rms 644 446 682 356 612 251
omega -43 -13 -131 -86 -80 -22
phi/psi 256 276 294 382 189 225
chi-1,2 525 513 111 224 543 586
db-C-alpha 183 271 398 372 139 184
db-O 120 265 388 532 80 241
3D 79 120 48 74 81 169
The temperature factor generally has the greatest correlation with model error. This is true for both test sets of data. The correlation with surface area is higher using all the data, as many of the differences between the data sets involve surface residues. The correlation between bf and surf for the sidechains of the various data sets averaged about 0.5. The bf criterion is chosen as a test for error.
Comparing the measures of real-space fit, those using OMITMAPS have a slightly, but not significantly, higher value than those based on 2Fo-Fc maps. (The OMITMAP procedure produced an average phase angle change of only about 12 degrees for the models.) The R-window method gives similar correlations considering all the data, but drops significantly when evaluated against the restricted data. The free R value evaluated in the same fashion (data not shown) gave a slightly worse correlation than the R-window method. (A random 10% of the reflections was used as the free R test set.) These sliding R factor methods are computationally expensive. The real-space fit residual based on the 2Fo-Fc maps, rsr, is adopted as the best criterion in this group, due to the computational simplicity of the process.
The geometric strain energy is the only other criterion with a correlation coefficient greater than 0.5. The rms shift shows the highest correlation with the iteratively refined FDI model. This would be the more common situation. The FDM and FDQ models had only been subjected to one cycle of SA refinement, so they have not had a chance to ``settle in.'' (The rms differences based on a subsequent SA run on each gave significantly higher correlations.) Both these criteria, geom and rms, are adopted for use.
Considering mainchain atoms, the phi/psi dihedral values, C-alpha fit to database, and C=O fit to database show correlations of about 0.5 with all the data. These values all drop (the first two significantly) when compared to the restricted data. This implies Ramachandran plot agreement alone may not be a sufficient criterion. Comparison of the carbonyl direction to the database appears to be a slightly, but not significantly, better measure of these three. However, this measure may give false negatives when glycines or prolines are involved. There is essentially no correlation with the sidechain dihedrals.
Considering sidechain atoms, there is a fair correlation (about 0.35) with the mainchain values cited above considering all the data. However, these correlations fall dramatically when using only the restricted set. The correlation with the chi1/chi2 dihedral values increases to 0.45 with the restricted data. As the dihedral criterion, dihe, works well for both mainchain and sidechain and involves only a table lookup, it is adopted for subsequent analysis.
A comparison of the tables reveals, somewhat surprisingly, that the peptide plane omega angle is anti-correlated with the error. This may be a consequence of our using special ``PROLSQ''-like potentials for the peptide planes, which result in substantially more planar peptides than the default X-PLOR parameters. The 3D folding test is not correlated, but this method is primarily for the identification of grossly misfit models. These criteria will not be considered further.
While the del-dihe method is a good way to monitor differences between structures (see accompanying paper), the rms measure is generally better as an error function. The dihe and db criteria correlations with del-dihe were all higher than their correlations with rms. The del-dihe error function is significantly less correlated with most of the evaluation criteria. This was especially true for the sidechain correlations. The rms deviation is thus used as the error measure for the remainder of the paper.
The following five criteria previously described were adopted on the basis of their correlation to error in the model: bf, rsr, geom, rms, and dihe. These may all be applied separately to the mainchain and the sidechain atoms of each residue. These criteria were correlated with one another (Table~2) to determine their independence.
Table 2. Average Correlation between Criteria
The averages over the six models of the correlation coefficients (x1000) of each final criterion with one another are shown. The upper half of the table is for all 228 residues. The lower half is for the restricted data, as in Table~1. The upper right triangle gives mainchain-mainchain values and the lower left triangle gives sidechain-sidechain values.
MC\SC    bf   rsr  geom   rms  dihe
-----  ----  ----  ----  ----  ----
bf      ---   329   361   383   280
rsr     563   ---   261   151   139
geom    412   432   ---   329   426
rms     421   344   339   ---   297
dihe    407   380   461   297   ---

MC\SC    bf   rsr  geom   rms  dihe
-----  ----  ----  ----  ----  ----
bf      ---   196   203   317   167
rsr     367   ---   169   147    55
geom    334   335   ---   351   425
rms     357   224   312   ---   278
dihe    290   269   361   223   ---
Table~2 reveals positive correlation between all the criteria, but these correlations are much less than the correlations of the criteria with the error. The only value greater than 0.5 was for bf to rsr, computed over all data. The average criteria-criteria correlation is 0.28 for the restricted data. The average deviation-criteria correlation is 0.49 for the same data.
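The average criterion-criterion correlation can be checked directly from the tabulated values. The sketch below averages the twenty off-diagonal entries of the restricted-data half of Table~2 (mainchain values in the upper right triangle, sidechain values in the lower left). The variable and function names are illustrative only. The tabulated entries give about 0.27, close to the 0.28 quoted above; the small difference presumably reflects rounding in the table.

```python
# Restricted-data half of Table 2 (correlations x1000); None marks the diagonal.
# Upper right triangle: mainchain-mainchain; lower left: sidechain-sidechain.
RESTRICTED = [
    [None, 196, 203, 317, 167],  # bf
    [367, None, 169, 147, 55],   # rsr
    [334, 335, None, 351, 425],  # geom
    [357, 224, 312, None, 278],  # rms
    [290, 269, 361, 223, None],  # dihe
]

def mean_off_diagonal(table):
    """Mean of all off-diagonal entries, converted from x1000 to a plain coefficient."""
    values = [v for row in table for v in row if v is not None]
    return sum(values) / len(values) / 1000.0

average = mean_off_diagonal(RESTRICTED)
print(f"{average:.3f}")
```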
To further test the independence of these five criteria, linear models were constructed and subjected to singular value decomposition tests. The routine ``svdfit'' of Press et al. (1992) was employed for multiple-regression analysis. The best linear models were obtained using the log of the rms deviation as the error function. Individual T-tests suggest a high probability that each individual criterion is required for modeling the error (the largest value was P = 0.0017 for sidechain rms). This was true with the analysis based on all six models, on only the three B-chain models, and on the restricted set of the six models.
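The kind of multiple-regression analysis described here can be sketched as follows. Note that this example does not use ``svdfit'' or a singular value decomposition: it substitutes ordinary least squares solved through the normal equations, which gives the same coefficients for a well-conditioned problem. It fits the log of the rms deviation to a set of criteria and returns a t statistic for each coefficient. All function names and the data layout (one row of criterion values per residue) are assumptions for illustration. Converting the t statistics to P values would additionally require the Student t distribution, which is omitted here.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("criteria are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_log_error(criteria, rms_dev):
    """Least-squares fit of log(rms deviation) to the criteria plus an intercept.

    criteria: one row of criterion values per residue.
    rms_dev:  positive rms deviation per residue.
    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    y = [math.log(d) for d in rms_dev]
    x = [[1.0] + list(row) for row in criteria]
    n, k = len(x), len(x[0])
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(x[i][a] * beta[a] for a in range(k)) for i in range(n)]
    s2 = sum(r * r for r in resid) / (n - k)  # residual variance
    t = [beta[a] / math.sqrt(s2 * inv[a][a]) for a in range(k)]
    return beta, t
```

A criterion whose t statistic is large in magnitude contributes to the model beyond what the other criteria already explain, which is the sense in which the criteria are tested for independence.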
The individual coefficients vary considerably, though they are all on the same order of magnitude. For example, the coefficient for rsr is about threefold that for rms with the restricted sidechain data, while they are about equal for the restricted mainchain data. We do not wish to suggest that the coefficients obtained from this particular study should be used --- only that these criteria are independent predictors of error.
We will attempt to combine them to give an overall score. Each correlation and model analysis thus far has been against the raw data. In order to put the disparate data (such as B-factor and R-factor) on the same scale, they are converted into standard deviation units relative to the mean.
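A minimal sketch of this rescaling is given below. Each criterion is converted to standard deviation units relative to its own mean, and the rescaled values are summed per residue. The function names, the dictionary layout, the use of the population standard deviation, and the convention that larger values indicate a worse residue are all assumptions of this example.

```python
import statistics

def to_sigma_units(values):
    """Express each value as a deviation from the mean in standard deviation units."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]

def combined_score(criteria):
    """Sum the criteria, each in sigma units, for every residue.

    criteria maps a criterion name (e.g. "bf", "rsr") to a list of
    per-residue values; all lists must have the same length.
    """
    scaled = [to_sigma_units(vals) for vals in criteria.values()]
    return [sum(per_residue) for per_residue in zip(*scaled)]
```

Because every criterion is divided by its own spread, a B-factor in square angstroms and a dimensionless residual contribute on the same footing to the sum.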
Correlation of combined criteria
The rms difference between FDB and FDIBX (again, assumed to be the true error) is presented in Figure~4. The sum of the five criteria and each individual criterion, each in standard deviation units, are also shown. Data are presented separately for mainchain and sidechain atoms. The sum of the five independent criteria in standard deviation units was chosen as our final error criterion due to its consistency, as explained below.
The desired error detection method must identify ``incorrect'' residues and give no false negatives. An arbitrary testing cutoff must be selected for this criterion. Another arbitrary cutoff is used to select residues in error based on the rms deviation of the model from the crystal structure. Table~3 gives results for mainchain and sidechain atoms with a variety of error (A) and criterion (sigma) cutoffs. The values are the averages over all 228 residues of the three models corresponding to FDB. The number of incorrect residues is noted, along with the percent correctly identified by application of the criterion cutoff. The number of false positives counts residues flagged as being in error, yet agreeing with the crystal structure within twice the Luzzati error.
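The bookkeeping behind these counts can be sketched as follows. The function takes a per-residue combined score and the per-residue rms error and returns the number of residues in error, the percent of those that were flagged, and the number of flagged residues that agree with the crystal structure within twice the Luzzati error. The function and parameter names are hypothetical, and the sketch assumes the combined score is compared directly against the sigma cutoff.

```python
def evaluate_cutoffs(score, error, error_cut, sigma_cut, luzzati):
    """Return (n_bad, percent_hits, n_false) for one pair of cutoffs.

    score:     combined criterion per residue (sigma units)
    error:     rms deviation from the crystal structure per residue (A)
    error_cut: rms deviation above which a residue counts as a gross error
    sigma_cut: score above which a residue is flagged
    luzzati:   estimated coordinate error of the model (A)
    """
    bad = [e > error_cut for e in error]
    flagged = [s > sigma_cut for s in score]
    n_bad = sum(bad)
    hits = sum(1 for b, f in zip(bad, flagged) if b and f)
    # Flagged residues that are within twice the Luzzati error are counted
    # as false; flagged residues between that limit and error_cut are not.
    n_false = sum(1 for e, f in zip(error, flagged) if f and e <= 2 * luzzati)
    percent_hits = 100.0 * hits / n_bad if n_bad else 0.0
    return n_bad, percent_hits, n_false
```

Residues that are flagged with an error between twice the Luzzati limit and the error cutoff fall into neither count, matching the treatment of less severe errors in the text.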
Table 3. Identification of gross errors
The A error gives the rms deviation cutoff that defines a gross error. The results are averages of all residues in the three models corresponding to the FDB crystal structure. The sigma-cut gives the sum-of-criteria cutoff used to flag errors. The #bad counts residues deviating by more than the given error. The %hits are the bad residues correctly flagged. The #false are correct residues (within twice the Luzzati error) flagged as being in error.
atoms  A-error  sigma-cut  #bad  %hits  #false
-----  -------  ---------  ----  -----  ------
mc       1.0      1.00      27     89      3
mc       1.0      0.67      27     92      7
mc       1.5      1.00      21     97      3
sc       1.0      1.00      89     42      0
sc       1.0      0.67      89     53      1
sc       1.5      1.00      74     48      0
sc       1.5      0.67      74     60      1
sc       2.0      0.67      53     69      1
sc       2.5      0.67      39     82      1
sc       3.0      0.67      32     88      1
These analyses were carried out against each individual criterion and various combinations. The five combined criteria consistently located the greatest percent of residues in error, and were even more impressive in producing the smallest number of false positives. The method works very well for identifying problems in the mainchain. Nearly 10% of the residues in the molecular replacement models are grossly in error. Using 1.0 A as the cutoff for errors and 1.0 sigma as the cutoff for the criterion test, the method identifies nearly 90% of the problem residues. Only about one percent of the protein gives a true false negative. An additional two percent of residues having errors greater than twice the Luzzati limit, but less than the cutoff, were also flagged. These are taken as true, but less severe, errors.
These same cutoffs applied to sidechain residues give a less impressive result. Nearly 40% of the residues are in error by the 1.0 A distance cutoff, but only 42% are identified as such. However, there were no true false negatives reported. Using 2.0 A as the error cutoff, nearly a quarter of the sidechains are grossly in error. With a 0.67 sigma criterion cutoff, the method identifies nearly 70% of the problem residues and flags only 1 residue as a false positive. An additional seven percent of residues with less severe errors are also flagged.