Branden and Jones (1990) note that crystallographic analysis is both objective and subjective. Modern crystallographic software such as X-PLOR (Brunger, 1992a) and O (Jones et al, 1991) contains assorted tools for more objective analysis. Guss et al (1992) have studied the effects of the refinement strategy on the accuracy and precision of coordinates. We investigate various methods to assess the validity of a protein crystal structure. The metrics are B-factor, real-space fit (omit map and 2Fo-Fc), sliding R-window, R-free, geometric strain energy, phi/psi ``energy'', omega angle, chi angle ``energy'', rms shift in refinement, deviation from a database, exposed surface, and 3D folding profile.
Our test case is the structure of the serine protease Factor D, determined by MIR methods. The crystal structure was refined at 2.0 Å resolution to an R-factor of 0.188 (Narayana et al, 1994). It is compared to models built by homology using methods similar to Greer's (1990). These models ``solved'' the structure by molecular replacement methods and were refined with X-PLOR (with no human intervention) to reasonable R-factors and geometries. To better understand the errors that might arise from the use of homology models, we investigated the differences between the structure of Factor D and the original and refined homology models. Details of the model building, refinements, and differences are presented in the accompanying paper (Carson et al, 1994).
The accuracy of models constructed from homology is of fundamental concern when the models cannot be confirmed by experimental methods. An experienced modeler can generate a structure that would be deemed correct on geometric grounds. Programs such as PROCHECK (Morris et al, 1992) or GEOM (Cohen, 1993), which assess all geometric features, therefore cannot adequately establish the validity of a model. Empirical diffraction data are also required.
There is also concern about the general validity of refined crystal structures obtained through molecular replacement techniques employing homology models. There is always the question of model bias when no empirically determined phases are available. A procedure to reliably identify suspect regions of the structure is needed.
We take the refined crystal structure of Factor D to be correct, and use the deviations of the models from it as error functions. Correlations of the various criteria against these errors are evaluated. Statistical analysis suggests a linear model of five variables: temperature factor, real-space fit, geometric strain, dihedral angle value, and shift from the previous refinement cycle. A protocol to identify model errors based on these crystallographic, energetic, and geometric criteria is presented. Grossly incorrect residues are identified with approximately 90% accuracy.
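As an illustration of what a linear model of five variables might look like in practice, the following is a minimal sketch and not the protocol used in this work. It fits per-residue error against five predictors by ordinary least squares and flags residues whose predicted error exceeds a cutoff. The feature names, the coefficients, the cutoff, and the data are all hypothetical; the data are synthetic and generated only so that the example runs.

```python
import random

# Hypothetical predictor names, loosely following the five variables in the text.
FEATURES = ["b_factor", "real_space_fit", "geometric_strain",
            "dihedral_value", "refinement_shift"]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_model(rows, errors):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w_1, ..., w_k] for k predictors per row.
    """
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * e for d, e in zip(design, errors)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_error(coeffs, row):
    """Predicted error for one residue: intercept plus weighted predictors."""
    return coeffs[0] + sum(w * v for w, v in zip(coeffs[1:], row))


def flag_residues(coeffs, rows, cutoff):
    """Indices of residues whose predicted error exceeds the cutoff."""
    return [i for i, row in enumerate(rows)
            if predict_error(coeffs, row) > cutoff]


# Synthetic, noise-free data for demonstration only (not measured values).
rng = random.Random(0)
TRUE_COEFFS = [0.2, 0.5, -0.8, 0.3, 0.4, 0.6]  # arbitrary illustrative weights
rows = [[rng.random() for _ in FEATURES] for _ in range(50)]
errors = [predict_error(TRUE_COEFFS, row) for row in rows]

coeffs = fit_linear_model(rows, errors)
suspect = flag_residues(coeffs, rows, cutoff=1.0)  # cutoff is an assumption
```

In a real analysis the rows would hold per-residue values of the five criteria for a homology model, and the errors would be that model's deviations from the reference crystal structure. The predictors would normally be scaled to comparable ranges before fitting.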