A model of Factor D was built using interactive graphics, based on visual homology modeling with seven superimposed serine proteases. The model had nearly ideal geometry, all dihedral angles set to allowed values, and violated none of the ``rules'' of protein structure. The modeler's naive view, perhaps seduced by the beauty of computer graphics, was that there might be no need to do the experiment.
An unambiguous molecular replacement solution and subsequent refinement against the native data produced very respectable molecular geometry and R-factor, at the expense of substantial coordinate shifts. The modeler's revised naive view held that the structure was essentially solved, and refinement could be completed with a few rounds of refitting based on difference maps using native data only.
The view of the experimentalists held that one might not escape from the bias of the model. Experimental MIR phases were determined; phases from the molecular replacement model were never used. A high-quality 2.0\ A structure was ultimately produced. The differences between the final structure and the homology models have been documented here. These differences often exceed what most crystallographers would consider acceptable in a correct structure.
Modeling Methods
The production of the manual model (FDM) required two days' work on a high-performance graphics workstation. The production of the more automated model (FDQ) with commercial software required almost a day. (Another protein of similar size required only two hours to model, once familiarity with the software had been attained.)
FDM and FDQ are roughly equidistant from the true structure, both before and after refinement. They are nearly as different from each other as they are from the crystal structure. Both model structures fail to accurately reproduce the crystal structure around the active site substrate binding loops. An experienced researcher can build as good a model with freely available academic software as with a commercial package. However, the commercial package has a better user interface, integrates more features, and can accomplish the task more quickly.
Few systems should be as easy to model as serine proteases, given the wealth of data on these proteins. Even so, the models built purely by graphical methods have severe errors. The final crystal structures were examined after the fact to determine whether the regions of poor fit to the homology models could have been modeled better to begin with. A variant of the ``spare parts'' method (Jones and Thirup, 1986) was used with a database of 62 highly refined protein structures. Residues 199-202 should have been modeled with coordinates from 3EST. Several proteins fit the bend from 113-116 better, but none of them were proteases. For the longer stretches 43-50, 81-89, and 161-167, the database generally contained some protein that would fit to an rms of 1-1.5 A, but it is unclear how these might have been selected in the first place.
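The kind of fragment search described above can be illustrated with a minimal sketch. It assumes each fragment is a list of C-alpha coordinates and ranks every window of every database chain by the rms difference between intra-fragment distance matrices, which avoids an explicit superposition. This is a stand-in for the rms fit quoted in the text, not the procedure actually used in this work; the function names and the toy coordinates in the usage note are hypothetical.

```python
import math

def distance_matrix(coords):
    """Return all pairwise distances between the points in coords."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def dist_rms(frag_a, frag_b):
    """RMS difference between the distance matrices of two equal-length fragments."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same number of atoms")
    if len(frag_a) < 2:
        raise ValueError("fragments need at least two atoms")
    da, db = distance_matrix(frag_a), distance_matrix(frag_b)
    n = len(frag_a)
    # Only the upper triangle is needed: the matrices are symmetric.
    sq = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(sq) / len(sq))

def rank_fragments(target, database):
    """Score every window of every chain; return (score, name, start) tuples, best first.

    database maps a (hypothetical) structure name to its list of C-alpha coordinates.
    """
    length = len(target)
    hits = []
    for name, chain in database.items():
        for start in range(len(chain) - length + 1):
            window = chain[start:start + length]
            hits.append((dist_rms(target, window), name, start))
    return sorted(hits)
```

For example, `rank_fragments(loop_ca, {"demo_a": chain_ca})[0]` would give the best-scoring window. Distance matrices are invariant under rotation and translation but also under mirror reflection, so a candidate found this way would still need to be superimposed and inspected before use.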
Irrespective of the source of the model, experimental data will be required to back it up. The improvement in the models after X-PLOR refinement against the empirical data is impressive (consult Tables 4 and 5 and Figures 7 and 8). However, the structure apparently becomes trapped in a local minimum from which it cannot escape, even after many iterations of annealing.
We do not wish to cast doubt on the usefulness of molecular replacement in general. However, there is a major unanswered question: could the correct structure have been attained without resorting to MIR methods and many rounds of manual refitting on graphics?
Figure~9 shows the completely computer-generated FDIBX molecular replacement model with the FDB crystal structure and the computed OMITMAPS based only on the FDIBX model and the native data. It would appear that the model could be refit to this map, especially if one knew that this particular region was in error.
We have developed a statistical protocol that can in large part identify the incorrect residues with few false positives; it is described in an accompanying paper (Carson et al., 1994). The protocol employs temperature factors, real-space fit residual, geometric strain, dihedral angle sensibility, and coordinate shifts from the previous refinement cycle. We intend to have a student who is unaware of the history of this project attempt to refit the problem residues with map-fitting software under development.
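How several per-residue indicators might be combined into a single flag can be sketched as follows. This is only an illustration under stated assumptions, not the protocol of the accompanying paper: it assumes each indicator has been arranged so that larger values mean a worse residue, standardizes each one across the chain, and flags residues whose mean standardized score exceeds a cutoff. The function names and the cutoff of 1.5 are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one indicator across all residues (zero mean, unit spread)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant indicator carries no information about outliers.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def flag_residues(indicators, threshold=1.5):
    """Return indices of residues whose mean z-score over all indicators exceeds threshold.

    indicators maps an indicator name (e.g. temperature factor, real-space
    residual, coordinate shift) to a per-residue list; larger is assumed worse.
    """
    if not indicators:
        raise ValueError("at least one indicator is required")
    columns = [z_scores(vals) for vals in indicators.values()]
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all indicators must cover the same residues")
    combined = [mean([col[i] for col in columns]) for i in range(n)]
    return [i for i, score in enumerate(combined) if score > threshold]
```

Averaging the standardized scores means that a residue must look poor by several criteria at once to be flagged, which is one simple way to keep false positives low; a weighted or rank-based combination would be an equally reasonable choice.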
We gratefully acknowledge NASA grant NAGW-813 and Public Health Service grant AI32949 for support.