Developing and Benchmarking Sage 2.3.0 with the AshGC Neural Network Charge Model.
Journal Article
Abstract
Partial atomic charges are a fundamental component underlying classical molecular simulations, but assigning charges remains a computational bottleneck; many common methods rely on quantum mechanical calculations that scale poorly with molecular size and are sensitive to the choice of conformer geometry. We introduce Open Force Field (OpenFF) AshGC, a new graph convolutional neural network charge model, as well as the Sage 2.3.0 small molecule force field for drug-like molecules, parametrized to be consistent with AshGC. AshGC is designed to efficiently produce conformer-independent charges of semiempirical quality at linear cost for molecules of all sizes, from small molecules to macromolecules. AshGC largely generates charges within the range of other accepted AM1-BCC backends such as OpenEye's oequacpac and AmberTools' sqm, deviating most in smaller molecules with between 4 and 9 heavy atoms, in negatively charged molecules, and in areas of chemistry underrepresented in the training set, such as particular sulfur- and phosphorus-containing functional groups. We further present the development and performance of Sage 2.3.0, whose Lennard-Jones and valence parameters are both, for the first time, retrained to be consistent with neural network charges. Benchmarks spanning gas-phase geometry optimization through protein-ligand binding free energies show that Sage 2.3.0 performs comparably to earlier Sage releases, with modest improvements in condensed-phase properties and a slight decrease in nonaqueous solvation free energy accuracy. As with other OpenFF force fields, Sage 2.3.0 was validated in protein-ligand benchmarks to be compatible with Amber protein force fields. All data are publicly available, along with scripts and environments for reproducing the training and benchmarking of AshGC and Sage 2.3.0, at https://github.com/openforcefield/ashgc-v1.0-fit and https://github.com/openforcefield/ash-sage-rc2, respectively.