Supervised self-organizing maps in drug discovery. 1. Robust behavior with overdetermined data sets.
The utility of the supervised Kohonen self-organizing map was assessed and compared to several statistical methods used in QSAR analysis. The self-organizing map (SOM) describes a family of nonlinear, topology preserving mapping methods with attributes of both vector quantization and clustering that provides visualization options unavailable with other nonlinear methods. In contrast to most chemometric methods, the supervised SOM (sSOM) is shown to be relatively insensitive to noise and feature redundancy. Additionally, sSOMs can make use of descriptors having only nominal linear correlation with the target property. Results herein are contrasted to partial least squares, stepwise multiple linear regression, the genetic functional algorithm, and genetic partial least squares, collectively referred to throughout as the "standard methods". The k-nearest neighbor (kNN) classification method was also performed to provide a direct comparison with a different classification method. The widely studied dihydrofolate reductase (DHFR) inhibition data set of Hansch and Silipo is used to evaluate the ability of sSOMs to classify unknowns as a function of increasing class resolution. The contribution of the sSOM neighborhood kernel to its predictive ability is assessed in two experiments: (1) training with the k-meansclustering limit, where the neighborhood radius is zero throughout the training regimen, and (2) training the sSOM until the neighborhood radius is reduced to zero. Results demonstrate that sSOMs provide more accurate predictions than standard linear QSAR methods.