Detection of protein secondary structures via the discrete wavelet transform.
We apply a discrete wavelet transform (DWT) analysis to the primary sequences of proteins drawn from the Structural Classification of Proteins (SCOP) database in order to search for predictors of secondary structure. We use proteins containing both alpha helices and beta sheets (the A/B and A+B classes of SCOP). The amino acids composing each protein are converted to hydrophobicity values using three different hydrophobicity scales; the results prove to be independent of the scale used. A DWT multiresolution decomposition coarse-grains each protein, in effect creating snapshots of it at multiple scales. For each protein, a control data set is formed by generating random realizations that remove the positional information in the sequence but retain the same amino acid frequencies. Regions of salient hydrophobicity in the protein sequence are identified by comparing, at each resolution, the transform of the original sequence with the transforms of the control set. We find significant matching between regions of salient hydrophobicity and the locations of secondary structures along the amino acid chain. We calculate the sensitivity, specificity, and Matthews correlation coefficient to quantify the agreement between the wavelet-detected structures and the actual secondary structures of the proteins. In addition, we are able to distinguish between the morphologically different A/B and A+B subsets. We also construct a DWT-based correlation function that correlates quasilocalized structures at different length scales in wavelet space. Through a similar comparison with the control data sets, we identify features in this space-scale correlation that correspond to the typical lengths of secondary structures.
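The detection step (hydrophobicity mapping, multiresolution coarse-graining, shuffled controls, and comparison at each resolution) can be illustrated with a minimal sketch. Several choices here are assumptions rather than the authors' stated method: the Kyte-Doolittle scale stands in for one of the three unnamed scales, the Haar wavelet stands in for whichever DWT basis was used, sequences are truncated to a multiple of `2**levels`, and a block is called "salient" when its approximation coefficient falls outside a two-sided percentile band of the shuffled-control distribution. The function names `to_hydrophobicity`, `haar_approximations`, and `salient_blocks` are hypothetical.

```python
import math
import random

# Kyte-Doolittle hydrophobicity values: used here only as an example scale (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_hydrophobicity(seq, scale=KYTE_DOOLITTLE):
    """Map a one-letter amino acid sequence to a list of hydrophobicity values."""
    return [scale[aa] for aa in seq]

def haar_approximations(signal, levels):
    """Haar approximation coefficients for levels 1..levels (coarser each level).

    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    out = []
    for _ in range(levels):
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(len(approx) // 2)]
        out.append(approx)
    return out

def salient_blocks(seq, levels=3, n_controls=200, alpha=0.05, seed=0):
    """Flag residue blocks whose coarse-grained hydrophobicity is extreme
    relative to shuffled controls with the same amino acid composition.

    Returns tuples (level, start, end, kind), with end exclusive.
    """
    rng = random.Random(seed)
    n = (len(seq) // 2 ** levels) * 2 ** levels  # truncate so every level divides evenly
    seq = seq[:n]
    observed = haar_approximations(to_hydrophobicity(seq), levels)

    # Control set: shuffles keep amino acid frequencies but destroy positional order.
    residues = list(seq)
    controls = []
    for _ in range(n_controls):
        rng.shuffle(residues)
        controls.append(haar_approximations(to_hydrophobicity(residues), levels))

    hits = []
    for j in range(levels):
        width = 2 ** (j + 1)  # residues covered by one coefficient at this level
        for k, value in enumerate(observed[j]):
            null = sorted(c[j][k] for c in controls)
            lo = null[int(alpha / 2 * n_controls)]
            hi = null[int((1 - alpha / 2) * n_controls) - 1]
            if value > hi or value < lo:
                kind = "hydrophobic" if value > hi else "hydrophilic"
                hits.append((j + 1, k * width, (k + 1) * width, kind))
    return hits
```

For example, `salient_blocks("D" * 24 + "I" * 8)` flags the final eight residues as a hydrophobic block at level 3, because almost no shuffle of that composition places all eight isoleucines together.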
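The agreement measures named in the abstract have standard definitions. The sketch below assumes a per-residue comparison, with each residue labeled 1 if it lies in a secondary structure (or in a wavelet-detected region) and 0 otherwise; that labeling convention and the function name `agreement_scores` are assumptions for illustration.

```python
import math

def agreement_scores(predicted, actual):
    """Per-residue sensitivity, specificity, and Matthews correlation coefficient.

    predicted, actual: equal-length sequences of 0/1 labels.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

With `predicted = [1, 1, 0, 0, 1, 0]` and `actual = [1, 0, 0, 0, 1, 1]` there are 2 true positives, 2 true negatives, 1 false positive, and 1 false negative, so sensitivity and specificity are both 2/3 and the Matthews coefficient is (4 - 1)/9 = 1/3.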
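The abstract does not define its space-scale correlation function, so the following is only a generic stand-in: a normalized autocorrelation of Haar detail coefficients computed separately at each scale. A lag of `lag` coefficients at level `j` corresponds to roughly `lag * 2**j` residues. Under the same approach as the detection step, the observed curve would be compared against curves from shuffled controls. The names `haar_details` and `scale_autocorrelation` are hypothetical, and this is not presented as the authors' definition.

```python
import math

def haar_details(signal, levels):
    """Haar detail coefficients for levels 1..levels (a trailing odd sample is dropped)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        details.append([(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2)
                        for k in range(half)])
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(half)]
    return details

def scale_autocorrelation(coeffs, max_lag):
    """Normalized autocorrelation of one scale's coefficients at lags 0..max_lag."""
    mean = sum(coeffs) / len(coeffs)
    centered = [c - mean for c in coeffs]
    var = sum(c * c for c in centered)
    if var == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(centered[k] * centered[k + lag] for k in range(len(centered) - lag)) / var
        for lag in range(max_lag + 1)
    ]
```

For an alternating series such as `[1, -1, 1, -1]`, the autocorrelation at lags 0, 1, 2 is `[1.0, -0.75, 0.5]`; a peak at a nonzero lag in real data would hint at a characteristic spacing at that scale.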