Short Communication: Evaluating the accuracy of binary classifiers for geomorphic applications Journal Article



  • Abstract. Airborne lidar has revolutionized our ability to map fine-scale (~1-m) topographic features at watershed to landscape scales. As our ‘vision’ of the land surface has improved, so has our need for more robust quantification of the accuracy of the geomorphic maps we derive from these data. One broad class of mapping challenges is that of binary classification, where remote sensing data are used to identify the presence or absence of a given feature. Fortunately, there is a large suite of metrics developed in the data sciences that are well suited to quantifying the pixel-level accuracy of binary classifiers. In this paper, I focus on the challenge of identifying bedrock from lidar topography, though the insights gleaned from this analysis apply to any task where there is a need to quantify how the number and extent of landforms are expected to vary as a function of environmental forcing. Using a suite of synthetic maps, I show how the most widely used pixel-level accuracy metric, the F1-score, is particularly poorly suited to quantifying accuracy for this kind of application. Well-known biases to imbalanced data are exacerbated by methodological strategies that attempt to calibrate and validate classifiers across a range of geomorphic settings where feature abundances vary. The Matthews Correlation Coefficient largely removes this bias, such that the sensitivity of accuracy scores to geomorphic setting instead embeds information about the error structure of the classification. To this end, I examine how the scale of features (e.g., the typical sizes of bedrock outcrops) and the type of error (e.g., random versus systematic) manifest in pixel-level scores. The normalized version of the Matthews Correlation Coefficient is relatively insensitive to feature scale if error is random and if large enough areas are mapped. In contrast, a strong sensitivity to feature size and shape emerges when classifier error is systematic. My findings highlight the importance of choosing appropriate pixel-level metrics when evaluating topographic surfaces where feature abundances strongly vary. It is necessary to understand how pixel-level metrics are expected to perform as a function of scene-level properties before interpreting empirical observations.
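The contrast between the two pixel-level metrics named in the abstract can be illustrated with a small sketch. It is not the paper's code or its synthetic maps. It assumes a hypothetical classifier whose per-class hit rates (`tpr`, `tnr`) stay fixed while the feature fraction `p` (e.g., the share of bedrock pixels) varies, and it works with expected confusion-matrix counts rather than simulated rasters. The function names are illustrative, and the normalization `(MCC + 1) / 2` is the commonly used definition, assumed here rather than taken from the paper.

```python
import math


def expected_counts(p, tpr=0.9, tnr=0.9, n=1_000_000):
    """Expected confusion-matrix counts for a scene with feature fraction p.

    Assumption: error is random, with fixed true-positive and true-negative
    rates that do not depend on p. Returns (tp, fp, fn, tn).
    """
    tp = p * n * tpr
    fn = p * n * (1 - tpr)
    tn = (1 - p) * n * tnr
    fp = (1 - p) * n * (1 - tnr)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """F1-score; note that true negatives (tn) do not enter the formula."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient, using all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def normalized_mcc(tp, fp, fn, tn):
    """MCC rescaled from [-1, 1] to [0, 1] (assumed normalization)."""
    return (mcc(tp, fp, fn, tn) + 1) / 2


if __name__ == "__main__":
    # Sweep feature abundance while holding classifier error rates fixed.
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        counts = expected_counts(p)
        print(
            f"p={p:.2f}  F1={f1_score(*counts):.3f}  "
            f"MCC={mcc(*counts):.3f}  nMCC={normalized_mcc(*counts):.3f}"
        )
```

Because F1 ignores true negatives, its value depends on which class is treated as the feature. MCC uses all four cells, so in this sketch with equal hit rates it gives the same score for a scene with feature fraction `p` as for one with `1 - p`.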

publication date

  • November 2, 2022

has restriction

  • green

Date in CU Experts

  • January 17, 2023 9:14 AM

Full Author List

  • Rossi MW

author count

  • 1

Other Profiles