An Open-Source Cultural Consensus Approach to Name-Based Gender Classification

Conference Proceeding

Overview

abstract

  • Name-based gender classification has enabled hundreds of otherwise infeasible scientific studies of gender. Yet, the lack of standardization, reliance on paid services, understudied limitations, and conceptual debates cast a shadow over many applications. To address these problems, we develop and evaluate an ensemble-based open-source method built on publicly available data of empirical name-gender associations. Our method integrates 36 distinct sources, spanning over 150 countries and more than a century, via a meta-learning algorithm inspired by Cultural Consensus Theory (CCT). We also construct a taxonomy with which names themselves can be classified. We find that our method's performance is competitive with paid services and that our method, and others, approach the upper limits of performance; we show that conditioning estimates on additional metadata (e.g., cultural context), further combining methods, or collecting additional name-gender association data is unlikely to meaningfully improve performance. This work definitively shows that name-based gender classification can be a reliable part of scientific research and provides a pair of tools, a classification method and a taxonomy of names, that realize this potential.

publication date

  • June 2, 2023

has restriction

  • bronze

Date in CU Experts

  • January 28, 2024 10:12 AM

Full Author List

  • Van Buskirk I; Clauset A; Larremore DB

author count

  • 3

Other Profiles

International Standard Serial Number (ISSN)

  • 2162-3449

Electronic International Standard Serial Number (EISSN)

  • 2334-0770

Additional Document Info

start page

  • 866

end page

  • 877

volume

  • 17