Moving Past Metadata: Improving Digital Libraries with Content‐Based Methods Journal Article

Overview

abstract

  • The growth of text mining and corpus analytic scholarship over large digital libraries brings to light the issues created by text duplication and variation within collections that are not adequately addressed in metadata practices. The SaDDL project was a study examining text duplication and similarity in massive digital library collections. Using the HathiTrust Digital Library, this project aims to reduce the bias that duplication gives rise to. We present the content‐based methods of the project, employing a convolutional neural network classification approach, as well as SaDDL's outcomes. The datasets provided by the project will assist in improving cataloging practice and aid scholars in using large text corpora in research.

publication date

  • October 1, 2021

has restriction

  • closed

Date in CU Experts

  • January 12, 2023 10:35 AM

Full Author List

  • VandenBosch A; Schmidt BM; Matusiak KK; Organisciak P

author count

  • 4

Other Profiles

International Standard Serial Number (ISSN)

  • 2373-9231

Electronic International Standard Serial Number (EISSN)

  • 2373-9231

Additional Document Info

start page

  • 849

end page

  • 851

volume

  • 58

issue

  • 1