Submissions/Authority Addicts: The New Frontier of Authority Control on Wikidata

This is an accepted submission for Wikimania 2013.

Watch session on YouTube

Submission no.
2055
Subject no.
B1
Title of the submission

Authority Addicts: The New Frontier of Authority Control on Wikidata

Type of submission

Panel

Author of the submission

Maximilian Klein

Country of origin

USA

Affiliation

OCLC

E-mail address

isalix@gmail.com

Username

w:User:Maximilianklein

Personal homepage or blog

hangingtogether.org

notconfusing.com

Abstract

Librarians have long known the benefits of Authority Control (AC) in organizing information. Among other features, it disambiguates between the multiple names of a single entity, and between the similar names of multiple entities. Since 2009, Wikipedias have taken up the practice themselves. Today 0.75 million articles include AC, added through crowdsourced drives and algorithmic bots. The migration of AC to Wikidata in some ways makes Wikidata an Authority File itself, bridging multiple language versions of articles and multiple AC identifiers. Will Wikidata make traditional AC efforts obsolete?

Detailed proposal

Authority Control (AC) is a Library Science technique for organizing information. Among other features, it disambiguates between the multiple names of a single entity, and between the similar names of multiple entities. Prominent examples of Authority Files (AF), which implement AC, include the Library of Congress Control Number (LCCN) and the Gemeinsame Normdatei (GND), which were made by the American and German National Libraries respectively. Their construction took decades, and they feature strict rules and heavyweight data models. It is precisely this meticulous attention to detail that Wikipedians recognize as valuable. Wikidata, on the other hand, is on course to become an accidental Authority Control system. We are at a point where we can preempt accidental mistakes and shape AC on Wikidata into a well-designed, long-term, collaborative system by understanding it in the trajectory of AC as a whole.
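The two disambiguation cases described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the data model of the LCCN or the GND: the `AuthorityFile` class, its methods, and the `AF-…` record identifiers are hypothetical, and the two "Smith, John" records are invented for illustration.

```python
from collections import defaultdict


class AuthorityFile:
    """Toy authority file: one record per entity, many name forms per record."""

    def __init__(self):
        self.preferred = {}              # record id -> preferred name form
        self.by_name = defaultdict(set)  # case-folded name form -> record ids

    def add_record(self, record_id, preferred_name, variants=()):
        """Register an entity under its preferred name and any variant names."""
        self.preferred[record_id] = preferred_name
        for name in (preferred_name, *variants):
            self.by_name[name.casefold()].add(record_id)

    def lookup(self, name):
        """Return the sorted record ids that a name form may refer to."""
        return sorted(self.by_name.get(name.casefold(), set()))


af = AuthorityFile()

# Case 1: multiple names of a single entity resolve to one record.
af.add_record(
    "AF-0001",  # hypothetical identifier
    "Twain, Mark, 1835-1910",
    variants=["Clemens, Samuel Langhorne"],
)

# Case 2: similar names of multiple entities stay separate records.
# Both records below are invented examples.
af.add_record("AF-0002", "Smith, John (painter)", variants=["John Smith"])
af.add_record("AF-0003", "Smith, John (chemist)", variants=["John Smith"])

one_entity = af.lookup("clemens, samuel langhorne")
ambiguous = af.lookup("John Smith")
```

Here `one_entity` holds a single record id, while `ambiguous` holds two, which signals that the name alone is not enough and a person or tool has to choose between the candidate records.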

As a casual onlooker you could be forgiven for not understanding AC history. “Disambiguation”, as a term, has only now been popularized by its heavy use in Wikipedias, through “Disambiguation Pages”. But those pages aren't the only disambiguation technique on Wikipedia. In 2009 German Wikipedia launched a large drive to further disambiguate by implementing Normdaten AC by hand. Gadget-based computer-assisted editing made inroads at Wikimedia Commons by adding AC to Creator Pages. This year, data imports from the Virtual International Authority File (VIAF) into Wikidata have brought AC within the grasp of all Wikipedias.

The coming of Wikidata is an impetus to rethink Wikipedians' approach to AC in scope and in necessity. On the topic of scope, most Wikipedians have generally limited AC to disambiguating names. However, some Authority Files, such as the GND, attempt to classify all concepts of human knowledge, and use seven additional categories, such as Corporate names, Geographic places, and, most generally, Subject. The early attempt to blanket Wikidata with GND was rejected by the community, since a classification that describes most entities as subjects is understandably lacking. Yet there is still a use for classifying non-subject entities like Geographic places using AC, and for searching for more fitting classification schemes.

In the Person realm of AC to which we're accustomed, the logical step is to merge AC from multiple Wikipedias, for the advantage of refining and expanding each language's AC, and spreading AC to all Wikipedias. That is already being done by bot software. This confluence of centralized and editable data has important implications for the future.

Wikidata bridges multiple Authority Files with multiple related Wikipedia pages, and thus becomes an Authority File in its own right. Wikidata is an Authority File of Wikipedia Concepts, with internal and external identifiers. Combine that with the power of crowdsourcing for extensibility, and you have a compelling competitor for a new de facto standard in Authority Files in the cases where Wikipedia Concepts are about people. Yet its acceptance will rely on the willingness of both Librarians and the general public to treat it as “authoritative”. Treating crowdsourcing as an authoritative source seems far-fetched, but similar scoffs were proven wrong when Wikipedia itself grew to mainstream recognition. Librarians may likewise be unhappy about utilizing Wikidata as an Authority File, but its open and free nature could be tempting in a future of increasing complexity and financial constraints.
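The bridging role described above can be pictured as a hub record that ties language-specific article titles to identifiers from several Authority Files. The following is a rough sketch under assumptions, not Wikidata's actual data model or API: `ConceptItem`, `build_index`, and `crosswalk` are hypothetical names, and every identifier and title is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptItem:
    """Hypothetical hub record for one Wikipedia Concept."""
    item_id: str
    sitelinks: dict = field(default_factory=dict)    # language code -> article title
    identifiers: dict = field(default_factory=dict)  # authority file -> identifier


def build_index(items):
    """Index items by (authority file, identifier) for reverse lookup."""
    index = {}
    for item in items:
        for scheme, value in item.identifiers.items():
            index[(scheme, value)] = item
    return index


def crosswalk(index, scheme, value, target_scheme):
    """Translate an identifier in one authority file into another via the hub."""
    item = index.get((scheme, value))
    if item is None:
        return None
    return item.identifiers.get(target_scheme)


# All values below are placeholders, not real identifiers.
person = ConceptItem(
    item_id="Q-EXAMPLE",
    sitelinks={"en": "Example Person", "de": "Beispielperson"},
    identifiers={
        "VIAF": "viaf-placeholder-1",
        "GND": "gnd-placeholder-1",
        "LCCN": "lccn-placeholder-1",
    },
)

index = build_index([person])
viaf_from_gnd = crosswalk(index, "GND", "gnd-placeholder-1", "VIAF")
german_title = index[("LCCN", "lccn-placeholder-1")].sitelinks["de"]
```

Because every identifier points back to the same hub record, any Authority File identifier can be translated into any other, or into an article title in any language, without the Authority Files referencing each other directly.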

What would an adoption of Wikidata in AC mindshare mean for traditional AC? Our panelists discuss.

Phoebe Ayers, former Wikimedia Board of Trustees Member and Academic Librarian, moderates a panel of:

  • David Palmer, Associate University Librarian & Digital Strategist, The University of Hong Kong
    • Having made an Authority Control (AC) system for Chinese Institutional Repositories (Hong Kong Chinese Authority Project), David gives both a pre-Wikipedia view of the matter and an Asian one.
  • Mathias Schindler, Co-Author of GND-Beacon, Wikimedia Deutschland
    • Having created a hack for websites to declare which GND-identified entities exist at a website, Mathias understands the technical underpinnings of AC on the web, and the relation of the ever-popular GND to Wikipedia.
  • Andrew Gray, Wikipedian in Residence, British Library
    • Initiator of the consensus process to bring VIAF to English Wikipedia, Andrew understands how to convince the average Wikipedian of the use of AC.
  • Max Klein, Wikipedian in Residence, OCLC
    • Programmer of VIAFbot for Wikipedia and Wikidata, Max provides technical and statistical knowledge of the AC landscape on Wikimedia projects.


Track
  • Cultural and Educational Outreach
Length of presentation/talk

25 Minutes 60 minutes

Language of presentation/talk

English

Will you attend Wikimania if your submission is not accepted?

Yes

Slides or further information (optional)

Editing statistics from the first VIAFbot, see hangingtogether.org/?p=2306

Authority conflict statistics from the first VIAFbot, see hangingtogether.org/?p=2306

Special requests


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Wolfyeb (talk) 21:50, 29 April 2013 (UTC)
  2. Blue Rasberry (talk) 21:58, 29 April 2013 (UTC)
  3. DarTar (talk) 22:04, 29 April 2013 (UTC) (interested in hearing where ORCID fits in this discussion) (replies on talk page)
  4. Daniel Mietchen (talk) 23:14, 29 April 2013 (UTC) (also with an interest on how ORCID fits in)
  5. Ijon (talk) 00:07, 30 April 2013 (UTC)
  6. Legoktm (talk) 17:58, 2 May 2013 (UTC)
  7. Aubrey (talk) 21:24, 3 May 2013 (UTC)
  8. Micru (talk) 02:49, 4 May 2013 (UTC)
  9. Multichill (talk) 13:59, 4 May 2013 (UTC)
  10. Axel Pettersson (WMSE) (talk) 09:48, 6 May 2013 (UTC)
  11. Ocaasi (talk) 21:48, 8 May 2013 (UTC)
  12. Waldir (talk) 14:28, 14 May 2013 (UTC)
  13. Bender235 (talk) 15:57, 25 May 2013 (UTC)
  14. ChristophZ (talk) 21:27, 9 June 2013 (UTC)
  15. Susannaanas (talk) 16:21, 30 July 2013 (UTC)