WikiSym 2012 Report

From Wikimedia UK
Revision as of 20:35, 20 October 2012

Introduction


WikiSym 2012 was hosted by the Ars Electronica Centre in Linz, and coincided with the Ars Electronica Festival, The Big Picture. WikiSym is one of the most important annual venues for sharing research, ideas and experience around Wiki collaboration, and was attended by academics, researchers and Wikimedia representatives from around the globe.

With a travel award generously provided by Wikimedia UK, Gavin Baily was able to attend the symposium and present research projects at both WikiSym and Ars Electronica.

The work presented at the symposium covered a broad range of themes and drew on the ever-accumulating archives of Wikipedia and other Wiki projects. The scale and complexity of these datasets provide researchers with a rich source of metadata generated in the course of Wiki production. Examples include talk pages and comments, interlanguage links, citations, revision logs, user account histories and usage, as well as semantic analysis of page content. Researchers used these features to ask broader questions about the sociology, content and cultural diversity of Wiki projects, and ultimately about how Wiki communities function and how they might develop in the future.

In the following sections I'll outline some of the themes that emerged from the symposium, focusing on research around Wikipedia and the diverse methods of analysing its content. In particular, I'll look at work exploring how subjects are represented across Wikipedia's language editions, a topic that was central to the symposium keynotes and the subject of a number of research papers. The work outlined here represents a small fraction of the contributions; for the full programme, see WikiSym 2012.


Wikipedia representation by language and culture

Jimmy Wales

One of the highlights of the symposium was Jimmy Wales's keynote speech, which focused on the opportunities and challenges facing Wikipedia over the next 5 to 10 years. As a key aspiration, Wales asked us to 'Imagine a world in which every single person on the planet is given free access to the Sum of all Human knowledge'. In terms of growth, Wales identified the developing world, and Africa in particular, as a site of increasing participation. Increasing internet access is the main driver of this growth, and Wales said that he had caused some controversy when he tweeted: “Broadband speed in Nigeria. Beats New York City”.

As part of the effort to encourage participation in under-represented languages, Wales described the sudden growth of the Yorùbá edition in 2011, and his presentation of the second annual 'Jimbo Award' to User:Demmy for this single-handed contribution. User:Demmy had created a bot that added 15,000 articles in one month, doubling the edition's size and increasing the number of active editors. Drawing a parallel with other editions of Wikipedia, Wales said that the use of bots to generate large numbers of articles (particularly on geographic locations) had been significant in bolstering the Polish edition, and indeed the English one.


[Mark contributors by language]

Considering some of the challenges facing Wikipedia, Wales discussed gender imbalance, the slight decline in the number of English Wikipedia editors, and strategies for making Wiki user interfaces more user-friendly. On gender imbalance, Wales described the case of Kate Middleton's wedding dress. When the article was first written it was flagged for deletion as lacking notability, despite widespread media interest. This is in contrast to the volume of articles about Linux releases, which arguably have a smaller audience. Jodi Schneider pointed out that the dress controversy has since spawned a whole category, 'Royal wedding dresses'.

On the subject of English Wikipedia's decline, Wales speculated that one possible factor is that so many historical subjects already have comprehensive articles. He cited the case of George Wallace, the 45th Governor of Alabama, a less famous politician who nonetheless already has a very comprehensive article. Looking to the future of Wikipedia's editing tools, although innovations such as the visual editor are in development, Wales was concerned that too much conservatism within the community would impede efforts to experiment with more user-friendly interfaces that could encourage new contributors.

[1]


Brent Hecht: Mining and Applying Diverse Perspectives in User-Generated Content

The second keynote was given by Brent Hecht of Northwestern University, who discussed how Wikipedia reflects cultural contexts, along with some new tools and algorithms for examining this cultural diversity. Hecht set the scene with a tale of two history books: growing up, he noticed that family friends from Mexico had rather different accounts of the Mexican–American War.

These discrepancies prompted questions about how Wikipedia articles compare across languages, and whether different language editions even cover the same set of concepts. Using concept alignment algorithms, which indicate whether an article in one language refers to the same concept as an article in another, Hecht showed that across all Wikipedias most concepts belong to a single language and that few concepts appear in a large number of languages. He illustrated the result with chocolate: the concept is common to many languages, yet it accounts for only a fraction of the articles compared with those on culturally unique chocolate products. One of the key points is that larger language editions are not supersets of smaller ones, and that the set of concepts in an encyclopedia is culturally specific. For example, the French and German editions are of similar size but share only 33% of their concepts.
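The two measurements described above, how many languages each concept appears in and how much two editions overlap, can be sketched in a few lines of Python. This is a minimal illustration, not Hecht's code: it assumes concept alignment has already produced, for each language edition, a set of language-independent concept IDs, and it uses the Jaccard index (shared concepts over all concepts in either edition) as one plausible overlap measure; the 33% figure from the talk may have been computed differently. The function names and edition contents are hypothetical.

```python
from collections import Counter


def concepts_by_language_count(editions):
    """Return a Counter mapping k -> number of concepts found in exactly k editions.

    `editions` maps a language code to the set of aligned concept IDs it covers.
    """
    membership = Counter()
    for concepts in editions.values():
        membership.update(set(concepts))  # count each concept once per edition
    return Counter(membership.values())


def pairwise_overlap(a, b):
    """Jaccard index: concepts shared by both editions over all concepts in either."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


# Hypothetical toy data: one widely shared concept plus edition-specific ones.
editions = {
    "fr": {"chocolate", "fr_product_1", "fr_product_2"},
    "de": {"chocolate", "de_product_1"},
    "yo": {"chocolate", "yo_product_1"},
}
print(concepts_by_language_count(editions))  # Counter({1: 4, 3: 1})
print(pairwise_overlap(editions["fr"], editions["de"]))  # 0.25
```

Even in this toy case the pattern Hecht described shows up: most concepts live in a single edition, and only the general concept is shared.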

[Chocolate]

Hecht next outlined how language editions are biased towards the countries where the language is prominent, a phenomenon he described as 'self-focus bias'. To measure the comparative degree of self-focus for a language, Hecht used the 'Indegree Sum', calculated by summing the number of articles that refer to locations in each country. For example, in the Finnish Wikipedia the articles Eurovision laulukilpailu, Linus Torvalds and Alfred Hitchcock all refer to Helsinki, which increases the count for Finland. The Alfred Hitchcock article also points to London, which increases the count for the UK. The result is an article count for each country that can be displayed as a choropleth map, and which typically reveals self-focus.
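The Indegree Sum can be sketched the same way. The sketch below assumes that the link graph (each article's links to location articles) and a location-to-country lookup are already available, for instance from geotagged articles; both inputs and the name `indegree_sum` are hypothetical. Each location's indegree is taken to be the number of distinct articles linking to it, and those indegrees are summed per country. The toy data reproduces the Finnish example from the talk.

```python
from collections import defaultdict


def indegree_sum(article_links, location_country):
    """Sum, for each country, the indegrees of the locations in that country.

    article_links: article title -> iterable of location titles it links to
    location_country: location title -> country containing that location
    """
    indegree = defaultdict(int)
    for targets in article_links.values():
        for location in set(targets):  # count each article at most once per location
            if location in location_country:  # ignore links to non-location articles
                indegree[location] += 1

    totals = defaultdict(int)
    for location, count in indegree.items():
        totals[location_country[location]] += count
    return dict(totals)


# Toy data based on the Finnish Wikipedia example from the talk.
links = {
    "Eurovision laulukilpailu": ["Helsinki"],
    "Linus Torvalds": ["Helsinki"],
    "Alfred Hitchcock": ["Helsinki", "London"],
}
countries = {"Helsinki": "Finland", "London": "UK"}
print(indegree_sum(links, countries))  # {'Finland': 3, 'UK': 1}
```

The per-country totals are exactly the values a choropleth map would shade.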

[Finnish and Spanish Indegree]


In the final section of his talk Hecht presented two tools that explore how Wikipedia concepts are represented across different language editions, and how they can be spatially categorised. Omnipedia visualises how each Wikipedia concept is referenced in 32 language editions, highlighting which articles are unique to one language and which are shared across languages. Hecht demonstrated the software with the search concept 'conspiracy theory'. In the screenshots below, the single-colour dots are related articles unique to one language, and the multicoloured disks show articles occurring in several languages. The Hebrew Wikipedia is the only edition to mention a Middle Eastern conspiracy theory about Microsoft Windows, whereas The Protocols of the Elders of Zion is a more widely referenced conspiracy theory.
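Omnipedia's distinction between single-colour dots and multicoloured disks amounts to grouping each linked concept by the set of language editions that mention it. A minimal sketch, assuming the links from each edition's article on the search topic have already been aligned to shared concept IDs; the function name and the toy link sets are hypothetical, and this is not Omnipedia's implementation.

```python
def split_unique_shared(links_by_language):
    """Group the concepts linked from one topic by the languages that mention them.

    links_by_language: language code -> set of aligned concept IDs linked from
    that edition's article on the topic.
    Returns (unique, shared): unique maps a concept to its single language;
    shared maps a concept to the sorted list of languages mentioning it.
    """
    languages_for = {}
    for lang, concepts in links_by_language.items():
        for concept in concepts:
            languages_for.setdefault(concept, set()).add(lang)

    unique = {c: next(iter(langs)) for c, langs in languages_for.items() if len(langs) == 1}
    shared = {c: sorted(langs) for c, langs in languages_for.items() if len(langs) > 1}
    return unique, shared


# Hypothetical links for the topic "conspiracy theory" in three editions.
links = {
    "he": {"Protocols of the Elders of Zion", "Windows conspiracy theory"},
    "en": {"Protocols of the Elders of Zion", "concept_x"},
    "de": {"Protocols of the Elders of Zion"},
}
unique, shared = split_unique_shared(links)
print(shared)  # {'Protocols of the Elders of Zion': ['de', 'en', 'he']}
```

In this reading, entries in `unique` would be drawn as single-colour dots and entries in `shared` as multicoloured disks.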


[omnipedia images]

Expanding on the research around self-focus bias, the Atlasify project maps the relatedness of a concept onto any one of a number of spatial reference systems. As an example, Hecht showed how the concept 'nuclear power' can be visualised as a choropleth map on three different reference systems: the world map, the periodic table, and the U.S. Senate seating plan. This extraordinarily powerful search system is applicable to any Wikipedia category, and will be of great interest to a variety of Wikipedia users. See the paper: http://www.brenthecht.com/papers/bhecht_sigir2012_ExpSpatialization_SRplusE.pdf
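Turning relatedness scores into a choropleth requires a classification step that assigns each region of the reference system (a country, an element or a Senate seat) a colour class. The sketch below assumes the relatedness between the query concept and each region has already been computed by some semantic relatedness measure; it uses simple rank-based quantile classes, one common choropleth scheme, which may not be what Atlasify does. The function name, region names and scores are hypothetical.

```python
def quantile_classes(relatedness, n_classes=5):
    """Assign each region a colour class from 0 (least related) to n_classes - 1.

    relatedness: region -> precomputed relatedness score to the query concept.
    Regions are ranked by score and split into groups of roughly equal size;
    tied scores may end up in adjacent classes.
    """
    ranked = sorted(relatedness, key=relatedness.get)
    n = len(ranked)
    return {region: i * n_classes // n for i, region in enumerate(ranked)}


# Hypothetical relatedness scores for four regions of some reference system.
scores = {"region_a": 0.9, "region_b": 0.7, "region_c": 0.1, "region_d": 0.2}
print(quantile_classes(scores, n_classes=2))
# {'region_c': 0, 'region_d': 0, 'region_b': 1, 'region_a': 1}
```

Because the classification only needs a score per region, the same function works whether the regions are countries, elements or seats, which is what lets one concept be drawn on several reference systems.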

[3 images]

Paolo Massa

http://www.wikisym.org/ws2012/bin/download/Main/Program/p13wikisym2012.pdf

Manypedia: Comparing Language Points of View of Wikipedia Communities, Paolo Massa and Federico Scrinzi

Paolo Massa's presentation on linguistic points of view in Wikipedia dealt with the problem of a neutral, unbiased voice in the context of diverse cultural and linguistic communities. In another formulation of Brent Hecht's 'tale of two history books', Massa asked: 'Do editors on Arabic Wikipedia and editors on Hebrew Wikipedia write the same history of the “Gaza war”?'

Massa drew attention to Wikipedia's own policies and dialogue around Neutral Point of View (NPOV), and the known biases that result from author demographics. The page Wikipedia:Systemic bias states that:

“The Wikipedia project suffers from systemic bias that naturally grows from its contributors' demographic groups, manifesting an imbalanced coverage of a subject, thereby discriminating against the less represented demographic groups.”

“The average Wikipedian on the English Wikipedia is a male, technically inclined, formally educated, an English speaker (native or non-native), European–descent, aged 15–49, from a majority-Christian country, from a developed nation, from the Northern Hemisphere, and likely employed as a white-collar worker or enrolled as a student rather than employed as a labourer”.

The concept of NPOV is itself contested: Massa cited Roy Rosenzweig's characterisation of it as Wikipedia's 'founding myth', a 'view from nowhere'. Massa went on to describe a number of Wiki encyclopedia projects that have established their own POV. EcuRed (ecured.cu) is Cuba's Wikipedia, written from a 'decolonizer point of view'. The atheist point of view comes in the form of atheism.wikia.com, and to 'discover what Wikipedia and the liberal media don't want you to know about', see conservapedia.com.

In the last section of his talk Massa demonstrated Manypedia, a tool that enables cross-cultural analysis of specific Wikipedia articles. After searching for a topic, the user can select two different languages to compare how the subject is represented. The articles can be automatically translated into the user's native language using Google Translate. To show some of the salient differences between the articles at a glance, the images from each page are displayed together at the top.
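One way a tool like Manypedia could find the article corresponding to a topic in a second language is to follow Wikipedia's interlanguage links through the MediaWiki API (`action=query`, `prop=langlinks`, `lllang`). The sketch below only builds the request URL and parses a response that has already been decoded from JSON, so it runs offline; it assumes the default `formatversion=1` output, in which the linked title is stored under the `*` key. The function names are hypothetical and this is an illustration, not Manypedia's own code.

```python
from urllib.parse import urlencode


def langlinks_url(title, source_lang, target_lang):
    """Build a MediaWiki API URL asking for the interlanguage link of `title`
    from the `source_lang` edition into the `target_lang` edition."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,
        "format": "json",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def linked_title(response):
    """Return the target-language title from a decoded JSON response, or None.

    Assumes the default formatversion=1 layout, where each langlinks entry
    stores the linked title under the "*" key.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for link in page.get("langlinks", []):
            return link["*"]
    return None


print(langlinks_url("Gaza War", "en", "he"))
# https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Gaza+War&lllang=he&format=json
```

Since `urlencode` quotes values with `quote_plus` by default, spaces in titles become `+`. A full comparison tool would then fetch both articles (and their images) for side-by-side display.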

[manypedia]