Talk:Technology Committee/Project requests/WikiRate - rating Wikimedia

SMART

I think this is now being used so much as a jargon word that the meaning was lost a while back. Metrics are not SMART; could this please be corrected, as it just promulgates further misunderstanding? -- (talk) 20:11, 12 April 2014 (BST)

'SMART metrics' -> 'SMART targets'. Thanks for pointing that out. --MichaelMaggs (talk) 20:57, 12 April 2014 (BST)

Thoughts on task breakdown

Thanks Michael, this looks like a great start. I'm just having a bit of a brain dump here. There are some challenges on which input and/or experimentation will be required. These include:

  • Defining the classes/outcomes
  • Defining the relevant input feature set (and their detection) which will be used as predictors
  • Segmentation - both with regard to the level at which outcomes are assigned (e.g. edit, article) and with respect to

Some of this matters more for machine learning techniques, while we might also use some simple measures (e.g. looking at the T1-T2 difference on an article-specific Wiki ToDo list) which depend on features as proxies for outcomes. Regarding outcomes, one simple approach for machine learning, particularly as a first step, might just be to classify whether an edit was (1) positive, (2) negative, or (3) neutral with respect to improving the article. We could then provide a breakdown of such edits. We should also be cautious that our feature selection doesn't exclude some widely missed but important features (e.g. alt-text). The rubric below might be a good way to 'present back' and assess improvements (with some aggregation method, probably, for overall improvement).
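As a minimal sketch of that three-class idea, assuming a scikit-learn-style classifier - the feature columns and toy training data below are illustrative assumptions, not an agreed feature set:

  # Illustrative only: feature columns and toy labels are assumptions.
  from sklearn.ensemble import RandomForestClassifier

  # One row per edit: [bytes added, references added, wikilinks added, templates added]
  features = [
      [350, 2, 4, 1],      # sourced expansion
      [-1200, -3, -6, 0],  # large removal of cited text
      [15, 0, 0, 0],       # minor copyedit
  ]
  labels = ["positive", "negative", "neutral"]  # the three outcome classes above

  model = RandomForestClassifier(n_estimators=100, random_state=0)
  model.fit(features, labels)
  print(model.predict([[420, 1, 3, 0]]))  # classify a new edit

In practice the labels would come from human assessments (or a rubric such as the one below) rather than hand-assigned examples.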

Assessment area | Scoring methods | Score
Comprehensiveness | Score based on how fully the article covers significant aspects of the topic. | 1-10
Sourcing | Score based on adequacy of inline citations and quality of sources relative to what is available. | 0-6
Neutrality | Score based on adherence to the Neutral Point of View policy. Scores decline rapidly with any problems with neutrality. | 0-3
Readability | Score based on how readable and well-written the article is. | 0-3
Formatting | Score based on quality of the article's layout and basic adherence to the Wikipedia Manual of Style. | 0-2
Illustrations | Score based on how adequately the article is illustrated, within the constraints of acceptable copyright status. | 0-2
Total | | 1-26

from https://en.wikipedia.org/wiki/Wikipedia:WikiProject_United_States_Public_Policy/Assessment#Quantitative_Article_Quality_Assessment_Metric
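A minimal sketch of one possible aggregation over that rubric - the clamping to each area's maximum and the straight sum are assumptions; any weighting scheme would need discussion:

  # Sketch: sum per-area rubric scores into the 1-26 total used above.
  # Clamping each score to its maximum is an assumption about validation,
  # not part of the original rubric.
  RUBRIC_MAXIMA = {
      "comprehensiveness": 10,
      "sourcing": 6,
      "neutrality": 3,
      "readability": 3,
      "formatting": 2,
      "illustrations": 2,
  }

  def total_score(scores):
      """Sum per-area scores, clamped to the rubric maxima."""
      return sum(max(0, min(scores.get(area, 0), maximum))
                 for area, maximum in RUBRIC_MAXIMA.items())

  before = {"comprehensiveness": 4, "sourcing": 2, "neutrality": 3,
            "readability": 2, "formatting": 1, "illustrations": 0}
  after = {"comprehensiveness": 7, "sourcing": 4, "neutrality": 3,
           "readability": 2, "formatting": 2, "illustrations": 1}
  print(total_score(before), "->", total_score(after))  # 12 -> 19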

It's also the case that the salient features may vary with a combination of temporal and editor-interaction factors. Early-stage articles benefit greatly from the addition of different features than later-stage ones (e.g., amongst others, http://scholar.google.co.uk/scholar?cluster=2386972856451571180&hl=en&as_sdt=0,5).

There's also an interesting point re: namespace contributions on talk and article pages; presumably in the first instance we're looking only at article contributions.

It is also worth noting that whatever we do, we should where possible consider the implications for non-English Wikipedias; in particular, the ways in which references are used are (I believe) different across Wikipedias. This may well also be true of different projects.

Finally, we should also note that if we did anything successful, a number of other benefits might be gained, including automation (or semi-automation) of quality assessment within projects, etc., and the potential for new editor engagement experiments, e.g. sending editors to articles which we think might be easily improved (a more sophisticated 'wiki to do' tool). Sjgknight (talk) 10:34, 13 April 2014 (BST)

Just crossed my mind that it's also worth noting the potential benefits for e.g. education: being able to take the contributions of a particular editor and look for particular features (e.g. use of 'cite' templates) across those contributions. That would also have benefits outside of Wikipedia (on MediaWiki generally), where analytics on writing style and content could be conducted. Sjgknight (talk) 10:45, 13 April 2014 (BST)
: Regarding assessment areas, take a look at Table 3 of Stvilia, Besiki, et al. "Information quality work organization in Wikipedia." Journal of the American Society for Information Science and Technology 59.6 (2008): 983-1001. It gives a good list of criteria. For example, "Accessibility" has "caused by (1) Language barrier (2) Poor organization (3) Policy restrictions imposed by copyrights, Wikipedia internal policies, and automation scripts" and suggests actions such as "Reorganize, duplicate, remove, translate, split, join, rearrange". See the whole list; I think it might be helpful for your chart, Simon. Jodi.a.schneider (talk) 10:48, 13 April 2014 (BST)
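A small sketch of the per-editor feature analysis Simon mentions above (counting 'cite' template use across an editor's recent article edits). The username is a placeholder, and treating a raw count of cite templates as a quality signal is purely illustrative; only the MediaWiki API calls themselves are standard.

  # Sketch: count {{cite ...}} templates in the revisions produced by one
  # editor's recent article-namespace edits.  USER is a placeholder.
  import re
  import requests

  API = "https://en.wikipedia.org/w/api.php"
  USER = "ExampleEditor"  # placeholder username

  session = requests.Session()
  session.headers.update({"User-Agent": "quality-metrics-sketch/0.1"})

  contribs = session.get(API, params={
      "action": "query", "format": "json", "formatversion": 2,
      "list": "usercontribs", "ucuser": USER,
      "ucnamespace": 0, "uclimit": 20, "ucprop": "ids|title",
  }).json()["query"]["usercontribs"]

  cite_pattern = re.compile(r"\{\{\s*cite", re.IGNORECASE)

  for edit in contribs:
      rev = session.get(API, params={
          "action": "query", "format": "json", "formatversion": 2,
          "prop": "revisions", "revids": edit["revid"],
          "rvprop": "content", "rvslots": "main",
      }).json()["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]
      print(edit["title"], len(cite_pattern.findall(rev)))

A real analysis would diff each revision against its parent so that cite-template additions are attributed to the editor, rather than counting templates in the whole page.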

Related wikimania talks and materials

  • On Twitter Aaron Halfaker (now at the Foundation) has pointed out he's proposed a related Wikimania talk: https://wikimania2014.wikimedia.org/wiki/Submissions/WikiCredit_-_Calculating_%26_presenting_value_contributed_to_Wikipedia
  • Jean-Frédéric Berthelot and Pierre-Selim Huard are also proposing a talk about some tools used by WMFR: https://wikimania2014.wikimedia.org/wiki/Submissions/A_framework_to_evaluate_entities_impact:_wm_metrics_and_friends --MichaelMaggs (talk) 17:44, 15 April 2014 (BST)
  • See http://edutechwiki.unige.ch/en/Wiki_metrics,_rubrics_and_collaboration_tools#Introduction for some interesting related background. Sjgknight (talk) 23:29, 25 April 2014 (BST)

WikiTrust

WikiTrust is looking for an adopter; have you considered hosting/supporting it? --Nemo bis (talk) 20:11, 16 April 2014 (BST)

I wasn't aware of that. Could certainly be discussed. Do you know who would be the best person to contact? --MichaelMaggs (talk) 16:36, 21 April 2014 (BST)
This reminded me I'd seen that discussion; this is the only thread I could find, and there's no resolution as far as that thread indicates. Sjgknight (talk) 16:45, 21 April 2014 (BST)

User-based quality measures

Thanks for this work, Michael! I think one of the most challenging aspects of quality is tying contributions to specific users, i.e. how to tie various programs or events to specific user contributions. Wikimetrics measures contributions, but is unable to measure whether contributions are of high "quality". During one of the discussions at the Wikimedia Conference, which will soon be posted on-wiki, various methods for measuring quality were discussed. One of the themes brought up that might be interesting to pursue is breaking the measure of "quality" into different ideas, such as "popularity" or "appreciation"; but more discussion is definitely needed around that and how to measure it. Another of the main challenges is that quality standards vary significantly across language projects. The measure of citations can vary across articles because different cultures have different customs around citation.


Another factor to consider, for example, is the benefit of improving the quality of an already pretty decent article versus adding new content to a stub. But this brings up the point that it could be risky to get too nit-picky at the outset of measuring quality, and that perhaps focusing on very general measures (such as number of headings, page views, etc.) might be more helpful and generalizable to all the projects. Thank you for starting the discussion around this topic! Looking forward to hearing your thoughts. I will also send you the link to the discussion around program outcomes when it's posted. Regards - EGalvez (WMF) (talk) 21:59, 16 April 2014 (BST)
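A rough sketch of the "very general measures" idea; which counts actually correlate with quality across projects is exactly the open question, so the specific measures below are assumptions for illustration only.

  # Sketch of "very general" measures computable from raw wikitext alone.
  import re

  def general_measures(wikitext):
      return {
          "headings": len(re.findall(r"^=+[^=].*=+\s*$", wikitext, re.MULTILINE)),
          "references": len(re.findall(r"<ref[ >]", wikitext, re.IGNORECASE)),
          "wikilinks": len(re.findall(r"\[\[", wikitext)),
          "images": len(re.findall(r"\[\[(?:File|Image):", wikitext, re.IGNORECASE)),
          "size_bytes": len(wikitext.encode("utf-8")),
      }

  sample = "== History ==\nText.<ref>source</ref>\n[[File:Map.png|thumb]] [[Link]]"
  print(general_measures(sample))

Page views would need a separate data source on top of this, since they are not derivable from the wikitext itself.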

Comment by Charles Matthews (copied from wikimediauk-l)

There's the old DREWS acronym from How Wikipedia Works, to which I'd now add T for traffic. In other words there are six factors that an experienced human would use to analyse quality, looking in particular for warning signs.

  • D = Discussion: crunch the talk page (20 archives = controversial, while no comments indicates possible neglect)
  • R = WikiProject rating, FWIW, if there is one.
  • E = Edit history. A single editor, or essentially only one editor with tweaking, is a warning sign. (Though not if it is me, obviously)
  • W = Writing. This would take some sort of text analysis. Work to do here. Includes detection of non-standard format, which would suggest neglect by experienced editors.
  • S = Sources. Count footnotes and so on.
  • T = Traffic. Pages at 100 hits per month are not getting many eyeballs. Warning sign. Very high traffic is another issue.

Seems to me that there is enough to bite on, here.

Charles
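A small sketch of how the DREWS-T warning signs above might be encoded as checks. Thresholds other than the ones Charles gives (20 talk archives, ~100 hits per month) are assumptions, W (writing) is left out because it needs real text analysis, and gathering the inputs is a separate job.

  # Sketch: DREWS-T warning-sign checks.  Thresholds other than Charles's
  # (20 archives, ~100 hits/month) are assumptions; W (writing) is omitted.
  def drews_t_warnings(talk_archives, wikiproject_rating, distinct_editors,
                       footnotes, monthly_hits):
      warnings = []
      if talk_archives >= 20:
          warnings.append("D: long talk history - likely controversial")
      elif talk_archives == 0:
          warnings.append("D: no talk activity - possible neglect")
      if not wikiproject_rating:
          warnings.append("R: no WikiProject rating")
      if distinct_editors <= 1:
          warnings.append("E: essentially a single-editor article")
      if footnotes == 0:
          warnings.append("S: no footnotes")
      if monthly_hits <= 100:
          warnings.append("T: low traffic - few eyeballs")
      return warnings

  print(drews_t_warnings(0, None, 1, 0, 80))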

Comments from Tom

So, following on from my mailing list comment: I don't believe extensive technical work is needed - the tool should be kept lightweight, providing a way to pull a specific article from Wikipedia and apply definable metrics to it. To my mind the significant work is in evolving relevant metrics - perhaps for different use cases. So any tool should be carefully designed to be "pluggable" in terms of the metrics you can apply. Some more thought, and development work, might be needed to support individual metric plugins - for example if we needed to create a corpus of data from many articles.

So to my mind what we really need is a framework which places a simple UI with good UX in front of a tool that pulls an article from Wikipedia and applies selected Metrics. These Metrics would be developed against a documented plugin API so that *any* volunteer, stakeholder, or whoever, could build a Metric. Internally there should be a full test suite (good practice), a Metric integration suite (to determine that user-submitted Metric plugins are valid), and a Metric submission process. Ideally the tool should be simple enough for a competent developer to install and use in a personal environment - for Metric development and custom installs.
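A rough sketch of that pluggable shape, assuming Python: the decorator-based registry, the function names and the two toy Metrics are assumptions rather than a spec; only the MediaWiki query used to pull the article is standard.

  # Sketch: pull an article and apply registered Metric plugins to it.
  # The registry, decorator and metric names are illustrative assumptions.
  import re
  import requests

  REGISTRY = {}

  def metric(name):
      """Register a function as a Metric plugin."""
      def register(func):
          REGISTRY[name] = func
          return func
      return register

  @metric("footnotes")
  def footnotes(wikitext):
      return len(re.findall(r"<ref[ >]", wikitext, re.IGNORECASE))

  @metric("word_count")
  def word_count(wikitext):
      return len(wikitext.split())

  def fetch_article(title, api="https://en.wikipedia.org/w/api.php"):
      data = requests.get(api, params={
          "action": "query", "format": "json", "formatversion": 2,
          "prop": "revisions", "titles": title,
          "rvprop": "content", "rvslots": "main",
      }, headers={"User-Agent": "metric-framework-sketch/0.1"}).json()
      return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

  def run_metrics(title, names=None):
      text = fetch_article(title)
      return {name: REGISTRY[name](text) for name in (names or REGISTRY)}

  print(run_metrics("Wikipedia"))

A documented plugin API would then just pin down the Metric signature and how plugins are validated before being added to the registry.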

We have all the bits in place to build this:

To a certain extent, this framework is entirely independent of actually deciding on the Metrics to use, and so could be specced and built in parallel with that discussion. I've done a bit of work on the concepts for this and would be happy to share them. It's quite an exciting project. I think it would probably not require too much in terms of development - probably 2 months (not full-time, obviously) from start to usable prototype.

I've also got some comments related to metrics but will post them separately to try and keep discussion lean. --ErrantX (talk) 12:43, 23 April 2014 (BST)