Macrogrants/Wikimedia Commons Geograph and Avionics batch upload projects support

From Wikimedia UK
Revision as of 16:35, 15 November 2013 by (talk | contribs) (ce)
Objective

This application requests supporting funds for two of the largest non-GLAM projects on Wikimedia Commons. These projects are driven entirely by unpaid volunteers and have a track record of delivering large amounts of valued content for Wikipedia in many languages.

Goals
  • Deliver 100,000 uploaded and categorized quality amateur Avionics photographs from a selection of forums, with releases from the photographers on record. Project page: Commons:Commons:Batch uploading/Airliners.
  • Categorize UK Geograph images geographically (currently just under 2 million on Commons), refresh the collection with versions now available in higher resolution, and add further photographs from a current total collection of 3.7 million photographs. Project page: Commons:User:Faebot/Geograph.
Resources
  • Volunteer resources. These are long-running projects spanning years rather than months, and they will require regular maintenance tasks once complete. Because the projects rely heavily on volunteer time, the plan needs to be flexible, but firm announcements about the projects would be available for Wikimania 2014 and the main deliverables would be complete by the end of 2014. The principal contributors to the avionics project in 2013 have been Russavia and Fæ, with wide support from more than 150 other contributors. Time from the principal contributors has been more than 10 hours per week. The Geograph project took a lot of development and test time in 2012 but is now less demanding; regular set-up and maintenance takes of the order of 10 hours per month of Fæ's time.
  • Communications and hardware. The bandwidth costs have been high. Key reliability issues have been Fæ's internet connection and an inability to do any batch image processing apart from simple cropping (using an old notebook, available part time, as a Windows installation). Video processing requests (such as conversions to OGV) have been rejected in the past due to this lack of processing power rather than lack of volunteer time.
    • The primary machine for this work is a maxed-out 2009 Mac mini running OS X 10.5. This means that Python-scripted image processing is limited or impossible. It is proposed that a dedicated machine running OS X Mountain Lion be purchased specifically to support Faebot's activities (Faebot is currently the most active bot on Commons, with a track record of over 2.2 million edits). This will provide much-needed disk capacity and open up the potential for audio and video processing, as well as supporting more complex image processing and identification tasks on batch-uploaded images. Current price is £499[1] (standard John Lewis price with 2-year warranty).
    • Bandwidth use has been high (capped at 40GB/month). An upgrade to a higher-bandwidth service cost an extra £10/month; it is proposed that half of the year's cost be covered by a one-off grant in 2014 to support Faebot's batch-processing activities (£60).
    • WMUK previously agreed to pay for a £15 memory stick to reduce the likelihood of hard-drive damage, though this has yet to be claimed. Considering that a 32GB stick will probably not be sufficient to cope with a full XML dump from Commons in 2014 (needed for the Geograph project), it is proposed that a 64GB USB stick be purchased, with expected costs around £30.
  • Staff resources. None.
  • Expenses. Limited to postage, perhaps £10; no travel is expected.
  • Access costs. An obstacle to uploading some sets of restricted images (where a licence release is on record in OTRS) has been that some of the forums require membership. The membership cost for Airliners.net (the main resource so far) is $55 for a year; 3-month or 6-month memberships are also available and may be suitable. Around 9 forums are on our target list, and a general membership budget of up to £100 may be sufficient, to be spent as and when these purchases will have the most benefit.

The total budget to support the above is estimated at £650.
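As a quick cross-check, the figures quoted in the Resources list can be summed. The sketch below assumes each quoted figure is an upper bound, and the item labels are illustrative rather than taken from any accounting record. On that reading the items come to £699, a little above the £650 estimate, which may reflect that the postage and membership figures are ceilings rather than expected spend.

```python
# Figures in GBP as quoted in the Resources list.
# Assumption: each figure is treated as an upper bound; labels are illustrative.
budget_items = {
    "New machine": 499,
    "Bandwidth upgrade (half of one year at £10/month)": 60,
    "64GB USB stick": 30,
    "Postage": 10,
    "Forum memberships (up to)": 100,
}

stated_estimate = 650  # the estimate given in the application


def total(items):
    """Sum the itemised figures."""
    return sum(items.values())


itemised_total = total(budget_items)
difference = itemised_total - stated_estimate

for label, amount in budget_items.items():
    print(f"{label}: £{amount}")
print(f"Itemised total: £{itemised_total}")
print(f"Stated estimate: £{stated_estimate} (difference £{difference})")
```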

Constraints

None.

These projects are noted for being both engaging for "gnomic" volunteers and independent of the WMF or WMF-managed tools. This probably remains desirable, even if promotion of the outcomes (the images then available for reuse on all projects) may appeal to "front-end" volunteers. The methods could be popular to present at events such as Wikimania or at more focussed workshops on how to manage large batch projects on Commons. In the longer term, there may be regular maintenance or housekeeping bot tasks that could transition to WMF servers.

Outcomes
  • 100,000 Avionics photographs, checked and categorized, will provide an independent world-standard reference base for aircraft of all types in all liveries. A consequence will be a consistent standard for using ICAO codes on all aircraft categories, along with photographs and geo-coordinates of the majority of airports, military air bases and airfields in all countries.
  • Consensus on the project methods for automatic geo-categorization of the millions of photographs in the Geograph collection. So far this has been limited to UK county/authority level due to doubts about accuracy and a lack of standardization in naming lower-level categories such as villages. An automated link using Wikidata may be possible in 2014, though this will also require cross-project consensus. This has been an issue without firm consensus for several years.
  • Throughout 2014 a series of published tests, case studies and on-Commons guidelines for:
    • Best practices for using Ordnance Survey Open Data to categorize images on Commons by location.
    • Python and Pywikipediabot techniques for identifying and removing standard watermarks and credit bars from batch uploads. This may include the use of SciPy or similar open source tools to analyse and correct images.
    • Identification of near duplicates and copyright problems through image matching.
    • Using EXIF data for improved categorization and finding suspect images.
  • Ad-hoc outcomes from supporting Faebot. For example, both Commons:User:Noaabot's work maintaining daily USA weather maps on Commons and more unusual requests, such as the maintained project dashboard on the Welsh Wikipedia, rely on Faebot being available every day of the year.
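To illustrate the kind of technique meant by identifying near duplicates through image matching, here is a minimal sketch of an average-hash comparison. It is not Faebot's actual method. It assumes each image has already been reduced to a small grayscale grid of the same size (in practice an imaging library would do this step), and the function names and the `max_distance` threshold are hypothetical.

```python
from typing import Sequence

# Assumption: an image is a small grid of 0-255 grayscale values,
# already downscaled to a common size (for example 8x8).
Gray = Sequence[Sequence[int]]


def average_hash(pixels: Gray) -> int:
    """Build a bit fingerprint: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(first: Gray, second: Gray, max_distance: int = 2) -> bool:
    """Treat two images as near duplicates when their fingerprints differ in few bits."""
    return hamming_distance(average_hash(first), average_hash(second)) <= max_distance


# Tiny 2x2 grids for illustration only.
original = [[10, 200], [220, 30]]
brighter = [[25, 215], [235, 45]]   # same picture, uniformly brightened
different = [[200, 10], [30, 220]]  # light and dark areas swapped

print(is_near_duplicate(original, brighter))   # True
print(is_near_duplicate(original, different))  # False
```

A fingerprint of this kind tolerates uniform brightness changes and mild recompression, which suits flagging candidates for human review; it would not by itself establish a copyright problem.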
Risks
  • This project is highly reliant on Fæ's time and on Russavia's expertise in avionics. Project pages such as Commons:Batch_uploading/Airliners actively encourage participation, and it may be possible to get another bot operator interested in the relevant maintenance scripts that Faebot relies on. However, there are no hard deadlines, so temporary illness or unavailability of a volunteer should not affect the long-term outcome.