Last modified: 2010-10-16 00:09:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T27238, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 25238 - Investigate and re-enable action=parse API module on Wikimedia wikis


Summary:	Investigate and re-enable action=parse API module on Wikimedia wikis

Status:	RESOLVED FIXED

Product:	Wikimedia
Classification:	Unclassified
Component:	Site requests (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal normal (vote)
Target Milestone:	---
Assigned To:	Roan Kattouw

URL:
Whiteboard:
Keywords:	shell

Depends on:
Blocks:

Reported:	2010-09-21 18:57 UTC by MZMcBride
Modified:	2010-10-16 00:09 UTC (History)
CC List:	18 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
sq33:3128 hit ratio (53.60 KB, image/png) 2010-09-22 03:34 UTC, Tim Starling	Details

Description MZMcBride 2010-09-21 18:57:24 UTC

The action=parse API module was disabled because it was suspected of causing server load (or possibly bandwidth) issues. It should be re-enabled at some point.

The relevant server admin log entry is here: http://wikitech.wikimedia.org/index.php?diff=28854&oldid=28853

Comment 1 Justin 2010-09-21 22:30:22 UTC

I sincerely hope it is re-enabled soon. Now my application === broken :(

Comment 2 Platonides 2010-09-21 22:36:28 UTC

Dispenser noted in #wikimedia-tech that action=parse uses the parser cache, the problem that needs disabling should be just misses.

Comment 3 Tim Starling 2010-09-22 02:27:31 UTC

If you're using action=parse then please describe your application and typical requests on this bug report.

Comment 4 Bawolff (Brian Wolff) 2010-09-22 02:37:32 UTC

We're using action=parse on Wikinews for the preview in a JavaScript tool that helps users change a template ([[n:WN:ML]]).

An example request would look something like the following data posted to the API:

action:	parse
format:	xml
prop:	text
pst:	true
title:	Main Page
text:	{{Lead 2.0 |id=3 <!-- do not change. Each lead must have its own unique ID --> |image=Kenny McKinley.JPG |width=100x100px |type=none |title=Denver Broncos player Kenny McKinley found dead aged 23 |short_title= |summary=Kenny McKinley, an american football player for Denver Broncos, has been found dead at the age of 23. }}
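
A request like the one above can be assembled as a form-encoded POST body. The following is a minimal Python sketch, not the Wikinews tool's actual code: the helper name `build_preview_body` is hypothetical, the wikitext is shortened for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

def build_preview_body(title, wikitext):
    """Return a form-encoded POST body for an action=parse preview request."""
    params = {
        "action": "parse",
        "format": "xml",
        "prop": "text",
        "pst": "true",     # pre-save transform, as in the request above
        "title": title,    # page the wikitext is parsed in the context of
        "text": wikitext,  # raw wikitext to render
    }
    return urlencode(params)

# Shortened, illustrative wikitext (not the full template call from the report).
body = build_preview_body("Main Page", "{{Lead 2.0 |id=3 |type=none}}")
print(body)
```

Sending the wikitext in a POST body rather than in the URL keeps long template calls out of the query string.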

Comment 5 CBM 2010-09-22 02:55:09 UTC

The WP 1.0 tools use it (http://toolserver.org/~enwp10 and subpages).  The
results from the API are (supposed to be) cached in a local database on
toolserver with a 12 hour expiry, to lighten the load on the API. 

The things that are parsed are:

1) Templates such as [[en:Template:B-Class]], to get the formatting that has
been specified inside them. This formatting is used to make the toolserver
program give similar output to on-wiki tables. It would be silly to manually
keep the web tool code in sync with the templates. And the overall collection
of class templates is not predefined and can grow at the whim of a wikiproject. 

2) The page [[en:User:SelectionBot/HomePage]] is parsed to fill in the contents
of http://toolserver.org/~enwp10.

The goal of this was to prevent, as much as possible, hard-coding formatting
into the web program that could be or should be updatable by people other than
the tool's maintainer.  

If there is a caching failure in this tool, it will be somewhat apparent in the
WMF server logs, because the requests would be coming from the toolserver's web
server. My logs show 4367 invocations in the last 24 hours.
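
The 12-hour expiry described in this comment can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the WP 1.0 tool's code: the class name `ExpiringCache` is hypothetical, an in-memory dictionary stands in for the toolserver database, and `fetch` stands in for whatever function actually calls the API.

```python
import time

class ExpiringCache:
    """Tiny in-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so expiry can be exercised in tests
        self._entries = {}      # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch(key) on a miss or after expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]     # still fresh: no API request needed
        value = fetch(key)      # miss or expired: go back to the API
        self._entries[key] = (now, value)
        return value

TWELVE_HOURS = 12 * 60 * 60     # expiry mentioned in the comment, in seconds
```

With a wrapper like this, repeated requests for the same template within the expiry window cost one API call instead of one per page view.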

Comment 6 Alex Z. 2010-09-22 03:03:05 UTC

I use it in [[Wikipedia:RefToolbar 2.0]] to do previews of citation templates. This is one of the most popular gadgets on the English Wikipedia, though I don't know how much this feature is used. It is only available with the new toolbar version, so not all users have access to it. Every parse request is manually triggered by the user.

Example request:

action: parse
title: wgPageName
prop: text
format: json
text: {{cite web|url=http://www.scu.edu.au/news/media.php?item_id=1023&action=show_item&type=M|title=Birds learn to eat cane toads safely |last=Marchant|first=Gillian|date=26 November 2007|work=Southern Cross University website|publisher=Southern Cross University|accessdate=2009-05-09}}

Comment 7 CBM 2010-09-22 03:07:56 UTC

A third use from the WP 1.0 bot: parsing tables such as [[en:User:WP_1.0_bot/Tables/Project/Libraries]] if people want to see them in the web tool instead of on the wiki.

Comment 8 Tim Starling 2010-09-22 03:26:00 UTC

What about this?

http://commons.wikimedia.org/w/api.php?action=parse&pst&text=%7B%7BMediaWiki%3AImageAnnotatorTexts%7Clive%3D1%7D%7D&title=API&prop=text&uselang=en&maxage=14400&smaxage=14400&format=json

There's a lot of those.

Comment 9 Tim Starling 2010-09-22 03:33:40 UTC

(In reply to comment #2)
> Dispenser noted in #wikimedia-tech that action=parse uses the parser cache, the
> problem that needs disabling should be just misses.

It appears that the problem was caused by squid cache hits, not parser cache misses. The byte hit ratio at sq33:3128 spiked from 18% to 92%.

Comment 10 Tim Starling 2010-09-22 03:34:23 UTC

Created attachment 7695 [details]
sq33:3128 hit ratio

Comment 11 Pol 2010-09-22 03:58:21 UTC

Our application is a free iPad application with more than 200,000 users. It was the #1 free app for a week during its launch a couple of months ago, and has never left the top 10 in the Lifestyle category. As you can see in its description, we're trying to create interesting alternative presentations of Wikipedia content to really make it look great on iPad: http://itunes.apple.com/us/app/id384224429?mt=8

Discover uses the parse action as a more efficient way to retrieve the contents of pages from Wikipedia. It always uses the "page" argument, retrieving entire pages, which should therefore be in the cache (as indicated at the very end of the parsing result by the various timestamps). As far as I understand, the application should be a good citizen toward the Wikipedia servers, and it downloads less data this way.

Please re-enable "action=parse" for entire pages as soon as possible, as Discover is effectively completely broken right now.

For reference, here are all the exact parse API requests I could find in the source code:
* action=parse&prop=displaytitle%%7Ctext%%7Ccategories%%7Cexternallinks&page=%@&redirects&format=xml
* action=parse&prop=text%%7Cimages&page=%@&redirects&format=xml
* action=parse&prop=text%%7Cimages&page=%@&redirects&format=xml
* action=parse&prop=text&page=Wikipedia:Featured_articles&redirects&format=xml
* action=parse&prop=images&page=%@&redirects&format=xml
* action=parse&prop=links&page=Wikipedia:Featured_pictures&redirects&format=xml
* action=parse&prop=text&page=Template:In_the_news&redirects&format=xml
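
In the list above, `%%7C` appears to be a format-string escape for `%7C` (the URL-encoded `|` that separates prop values) and `%@` appears to be a format placeholder for the page title. A rough Python sketch of building the same kind of whole-page request is shown below; the function name `build_page_parse_url` and the `en.wikipedia.org` endpoint are assumptions for illustration.

```python
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # assumed endpoint, not stated in the comment

def build_page_parse_url(page, props):
    """Build an action=parse URL for an entire page, following redirects."""
    params = [
        ("action", "parse"),
        ("prop", "|".join(props)),  # urlencode turns "|" into %7C
        ("page", page),
        ("redirects", ""),          # emitted as "redirects=" rather than the bare flag above
        ("format", "xml"),
    ]
    return API_URL + "?" + urlencode(params)

url = build_page_parse_url("Art", ["text", "images"])
print(url)
```

Because the request names a page rather than supplying raw text, it is the kind of request that can be answered from the parser cache.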

Comment 12 Guillaume Paumier 2010-09-22 04:08:36 UTC

(In reply to comment #8)
> What about this?
> 
> http://commons.wikimedia.org/w/api.php?action=parse&pst&text=%7B%7BMediaWiki%3AImageAnnotatorTexts%7Clive%3D1%7D%7D&title=API&prop=text&uselang=en&maxage=14400&smaxage=14400&format=json
> 
> There's a lot of those.

[[commons:Help:Gadget-ImageAnnotator]]

Comment 13 Max Semenik 2010-09-22 04:34:35 UTC

[[WP:AWB]] uses action=parse for previews. This tool is used by thousands of Wikimedians.

Typical requests are action=parse&prop=headhtml before the first preview, and then just ordinary action=parse&prop=text in GET with title=...&text=... in POST for every preview. While previews aren't displayed by default, this feature is extensively used.

Comment 14 Pol 2010-09-22 04:36:33 UTC

Note that other APIs don't return the same info as action=parse, so replacing it is not easy.

For instance, links in the page retrieved from action=parse contain an extra attribute (exists="") indicating if the link exists. The alternative action=query&prop=links doesn't provide this info.
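
As an illustration of why that attribute is useful, the sketch below splits links into existing and missing pages based on whether `exists` is present. The sample response is hypothetical: the `pl` element name and the overall XML layout are assumptions modelled on the description above, not copied from a real API reply.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shape; only the exists="" attribute comes from the comment above.
SAMPLE_RESPONSE = """<api>
  <parse>
    <links>
      <pl ns="0" exists="">Existing page</pl>
      <pl ns="0">Missing page</pl>
    </links>
  </parse>
</api>"""

def split_links(xml_text):
    """Return (existing, missing) link titles based on the presence of the exists attribute."""
    root = ET.fromstring(xml_text)
    existing, missing = [], []
    for link in root.iter("pl"):
        # The attribute value is empty, so test for presence rather than truthiness.
        target = existing if "exists" in link.attrib else missing
        target.append(link.text)
    return existing, missing

existing, missing = split_links(SAMPLE_RESPONSE)
```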

Comment 15 Ben 2010-09-22 04:55:46 UTC

I'm using the "parse" action for a site I'm developing, and I certainly don't want it suddenly disappearing from the API once we go live with it.

The site contains radio play lists and when you click on a music track it retrieves videos, images and information about each band.  The information comes from our own D/B when we have it, but falls back on Wikipedia when we don't.

Comment 16 Michael Dale 2010-09-22 05:06:12 UTC

Timed text pages: 
http://commons.wikimedia.org/wiki/Commons:Timed_Text_Demo_Page?withJS=MediaWiki:MwEmbed.js
make use of the API to grab the subtitles for a given video, but they use a page-title parse (which should use the cache). For good measure I have added maxage=3600 to page parse requests so that they ideally hit the squids instead of the apaches, but I don't think this has anything to do with the load issues.

But most likely the issue is caused by some cache miss issue for the site wide enabled features like Image Annotator ?
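
Adding a cache lifetime to a parse request, as described in this comment, might look like the following Python sketch. The helper name `with_cache_lifetime` and the page name `Example` are hypothetical, and setting `smaxage` alongside `maxage` is an assumption that mirrors the ImageAnnotator URL quoted in comment 8; the comment itself only mentions `maxage=3600`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_cache_lifetime(url, seconds):
    """Return url with maxage and smaxage set to the given number of seconds."""
    parts = urlsplit(url)
    # Drop any existing lifetime parameters so they are not duplicated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key not in ("maxage", "smaxage")]
    query += [("maxage", str(seconds)), ("smaxage", str(seconds))]
    return urlunsplit(parts._replace(query=urlencode(query)))

cached_url = with_cache_lifetime(
    "http://commons.wikimedia.org/w/api.php?action=parse&page=Example&prop=text&format=json",
    3600,
)
print(cached_url)
```

Identical URLs are what allow a caching proxy to answer repeated requests itself, so the parameters need to be added consistently and in the same order each time.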

Comment 17 Pol 2010-09-22 05:58:19 UTC

Not sure if related (my guess is yes though), but the iPad and iPhone versions of Wikipanion, likely the most popular Wikipedia app on iOS, are not working anymore either (page just loads endlessly but nothing shows up).

Comment 18 Mathieu Poumeyrol 2010-09-22 07:06:27 UTC

(In reply to comment #3)
> If you're using action=parse then please describe your application and typical
> requests on this bug report.

At fotopedia, we use action=parse in the following contexts:

- server side for displayable text retrieval:
/w/api.php?action=parse&format=json&prop=text&page=....

These texts are cached on our side for 30 days, so a request only happens when somebody tries to display an article page on fotopedia for the first time in a month. Before the API endpoint was deactivated yesterday, we were only requesting about a dozen pages a minute.

Right now, this part of fotopedia works fine, as long as the users don't wander in unexplored pages.

- client side for both article search and displayable text retrieval

These queries are triggered when a user adds a wikipedia article to a fotopedia page. The typical scenario is a search, followed by a series of:
/w/api.php?action=parse&format=xml&prop=text&page=....

We only have a handful of regular users of the client software, so I don't expect this to be a threat to wikipedia server stability either. On the other hand, the impact on their side is important for us business-wise.

Comment 19 Tim Starling 2010-09-22 08:01:01 UTC

(In reply to comment #16)
> But most likely the issue is caused by some cache miss issue for the site wide
> enabled features like Image Annotator ?

I think that ImageAnnotator is indeed the most likely culprit. As I said in comment #9, we're looking for squid cache hits, which most likely means requests with a maxage parameter. Between 14:20 and 14:29, we logged 71 requests with a maxage parameter in the 1/1000 sampled log. 47 were from ImageAnnotator, the other 24 were for [[MediaWiki:Sitenotice-translation]]. And since the sitenotice requests all went to the same URL, it's unlikely they'd hit the disk of the squids, which is what we saw. None of the logged requests came from a site other than commons.

Between 13:00 and 14:00, we logged an average of 74 requests per second from ImageAnnotator. 14:00 to 15:00 saw a decline to 60 req/s, presumably because sq31-33 were toast for most of that period. 
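
For readers unfamiliar with sampled logs, the arithmetic behind figures like these can be sketched as below. This is only a rough illustration: it assumes the 14:20 to 14:29 window is nine minutes long, and a sample of 47 entries is small, so the result is an order-of-magnitude estimate rather than a measurement.

```python
def estimated_rate(sampled_count, sampling_factor, window_seconds):
    """Scale a count from a 1-in-N sampled log up to estimated requests per second."""
    return sampled_count * sampling_factor / window_seconds

# 47 sampled ImageAnnotator requests in a 1/1000 log over an assumed 9-minute window.
rate = estimated_rate(sampled_count=47, sampling_factor=1000, window_seconds=9 * 60)
print(round(rate, 1))
```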

I think the best thing to do for now is to disable ImageAnnotator pending a performance review. Since certain administrators on Commons like to revert me when I change things there, I will leave action=parse disabled on Commons until a regular Commons administrator removes it from [[MediaWiki:Common.js]].

Comment 20 Justin 2010-09-22 08:53:30 UTC

Thanks for re-enabling the parse method.

(In reply to comment #3)
> If you're using action=parse then please describe your application and typical
> requests on this bug report.

For the record, my application makes requests for single pages, one at a time, from Wikipedia. An example request would be:

/w/api.php?action=parse&page=Art&format=json&prop=text|revid|links|displaytitle&redirects

I need to traverse the page content client side so it is essential that the content is parsed first, unless I write / implement a reliable parser myself and use the query method instead, though I hope now this is not necessary.

Thanks

Comment 21 Sascha Hendel 2010-09-22 09:34:23 UTC

Hi,
my web application is broken too.
I need the parse method to analyze image annotations (description & license) for inclusion in our online architecture database www.archinform.net
This information is cached on our server (only updated if the images are refreshed (manually initiated)). Shouldn't cause much traffic.

Please reenable the parse method soon,

thanks
Sascha

Comment 22 Daniel Kinzler 2010-09-22 09:38:19 UTC

I'm using action=parse for the featured article feed provided by wmde at <http://feeds.feedburner.com/wikimedia/wp-adt>. The feed should contain the teaser text, as it appears on the main page, as html. That is what I grab using action=parse.

Comment 23 Sascha Hendel 2010-09-22 09:44:39 UTC

Forgot to say that a typical request looks like this:

http://commons.wikimedia.org/w/api.php?action=parse&format=xml&prop=text&title=TITLE&text=TEXT

Comment 24 Max Semenik 2010-09-22 10:21:39 UTC

Image Annotator disabled:
http://commons.wikimedia.org/w/index.php?title=MediaWiki:Common.js&diff=44218237&oldid=44154173
http://commons.wikimedia.org/w/index.php?title=MediaWiki:Gadgets-definition&diff=44218225&oldid=43070611

Comment 25 Sascha Hendel 2010-09-22 11:58:04 UTC

Cool, works again ;)

Thank You!

Comment 26 Roan Kattouw 2010-09-22 12:01:03 UTC

Reenabled action=parse on Commons. Tim had previously reenabled it on all other wikis, so action=parse is now back across the board. I'll be keeping a close eye on the API Squids throughout the day.

Comment 27 Pol 2010-09-22 12:03:25 UTC

Thanks, much appreciated!

Comment 28 Jarek Tuszynski 2010-09-23 14:01:00 UTC

Unfortunately the fix disabled the "Image Annotator" gadget, which was used to add localized descriptions to over 21k files. It would be great if some solution was found to re-enable this great tool.

Comment 29 Helder 2010-09-23 15:39:09 UTC

The localized description is not added by the Image Annotator. It is added by Template:Information:
http://commons.wikimedia.org/wiki/Template:Information

Comment 30 Jarek Tuszynski 2010-09-23 16:17:11 UTC

(In reply to comment #29)
> The localized description is not added by the Image Annotator. It is added by
> Template:Information:
> http://commons.wikimedia.org/wiki/Template:Information

I should have known better than to use a catch-all word like "localized". I agree that the localized/internationalized descriptions (in the language of the user) provided by the Information, Book or Artwork templates will still be there, but descriptions linked to specific locations in the image ("localized"?) are gone. Those descriptions were used to annotate each face in a group image (replacing "5th head, with hat, in the 6th row" kind of descriptions), each building in a panorama, or a signature or other inscription in a painting. A lot of effort was put into annotating images to provide more information to the final user. For example, every known person in the famous http://commons.wikimedia.org/wiki/File:Stroop_Report_-_Warsaw_Ghetto_Uprising_06b.jpg was identified.

Comment 31 Tisza Gergő 2010-09-26 21:46:11 UTC

ImageAnnotator is enabled by default on hu.wikipedia (though only actually used on a handful of images). Is that a problem, or is it OK to use on low-traffic sites?

Comment 32 User:Docu 2010-09-27 04:47:24 UTC

The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests that this isn't related to ImageAnnotator.

The current fix for this problem broke a lot of pages on Commons. Please reexamine this problem.

Comment 33 Tim Starling 2010-09-27 05:29:02 UTC

(In reply to comment #31)
> ImageAnnotator is enabled by default on hu.wikipedia (though only actually used
> on a handful of images). Is that a problem, or is it OK to use on low-traffic
> sites?

Yes, it's OK to use it on hu.wikipedia.org for now. The problem was an overload; that's why it happened at the weekly peak time.

(In reply to comment #32)
> The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests
> that this isn't related to ImageAnnotator. 

All I see there is one single person (Slomox) doing some wishful thinking.

Comment 34 MZMcBride 2010-09-27 06:12:09 UTC

(In reply to comment #32)
> The discussion at [[Commons:Commons:Administrators'_noticeboard#Stats]] suggests
> that this isn't related to ImageAnnotator. 
> 
> The current fix for this problem broke a lot of pages on Commons. Please
> reexamine this problem.

I'm re-resolving this bug as "fixed."

This bug was about getting action=parse re-enabled on Wikimedia wikis. The bug summary and comment 0 both make this clear.

It's quite possible that other issues have been exposed subsequent to this bug. In particular, there should probably be a bug about ImageAnnotator being turned into an extension, if one hasn't been filed already. But that doesn't change the resolution of this bug. If there are new issues, file separate bugs. This issue (i.e., action=parse being disabled on Wikimedia wikis), as far as I'm aware, is completely resolved.

Comment 35 User:Docu 2010-09-27 06:31:49 UTC

Ok, I re-opened it mainly because of the investigation part, but obviously if it's just the "re-enable" part that is important, no problem then.

Comment 36 Tim Starling 2010-09-28 02:14:58 UTC

ImageAnnotator can be re-enabled for now. 

Further testing indicates that the "byte hit ratio" figure in squid includes error messages, and a lot of them were probably being sent at the time in question. The error counters (server.http.errors and client_http.errors) are apparently broken and never incremented.
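
The effect described above can be illustrated with a small calculation. The sketch assumes a byte hit ratio defined as the share of bytes sent to clients that were not fetched from origin servers; that definition, the function name `byte_hit_ratio`, and the byte counts are all assumptions. The counts are invented and chosen only so that the results mirror the 18% and 92% figures from comment 9.

```python
def byte_hit_ratio(client_bytes_out, server_bytes_in):
    """Fraction of bytes sent to clients that did not come from origin servers."""
    if client_bytes_out == 0:
        return 0.0
    return (client_bytes_out - server_bytes_in) / client_bytes_out

# Invented byte counts for illustration only.
normal = byte_hit_ratio(client_bytes_out=1000, server_bytes_in=820)

# Error pages generated by the proxy itself add to bytes sent to clients without
# any matching origin traffic, so the ratio rises even though nothing useful
# was served from cache.
with_errors = byte_hit_ratio(client_bytes_out=1000 + 9250, server_bytes_in=820)

print(round(normal, 2), round(with_errors, 2))
```

Under this assumed definition, a flood of locally generated error responses looks the same as a flood of cache hits, which is consistent with the revised diagnosis.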

The issue occurred at peak time; disabling action=parse probably reduced the server load to slightly below peak, bringing demand back under capacity. There are several things we could have disabled which would have had the same effect.
