Last modified: 2014-03-26 09:32:07 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and beyond displaying bug reports and their history, links might be broken. See T21262, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19262 - Pages with a high number of templates suffer extremely slow rendering or read timeout for logged in users
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Parser (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Priority: High
Severity: major (2 votes)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/List_of_...
Whiteboard: aklapper-fixedbyLua?
Keywords: platformeng
Duplicates: 28744 41863 41941 44982
Depends on:
Blocks:
Reported: 2009-06-17 15:55 UTC by j.mccranie
Modified: 2014-03-26 09:32 UTC
CC: 18 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description j.mccranie 2009-06-17 15:55:05 UTC
I think I reported this once before, but the problem still exists.  I checked other bug reports that talk about the slowness.  They say they have been resolved, but this problem has not.

Long pages on the English Wikipedia that have a lot of links, such as [[List of chess books, A-L]], usually take 30-35 seconds to load.  A couple of examples from today:

<!-- Served by srv137 in 34.009 secs. -->
<!-- Served by srv201 in 31.683 secs. -->

Before when I was discussing this problem (probably about three months ago), there was something missing dealing with cache in the downloaded file (I forgot what).  

If the page has NOT changed and I am NOT logged in, it is fast.  Otherwise it is slow.  People were suspecting something in my Preferences is causing it, perhaps a gadget.  I don't use many gadgets except Twinkle.

I've tested it under IE, Firefox, and Chrome.  Chrome works the best - it seems to be fast if the page has not changed, even if I am logged in.  The others are slow anytime I'm logged in, and the page has changed.
Comment 1 Chad H. 2009-06-17 20:38:05 UTC
Updating platform to none, as this isn't a platform-dependent issue.
Switching component to Page rendering, as this is a parser issue.

Comment 2 MZMcBride 2011-01-14 23:23:14 UTC
I don't believe this problem is related to the number of links. I believe it is due to the number of instances of particular templates such as [[Template:Cite book]]. A simple test should be sufficient to demonstrate this.

If I copy the current text of [[List of chess books, A-L]] (oldid: <http://en.wikipedia.org/w/index.php?oldid=407667664>) into a sandbox, it takes approximately 39.5 seconds to render according to the page source ("<!-- Served by srv195 in 39.529 secs. -->"), using ?action=purge (<http://en.wikipedia.org/w/index.php?oldid=407920835&action=purge>) while logged in.

If I take the same text, put it through [[Special:ExpandTemplates]] and save it to my sandbox, it takes approximately 5.3 seconds to render according to the page source ("<!-- Served by srv273 in 5.353 secs. -->"), using ?action=purge (<http://en.wikipedia.org/w/index.php?oldid=407924303&action=purge>) while logged in.

Special:ExpandTemplates, of course, fully expands the templates, their parser functions (from ParserFunctions), and other magic words, while leaving the links intact. This makes it fairly clear that it is not the number of links that is to blame for the slow rendering time, but instead the number of instances of particular templates.

I'm updating the bug summary from "large pages with a lot of links still slow" to "Large pages with a high number of particular templates have unacceptably slow rendering time for logged in users" accordingly.
Comment 3 Chad H. 2011-04-29 15:39:39 UTC
*** Bug 28744 has been marked as a duplicate of this bug. ***
Comment 4 Ryan Kaldari 2011-04-29 17:25:09 UTC
Changing description and severity per duped bug. Not being able to save articles means loss of new data. Also, I get read timeout errors when trying to view diffs of articles like:
http://en.wikipedia.org/wiki/List_of_former_NTA_Film_Network_affiliates
Comment 5 Ryan Kaldari 2011-04-29 18:39:19 UTC
Here are some reports for articles with lots of citation templates and extremely long load times:

List of former NTA Film Network affiliates (gives read timeout errors on save or view diff):
Document render time: 2-3 minutes
Preprocessor node count: 424325/1000000
Post-expand include size: 1481257/2048000 bytes
Template argument size: 353396/2048000 bytes
Expensive parser function count: 0/500

List of chess books, A-L:
Document render time: 50 seconds
Preprocessor node count: 376837/1000000
Post-expand include size: 1595542/2048000 bytes
Template argument size: 450957/2048000 bytes
Expensive parser function count: 1/500

World War II:
Document render time: 49 seconds
Preprocessor node count: 223394/1000000
Post-expand include size: 1599138/2048000 bytes
Template argument size: 563135/2048000 bytes
Expensive parser function count: 7/500

List of chess books, M-Z:
Document render time: 46 seconds
Preprocessor node count: 343867/1000000
Post-expand include size: 1682659/2048000 bytes
Template argument size: 462589/2048000 bytes
Expensive parser function count: 1/500

Barack Obama:
Document render time: 42 seconds
Preprocessor node count: 256842/1000000
Post-expand include size: 2026458/2048000 bytes
Template argument size: 823644/2048000 bytes
Expensive parser function count: 21/500

Virginia:
Document render time: 41 seconds
Preprocessor node count: 172228/1000000
Post-expand include size: 1679062/2048000 bytes
Template argument size: 833123/2048000 bytes
Expensive parser function count: 28/500
Comment 6 j.mccranie 2011-05-04 04:45:07 UTC
I'm glad this problem is finally getting some attention.  I don't know if it is the same problem, but articles like [[Stalemate]] on the English Wikipedia can take more than 20 seconds to load (and longer to do a diff or save an edit.)
Comment 7 j.mccranie 2011-05-04 18:44:03 UTC
[[Endgame tablebase]] on the English WP also takes a long time to load.
Comment 8 Ryan Kaldari 2011-05-04 19:08:15 UTC
There are undoubtedly thousands of articles on en.wiki that take over 30 seconds to load. If you find any that take a minute or longer, however, those might be useful for testing against and/or profiling.
Comment 9 j.mccranie 2011-05-04 19:13:11 UTC
Can something be done to get these down to 10 seconds or less?
Comment 10 Platonides 2011-05-05 20:21:33 UTC
> I'm glad this problem is finally getting some attention.  I don't know if it is
> the same problem, but articles like [[Stalemate]] on the English Wikipedia can
> take more than 20 seconds to load (and longer to do a diff or save an edit.)

You don't need to wait for it to render if you just want a diff. Change your preferences to show content-less diffs, or append diffonly=1 to the URL.
Subsequent views should be cached. Do you have some cache-breaking preference enabled? (e.g. marking stub links)
Comment 11 j.mccranie 2011-05-05 20:29:34 UTC
Do you mean the "do not show page content below diffs" option?

As far as cache-breaking preferences, not as far as I know, but I don't know the implications of all of the options.
Comment 12 Platonides 2011-05-05 21:18:49 UTC
Yes. That should give you faster diffs, as the slow part is doing the rendering (but you need an additional click to see that content).

The worst cache offenders are 'never show cached page' and the stub threshold. Other user preferences isolate groups of users, so that you can only get a cached page if someone (including yourself) with the same preference set has viewed it recently.
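As a rough conceptual sketch of why such preferences fragment the cache (plain Lua for illustration only; this is not MediaWiki's actual code, which is PHP, and the key fields here are invented):

-- Hypothetical sketch: a parser cache whose key mixes the page with the
-- user options that affect rendering. Users whose options differ can never
-- share an entry, so each group pays for its own slow parse.
local parserCache = {}

local function cacheKey(pageId, options)
    return string.format('%d!stub=%s!lang=%s', pageId,
        tostring(options.stubThreshold or 0), options.language or 'en')
end

local function getParsedPage(pageId, options, parse)
    local key = cacheKey(pageId, options)
    if parserCache[key] == nil then
        parserCache[key] = parse(pageId, options)  -- slow path: full reparse
    end
    return parserCache[key]                        -- fast path: cache hit
end

Anyone with the default options shares one key per page; setting a stub threshold puts you in a much smaller group, so you hit the slow path far more often.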
Comment 13 j.mccranie 2011-05-06 02:33:10 UTC
I can't find "never show cached page" or "stub threshold" under "My preferences" - where are they?  Also, what other preferences isolate groups - that sounds like my problem.
Comment 14 MZMcBride 2011-05-06 02:48:44 UTC
(In reply to comment #9)
> Can something be done to get these down to 10 seconds or less?

You should follow bug 26786.

(In reply to comment #13)
> I can't find "never show cached page" or "stub threshold" under "My
> preferences" - where are they?

"Disable browser page caching" and "Threshold for stub link formatting (bytes)" under the "Appearance" tab.

(In reply to comment #13)
> Also, what other preferences isolate groups - that sounds like my problem.

I don't really follow this. _Anything_ that requires a page to be parsed (purging, showing content under a diff, the parse API module, previewing a page, saving an edit to a page, etc.) is going to be slow with a lot of citation templates. A very limited number of user preferences might make this problem worse, but the underlying problem is going to remain a problem no matter what your user preferences are set to.
Comment 15 j.mccranie 2011-05-06 03:02:43 UTC
OK, I have "threshold for stub" disabled and "disable browser page caching" is not checked.  These are the way they should be, right?
Comment 16 Bawolff (Brian Wolff) 2011-05-06 03:06:56 UTC
Yes.

Note that "disable browser page caching" only affects how MediaWiki gives 304 Not Modified responses. As far as I know, it does not affect the parser cache. (By the way, what's the point of that preference anyway? It seems pointless, but that's off topic.)
Comment 17 Platonides 2011-05-07 22:26:16 UTC
My bad. You are completely right, Bawolff (both on the description and on it looking pointless).
Comment 18 Mark A. Hershberger 2011-05-09 22:20:18 UTC
Per the May 2, 2010 bug triage: please let robla unassign himself, so that he is reminded about this when he has time to incorporate it into our future development.
Comment 19 Ryan Kaldari 2011-06-23 23:41:14 UTC
Any updates on this? Editors are now resorting to untemplating citations so that pages will load in a reasonable time. I just tested the Barack Obama article and got a 54 second load time (not counting the js).
Comment 20 MZMcBride 2011-06-23 23:50:53 UTC
(In reply to comment #19)
> Any updates on this? Editors are now resorting to untemplating citations so
> that pages will load in a reasonable time. I just tested the Barack Obama
> article and got a 54 second load time (not counting the js).

I think Tim did some tests regarding this problem using HipHop instead of Zend. It's a band-aid, but it'll help for a while. HipHop dropped the parsing time down to 10ms or so on the Barack Obama article, I think? But MediaWiki isn't close to being able to switch to HipHop, as far as I'm aware. Tim started work on that support, so a healthy framework exists, but the fine-grained support is still missing at this point.

These particular templates (the citation ones) could be converted into a PHP extension (and Svip has done some work on this in extensions/TemplateAdventures), but people disagree about whether that's the right approach, and perfect is the enemy of the done.
Comment 21 Ryan Kaldari 2011-06-23 23:55:18 UTC
Converting the citation system from templates to PHP is an interesting idea. We could add RefToolbar into it while we're at it (which is still on-wiki JavaScript).
Comment 22 MZMcBride 2011-06-24 00:00:07 UTC
(In reply to comment #21)
> Converting the citation system from templates to PHP is an interesting idea. We
> could add RefToolbar into it while we're at it (which is still on-wiki
> JavaScript).

bug 26786 — sorry, should've included this in my last comment. It has most of the discussion regarding this idea.
Comment 23 j.mccranie 2011-07-11 14:57:50 UTC
This bug has been getting worse on the English Wikipedia.  Now, when I am logged in, articles that don't have nearly as many links are taking 35 seconds to load.  Examples are [[Stalemate]] and [[Zugzwang]].  These use a moderate number of inline author/date (Harvard) references.  

Doing a diff, editing, or comparing selected versions takes probably 2 minutes or longer - when it works.  

This is becoming a severe problem when I am logged in.
Comment 24 j.mccranie 2011-07-11 15:13:32 UTC
Well, that is the way it was yesterday.  It isn't as bad today.  The pages load quickly enough, but editing takes a while.
Comment 25 Bawolff (Brian Wolff) 2011-07-11 15:29:14 UTC
>This is becoming a severe problem when I am logged in

If it's just when you're logged in, that probably means you are using a preference that interferes with page caching (like the stub threshold option).
Comment 26 j.mccranie 2011-07-11 15:32:31 UTC
I have the stub threshold disabled - what else can do it?
Comment 27 Ryan Kaldari 2011-08-12 17:12:56 UTC
It looks like this issue affects more than just citation templates. Articles with large numbers of Coord templates are also taking extremely long to load.

For example:
http://en.wikipedia.org/wiki/List_of_United_Kingdom_locations:_Am-Ar
took 59 seconds (excluding images and javascript)
Comment 28 Rob Lanphier 2011-09-02 20:58:25 UTC
This isn't really a single issue.  Every page is going to have a different specific reason for taking a long time to load.  Generally, the problem will be some combination of the following problems:
1.  Our template language is too slow
2.  Our PHP interpreter is too slow
3.  The templates being used by the page are too complicated or inefficient

We have initiatives to solve the first two problems (#1: use a new template language like Wikiscript, Lua, or JavaScript; #2: use HipHop).  However, if a page is taking over a minute to parse, chances are that the templates themselves need to be made more efficient.  No matter how efficient we make the template language, it will always be possible to more than offset the efficiency gain with more complicated templates.  The more efficient we make templates, the more complicated people will make templates.

I think more sandbox testing like what MZMcBride did (see comment #2) would be very valuable to isolate specific templates that are ripe for optimization.

I'm not sure if this particular bug is going to be valuable to keep open.  It's not specific enough to ever be closed.
Comment 29 Ryan Kaldari 2011-09-02 21:52:44 UTC
Thanks for the enlightening post, Rob. If template complexity is really to blame, we need to make a concerted effort to communicate this to the community. For the past several years the community has been told the opposite: that they should not worry about template costs or server performance issues. For example:

"Generally, you should not worry much about little things like templates and 'server load' at a policy level. If they're expensive, we'll either fix it or restrict it at a technical level; that's our responsibility..." -- Brion Vibber, 2006

In fact, there's an entire essay on en.wiki: "Wikipedia:Don't worry about performance"

Clearly this mindset is now outdated. Perhaps you or Brion could post about this issue on the Wikimedia Blog so that we can start to change this mindset and get people working on template optimization.
Comment 30 j.mccranie 2011-09-02 23:16:57 UTC
When I first posted this, it was taking 35 seconds or longer.  It got worse.  But I checked it today on IE and Firefox and it was fast - about 2 seconds.  Has something been fixed?
Comment 31 MZMcBride 2011-09-03 02:34:02 UTC
(In reply to comment #29)
> In fact, there's an entire essay on en.wiki: "Wikipedia:Don't worry about
> performance"
> 
> Clearly this mindset is now outdated. Perhaps you or Brion could post about
> this issue on the Wikimedia Blog so that we can start to change this mindset
> and get people working on template optimization.

That's most certainly not the solution. This can't be stressed enough. Tim and I have discussed this (though he comes down on your side still, I think, or did at one point).

The scope of Wikimedia projects is the dissemination of free educational material. When you make it the job of wiki users to debug complex wiki-templates and try to fine-tune them, it's a very bad and very undesirable situation.

Users should not be worried about performance, by and large. They certainly shouldn't be concerned that they're using too many calls to citation templates (of all things!). We want users to be encouraged to cite information and build content. That's the goal. We want to discourage mindless "optimizations" (without any real debugging tools) that users will inevitably and invariably make in the name of fixing a system that they didn't break and that's not their responsibility to maintain.

(In reply to comment #28)
> This isn't really a single issue.  Every page is going to have a different
> specific reason for taking a long time to load.

Err, prove it. The pages that I've seen that are slower all have the same root cause: too many calls to particular types of templates. Citation templates are the biggest issue, but the coord(inates) family and the convert family have also caused problems in the past.

> I think more sandbox testing like what MZMcBride did (see comment #2) would be
> very valuable to isolate specific templates that are ripe for optimization.

It's valuable when there's a dearth. But at the moment, finding large pages that take an excessive amount of time to load/render/parse is easy. And the solution(s) are already known (as Domas would say, this is very low-hanging fruit from an optimization standpoint). It's a matter of implementing the solutions (which I guess Tim and Victor are working on).

(And, going forward, users ideally won't even have a real concept of templates outside of "those things that make wiki-editing more standardized." We want to get users away from thinking about "{{cite web}}" or "{{coord}}" or anything like that. That's echoing what Brion and many others have said, especially as work on the new parser ramps up. Trying to get users to care about these templates and then trying to get them to make them faster is a step in the wrong direction.)

> I'm not sure if this particular bug is going to be valuable to keep open.  It's
> not specific enough to ever be closed.

This bug is fine. When there is a better system (or systems) in place that make the pages load faster, this bug can be closed. Just because a bug is difficult or is going to likely remain open for a long time doesn't make it any less valid. There's certainly something problematic and actionable here.

(In reply to comment #30)
> When I first posted this, it was taking 35 seconds or longer.  It got worse. 
> But I checked it today on IE and Firefox and it was fast - about 2 seconds. 
> Has something been fixed?

Sounds like you just hit cache. (Or I suppose it's possible someone drastically reduced the number of template calls in the page you're looking at.) Do you have a particular example page/URL? Have you tried with ?action=purge appended?
Comment 32 j.mccranie 2011-09-03 02:43:50 UTC
But when it takes 35+ seconds to load a page, performance does matter!  Many readers are not going to wait that long and will miss the content.

And when it takes 2 minutes to get a diff or an edit screen, performance does matter.  Some editors (including me) are just not going to wait that long just to get to the edit screen or to check others' edits.
Comment 33 Roan Kattouw 2011-09-03 10:01:24 UTC
(In reply to comment #32)
> But when it takes 35+ seconds to load a page, performance does matter!
No one said it didn't matter. We all agree this is a problem, it's just that we're saying this is something for *us* (developers) to fix, not for template editors necessarily.
Comment 34 Ryan Kaldari 2011-09-04 20:59:59 UTC
Well, it seems Rob's comments have muddied the waters a bit. Correct me if I'm wrong, but Rob seems to be saying that no matter how much effort the developers put into back-end performance and optimization, the current degree of template complexity means that well-cited articles are always going to be slow. If that is the case, then I think we need to tell that to the community and have them work on template optimization. If that isn't the case, then we need to be clear about that as well, and make sure that this bug stays a high priority for the developers.

As for this bug being too vague to fix, I will personally consider it fixed when I no longer get read timeouts from trying to view diffs, which I still do as of today.
Comment 35 Rob Lanphier 2011-09-07 05:04:38 UTC
Here's what I'm saying:  current performance is too slow.  We know it's too slow, and we have at least a couple initiatives that should make things significantly faster, along with other less dramatic improvements that we should also implement if we still have problems.

However, what I'm also saying is that there's no way to give people a general purpose programming environment, and then expect that it's going to perform well no matter what anyone throws at it.  It's just not possible.  It can perform well for most reasonable tasks, and we're not *aware* of any tasks that are unreasonable, but there's no guarantee that everything that every programmer does is going to be reasonable.  The programmer may be trying to accomplish something reasonable, but I've seen even very good programmers make very poor performance choices in their code.  On a wiki anyone can edit, there will almost always be someone(s) who is/are doing it wrong.

I believe that Brion's comment in 2006 was a reaction to the prevailing mood at the time.  If I recall his account of things correctly, there was a lot of pseudoscientific "thou shalt not use the foobar template, for you will anger the performance gods, and they will smite the server kittehs".  He saw that people were overreacting to advice about template performance, with no one actually doing any genuine profiling.

So, now the pendulum seems to have swung in the other direction.  Yes, we need references in articles.  Yes, there are plenty of other perfectly reasonable uses of templates.  Don't stop doing those things.   That said, if there are more efficient ways of achieving the same end using a more efficient template, please, for pete's sake, make the template more efficient.  Also, please help us figure out which templates are expensive and why they're expensive.  If we can actually narrow down which parts of templates suck, developers may have a better idea of what parts should be implemented directly in PHP or even C if need be.

My point is this: there's not a "problem".  There are "problems".  Having this all in a single bug suggests there is a single "problem", and that's what I have a problem with.
Comment 36 Tim Starling 2011-10-24 22:23:32 UTC
Just removing the COinS metadata from {{Citation/core}} would speed up article rendering significantly.
Comment 37 Ryan Kaldari 2012-11-04 08:49:34 UTC
It seems that it is currently very difficult to edit http://en.wiktionary.org/wiki/a due to this bug. It typically times out when trying to save. Here is the report for the page:

Preprocessor visited node count: 479524/1000000
Preprocessor generated node count: 132979/1500000
Post-expand include size: 1772116/2048000 bytes
Template argument size: 224175/2048000 bytes
Highest expansion depth: 31/40
Expensive parser function count: 219/500

Looking forward to the deployment of Scribunto :)
Comment 38 Derk-Jan Hartman 2012-11-08 09:18:33 UTC
*** Bug 41863 has been marked as a duplicate of this bug. ***
Comment 39 uwe 2012-11-10 00:43:01 UTC
Also, it is impossible to edit the article https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D9%84%D8%A7%D9%85 (the Arabic article about Islam)

Request: POST http://ar.wikipedia.org/w/index.php?title=%D8%A5%D8%B3%D9%84%D8%A7%D9%85&action=submit, from 41.43.16.246 via cp1006.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.141 (10.64.0.141)
Error: ERR_READ_TIMEOUT, errno [No Error] at Sat, 10 Nov 2012 00:09:00 GMT
Comment 40 Andre Klapper 2012-11-10 13:27:28 UTC
*** Bug 41941 has been marked as a duplicate of this bug. ***
Comment 41 Ryan Kaldari 2012-11-12 02:51:59 UTC
Confirmed that it is no longer possible to edit the Gaddafi article without parser timeout (http://en.wikipedia.org/wiki/Muammar_Gaddafi). That makes 3 reports of significantly important articles suffering read timeout in the past week (on 3 different wikis). Since this is a more significant bug than any of the others currently assigned to Highest priority, I'm going to bump it to Highest as well.

Would it be possible for us to adjust the parser timeout time until Scribunto is deployed?
Comment 42 Tim Starling 2012-11-12 03:51:43 UTC
(In reply to comment #41)
> Confirmed that it is no longer possible to edit the Gaddafi article without
> parser timeout (http://en.wikipedia.org/wiki/Muammar_Gaddafi). That makes 3
> reports of significantly important articles suffering read timeout in the past
> week (on 3 different wikis). 

According to slow-parse.log on fluorine, parse times for [[Muammar Gaddafi]] have been stable at 30-35 seconds since the log began in May. The [[a]] article on en.wiktionary.org has been taking more than 30 seconds since June 4. This is not a new or rapidly-changing problem.

> Since this is a more significant bug than any of
> the others currently assigned to Highest priority, I'm going to bump it to
> Highest as well.
> 
> Would it be possible for us to adjust the parser timeout time until Scribunto
> is deployed?

I don't think that would be a good idea; it would worsen our exposure to DoS attacks and encourage template editors to make articles render even more slowly.
Comment 43 Andre Klapper 2012-11-12 14:40:07 UTC
Ryan: As you bumped this back to highest priority, is anybody working on this? I'd like to have an assignee for this...
Comment 44 Tim Starling 2012-11-12 22:31:40 UTC
(In reply to comment #43)
> Ryan: As you bumped this back to highest priority, is anybody working on this?
> I'd like to have an assignee for this...

Three members of the platform team are working on Lua support, and I removed the COinS metadata from {{Citation/core}} on the English Wikipedia, reducing the parse time for articles with many citations by about 25%. [[Muammar Gaddafi]] now takes only 23 seconds.
Comment 45 Ryan Kaldari 2012-11-13 00:06:26 UTC
Thanks. I was considering doing that myself, but your edit+opinion carries a lot more weight :)

Moving priority back to High for now.
Comment 46 Andre Klapper 2012-11-13 00:14:41 UTC
(In reply to comment #44)
> I removed the COinS metadata from {{Citation/core}} on the English Wikipedia

Thanks for the workaround!
Comment 47 Tim Starling 2012-11-13 00:38:26 UTC
I've opened a discussion about the Wiktionary problem here: 

<https://en.wiktionary.org/wiki/Wiktionary:Grease_pit/2012/November#Expanding_the_list_templates>
Comment 49 Project LibX 2012-11-15 19:59:49 UTC
LibX (libx.org) is a COinS processor, used by over 200,000 users affiliated with over 1,000 libraries worldwide.  We link users to their OpenURL resolvers to obtain referenced items - journal and newspaper articles and books.

We are in the middle of a project to greatly improve COinS processing, with Wikipedia as the primary beneficiary.  Whereas the current implementation simply links users, our planned implementation would contact the user's library through such APIs as the Summon API and directly find links to where the user can get the item.  This is of tremendous benefit, particularly to users of academic libraries with subscriptions to journal databases or newspaper archives.

Please restore this functionality, either by restoring COinS, ajaxing COinS, or using alternative microformats. Please provide it such that not only metadata extraction is facilitated (as Zotero needs), but also such that a user interface can alert users that an agent has processed the metadata - LibX, for instance, places a 'cue' where a COinS appears; we would like to add a tooltip.  See an example of our envisioned design here: http://libx.org/how-to-set-up-libx-with-the-summon-api/  (This shows what we currently do for ISBNs on a page - we are working on doing just that for COinS, though we would probably stop this project if Wikipedia drops COinS, since you are the major provider at this point.)

Thank you for your consideration.
Comment 50 Bawolff (Brian Wolff) 2012-11-16 02:48:22 UTC
(In reply to comment #49)
> LibX (libx.org) is a COinS processor, used by over 200,000 users affiliated
> with over 1,000 libraries worldwide. [...]
> Thank you for your consideration.

There's probably a good chance the Wikipedians will add back the COinS metadata once Scribunto is deployed, assuming the performance predictions hold true. At this point I'd recommend just waiting it out.
Comment 51 Project LibX 2012-11-16 14:15:16 UTC
So I read through this thread, and I'm amazed, to put it politely.

There is a performance problem that affects only people logged into Wikipedia, which has got to be a small percentage of Wikipedia users, probably just contributors and editors.  In response, you disable a crucial feature that allows average users to actually find the article Wikipedia cites.  Not only do you disable it for editors, you disable it for everyone!

You know that people make fun of Wikipedia for its lack of reliable sources, and the circularity that sometimes results:
http://itst.net/wp-content/uploads/2009/06/informationsgesellschaft-wikipedia-presse-1024x768.jpg

I conclude a number of things. First, editors don't seem to be in the business of checking cited sources. Otherwise, clicking on a COinS, getting the primary source would be a *frequent* operation for them, and they'd be clamoring for tools like LibX that streamline this process.

Second, why was this disabled both for editors (where, I'm guessing, the page is rendered every time a visit occurs), and ordinary users (who, I'm guessing, fetch a cached, prerendered page?)  Why can't the COinS be in the cached page the majority of users sees?

Third, there doesn't seem to be any metadata in the page right now. See point #1 - how are editors checking primary sources efficiently? Why did you disable this feature *before* you had a replacement?
Comment 52 p858snake 2012-11-16 19:15:51 UTC
(In reply to comment #51)
> So I read through this thread, and I'm amazed, to put it politely.
> 
> There is a performance problem that affects only people logged into Wikipedia,
> which has got to be a small percentage of Wikipedia users, probably just
> contributors and editors.

[Citation Needed] The issue affects the ability to edit and save pages, which in turn affects non-logged-in users, because the pages don't get updated.


> In response, you disable a crucial feature that
> allows average users to actually find the article Wikipedia cites.  Not only
> do you disable it for editors, you disable it for everyone!

This isn't a crucial feature. The primary data (the references) are still in the page.


> I conclude a number of things. First, editors don't seem to be in the business
> of checking cited sources.

[Citation Needed]


> Second, why was this disabled both for editors (where, I'm guessing, the page
> is rendered every time a visit occurs), and ordinary users (who, I'm guessing,
> fetch a cached, prerendered page?)  Why can't the COinS be in the cached page
> the majority of users sees?

Because we currently don't have a system where we can do that.

> Third, there doesn't seem to be any metadata in the page right now. See point
> #1 - how are editors checking primary sources efficiently? Why did you disable
> this feature *before* you had a replacement?

Because most people would view actually being able to edit the page as more important than metadata that makes source checking easier.


Also, it would be nice if you changed your Bugzilla account from a role account for a business/website to an individual one, so we know who we're actually talking to.
Comment 53 Project LibX 2012-11-16 20:57:16 UTC
libx.org@gmail.com is backed by the LibX Team; I'm in charge of the technical aspects.  LibX is not a business - it's open source. Though we have received federal grants to employ some students, it's primarily community driven. Our key community is the thousands of librarians who have set it up for their own local communities.

Currently, I'm happy that this happened this week, and not 3 months from now, because I was just able to recruit one student to (finally) improve support for COinS - Wikipedia was our primary target. We were going to analyze the quality of the COinS (which, by the way, wasn't good - I think that's because you had wiki tags in the metadata, like brackets), then decide on which services we needed to use to make sure the user can get to the item cited.  Note that libraries have been slow to provide services that expose their knowledge base of what they hold and how their users can get access to it, which is why it has taken so many years for such a project to become feasible at all. Today, it is. Discovery systems like Summon provide full-text indices that not only include the combined content of many traditional abstracting and indexing databases, but also newspaper archives, traditional library catalogs, and even local institutional sources like electronic theses and dissertation databases.

In any event, consider doing something - if the performance of your template structure is the issue, use other techniques.  Provide an AJAX service, or embed the data in client-side JavaScript (like nytimes.com does), then put it together on the client.  From our perspective, the goal is to show the user, upon a mouse gesture, whether they have access to an item that's cited in a Wikipedia article.  If so, a single click of the mouse should get them there.  This goal is difficult to achieve if only the unstructured, formatted data is present. But it's a worthwhile goal and, I'm convinced, would truly help editors if/when they check sources.

 - Godmar Back (libx.org@gmail.com)
Comment 54 Chad H. 2012-11-16 21:10:16 UTC
(In reply to comment #53)
> In any event, consider doing something - if the performance of your template
> structure is the issue, use other techniques.
>

We are doing something different and it's under active development (and much further along than starting fresh with some AJAXy hacks). It's called Lua/Scribunto, and it was mentioned in comment 50.
Comment 55 Project LibX 2012-11-16 21:26:52 UTC
I'm familiar with Lua (the programming language), and googling Scribunto leads to http://www.mediawiki.org/wiki/Extension:Scribunto which, upon a 10-second inspection, doesn't explain how you'll be providing metadata.

My use of the acronym 'AJAX' was referring to the asynchronous nature any service would need to have to avoid holding up the rendering of the page, which seemed to be your main concern. In other words, the page would be rendered and sent to the user without metadata, just containing a quickly-generated key for each item. Only when the user accesses it, such as by hovering over an item, would a separate service be accessed that provides the metadata in usable form. You can see this technique in action in many webpages, and it's not a hack at all.
Comment 56 Chad H. 2012-11-16 22:18:46 UTC
(In reply to comment #55)
> I'm familiar with Lua (the programming language), and googling Scribunto leads
> to http://www.mediawiki.org/wiki/Extension:Scribunto which, upon 10 second
> inspection, doesn't explain how you'll be providing metadata.
> 

It's not just about metadata...the point is that we'll be able to (re)introduce complex things to templates without causing them to take ages to render (which was the whole reason for removing it).
Comment 57 Ryan Kaldari 2012-11-16 22:29:37 UTC
The current plan is to deploy Scribunto to the production wikis in early 2013 (although I don't know personally if we are still on target for that). One of the first things that Scribunto will be used for is re-implementing the Citation/core template on English Wikipedia. Scribunto will allow our citation templates to be generated with a real programming language (Lua), rather than through a convoluted Turing machine of Wikitext. It is also expected that this conversion will dramatically improve page parsing time so that we are no longer teetering on the edge of the parser timeout abyss.
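To make the difference concrete, here is a minimal sketch of what a Scribunto citation module looks like (the module name, parameters, and formatting are hypothetical - this is an illustration, not the actual Citation/core replacement):

-- Module:CiteBookSketch (hypothetical name, illustration only)
local p = {}

function p.book(frame)
    -- Arguments passed to the wrapping template, e.g. {{cite book|last=...}}
    local args = frame:getParent().args
    local parts = {}
    if args.last then
        parts[#parts + 1] = args.last .. (args.first and (', ' .. args.first) or '')
    end
    if args.year then
        parts[#parts + 1] = '(' .. args.year .. ')'
    end
    if args.title then
        parts[#parts + 1] = "''" .. args.title .. "''"  -- wikitext italics
    end
    if args.publisher then
        parts[#parts + 1] = args.publisher
    end
    return table.concat(parts, '. ') .. '.'
end

return p

The wrapper template is then reduced to a single {{#invoke:CiteBookSketch|book}} call: one Lua function call replaces the deeply nested {{#if:}} trees that produce the preprocessor node counts quoted earlier in this bug.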
Comment 58 Project LibX 2012-11-17 01:53:19 UTC
So - "reimplement" here means that COinS will just show up again, or will you provide metadata in a different format.

If you provide again COinS, it would be nice if you improved your implementation and made it compliant with NISO Z39.88's context object format. That would help tremendously in making items findable more easily.
Comment 59 Bawolff (Brian Wolff) 2012-11-17 19:42:33 UTC
(In reply to comment #58)
> So - "reimplement" here means that COinS will just show up again, or will you
> provide metadata in a different format.
> 
> If you provide again COinS, it would be nice if you improved your
> implementation and made it compliant with NISO Z39.88's context object format.
> That would help tremendously in making items findable more easily.

Well the "you" in that sentence is a bit ambiguous. "We" (MW devs) didn't have anything to do with the COinS metadata. Presumably when it gets re-added it will be done by the Wikipedians, so you would have to talk to them about different formats to use.
Comment 60 Mark A. Hershberger 2012-11-17 19:54:06 UTC
(In reply to comment #59)
> "We" (MW devs) didn't have
> anything to do with the COinS metadata. Presumably when it gets re-added it
> will be done by the Wikipedians, so you would have to talk to them about
> different formats to use.

Note that you should talk to the wiki editors on-wiki.  You probably need to post something about COinS on [[WP:VPT]].  They will at least be able to direct you to the right place.
Comment 61 Bawolff (Brian Wolff) 2012-11-17 19:59:17 UTC
There's conversation about the removal at [[template_talk:Citation/core]].
Comment 62 Project LibX 2012-11-17 21:54:53 UTC
Thanks. At the URL you link to, there's talk about an existing API for metadata extraction.  Is this true?  We would be fine with an API, as long as it's REST so we can run it from the user's browser, and as long as it allows accessing the metadata for specific references on a page.
Comment 63 Aude 2013-02-14 11:56:18 UTC
*** Bug 44982 has been marked as a duplicate of this bug. ***
Comment 64 Sumana Harihareswara 2013-03-13 01:14:13 UTC
Project LibX, now that we've deployed Scribunto to English Wikipedia, it's a good time to engage with the English Wikipedia template editors regarding COinS metadata at https://en.wikipedia.org/wiki/Template_talk:Citation/core#LUA_deployed , if you haven't already.
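For illustration, emitting COinS from Lua is cheap, since the whole ContextObject string is assembled in a single pass rather than through nested template calls. A rough sketch (hypothetical module code, not what English Wikipedia deployed; the keys follow the Z39.88 book format):

-- Hypothetical sketch of a Scribunto module emitting a COinS span:
-- an empty <span class="Z3988"> whose title attribute carries a
-- URL-encoded OpenURL ContextObject (NISO Z39.88-2004).
local p = {}

local function coins(args)
    local kev = {
        'ctx_ver=Z39.88-2004',
        'rft_val_fmt=' .. mw.uri.encode('info:ofi/fmt:kev:mtx:book', 'QUERY'),
    }
    if args.title then
        kev[#kev + 1] = 'rft.btitle=' .. mw.uri.encode(args.title, 'QUERY')
    end
    if args.isbn then
        kev[#kev + 1] = 'rft.isbn=' .. mw.uri.encode(args.isbn, 'QUERY')
    end
    -- join with '&amp;' so the attribute value stays valid HTML
    return '<span class="Z3988" title="' .. table.concat(kev, '&amp;') .. '"></span>'
end

function p.cite(frame)
    local args = frame:getParent().args
    -- format the visible citation here (omitted), then append the metadata
    return coins(args)
end

return p

A citation module appends that invisible span to the formatted citation it returns, which is how processors such as LibX and Zotero find the metadata again.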
Comment 65 MZMcBride 2013-05-06 19:34:58 UTC
Given the deployment of Scribunto/Lua to all Wikimedia wikis, I'm inclined to mark this bug as resolved/fixed. Certain pages such as [[wikt:a]] are still taking over 30 seconds to parse; however, these individual cases should be split out into individual bugs so that appropriate modules can be written on specific wikis, in my opinion.
Comment 66 Mark A. Hershberger 2013-05-06 20:08:54 UTC
(In reply to comment #65)
> Given the deployment of Scribunto/Lua to all Wikimedia wikis, I'm inclined to
> mark this bug as resolved/fixed.

Agreed for all the reasons MZ gave.
