Last modified: 2014-04-14 04:43:00 UTC
This was supposedly fixed (bug 16865; r45360). Although MediaWiki is indeed outputting "noindex", Google appears to be ignoring it and is therefore indexing duplicate content.

A few examples:

https://www.google.com/search?q=inurl:curid+site:mediawiki.org

1. Discussion - MediaWiki
   www.mediawiki.org/?curid=84252
   Mar 29, 2012 Hi! Searching for the shortest urls for wikis using scripts other then Latin was a longtime nightmare. urls using the "wgArticleId" from ...

2. Link to - MediaWiki
   www.mediawiki.org/?curid=84277
   Mar 30, 2012 mw.config.set({,,,, wgPageName":"Ernst_Lossa","wgTitle":"Ernst Lossa", "wgCurRevisionId":99548829,"wgArticleId":2809853, ...} ...) nr.

https://www.google.com/search?q=inurl:curid+site:wikipedia.org

3. [edit] Notes - Wikipedia
   en.wikipedia.org/wiki/index.html?curid=7490642&action=render
   My Brightest Diamond is the project of singer–songwriter and multi-instrumentalist Shara Worden. The band has released three studio albums, 2006's Bring Me ...

4. Wikipedia, the free encyclopedia
   en.wikipedia.org/wiki?curid=
   The 1950 Atlantic hurricane season was the first year in the Atlantic hurricane database (HURDAT) in which storms were given names by the United States Air ...

5. Wikipedia
   simple.wikipedia.org/?curid=
   This is the front page of the Simple English Wikipedia. Wikipedias are places where people work together to write encyclopedias in different languages. We use ...

6. Table tennis at the 2004 Summer Paralympics - Wikipedia, the free ...
   en.wikipedia.org/wiki/index.html?curid=1011065
   Table Tennis at the 2004 Summer Paralympics was staged at the Galatsi Olympic Hall from September 18 to September 27. Competitors were divided into ten ...

7. Upper Eastside - Wikipedia, the free encyclopedia
   en.m.wikipedia.org/wiki/index.html?curid=19698600
   A MiMo restaurant on Biscayne Boulevard in the Upper Eastside. The Upper Eastside is famous for its post war MiMo architecture, and is home to the MiMo ...

8. Robert Loggia - Wikipédia
   fr.wikipedia.org/wiki/?curid=899678
   Vous pouvez partager vos connaissances en l'améliorant (comment ?) selon les recommandations des projets correspondants. Robert Loggia est un acteur et ...

What I found is that:

- The ones from mediawiki.org (#1 and #2) are LiquidThreads pages. LQT apparently overrides this logic from Article.php and as such is not outputting "noindex". So those are a flaw on our end.
- #3 has action=render. That's never supposed to be indexed (separate bug?), but the way it is used circumvents some of our defenses. #3 accesses an article by the name of "index.html", but then overrides that title with curid and tacks on action=render. Basically doing:
  en.wikipedia.org/wiki/Some_page_name?curid=7490642&action=render
- #4 and #5 have an empty curid.
- #6 and #7 are more examples of this odd "index.html" title.
- #8 is like the ones on mediawiki.org, except that it is not from LQT and is actually outputting "noindex". This is the main problem.

Though it is somewhat outside the scope of this bug, I think we should:

* Always output rel=canonical when viewing a regular page (that is, not a Special page, not a non-view action, and no diff or oldid). That covers any URL, no matter how weirdly constructed, with:
  - /?title=
  - /w?title=
  - /w/index.php?title=
  - any of the above with curid instead of title
  - any of the above via /wiki/
  - any of the above with action=view
  Right now we're only doing rel=canonical on redirects, which makes no sense to me. It is perfectly fine to output rel=canonical on the canonical page itself.

* Always output noindex when we are viewing a page but not outputting rel=canonical. That is any wiki page with action=view that is not a simple view of the latest version of an article, e.g. with diff or oldid.
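
To make the two rules above concrete, here is a minimal sketch in Python. It is not MediaWiki code and does not reflect how Article.php is structured. The function name head_hint and its return values are hypothetical. It also assumes a Special page can be recognised from a "Special:" prefix on the title in the URL, which is not true for curid URLs, where the real title is only known after a database lookup.

```python
from urllib.parse import urlsplit, parse_qs


def head_hint(url):
    """Hypothetical helper: decide which hint the proposal would emit.

    Returns "canonical", "noindex", or "unspecified" (cases the
    proposal does not cover: Special pages and non-view actions).
    """
    parts = urlsplit(url)
    # keep_blank_values so that "?curid=" still counts as a parameter
    query = parse_qs(parts.query, keep_blank_values=True)

    def first(name, default=""):
        return query.get(name, [default])[0]

    # Title from ?title=, else from a /wiki/ path (assumption: the
    # short-URL prefix is /wiki/, as in the examples in this bug).
    title = first("title")
    if not title and parts.path.startswith("/wiki/"):
        title = parts.path[len("/wiki/"):]

    # Assumption: Special pages are detected by prefix only.
    if title.startswith("Special:"):
        return "unspecified"

    # Anything other than a plain view (edit, history, render, ...)
    # is outside the two rules.
    if first("action", "view") != "view":
        return "unspecified"

    # Viewing a page, but not the latest revision on its own.
    if "diff" in query or "oldid" in query:
        return "noindex"

    # Plain view, however the URL was constructed (title, curid,
    # /wiki/, /w/index.php, explicit action=view).
    return "canonical"


if __name__ == "__main__":
    for u in (
        "https://en.wikipedia.org/wiki/index.html?curid=1011065",
        "https://fr.wikipedia.org/wiki/?curid=899678",
        "https://en.wikipedia.org/w/index.php?title=Foo&oldid=123",
        "https://en.wikipedia.org/wiki/index.html?curid=7490642&action=render",
    ):
        print(head_hint(u), u)
```

Under these assumptions the curid URLs from #6 and #8 come out as "canonical", an oldid view comes out as "noindex", and the action=render URL from #3 is left to its own bug.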
Filed a separate bug for action=render: bug 63891.