Last modified: 2014-02-12 18:43:30 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44867, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 42867 - action=info always returns search engine index status "Not indexable" when curid is specified


Status:	UNCONFIRMED

Product:	MediaWiki
Classification:	Unclassified
Component:	General/Unknown
Version:	1.21.x
Hardware:	All All

Importance:	Low normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:	38531
Blocks:	38450

Reported:	2012-12-08 19:26 UTC by Richard Guk
Modified:	2014-02-12 18:43 UTC
CC List:	3 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Richard Guk 2012-12-08 19:26:20 UTC

If the curid parameter is specified with action=info, the search engine index status is always reported as "Not indexable", which is generally incorrect.

Presumably the returned information is intended to relate to the canonical page, regardless of how it is specified in the URL, so that action=info results should be identical irrespective of whether the page is identified by curid ("Page ID") or by title ("Display title").

Something seems to be going wrong with passing the canonical title to setIndexPolicy() or getRobotPolicy() in InfoAction::pageInfo().

Example with the [[Trumptonshire]] article on enwiki:

http://en.wikipedia.org/w/index.php?title=Trumptonshire&action=info
-> Search engine status: "Indexable" (as expected)

http://en.wikipedia.org/w/index.php?curid=11652196&action=info
-> Search engine status: "Not indexable" (unexpected)

http://en.wikipedia.org/w/index.php?title=Trumptonshire&curid=11652196&action=info
-> Search engine status: "Not indexable" (unexpected)

(If both curid and title are specified, title is ignored. This is presumably by design.)


* created by bug 38531 - "Add (search) index status to MediaWiki's info action"

* tracking bug 38450 - "Reimplement MediaWiki's info action (tracking)"

Comment 1 Andre Klapper 2013-01-15 14:43:12 UTC

I have a feeling that this is a misunderstanding because of the misleading wording, which would make it a duplicate of bug 43935?
Richard, could you take a look?

Comment 2 Richard Guk 2013-01-15 15:16:04 UTC

Thanks for the pointer, but this bug is a technical error caused by an internal inconsistency, different from the terminology issue in bug 43935.

Note also that http://en.wikipedia.org/robots.txt disallows *all* subpages of http://en.wikipedia.org/w/

So all 3 URLs above should certainly show the *same* search status.

Since the info parameter is intended to provide information about the canonical page identified by the request (not about the URL through which the information happens to be requested), the search status should always be that of the canonical page.

So, as previously stated, all 3 examples in comment #0 ought to display:
-> Search engine status: "Indexable"

Note that the info results are unaffected by whether the entry point is "/w/index.php" or "/wiki/PAGENAME", as demonstrated in these further examples:

Ex 4: http://en.wikipedia.org/wiki/Trumptonshire?action=info
-> Search engine status: "Indexable" (as expected)

Ex 5: http://en.wikipedia.org/wiki/?curid=11652196&action=info
-> Search engine status: "Not indexable" (unexpected)

Ex 6: http://en.wikipedia.org/wiki/Trumptonshire?curid=11652196&action=info
-> Search engine status: "Not indexable" (unexpected)

Comment 3 Richard Guk 2013-01-16 09:54:10 UTC

MZMcBride has pointed out elsewhere (bug 43935 comment 4) that the reported "Search engine status" corresponds to the <meta name="robots" content="noindex,follow" /> tag controlled by namespace settings or the __NOINDEX__ page behavior switch.

That explains the cause of this bug: replacing "action=info" with "action=view" in each request returns a wikipage that contains the noindex meta tag IF AND ONLY IF "Not indexable" is shown as the search engine status on the corresponding "action=info" page.

However, this behavior is still inappropriate, because the user expects to see information about the canonical page, regardless of which URL entry point or parameters are used to identify it.

For example, "Redirects to this page" would be zero if the information related to the URL, but it instead always reports the number of redirects to the canonical page, as expected.

Compare, for example:

Ex 7: http://en.wikipedia.org/w/index.php?title=Wikipedia:Assume_good_faith&action=info
-> Search engine status: "Indexable"
-> Number of page watchers: "276"
-> Redirects to this page: "30"

Ex 8: http://en.wikipedia.org/w/index.php?curid=502959&action=info
-> Search engine status: "Not indexable" (bug!)
-> Number of page watchers: "276"
-> Redirects to this page: "30"

The search engine status is the only info reported inconsistently.
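
The correspondence described in this comment (the noindex meta tag appears on the action=view page if and only if action=info says "Not indexable") could be checked with a small script. The following is only a sketch: the class and function names are hypothetical, and the sample markup is hand-written rather than fetched from a live wiki.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag containing 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hand-written sample markup standing in for the two action=view responses.
view_by_curid = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
view_by_title = "<html><head><title>Trumptonshire</title></head></html>"

print("curid view has noindex:", has_noindex(view_by_curid))
print("title view has noindex:", has_noindex(view_by_title))
```

Running `has_noindex` on the action=view output for each example URL, and comparing the result with the status shown on the matching action=info page, would confirm or refute the correspondence.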

Comment 4 db [inactive,noenotif] 2013-03-10 17:22:51 UTC

Article::getRobotPolicy contains the condition for this:

} elseif ( $this->getContext()->getRequest()->getInt( 'curid' ) ) {
	# For ?curid=x urls, disallow indexing
	return array(
		'index'  => 'noindex',
		'follow' => 'follow'
	);
}

So this works as designed.
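
To see how the quoted condition produces the results reported in comment 0, the branch can be modelled in a few lines of Python. This is a simplified sketch, not the MediaWiki code: `get_int` is a hypothetical stand-in for the request's getInt call, and the "index" default for requests without curid is an assumption that ignores the other branches of Article::getRobotPolicy (namespace settings, `__NOINDEX__`, and so on).

```python
from urllib.parse import parse_qs, urlsplit


def get_int(params, name, default=0):
    """Hypothetical stand-in for the request's getInt: missing or non-numeric gives default."""
    values = params.get(name)
    if not values:
        return default
    try:
        return int(values[0])
    except ValueError:
        return default


def robot_policy_for_url(url):
    """Model only the curid branch quoted above; everything else is assumed indexable."""
    params = parse_qs(urlsplit(url).query)
    if get_int(params, "curid"):
        # For ?curid=x URLs, disallow indexing (mirrors the quoted condition).
        return {"index": "noindex", "follow": "follow"}
    # Assumption: an ordinary article with no other restrictions.
    return {"index": "index", "follow": "follow"}


examples = [
    "http://en.wikipedia.org/w/index.php?title=Trumptonshire&action=info",
    "http://en.wikipedia.org/w/index.php?curid=11652196&action=info",
    "http://en.wikipedia.org/w/index.php?title=Trumptonshire&curid=11652196&action=info",
]

for url in examples:
    policy = robot_policy_for_url(url)
    status = "Not indexable" if policy["index"] == "noindex" else "Indexable"
    print(status, "<-", url)
```

Because the policy depends on the presence of curid in the request rather than on the page it resolves to, any URL carrying curid is reported as "Not indexable", which matches the behaviour described in the report.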
