Last modified: 2011-03-13 18:06:47 UTC

Bug 5708 - Google over-counts en.wikipedia.org pages
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody
URL: http://www.google.com/search?hl=en&q=...
Reported: 2006-04-24 22:04 UTC by S Page
Modified: 2011-03-13 18:06 UTC (History)

Description S Page 2006-04-24 22:04:53 UTC
If you search site:en.wikipedia.org for a term that occurs on lots of pages, you
get 153,000,000 hits.

This seems out of proportion to the 1,000,000 English Wikipedia pages.  It
suggests Google's spider is hitting the site too many times, which can affect
performance and distort Google's search results.

The site is doing the right thing with its fine use of robots.txt and
noindex,nofollow meta tags on "Edit this page" and "history".

In my experience Google over-counting can happen with endless addition or
modification of query parameters, or other code that tacks on extra URL cruft. 
To track it down you either have to scan the server access logs for unexpected
GET requests from Google's spider, or get a Google Search Appliance in-house to
provide more info.
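
A rough sketch of the log-scanning approach, in Python. It assumes the access
logs are in Apache-style Combined Log Format and that the crawler identifies
itself with "Googlebot" in its user agent. The function names and the sample
log lines are made up for illustration and are not taken from the Wikimedia
servers. The idea is to group the crawler's GET requests by path and count how
many distinct query strings were fetched for each one, so that paths with
unusually many variants stand out.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed layout (Combined Log Format):
# host ident user [time] "METHOD target PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_url_variants(lines, agent_marker="Googlebot"):
    """Map each requested path to the set of query strings the crawler used."""
    variants = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or agent_marker not in m["agent"]:
            continue  # skip unparsable lines, non-GET requests, other clients
        parts = urlsplit(m["target"])
        variants[parts.path].add(parts.query)
    return variants

def suspicious_paths(variants, threshold=3):
    """Return (path, variant_count) pairs with at least `threshold` variants,
    most variants first."""
    rows = [(p, len(q)) for p, q in variants.items() if len(q) >= threshold]
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# Hypothetical sample lines, for illustration only.
BOT = '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"'
SAMPLE = [
    '66.249.66.1 - - [24/Apr/2006:22:04:53 +0000] '
    '"GET /w/index.php?title=Foo&action=history HTTP/1.1" 200 512 ' + BOT,
    '66.249.66.1 - - [24/Apr/2006:22:04:54 +0000] '
    '"GET /w/index.php?title=Foo&action=edit HTTP/1.1" 200 512 ' + BOT,
    '66.249.66.1 - - [24/Apr/2006:22:04:55 +0000] '
    '"GET /wiki/Foo HTTP/1.1" 200 2048 ' + BOT,
    '10.0.0.7 - - [24/Apr/2006:22:04:56 +0000] '
    '"GET /w/index.php?title=Foo&action=raw HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [24/Apr/2006:22:04:57 +0000] '
    '"GET /wiki/Foo HTTP/1.1" 200 2048 ' + BOT,
]

if __name__ == "__main__":
    for path, count in suspicious_paths(crawler_url_variants(SAMPLE), threshold=2):
        print(f"{count:6d}  {path}")
```

On a real log you would pass an open file instead of SAMPLE and raise the
threshold. Matching on the user-agent string alone trusts whatever the client
claims to be, so this only narrows down where to look.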

Cheers, just letting you know.  I apologize for wasting your time if this is
expected behavior.

(I filed bug 5707 that terms in the footer of all pages like 'privacy' should
not be indexed.)
Comment 1 Rob Church 2006-04-24 22:34:46 UTC
There are a lot more than 1,000,000 pages on the English language Wikipedia.
Comment 2 JeLuF 2006-04-24 22:44:01 UTC
Google uses sitemaps to index our site, not spiders.

The numbers Google shows are rough guesses and by no means accurate.
