Last modified: 2008-08-19 23:46:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T14860, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 12860 - Limiting sitemap items by namespace
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: patch
Depends on:
Blocks:

Reported: 2008-02-01 15:28 UTC by Sergey Chernyshev
Modified: 2008-08-19 23:46 UTC (History)


Attachments
Namespace limit patch for generateSitemap.php (824 bytes, patch)
2008-02-01 15:28 UTC, Sergey Chernyshev
Patch to implement blacklisting and whitelisting namespaces for sitemap generation (1.23 KB, patch)
2008-02-13 04:10 UTC, Sergey Chernyshev

Description Sergey Chernyshev 2008-02-01 15:28:19 UTC
Created attachment 4606
Namespace limit patch for generateSitemap.php

Sometimes, in addition to restricting crawling with robots.txt, it's a good idea to limit what goes into a sitemap.

For example, some extensions create non-content namespaces (similar to the MediaWiki namespace), and it doesn't make sense to include them in the sitemap.

Attached is a patch that lets the user specify a list of namespaces for which to generate sitemaps.
Comment 1 Brion Vibber 2008-02-13 02:12:17 UTC
Perhaps do this via command-line options instead of a site config var?
Comment 2 Sergey Chernyshev 2008-02-13 04:09:30 UTC
Not sure. A configuration usually has quite a lot of namespaces, and it's a pain in the neck to put them all on the command line. Besides, LocalSettings.php is usually changed a lot anyway, and adding more to it is quite usual (I have tons of things in there).

Also, it seems that having a blacklist instead of a whitelist might be a good idea, because there are fewer namespaces to exclude and this list is usually constant: if someone adds a new namespace, it is most probably content and should be indexed by crawlers.

I've added changes for exclusion and fixed a bug with an undefined variable.
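
To illustrate the whitelist versus blacklist trade-off described in this comment, here is a minimal sketch of how such a filter could behave. It is not the code from the attached patch: the function name selectSitemapNamespaces() and its parameters are hypothetical, and namespace 100 stands in for an imagined content namespace added after the lists were written.

```php
<?php
// Hypothetical sketch, NOT the code from the attached patch.
// selectSitemapNamespaces() and its parameters are illustrative names.

/**
 * Pick the namespaces to include in a sitemap.
 *
 * @param int[]      $all       Every namespace ID that has pages.
 * @param int[]|null $whitelist If an array, only these namespaces are kept.
 * @param int[]      $blacklist Namespaces that are always dropped.
 * @return int[] Namespace IDs to write into the sitemap.
 */
function selectSitemapNamespaces( array $all, $whitelist = null, array $blacklist = array() ) {
	$selected = is_array( $whitelist )
		? array_intersect( $all, $whitelist )
		: $all;
	// array_values() reindexes the result after the set operations.
	return array_values( array_diff( $selected, $blacklist ) );
}

// Standard MediaWiki IDs: 0 = main, 1 = talk, 8 = MediaWiki, 14 = category.
// 100 is an assumed content namespace added later.
$all = array( 0, 1, 8, 14, 100 );

// Whitelist: namespace 100 is left out until the list is updated.
print_r( selectSitemapNamespaces( $all, array( 0, 14 ) ) );

// Blacklist: namespace 100 is picked up automatically.
print_r( selectSitemapNamespaces( $all, null, array( 1, 8 ) ) );
```

With the whitelist, a newly added namespace stays out of the sitemap until someone edits the list; with the blacklist, it is included by default, which matches the argument above that new namespaces are most probably content.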
Comment 3 Sergey Chernyshev 2008-02-13 04:10:22 UTC
Created attachment 4646
Patch to implement blacklisting and whitelisting namespaces for sitemap generation
Comment 4 Sergey Chernyshev 2008-02-13 04:13:12 UTC
Oops. The patch also adds the full server URL to the sitemap - I believe I saw a bug for it, but can't remember where.
Feel free to remove that change if you don't feel like adding it.
Comment 5 Robert Leverington 2008-04-17 18:06:02 UTC
Fixed in r33498 using a slightly modified version of the first patch. Exclusion may be added later, but it seems a rather large jump from no discrimination whatsoever; I suggest you open another bug for that.
Comment 6 Dan Jacobson 2008-08-19 23:46:09 UTC
Perhaps add an example of usage. Say: in LocalSettings.php, put:
$wgSitemapNamespaces = array(
   NS_MAIN,
   NS_TALK,
   // ...
   NS_CATEGORY,
   NS_CATEGORY_TALK,
);

Actually, it seems a waste to put it in LocalSettings.php, as it might be used only 1/999999 of the times that file is read...
