Last modified: 2011-04-14 15:13:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T17674, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15674 - --articles (or/and) --namespace option for dumpHTML.php
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: DumpHTML
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://svn.wikimedia.org/viewvc/media...
Depends on:
Blocks:
Reported: 2008-09-21 13:31 UTC by Kelson [Emmanuel Engelhart]
Modified: 2011-04-14 15:13 UTC
CC List: 1 user

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Kelson [Emmanuel Engelhart] 2008-09-21 13:31:59 UTC
Not every page is an article, and not every page_id is an article ID. Categories, talk pages, and images are pages too.

It is currently not possible to generate HTML files only for articles (the main namespace, i.e. namespace 0), although dumpHTML.php can already dump, for example, only categories, only images, etc.

I propose adding an --articles command-line option to do that; it is trivial.

To avoid this type of feature request in the future, I also propose introducing a --namespace command-line option, so that each namespace can be mirrored separately.

In that case, --articles, --categories, etc. should be nothing more than shortcuts to the namespace filtering implementation (which the --namespace command-line argument exposes directly).
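
The shortcut idea can be sketched roughly as follows. This is only an illustration in Python, not a patch for dumpHTML.php (which is PHP): the function names, the `SHORTCUTS` table and the `--images` flag name are assumptions made for the example. The namespace numbers are MediaWiki's standard ones (0 = main, 6 = file/image, 14 = category).

```python
import argparse

# Shortcut flag -> MediaWiki namespace number.
# The numbers are MediaWiki's built-in namespace IDs; the flag names other
# than --articles and --categories are hypothetical.
SHORTCUTS = {
    "articles": 0,     # main namespace
    "images": 6,       # file/image namespace
    "categories": 14,  # category namespace
}


def build_parser():
    """Build a parser with a generic --namespace option plus shortcut flags."""
    parser = argparse.ArgumentParser(prog="dumphtml-sketch")
    parser.add_argument(
        "--namespace",
        type=int,
        action="append",
        default=None,
        help="namespace number to dump (may be given several times)",
    )
    for name in SHORTCUTS:
        parser.add_argument("--" + name, action="store_true")
    return parser


def selected_namespaces(argv):
    """Return the sorted namespace numbers requested; [] means no filter."""
    args = build_parser().parse_args(argv)
    namespaces = set(args.namespace or [])
    # Each shortcut simply adds its namespace to the same filter set.
    for name, number in SHORTCUTS.items():
        if getattr(args, name):
            namespaces.add(number)
    return sorted(namespaces)
```

With this shape, `selected_namespaces(["--articles"])` and `selected_namespaces(["--namespace", "0"])` yield the same filter, so only one code path has to know how to restrict the dump.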

One remark: currently you can get this result (only the articles) by passing the last article_id to the -e option. This approach is far from ideal because:
* the notion of page_id has nothing to do with articles (it can be a talk page ID)
* if you want the last article_id, you first have to do extra work to get it.
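
Both points can be illustrated with a toy copy of the `page` table. The column names `page_id` and `page_namespace` are those of the MediaWiki schema; the rows are made up, and SQLite is used here only to keep the example self-contained.

```python
import sqlite3

# In-memory stand-in for the MediaWiki `page` table, with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (1, 0, "Main_Page"),   # article
        (2, 1, "Main_Page"),   # talk page
        (3, 14, "Physics"),    # category
        (4, 0, "Gravity"),     # article
        (5, 6, "Apple.jpg"),   # image
    ],
)

# The extra work: find the last article ID before running the dump.
last_article_id = conn.execute(
    "SELECT MAX(page_id) FROM page WHERE page_namespace = 0"
).fetchone()[0]

# An ID range ending at that value still contains non-article pages.
in_range = conn.execute(
    "SELECT page_id, page_namespace FROM page WHERE page_id <= ? "
    "ORDER BY page_id",
    (last_article_id,),
).fetchall()
non_articles_in_range = [pid for pid, ns in in_range if ns != 0]

# Filtering on the namespace selects exactly the articles.
articles = [
    pid
    for (pid,) in conn.execute(
        "SELECT page_id FROM page WHERE page_namespace = 0 ORDER BY page_id"
    )
]
```

Here the ID range up to `last_article_id` still includes the talk page and the category, whereas the namespace filter returns only the two articles.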

Comments?

PS: Please create the DumpHTML extension component in Bugzilla so that this bug can be assigned to the correct component. Cf. https://bugzilla.wikimedia.org/show_bug.cgi?id=15265
