Last modified: 2011-04-14 15:13:02 UTC
Not every page is an article, and not every page_id is an article_id: categories, talk pages, and images are pages too. It is currently not possible to generate HTML files for articles only (the main namespace, namespace=0), although dumpHTML.php can already dump, for example, only categories or only images.

I propose adding an --articles command line option to do that; it should be trivial. To avoid similar feature requests in the future, I also propose a --namespace command line option so that each namespace can be mirrored separately. In that case --articles, --categories, etc. should be mere shortcuts for the namespace filtering provided directly by --namespace.

One remark: you can currently achieve this (dump only the articles) by passing the last article_id to the -e option. This workaround is not ideal because:
* a page_id has nothing to do with articles (it can be the ID of a talk page);
* to find the last article_id, you first have to do extra work.

Comments?

PS: Please create a dumpHTML extension component in Bugzilla so that this bug can be assigned to the correct component. Cf. https://bugzilla.wikimedia.org/show_bug.cgi?id=15265
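
To make the proposal concrete, here is a minimal sketch in Python (not the PHP of dumpHTML.php) of how the shortcut options could reduce to a single namespace filter. The function names, the --images flag name, and the (page_id, namespace, title) tuple layout are assumptions made for illustration only; the namespace numbers are MediaWiki's defaults (0 = main, 6 = Image/File, 14 = Category).

```python
import argparse

# Shortcut flag -> MediaWiki default namespace number.
# --articles and --categories come from the proposal; --images is a hypothetical name.
SHORTCUTS = {"articles": 0, "images": 6, "categories": 14}


def build_parser():
    parser = argparse.ArgumentParser(description="Namespace filter sketch")
    # --namespace may be given several times, e.g. --namespace 0 --namespace 14
    parser.add_argument("--namespace", type=int, action="append", default=None)
    for name in SHORTCUTS:
        parser.add_argument("--" + name, action="store_true")
    return parser


def selected_namespaces(argv):
    """Collapse --namespace and all shortcut flags into one set of namespace IDs."""
    args = build_parser().parse_args(argv)
    namespaces = set(args.namespace or [])
    for name, ns in SHORTCUTS.items():
        if getattr(args, name):
            namespaces.add(ns)
    return namespaces


def filter_pages(pages, namespaces):
    """Keep pages whose namespace is selected; an empty selection keeps everything.

    `pages` is an iterable of (page_id, namespace, title) tuples (assumed layout).
    """
    if not namespaces:
        return list(pages)
    return [page for page in pages if page[1] in namespaces]
```

With this layout, --articles needs no page_id bookkeeping at all: the filter looks only at each page's namespace, so talk pages interleaved in the page_id sequence are skipped automatically.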