Last modified: 2014-09-23 23:59:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T17872, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15872 - enhancement of importTextFile.php
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Maintenance scripts
Version: unspecified
Hardware: All
OS: All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.gbrowse.org/reports/import...
Depends on:
Blocks:
 
Reported: 2008-10-06 21:11 UTC by Alessandra Bilardi
Modified: 2014-09-23 23:59 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
enhancement of importTextFile.php (3.62 KB, text/plain)
2008-10-06 21:11 UTC, Alessandra Bilardi
Patch against current trunk of the above (4.83 KB, patch)
2008-10-06 21:42 UTC, Brion Vibber

Description Alessandra Bilardi 2008-10-06 21:11:00 UTC
Created attachment 5395
enhancement of importTextFile.php

It works like maintenance/importTextFile.php and adds two options:

--morepages
        <filename> contains multiple wiki pages, separated by <title>Title for the new page<title>
--fileslist
        <filename> contains one file path per line

If you decide to include it, I could create http://www.mediawiki.org/w/index.php?title=ImportTextFile.php&action=edit&redlink=1

Thanks,

Alessandra Bilardi.
Comment 1 Brion Vibber 2008-10-06 21:42:06 UTC
Created attachment 5396
Patch against current trunk of the above
Comment 2 Brion Vibber 2008-10-06 21:54:44 UTC
+	echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );
+	if( is_object( $title ) ) {

^ This sequence will cause a fatal error in the 'echo' line, so the is_object() check will never be reached for the invalid title case.

Alas, this is in the original too, but now's a chance to fix it. ;)
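A minimal sketch of the reordering implied above: test the title before dereferencing it. `StubTitle` is a hypothetical stand-in for MediaWiki's `Title` class so the snippet runs on its own, and `describeTitle()` is an illustrative helper, not part of the script. It assumes, as the `is_object()` check suggests, that an invalid title arrives as a non-object such as `null`.

```php
<?php
// Hypothetical stand-in for MediaWiki's Title class; only getPrefixedText() is mimicked.
class StubTitle {
    private $text;

    public function __construct($text) {
        $this->text = $text;
    }

    public function getPrefixedText() {
        return $this->text;
    }
}

// Hypothetical helper: checks the title first, so an invalid (non-object)
// title never reaches the method call that caused the fatal error.
function describeTitle($title) {
    if (!is_object($title)) {
        return "Invalid title, skipping.";
    }
    return "Using title '" . $title->getPrefixedText() . "'...";
}

echo describeTitle(new StubTitle('Main Page')), "\n"; // Using title 'Main Page'...
echo describeTitle(null), "\n";                       // Invalid title, skipping.
```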


+$separator="<title>";
...
+				$pages = explode( $separator, $text );
+				for ($i=1,$cnt=count($pages);$i<$cnt;$i+=2) {
+					$title = $pages[$i];
+					$text = $pages[$i+1];
+					insertNewArticle( $title, $text, $comment, $flags );
+				}

Couple of things I'm not sure I like about this.

First, it means that the separator cannot appear in the page text. This could be a problem if your text might be documentation -- docs about HTML or about the wiki might legitimately want to talk about <title> tags, and they'll break here. Unlike the XML import, there's no general provision for escaping; you'd have to manually escape, and then they'd be explicitly escaped in the actual imported text as well.
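To make the first concern concrete, here is a small reproduction using the same `explode()` call as the patch; the batch text is made up for illustration.

```php
<?php
// Separator used by the patch's --morepages mode.
$separator = "<title>";

// Two pages; the second body legitimately mentions the <title> tag.
$batch = "<title>Page A<title>Plain text"
       . "<title>HTML docs<title>Put the page name in a <title> element.";

$parts = explode($separator, $batch);

// Expected 5 pieces: a leading empty string plus two title/text pairs.
// explode() also splits inside the second body, giving 6 pieces, so the
// second body is cut short and the patch's loop would read " element."
// as a third title.
echo count($parts), "\n"; // 6
print_r($parts);
```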


Second, it looks like the idea is to do something like:

<title>First title<title>
First text
First text continues
<title>Second title<title>
Second text
Second text continues

The use of XML-looking tags here is a bit ugly, in that one might expect <title>...</title> (with the slash in the close tag), but that wouldn't work.

Additionally, I think you'll end up with an extra newline at the start of the page text, unless you do it like this:

<title>First title<title>First text
First text continues
<title>Second title<title>Second text
Second text continues

which looks odd.
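One way to keep the readable layout without the leading newline is to strip exactly one line break from the start of each body, and to stop before a title that has no text after it. This is a sketch, not the patch's code: `splitPages()` is a hypothetical helper, and it keeps the `<title>` separator, so it does nothing about the escaping problem.

```php
<?php
// Hypothetical helper: splits a --morepages batch into [title, text] pairs,
// dropping the single line break that follows each "<title>Title<title>" line.
function splitPages($batch, $separator) {
    $parts = explode($separator, $batch);
    $pages = array();
    // $parts[0] is whatever precedes the first separator; pairs start at 1.
    // The "$i + 1 < $cnt" guard skips a trailing title that has no text.
    for ($i = 1, $cnt = count($parts); $i + 1 < $cnt; $i += 2) {
        $title = trim($parts[$i]);
        $text = preg_replace('/^\r?\n/', '', $parts[$i + 1]);
        $pages[] = array($title, $text);
    }
    return $pages;
}

$batch = "<title>First title<title>\nFirst text\nFirst text continues\n"
       . "<title>Second title<title>\nSecond text\nSecond text continues\n";

print_r(splitPages($batch, "<title>"));
```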

My personal inclination is to recommend that if you're building batches of pages to import programmatically, it'll be almost as easy and more reliable to just generate the XML import/export format.
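A rough sketch of that suggestion: emit the pages as XML, where escaping comes for free, so a body can mention `<title>` safely. The element names (`mediawiki`, `page`, `title`, `revision`, `text`) follow the general shape of MediaWiki's export format, but this is a simplified skeleton: real dumps also carry a namespace/version declaration and per-revision metadata, and whether an importer accepts this minimal form is an assumption to verify against the export schema. `buildImportXml()` is a hypothetical helper.

```php
<?php
// Escapes text for use inside an XML element (&, <, >, and both quote characters).
function xmlEscape($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: wraps [title, text] pairs in a simplified version of
// the MediaWiki export layout. Real dumps carry more metadata than this.
function buildImportXml(array $pages) {
    $xml = "<mediawiki>\n";
    foreach ($pages as $page) {
        list($title, $text) = $page;
        $xml .= "  <page>\n"
              . "    <title>" . xmlEscape($title) . "</title>\n"
              . "    <revision>\n"
              . "      <text>" . xmlEscape($text) . "</text>\n"
              . "    </revision>\n"
              . "  </page>\n";
    }
    return $xml . "</mediawiki>\n";
}

echo buildImportXml(array(
    array("HTML docs", "Put the page name in a <title> element."),
));
```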


+			} else if (isset( $options['fileslist']) && !strstr( $text, $separator ) && !isset( $options['morepages'])) {
+				$pages = preg_split( "/\s+/", $text );
+				for ($i=0,$cnt=count($pages);$pages[$i] && $i<$cnt;$i++) {
+					$text = file_get_contents( $pages[$i]);
+					$title = titleFromFilename($pages[$i]);
+					insertNewArticle( $title, $text, $comment, $flags );

This seems to be meant to allow passing a file containing a list of filenames to import. The main problem here is that the file is split on all whitespace; thus any pathnames containing spaces will be incorrectly split.

Generally where we accept lists of target pages or files, we do the separation by newline, which won't interfere with spaces inside the target page/file name.
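A sketch of the newline-based splitting described above. `readFileList()` is a hypothetical helper; it skips blank lines, tolerates `\r\n` endings, and otherwise keeps each line exactly as written.

```php
<?php
// Hypothetical helper: one path per line, so spaces inside a path survive.
function readFileList($listText) {
    $paths = array();
    foreach (preg_split('/\r?\n/', $listText) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        $paths[] = $line;
    }
    return $paths;
}

$list = "docs/My Page.txt\n\nnotes/Other file.txt\r\n";

// Whitespace splitting, as in the patch, breaks the paths apart.
print_r(preg_split('/\s+/', $list));
// Newline splitting keeps each path whole.
print_r(readFileList($list));
```

Iterating with `foreach` also avoids indexing past the end of the array, which the patch's `$pages[$i] && $i<$cnt` condition can do when the list does not end in an empty entry.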
Comment 3 Alessandra Bilardi 2008-10-07 10:37:11 UTC
I don't understand whether you want me to remove the line
+       echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );
or whether you want this:
+       if( is_object( $title ) ) {
+              echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );

About $separator: I changed all of it, so the user can now choose the separator from the command line. I also removed the 'extra newline' with these lines:
+       $separator="/".$separator."\s*/";
+       $pages = preg_split( $separator, $text );

For the file list, I replaced "\s+" with "\n".
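Building the pattern directly from a command-line separator works only if the separator contains no regular-expression metacharacters; a separator such as `</page>` even contains the `/` delimiter, which makes `preg_split()` fail. A sketch of one safeguard, assuming the separator is meant literally, escapes it with PHP's `preg_quote()` before appending the `\s*` that absorbs the extra newline. `separatorPattern()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: builds the split pattern from a user-supplied separator.
// preg_quote() escapes regex metacharacters and the "/" delimiter, so the
// separator matches literally; \s* then swallows the whitespace after it.
function separatorPattern($separator) {
    return '/' . preg_quote($separator, '/') . '\s*/';
}

$text = "</page>Title A</page>\nBody A\n</page>Title B</page>\nBody B\n";
$pages = preg_split(separatorPattern('</page>'), $text);
print_r($pages);
```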

The modified script is here: http://gbrowse.org/reports/importTextsFile_php

Thanks,
Alessandra Bilardi.
Comment 4 p858snake 2011-04-30 00:09:51 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Comment 5 Sumana Harihareswara 2011-11-09 03:10:07 UTC
+need-review to signal to developers that this patch needs reviewing.  Alessandra, it'll be easier for them to review it if you attach the patch to Bugzilla per https://www.mediawiki.org/wiki/Patch#Posting_a_patch .  Thanks!
Comment 6 Sam Reed (reedy) 2011-11-19 19:54:16 UTC
Comment on attachment 5396
Patch against current trunk of the above

Patch won't apply, and the issues raised above have not been addressed either.
