Last modified: 2014-09-23 23:59:39 UTC
Created attachment 5395 [details] enhancement of importTextFile.php

It works like maintenance/importTextFile.php and adds two options:

--morepages <filename>: the file contains several wiki pages, separated by <title>Title for the new page<title>
--fileslist <filename>: the file contains one file path per line

If you decide to include it, I could then create http://www.mediawiki.org/w/index.php?title=ImportTextFile.php&action=edit&redlink=1

Thanks, Alessandra Bilardi.
Created attachment 5396 [details] Patch against current trunk of the above
+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );
+ if( is_object( $title ) ) {

^ This sequence will cause a fatal error on the 'echo' line when the title is invalid, so the is_object() check will never be reached in that case. Alas, this is in the original too, but now's a chance to fix it. ;)

+$separator="<title>";
...
+ $pages = explode( $separator, $text );
+ for ($i=1,$cnt=count($pages);$i<$cnt;$i+=2) {
+ $title = $pages[$i];
+ $text = $pages[$i+1];
+ insertNewArticle( $title, $text, $comment, $flags );
+ }

There are a couple of things I'm not sure I like about this.

First, it means that the separator cannot appear in the page text. This could be a problem if your text is documentation: docs about HTML or about the wiki might legitimately want to talk about <title> tags, and they'll break here. Unlike the XML import, there's no general provision for escaping; you'd have to escape manually, and the text would then stay explicitly escaped in the imported page as well.

Second, it looks like the idea is to do something like:

<title>First title<title>
First text
First text continues
<title>Second title<title>
Second text
Second text continues

The use of XML-looking tags here is a bit ugly, in that one might expect <title>...</title> (with the slash in the close tag), but that wouldn't work. Additionally, I think you'll end up with an extra newline at the start of the page text, unless you do it like this:

<title>First title<title>First text
First text continues
<title>Second title<title>Second text
Second text continues

which looks odd.

My personal inclination is to recommend that if you're building batches of pages to import programmatically, it'll be almost as easy and more reliable to just generate the XML import/export format.
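To make the extra-newline point concrete, here is a minimal standalone sketch that runs the same explode() logic on sample input in the first layout shown above. The sample text and the $result array are made up for illustration, and the loop bound is tightened to $i + 1 < $cnt so the sketch never reads past the end of the array; it is not the patch's code.

```php
<?php
// Illustrative input only: two pages in the "<title>Title<title>" layout.
$separator = "<title>";
$text = "<title>First title<title>\nFirst text\nFirst text continues\n"
	. "<title>Second title<title>\nSecond text\nSecond text continues\n";

// Same idea as the quoted patch: odd indexes hold titles,
// the element after each title holds that page's text.
$pages = explode( $separator, $text );
$result = array();
for ( $i = 1, $cnt = count( $pages ); $i + 1 < $cnt; $i += 2 ) {
	$result[$pages[$i]] = $pages[$i + 1];
}

// The page text keeps the line break that followed the second <title>.
$startsWithNewline = ( $result['First title'][0] === "\n" );
echo $startsWithNewline ? "leading newline present\n" : "no leading newline\n";
```

With this input each page text begins with "\n", which is the stray newline described above.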
+ } else if (isset( $options['fileslist']) && !strstr( $text, $separator ) && !isset( $options['morepages'])) {
+ $pages = preg_split( "/\s+/", $text );
+ for ($i=0,$cnt=count($pages);$pages[$i] && $i<$cnt;$i++) {
+ $text = file_get_contents( $pages[$i]);
+ $title = titleFromFilename($pages[$i]);
+ insertNewArticle( $title, $text, $comment, $flags );

This seems to be meant to allow passing a file containing a list of filenames to import. The main problem here is that the file is split on all whitespace, so any pathname containing spaces will be incorrectly split. Generally, where we accept lists of target pages or files, we separate entries by newline, which won't interfere with spaces inside the target page or file name.
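The difference between the two splitting strategies can be seen in a small standalone sketch. The list contents and path names below are invented for illustration, and the '/\r?\n/' pattern with PREG_SPLIT_NO_EMPTY is just one possible way to split on line breaks, not necessarily what the script should use.

```php
<?php
// Illustrative list-file contents; these paths are made up for the sketch.
$list = "pages/Main Page.txt\npages/Help.txt\r\n\npages/My notes.txt\n";

// Splitting on any whitespace cuts paths that contain spaces into pieces.
$byWhitespace = preg_split( '/\s+/', $list, -1, PREG_SPLIT_NO_EMPTY );

// Splitting on line breaks only keeps each path intact;
// PREG_SPLIT_NO_EMPTY drops blank lines and the empty trailing entry.
$byLine = preg_split( '/\r?\n/', $list, -1, PREG_SPLIT_NO_EMPTY );

echo count( $byWhitespace ) . " entries when split on whitespace\n";
echo count( $byLine ) . " entries when split on line breaks\n";
```

Here the whitespace split yields five fragments for three files, while the line-based split yields the three intended paths.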
I don't understand whether you want me to remove the line

+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );

or whether you want this:

+ if( is_object( $title ) ) {
+ echo( "\nUsing title '" . $title->getPrefixedText() . "'..." );

About $separator: I changed it all, and now the user can choose the <separator> from the command line. I also removed the 'extra newline' with these lines:

+ $separator="/".$separator."\s*/";
+ $pages = preg_split( $separator, $text );

About the "\s+" for the files list: I replaced it with "\n".

The modified script is here: http://gbrowse.org/reports/importTextsFile_php

Thanks, Alessandra Bilardi.
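One thing to watch when a user-supplied separator is placed inside a regular expression: characters such as "/", "*" or "[" have a special meaning there. A minimal sketch of one way to guard against that with preg_quote() follows. The splitPages() helper, the sample separator and the sample text are all hypothetical and do not come from the linked script.

```php
<?php
// Hypothetical helper for illustration; not part of the script discussed here.
function splitPages( $text, $separator ) {
	// preg_quote() escapes regex metacharacters and, via its second
	// argument, the "/" delimiter, so the separator is matched literally.
	$pattern = '/' . preg_quote( $separator, '/' ) . '\s*/';
	return preg_split( $pattern, $text );
}

// Made-up separator containing "/" to show why escaping matters.
$text = "==/PAGE/==Title one==/PAGE/==\nBody one\n"
	. "==/PAGE/==Title two==/PAGE/==\nBody two\n";
$pages = splitPages( $text, '==/PAGE/==' );

// Odd indexes are titles, the following index is the page text.
echo $pages[1] . ": " . $pages[2];
```

Note that the trailing \s* also removes any whitespace that follows a separator, so intentional leading blank lines in a page text would be dropped as well.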
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
+need-review to signal to developers that this patch needs reviewing. Alessandra, it'll be easier for them to review it if you attach the patch to Bugzilla per https://www.mediawiki.org/wiki/Patch#Posting_a_patch . Thanks!
Comment on attachment 5396 [details] Patch against current trunk of the above

The patch won't apply, and the issues raised in review have also not been addressed.