Last modified: 2009-10-21 19:56:30 UTC
An easy test showing just top of the iceberg: * Create a page with a space in title in a namespace that supports subpages (e.g. User). * Create a subpage for it. * Click on former page's move tab. You'll see "This page has no subpages" at the bottom of the dialog instead of subpage(s). This is because SQLite requires an implicit ESCAPE, for example, if you would like to find all values that end with "100_%", you'll have to write something like: SELECT * FROM table WHERE field LIKE "%100\_\%" ESCAPE '\' Therefore, every query that uses LIKE must be appended with ESCAPE for databases that require it.
Created attachment 6469 [details] Sketch of a possible solution Here's my idea of fixing this mess (it should also theoretically decrease the chances of SQL injection by providing yet another function that automatically escapes everything needed). If a developer approves the idea, I will make and post a complete patch.
(In reply to comment #0) > An easy test showing just top of the iceberg: > * Create a page with a space in title in a namespace that supports subpages > (e.g. User). > * Create a subpage for it. > * Click on former page's move tab. You'll see "This page has no subpages" at > the bottom of the dialog instead of subpage(s). > > This is because SQLite requires an implicit ESCAPE, for example, if you would > like to find all values that end with "100_%", you'll have to write something > like: > > SELECT * FROM table WHERE field LIKE "%100\_\%" ESCAPE '\' > > Therefore, every query that uses LIKE must be appended with ESCAPE for > databases that require it. Where does the ESCAPE have to go? Right after the LIKE or at the end of the query? I'd expect this function to be usable even with additional conditions after it. Also remember that there might be parentheses and other fun stuff thrown in; make sure you pay attention to associativity. I'd call the method like() rather than prepareLike(), and do away with the extra escapeSyntax() thing -- just roll it into the like() method and have SQLite override that. Also, I'm not sure whether it's better to do 'foo' . $dbr->like( 'bar%' ), or $dbr->like( 'foo', 'bar%' ), or what. And if the latter, I'm not sure if 'foo' should be assumed to be a table name or taken literally. Do we have any similar constructions currently to compare to?
(In reply to comment #2) > Where does the ESCAPE have to go? Right after the LIKE or at the end of the > query? I'd expect this function to be usable even with additional conditions > after it. Also remember that there might be parentheses and other fun stuff > thrown in; make sure you pay attention to associativity. At the end of the LIKE expression: http://sqlite.org/syntaxdiagrams.html#expr > I'd call the method like() rather than prepareLike(), and do away with the > extra escapeSyntax() thing -- just roll it into the like() method and have > SQLite override that. Also, I'm not sure whether it's better to do 'foo' . > $dbr->like( 'bar%' ), or $dbr->like( 'foo', 'bar%' ), or what. And if the > latter, I'm not sure if 'foo' should be assumed to be a table name or taken > literally. Do we have any similar constructions currently to compare to? The $dbr->like( 'bar%' ) variant is clearer, but it requires you to take care of its parameter manually, i.e. to escape 'bar' but not '%', if you really mean "all pages starting with 'bar'". To reduce the possibility of creating an unescaped LIKE statement (not calling escapeLike may result in a query that does not do what it is intended to do, not escaping quotes properly is an outright security threat). My proposed function takes care of most uses of LIKE -- that only need to select things starting with a given string. For those cases where the LIKE statement needs to be much trickier, no predefined function can take care of 100% of cases, that's why I proposed to create escapeSyntax(). People should be able to use it directly when building a query manually.
(In reply to comment #3) > The $dbr->like( 'bar%' ) variant is clearer, but it requires you to take care > of its parameter manually, i.e. to escape 'bar' but not '%', if you really mean > "all pages starting with 'bar'". To reduce the possibility of creating an > unescaped LIKE statement (not calling escapeLike may result in a query that > does not do what it is intended to do, not escaping quotes properly is an > outright security threat). My proposed function takes care of most uses of > LIKE -- that only need to select things starting with a given string. For those > cases where the LIKE statement needs to be much trickier, no predefined > function can take care of 100% of cases, that's why I proposed to create > escapeSyntax(). People should be able to use it directly when building a query > manually. I'm not convinced this is the best API. Currently we do fine with calling escapeLike() manually; I don't see any better way in general. You're right that my previous suggestion doesn't work, but how about: $dbr->like( $dbr->escapeLike( $foo ) . '%' . $dbr->escapeLike( $bar ) ) addQuotes() would be handled by like(). I think this is a better syntax than ' LIKE ' . $this->addQuotes( $this->escapeLike( $foo ) . '%' . $this->escapeLike( $bar ) ) . $this->escapeSyntax() or such. In particular, escapeSyntax() exposes a weird implementation detail of SQLite, which nobody is going to understand, so they're probably just going to forget to add it. It's easier to remember "all LIKEs should use the like() method" than "all LIKEs should have the output of some method tacked on the end, what was that called again?" Also, like() gives us more flexibility in case some other DBMS has odd LIKE handling in the future. It's possible that an extra method to handle the common prefix case would be good, like $dbr->likePrefix( $foo ) == $dbr->like( $dbr->escapeLike( $foo ) . '%' ). But then people might not realize that both methods exist, and so might resort to manually crafting non-prefix LIKEs. Unless there are really almost none of those, I think just having like() is probably the best syntax here.
Created attachment 6492 [details] All LIKEs in core wrapped New patch, takes into account Aryeh's review.
Created attachment 6493 [details] Tests (require the previous patch) Posting these tests separately, as I haven't figured out yet with which version of PHPUnit these tests are expected to run (if any:P).
Okay, it seems like escapeLike() does strencode() as well. So you've got like() doing no escaping at all. I think this is potentially quite surprising; I'd prefer to see it doing addQuotes() on the input. Generally speaking, APIs that escape more than expected are better than those that escape less than expected -- the former leads to double-escaping, the latter leads to SQL injection (or XSS, etc.). escapeLike() clearly can't be used in that case, of course, some new and confusingly similar method would have to be made that only does the replacement of _ and % and \ and nothing else. When reviewing your patch, I got seriously confused about escaping. I had to rewrite the whole thing twice as I figured out what your method did and what escapeLike() did and how LIKE works (I didn't realize \\\\ was necessary for a single slash in LIKE). As far as I understand it, the correct way to do things currently is 'LIKE ' . $db->escapeLike( $foo ) . '%', while with your patch the correct way would be $db->like( $db->escapeLike( $foo ) . '%' ), and I'd like to see $db->like( $db->someNewName( $foo ) . '%' ). If I'm wrong, then most of my review comments here are nonsense. :D - $likeprefix = str_replace( '_', '\\_', $prefix); ... - AND pl_title LIKE '$likeprefix:%'"; + AND pl_title " . $wgDatabase->like( $wgDatabase->escapeLike( $name ) . ':%' ); ... - $likeprefix = str_replace( '_', '\\_', $prefix); ... - AND {$page}_title LIKE '$likeprefix:%'"; + AND {$page}_title " . $this->db->like( $this->db->escapeLike( $name ) . ':%' ); Don't these add double-escaping of some kind? (Or was $likeprefix not already escaped, and this would previously have been SQL injection if it weren't from a maintenance script?) - array( 'el_index LIKE ' . $dbr->addQuotes( $like ) ), __METHOD__ ); + array( 'el_index ' . $dbr->like( $like ) ), __METHOD__ ); // LinkFilter::makeLike already escapes everything ... - array( 'el_index LIKE ' . $dbr->addQuotes( $like ) ), __METHOD__ ); + array( 'el_index ' . $dbr->like( $like ) ), __METHOD__ ); // LinkFilter::makeLike already escapes everything So these were double-escaping before? + $this->mQueryConds = array( 'LOWER(img_name) ' $dbr->like( '%' . strtolower( $nt->getDBkey() ) . '%' ); Syntax error? A dot is missing. - $m = $dbr->strencode( strtolower( $nt->getDBkey() ) ); - $m = str_replace( "%", "\\%", $m ); - $m = str_replace( "_", "\\_", $m ); - $this->mQueryConds = array( "LOWER(img_name) LIKE '%{$m}%'" ); + $this->mQueryConds = array( 'LOWER(img_name) ' $dbr->like( '%' . strtolower( $nt->getDBkey() ) . '%' ); ... - $encSearch = $dbr->addQuotes( $stripped ); + $like = $dbr->like( $stripped ); $encSQL = ''; if ( isset ($this->mNs) && !$wgMiserMode )@@ -144,7 +144,7 @@ $externallinks $use_index WHERE page_id=el_from... - AND $clause LIKE $encSearch + AND $clause $like ... - $encRange = $dbr->addQuotes( "$range%" ); $encAddr = $dbr->addQuotes( $iaddr ); $conds[] = "(ipb_address = $encIp) OR - (ipb_range_start LIKE $encRange AND + (ipb_range_start " . $dbr->like( "$range%" ) . " AND ... - $this->addWhere('el_index LIKE ' . $db->addQuotes( $likeQuery )); + $this->addWhere('el_index ' . $db->like( $likeQuery )); } else if(!is_null($protocol)) - $this->addWhere('el_index LIKE ' . $db->addQuotes( "$protocol%" )); + $this->addWhere('el_index ' . $db->like( "$protocol%" )); These all remove addQuotes()/strencode(). Are you adding SQL injection here? Even if theoretically you don't think the variables should contain characters that need to be escaped, it's better to call addQuotes()/strencode() to be on the safe side, and the closer to the point of use, the better. + $db = $this->mDb; ... - $this->mConds[] = $this->mDb->bitAnd('log_deleted', LogPage::DELETED_ACTION) . ' = 0'; + $this->mConds[] = $db->bitAnd('log_deleted', LogPage::DELETED_ACTION) . ' = 0'; } else if( !$wgUser->isAllowed( 'suppressrevision' ) ) { - $this->mConds[] = $this->mDb->bitAnd('log_deleted', LogPage::SUPPRESSED_ACTION) . + $this->mConds[] = $db->bitAnd('log_deleted', LogPage::SUPPRESSED_ACTION) . Try to avoid making unrelated changes like this, it makes patches harder to review. /** + * LIKE statement wrapper + * @param s$ String: pattern for LIKE to match. Must already be escaped with escapeLike() where needed. + * @ return String: fully built LIKE statement + */ It's better if you make sure to use the normal spacing here, with an extra space before the * starting each line of the comment after the first, so the *'s on the left all line up. It would be good if the comment were more verbose, but that's no problem, I can expand it a bit in a later commit. + return "LIKE '" . $s . "'"; I would add an extra space before and after the return value (also for the SQLite modified version). It can't hurt, and it might avoid SQL syntax errors from stuff like "tablename" . $db->like( $foo ). I can't comment on the tests, since I've never looked at the PHPUnit that's checked in. We don't really use tests except for the parser.
(In reply to comment #7) Thanks, Aryeh. I'll make another patch in a couple of days. > Okay, it seems like escapeLike() does strencode() as well. So you've got > like() doing no escaping at all. I think this is potentially quite surprising; > I'd prefer to see it doing addQuotes() on the input. Generally speaking, APIs > that escape more than expected are better than those that escape less than > expected -- the former leads to double-escaping, the latter leads to SQL > injection (or XSS, etc.). escapeLike() clearly can't be used in that case, of > course, some new and confusingly similar method would have to be made that only > does the replacement of _ and % and \ and nothing else. That would create a situation where a misleaded developer uses the new function in an unsafe context and gets an SQL injection. Giving it a scary name (e.g. unsafeEscapeLike) may make it less likely, but the confusion "ZOMG I've seen UNSAFE function being used everyewhere in the code base!!!" can't be avoided. Will do it in the next patch if nothing better will be invented. There also could be made a second parameter to escapeLike() that defaults to the current behaviour, but could instruct it not to escape quotes if overridden. That would be pretty ugly though. Another idea would be to unescape quotes in the input before passing it to addQuotes(). Although this may look insane, it will ensure injection-safety and will cause input corruption only where it's already incorrect - not passed through escapeLike() and therefore with % and _ possibly unescaped. I'd like to see some comments from other developers on this. > When reviewing your patch, I got seriously confused about escaping. I had to > rewrite the whole thing twice as I figured out what your method did and what > escapeLike() did and how LIKE works (I didn't realize \\\\ was necessary for a > single slash in LIKE). \\\\ appears to be a mysqlism, in SQLite it looks two times more sane:D > As far as I understand it, the correct way to do things > currently is 'LIKE ' . $db->escapeLike( $foo ) . '%', while with your patch the > correct way would be $db->like( $db->escapeLike( $foo ) . '%' ), and I'd like > to see $db->like( $db->someNewName( $foo ) . '%' ). If I'm wrong, then most of > my review comments here are nonsense. :D Yes, except that your first snippet has no quotes around the pattern. > Don't these add double-escaping of some kind? (Or was $likeprefix not already > escaped, and this would previously have been SQL injection if it weren't from a > maintenance script?) On further review, it's a MySQL-specific code for upgrade between versions that had not supported anything else, so I'll just revert it back. > So these were double-escaping before? Yes, it seems. This script is pretty obscure though, and non-encoded quotes are non-standard in URLs. > Syntax error? A dot is missing. Thanks, it will be corrected. > These all remove addQuotes()/strencode(). Are you adding SQL injection here? > Even if theoretically you don't think the variables should contain characters > that need to be escaped, it's better to call addQuotes()/strencode() to be on > the safe side, and the closer to the point of use, the better. Yes, an injection (along with syntax errors) in the first case (SpecialListFiles.php) - fixed. In two other cases the input is already sanitized via LinkFilter::makeLike (one execution path in mungeQuery() that doesn't use it is sanitized by regexes that accept only IPs), I'll make it clear by adding comments. > Try to avoid making unrelated changes like this, it makes patches harder to > review. Avoided it as best as I could, but two "$this->mDb->" in one line is pretty ugly and hard to read, while introducing a new variable and using it only selectively is a WTF on its own right. > It's better if you make sure to use the normal spacing here, with an extra > space before the * starting each line of the comment after the first, so the > *'s on the left all line up. > I would add an extra space before and after the return value (also for the > SQLite modified version). It can't hurt, and it might avoid SQL syntax errors > from stuff like "tablename" . $db->like( $foo ). Will do. Notepad++ made the misaligned *'s unnoticeable.
(In reply to comment #8) > That would create a situation where a misleaded developer uses the new function > in an unsafe context and gets an SQL injection. Giving it a scary name (e.g. > unsafeEscapeLike) may make it less likely, but the confusion "ZOMG I've seen > UNSAFE function being used everyewhere in the code base!!!" can't be avoided. > Will do it in the next patch if nothing better will be invented. > > There also could be made a second parameter to escapeLike() that defaults to > the current behaviour, but could instruct it not to escape quotes if > overridden. That would be pretty ugly though. > > Another idea would be to unescape quotes in the input before passing it to > addQuotes(). Although this may look insane, it will ensure injection-safety and > will cause input corruption only where it's already incorrect - not passed > through escapeLike() and therefore with % and _ possibly unescaped. > > I'd like to see some comments from other developers on this. Yes, me too. I'm not sure what the best interface is here. > \\\\ appears to be a mysqlism, in SQLite it looks two times more sane:D Then SQLite needs to do more overriding of escapeLike(). We should check how other databases behave -- if MySQL is the odd one out, the base method should be changed. > Yes, except that your first snippet has no quotes around the pattern. Er, right. > Yes, an injection (along with syntax errors) in the first case > (SpecialListFiles.php) - fixed. In two other cases the input is already > sanitized via LinkFilter::makeLike (one execution path in mungeQuery() that > doesn't use it is sanitized by regexes that accept only IPs), I'll make it > clear by adding comments. It's best to move the escaping down as close as possible to where the query is actually built, so you can look in one place and be sure that everything's escaped. Otherwise maybe someone will add some code somewhere that will initialize the variable differently, to a non-escaped value, and it will be hard to spot. Alternatively, it's best to at least change the variable name from $foo to $encFoo so it will raise a red flag if the reader is familiar with that convention and sees that the variable is assigned an unescaped value. But this assumes the reader is familiar with the convention, and still scatters the escaping across more places. If it's a pain to change, though, don't bother, comments are okay for existing code. > Avoided it as best as I could, but two "$this->mDb->" in one line is pretty > ugly and hard to read, while introducing a new variable and using it only > selectively is a WTF on its own right. Okay.
Some random thoughts... My initial thought on reading this was that LIKE escaping should really be an internal function called by the wrapper functions when building the WHERE clause, but I couldn't really think of a syntax that wouldn't be clunky, unreadable, and backwards compatible. However, from that came another idea. Perhaps something like http://pastebin.com/m657d3b8a (just a rough idea)
(In reply to comment #10) > Some random thoughts... > > My initial thought on reading this was that LIKE escaping should really be an > internal function called by the wrapper functions when building the WHERE > clause, but I couldn't really think of a syntax that wouldn't be clunky, > unreadable, and backwards compatible. That would be impossible to use with query(), so we couldn't deploy it without either rewriting a bunch of old stuff or exposing an extra method anyway. > However, from that came another idea. Perhaps something like > http://pastebin.com/m657d3b8a (just a rough idea) I like that a lot, except that LIKE_PERCENT and LIKE_UNDERSCORE are a bit clunky. But the idea of distinguishing based on argument types somehow is nice. I'd allow func_get_args()-based input too, to save the array(), like $dbr->like( 'page_title', 'Foo_bar', LIKE_PERCENT ); I'm still not totally sure if the first argument should be the left-hand argument of LIKE, or if that should be prepended manually. I.e., 'page_title' . $dbr->like( 'Foo_bar', LIKE_PERCENT ); I think the latter is cleaner and easier to read, but that's arguable.
(In reply to comment #9) > (In reply to comment #8) > > \\\\ appears to be a mysqlism, in SQLite it looks two times more sane:D > > Then SQLite needs to do more overriding of escapeLike(). We should check how > other databases behave -- if MySQL is the odd one out, the base method should > be changed. Nope, PosgtreSQl also needs four. I was unable to find any mention for SQLite, so minus point for poor documentation seems to go to SQLite (two points if the behaviour is actually different).
(In reply to comment #12) > (In reply to comment #9) > > (In reply to comment #8) > > > \\\\ appears to be a mysqlism, in SQLite it looks two times more sane:D > > > > Then SQLite needs to do more overriding of escapeLike(). We should check how > > other databases behave -- if MySQL is the odd one out, the base method should > > be changed. > > Nope, PosgtreSQl also needs four. I was unable to find any mention for SQLite, > so minus point for poor documentation seems to go to SQLite (two points if the > behaviour is actually different). > Results of die(wfGetDb(DB_MASTER)->addQuotes(wfGetDb(DB_MASTER)->escapeLike('foo_bar'))): * SQLite: 'foo\_bar' * MySQL: 'foo\\_bar' Both seem to work as inteneded (and produce the same results when run from within MW). After further experiments in PhpMyAdmin I discovered that SELECT * FROM page WHERE page_title LIKE 'M_in\\_P%' works indistinguishable from SELECT * FROM page WHERE page_title LIKE 'M_in\_P%' Quirks mode against double-escaping?
A simple test: mysql> SELECT '\\' LIKE '\\' AS a, '\\' LIKE '\\\\' AS b; +---+---+ | a | b | +---+---+ | 1 | 1 | +---+---+ 1 row in set (0.03 sec) sqlite> SELECT '\\' LIKE '\\' AS a, '\\' LIKE '\\\\' AS b; 1|0 So the double-escaping is wrong for SQLite. I don't have any of the other supported DBMSes handy to test with.
Created attachment 6502 [details] Sample patch using idea by Mr.Z-man I like Alex's idea, here is an example of its implementation. I used different names for constants. It still has some problems though - even after replacement of switch statement with ifs using === I'm still afraid of possible problems when someone calls the function with an int parameter intended to be used as a string.
(In reply to comment #14) > A simple test: > > mysql> SELECT '\\' LIKE '\\' AS a, '\\' LIKE '\\\\' AS b; > +---+---+ > | a | b | > +---+---+ > | 1 | 1 | > +---+---+ > 1 row in set (0.03 sec) > > sqlite> SELECT '\\' LIKE '\\' AS a, '\\' LIKE '\\\\' AS b; > 1|0 > > So the double-escaping is wrong for SQLite. I don't have any of the other > supported DBMSes handy to test with. Nikerabbit points out that I forgot the ESCAPE thing, so of course SQLite doesn't treat backslash specially. If I add ESCAPE '\', it matches MySQL: sqlite> SELECT '\\' LIKE '\\', '\\' LIKE '\\\\', '\\' LIKE '\\' ESCAPE '\', '\\' LIKE '\\\\' ESCAPE '\'; 1|0|0|1 (In reply to comment #15) > I like Alex's idea, here is an example of its implementation. I used different > names for constants. I prefer LIKE_ to MATCH_, personally. Also, PERCENT and UNDERSCORE are more readily comprehensible if you know much of any SQL, IMO. > It still has some problems though - even after replacement > of switch statement with ifs using === I'm still afraid of possible problems > when someone calls the function with an int parameter intended to be used as a > string. Then define a class, like class LikePercent {} class LikeUnderscore {} Then use those as magic markers. You can create a function or static method or member variable or whatever to avoid having to write "new LikePercent": $dbr->like( 'foo', likePercent() ); $dbr->like( 'foo', Database::percent() ); $dbr->like( 'foo', $dbr->percent ); where each of those three would return or be equal to "new LikePercent". A somewhat more hackish way of avoiding the problem in practice is to define LIKE_PERCENT to be 440366359746.12671 and LIKE_UNDERSCORE to be 501450300271.66339, or something like that. This is kind of scary-looking, I have to say. I'd like feedback from Tim or Brion before committing any system like this. Maybe there's a better way we're missing.
I also said: Exception: At the end of the pattern string, backslash can be specified as “\\”. Compare with: mysql> SELECT '\\a' LIKE '\\a' AS a, '\\a' LIKE '\\\\a' AS b; +---+---+ | a | b | +---+---+ | 0 | 1 | +---+---+ While you work on it, it would be nice to support case sensitive behaviour, if possible.
(In reply to comment #17) > I also said: Exception: At the end of the pattern string, backslash can be > specified as “\\”. > > Compare with: > mysql> SELECT '\\a' LIKE '\\a' AS a, '\\a' LIKE '\\\\a' AS b; > +---+---+ > | a | b | > +---+---+ > | 0 | 1 | > +---+---+ Ah, interesting. So it looks like we need \\\\ for everyone.
Created attachment 6511 [details] Full patch v2 Still needs an agreement on constants' names and implementation technique.
Created attachment 6531 [details] Full patch v2.1 Less crappy LinkFilter::keepOneWildcard()
Fixed in r57989