Last modified: 2014-11-13 14:18:11 UTC
I suggest trimming leading/trailing spaces around statements of string type. Due to a copy-and-paste error I added some trailing spaces to a VIAF statement and had to remove them with an extra edit, see URL.
The "malformed input" error is thrown by the RegexValidator in function buildStringType from WikibaseDataTypeBuilders. It often happens when copy-pasting text from other sources (see http://www.wikidata.org/wiki/Wikidata:Project_chat#Error when adding commonscat). It is made worse by bug 63301, which sometimes adds newlines when pressing return to save the claim, thus triggering the error. The trimming was previously done in ValueView but it has since been removed (see http://github.com/wmde/ValueView/commit/0a8350999bb1ee028db9487286173aee0b20640f). We could use the StringNormalizer class to trim the string and also remove incomplete UTF-8 sequences (see related bug 50486). However, I'm not familiar with the data value processing code, so I'm not sure where the correct place to do it is.
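To make the proposed clean-up concrete, here is a minimal sketch in Python (not the Wikibase PHP code) of trimming a submitted value and dropping incomplete UTF-8 sequences. The function name clean_string_input is hypothetical, and this is not how StringNormalizer is implemented; it only illustrates the intended effect on the input.

```python
def clean_string_input(raw: bytes) -> str:
    """Hypothetical helper: sanitize a raw submitted value.

    Assumption: the value arrives as bytes that are supposed to be UTF-8.
    """
    # errors="ignore" drops bytes that do not form a complete, valid
    # UTF-8 sequence (for example a multi-byte character cut in half).
    text = raw.decode("utf-8", errors="ignore")
    # str.strip() removes leading/trailing whitespace, including the
    # stray newline that pressing return can add.
    return text.strip()


# A pasted identifier with surrounding whitespace and a trailing newline.
print(clean_string_input(b"  12345\n"))
```

Note that Python's strip() removes all Unicode whitespace, which may be broader than what the real implementation would choose to trim.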
Sorry, correct link is https://www.wikidata.org/wiki/Wikidata:Project_chat#Error_when_adding_commonscat
Just to be clear: ValueView did not trim the value, it just checked for empty or whitespace-only strings.
At the time of writing, the current behaviour seems to be: When a value with a leading space character is submitted, the back-end parser will return a "malformed input" error. I wonder whether silent trimming should be implemented in the back-end parser rather than in the front-end. Can some authority please decide on that?
IIRC, when I discussed this with Daniel, his comment was that the parser shouldn't do anything to the string but only parse it. That makes sense to me. However, if we only do it in the frontend, then these values could still get in via the API, for example. That seems sub-optimal. Daniel: Can you chime in on the pros and cons of doing this in the backend please?
I said that? Hm, then I have to disagree with myself there... The parsevalue API module (resp. StringValueParser) should trim input (and apply utf8 normalization).
Hm... apparently, StringValueParser does not create StringValues from strings. We seem to use the NullParser for this, which returns an UnknownValue. Whatever. So, let's have an actual StringValueParser that applies normalization.
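As a rough illustration of what such a normalizing parser could look like, here is a sketch in Python rather than the actual PHP code. The class names StringValue and NormalizingStringParser are stand-ins invented for this example, and reading "utf8 normalization" as Unicode NFC is an assumption.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StringValue:
    """Stand-in for a string data value (hypothetical, not the Wikibase class)."""
    value: str


class NormalizingStringParser:
    """Hypothetical parser that normalizes its input before wrapping it."""

    def parse(self, raw: str) -> StringValue:
        if not isinstance(raw, str):
            raise TypeError("expected a string")
        # Trim surrounding whitespace first, then apply NFC so that
        # equivalent character sequences are stored in one form
        # (assumption: NFC is the normalization meant above).
        normalized = unicodedata.normalize("NFC", raw.strip())
        return StringValue(normalized)


parser = NormalizingStringParser()
print(parser.parse("  abc \n"))
```

Doing this in the parser means values submitted through the API get the same treatment as values entered in the UI, so a validator running afterwards only ever sees the normalized string.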