Last modified: 2014-11-13 14:18:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T47925, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 45925 - Trim spaces around statements of string type


Summary:	Trim spaces around statements of string type

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	WikidataRepo (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	High normal with 1 vote
Target Milestone:	---
Assigned To:	Wikidata bugs

URL:	https://www.wikidata.org/w/index.php?...
Whiteboard:	u=dev c=backend p=0
Keywords:

Depends on:
Blocks:

Reported:	2013-03-09 09:04 UTC by Raimond Spekking
Modified:	2014-11-13 14:18 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Raimond Spekking 2013-03-09 09:04:58 UTC

I suggest trimming leading and trailing spaces around statements of string type.

Due to a copy-and-paste error I added some trailing spaces to a VIAF statement and had to remove them with an extra edit; see the URL field.

Comment 1 Mushroom 2014-04-04 14:44:03 UTC

The "malformed input" error is thrown by the RegexValidator in the buildStringType function of WikibaseDataTypeBuilders. It often happens when copy-pasting text from other sources (see http://www.wikidata.org/wiki/Wikidata:Project_chat#Error when adding commonscat). It is made worse by bug 63301, which sometimes adds newlines when pressing return to save the claim, thus triggering the error.

The trimming was previously done in ValueView but it has since been removed (see http://github.com/wmde/ValueView/commit/0a8350999bb1ee028db9487286173aee0b20640f). We could use the StringNormalizer class to trim the string and also remove incomplete UTF-8 sequences (see related bug 50486). However, I'm not familiar with the data value processing code, so I'm not sure where the correct place to do it is.
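
To illustrate the kind of normalization suggested here, a minimal sketch in Python follows. It is not the StringNormalizer class itself (which is PHP and whose interface is not shown in this report); the function name normalize_string is hypothetical, and the sketch assumes the input arrives as raw UTF-8 bytes.

```python
def normalize_string(raw: bytes) -> str:
    """Hypothetical helper: drop incomplete UTF-8 sequences, then trim."""
    # errors="ignore" silently discards bytes that do not form a complete,
    # valid UTF-8 sequence (for example a multi-byte character cut off by
    # a copy-and-paste).
    text = raw.decode("utf-8", errors="ignore")
    # strip() removes leading and trailing whitespace, including the stray
    # newlines mentioned in connection with bug 63301.
    return text.strip()
```

With this sketch, a value pasted with surrounding spaces or a trailing newline would reach the validator already trimmed, instead of triggering the "malformed input" error.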

Comment 2 Mushroom 2014-04-04 14:48:38 UTC

Sorry, correct link is https://www.wikidata.org/wiki/Wikidata:Project_chat#Error_when_adding_commonscat

Comment 3 Adrian Lang 2014-04-07 08:48:12 UTC

Just to be clear: ValueView did not trim the value, it just checked for empty or whitespace-only strings.

Comment 4 Henning 2014-10-29 07:47:37 UTC

At the time of writing, the current behaviour seems to be: when a value with a leading space character is submitted, the back-end parser returns a "malformed input" error.
I wonder whether silent trimming should be implemented in the back-end parser rather than in the front-end.
Can some authority please decide on that?

Comment 5 Lydia Pintscher 2014-11-03 18:06:10 UTC

IIRC, when I discussed this with Daniel, his comment was that the parser shouldn't do anything to the string but only parse it. That makes sense to me.
If we only do it in the frontend, then untrimmed values could still get in via the API, for example. That seems sub-optimal.

Daniel: Can you chime in on the pros and cons of doing this in the backend please?

Comment 6 Daniel Kinzler 2014-11-13 14:04:00 UTC

I said that? Hm, then I have to disagree with myself there... The parsevalue API module (or rather StringValueParser) should trim input (and apply UTF-8 normalization).

Comment 7 Daniel Kinzler 2014-11-13 14:18:11 UTC

Hm... apparently, StringValueParser does not create StringValues from strings. We seem to use the NullParser for this, which returns an UnknownValue. Whatever.

So, let's have an actual StringValueParser that applies normalization.
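
A rough sketch of what "parse with normalization" could look like is given below, in Python for illustration only. It is not Wikibase code: StringValue, ParseError, parse_string_value and the validation pattern are all hypothetical stand-ins, and NFC is assumed as the Unicode normalization form since the thread does not name one.

```python
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical validation pattern: non-empty, single line, no leading or
# trailing whitespace. The real RegexValidator pattern is not quoted in
# this report.
_VALID = re.compile(r"\S(?:.*\S)?")


@dataclass(frozen=True)
class StringValue:
    """Stand-in for a string data value."""
    value: str


class ParseError(ValueError):
    """Raised when the input is malformed even after normalization."""


def parse_string_value(raw: str) -> StringValue:
    # Normalize first (assumed: Unicode NFC), then trim surrounding
    # whitespace, so the validator only sees cleaned-up input.
    text = unicodedata.normalize("NFC", raw).strip()
    if not _VALID.fullmatch(text):
        raise ParseError("malformed input")
    return StringValue(text)
```

Because the normalization happens in the back-end parsing step, values submitted through the API would be cleaned up the same way as values entered in the front-end.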
