Last modified: 2012-04-19 21:42:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T23228, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21228 - Search and Replace is replacing an extra character for some words - Sinhala wiki
Status: CLOSED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: UsabilityInitiative
Version: unspecified
Hardware: PC All
Importance: Normal major
Target Milestone: ---
Assigned To: Trevor Parscal
Depends on:
Blocks: 36111
Reported: 2009-10-22 06:43 UTC by Calcey QA
Modified: 2012-04-19 21:42 UTC
CC List: 3 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Screen print of the error (69.74 KB, application/pdf)
2009-10-22 06:43 UTC, Calcey QA

Description Calcey QA 2009-10-22 06:43:09 UTC
Created attachment 6699
Screen print of the error

Reporting against Babaco Release: r57957

Steps to Reproduce::
Link: http://prototype.wikimedia.org/si.wikipedia.org/%E0%B6%B8%E0%B7%94%E0%B6%BD%E0%B7%8A_%E0%B6%B4%E0%B7%92%E0%B6%A7%E0%B7%94%E0%B7%80

1) Select a random page.
2) Edit a section.
3) Select a word and enter a replacement word.
4) Replace.
Actual Outcome:: An extra character is added.

Expected Outcome::
No extra character should be added.

Test Environment::
Browser (User-Agent): Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/532.0 (KHTML, like Gecko)Chrome/3.0.195.27 Safari/532.0
Browser (User-Agent): Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)
Browser (User-Agent): Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Comment 1 Roan Kattouw 2009-10-22 13:18:30 UTC
My gut says this is probably due to a bad interaction between regexes and multibyte strings; if that's the case, we can't do much about it.

Basically what I think is happening is that the [^ ] part of the regex is selecting one byte, but the character at that position is really two (or more) bytes long. That one byte will be matched and replaced, but the second (and any subsequent) bytes will stick around and be interpreted as a different character. I'll try to confirm this suspicion later.
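The byte-splitting hypothesis above can be illustrated outside the extension. The sketch below is in Python, not the JavaScript the editor actually ran, and uses Python's byte-oriented `re` patterns only to show what happens when `[^ ]` matches a single byte of a multi-byte UTF-8 character. The sample word is chosen for illustration (comment 2 later concludes this hypothesis probably does not explain the bug).

```python
import re

# Illustrative input: the Sinhala word U+0DB8 U+0DD4 U+0DBD U+0DCA.
word = "\u0db8\u0dd4\u0dbd\u0dca"
data = word.encode("utf-8")

# On a str, the class [^ ] consumes one whole code point.
char_match = re.match(r"[^ ]", word).group()

# On bytes, the same class consumes a single byte.
byte_match = re.match(rb"[^ ]", data).group()

print(len(word), len(data))    # 4 12: each Sinhala code point is 3 bytes in UTF-8
print(char_match == "\u0db8")  # True
print(byte_match)              # b'\xe0': only the lead byte of U+0DB8

# Replacing that single byte leaves the two continuation bytes behind;
# they can no longer be decoded as the original character.
mangled = re.sub(rb"[^ ]", b"X", data, count=1)
print(mangled.decode("utf-8", errors="replace"))
```

The leftover continuation bytes are exactly the "second (and any subsequent) bytes" the comment describes: they survive the replacement and show up as garbage next to the new text.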
Comment 2 Roan Kattouw 2009-11-02 12:10:04 UTC
The suspicion in comment #1 doesn't seem to be right, so now I think this may have something to do with compound characters. Could you paste all of the text from the PDF (textarea contents before, search regex, replace string, textarea contents after) in a bug comment?
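The compound-character idea can also be sketched. Sinhala syllables are written as a base letter followed by combining signs (vowel signs, the al-lakuna virama), so one visible "letter" may span several code points. The Python sketch below is an illustration, not the extension's code: the helper `simple_clusters` is hypothetical and is a deliberately simplified stand-in for the Unicode UAX #29 grapheme-cluster rules. It shows how a code-point-level match can orphan a vowel sign, which a renderer may then display as a stray glyph that looks like an extra character.

```python
import re
import unicodedata
from typing import List


def simple_clusters(text: str) -> List[str]:
    """Hypothetical helper: attach each combining mark (Unicode category
    Mn, Mc or Me) to the preceding character.

    Simplified approximation of UAX #29 grapheme clusters; it ignores
    ZWJ conjuncts and the other segmentation rules.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Sinhala word: base U+0DB8, vowel sign U+0DD4, base U+0DBD, al-lakuna U+0DCA.
word = "\u0db8\u0dd4\u0dbd\u0dca"

print(len(word))                   # 4 code points
print(len(simple_clusters(word)))  # 2 visible syllables

# A code-point-level match of [^ ] takes only the base letter, so the
# replacement leaves the vowel sign U+0DD4 orphaned after "X".
result = re.sub(r"[^ ]", "X", word, count=1)
print(result == "X\u0dd4\u0dbd\u0dca")  # True
```

Matching on clusters rather than code points would replace the base letter and its vowel sign together.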
Comment 3 Trevor Parscal 2010-01-26 00:34:42 UTC
The underlying search and replace code is completely different now that we are using an iframe rather than a textarea.
Comment 4 Roan Kattouw 2010-01-26 13:44:09 UTC
(In reply to comment #3)
> The underlying search and replace code is completely different now that we are
> using an iframe rather than a textarea.

That doesn't necessarily mean that multibyte character handling is magically fixed. Reopening and asking Calcey to try to reproduce it again; please close as FIXED or WORKSFORME if this can't be reproduced anymore.
Comment 5 Trevor Parscal 2010-01-26 20:20:06 UTC
I've tested this with double-byte characters quite a bit now, and am sure it's fixed.
Comment 6 Platonides 2010-01-26 20:50:46 UTC
Note that Sinhala seems to be using three-byte characters.
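Whether a Sinhala character counts as "double-byte" (comment 5) or "three-byte" (comment 6) depends on the encoding being measured. The short Python check below, an illustration rather than anything from the extension, walks the Sinhala block (U+0D80 to U+0DFF) and compares UTF-8 byte length with the number of UTF-16 code units, which is how JavaScript strings store text.

```python
# Sinhala block: U+0D80..U+0DFF (128 code points, assigned or not).
sinhala = [chr(cp) for cp in range(0x0D80, 0x0E00)]

# Bytes per character in UTF-8 (what byte-oriented code sees).
utf8_lengths = {len(ch.encode("utf-8")) for ch in sinhala}

# UTF-16 code units per character (what a JavaScript string holds).
utf16_units = {len(ch.encode("utf-16-le")) // 2 for ch in sinhala}

print(utf8_lengths)  # {3}
print(utf16_units)   # {1}
```

So testing with two-byte UTF-8 characters does not exercise the same byte lengths as Sinhala, while in a UTF-16 JavaScript string every Sinhala code point is a single unit.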
Comment 7 Calcey QA 2010-01-27 09:03:46 UTC
Verified and closed


