Last modified: 2006-04-06 22:21:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T7401, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 5401 - "Corrupted"/truncated text on deletion log entry in Recent Changes
Status: RESOLVED DUPLICATE of bug 332
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: unspecified
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://ka.wikipedia.org/wiki/special:Log
Depends on:
Blocks:

Reported: 2006-03-30 18:18 UTC by Malafaya
Modified: 2006-04-06 22:21 UTC (History)
Description Malafaya 2006-03-30 18:18:52 UTC
While viewing Recent Changes, I recently noticed that the text for deleted articles (and for
user creation log entries) is sometimes truncated, and all the text following it on the page
becomes italicized.
An example of such corrupted text (taken from ka: on 30th March at 12:01 UTC):

(წაშლილთა სია); 13:01 . . Zangala (განხილვა | წვლილი | ბლოკირება) (წაშლილია "კატეგორია:დაუსრულებელი 
სტატიები ბიოლოგია": შინაარსი იყო: 'კატეგორია:ბიოლოგია[[კატე᩼/span> 

As you can see here, the text was cut at [[კატე... (a link that was in the deleted page's
contents), and then there is that square. I believe the square is what the browser makes of
the leading byte(s) of a truncated multi-byte UTF-8 character merged with the '<' of the
closing </span> tag.

As I mentioned before, this also happens on user creation logs in Recent Changes when the text is a 
bit longer (longer username).
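
A minimal sketch of the failure described above, in Python rather than MediaWiki's own code. The 13-byte limit and the helper names are assumptions chosen for illustration; the real field limit is not stated in this report. It shows that cutting a UTF-8 byte string at a fixed byte count can leave a stray lead byte that no longer decodes as valid text:

```python
# Hypothetical reproduction; the 13-byte cut is for illustration only.
word = "კატეგორია"              # 9 Georgian letters, 3 bytes each in UTF-8
raw = word.encode("utf-8")       # 27 bytes in total
cut = raw[:13]                   # 4 whole letters (12 bytes) + 1 stray lead byte


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_lenient(data: bytes) -> str:
    """Decode, substituting U+FFFD for any malformed byte sequence."""
    return data.decode("utf-8", errors="replace")


print(is_valid_utf8(raw[:12]))   # cut on a character boundary
print(is_valid_utf8(cut))        # cut inside a character
print(decode_lenient(cut))
```

A strict decoder rejects the cut string and a lenient one substitutes a replacement character. A browser that does neither may instead combine the stray byte with whatever follows it, which would match the swallowed '<' reported here.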
Comment 1 Malafaya 2006-03-30 18:25:22 UTC
(წაშლილთა სია); 13:01 . . Zangala (განხილვა | წვლილი | ბლოკირება) (წაშლილია "კატეგორია:დაუსრულებელი 
სტატიები ბიოლოგია": შინაარსი იყო: 'კატეგორია:ბიოლოგია[[კატე᩼/span>

can be translated as

(deletion log); 13:01 . . Zangala (talk | contribs | block) (deleted "Category:Unfinished Biology articles": 
content was: 'Category:Biology[[Cate᩼/span>
Comment 2 Melancholie 2006-04-03 11:29:30 UTC
*** Bug 2386 has been marked as a duplicate of this bug. ***
Comment 3 Melancholie 2006-04-03 11:38:34 UTC
Oops, bug 2386 is not a duplicate; but see bug 2386 comment #2!
Comment 4 lɛʁi לערי ריינהארט 2006-04-04 10:19:15 UTC
Hallo Malafaya!

I changed the url to [[ka:special:Log]] because I assume that
http://ka.wikimedia.org/ does not exist yet. I also changed "Component" to
"Internationalization" because this seems more appropriate than "Categories".

Please use {{ns:special}}, {{ns:project}}, {{ns:user}} etc. consistently in the
*translations* / *localizations* at [[ka:special:Allmessages]]. This would make
it easier to verify your setup.

If the problem is still present, I would suggest that you view the page source
in your browser, copy it and attach it as type "HTML source (text/html)".

Good luck and best regards reinhardt [[user:gangleri]]
Comment 5 lɛʁi לערי ריינהארט 2006-04-04 11:48:25 UTC
Hallo again!

Looking at
http://ka.wikipedia.org/w/index.php?title=special:Log&type=&user=Zangala
and searching for
კატეგორია:ბიოლოგია
I found some entries (with other timestamps than mentioned, probably because of
your different settings in [[ka:special:Preferences]] > 'Date and time')

I assume that the background of your request is the 'discrepancy' between what
you enter in the "Summary"-field, "Reason for deletion"-field, "Reason for
move"-field, "Reason for protection"-field etc. and what you get.

These fields have a limited size. I suppose that the GUI (browser, Java,
MediaWiki-SW) counts the characters but does *not* care about the *final* length
once the characters are UTF-8 encoded in the database. Truncation happens later
somewhere in the MediaWiki software.

You will / might see less depending on how long the comments etc. are and how
many UTF-8 characters (requiring two or three bytes) you are using.

I would say this is existing behaviour, and one should find out which bug report
this is a duplicate of.

There are several ways to fix this:
a) The limit on the field length should take into account the final size
required inside the database.
b) "Preview" / "Confirm" should notify about truncations; this could lead to
multiple posts instead of one, which can irritate contributors.
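
A rough sketch of option (a), written in Python for illustration and not taken from MediaWiki. The 255-byte limit and the function names are assumptions; the report does not state the real column size. The point is that a form can count 100 characters while the database needs 300 bytes:

```python
MAX_BYTES = 255  # assumed column size, for illustration only


def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes once UTF-8 encoded)."""
    return len(text), len(text.encode("utf-8"))


def fits_byte_limit(text: str, max_bytes: int = MAX_BYTES) -> bool:
    """Check the encoded size, which is what the database actually stores."""
    return len(text.encode("utf-8")) <= max_bytes


latin_summary = "a" * 100      # 1 byte per character
georgian_summary = "კ" * 100   # 3 bytes per character

print(char_and_byte_length(latin_summary), fits_byte_limit(latin_summary))
print(char_and_byte_length(georgian_summary), fits_byte_limit(georgian_summary))
```

A check like this could also drive the warning suggested in (b), since it tells the user before saving that the text will not fit.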

best regards reinhardt [[user:gangleri]]
Comment 6 Malafaya 2006-04-04 17:28:35 UTC
Hi reinhardt.

The problem here is not the truncation "per se". Truncating a character longer
than one byte in its middle (let's say a 3-byte UTF-8 character gets cut IN
THE DATABASE after its 1st byte) causes a strange character to be output.
Browsers like IE, which don't take into account that truncation of UTF-8
characters may occur, fail to render the page properly.
Nikerabbit has investigated this problem and Brion says it's a known issue.

Please note that I'm not talking about the contents truncation (which happens in 
every Wiki, even in English) but about the truncation of the last character's 
bytes (which only happens if there are UTF-8 characters longer than one byte, like 
in many Asian languages).
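
One common way to avoid cutting inside a character is to back the cut position up to the nearest character boundary. The sketch below is a Python illustration of that idea under assumed names, not the fix that was eventually applied in MediaWiki. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text to at most max_bytes of UTF-8 without splitting a character.

    Hypothetical helper; max_bytes is assumed to be non-negative.
    """
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    end = max_bytes
    # raw[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so move the cut back until it sits on a character boundary.
    while end > 0 and (raw[end] & 0xC0) == 0x80:
        end -= 1
    return raw[:end].decode("utf-8")


print(truncate_utf8("კატეგორია", 13))
```

With a 13-byte limit this keeps the four whole letters (12 bytes) and drops the partial fifth one, so the stored text stays valid UTF-8.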
Comment 7 Brion Vibber 2006-04-05 00:11:23 UTC
Gangleri, please stop adding comments to this bug; the problem is well known and 
understood, and the fix is forthcoming. :)
Comment 8 Rob Church 2006-04-06 22:21:39 UTC

*** This bug has been marked as a duplicate of 332 ***
