Last modified: 2006-04-06 22:21:39 UTC
Recently I noticed, while viewing Recent Changes, that the text for deleted articles (and also for user creation logs) is sometimes truncated, and all the text following it on the page becomes italicized. An example of such corrupted text (taken from ka: on 30th March at 12:01 UTC): (წაშლილთა სია); 13:01 . . Zangala (განხილვა | წვლილი | ბლოკირება) (წაშლილია "კატეგორია:დაუსრულებელი სტატიები ბიოლოგია": შინაარსი იყო: 'კატეგორია:ბიოლოგია[[კატე᩼/span> As you can see, the text was cut at [[კატე.... (a link that was in the deleted page's contents), and then there is that square, which I believe is the first byte of a Unicode character in the text merged with the '<' character (of the </span> tag). As mentioned above, this also happens in user creation logs in Recent Changes when the text is a bit longer (a longer username).
(წაშლილთა სია); 13:01 . . Zangala (განხილვა | წვლილი | ბლოკირება) (წაშლილია "კატეგორია:დაუსრულებელი სტატიები ბიოლოგია": შინაარსი იყო: 'კატეგორია:ბიოლოგია[[კატე᩼/span> can be translated as (deletion log); 13:01 . . Zangala (talk | contribs | block) (deleted "Category:Unfinished Biology articles": content was: 'Category:Biology[[Cate᩼/span>
*** Bug 2386 has been marked as a duplicate of this bug. ***
Oops, bug 2386 is not a duplicate; but see bug 2386 comment #2!
Hello Malafaya! I changed the URL to [[ka:special:Log]] because I assume that http://ka.wikimedia.org/ does not exist yet. I also changed "Component" to "Internationalization" because this seems more appropriate than "Categories". Please consistently use {{ns:special}}, {{ns:project}}, {{ns:user}} etc. in the *translations* / *localizations* at [[ka:special:Allmessages]]. This would make it easier to verify your setup. If the problem is still present, I would suggest that you view the page source in your browser, copy it and attach it with type "HTML source (text/html)". Good luck and best regards reinhardt [[user:gangleri]]
Hello again! Looking at http://ka.wikipedia.org/w/index.php?title=special:Log&type=&user=Zangala and searching for კატეგორია:ბიოლოგია I found some entries (with timestamps other than those mentioned, probably because of your/my different settings in [[ka:special:Preferences]] > 'Date and time'). I assume that the background of your request is the 'discrepancy' between what you enter in the "Summary" field, "Reason for deletion" field, "Reason for move" field, "Reason for protection" field etc. and what you get. These fields have a limited size. I suppose that the GUI (browser, Java, MediaWiki software) counts the characters but does *not* care about the *final* length once the characters are UTF-8 encoded in the database. Truncation happens later, somewhere in the MediaWiki software. You will / might see less depending on how long the comments etc. are and how many UTF-8 characters requiring two or three bytes you are using. I would say this is "behaviour as today", and one should find out which bug report this duplicates. There are several ways to fix this: a) The limit on the field length should take into account the final size required inside the database. b) "Preview" / "Confirm" should warn about truncations; this could lead to multiple posts instead of one, which can irritate contributors. best regards reinhardt [[user:gangleri]]
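The difference between counting characters and counting encoded bytes, as described above, can be illustrated with a minimal Python sketch. The helper names and the 255-byte limit are hypothetical and chosen only for illustration; they are not taken from the MediaWiki source.

```python
# Hypothetical field limit, assumed for illustration only.
MAX_FIELD_BYTES = 255


def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies once UTF-8 encoded."""
    return len(text.encode("utf-8"))


def fits_in_field(text: str, max_bytes: int = MAX_FIELD_BYTES) -> bool:
    """Check the encoded size, not the character count, against the limit."""
    return utf8_len(text) <= max_bytes


sample = "კატეგორია"  # Georgian letters, each taking three bytes in UTF-8
print(len(sample), "characters,", utf8_len(sample), "bytes")

# 100 such characters pass a 255-character check but need 300 bytes.
long_comment = "ა" * 100
print(len(long_comment) <= MAX_FIELD_BYTES, fits_in_field(long_comment))
```

A form that validates `len(text)` alone would accept `long_comment`, while a storage column limited by bytes would have to cut it.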
Hi reinhardt. The problem here is not the truncation "per se". Truncating a character longer than one byte in its middle (let's say a 3-byte UTF-8 character gets cut IN THE DATABASE after the 1st byte) causes a strange character to be output. Browsers like IE, which don't take into account that truncation of UTF-8 characters may occur, fail to render the page properly. Nikerabbit has investigated this problem and Brion says it's a known issue. Please note that I'm not talking about the truncation of the contents (which happens in every wiki, even in English) but about the truncation of the last character's bytes (which only happens if there are UTF-8 characters longer than one byte, as in many Asian languages).
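To make the mid-character cut concrete, here is a minimal Python sketch of a byte-limited truncation that backs up to a character boundary. The function name `truncate_utf8` is hypothetical, and this is only one possible approach under the assumption that the limit is expressed in bytes; it is not the fix applied in MediaWiki.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Truncate text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    cut = max_bytes
    # Continuation bytes have the bit pattern 10xxxxxx. If the first byte
    # after the cut is one, the cut falls inside a character, so back up
    # until the cut sits just before a lead byte.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut].decode("utf-8")


sample = "კატე"  # four Georgian letters, twelve bytes in UTF-8

# Naive byte slicing leaves a lone lead byte at the end of the data.
naive = sample.encode("utf-8")[:4]
print(naive.decode("utf-8", errors="replace"))

# Boundary-aware truncation drops the incomplete character instead.
print(truncate_utf8(sample, 4))
```

The naive slice ends with an orphaned lead byte, which is the kind of stray byte that a lenient browser may combine with the following '<'.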
Gangleri, please stop adding comments to this bug; the problem is well known and understood, and the fix is forthcoming. :)
*** This bug has been marked as a duplicate of 332 ***