Last modified: 2009-03-23 01:57:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6253, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 4253 - Long edit summaries may get truncated in RC->IRC feeds
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: PC Windows XP
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 16097
Depends on:
Blocks: 16599
Reported: 2005-12-12 00:26 UTC by Ronald Beelaard
Modified: 2009-03-23 01:57 UTC (History)

Description Ronald Beelaard 2005-12-12 00:26:17 UTC
If an article name is much longer than 180 chars (>>180), the RC feedback to IRC is incorrect. In CDVF a user named ":" (colon) appears. The summary shows 27.0.0.1 PRIVMSG #nl.wikipedia : <the article name>
Around 180 chars the behaviour varies.
A test with a 160-char article name and a long summary in the edit screen gives a truncated summary in IRC. A test with a 150-char name also truncates the summary, but the summary that comes through is not 10 chars longer.

History in MediaWiki looks fine.

Any subsequent edit on such an article maintains the faulty reporting in irc. 

The article triggering the investigation is:
http://nl.wikipedia.org/w/index.php?title=Instructie_betreffende_de_criteria_voor_het_onderscheiden_van_roepingen_met_betrekking_tot_personen_met_homoseksuele_tendensen_in_het_licht_van_hun_toelating_tot_seminaries_en_de_heilige_geloften

This article, which is also a redirect, does not allow the addition of text after the link. However, a test page with a short title and content looking like this:
  #redirect[[blabla]] this is a test
can be edited.

This might be a second but related problem.
Comment 1 Merlijn van Deen (test) 2007-07-29 11:41:33 UTC
This also occurs rather more often with the ja.wikipedia RC stream. Because Unicode URLs tend to be long, the 470 character limit (probably some 512 char limit for the entire line?) is reached quite fast... and incomplete messages are hard to parse ;)
Comment 3 Chad H. 2008-10-27 20:09:29 UTC
*** Bug 16097 has been marked as a duplicate of this bug. ***
Comment 4 Brad Jorsch 2008-10-27 21:10:35 UTC
The IRC protocol (see RFC 2812) has a hard limit of 512 characters per command, and no way to split one message over multiple commands. If this is to be fixed, the format of the RC feed messages must be changed somehow, and the various bots updated to handle the new format.
Comment 5 Ilmari Karonen 2008-10-27 21:18:37 UTC
Would probably have to involve some kind of continuation line syntax.  Note that, due to URL-encoding, it's possible for the diff link alone to exceed 512 characters in length.

...or we could just leave the page name _out_ of the diff link: simply http://en.wikipedia.org/w/index.php?diff=168124969&oldid=168124852 works just fine.
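
To illustrate the size difference, here is a minimal Python sketch comparing a diff link that carries a percent-encoded title with the title-free form suggested above. The helper names and the 60-character sample title are hypothetical; only the URL pattern and the revision IDs come from this comment.

```python
from urllib.parse import quote

BASE = "http://en.wikipedia.org/w/index.php"


def diff_url_with_title(title: str, diff: int, oldid: int) -> str:
    """Build a diff link that includes the percent-encoded page title."""
    # quote() encodes the title as UTF-8, so each non-ASCII character
    # expands into several "%XX" triplets.
    encoded = quote(title.replace(" ", "_"))
    return f"{BASE}?title={encoded}&diff={diff}&oldid={oldid}"


def diff_url_without_title(diff: int, oldid: int) -> str:
    """Build a diff link from the revision IDs alone."""
    return f"{BASE}?diff={diff}&oldid={oldid}"


# Hypothetical 60-character Japanese title (assumption for illustration).
title = "あ" * 60
long_url = diff_url_with_title(title, 168124969, 168124852)
short_url = diff_url_without_title(168124969, 168124852)
print(len(long_url), len(short_url))
```

With a title like this, the encoded link alone is already longer than a whole IRC line, while the title-free link has a fixed, small size regardless of the page name.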
Comment 6 Brad Jorsch 2008-10-27 21:33:13 UTC
(In reply to comment #5)
> ...or we could just leave the page name _out_ of the diff link: simply
> http://en.wikipedia.org/w/index.php?diff=168124969&oldid=168124852 works just
> fine.

That would certainly reduce the incidence of the problem, but it would still fail if a long edit summary were used on a page with a long title or by a user with a long username, or if the user with the long name edits the page with the long title.
Comment 7 Ilmari Karonen 2008-10-27 21:39:28 UTC
(In reply to comment #6)
>
> That would certainly reduce the incidence of the problem, but it would still
> fail if a long edit summary were used on a page with a long title or by a user
> with a long username, or if the user with the long name edits the page with the
> long title.

True, but as long as it's just the summary that's truncated I suspect most users of the feed can live with it.  Of course, long username + long page title could still push it over the limit, but I think a lot of wikis these days cap username length (via the username blacklist) to something like 40-80 characters max anyway.
Comment 8 Gurch 2008-10-27 22:13:19 UTC
(In reply to comment #7)
> True, but as long as it's just the summary that's truncated I suspect most
> users of the feed can live with it.

If by "live with it", you mean "spam the crap out of the API instead because it's the only way to get the same information while guaranteeing it to actually be correct", then yes... :/
Comment 9 Alex Z. 2008-10-27 22:55:54 UTC
Removed the title from diff URLs and trimmed the summary if it's still too long in r42695. This should fix the majority of cases, but "really long username" + "really long title" will still have the problem. 

Ideally there would be some sort of continuation to second or third messages if necessary, but this would be more difficult to implement, and has the risk of breaking bots that use the IRC RC feed, due to unexpected content in the messages and things being in the wrong places, so more discussion would be needed on that. Input from bot operators who use the IRC feeds would be welcome.
Comment 10 Brad Jorsch 2008-10-28 02:00:45 UTC
Note that the trimming you have in there won't really do much good. The IRC protocol tacks on the source user identifier, a command ("PRIVMSG"), a target, some punctuation, and a trailing "\r\n" which leaves rather less than 512 characters for the actual message (the exact amount less depends on the length of the channel name and the length of the user identifier).
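
As a rough sketch of that arithmetic in Python: the helper below subtracts the framing around a relayed PRIVMSG from the 512-byte line limit. The function name and the sample source prefix are assumptions for illustration, not values taken from the actual feed server.

```python
IRC_MAX_LINE = 512  # RFC 2812 limit per line, including the trailing CR-LF


def max_payload_bytes(source_prefix: str, target: str) -> int:
    """Return how many bytes remain for the message text itself.

    A relayed message looks like ":<prefix> PRIVMSG <target> :<text>\r\n",
    so everything except <text> counts as overhead.
    """
    framing = f":{source_prefix} PRIVMSG {target} :\r\n"
    return IRC_MAX_LINE - len(framing.encode("utf-8"))


# Hypothetical source prefix (nick!user@host); the real one may differ.
prefix = "rc!~rc@localhost"
print(max_payload_bytes(prefix, "#nl.wikipedia"))
```

Because the result depends on the prefix and the channel name, any trimming done before the message reaches the IRC server would need to know both to pick a safe length.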
Comment 11 Ilmari Karonen 2008-10-28 02:35:23 UTC
In fact, it might be best to just forgo the trimming entirely: as Brad notes, it's not very effective, and not having it allows bots to tell if the message has been truncated by checking for the trailing \003.
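
A bot-side check along those lines might look like the following Python sketch. It assumes, as described above, that a complete feed message ends with the \003 colour control character; the function name and sample strings are hypothetical.

```python
COLOUR_CODE = "\x03"  # the \003 control character used for colours in the feed


def looks_truncated(message: str) -> bool:
    """Return True if the message does not end with the closing \\003.

    Assumption: an untruncated feed message always ends with \\003, so a
    missing terminator means the line was cut at the IRC length limit.
    """
    return not message.rstrip("\r\n").endswith(COLOUR_CODE)


# Hypothetical message tails: one complete, one cut off mid-summary.
complete = "\x0310fixed a typo\x03"
cut_off = "\x0310a very long summary that was cut of"

print(looks_truncated(complete), looks_truncated(cut_off))
```

A bot could fall back to another source for the full summary only when this check fires, instead of for every edit.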
Comment 12 Alex Z. 2008-10-28 02:40:06 UTC
Comment trimming removed in r42711.
Comment 13 Gurch 2008-12-09 19:27:18 UTC
(In reply to comment #9)
> Ideally there would be some sort of continuation to second or third messages if
> necessary, but this would be more difficult to implement, and has the risk of
> breaking bots that use the IRC RC feed, due to unexpected content in the
> messages and things being in the wrong places, so more discussion would be
> needed on that. Input from bot operators who use the IRC feeds would be
> welcome.

All IRC bots that I'm aware of use a regex to match the IRC messages, so all that *should* happen is the continuation would not be picked up.
Comment 14 Ilmari Karonen 2008-12-10 10:32:10 UTC
Just took a look at the #en.wikipedia channel, and it seems new page creations still include the title in the URL.  They should probably be changed to include an "oldid=" parameter instead.  Will take a look at it myself later, if I don't forget.
Comment 15 Gurch 2008-12-10 14:30:08 UTC
(In reply to comment #14)
> Just took a look at the #en.wikipedia channel, and it seems new page creations
> still include the title in the URL.  They should probably be changed to include
> an "oldid=" parameter instead.  Will take a look at it myself later, if I don't
> forget.

Please do; that would fix bug 16586 too. :)
Comment 16 Ilmari Karonen 2008-12-10 16:00:15 UTC
Comment #14 should be fixed in r44406.
Comment 17 Mike.lifeguard 2009-03-23 01:57:43 UTC
The original problem here is INVALID AFAICT, due to limits of IRC. Instead, can replace IRC by XMPP (bug 17450), which blocks bug 14045 which is basically the real complaint here. Everything else mentioned seems to have been addressed. I guess I will mark this as FIXED.
