Last modified: 2014-10-28 13:23:23 UTC
Usernames of block users are stored in ipblocks.ipb_address. This field generally uses spaces in place of underscores. However many entries are inexplicably using underscores. This is screwing up the API. Example: https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=Mistress+Selina+Kyle No results are currently found. From the Toolserver's enwiki_p, however: mysql> select * from ipblocks where ipb_address ='Mistress_Selina_Kyle' limit 1\G *************************** 1. row *************************** ipb_id: 175820 ipb_address: Mistress_Selina_Kyle ipb_user: 697854 ipb_by: 129409 ipb_reason: restoring block as this was not discussed; this user has been extremely disruptive and was never a useful contributor ipb_timestamp: 20060528233006 ipb_auto: 0 ipb_anon_only: 0 ipb_create_account: 1 ipb_expiry: infinity ipb_range_start: ipb_range_end: ipb_enable_autoblock: 1 ipb_deleted: 0 ipb_block_email: 0 ipb_by_text: SlimVirgin ipb_allow_usertalk: 1 1 row in set (0.10 sec) Similarly, https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=MilkMan_Chocolate only references one block entry. The database actually has two: mysql> select * from ipblocks where ipb_address ='MilkMan_Chocolate' limit 1\G*************************** 1. row *************************** ipb_id: 36301 ipb_address: MilkMan_Chocolate ipb_user: 416818 ipb_by: 61329 ipb_reason: Vandal ipb_timestamp: 20050904050301 ipb_auto: 0 ipb_anon_only: 0 ipb_create_account: 1 ipb_expiry: infinity ipb_range_start: ipb_range_end: ipb_enable_autoblock: 1 ipb_deleted: 0 ipb_block_email: 0 ipb_by_text: Rdsmith4 ipb_allow_usertalk: 1 1 row in set (0.01 sec) mysql> select * from ipblocks where ipb_address ='MilkMan Chocolate' limit 1\G *************************** 1. row *************************** ipb_id: 36680 ipb_address: MilkMan Chocolate ipb_user: 416818 ipb_by: 58666 ipb_reason: sockpuppet, evading block ipb_timestamp: 20050906061037 ipb_auto: 0 ipb_anon_only: 0 ipb_create_account: 1 ipb_expiry: infinity ipb_range_start: ipb_range_end: ipb_enable_autoblock: 1 ipb_deleted: 0 ipb_block_email: 0 ipb_by_text: MarkSweep ipb_allow_usertalk: 1 1 row in set (0.01 sec)
A bit more data. It looks like this phenomenon began in 2004 and ended in 2006: mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp asc limit 1; +--------------------+----------------+ | ipb_address | ipb_timestamp | +--------------------+----------------+ | Live_and_let_Troll | 20041025021840 | +--------------------+----------------+ 1 row in set (0.02 sec) mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp desc limit 1; +--------------------------------+----------------+ | ipb_address | ipb_timestamp | +--------------------------------+----------------+ | I_learned_poker_from_wikipedia | 20060711184521 | +--------------------------------+----------------+ 1 row in set (1.96 sec)
Sounds like we just need to go through all wikis and update the underscores back to spaces UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE ipb_address LIKE '%\_%'; Wonder if it's worth adding an updater action for this too, or whether it's a WMF site issue
(In reply to comment #2) > Sounds like we just need to go through all wikis and update the underscores > back to spaces > > UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE > ipb_address LIKE '%\_%'; > > Wonder if it's worth adding an updater action for this too, or whether it's a > WMF site issue Well, as I pasted above, some of these (duplicative) block entries apply to the same user. I think you'll need a maintenance script to go through, find instances where ipb_user is not equal to 0 and is non-unique, select the block with the highest ipb_timestamp, delete the older entries, and then fix ipb_address as necessary.
Change 168965 had a related patch set uploaded by Anomie: Maintenance script to clean up mismatched user names in ipblocks https://gerrit.wikimedia.org/r/168965
Change 168965 merged by jenkins-bot: Maintenance script to clean up mismatched user names in ipblocks https://gerrit.wikimedia.org/r/168965
Maintenance script is created, now we just need to run it.
Change 169231 had a related patch set uploaded by Aaron Schulz: Maintenance script to clean up mismatched user names in ipblocks https://gerrit.wikimedia.org/r/169231
Change 169231 merged by jenkins-bot: Maintenance script to clean up mismatched user names in ipblocks https://gerrit.wikimedia.org/r/169231