Last modified: 2014-10-28 13:23:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T36014, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 34014 - ipblocks.ipb_address has inconsistent format
ipblocks.ipb_address has inconsistent format
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: 16660
  Show dependency treegraph
 
Reported: 2012-01-29 08:17 UTC by MZMcBride
Modified: 2014-10-28 13:23 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description MZMcBride 2012-01-29 08:17:16 UTC
Usernames of block users are stored in ipblocks.ipb_address. This field generally uses spaces in place of underscores. However many entries are inexplicably using underscores. This is screwing up the API.

Example:

https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=Mistress+Selina+Kyle

No results are currently found.

From the Toolserver's enwiki_p, however:

mysql> select * from ipblocks where ipb_address ='Mistress_Selina_Kyle' limit 1\G
*************************** 1. row ***************************
              ipb_id: 175820
         ipb_address: Mistress_Selina_Kyle
            ipb_user: 697854
              ipb_by: 129409
          ipb_reason: restoring block as this was not discussed; this user has been extremely disruptive and was never a useful contributor
       ipb_timestamp: 20060528233006
            ipb_auto: 0
       ipb_anon_only: 0
  ipb_create_account: 1
          ipb_expiry: infinity
     ipb_range_start: 
       ipb_range_end: 
ipb_enable_autoblock: 1
         ipb_deleted: 0
     ipb_block_email: 0
         ipb_by_text: SlimVirgin
  ipb_allow_usertalk: 1
1 row in set (0.10 sec)

Similarly, https://en.wikipedia.org/w/api.php?action=query&list=blocks&bkusers=MilkMan_Chocolate only references one block entry. The database actually has two:

mysql> select * from ipblocks where ipb_address ='MilkMan_Chocolate' limit 1\G*************************** 1. row ***************************
              ipb_id: 36301
         ipb_address: MilkMan_Chocolate
            ipb_user: 416818
              ipb_by: 61329
          ipb_reason: Vandal
       ipb_timestamp: 20050904050301
            ipb_auto: 0
       ipb_anon_only: 0
  ipb_create_account: 1
          ipb_expiry: infinity
     ipb_range_start: 
       ipb_range_end: 
ipb_enable_autoblock: 1
         ipb_deleted: 0
     ipb_block_email: 0
         ipb_by_text: Rdsmith4
  ipb_allow_usertalk: 1
1 row in set (0.01 sec)

mysql> select * from ipblocks where ipb_address ='MilkMan Chocolate' limit 1\G
*************************** 1. row ***************************
              ipb_id: 36680
         ipb_address: MilkMan Chocolate
            ipb_user: 416818
              ipb_by: 58666
          ipb_reason: sockpuppet, evading block
       ipb_timestamp: 20050906061037
            ipb_auto: 0
       ipb_anon_only: 0
  ipb_create_account: 1
          ipb_expiry: infinity
     ipb_range_start: 
       ipb_range_end: 
ipb_enable_autoblock: 1
         ipb_deleted: 0
     ipb_block_email: 0
         ipb_by_text: MarkSweep
  ipb_allow_usertalk: 1
1 row in set (0.01 sec)
Comment 1 MZMcBride 2012-01-29 08:23:57 UTC
A bit more data. It looks like this phenomenon began in 2004 and ended in 2006:

mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp asc limit 1;
+--------------------+----------------+
| ipb_address        | ipb_timestamp  |
+--------------------+----------------+
| Live_and_let_Troll | 20041025021840 |
+--------------------+----------------+
1 row in set (0.02 sec)

mysql> select ipb_address, ipb_timestamp from ipblocks where ipb_address like '%\_%' order by ipb_timestamp desc limit 1;
+--------------------------------+----------------+
| ipb_address                    | ipb_timestamp  |
+--------------------------------+----------------+
| I_learned_poker_from_wikipedia | 20060711184521 |
+--------------------------------+----------------+
1 row in set (1.96 sec)
Comment 2 Sam Reed (reedy) 2012-01-29 14:48:18 UTC
Sounds like we just need to go through all wikis and update the underscores back to spaces

UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE ipb_address LIKE '%\_%';

Wonder if it's worth adding an updater action for this too, or whether it's a WMF site issue
Comment 3 MZMcBride 2012-01-29 19:15:48 UTC
(In reply to comment #2)
> Sounds like we just need to go through all wikis and update the underscores
> back to spaces
> 
> UPDATE ipblocks SET ipb_address = REPLACE(ipb_address, '_', ' ') WHERE
> ipb_address LIKE '%\_%';
> 
> Wonder if it's worth adding an updater action for this too, or whether it's a
> WMF site issue

Well, as I pasted above, some of these (duplicative) block entries apply to the same user. I think you'll need a maintenance script to go through, find instances where ipb_user is not equal to 0 and is non-unique, select the block with the highest ipb_timestamp, delete the older entries, and then fix ipb_address as necessary.
Comment 4 Gerrit Notification Bot 2014-10-27 14:48:31 UTC
Change 168965 had a related patch set uploaded by Anomie:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/168965
Comment 5 Gerrit Notification Bot 2014-10-27 21:00:56 UTC
Change 168965 merged by jenkins-bot:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/168965
Comment 6 Brad Jorsch 2014-10-27 21:03:50 UTC
Maintenance script is created, now we just need to run it.
Comment 7 Gerrit Notification Bot 2014-10-27 22:09:12 UTC
Change 169231 had a related patch set uploaded by Aaron Schulz:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/169231
Comment 8 Gerrit Notification Bot 2014-10-27 22:16:43 UTC
Change 169231 merged by jenkins-bot:
Maintenance script to clean up mismatched user names in ipblocks

https://gerrit.wikimedia.org/r/169231

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links