Last modified: 2014-08-07 08:00:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T67246, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 65246 - unread count for watchlist changes is not available via API
unread count for watchlist changes is not available via API
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
API (Other open bugs)
unspecified
All All
: Unprioritized normal (vote)
: ---
Assigned To: Brad Jorsch
:
Depends on: 66089
Blocks: CPB
  Show dependency treegraph
 
Reported: 2014-05-13 07:08 UTC by Jared Zimmerman (WMF)
Modified: 2014-08-07 08:00 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Jared Zimmerman (WMF) 2014-05-13 07:08:06 UTC

    
Comment 1 Brad Jorsch 2014-05-13 17:04:04 UTC
"unread count for watchlist changes" meaning what exactly? The number of entries in the user's watchlist with wl_notificationtimestamp set?

On that assumption, I'd lean towards meta=userinfo as the best place for it. I'd also want to see Sean Pringle comment on whether "SELECT COUNT(*) FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL" would blow up the database if someone has a watchlist with a million entries and whether it should be aimed at the watchlist group or the general group or some other group.
Comment 2 Jared Zimmerman (WMF) 2014-05-13 18:01:04 UTC
Let me try to clarify…

When you go to your watchlist which have changed since you've seen them in the last XX days are displayed in bold. Currently there is no obvious way to get a count of these bolded items. We're obviously tracking them somehow, even if it's just on the fly. This bug is to create/discover a logical way to at minimum get a count of these items (unread/unviewed watchlisted items with changes) and ideally we able to generate a list of these pages as well.
Comment 3 Brad Jorsch 2014-05-13 18:35:57 UTC
Ok, that matches my assumption. Thanks for clarifying.

Once we have queries that we're reasonably sure won't kill the database, adding it to the API should be easy.

Conceptually, getting the count is as simple as "SELECT COUNT(*) FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL" and getting the list of only the unread changes would be a matter of adding "AND wl_notificationtimestamp IS NOT NULL" to the existing queries in list=watchlist.[1] But if someone has a watchlist with 1000000 pages and only 10 of them are unseen, that might overload the database. So it might be that we need to add an index on (wl_user,wl_notificationtimestamp), or do things in a less perfect way (e.g. fetch the 500 lines that would actually be displayed on Special:Watchlist using the existing queries and count up which ones would be bold). Part of my problem is that the watchlist queries are already fairly awful by my rules-of-thumb so I can't tell if they're getting worse or not. And Sean knows much more about the database stuff than I do, which is why I suggest asking him.


 [1]: Those queries look something like "SELECT ... FROM recentchanges INNER JOIN watchlist ON (wl_user = ? AND wl_namespace = rc_namespace AND wl_title = rc_title) LEFT JOIN page ON (rc_cur_id = page_id) WHERE rc_timestamp >= ? AND rc_timestamp <= ? AND wl_namespace IN (...) AND (rc_this_oldid=page_latest OR rc_type=3) [and maybe more filtering on rc_minor/rc_bot/rc_user/rc_patrolled/rc_type fields here] AND (rc_type!=3 OR (rc_deleted&9)!=9) ORDER BY rc_timestamp, rc_id LIMIT 5001
Comment 4 Jared Zimmerman (WMF) 2014-05-13 21:27:43 UTC
How its displayed can be resolved, but might also imply a cap on querying this, e.g. if we detect a slowdown on watchlists with greater than 1000, 10,000 100,000 unread (or total) items. We could also limit the query to changes within the last 30 day (or less)
Comment 5 Sean Pringle 2014-05-15 03:15:09 UTC
We already have a couple of COUNT(*) queries around for watchlists, like countItems(), so I think this would be OK to consider as long as it stays a simple select without joins. The extra covering index would be necessary.

Using countItems() as the example; it accepts a db connection arg, so I suggest stick with that approach and use the api group connection for api-generated traffic.

Both watchlist and api groups are on specific slaves now, so if needs be we can do further tuning for them relatively easily, such as partitioning by wl_user.

To be safe we could also:

a) Limit by date as Jared said. Presumably wouldn't help with run-away bots generating large lists, though.

b) Cap the number of rows hit using a subquery, like:

SELECT COUNT(*) FROM (SELECT wl_user FROM watchlist WHERE wl_user = ? AND wl_notificationtimestamp IS NOT NULL LIMIT 1001);

If < 1000, count is accurate.

If > 1000, be vague: "more than..", "thousands", a magic constant, etc.
Comment 6 Sean Pringle 2014-05-15 03:31:32 UTC
Looking through some of the complex joins used for watchlist display: adding wl_notificationtimestamp IS NOT NULL doesn't really affect the query plans. I say trial it.
Comment 7 Brad Jorsch 2014-05-15 15:32:55 UTC
Ok, I'll see about writing patches at some point soon. One to add the index, and then a followup to add the features to the API.

Limiting the count by newer than $wgRCMaxAge makes sense, since the watchlist can't show anything older than that anyway.
Comment 8 Jared Zimmerman (WMF) 2014-05-15 19:29:45 UTC
@Brad, thank you, much appreciated.
Comment 9 Jared Zimmerman (WMF) 2014-06-02 23:47:46 UTC
@Brad, is this something you're actively working on, wanted to know if its something we can plan on working for the next version of the compact personal bar…
Comment 10 Brad Jorsch 2014-06-03 19:56:54 UTC
(In reply to Jared Zimmerman (WMF) from comment #9)
> @Brad, is this something you're actively working on

It wasn't, but I did it this afternoon. Patches to be uploaded momentarily.


(In reply to Brad Jorsch from comment #7)
> Limiting the count by newer than $wgRCMaxAge makes sense, since the
> watchlist can't show anything older than that anyway.

Now that I think about it again, it doesn't make sense. If you haven't checked your watchlist in 2 months, wl_notificationtimestamp for [[Barack Obama]] might be 2 months ago but it'll still show up in your watchlist as unread if it was edited today. To really put a date limit on it we'd need to join with recentchanges, which I doubt would be a net gain.
Comment 11 Gerrit Notification Bot 2014-06-03 19:58:24 UTC
Change 137147 had a related patch set uploaded by Anomie:
API: Add show=unread to ApiQueryWatchlist

https://gerrit.wikimedia.org/r/137147
Comment 12 Gerrit Notification Bot 2014-06-03 19:58:39 UTC
Change 137149 had a related patch set uploaded by Anomie:
Add wl_user_notificationtimestamp index

https://gerrit.wikimedia.org/r/137149
Comment 13 Gerrit Notification Bot 2014-06-03 19:58:44 UTC
Change 137150 had a related patch set uploaded by Anomie:
API: Add prop=unreadcount to ApiQueryUserInfo

https://gerrit.wikimedia.org/r/137150
Comment 14 Gerrit Notification Bot 2014-06-04 06:10:22 UTC
Change 137149 merged by jenkins-bot:
Add wl_user_notificationtimestamp index

https://gerrit.wikimedia.org/r/137149
Comment 15 Gerrit Notification Bot 2014-06-09 17:48:27 UTC
Change 137150 merged by jenkins-bot:
API: Add prop=unreadcount to ApiQueryUserInfo

https://gerrit.wikimedia.org/r/137150
Comment 16 Gerrit Notification Bot 2014-06-09 18:47:46 UTC
Change 137147 merged by jenkins-bot:
API: Add show=unread to ApiQueryWatchlist

https://gerrit.wikimedia.org/r/137147
Comment 17 Brad Jorsch 2014-06-09 18:51:14 UTC
All the changes have been merged now. This should be deployed to WMF wikis with 1.24wmf9, see https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap for the schedule.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links