Last modified: 2014-06-02 12:42:34 UTC
Each time a page is loaded, currently, the prev/next links in the navbar are generated by looping through the entire sections block of the JSON and finding the current page. This is probably quite slow for very large books. Relevant IRC discussion: 14:33 < GorillaWarfare> mwalker: If you look at BookManagerv2.hooks.php on line 399, you'll see it's getting the prev/next sections by looping through each section 14:33 < GorillaWarfare> mwalker: But some books have literally 40,000 sections 14:33 < GorillaWarfare> mwalker: Any thoughts on how to deal wiht that better? 14:33 < mwalker> looking 14:34 < GorillaWarfare> Sure 14:38 < mwalker> GorillaWarfare: so two things come to mind -- first that this if you keep it this way could be cached 14:38 < mwalker> the second thing is; why is it unsuitable to use the list order given in the JSON? 14:39 < mwalker> that way it could be numerically indexed and you wouldn't have this problem 14:39 < GorillaWarfare> Use the list order? 14:40 < mwalker> the order in which they were listed in the raw file 14:42 < mwalker> you have an implicit ordering there which you can actually use as an index -- and if you wanted on page save; you could turn that into an explicit ordering which you can then pass down to JS 14:43 < mwalker> or... because I'm an idiot and now see what you're trying to do 14:43 < mwalker> this hook gets run for a specific page 14:44 < mwalker> so... find the page you're rendering for; and then use that implicit index to construct just those forward/backward links? 14:54 < GorillaWarfare> How would it know which page was which index? 14:56 < mwalker> GorillaWarfare: for more generic cases it's in_array; but in this case you can use array_walk() 14:57 < mwalker> or 14:59 < mwalker> hurm... that's not impressively efficient; your best bet might just be to make an array of index => section name on json page save 14:59 < mwalker> save that in cache 14:59 < mwalker> and then retrieve it 15:00 < mwalker> this is where I wish php had proper trees and tree searching 15:03 < GorillaWarfare> Wait, how is index => section name different from what it's doing now? 15:05 < mwalker> it's not; except for the bit where you can't search for a section name to find it's index 15:05 < mwalker> what you're doing right now is actually probably totally fine so long as you cache it 15:05 < mwalker> it's just that you dont want to do this operation for every page 15:05 < GorillaWarfare> Even for 40,000 sections? 15:05 < GorillaWarfare> Ohh; it is doing it for every page 15:06 < GorillaWarfare> And the navbars aren't currently cached 15:06 < mwalker> check 15:06 < GorillaWarfare> So it could be slooow 15:06 < mwalker> indeed! 15:06 < GorillaWarfare> Hmm... so how would we do this.. 15:06 < GorillaWarfare> Create each navbar at once and cache them sometime? 15:06 < mwalker> the smaller advantage of just having a simple lookup table without the rest of the metadata is that then you dont have to store all the metadata into memcache 15:07 < mwalker> well; pre-generate the metadata required for each navbar 15:08 < mwalker> but with 20k sections you're going to run into other problems -- like how do you even display that 15:09 < GorillaWarfare> Yeaaaah 15:09 < GorillaWarfare> When Raylton was talking about "big books", I don't think I quite envisioned the true scale 15:09 < mwalker> do they have subsections? 15:09 < mwalker> because then you could cache it sort of as a tree 15:11 < GorillaWarfare> We can't count on it 15:12 < mwalker> grrr 15:12 < mwalker> ok -- so we know we have some limited structured data here 15:13 < mwalker> it might make sense to put this into a database table and then it's persistent and queryable 15:13 < GorillaWarfare> Eek 15:13 < GorillaWarfare> That... doesn't sound simple 15:13 < mwalker> probably easier than you think; but it will force you to make some modifications to your algorithms 15:14 < mwalker> because you'll pull in the section ordering data from the database; and the rest of the book metadata from json 15:14 < mwalker> this might help http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ 15:14 < mwalker> just for a concept
Could this bug be good candidate for Google Code-in? Meaning, could an eperienced contributor like e.g. mwalker write a atch to fix this problem within 2-3 hours? If so, can you describe in a new comment the recommended solution? Sending GCI students (or anybody external to this project) to read between the lines in an IRC log is not a good idea. Thank you!
(In reply to comment #1) > Could this bug be good candidate for Google Code-in? Meaning, could an > eperienced contributor like e.g. mwalker write a atch to fix this problem > within 2-3 hours? > I don't think this is a good GCI task.
Change 134627 had a related patch set uploaded by Deepali: Efficient navigation through sections https://gerrit.wikimedia.org/r/134627