Last modified: 2014-03-14 10:38:50 UTC
I need to move it to tools :o
Summary of why wm-bot isn't going to be moved so far. There are 2 blocker bugs that aren't going to be fixed:

https://bugzilla.wikimedia.org/show_bug.cgi?id=51943 - while this is not fixed, wm-bot gets randomly killed because the Linux kernel allocates a nonsensical amount of vmem for it (this happens for unknown reasons; I need to talk to some kernel gurus in order to understand why). Until this is fixed, wm-bot would be very unstable.

https://bugzilla.wikimedia.org/show_bug.cgi?id=51936 - no query relaying means that the very useful NetCat module wouldn't work. This would significantly reduce the bot's functionality, which would be very unfortunate. The bug is not going to be fixed, which means wm-bot would have only limited functionality within the tools project.

In addition, there are a number of complex issues that would need to be changed in core solely to make it work within the tools project environment (these are rather useless patches that wouldn't need to exist in any other environment). Most of these issues require dozens of classes to be heavily rewritten and a lot of developer work, which results only in a simple optimization for the tools project, so the end user of wm-bot wouldn't even see any difference (from my point of view, useless hard work that bears no fruit).

Wm-bot already runs on a separate instance, utilizes it very well, and reacts very badly if the instance is shared or running any other processes. The instance can be very small, as wm-bot naturally requires little CPU and operating memory (vmem is some nonsense calculated by the kernel, to which SGEN on tools is bound, so even if wm-bot itself runs perfectly with 500 MB of RAM, it would die OOM even with 2 GB of RAM allocated to it on an SGEN box).
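The gap between vmem and the memory a process really uses can be checked on any Linux box by comparing the `VmSize` and `VmRSS` fields in `/proc/<pid>/status`. The following is only a minimal sketch for illustration; the helper names `parse_vm_fields` and `read_vm_fields` are made up here and are not part of wm-bot.

```python
import re


def parse_vm_fields(status_text):
    """Return VmSize and VmRSS (in kB) parsed from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        # Lines look like "VmSize:\t 2097152 kB"
        match = re.match(r"^(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid="self"):
    """Read the status file of a process (Linux only) and parse it."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Calling `read_vm_fields(pid)` with the pid of the bot process would show how far the virtual size (what a vmem limit counts) is from the resident set (what the process really occupies in RAM).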
We are already using separate projects / instances on the wikimedia project for what is, from my point of view, "useless nonsense": some super-huge dumps of 3rd-party wikis, empty instances called just "bob" which nobody knows the purpose of, and some under-optimized bots or tools that consume 30000 times more resources than they would need if they were written properly. So I see no reason why wm-bot couldn't have its own super small instance where it happily lives, instead of being migrated to a complex grid such as tool labs, which is more than unsuitable for a bot like this.

On the other hand, I can see a number of reasons why it SHOULD run on a separate instance. One of them is simply that it would need fewer resources. As I already mentioned, wm-bot can happily live with minimal RAM; because of SGEN limitations, however, it would need to request at least 2 GB of VMEM to work, which is significantly more than it needs and a huge waste. Given the architecture of the bot, being able to access the instance where it lives is very helpful (not possible on tools), as is being able to set up multiple separate filesystems for different components of the bot (for IO optimizations), which is not possible on labs either.

In a nutshell: running wm-bot on the tools grid is as easy as running an Oracle or PostgreSQL RDBMS on the tools grid.
Btw, that doesn't mean wm-bot is an RDBMS itself, but it consists of many separate components that interact with each other (such as a delayed IO writer, telnet listener, queues, caches, buffers, and shared memory pools) that are not easy to spread across multiple servers.
In addition, I started using btrfs snapshots to clone the logs / databases using COW, for user backups as well as system backups, and in order to generate log tarballs (generating a log tarball may take several minutes, during which the bot keeps writing to these log files; this caused random issues with tar). There is no btrfs on tools, and even if there were, this requires root.
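The snapshot-then-tar idea can be sketched roughly like this. It is not the actual wm-bot backup script: the function names and the `logs` archive name are hypothetical, and it assumes the logs live on their own btrfs subvolume and that the script runs as root. Only the standard `btrfs subvolume snapshot -r` and `btrfs subvolume delete` commands are used.

```python
import subprocess
import tarfile


def snapshot_commands(subvolume, snapshot_path):
    """Build the btrfs commands that create and later delete a read-only snapshot."""
    create = ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot_path]
    delete = ["btrfs", "subvolume", "delete", snapshot_path]
    return create, delete


def archive_logs(subvolume, snapshot_path, tarball_path, run=subprocess.run):
    """Tar a frozen COW snapshot of the logs instead of the live files."""
    create, delete = snapshot_commands(subvolume, snapshot_path)
    run(create, check=True)
    try:
        # The snapshot no longer changes, so tar never sees a file that
        # is being written to while it is read.
        with tarfile.open(tarball_path, "w:gz") as tar:
            tar.add(snapshot_path, arcname="logs")
    finally:
        # Always drop the snapshot, even if creating the tarball failed.
        run(delete, check=True)
```

Because the snapshot is copy-on-write, creating it is nearly instant and the bot can keep writing to the live log files while the tarball is built from the frozen copy.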
Petr: Thanks for your explanation here. Appreciated!