Last modified: 2013-07-24 13:08:35 UTC
Several processes use significantly more virtual memory (vmem) than resident memory when running under the Linux kernel. While looking for the source of this problem I tested the software on non-Linux kernels and found that it ONLY happens on Linux. For example, process X uses 20 MB of resident memory but 1800 MB of vmem. Currently the grid limits processes by vmem, so these processes have to request this huge amount of RAM in order to run, even though they actually use very little resident memory. This makes the grid unusable for processes that need a lot of resident memory, because their vmem requirements may reach hundreds of gigabytes (for example, a process that needs 2 GB of resident memory may have 40 GB of vmem usage under the Linux kernel, even if that makes no sense). There needs to be some workaround so that processes with nonsensical vmem values can run on the grid as well.
This is not an issue except for processes that map hardware, which nobody should be running on execution hosts (your example, with X, is illustrative: it includes the mmap() of video memory). The vmem usage includes the data and executable of the process itself, of course, but also every shared library that is bound at runtime. In practice, it is the maximum amount of memory the process /could/ use were it the only one using those shared libraries, and therefore the correct amount to check against to prevent overcommitting. This is why the execution nodes were given a generous amount of swap; it guarantees we won't overcommit yet allows mostly efficient use of the actual RAM. The claim that a process needing 2 GB of resident memory may have 40 GB of vmem usage under the Linux kernel, on the other hand, is simply wrong. The overhead from shared libraries is fixed.
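For anyone who wants to compare the two figures for a given process, here is a minimal Python sketch. It assumes a Linux host where /proc/<pid>/status reports the VmSize (virtual) and VmRSS (resident) fields in kB. The helper names parse_status and memory_of are made up for illustration and are not part of any grid tooling.

```python
import re


def parse_status(text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        # Lines of interest look like "VmSize:\t 1843200 kB".
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def memory_of(pid="self"):
    """Read the status file of a process (assumes a Linux /proc filesystem)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status(status_file.read())
```

Calling memory_of() with no argument inspects the calling process. Passing a numeric pid inspects another process, provided the caller has permission to read its status file.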