mourning sanity

Sunday, October 19, 2008

out-of-memory: a sad case

the problem..

while a Turing machine's tape is infinite, all real-world programs run within some resource constraints -- there is no such thing as infinite memory. for some programs that ain't a problem -- the amount of memory needed by the algorithm can be known beforehand, at programming time. but for most real-world applications memory requirements are not known until run time, and sometimes it is very hard to predict how much is needed.

an obvious example is an application that allocates memory according to user input -- for example, an image editor asks the user for the dimensions of an image he'd like to create. it needs to allocate an array in memory for the image (to have fast access), so when the dimensions exceed possible bounds, a well-behaved application should notify the user -- and the user can either reduce the image size or, perhaps, upgrade his machine, if he really wants to work with large images. while it is fairly easy to implement such functionality on a single-task OS, for modern environments it's not so -- in the presence of virtual memory it is really hard to define what "available memory" is: while technically memory is available, using it might lead to severe thrashing (and then updating the image will take ages -- that's not what the user needs). another problem is that many modern operating systems (such as Linux) allow memory to be over-committed (knowing that software often does not use all the memory it allocates) -- allocation will report success, but attempts to actually use the memory may produce an error.
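in Lisp terms, the well-behaved path might look like this minimal sketch (try-make-image is a hypothetical name, and -- as the experiments below show -- not every implementation reliably signals a catchable condition on heap exhaustion):

```lisp
(defun try-make-image (width height)
  "attempt to allocate a WIDTH x HEIGHT 8-bit image buffer; return
NIL instead of crashing, so the caller can ask the user for smaller
dimensions."
  (handler-case
      (make-array (list height width) :element-type '(unsigned-byte 8))
    (storage-condition () nil)))
```

whether this actually works depends entirely on the implementation and the OS, as the rest of this article demonstrates.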

another example of a case where software might need to know memory stats is caching -- usually, a bigger cache leads to better performance, but once the cache grows too big, it causes system thrashing (due to swap being used) or a program crash. thus, a program needs some way to detect low-memory conditions to keep caches within sane bounds.
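a bounded cache can be sketched like this (names are illustrative, and a real system would evict selectively, e.g. LRU, rather than dumping everything):

```lisp
(defparameter *cache-limit* 1000
  "illustrative bound on the number of cached entries")
(defparameter *cache* (make-hash-table :test 'equal))

(defun cache-put (key value)
  ;; crude cleanup: dump the whole cache once the bound is hit
  (when (>= (hash-table-count *cache*) *cache-limit*)
    (clrhash *cache*))
  (setf (gethash key *cache*) value))
```

note that an entry count is only a proxy for memory usage -- which is exactly the kind of weakness discussed below.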

and one more example -- one close to what i'm working with -- a web application server. stateless design is considered superior from a scalability point of view -- it only needs to allocate resources for currently active connections, and typically that's not much. but stateful design (with per-session data) is often beneficial for functionality or ease of programming. a notable example is continuation-based frameworks (such as UnCommon Web): they have to capture the current execution state in a continuation and save it to be able to resume computation after user input. some frameworks keep this state literally for each request a client has sent (to be able to work correctly with the back button and multiple windows), and needless to say that can eat a lot of memory on a busy server. whether this continuation stuff actually makes sense and what the alternative ways to do it are is out of scope for this article, but if we choose such an approach, with server-side state kept in memory, we obviously have to deal with out-of-memory conditions. for example, the framework might choose to catch a low-memory condition and run some sort of cleanup to get rid of outdated session data.

..and how to deal with it..

there are some approaches that can be used to deal with these issues (sometimes in combination), but all of them are less than perfect.

  • application-specific limits: for example, by default UCW can allocate up to 15000 sessions per application, and then it refuses to create new ones. i guess it helps to provide a smoother user experience than it would otherwise (users who are already working with the system won't lose their data in case of an inflow of new users), but obviously it has problems -- these session counts are not in any way related to real memory usage. if you need 1 GB for 15000 sessions, but your server has only 512 MB, a surge of users will blow up the server, and the session limit won't help. on the other hand, if you set the limit too low, you will under-use your server resources.
  • detecting out-of-memory conditions on the application level: in the good old days, you could just check what malloc() returns -- if it returns 0, you're out of memory. (it is another story that it is damn hard to check this everywhere, and once you've got an error it is hard to deal with it, provided that you cannot allocate new memory.) but some modern operating systems (like Linux) lie -- they will give you some memory address, but once you try to use it, the OS may try to kill you. or kill some other process to satisfy your needs.. this is as weird as it sounds
  • operating-system-level OOM management: the aforementioned Linux has a mechanism called the OOM-killer -- that is, when it detects that the system is very low on memory, it tries to free memory by killing some processes, using heuristics to find the best candidates. apparently, this doesn't always work well. for example, i once checked what happens when a web server (UCW/SBCL) tries to grab all memory for itself: first Linux went very unresponsive, with that abusive SBCL process eating almost 100% CPU time (on all 8 CPUs, damn). (it wasn't swapping because i didn't enable swap, to prevent thrashing.) then, after some time of "hanging", it slowly started killing innocent postgres child processes -- because the OOM-killer heuristics tell it to kill children if there are many of them. this did not help, of course, so i ended this stagnation by rebooting the machine.
  • heuristics: in theory you could watch some parameter like "memory available", but there is no such parameter on a system with virtual memory (and, by the way, virtual memory itself works in a very heuristic way). and even if you do that, most likely it will end up as something pretty close to the OOM-killer, that is, not very good
  • memory limits per application: that is, you allocate some maximum memory usage for each application, which it should not exceed under any circumstances. this can be enforced either via OS mechanisms (see ulimit) or in an application-specific way -- for example, many Lisp implementations allow setting a maximum heap size. what happens when an application approaches the limit depends on the application -- most likely you want it to do some cleanup, but if nothing helps, the application should be terminated (and then restarted). in any case, the idea is to prevent whole-system panic and thrashing. the problem is that it is damn hard to allocate these limits in a system with multiple processes.
    say, out of a total of 500 MB of RAM, you allocate 200 MB for a database server, 200 MB for a web/application server, and the rest (100 MB) for the kernel and other processes. of course the problem is the "other processes" (which can include backup and update scripts, etc.) -- if they exceed 100 MB (together with the kernel), the whole system will go low on memory, and the OOM-killer can start shooting, with unexpected consequences. on the other hand, too-pessimistic limits will lead to resource under-use.
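the first approach (application-specific limits) is trivial to sketch -- the names below are illustrative, not UCW's actual API:

```lisp
(defparameter *max-sessions* 15000)   ; cf. UCW's default
(defparameter *sessions* (make-hash-table :test 'equal))

(defun start-session (id)
  ;; refuse new sessions instead of letting memory grow without bound;
  ;; users already working with the system are unaffected
  (if (>= (hash-table-count *sessions*) *max-sessions*)
      nil
      (setf (gethash id *sessions*) (get-universal-time))))
```

and again: the count of sessions says nothing about actual memory usage, which is the core weakness of this approach.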

what to do then? personally, i find memory limits per application + whole-system protection (OOM-killer or a custom heuristic script) to be a reasonable combination: an application should try to work within a specified limit, cleaning up caches if needed, or restarting if nothing helps. for optimal resource usage these limits should be set in a rather optimistic way, and if something unexpected happens, the OOM-killer will try to sort things out (it is possible to tune it to prefer killing guilty/unimportant processes)

..in Common Lisp..

many CL implementations allow specifying a reserved memory size (heap size) and operate within that limit. when the limit is reached and garbage collection does not free up memory anymore, the implementation should signal an out-of-memory condition; the stack is unwound to some point, freeing up some memory. then the application can catch this error and do application-specific cleanup, like clearing caches or purging sessions or whatever, and continue its operations. or, if the application detects that it is totally botched, it can restart itself in some way. that is very good in theory..
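the theory can be sketched portably via CL:STORAGE-CONDITION, which is what the standard provides for this kind of situation (call-with-oom-cleanup is a hypothetical name, and -- as shown below -- real implementations don't all cooperate):

```lisp
(defun call-with-oom-cleanup (thunk cleanup)
  "run THUNK; if an out-of-memory condition is signalled, unwind
(which itself frees some memory), run CLEANUP (e.g. purge caches or
sessions), and return :retry so the caller can decide what to do next."
  (handler-case (funcall thunk)
    (storage-condition ()
      (funcall cleanup)
      :retry)))
```

the catch, of course, is whether the implementation actually signals a condition you can handle instead of crashing outright.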

Heap exhausted, game over.

there are some gotchas that make this work less than perfectly:

  • out-of-memory can happen at any point in the program -- particularly when you do not expect it (outside of your error handlers)
  • cleanup procedures operate under low-memory conditions. if they try to allocate some memory for temporary lists, this can trigger a second OOM condition, for which you won't be prepared
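a common trick against the second gotcha is to keep some preallocated "ballast" that the handler can release, so cleanup code has headroom to cons (a sketch; the 1 MB figure is arbitrary):

```lisp
(defvar *ballast*
  (make-array (* 1024 1024) :element-type '(unsigned-byte 8))
  "preallocated ballast, dropped on OOM so cleanup has room to run")

(defun release-ballast ()
  ;; dropping the reference lets the next GC reclaim ~1 MB,
  ;; giving the error handler room for temporary allocations
  (setf *ballast* nil))
```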

but these gotchas are not as bad as the glitches in CL implementations. one of the issues: an implementation can try too hard to free up memory before signalling OOM. for example, you have just 1 KB of memory left, and your application generates lots of garbage. you will fill that 1 KB very quickly, then the implementation does a full GC (which takes considerable time) and reclaims almost all of that 1 KB. then you fill it quickly again.. your application ends up spending 99.9% of its time doing garbage collection, which is almost as bad as system thrashing.

another problem is that many implementations will try to squeeze memory out to the last bit, and then they won't even be able to signal a condition, so they crash (or hang) the whole process instead. it also seems that garbage collection has nasty corner cases when very little memory is left, so an implementation can die during GC.

for my testing i've used a function like this:

(defun testmem (size)
  (ignore-errors
    (loop for i upfrom 1 collect (make-array size))))

the idea is to try different allocation sizes -- as the tests below show, allocating bigger chunks is less problematic than smaller ones (though in real-world situations smaller allocations (like conses) are more common). ignore-errors is there to avoid breaking into the debugger, as that can be problematic in a low-memory situation.

SBCL (1.0.13, on Debian GNU/Linux 4.0, x86)

i've limited the heap to 100 MB (sbcl --dynamic-space-size 100) and ran the test with a size of 10000. a real surprise is that ignore-errors did not catch the OOM condition; SBCL reported it like this:

Heap exhausted during allocation: 114688 bytes available, 400008 requested.

debugger invoked on a SB-KERNEL::HEAP-EXHAUSTED-ERROR in thread...

SBCL does not consider SB-KERNEL::HEAP-EXHAUSTED-ERROR to be a CL:ERROR, but it is possible to catch it with (handler-case ... (SB-KERNEL::HEAP-EXHAUSTED-ERROR ())). with this change to the testing code, it was indeed able to catch the error, but it crashed later during GC:

* (testmem 10000)
Heap exhausted during allocation: 8192 bytes available, 40008 requested.

NIL
* (room)

Dynamic space usage is:   102,380,208 bytes.
Read-only space usage is:      1,912 bytes.
...
Breakdown for dynamic space:
  79,359,744 bytes for    61,350 simple-vector objects.
...
* (sb-ext:gc :full t)
Heap exhausted during garbage collection: 0 bytes available, 984 requested.
...
fatal error encountered in SBCL pid 5634(tid 3085342400):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>

for a size of 100000 it was working fine (it successfully lived through a few cycles), but that is not really inspiring. on the other hand, a size of 1 produced an instant crash:

* (testmem 1)
Heap exhausted during garbage collection: 0 bytes available, 16 requested.

that sucks..

SBCL 1.0.12 with soft heap limit

as David Lichteblau has pointed out in a comment, he has made an experimental SBCL branch that handles memory allocation in a somewhat different way than mainline SBCL -- instead of reserving a huge chunk of memory upfront, it can allocate space incrementally, as needed. this allows treating the dynamic-space-size parameter as a soft heap limit rather than a hard one, and this makes a big difference -- it is possible to temporarily go beyond the limit in an emergency situation to do cleanup and garbage collection, and then return within the soft-limited bounds. here is an example transcript of an interactive session -- David shows how the user can choose to raise the limit in case of an OOM condition. that is a great feature for interactive development involving large amounts of data -- it allows the user to decide whether to continue a computation that demands an excessive amount of memory or to abort it. according to the description, in a non-interactive environment it will do the right thing automatically -- the limit won't be enforced during GC and error unwinding, thus allowing cleanup to run. it will still signal a condition, letting the application know about the out-of-memory situation.

this makes the version with a soft heap size limit pretty attractive for those who'd like to handle OOM conditions in a predictable way. however, one should take into account that the process can go over the limit while doing cleanup, so it makes sense to create a larger swap file to temporarily absorb excess allocation without triggering the OOM-killer.

CMUCL (19d, on Debian GNU/Linux 4.0 x86)

with CMUCL the situation was pretty similar to SBCL:

* (testmem 10000)
; [GC threshold exceeded with 12,054,816 bytes in use.  Commencing GC.]
; [GC completed with 11,952,672 bytes retained and 102,144 bytes freed.]
...
; [GC will next occur when at least 108,280,736 bytes are in use.]
*A2 gc_alloc_large failed, nbytes=40008.
 CMUCL has run out of dynamic heap space (100 MB).
  You can control heap size with the -dynamic-space-size commandline option.

Imminent dynamic space overflow has occurred:
Only a small amount of dynamic space is available now. Please note that you will be returned to the Top-Level without warning if you run out of space while debugging.

Heap (dynamic space) overflow
   [Condition of type KERNEL:HEAP-OVERFLOW]

Restarts:
  0: [ABORT] Return to Top-Level.

with handler-case it returned from the function, but then crashed in (room). testing with a size of 100000 revealed a pretty interesting thing:

; [GC completed with 24,070,808 bytes retained and -2,008 bytes freed.]
; [GC will next occur when at least 36,070,808 bytes are in use.]

that is, GC freed a negative amount of memory -- probably that's why it crashes during GC. even though it returned from the function, further operations were botched in CMUCL (even with a size of 1000000), and i gave up. with a size of 1, CMUCL hung.

Scieneer CL (1.3.8.1, on same Debian)

maybe the commercial offspring of CMUCL does better? let's see. (the -dynamic-space-size option is not documented, but it seems to work like in CMUCL.)

$ /opt/scl/bin/lisp -dynamic-space-size 100
...
* (testmem 100000)
Scieneer Common Lisp: Internal error, quitting.
Scieneer Common Lisp: Debug dump written to: /tmp/scl-debug-dump-5733.txt

$ cat /tmp/scl-debug-dump-5733.txt
Error event log:
Thread 0xB7C5BBB0: GC alloc_large failed, nbytes=400008.
Thread 0xB7C5BBB0: Lisp memory allocation failure.
...

i like how it created a really nice and detailed dump, but still this sucks. a size of 1 produced the same result. at least it did not hang..

Clozure Common Lisp (Version 1.2-r10552 on Ubuntu 8.04 x86-64)

Clozure CL has a --heap-reserve parameter, but it works in a really weird way. for example, with a reserve of 100 000 000, (room) says -976.359 MB reserved for heap expansion., that is, it is negative. and it does not joke -- testmem quickly signaled some meaningless internal error when that value was used. so i guess i have to adjust it somehow for some internal reservation. experimenting with the parameter, i found that with a reserve of 1 300 000 000 it says 132.375 MB reserved for heap expansion. ok, now let's test it..
nothing new -- with 100000 (and more) it enters the debugger and is able to recover (it did not hint at how to catch the error programmatically, though); with 10000 (and less) it hangs without making any progress (probably an endless GC loop).

CLISP (2.41 on Debian x86)

CLISP does not have a parameter to limit memory allocation, so i could only test OS mechanisms. in the default configuration, this went as expected -- the OOM-killer first killed a nearby SBCL process that was not allocating any memory at the time, and only then killed the CLISP that was allocating memory.

with ulimit, i could limit maximum virtual memory to 100 MB, and CLISP was able to detect the error:

[2]> (testmem 100000)
map memory to address 0x25a55000 .
[spvw_mmap.d:359] errno = ENOMEM: Not enough memory.
Trying to make room through a GC...
Cannot map memory to address 0x25a54000 .

*** - No more room for LISP objects
Break 1 [3]>

trying to use the debugger hung CLISP, making it constantly "try to make room through GC", but if i just instantly aborted, it recovered fine. with small memory allocations (a size of 1) it went into constant GC without first giving a chance in the debugger..

ECL (0.9i on Debian x86)

the situation with ECL is similar to CLISP -- it handled a size of 10000 just fine (and ignore-errors worked in ECL!), but a size of 1 knocked it out:

> (testmem 1)
GC Warning: Out of Memory!  Returning NIL!
GC Warning: Out of Memory!  Returning NIL!
Segmentation fault

UPDATE: as Juanjo writes in a comment, handling of out-of-memory conditions was improved in a new version of ECL, and it correctly handles the case with a size of 1 now.

ABCL (0.0.10.5 on Sun Java 1.5.0 on Windows XP x86)

the JVM-based ABCL successfully caught the out-of-memory condition, though it ignored the ignore-errors/handler-case constructs:

> java.exe -Xloggc:gclog.txt -server -Xmx100M -jar abcl.jar

CL-USER(2): (testmem 1)
Debugger invoked on condition of type ERROR:
  Out of memory.
Restarts:
  0: TOP-LEVEL Return to top level.

what's interesting is that it handled allocations of size 1 without problems, and the "stagnation" period only lasted 12 seconds, about 60% of the time. stagnation becomes a problem with a larger heap -- for example, it took a really long time with 300 MB -- but in this case an alternative GC algorithm comes to help: the parallel GC "throws an out-of-memory exception if an excessive amount of time is being spent collecting a small amount of the heap". it has configurable parameters, but the defaults were good enough for me.

Lispworks

Sean Ross has tested Lispworks on Ubuntu and Mac OS X and found out that it does the right thing:

CL-USER> (princ (nth-value 1 (testmem 10000))) ;; or (testmem 1)
<!>  Failed to enlarge memory
<!>  Failed to enlarge memory
<!>  Failed to enlarge memory
<!>  Failed to enlarge memory
<**> Failed to allocate object of size 9c48
Failed to allocate object of size 40008 bytes ;; <- that's our error
=> #<SIMPLE-ERROR 200906AB>

this worked both on Linux, where memory was limited via ulimit -v, and on Mac OS X, where it reached the "natural" memory limit

Conclusion

only a few implementations (the experimental SBCL version, Lispworks and ABCL) were able to handle out-of-memory conditions correctly (on small allocations); the majority either crashed or hung, which is sort of sad.

while some "real" Lisp implementations correctly handled the challenge, i still think JVM-based ABCL did it in slightly superior way -- it stayed within hard limits, and it is possible to configure how hard does it try before signalling OOM. (however for now i can't check that "real" Lisp GCs are actually worse)

P.S.: i know i used quite outdated versions, and i expect angry replies like "we've fixed it just yesterday, why didn't you test the CVS version??". so, please, if you think the situation is better than i described -- send me a version and a log that proves so, and i'll update the article

4 comments:

David Lichteblau said...

Hi Alex,

in the spirit of "we've fixed it just yesterday", I'd like to point out my experimental SBCL branch, which has incremental heap allocation (more friendly with a Linux that doesn't allow overcommit) and a soft heap limit (allowing run-away memory allocations to be caught in a reasonably safe way).

With my changes, --dynamic-space-size on the SBCL command line specifies a soft limit only, which can be enlarged if exhausted, allowing program execution to continue.

For a transcript of your test case see here:

http://www.lichteblau.com/tmp/soft-heap-limit.text

Patch and discussion are here:

http://article.gmane.org/gmane.lisp.steel-bank.devel/10506

nikodemus said...

Trivia: I believe that heap exhaustion and stack exhaustion should not be CL:ERRORs, but rather CL:STORAGE-CONDITIONs (subclass of CL:SERIOUS-CONDITION).

True, they could be both ERRORs and STORAGE-CONDITIONs, but that doesn't seem to add any more information of consequence, and making IGNORE-ERRORS handle heap exhaustion doesn't sound quite right either...

Juanjo said...

The next release of ECL will also feature detection and handling of out of memory conditions, using the facilities provided by the Boehm-Weiser garbage collector. The code is currently in an unstable branch of our GIT repository, but seems to work just nice.

$ ecl -norc
ECL (Embeddable Common-Lisp) 8.10.0 (CVS 2008-07-12 18:54)
Copyright (C) 1984 Taiichi Yuasa and Masami Hagiya
Copyright (C) 1993 Giuseppe Attardi
Copyright (C) 2000 Juan J. Garcia-Ripoll
ECL is free software, and you are welcome to redistribute it
under certain conditions; see file 'Copyright' for details.
Type :h for Help. Top level.
> (defun testmem (size) (ignore-errors (loop for i upfrom 1 collect (make-array size))))

TESTMEM
> (testmem 1)
GC Warning: Out of Memory! Returning NIL!
Memory limit reached. Please jump to an outer point or quit program.
Broken at SI:BYTECODES.No restarts available.
Broken at TESTMEM.
>> :q
Top level.
>

adam said...

Hi.
I have :
Using Lisp GNU Common Lisp (GCL) GCL 2.6.8 (aka GCL)
thru :
wxMaxima 0.7.6 http://wxmaxima.sourceforge.net
Maxima 5.16.3 http://maxima.sourceforge.net

I have tried :

==================
(%i1) to_lisp();
Type (to-maxima) to restart, ($quit) to quit Maxima.
MAXIMA> (room)
1637/2358 73.6% CONS RATIO COMPLEX STRUCTURE
74/345 11.8% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE READTABLE SPICE
204/404 91.2% SYMBOL STREAM
1/2 32.1% PACKAGE
90/373 89.3% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME CCLOSURE CLOSURE
268/317 99.9% STRING
305/325 7.7% CFUN BIGNUM LONG-FLOAT
30/115 98.2% SFUN GFUN VFUN AFUN CFDATA
998/1426 contiguous (541 blocks)
13013 hole
5242 1.7% relocatable
2609 pages for cells
21862 total pages
99303 pages available
9907 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
MAXIMA> (progn
(si::allocate 'cons 10000 t)
(si::allocate 'fixnum 200 t)
(si::allocate 'symbol 100 t)
(si::allocate-relocatable-pages 2000 t)
(si::allocate 'cfun 1000 t))
T
MAXIMA> (room)
10000/10000 12.1% CONS RATIO COMPLEX STRUCTURE
200/200 4.5% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE READTABLE SPICE
204/204 91.2% SYMBOL STREAM
1/2 32.1% PACKAGE
90/373 89.3% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME CCLOSURE CLOSURE
268/317 99.9% STRING
1000/1000 2.4% CFUN BIGNUM LONG-FLOAT
30/115 98.2% SFUN GFUN VFUN AFUN CFDATA
998/1426 contiguous (541 blocks)
3829 hole
2000 4.6% relocatable
11793 pages for cells
18620 total pages
96603 pages available
15849 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages
MAXIMA> (si::allocate 'string 2000 t)
T
MAXIMA> (room)
10000/10000 12.1% CONS RATIO COMPLEX STRUCTURE
200/200 4.6% FIXNUM SHORT-FLOAT CHARACTER RANDOM-STATE READTABLE SPICE
204/204 91.2% SYMBOL STREAM
1/2 32.1% PACKAGE
90/373 89.3% ARRAY HASH-TABLE VECTOR BIT-VECTOR PATHNAME CCLOSURE CLOSURE
2000/2000 13.4% STRING
1000/1000 2.4% CFUN BIGNUM LONG-FLOAT
30/115 98.2% SFUN GFUN VFUN AFUN CFDATA
998/1426 contiguous (541 blocks)
2097 hole
2000 4.6% relocatable
13525 pages for cells
18620 total pages
94871 pages available
17581 pages in heap but not gc'd + pages needed for gc marking
131072 maximum pages

======================

I have run the testmem function but I got no result (even for 1 it worked for some time with no effect, so I restarted).
How many seconds does it need?

Adam