These are my notes taken from Gil Tene's talk on Java garbage collection. It covers characteristics of garbage collectors and how the different garbage collectors in java compare.
Classifying Garbage Collectors
Here are common terms used to classify garbage collectors. Note that a garbage collector can be a combination of these.
- Concurrent
Collector performs garbage collection work concurrently with the application's own execution. - Parallel
Collector uses multiple CPUs to perform garbage collection work. - Stop-the-world
(Opposite of concurrent) Collector performs garbage collection while the application is completely stopped. - Monolithic
Collector does all the work in a single indivisible step. - Incremental
(Opposite of monolithic) Collector performs work as a series of smaller steps. - Mostly
Qualifier used to say something is sometimes not what it is (e.g., mostly concurrent collector).
Evaluating Garbage Collectors
- Throughput
How much time is spent doing actual application work vs garbage collection? - Latency
How does GC affect individual app operations? - Footprint
How much extra memory is needed for the GC? - Scalability
How do these metrics change as heap sizes decrease? ZGC for example works well with large heaps (terabytes)
What do all GCs do?
- Identify live objects in heap
- Reclaim resources held by dead objects
- Periodically relocate live objects
Garbage Collection Terms
- Live set
Objects in the heap that are still live (reachable). - Safepoint
Point in a thread's execution when the collector can identify all the references in that execution stack. - Bringing a thread to a safepoint
Act of getting a thread to reach a safepoint (but not past it). Generally done by stopping the thread, but JNI is an exception, since the thread can continue to run while in the safepoint. - Global Safepoint
All threads are in a safepoint. - Stop-the-world pause
A global safepoint.
Garbage Collection Types
Mark/Sweep/Compact Collector
Mark
- Start from roots (thread stacks, static, etc)
- Mark everything you find as live (basically a graph traversal)
- At the end, all reachable objects are marked live and all others are marked dead
- Work is generally linear to live-set
Sweep
- Scan the heap and track/free the dead objects
- Scan's the entire heap
Compact
- Over time, heap will get "swiss cheesed"
- The free space (holes) might not be large enough to accommodate new allocations (fragmentation)
- Compaction moves live objects together to reclaim contiguous empty space
- Has to fix pointers to ALL objects (when moving an object, all things that point to it also have to update their pointers)
- Work is linear to the live set - only the live objects need to be fixed
Copy Collector
- Heap region is divided into a "from" space and a "to" space, meaning this collector requires 2X max
- Moves all objects from a "from" space to a "to" space and reclaims the "from" space
- Copy root references to "from" space then follow with everything else
- At the end, all live objects are in "to" space and "from" space can be reclaimed
- Work is generally linear to live set
Generational Collector
- Weak generational hypothesis: most objects die young.
- Garbage collectors take advantage of the generation hypothesis by splitting the heap into a young and old generation, collecting the young generation more frequently.
- Live set in the young generation is a small percentage of space.
- Promote objects that live long enough to older generations.
- Only collect old generation as they fill up.
- Young generation usually uses a copy collector while old generation uses something like mark/sweep/compact.
- This results in a more efficient collector.
Java Garbage Collectors
Now that you know some theory for garbage collectors, here are the collectors you can pick from in Java.
Serial
A single threaded, low footprint collector.
Parallel
A collector that focuses on throughput. It stops the world on everything and uses multiple threads to collect.
Characteristics:
- Generational
- Young-gen: monolithic, stop-the-world copying
- Old-gen: Monolithic stop-the-world Mark/sweep/compact
G1
A collector that focuses on a balance of throughput and latency.
Characteristics:
- Generational
- Young-gen: monolithic, stop-the-world copying
- Splits heap into many regions
- Old-gen: mostly concurrent
- But also stops the world for compaction (compaction always stops the world)
- But it is incremental, so we can do this in increments
- Tries to avoid full GC, but sometimes has to do it
CMS
A collector that also focuses on a balance of throughput and latency, but was replaced by G1.
ZGC
A collector that focuses on low latency while also scaling to large heaps (multiple terabytes). While many GCs can pause for hundreds of milliseconds, ZGC pauses for at most 1ms!
Characteristics:
- Generational
- Concurrent
- Constant pause times
- Splits heap into many regions
- Parallel
- Compacting
- Simple to tune
Epsilon
A collector that does not actually do any garbage collection. It's very fast since it does not do anything, but your application will run out of memory.