Java in K8s: Stop Your Pods From Getting OOMKilled

We had this perfectly good Spring Boot service—running on JDK 21.0.4—that worked fine on my M2 MacBook. But the second we pushed it to our staging cluster? CrashLoopBackOff. Every. Single. Time.

No stack trace in the logs. Just a silent exit code 137 (that’s 128 + 9, a SIGKILL). If you’ve been doing this long enough, you know that number is the Kubernetes equivalent of “shut up and go away.” The OOMKiller had struck again.

Well, “struck” isn’t quite the right word. The JVM behaves like a gas: it expands to fill whatever container you put it in unless you smack it with specific flags. Even in 2026, with all the container-awareness improvements we’ve had since Java 10, the defaults can still wreck you.

The “Container Awareness” Lie

People tell you, “Don’t worry, the JVM is container-aware now.” Yeah, mostly. But it’s not magic. I was running these pods with a 1Gi memory limit. The JVM saw 1Gi and said, “Cool, I’ll take 25% of that for the heap by default.” That’s 256MB. Totally fine, right?

Wrong. Because we were using an old library that leaked native memory like a sieve. The heap was fine, but the off-heap usage blew past the container limit, and Kubernetes killed the pod before the JVM could even throw an OutOfMemoryError.
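
The frustrating part is that the standard heap view hides exactly this failure mode. Before touching the harness, it helps to log everything the JVM can actually account for: heap, non-heap, and the direct/mapped buffer pools exposed through the platform MXBeans. Here’s a minimal sketch of my own (the class name is made up, it’s not part of the service below); note that anything a native library mallocs on its own still won’t show up here, which is what -XX:NativeMemoryTracking=summary plus jcmd VM.native_memory is for.

package com.k8s.diagnostics;

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical startup logger: prints the memory the JVM itself can account for.
// Native allocations made inside JNI libraries are invisible here - use NMT for those.
public class MemoryVisibilityLogger {

    public static void logWhatTheJvmSees() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        System.out.printf("Heap:     used=%d max=%d%n",
            memory.getHeapMemoryUsage().getUsed(),
            memory.getHeapMemoryUsage().getMax());
        System.out.printf("Non-heap: used=%d%n",
            memory.getNonHeapMemoryUsage().getUsed());

        // Direct and mapped byte buffers (NIO) - a common source of off-heap growth
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("Buffer pool '%s': used=%d%n", pool.getName(), pool.getMemoryUsed());
        }
    }
}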

So, I stopped guessing and wrote a quick diagnostic harness. I needed to know exactly what the threads and memory were doing inside the pod right before it died.

Building a Safety Valve

Instead of relying on external APM tools, which can lag by a minute or two, I like to embed a lightweight diagnostic interface directly in the app. It’s a simple way to pull a thread dump or memory stats programmatically when things look weird.

package com.k8s.diagnostics;

import java.util.Map;

public interface ClusterDiagnostic {
    /**
     * Checks if the current memory usage exceeds the safety threshold.
     * @param thresholdPercent double representing percentage (0.0 to 1.0)
     * @return true if we are in the danger zone
     */
    boolean isMemoryCritical(double thresholdPercent);

    /**
     * Generates a report of currently stuck threads.
     * @return A map of Thread names to their stack trace depth
     */
    Map<String, Integer> analyzeStuckThreads();
    
    void triggerJfrRecording(int durationSeconds);
}

Now, the implementation. This is where the Stream API saves my sanity. Parsing thread dumps manually is a nightmare, so I filter them on the fly. This runs inside the container, so it has zero network latency when querying the JVM.

package com.k8s.diagnostics;

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class K8sDiagnosticService implements ClusterDiagnostic {

    private final MemoryMXBean memoryBean;
    private final ThreadMXBean threadBean;

    public K8sDiagnosticService() {
        this.memoryBean = ManagementFactory.getMemoryMXBean();
        this.threadBean = ManagementFactory.getThreadMXBean();
    }

    @Override
    public boolean isMemoryCritical(double thresholdPercent) {
        var heapUsage = memoryBean.getHeapMemoryUsage();
        long used = heapUsage.getUsed();
        long max = heapUsage.getMax();

        // If max is -1 (undefined), we can't calculate a percentage
        if (max == -1) return false;

        double usage = (double) used / max;
        // Simple logging - in real life, use SLF4J
        if (usage > thresholdPercent) {
            System.out.printf("WARNING: Heap usage at %.2f%% (Used: %d, Max: %d)%n",
                usage * 100, used, max);
        }
        return usage > thresholdPercent;
    }

    @Override
    public Map<String, Integer> analyzeStuckThreads() {
        // Grab all thread IDs
        long[] threadIds = threadBean.getAllThreadIds();
        
        // Get info for all threads, max 10 frames deep
        ThreadInfo[] threadInfos = threadBean.getThreadInfo(threadIds, 10);

        return Arrays.stream(threadInfos)
            .filter(info -> info != null) // Safety check
            .filter(info -> info.getThreadState() == Thread.State.BLOCKED 
                         || info.getThreadState() == Thread.State.WAITING)
            .collect(Collectors.toMap(
                ThreadInfo::getThreadName,
                info -> info.getStackTrace().length,
                (v1, v2) -> v1 // Merge function in case of duplicate names (rare)
            ));
    }

    @Override
    public void triggerJfrRecording(int durationSeconds) {
        // In a real app, you'd start a jdk.jfr.Recording here (see the sketch below)
        // But for this snippet, let's just log the intent
        System.out.println("Starting JFR recording for " + durationSeconds + " seconds...");
    }
}
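
That last method is obviously a stub. If you want recordings to actually happen in-process, the jdk.jfr API can do it without attaching an agent. Here’s a minimal sketch, assuming /tmp is writable in your container image and that the built-in “profile” configuration is an acceptable overhead:

package com.k8s.diagnostics;

import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;

import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Sketch of an in-process JFR trigger. The dump path and configuration name are assumptions.
public class JfrTrigger {

    public void triggerJfrRecording(int durationSeconds) {
        try {
            // "profile" collects more events than "default"; swap it if the overhead worries you
            Recording recording = new Recording(Configuration.getConfiguration("profile"));
            recording.setDuration(Duration.ofSeconds(durationSeconds));
            recording.setToDisk(true);
            // Dump somewhere you can reach with kubectl cp afterwards
            recording.setDestination(Path.of("/tmp/diag-" + System.currentTimeMillis() + ".jfr"));
            recording.start(); // stops and writes the file automatically when the duration elapses
        } catch (IOException | ParseException e) {
            System.err.println("Could not start JFR recording: " + e.getMessage());
        }
    }
}

Once the file lands in /tmp, kubectl cp pulls it out of the pod and JDK Mission Control will open it.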

Why Streams Matter Here

Notice the analyzeStuckThreads method? I’m using a stream to filter for BLOCKED or WAITING threads. In a Kubernetes pod with 200+ threads (common with Tomcat or Jetty), dumping everything to the logs is useless noise. You only care about what’s stuck.

I usually hook this up to a simple REST endpoint that I can curl from inside the cluster using kubectl exec. It’s crude, but when your monitoring tools are down or you can’t get a remote debugger attached through the firewall, it’s a lifesaver.
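
For completeness, the endpoint really is that crude. A sketch assuming Spring Boot MVC is already on the classpath; the paths and the 0.85 threshold are my own picks, not anything sacred:

package com.k8s.diagnostics;

import java.util.Map;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical wiring: exposes the diagnostic service on internal-only paths.
@RestController
public class DiagnosticsController {

    private final ClusterDiagnostic diagnostics = new K8sDiagnosticService();

    // kubectl exec -it <pod> -- curl -s localhost:8080/internal/diag/stuck-threads
    @GetMapping("/internal/diag/stuck-threads")
    public Map<String, Integer> stuckThreads() {
        return diagnostics.analyzeStuckThreads();
    }

    // Returns true once heap usage crosses 85% of max
    @GetMapping("/internal/diag/memory-critical")
    public boolean memoryCritical() {
        return diagnostics.isMemoryCritical(0.85);
    }
}

Keep it off the public ingress, obviously. This is for in-cluster poking only.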

The CPU Throttling Trap

Another thing that bit me recently: Runtime.getRuntime().availableProcessors().

I had a pod with a CPU limit of 500m (half a core). But the node was a beefy 64-core AWS instance. Java saw 64 cores, so the ForkJoinPool.commonPool() sized itself for 63 worker threads (one less than the reported core count).

When the app tried to do parallel stream processing, it spawned dozens of threads, all fighting for that tiny 0.5 CPU slice. The context switching overhead was insane. Performance tanked—requests that took 200ms locally were taking 4 seconds in the cluster.
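
You can catch this mismatch at boot with two log lines. A quick sketch (the class name is hypothetical) that prints what the JVM believes it has to work with:

package com.k8s.diagnostics;

import java.util.concurrent.ForkJoinPool;

// Hypothetical boot-time check: compare these numbers against your pod's CPU limit.
public class CpuVisibilityLogger {

    public static void logWhatTheJvmSees() {
        System.out.println("availableProcessors    = " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
    }
}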

I actually benchmarked this last month, running a parallel stream sort on a list of 1 million integers.

  • Scenario A: 64-core node, 500m limit, default JVM settings.
    Result: 3.2 seconds (Heavy throttling, high context switch).
  • Scenario B: Same node, same limit, but I forced -XX:ActiveProcessorCount=1.
    Result: 890 milliseconds.

That is nearly a 4x improvement just by telling Java “Hey, you aren’t as strong as you think you are.” If you are running small pods, set that flag. Don’t trust the auto-detection blindly, especially if you use strict CPU limits.
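
For the record, the benchmark itself was nothing fancy; it was roughly this shape (plain wall-clock timing, no JMH, so treat the numbers above as directional rather than lab-grade):

package com.k8s.diagnostics;

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Rough shape of the parallel-sort benchmark described above.
public class ParallelSortBench {

    public static void main(String[] args) {
        List<Integer> data = ThreadLocalRandom.current()
            .ints(1_000_000)
            .boxed()
            .toList();

        long start = System.nanoTime();
        List<Integer> sorted = data.parallelStream().sorted().toList();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Sorted " + sorted.size() + " ints in " + elapsedMs + " ms");
    }
}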

Final Thoughts

Kubernetes is great, but it introduces a layer of abstraction that can hide basic resource constraints. You can’t just throw a jar in a Dockerfile and hope for the best anymore. You need to actively monitor how your JVM perceives its environment.

And seriously, write a little diagnostic harness like the one above. It beats guessing every time.