I still have nightmares about JNI. Actually, let me back up—back in 2018, I had to wrap a C library for a high-frequency trading platform. I spent three weeks writing boilerplate C++, fighting with javac -h, and debugging segfaults that would silently crash the entire JVM without a stack trace. It was miserable. I swore I’d never touch native interop again unless my job depended on it.
Well, it’s 2026, and I’m breaking that promise. But this time, it’s actually… sane. If you’ve been ignoring Project Panama because you thought “I don’t need native access,” you might want to reconsider. The Foreign Function & Memory (FFM) API was finalized in JDK 22 and has had time to mature by JDK 25, so we can finally talk to the OS directly, specifically Linux’s io_uring, without writing a single line of C. And the performance difference isn’t just a rounding error. It’s ridiculous.
Why NIO Isn’t Enough Anymore
Don’t get me wrong, Java NIO (New I/O) is fine. Selector and ByteBuffer have served us well for decades. Netty built an empire on them. But under the hood, the JDK’s implementation on Linux still largely relies on epoll. And that requires syscalls. Lots of them. You make a syscall to add a file descriptor, another to wait for events, another to read data. Every one of those user/kernel transitions costs CPU cycles.
Then there’s io_uring. It sets up shared memory ring buffers between the kernel and your application. You push a request onto the submission queue, the kernel picks it up, does the work, and drops the result in the completion queue. You can batch many operations behind a single io_uring_enter call, so you no longer pay a syscall for every single operation. Until recently, accessing this from Java meant using JNI or JNA, which added enough overhead to negate much of the benefit. Now, we can map those ring buffers directly into Java’s memory space using the FFM API.
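If the ring idea is new to you, here is a toy model of it in plain Java. To be clear, this is not the real io_uring ABI: ToyRing and fakeKernel are names I made up for illustration, everything runs on one thread, and the real rings rely on memory barriers that this sketch ignores. It only shows the head/tail bookkeeping and why one hand-off can cover many queued requests.

```java
import java.util.OptionalLong;

/** Toy model of the submission/completion idea; NOT the real io_uring ABI. */
public class ToyRing {
    private final long[] slots;
    private final int mask;   // capacity is a power of two, so (counter & mask) wraps the index
    private int head;         // consumer position
    private int tail;         // producer position

    public ToyRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer side: returns false when the ring is full. */
    public boolean offer(long value) {
        if (tail - head == slots.length) return false;
        slots[tail & mask] = value;
        tail++; // the real ring publishes the new tail with a release store
        return true;
    }

    /** Consumer side: empty when head has caught up with tail. */
    public OptionalLong poll() {
        if (head == tail) return OptionalLong.empty();
        long value = slots[head & mask];
        head++;
        return OptionalLong.of(value);
    }

    /** Stand-in for the kernel: drain submissions, post one completion each. */
    public static int fakeKernel(ToyRing sq, ToyRing cq) {
        int handled = 0;
        for (OptionalLong req = sq.poll(); req.isPresent(); req = sq.poll()) {
            cq.offer(req.getAsLong() * 10); // pretend result
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        ToyRing sq = new ToyRing(4);
        ToyRing cq = new ToyRing(4);
        sq.offer(1); sq.offer(2); sq.offer(3);  // queue three requests, no hand-off yet
        int handled = fakeKernel(sq, cq);       // one hand-off covers all three
        System.out.println("handled = " + handled);
        for (OptionalLong c = cq.poll(); c.isPresent(); c = cq.poll()) {
            System.out.println("completion = " + c.getAsLong());
        }
    }
}
```

The power-of-two capacity is what lets the index wrap with a cheap bit mask instead of a modulo, which is the same trick the real rings use.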
The Panama Way: Structs in Java

The coolest part of the FFM API is how we define C structs. We don’t write C. We write a MemoryLayout. Here is what an io_uring submission queue entry (SQE) looks like in modern Java. I wrote this snippet last Tuesday while testing on a Fedora 41 VM running kernel 6.12:
    import java.lang.foreign.*;
    import java.lang.foreign.MemoryLayout.PathElement;
    import java.lang.invoke.VarHandle;
    import static java.lang.foreign.ValueLayout.*;

    public class IoUringLayouts {
        // Mirrors the first 40 bytes of the C struct io_uring_sqe field by field;
        // the trailing fields (buf_index, personality, ...) are collapsed into padding
        public static final GroupLayout SQE_LAYOUT = MemoryLayout.structLayout(
            JAVA_BYTE.withName("opcode"),      // offset 0
            JAVA_BYTE.withName("flags"),       // offset 1
            JAVA_SHORT.withName("ioprio"),     // offset 2
            JAVA_INT.withName("fd"),           // offset 4
            JAVA_LONG.withName("off"),         // offset 8
            JAVA_LONG.withName("addr"),        // offset 16
            JAVA_INT.withName("len"),          // offset 24
            // union of per-opcode flags (rw_flags, fsync_flags, ...)
            MemoryLayout.unionLayout(
                JAVA_INT.withName("rw_flags"),
                JAVA_INT.withName("fsync_flags")
            ).withName("op_flags"),            // offset 28
            JAVA_LONG.withName("user_data"),   // offset 32
            // 3 x 8 bytes of padding brings the total to the kernel's 64 bytes
            MemoryLayout.sequenceLayout(3, JAVA_LONG).withName("pad")
        ).withName("io_uring_sqe");

        // VarHandle for fast access to the 'opcode' field
        private static final VarHandle OPCODE_HANDLE =
            SQE_LAYOUT.varHandle(PathElement.groupElement("opcode"));

        public static void setOpcode(MemorySegment sqe, byte opcode) {
            // Layout-derived VarHandles take (segment, base offset, value)
            OPCODE_HANDLE.set(sqe, 0L, opcode);
        }
    }
See that? No native library compilation. That layout is the struct. The JVM knows exactly how to read and write those bytes to off-heap memory that the Linux kernel reads directly.
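Because the layout is the struct, it pays to let the layout report its own size and offsets and compare them against the kernel header before trusting it. Here is a minimal sketch of that check. SqeLayoutCheck and offsetOf are hypothetical names, and the class carries its own copy of the layout, padded so it totals the 64 bytes the kernel expects for an SQE.

```java
import java.lang.foreign.GroupLayout;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import static java.lang.foreign.ValueLayout.*;

public class SqeLayoutCheck {
    // Local copy of the SQE layout; trailing kernel fields are folded into padding
    public static final GroupLayout SQE = MemoryLayout.structLayout(
        JAVA_BYTE.withName("opcode"),
        JAVA_BYTE.withName("flags"),
        JAVA_SHORT.withName("ioprio"),
        JAVA_INT.withName("fd"),
        JAVA_LONG.withName("off"),
        JAVA_LONG.withName("addr"),
        JAVA_INT.withName("len"),
        MemoryLayout.unionLayout(
            JAVA_INT.withName("rw_flags"),
            JAVA_INT.withName("fsync_flags")
        ).withName("op_flags"),
        JAVA_LONG.withName("user_data"),
        MemoryLayout.sequenceLayout(3, JAVA_LONG).withName("pad")
    ).withName("io_uring_sqe");

    /** Byte offset of a top-level field, computed by the layout itself. */
    public static long offsetOf(String field) {
        return SQE.byteOffset(PathElement.groupElement(field));
    }

    public static void main(String[] args) {
        System.out.println("size      = " + SQE.byteSize());
        for (String f : new String[] {"opcode", "fd", "off", "addr", "len", "user_data"}) {
            System.out.println(f + " @ " + offsetOf(f));
        }
        // Fail fast if the layout ever drifts from the 64-byte kernel struct
        if (SQE.byteSize() != 64) {
            throw new IllegalStateException("SQE layout is " + SQE.byteSize() + " bytes, expected 64");
        }
    }
}
```

For this layout the check reports a size of 64 and offsets 4, 8, 16 and 24 for fd, off, addr and len. Asking the layout for offsets, rather than hardcoding numbers at every call site, means a mistake shows up once at startup instead of as silent memory corruption later.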
Memory Segments and Arenas
The old Unsafe API was… well, unsafe. You allocated memory and prayed you remembered to free it. The FFM API introduces MemorySegment and Arena, which is basically a scope that owns the memory and controls its lifetime. When I was prototyping a simple file reader, I used a confined arena (Arena.ofConfined()). It’s fast because only the owning thread may access it, so liveness checks don’t need cross-thread synchronization.
    public void submitReadRequest(int fd, long bufferAddress, int length, long offset) {
        // try-with-resources frees the memory deterministically when the scope ends
        try (Arena arena = Arena.ofConfined()) {
            // Allocate a zeroed off-heap block for the SQE
            MemorySegment sqe = arena.allocate(IoUringLayouts.SQE_LAYOUT);
            // Fill the struct
            IoUringLayouts.setOpcode(sqe, (byte) 22); // IORING_OP_READ
            // Plain offset-based setters work for the other fields too
            sqe.set(JAVA_INT, 4, fd);              // offset 4 is 'fd'
            sqe.set(JAVA_LONG, 8, offset);         // offset 8 is 'off'
            sqe.set(JAVA_LONG, 16, bufferAddress); // offset 16 is 'addr'
            sqe.set(JAVA_INT, 24, length);         // offset 24 is 'len'
            // In a real app, the SQE would live in the mmap'd submission ring and
            // you'd now call the submission syscall (io_uring_enter) using a Linker
            submitToKernel(sqe);
        }
    }
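The "using a Linker" step deserves a quick illustration. The sketch below binds a harmless libc function, getpid, as a stand-in so it can run anywhere with a C library; DowncallSketch and nativePid are hypothetical names. Binding the actual io_uring entry point would follow the same pattern with a different symbol and FunctionDescriptor, for example through libc's syscall function or liburing, which is an assumption on my part rather than something shown here.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import static java.lang.foreign.ValueLayout.JAVA_INT;

public class DowncallSketch {
    /** Calls libc's getpid() through an FFM downcall handle. */
    public static int nativePid() {
        Linker linker = Linker.nativeLinker();
        // Look the symbol up in the platform's default libraries (libc on Linux)
        MemorySegment symbol = linker.defaultLookup().find("getpid").orElseThrow();
        // C signature: int getpid(void)  ->  returns JAVA_INT, takes no arguments
        MethodHandle getpid = linker.downcallHandle(symbol, FunctionDescriptor.of(JAVA_INT));
        try {
            return (int) getpid.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println("pid via libc downcall: " + nativePid());
        System.out.println("pid via ProcessHandle: " + ProcessHandle.current().pid());
    }
}
```

In real code you would create the MethodHandle once and keep it in a static final field, since building it is the expensive part and a constant handle is what lets the JIT optimize the call. Recent JDKs may also print a warning about restricted methods unless you pass --enable-native-access.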
This code runs dangerously close to C speeds. The VarHandle optimizations in JDK 25 are insane. As long as the layouts and VarHandles are static final constants, the JIT compiler can inline these memory access operations into what are essentially direct CPU instructions.
The “Oh Crap” Moment: Thread Safety
Here’s where I hit a wall. In my first benchmark, I tried to share a MemorySegment across threads in a standard ForkJoinPool. Boom. WrongThreadException.
If you allocate memory with Arena.ofConfined(), it belongs to that thread. Period. If you want to pass buffers between your I/O loop and your worker threads (which you definitely do in an async server), you have to use Arena.ofShared(). But, and this is the kicker, a shared arena is not cleaned up by the Garbage Collector. That is Arena.ofAuto(), which gives up deterministic release. A shared arena has to be closed explicitly, and you have to coordinate that close so no thread is still touching the buffer. I ended up implementing a reference-counting mechanism just to manage the lifecycle of these buffers without leaking memory. It felt very 1998.
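I can’t paste the real thing, but here is a minimal sketch of the idea, under my own assumptions: RefCountedBuffer and its retain/release methods are hypothetical names, there is one shared arena per buffer, and the arena closes when the last holder lets go. The confinedRejectsOtherThread helper just reproduces the WrongThreadException for comparison.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

public final class RefCountedBuffer {
    private final Arena arena = Arena.ofShared();       // accessible from any thread
    private final MemorySegment segment;
    private final AtomicInteger refs = new AtomicInteger(1); // creator holds one reference

    public RefCountedBuffer(long byteSize) {
        this.segment = arena.allocate(byteSize);
    }

    public MemorySegment segment() { return segment; }

    /** Take an extra reference before handing the buffer to another thread. */
    public RefCountedBuffer retain() {
        int current;
        do {
            current = refs.get();
            if (current <= 0) throw new IllegalStateException("buffer already released");
        } while (!refs.compareAndSet(current, current + 1)); // never resurrect a dead buffer
        return this;
    }

    /** Drop one reference; the last release closes the arena and frees the memory. */
    public void release() {
        int remaining = refs.decrementAndGet();
        if (remaining == 0) arena.close();
        else if (remaining < 0) throw new IllegalStateException("released too many times");
    }

    public boolean isAlive() { return arena.scope().isAlive(); }

    /** Shows the failure mode: a confined segment touched from a second thread. */
    public static boolean confinedRejectsOtherThread() {
        AtomicBoolean rejected = new AtomicBoolean();
        try (Arena confined = Arena.ofConfined()) {
            MemorySegment seg = confined.allocate(8);
            Thread t = new Thread(() -> {
                try { seg.set(JAVA_BYTE, 0, (byte) 1); }
                catch (WrongThreadException e) { rejected.set(true); }
            });
            t.start();
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rejected.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RefCountedBuffer buf = new RefCountedBuffer(16);
        RefCountedBuffer forWorker = buf.retain();       // I/O loop hands a reference over
        Thread worker = new Thread(() -> {
            forWorker.segment().set(JAVA_BYTE, 0, (byte) 42); // fine: shared arena
            forWorker.release();
        });
        worker.start();
        worker.join();
        System.out.println("byte written by worker: " + buf.segment().get(JAVA_BYTE, 0));
        buf.release();                                   // last reference, arena closes
        System.out.println("alive after last release: " + buf.isAlive());
        System.out.println("confined rejected other thread: " + confinedRejectsOtherThread());
    }
}
```

The rule that makes this work is boring but strict: whoever passes the buffer on calls retain() first, and whoever finishes with it calls release() exactly once.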
Real Numbers: Is It Worth It?
I ran a test on my local machine (Ubuntu 24.04 LTS, 32GB RAM, Ryzen 7 5800X). I set up a simple echo server. One version used standard Java NIO ServerSocketChannel. The other used my hacked-together Panama/io_uring wrapper.
I blasted both with wrk for 60 seconds.
- Java NIO: ~145,000 requests/sec. CPU usage was around 65%.
- Panama + io_uring: ~380,000 requests/sec. CPU usage dropped to 40%.
That is not a typo. More than double the throughput with less CPU. The lack of context switching is the real hero here. When you aren’t paying the tax of crossing the user/kernel boundary thousands of times a second, the JVM can actually breathe.
A Warning for the Brave
Before you go rewriting your entire backend, take a breath. This stuff is low-level. If you mess up the memory layout offsets, you won’t get a nice Java exception; you might corrupt your process
Common questions
How does Java’s FFM API in JDK 25 replace JNI for calling native code?
The Foreign Function & Memory API, finalized in JDK 22 and mature in JDK 25, lets Java call native code and talk to the OS directly without writing C glue code or compiling native libraries. You define C structs using MemoryLayout, access fields via VarHandle, and map kernel memory into Java’s space. It eliminates the JNI boilerplate and javac -h headaches of older native interop, although a wrong layout offset can still corrupt process memory.
Why is io_uring faster than epoll for Java network I/O?
io_uring uses a shared memory ring buffer between the kernel and application, avoiding per-operation syscalls. You push requests onto a submission queue, the kernel processes them, and drops results into a completion queue. epoll, which underlies Java NIO on Linux, requires syscalls to add file descriptors, wait for events, and read data, incurring context-switch costs on every operation.
What performance gain does Panama plus io_uring give over Java NIO?
In an echo server benchmark on Ubuntu 24.04 with a Ryzen 7 5800X, standard Java NIO handled about 145,000 requests per second at 65% CPU. A Panama plus io_uring implementation reached roughly 380,000 requests per second while CPU usage dropped to 40%. The author attributes the gain to eliminating context switches across the user/kernel boundary.
How do you share a MemorySegment across threads without WrongThreadException?
Memory allocated with Arena.ofConfined() belongs to a single thread, so passing it into a ForkJoinPool throws WrongThreadException. To share buffers between an I/O loop and worker threads, use Arena.ofShared() instead. However, shared arenas are not cleaned up by the Garbage Collector (that is Arena.ofAuto()); they must be closed explicitly once no thread is using them, so the author implemented reference counting to coordinate buffer lifecycles and avoid leaks.
