Java Collections: You’re Probably Using Them Wrong

I still remember the first time I crashed a production server with a List. It wasn't even a complicated piece of code. I was just storing IDs (millions of them) in memory for a quick reconciliation job. It worked fine on my local machine with a small dataset. But the moment it hit real-world data? OutOfMemoryError. Boom.

The problem wasn’t the logic. It was my naive assumption that an integer in a collection costs the same as a primitive int. It doesn’t. Not even close.

Java Collections are the bread and butter of our daily work. We type List<String> list = new ArrayList<>(); without thinking. It’s muscle memory. But after digging through heap dumps and fighting garbage collection pauses for the last decade, I’ve realized that treating collections as “magic bags” to hold data is the fastest way to write slow, bloated software.

The Wrapper Class Tax

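Here's the thing that trips up almost everyone when they start looking at performance. Java generics only accept reference types as type arguments, and they are implemented with type erasure: at runtime, an ArrayList<Integer> is just an ArrayList holding Object references. That means you can't have an ArrayList<int>. You have to use ArrayList<Integer>.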
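A minimal sketch of what that means in practice (the class name ErasureDemo and the helper storedType are made up for illustration): the commented-out line would not compile, and at runtime a List<String> and a List<Integer> share the very same ArrayList class.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Returns the runtime class of the element the list actually stores at index 0
    static Class<?> storedType(List<?> list) {
        return list.get(0).getClass();
    }

    public static void main(String[] args) {
        // List<int> nums = new ArrayList<>();  // won't compile: type arguments must be reference types

        List<String> strings = new ArrayList<>();
        List<Integer> integers = new ArrayList<>();

        // Erasure: at runtime both are plain ArrayList, so they share one Class object
        System.out.println(strings.getClass() == integers.getClass()); // true

        integers.add(42); // autoboxing: the int literal is wrapped in an Integer object
        System.out.println(storedType(integers).getName()); // java.lang.Integer
    }
}
```

So every int you put into a generic collection has to travel as an Integer object.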

This seems harmless until you do the math.

  • A primitive int takes 4 bytes.
  • An Integer object? You're looking at a 12-16 byte object header plus the 4-byte payload, padded to a multiple of 8 bytes, plus a 4-8 byte reference to it in the list's backing array.

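On a typical 64-bit JVM with compressed references, that works out to roughly 20 bytes per element instead of 4, and up to 32 bytes without compressed references: five to eight times the memory, just to wrap a number in an object so it can sit inside a generic collection. This is where autoboxing (the automatic conversion between primitives and their wrappers) becomes a silent killer.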
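To make the math concrete, here is a back-of-the-envelope calculator. It assumes a 64-bit HotSpot JVM with compressed references and compressed class pointers (12-byte header, 4-byte references, 8-byte alignment), which is the common default for heaps under about 32 GB. BoxingCost and its constants are hypothetical names, and the estimate deliberately ignores array headers, spare ArrayList capacity, and the small range of cached Integer values.

```java
public class BoxingCost {
    // ASSUMPTION: 64-bit HotSpot, compressed oops + compressed class pointers.
    // Other JVMs or flags produce different numbers.
    static final int HEADER_BYTES = 12;
    static final int INT_BYTES = 4;
    static final int REFERENCE_BYTES = 4;
    static final int ALIGNMENT = 8;

    // Objects are padded up to the next multiple of ALIGNMENT
    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Per element of an ArrayList<Integer>: one Integer object plus one reference slot.
    // Ignores array headers, unused capacity, and the shared Integer cache (-128..127).
    static long boxedBytes(long n) {
        return n * (align(HEADER_BYTES + INT_BYTES) + REFERENCE_BYTES);
    }

    // Per element of an int[]: just the 4-byte value
    static long primitiveBytes(long n) {
        return n * INT_BYTES;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("ArrayList<Integer>: ~" + boxedBytes(n) + " bytes");   // ~20000000
        System.out.println("int[]:              ~" + primitiveBytes(n) + " bytes"); // ~4000000
    }
}
```

For a million IDs, that is roughly 20 MB versus 4 MB before the garbage collector even enters the picture.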

Check this out:

import java.util.ArrayList;
import java.util.List;

// The "Easy" Way - High Memory Overhead
List<Integer> heavyList = new ArrayList<>();
for (int i = 0; i < 1_000_000; i++) {
    heavyList.add(i); // Autoboxing happens here: int -> Integer (a new object outside the cached -128..127 range)
}

// The Efficient Way (using arrays or primitive streams)
int[] rawArray = new int[1_000_000];
for (int i = 0; i < 1_000_000; i++) {
    rawArray[i] = i; // No object creation: each value is 4 bytes in one contiguous block
}
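Primitive streams are the other allocation-free option: IntStream lets you build and aggregate numeric data without ever creating Integer objects. A small sketch (PrimitiveStreams is just an illustrative wrapper) using IntStream.range, toArray, and asLongStream from the standard library:

```java
import java.util.stream.IntStream;

public class PrimitiveStreams {
    // Builds 0..n-1 directly into an int[]; nothing is boxed along the way
    static int[] range(int n) {
        return IntStream.range(0, n).toArray();
    }

    // Widened to long so large sums don't overflow int; still no Integer objects
    static long sumAsLong(int[] values) {
        return IntStream.of(values).asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] values = range(1_000_000);
        System.out.println(values.length);     // 1000000
        System.out.println(sumAsLong(values)); // 499999500000
    }
}
```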

If you’re dealing with massive datasets of numbers, standard Java Collections might not be your friend. I usually reach for libraries like fastutil or Eclipse Collections in these scenarios, or just stick to primitive arrays if the logic permits. Don’t let the convenience of List.add() blind you to the heap usage.
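Libraries like fastutil and Eclipse Collections ship primitive-specialized lists, but their actual APIs are not reproduced here. Instead, a minimal sketch of the core idea, assuming nothing beyond the JDK: a hypothetical IntList that wraps a growable int[] so add() never boxes.

```java
import java.util.Arrays;

// Hypothetical minimal primitive list: a growable int[] with no boxing.
// Not the fastutil or Eclipse Collections API, just the idea they build on.
public class IntList {
    private int[] data;
    private int size;

    public IntList() {
        this(10);
    }

    public IntList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    public void add(int value) {
        if (size == data.length) {
            // Grow by roughly 1.5x, similar in spirit to ArrayList
            data = Arrays.copyOf(data, data.length + (data.length >> 1) + 1);
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IntList ids = new IntList();
        for (int i = 0; i < 1_000_000; i++) {
            ids.add(i); // stays a primitive int: no Integer allocated
        }
        System.out.println(ids.size() + " " + ids.get(999_999)); // 1000000 999999
    }
}
```

A real library adds much more (iterators, removal, bulk operations, primitive maps and sets), which is the main reason to use one rather than maintain a class like this yourself.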


ArrayList vs. LinkedList: The Old Debate

I have a strong opinion here: Stop using LinkedList.

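University textbooks love to teach that LinkedList is faster for insertions because you just change a pointer. In theory? Yes, provided you're already standing at the insertion point; reaching it by index means walking the list node by node, which is O(n). In practice, on modern hardware? LinkedList almost never wins.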

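Modern CPUs are obsessed with cache locality. They want to read memory in predictable, contiguous blocks. An ArrayList is backed by an array: one solid block of memory. When the CPU reads index 0, it likely pulls index 1, 2, and 3 into the cache along with it, because they sit in the same cache line. (With ArrayList<Integer>, that block holds references; the Integer objects themselves live elsewhere on the heap, which is one more point in favor of primitive arrays.)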

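A LinkedList, on the other hand, is a scattering of nodes all over the heap, each one a separate object with its own header and prev/next pointers. Traversing it is a cache-miss nightmare. Unless you are constantly adding or removing at the head of the list, ArrayList usually wins simply because the CPU isn't waiting on RAM. And if head operations are the whole point, ArrayDeque is usually the better fit anyway.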
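If the head of the collection is where all the action is, java.util.ArrayDeque (a resizable-array Deque) covers that case without allocating a node object per element. A small sketch, with HeadOperations and recentFirst as illustrative names, that keeps the newest event at the front:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class HeadOperations {
    // ArrayDeque is backed by a resizable circular array: addFirst/pollFirst are
    // amortized O(1) and don't allocate a Node per element the way LinkedList does.
    static Deque<String> recentFirst(String... events) {
        Deque<String> deque = new ArrayDeque<>();
        for (String e : events) {
            deque.addFirst(e); // newest goes to the head
        }
        return deque;
    }

    public static void main(String[] args) {
        Deque<String> deque = recentFirst("login", "view", "logout");
        System.out.println(deque);             // [logout, view, login]
        System.out.println(deque.pollFirst()); // logout
    }
}
```

Two differences to keep in mind: ArrayDeque rejects null elements, and it has no index-based get(), so it replaces LinkedList only where you were using it as a stack, queue, or deque.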

The HashMap Contract

The most dangerous bug I ever hunted down involved a custom object used as a key in a HashMap. The code looked fine, but values kept “disappearing” from the map. We’d put them in, but get() would return null.

The culprit? A mutable field inside the key object.

When you put an object into a HashMap, Java calculates its bucket location using hashCode(). If you modify the object later in a way that changes its hash code, the map can no longer find it. It’s looking in the wrong bucket.
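Here's a minimal reproduction of that failure, using a hypothetical MutableKey class whose id field feeds both equals and hashCode (the instanceof pattern matching needs Java 16+):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyBug {
    // Hypothetical key class: a mutable field participates in equals/hashCode
    static final class MutableKey {
        int id;

        MutableKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey other && other.id == id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "alice");

        key.id = 2; // hash code changes; the entry stays where the old hash put it

        System.out.println(map.get(key));               // null: the new hash doesn't match the stored entry
        System.out.println(map.get(new MutableKey(1))); // null: old hash matches, but equals() compares 1 vs 2
        System.out.println(map.size());                 // 1: the entry still exists
    }
}
```

The entry isn't gone, it's stranded: it still counts toward size() and still holds memory, but no lookup by key will find it; you'd only see it by iterating the map.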

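Since Java 16 (and firmly established now in 2025), I force everyone to use Records for map keys. Their fields are final, and equals and hashCode are generated from all components out of the box. One caveat: records are only shallowly immutable, so a component like a mutable List can still change underneath you.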

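import java.util.HashMap;
import java.util.Map;

// The Old, Risky Way
class LegacyUserKey {
    private int id; // Not final: if this changes after the key goes into a Map, lookups break

    void setId(int id) { this.id = id; }

    // ... extensive boilerplate for equals/hashCode
}

// The Modern Way - Use Records!
public record UserKey(int id, String region) {
    // Final fields; equals/hashCode generated automatically from id and region.
    // Safe to use in Maps because int and String are themselves immutable.
}

Map<UserKey, UserData> cache = new HashMap<>(); // UserData: whatever value type you cache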

If you aren’t using Records for your data carriers yet, you’re just writing boilerplate for the sake of it.
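Records fix the problem as long as their components are themselves immutable. A sketch (RecordKeys and TagKey are hypothetical names; Java 16+) showing both sides: separately constructed but equal UserKey instances find the same entry, while a record wrapping a mutable List falls into the exact same trap as a mutable class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecordKeys {
    // Only value-like components (int, String): safe as a map key
    record UserKey(int id, String region) {}

    // Hypothetical: a List component makes the record only shallowly immutable
    record TagKey(List<String> tags) {}

    public static void main(String[] args) {
        Map<UserKey, String> users = new HashMap<>();
        users.put(new UserKey(42, "eu-west"), "alice");
        // An equal record built separately finds the same entry
        System.out.println(users.get(new UserKey(42, "eu-west"))); // alice

        List<String> tags = new ArrayList<>(List.of("admin"));
        TagKey tagKey = new TagKey(tags);
        Map<TagKey, String> byTags = new HashMap<>();
        byTags.put(tagKey, "bob");

        tags.add("billing"); // the record's field is final, but the list it points to is not
        System.out.println(byTags.get(tagKey)); // null: the record's hashCode changed with the list
    }
}
```

If a record has to hold a collection, one common fix is a compact constructor that stores List.copyOf(tags), which makes an unmodifiable copy.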

Streams: Readable vs. Debuggable

I love Streams. I really do. They transformed how we process collections. But there is a point of diminishing returns where a clever “one-liner” becomes a maintenance liability.

I've seen streams nested three levels deep, with flatMap calls inside filters. Good luck debugging that stack trace when it throws an NPE.

However, for transformations, they are unbeatable. The trick is knowing when to materialize them back into a Collection.

List<String> rawNames = List.of("  alice  ", "bob", "  charlie");

// Clean, readable processing pipeline
List<String> cleanNames = rawNames.stream()
    .map(String::trim)
    .filter(name -> !name.isEmpty())
    .map(String::toUpperCase)
    .toList(); // Java 16+ Stream.toList(): shorter than .collect(Collectors.toList()), and the result is unmodifiable

Notice the List.of()? That creates an unmodifiable list. If you try to add() to rawNames, it explodes with an UnsupportedOperationException. This is a feature, not a bug. Immutability makes your code predictable: if a method accepts a list and you pass it an unmodifiable one, you know for a fact that the method didn't change your data.
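A quick way to see the difference for yourself. ImmutableLists and rejectsAdd are illustrative names; the sketch relies on the documented behavior that List.of and Stream.toList (Java 16+) return unmodifiable lists.

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutableLists {
    // Returns true if the list rejects add() with UnsupportedOperationException.
    // Note: on a mutable list this really does append "dave".
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("dave");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = List.of("alice", "bob");
        System.out.println(rejectsAdd(fixed)); // true

        // Stream.toList() is documented to return an unmodifiable list
        List<String> viaToList = fixed.stream().map(String::toUpperCase).toList();
        System.out.println(rejectsAdd(viaToList)); // true

        // Copy into an ArrayList when you explicitly want a mutable list
        List<String> mutable = new ArrayList<>(viaToList);
        System.out.println(rejectsAdd(mutable)); // false
    }
}
```

Collectors.toList(), by contrast, makes no guarantees about mutability, so if you need a list you can modify, copying into a new ArrayList says so explicitly.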

Choosing the Right Tool

It usually comes down to this mental flowchart for me:

  1. Do you need key-value pairs? Use HashMap (or TreeMap if you need the keys sorted).
  2. Do you need unique items? Use HashSet.
  3. Do you need a list? Use ArrayList.
  4. Does it need to be thread-safe? Use ConcurrentHashMap or CopyOnWriteArrayList (but be careful with writes on the latter: every write copies the entire backing array, so it only suits read-heavy workloads).
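The flowchart maps onto a few lines of JDK code. A sketch under illustrative names (ChoosingCollections and its methods are made up) covering sorted key-value pairs, deduplication, and thread-safe counting with ConcurrentHashMap.merge, which the JDK documents as performed atomically:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

public class ChoosingCollections {
    // Key-value pairs, sorted: TreeMap keeps keys in natural order
    static Map<String, Integer> sortedCounts(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Unique items: HashSet drops duplicates (iteration order is unspecified)
    static Set<String> unique(List<String> words) {
        return new HashSet<>(words);
    }

    // Thread-safe counting: ConcurrentHashMap.merge runs atomically per call
    static Map<String, Integer> concurrentCounts(List<String> words) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        words.parallelStream().forEach(w -> counts.merge(w, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "apple", "pear", "fig", "apple", "pear");
        System.out.println(sortedCounts(words));                 // {apple=2, fig=1, pear=3}
        System.out.println(unique(words).size());                // 3
        System.out.println(concurrentCounts(words).get("pear")); // 3
    }
}
```

Running the same parallel counting loop against a plain HashMap would be a data race, which is exactly the case the fourth question in the flowchart is there to catch.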

And if you are storing millions of primitives? Step away from the standard library and look at primitive-specialized frameworks. Your heap (and your DevOps team) will thank you.

The Collections framework is powerful, but it’s not free. Every object you create has a cost. The best Java developers I know aren’t the ones who memorize every method in the API; they’re the ones who know what’s happening in memory when they call new ArrayList<>().