erickto.dev

HashMap: from the high level down to bytes

What happens between map.get() and the value landing in your variable. Hashing, bits, and memory.

Every time you write map.get("age"), the computer has to do a bunch of work before handing you back 28. Hashing, bit manipulation, memory management. This post opens all of that up.

What you already know

From the outside, it's just this:

const map = new Map<string, number>();
map.set("age", 28);
map.get("age"); // 28

Key goes in, value comes out. You use this all the time, even if you don't think about it.

Counting how many times each word shows up in a text:

const count = new Map<string, number>();
for (const word of text.split(" ")) {
  count.set(word, (count.get(word) ?? 0) + 1);
}

Caching so you don't hit the same API over and over:

const cache = new Map<string, User>();

async function getUser(id: string) {
  if (cache.has(id)) return cache.get(id)!;
  const user = await fetch(`/api/users/${id}`).then((r) => r.json());
  cache.set(id, user);
  return user;
}

Rate limiting by IP:

const hits = new Map<string, number>();

function rateLimit(ip: string): boolean {
  const current = hits.get(ip) ?? 0;
  if (current >= 100) return false;
  hits.set(ip, current + 1);
  return true;
}

All three do the same thing underneath: key lookup in constant time. But for that to be fast, three problems need solving:

  1. Turn the key into a number
  2. Use that number to find a position in an array
  3. Handle it when two different keys land on the same position

Turning text into a number

Picture 16 drawers numbered 0 through 15. You need a fixed rule to decide which drawer each key goes into. Same key, same drawer, every time, otherwise you won't know where to look later.

That rule is a hash function. It takes any text and spits out a number.

One of the most well-known is djb2. In C:

unsigned long hash(const char *str) {
    unsigned long h = 5381;
    int c;
    while ((c = *str++))
        h = ((h << 5) + h) + c; // h * 33 + c
    return h;
}

If C isn't something you deal with, here's the same idea in JavaScript:

function hash(str) {
  let h = 5381;
  for (let i = 0; i < str.length; i++) {
    h = ((h << 5) + h + str.charCodeAt(i)) >>> 0;
  }
  return h;
}

hash("age");  // 193486130
hash("name"); // 2090536006

It starts with 5381 (a prime that Daniel J. Bernstein settled on empirically because it distributed results well), walks through each character, multiplies the accumulator by 33, and adds the character's numeric value. At the end you get an integer. Same string, always the same integer.

The << 5 is a bit shift, equivalent to multiplying by 32. The shift plus one extra addition of h gives you multiplication by 33 without using an actual multiply instruction. Sounds trivial, but when you're doing this millions of times, it adds up.

This hash doesn't need to be cryptographic. Speed matters more than security here. The exception: if the HashMap takes external user input, use a keyed hash like SipHash. Otherwise an attacker can craft keys that all collide on purpose, degrading every lookup to a linear scan (HashDoS).

From number to position

The hash of "age" can be a huge number. But the table only has 16 slots. We need to squeeze it into the range 0-15.

The obvious way: modulo.

hash("age") % 16 // = 2, goes to slot 2

Works, but division is expensive for the CPU. If the table size is a power of 2, you can swap it for a bitwise AND:

hash("age") & (16 - 1) // = 2, same result

Why? 16 in binary is 10000. 16 - 1 = 15 is 01111. The & (AND) zeros out every bit above the fourth, keeping only the ones that fit in the 0-15 range. Same result as modulo, one instruction.

To get a sense of scale: on the CPU, integer division takes 20-30 cycles. An AND takes 1. In a HashMap accessed millions of times, that gap adds up fast.

A cycle is the smallest unit of work for a CPU. A 3 GHz processor does 3 billion cycles per second. Each instruction takes one or more cycles to complete. A simple addition takes 1 cycle. Integer division takes 20-30 because the hardware needs multiple internal steps to compute the result. Think of it as the difference between knowing the answer off the top of your head versus having to work it out on paper.

If you want to see the difference at the lowest level, in assembly (the instructions the CPU actually runs):

; division — 20-30 cycles
mov  rax, rdi        ; put the hash in a register
xor  rdx, rdx        ; zero out rdx (required by the div instruction)
div  rsi             ; divide by table size
; remainder ends up in rdx

; AND — 1 cycle
mov  rax, rdi        ; put the hash in a register
dec  rsi             ; 16 becomes 15
and  rax, rsi        ; keep only the bits that fit

You don't need to understand every line. The point is that % becomes div (slow) and & becomes and (fast). That's why HashMaps almost always use power-of-2 sizes.

The table and collisions

The hash alone doesn't solve anything. You need a structure to store the key-value pairs.

The classic approach is separate chaining: the array has N slots, each slot holds a list. If two keys land on the same slot, both go into that slot's list.

In C the structure looks like this:

typedef struct Entry {
    char *key;           // the key (text)
    int value;           // the value
    struct Entry *next;  // next item in the list, or NULL
} Entry;

typedef struct {
    Entry **buckets;     // array of lists
    size_t capacity;     // how many slots
    size_t size;         // how many items were inserted
} HashMap;

If you don't read C: each Entry is a box holding a key, a value, and an arrow pointing to the next box (or to nothing). The HashMap is an array of slots, each slot pointing to the start of a linked list of boxes.

Initial capacity 16, power of 2. Not a coincidence.
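The structs above don't come with a constructor; a minimal one might look like this (a sketch — `hashmap_create` is a hypothetical helper, and the structs are repeated so the snippet stands alone):

```c
#include <stdlib.h>

typedef struct Entry {
    char *key;
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry **buckets;
    size_t capacity;
    size_t size;
} HashMap;

HashMap *hashmap_create(void) {
    HashMap *map = malloc(sizeof(HashMap));
    if (!map) return NULL;
    map->capacity = 16; // power of 2, so & (capacity - 1) works as the index mask
    map->size = 0;
    // calloc zeroes the array, so every bucket starts as an empty list (NULL)
    map->buckets = calloc(map->capacity, sizeof(Entry *));
    if (!map->buckets) { free(map); return NULL; }
    return map;
}
```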

Insertion

void hashmap_put(HashMap *map, const char *key, int value) {
    size_t idx = hash(key) & (map->capacity - 1);
    Entry *entry = map->buckets[idx];

    // walk the list to see if the key already exists
    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            entry->value = value; // found it, update
            return;
        }
        entry = entry->next;
    }

    // didn't find it, create a new entry at the head of the list
    Entry *new_entry = malloc(sizeof(Entry));
    new_entry->key = strdup(key);
    new_entry->value = value;
    new_entry->next = map->buckets[idx];
    map->buckets[idx] = new_entry;
    map->size++;

    // if getting too full, double the size
    if ((double)map->size / map->capacity > 0.75)
        hashmap_resize(map);
}

In words: compute the hash, find the slot, walk the list to check if the key already exists. If it does, update the value. If not, create a new entry at the head of the list. And if the table is past 75% capacity (the load factor), double the size.

Lookup

Same path in reverse:

int hashmap_get(HashMap *map, const char *key, int *out) {
    size_t idx = hash(key) & (map->capacity - 1);
    Entry *entry = map->buckets[idx];

    while (entry) {
        if (strcmp(entry->key, key) == 0) {
            *out = entry->value;
            return 1; // found
        }
        entry = entry->next;
    }
    return 0; // not found
}

Normal case: the list has 0 or 1 items. Instant lookup, O(1). Worst case: every key collided, the list has N items, and the lookup becomes O(n). The 0.75 load factor exists to keep the lists short enough that the normal case stays the common case.

Resize

When the number of items exceeds 75% of capacity, the table doubles in size and redistributes everything:

void hashmap_resize(HashMap *map) {
    size_t new_cap = map->capacity * 2;
    Entry **new_buckets = calloc(new_cap, sizeof(Entry *));

    for (size_t i = 0; i < map->capacity; i++) {
        Entry *entry = map->buckets[i];
        while (entry) {
            Entry *next = entry->next;
            size_t idx = hash(entry->key) & (new_cap - 1);
            entry->next = new_buckets[idx];
            new_buckets[idx] = entry;
            entry = next;
        }
    }

    free(map->buckets);
    map->buckets = new_buckets;
    map->capacity = new_cap;
}

It walks every item, recomputes indices for the new table (which now has double the slots), and reinserts. O(n), touches everything. Painful.

If you know roughly how many items you'll have, pre-allocate:

// Rust
let mut map = HashMap::with_capacity(10_000);
// Go
m := make(map[string]int, 10_000)

Resize is where HashMaps actually get slow. If a profiler shows high time in rehash, the fix is usually one line of pre-allocation.

The memory layout

A table with 4 slots and 2 items looks like this in RAM:

HashMap (on the stack)
  .buckets   -> 0x7f00
  .capacity  =  4
  .size      =  2

Pointer array at 0x7f00 (contiguous, 32 bytes):
  [0] NULL
  [1] -> 0xa200
  [2] NULL
  [3] -> 0xb100

Entry at 0xa200 (heap):           Entry at 0xb100 (heap):
  key:   "age"                      key:   "name"
  value: 28                         value: 1
  next:  NULL                       next:  NULL

Notice the addresses: 0x7f00, 0xa200, 0xb100. The pointer array is contiguous in memory (the slot positions are illustrative), but each entry was allocated separately with malloc and can land anywhere on the heap.

This matters because of how the CPU reads memory. It doesn't read one byte at a time. It pulls 64 bytes at once (a cache line) and keeps them nearby, betting that you'll need the neighboring data soon. If the next piece of data you need is in some other region of memory, that bet fails. That's a cache miss, and it's slow.

With chaining, each entry can be anywhere. Sequential access becomes a sequence of cache misses.

That's the problem.

Open addressing

Modern implementations like Rust's, Go's, and Python's solve this differently: they store the items directly in the array, no pointers jumping elsewhere. Google's Swiss Tables in Abseil popularized the approach; Rust's standard HashMap is a port of that design, and Go adopted it in 1.24.

typedef struct {
    char *key;
    int value;
    uint8_t occupied; // 0 = empty, 1 = occupied, 2 = deleted
} Slot;

typedef struct {
    Slot *slots;      // contiguous array of slots
    size_t capacity;
    size_t size;
} OpenHashMap;

On collision, instead of building a list, the algorithm checks the next slot over. And the next. Until it finds an empty one. This is called linear probing (there are variations like quadratic probing and Robin Hood hashing, but the core idea is the same).

The data sits packed together in memory. When the CPU pulls one slot into the cache, its neighbors come along for free. In benchmarks, open addressing tends to be 2-3x faster than chaining for large tables, almost entirely because of cache locality.


When you write map["age"] = 28, this is what runs underneath. Hashing, bit masking, probing. Most days you don't need to think about it. But when a profiler shows 40% of time spent in rehash, or when you need to choose between a HashMap and a BTreeMap, knowing what happens inside helps. And most of the time the answer is boring: one line of pre-allocation.
