Content
pmemrange: the physical memory allocator
vmmap: the virtual memory allocator
64-bit and jit
Before pmemrange
Physical memory was a list of pages.
This caused high fragmentation.
Fragmentation is bad for devices.
Before pmemrange - it gets worse
ISA and PCI devices could become starved.
ISA and PCI device can no longer make progress.
- Devices need memory to progress
- Programs block on data access
- System becomes unresponsive
Pmemrange
A new physical memory allocator:
Physical
Memory
range
- must keep device memory available
- must reduce fragmentation
- may not regress: don't be slower!
Memory allocation
- Large fragments grow
- Small fragments shrink and vanish
- Faster than best fit O(1) common case
Pmemrange turned out to be faster than the original code!
VM map
VMmap: mapping Virtual Memory
- keeps track of what is allocated where
- manages virtual memory of kernel and userspace
- performs allocation
Original VMmap
- Introduced around 1999
- First fit allocator
- Code hasn't really change since then
Address Space Layout Randomization tacked on without adapting the underlying algorithm.
Original address selection
- Randomly pick an address.
- Check if that address is free (do you feel lucky?)
- Otherwise, linear search forward.
Problems:
- Complexity grows quadratic with memory use
- double the memory, double number of entries to walk
- increase the memory, reduce the chance of getting lucky
- Forward-only search may yield false negative
- select high random address, skipping all that free space below
First replacement
Best-fit allocator.
- Select close to the best allocation (random number).
- Select random address within that allocation.
The good:
- No false negatives
- O(log n) instead of O(n**2)
First replacement failed
The bad:
- Some architectures stopped working altogether
- Introduced new bugs
The ugly:
- Webbrowsing was impossible
- Java and mono ceased to function
- Touch the code and an architecture dies
- Randomization was too predictable: not random
How did this break random?
Investigating what went wrong.
- Placement position of each allocation was random.
- Memory layout was not very random at all.
- An attacker does not need to target specific memory, he just attacks everything and sees what sticks.
Implementation bugs made this worse.
Randomization did add gaps, but was too predictable in which memory was used.
Browsers, java & mono breakage
They all use JIT (Just In Time) compilation.
- To make byte-code or javascript fast
- They all use PIC (Position Independant Code)
PIC - Position Independant Code
PIC - Position Independant Code
PIC makes code agnostic to position.
Offset is a 32-bit value.
- Sufficient for libraries
- JITs allocate both parts.
JIT - pointer clipping
Aw, Snap!
JIT - workarounds
- Linux: mmap MAP_32BIT
- it's in API, can never be removed
- Chrome: allocate 2 GB up front
- use own memory allocator in there
- Hello predictable attack surface
Smart and dumb software
- sizeof(void*) != sizeof(int)
- sizeof(array) > MAXINT
- sizeof(ptrdiff_t) != sizeof(int)
Good software is boring:
- use size_t for sizes
- always check for clipping when translating to another type
VMmap additional requirements
- must not add extra options to mmap
- isolate userland and kernel space fixes
- don't want to break architectures
- kernel memory is very special
- must not break browsers
- mmap hint requirements allow ports to rig browser workarounds
- must allow easy switching of algorithms
- small commit/backout diffs
Split mapping and allocation
- Address selectors decide where to allocate
- read access on map
- multiple implementations
- Map keeps track of what is allocated where
- informs selectors of modifications
- VMmap combines the two together
vm_map - allocation in VMmap
vm_unmap - unmapping in VMmap
Compatibility for JIT
mmap(addr, size, ...)
Posix says: "A non-zero value of addr is taken to be a suggestion of a process address near which the mapping should be placed"
- WTF? What does near mean?
- in the same machine is pretty near...
- we define near as max 1 GB away (sufficiently near for PIC)
- First 4 GB is special: we only allocate there if your hint ≤ 4 GB
Allows us to fix browsers.
Functionality implemented in hint selector.
Better random algorithm
- Random placement of memory
- make it hard to guess a good position
- Random gaps between memory
- don't make walking the memory easy
Introducing pivot allocator
Pivot algorithm
16 active pivots
- Pivots leave gaps between allocations
- Pivots expire every 1024 allocations
- Pivots walk either forward or backwards
Common case: O(1)
Pivot creation: O(log n)
Low fragmentation, hard to predict
Wrapping it up
- Devices and software are pretty similar
- both have assumptions on pointers
- devices often can't help it: PCI spec is 32-bit
- Software hasn't really improved in 40 years
- first 64-bit machine: Cray-1 (1976)
- the world isn't only i386
- software still trips on 64-bits
- designs are copied without understanding them (why else would JITs forget to cope with 64-bit distance?)
- ASLR (Address Space Layout Randomization) is difficult
- our initial implementation got it right, we were lucky