AArch64 (also called ARM64) is the dominant architecture in mobile and embedded devices, and it is increasingly common in servers and laptops. If you’ve written x86-64 assembly before, AArch64 will feel familiar in some ways and surprisingly different in others. If you’ve never written assembly, this is a decent first architecture to learn on: the instruction set is cleaner and more regular than x86-64.
I’m assuming you know C, you understand what a stack pointer is, and you’ve at least read about registers conceptually.
The register file
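AArch64 has 31 general-purpose registers named x0–x30. The w names (w0–w30) access the lower 32 bits of the same registers, and a write to w0 zero-extends into x0, clearing the upper 32 bits. There are also two special registers: xzr/wzr (the zero register: reads always return 0, writes are discarded) and sp (the stack pointer). They share register number 31 in the encoding, and each instruction determines which of the two it means. The program counter (pc) is not a named register at all; it is only used implicitly by instructions such as adr, adrp, and bl.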
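The relationship between the x and w views can be modeled in a few lines of C. This is only a sketch of the semantics described above, not assembly; the type `reg64` and the helpers `write_w`, `read_w`, `read_xzr`, and `write_xzr` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit general-purpose register. */
typedef struct { uint64_t x; } reg64;

/* Writing the w view stores 32 bits and clears the upper 32 bits. */
static void write_w(reg64 *r, uint32_t value) {
    r->x = (uint64_t)value;   /* zero-extend: upper half becomes 0 */
}

/* Reading the w view returns the low 32 bits of the x register. */
static uint32_t read_w(const reg64 *r) {
    return (uint32_t)(r->x & 0xFFFFFFFFu);
}

/* The zero register: reads yield 0, writes are discarded. */
static uint64_t read_xzr(void) { return 0; }
static void write_xzr(uint64_t value) { (void)value; }
```

The point to take away is that a 32-bit write never leaves stale bits in the upper half, unlike 8-bit and 16-bit register writes on x86-64.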
Key registers by convention (the AAPCS64 calling convention):
| Register | Role |
|---|---|
| x0–x7 | Arguments and return values |
| x8 | Indirect result location (large struct return) |
| x9–x15 | Temporary (caller-saved) |
| x16–x17 | Intra-procedure-call scratch (used by linker stubs) |
| x18 | Platform register (don’t touch on Darwin/iOS) |
| x19–x28 | Callee-saved |
| x29 | Frame pointer |
| x30 | Link register (return address) |
In short: the first eight integer or pointer arguments go in x0–x7 (any further ones are passed on the stack), and the return value comes back in x0 (or in x0 and x1 for a 128-bit value). Callee-saved registers must be preserved across function calls: if you use x19, you must save it on entry and restore it before returning.
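To make the table concrete, here is a small C sketch annotated with where AAPCS64 places each value when the functions are compiled for AArch64 and not inlined. The function and struct names (`sum9`, `make_pair`, `make_big`, `pair`, `big`) are hypothetical; the register assignments in the comments follow the rules summarized above.

```c
#include <stdint.h>

/* 16 bytes: small enough to be returned in x0 (lo) and x1 (hi). */
struct pair { int64_t lo; int64_t hi; };

/* 32 bytes: too large for registers. The caller passes the address of a
   result buffer in x8 and the callee writes the struct there. */
struct big { int64_t v[4]; };

/* a..h arrive in x0..x7. i is the ninth integer argument, so it is
   passed on the stack. The result is returned in x0. */
int64_t sum9(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e,
             int64_t f, int64_t g, int64_t h, int64_t i) {
    return a + b + c + d + e + f + g + h + i;
}

/* lo arrives in x0, hi in x1; the result goes back in x0 and x1. */
struct pair make_pair(int64_t lo, int64_t hi) {
    struct pair p = { lo, hi };
    return p;
}

/* seed arrives in x0; the result is written through the pointer in x8. */
struct big make_big(int64_t seed) {
    struct big b;
    for (int k = 0; k < 4; k++) b.v[k] = seed + k;
    return b;
}
```

Compiling a file like this with optimization enabled and reading the output in a disassembler is a quick way to check the annotations against what the compiler actually emits.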
A minimal function
Let’s implement int add(int a, int b) in assembly:
// int add(int a, int b)
// a is in w0, b is in w1
.global add
.text
add:
add w0, w0, w1 // w0 = w0 + w1 (result goes in w0)
ret // return (branches to x30)
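ret is not like x86’s ret, which pops the return address off the stack. By default it branches to the address in x30, so it behaves like br x30, but it is a separate instruction that also tells the branch predictor this is a function return. The return address gets into x30 because bl (branch with link) writes the address of the following instruction there before jumping to the function.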
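To try the routine, it needs a C caller. The sketch below is one way to do that in a single file, assuming GCC or Clang targeting AArch64 with ELF object files (for example Linux); on Mach-O targets C symbols carry a leading underscore, so the label would need to change. On any other target the sketch falls back to a plain C stand-in so it still builds. The helper `add_three` is hypothetical.

```c
#if defined(__aarch64__) && defined(__ELF__)
/* Assemble the routine from the text as part of this translation unit. */
__asm__(
    ".text\n"
    ".p2align 2\n"
    ".global add\n"
    "add:\n"
    "    add w0, w0, w1\n"   /* w0 = a + b */
    "    ret\n"              /* branch to the address in x30 */
);
int add(int a, int b);       /* C prototype for the assembly routine */
#else
/* Portable stand-in used when not building for AArch64 ELF. */
int add(int a, int b) { return a + b; }
#endif

/* Each call to add compiles to a bl, which sets x30 before branching. */
int add_three(int a, int b, int c) {
    return add(add(a, b), c);
}
```

Keeping the assembly in its own .s file and linking it with the C object works just as well; the single-file form is only a convenience for experimenting.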
A function with a stack frame
For a function that calls other functions, you need to save and restore x29 (frame pointer) and x30 (link register):
// long factorial(long n)
.global factorial
.text
factorial:
stp x29, x30, [sp, #-16]! // push {fp, lr}; sp -= 16
mov x29, sp // frame pointer = current sp
cmp x0, #1
ble .Lbase // if n <= 1, return 1
sub x1, x0, #1 // x1 = n - 1
mov x0, x1
bl factorial // x0 = factorial(n - 1)
// now x0 = factorial(n-1), but we've lost n!
// ← this is wrong, we need to save n before the call
// (see corrected version below)
.Lbase:
mov x0, #1
ldp x29, x30, [sp], #16 // pop {fp, lr}; sp += 16
ret
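That version is buggy: the recursive call overwrites x0, which held n, so there is nothing left to multiply by. (As written, it also falls through into .Lbase after the call and returns 1 for every input.) The fix is to keep n in a callee-saved register (x19) across the call, saving the caller’s x19 first: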
.global factorial
.text
factorial:
stp x29, x30, [sp, #-32]! // save fp, lr; make room
stp x19, xzr, [sp, #16] // save caller's x19; xzr fills the unused slot
mov x29, sp
mov x19, x0 // x19 = n (preserved across call)
cmp x0, #1
ble .Lbase
sub x0, x19, #1 // arg0 = n - 1
bl factorial // x0 = factorial(n - 1)
mul x0, x0, x19 // result = factorial(n-1) * n
b .Lreturn
.Lbase:
mov x0, #1
.Lreturn:
ldp x19, xzr, [sp, #16] // restore x19
ldp x29, x30, [sp], #32 // restore fp, lr; sp += 32
ret
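stp (store pair) and ldp (load pair) are the idiomatic way to push and pop on AArch64. The ! suffix in stp x29, x30, [sp, #-16]! selects pre-indexed addressing: the offset is added to sp first, the store uses the updated address, and the new value is written back to sp. The matching ldp x29, x30, [sp], #16 is post-indexed: it loads from the address in sp and then adds 16 to sp. The frames above are 16 and 32 bytes because AAPCS64 requires sp to stay 16-byte aligned.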
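The two addressing modes differ only in when the base register is updated, which is easy to see in a small C model. This is a sketch of the semantics, not real stack handling; the `machine` type and the helpers `stp_pre` and `ldp_post` are hypothetical, and sp is modeled as an offset into a 64-byte buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny byte-addressed stack plus an sp register
   holding an offset into it. */
typedef struct {
    uint8_t  mem[64];
    uint64_t sp;
} machine;

/* stp a, b, [sp, #imm]!   pre-indexed: update sp, then store at the new sp. */
static void stp_pre(machine *m, uint64_t a, uint64_t b, int64_t imm) {
    m->sp += (uint64_t)imm;              /* wraps correctly for negative imm */
    memcpy(&m->mem[m->sp], &a, 8);       /* first register at [sp] */
    memcpy(&m->mem[m->sp + 8], &b, 8);   /* second register at [sp + 8] */
}

/* ldp a, b, [sp], #imm    post-indexed: load at the old sp, then update sp. */
static void ldp_post(machine *m, uint64_t *a, uint64_t *b, int64_t imm) {
    memcpy(a, &m->mem[m->sp], 8);
    memcpy(b, &m->mem[m->sp + 8], 8);
    m->sp += (uint64_t)imm;
}
```

Calling `stp_pre(&m, fp, lr, -16)` followed by `ldp_post(&m, &fp, &lr, 16)` leaves sp where it started, which is exactly the push and pop pairing used in the function prologue and epilogue.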
Load and store patterns
AArch64 is a load/store architecture. You can’t operate on memory directly — you load into registers, compute, then store:
// int sum_array(int *arr, int len)
// arr in x0, len in w1
.global sum_array
.text
sum_array:
mov w2, wzr // sum = 0
cbz w1, .Ldone // if len == 0, return (assumes len >= 0)
.Lloop:
ldr w3, [x0], #4 // w3 = *arr; then x0 += 4 bytes (post-index)
add w2, w2, w3 // sum += w3
subs w1, w1, #1 // len--; set flags
bne .Lloop // if len != 0, continue
.Ldone:
mov w0, w2 // return sum
ret
ldr w3, [x0], #4 is a post-indexed load: it reads 4 bytes from [x0] into w3, then adds 4 to x0. Very common for iterating over arrays.
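For comparison, here is a C reference version of the same loop. The name `sum_array_ref` is hypothetical. It accumulates in an unsigned 32-bit value so that overflow wraps the way the add instruction does, and unlike the assembly it also returns 0 for a negative len.

```c
#include <stdint.h>

/* C reference for the assembly loop above. */
int sum_array_ref(const int *arr, int len) {
    uint32_t sum = 0;                 /* unsigned so that overflow wraps */
    while (len > 0) {
        sum += (uint32_t)*arr++;      /* load, then advance by one int (4 bytes) */
        len--;
    }
    return (int)sum;
}
```

The expression `*arr++` is the C idiom that a post-indexed load implements in a single instruction: read through the pointer, then step it forward by the element size.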
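subs is sub that also sets the condition flags (N, Z, C, V). bne is an alias that GNU as accepts for the canonical b.ne (branch if not equal); it is taken when the Z flag is clear, which here means the count has not yet reached zero.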
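How the flags come out of a subtraction can be sketched in C. This is a model of the 32-bit form only; the `flags` type and the helpers `subs32` and `cond_ne` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition flags. */
typedef struct { bool n, z, c, v; } flags;

/* Model of a 32-bit subs: returns a - b and sets the flags. */
static uint32_t subs32(uint32_t a, uint32_t b, flags *f) {
    uint32_t r = a - b;
    f->n = (r >> 31) != 0;                      /* result is negative */
    f->z = (r == 0);                            /* result is zero */
    f->c = (a >= b);                            /* carry set means no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) != 0;    /* signed overflow */
    return r;
}

/* The "ne" condition holds when Z is clear. */
static bool cond_ne(const flags *f) { return !f->z; }
```

In the loop above, the branch keeps being taken while `cond_ne` would return true, and falls through on the iteration where the subtraction produces zero.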
Where to go from here
- ARM Architecture Reference Manual: The authoritative spec. Dense but comprehensive.
- Compiler Explorer (godbolt.org): Write C, see the generated AArch64 assembly. This is the fastest way to understand what the compiler does and doesn’t do.
- GDB or LLDB with -arch arm64: Step through your assembly. There’s no substitute for watching registers change.
AArch64 is genuinely a pleasure to read compared to x86-64. The regularity of the instruction set — three-operand instructions, consistent load/store patterns, a clean calling convention — makes it worth learning even if you’ll mostly just be reading compiler output.