The x64 architecture is a backwards-compatible extension of x86. It provides a legacy 32-bit mode, which is identical to x86, and a new 64-bit mode.

The term "x64" includes both AMD 64 and Intel64. The instruction sets are close to identical.

Registers

x64 extends x86's 8 general-purpose registers to be 64-bit, and adds 8 new 64-bit registers. The 64-bit registers have names beginning with "r", so for example the 64-bit extension of eax is called rax. The new registers are named r8 through r15.

The lower 32 bits, 16 bits, and 8 bits of each register are directly addressable in operands. This includes registers, like esi, whose lower 8 bits were not previously addressable. The following table specifies the assembly-language names for the lower portions of 64-bit registers.

64-bit register Lower 32 bits Lower 16 bits Lower 8 bits

rax

eax

ax

al

rbx

ebx

bx

bl

rcx

ecx

cx

cl

rdx

edx

dx

dl

rsi

esi

si

sil

rdi

edi

di

dil

rbp

ebp

bp

bpl

rsp

esp

sp

spl

r8

r8d

r8w

r8b

r9

r9d

r9w

r9b

r10

r10d

r10w

r10b

r11

r11d

r11w

r11b

r12

r12d

r12w

r12b

r13

r13d

r13w

r13b

r14

r14d

r14w

r14b

r15

r15d

r15w

r15b

Operations that output to a 32-bit subregister are automatically zero-extended to the entire 64-bit register. Operations that output to 8-bit or 16-bit subregisters are not zero-extended (this is compatible x86 behavior).

The high 8 bits of ax, bx, cx, and dx are still addressable as ah, bh, ch, dh, but cannot be used with all types of operands.

The instruction pointer, eip, and flags register have been extended to 64 bits (rip and rflags, respectively) as well.

The x64 processor also provides several sets of floating-point registers:

  • Eight 80-bit x87 registers.

  • Eight 64-bit MMX registers. (These overlap with the x87 registers.)

  • The original set of eight 128-bit SSE registers is increased to sixteen.

Calling Conventions

Unlike the x86, the C/C++ compiler only supports one calling convention on x64. This calling convention takes advantage of the increased number of registers available on x64:

  • The first four integer or pointer parameters are passed in the rcx, rdx, r8, and r9 registers.

  • The first four floating-point parameters are passed in the first four SSE registers, xmm0-xmm3.

  • The caller reserves space on the stack for arguments passed in registers. The called function can use this space to spill the contents of registers to the stack.

  • Any additional arguments are passed on the stack.

  • An integer or pointer return value is returned in the rax register, while a floating-point return value is returned in xmm0.

  • rax, rcx, rdx, r8-r11 are volatile.

  • rbx, rbp, rdi, rsi, r12-r15 are nonvolatile.

The calling convention for C++ is very similar: the this pointer is passed as an implicit first parameter. The next three parameters are passed in registers, while the rest are passed on the stack.

 

MMX

MMX defines eight registers, called MM0 through MM7, and operations that operate on them. Each register is 64 bits wide and can be used to hold either 64-bit integers, or multiple smaller integers in a "packed" format: a single instruction can then be applied to two 32-bit integers, four 16-bit integers, or eight 8-bit integers at once.[9]

XMM / SSE

Additional XMM (SSE) registersSimilarly, the number of 128-bit XMM registers (used for Streaming SIMD instructions) is also increased from 8 to 16.The traditional x87 FPU register stack is not included in the register file size extension in 64-bit mode, compared with the XMM registers used by SSE2, which did get extended. The x87 register stack is not a simple register file although it does allow direct access to individual registers by low cost exchange operations.Larger virtual address spaceThe AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations.[11](p120) This allows up to 256 TiB (248 bytes) of virtual address space. The architecture definition allows this limit to be raised in future implementations to the full 64 bits,[11](p2)(p3)(p13)(p117)(p120) extending the virtual address space to 16 EiB (264 bytes).[16] This is compared to just 4 GiB (232 bytes) for the x86.[17]This means that very large files can be operated on by mapping the entire file into the process' address space (which is often much faster than working with file read/write calls), rather than having to map regions of the file into and out of the address space.