AMD64 (EM64T) architecture
Authors: Evgeniy Ryzhkov, Andrey Karpov
Date: 02.10.2008
Abstract
This article briefly describes the AMD64 architecture developed by AMD and its implementation by Intel, EM64T. It covers the architecture's features, advantages and disadvantages.
Introduction
As the tasks solved by computers grow more complex, they place ever greater demands on the hardware that runs them. The requirements for personal-computer-class systems have been growing year after year for 20 years now, because people want to solve on their personal computers increasingly complex tasks that were earlier solved only on high-performance mainframes.
What are these requirements for solving complex tasks on personal computers? First of all, they are the size of main memory and processor performance (not to be confused with clock frequency!). The IA32 (Intel Architecture 32) architecture, which has dominated during the last decade, can address 4Gb (2^32 bytes) of memory, of which usually only 2Gb are allocated to an application. It also offers various register blocks and a set of tricks, such as the branch prediction unit, intended to increase system performance without raising such an abstract parameter as the processor's clock frequency [1].
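The 4Gb figure follows directly from the address width: n-bit addresses can distinguish 2^n byte addresses. A small C sketch of that arithmetic; the function names are hypothetical, and the 2Gb/2Gb split is the usual default described above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses expressible with an n-bit address: 2^n.
   Restricted to 1..63 bits so the result fits in uint64_t. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return (uint64_t)1 << bits;
}

/* Bytes left to a user process when the OS keeps the upper half
   of the 32-bit address space for itself (the usual 2Gb/2Gb split). */
uint64_t ia32_default_user_bytes(void)
{
    return address_space_bytes(32) / 2;
}
```

For example, `address_space_bytes(32)` returns 4294967296 (4Gb) and `ia32_default_user_bytes()` returns 2147483648 (2Gb).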
The memory needs of modern personal-computer tasks are approaching 2Gb, while further increases in processor frequency can no longer substantially improve performance.
The newly developed 64-bit architectures SPARC64 and Intel Itanium can, to some extent, overcome the limitations of modern 32-bit computers. But they are intended for high-end systems and are not available as inexpensive solutions. It is the AMD64 architecture by AMD and its implementation EM64T by Intel that are destined to become really popular. These architectures are twins: programs compiled for one of them also run on the other. Historically, though, AMD's solution appeared first, and EM64T is essentially Intel's implementation of AMD64. The AMD64 architecture is now implemented in processors of all classes: mobile, workstation and server.
Despite the evident advantages of the AMD64 platform (described in detail in this article), it does not introduce anything revolutionary into computing. The transition from 32 to 64 bits has not brought qualitative improvements, whereas the earlier transition from 16 to 32 bits significantly increased system security and performance.
1. AMD64 architecture
The AMD64 architecture is fully described in five volumes of documentation published by AMD. This chapter gives a brief description based on the first volume [2]. Note that the official documentation calls this architecture AMD x86-64, which underlines its backward compatibility.
1.1. The architecture's description
AMD x86-64 is a simple but powerful backward-compatible extension of the legacy industry-standard x86 architecture [1]. It adds a 64-bit address space and extends register resources to deliver higher performance for recompiled 64-bit programs, while supporting legacy 16-bit and 32-bit application and operating-system code without modification or recompilation.
The need for a 64-bit x86 architecture comes from applications that require a large address space: high-performance servers, database management systems, CAD systems and, of course, games. Such applications benefit both from the 64-bit address space and from the larger number of registers. The few registers available in the legacy x86 architecture limit the performance of computation-intensive tasks; the additional registers provide sufficient performance for most applications.
The x86-64 architecture introduces two major new features:
1. Extended registers (Picture 1):
• 8 new general-purpose registers;
• all 16 general-purpose registers are 64-bit;
• 8 new 128-bit XMM registers;
• a new instruction prefix (REX) for accessing the extended registers.
2. A new operating mode, long mode (see Table 1):
• up to 64-bit virtual addresses;
• a 64-bit instruction pointer (RIP);
• a flat address space.
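In 64-bit mode the REX prefix is a single byte of the form 0100WRXB. Its W bit selects a 64-bit operand size, and its R and B bits supply a fourth bit for register numbers, which is how instructions reach all 16 general-purpose registers. The C sketch below encodes just one instruction form, `mov` between two 64-bit registers; the function and enum names are hypothetical, and a real assembler handles far more cases.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers used in x86-64 encodings:
   0-7 are the legacy registers (widened to 64 bits), 8-15 are R8-R15. */
enum gpr { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
           R8, R9, R10, R11, R12, R13, R14, R15 };

/* Encodes "mov dst, src" (opcode 0x89, MOV r/m64, r64) into 3 bytes:
   REX prefix, opcode, ModRM. In 64-bit mode the byte values 0x40-0x4F,
   which were one-byte INC/DEC opcodes in legacy x86, act as REX. */
void encode_mov_r64_r64(enum gpr dst, enum gpr src, uint8_t out[3])
{
    assert(dst <= R15 && src <= R15);
    /* REX = 0100WRXB: W=1 for 64-bit operands, R = bit 3 of the ModRM.reg
       register (src), X = SIB index extension (unused), B = bit 3 of the
       ModRM.rm register (dst). */
    uint8_t rex = (uint8_t)(0x40 | 0x08
                            | ((((unsigned)src >> 3) & 1u) << 2)
                            | (((unsigned)dst >> 3) & 1u));
    /* ModRM: mod = 11 (register operand), reg = low 3 bits of src,
       rm = low 3 bits of dst. */
    uint8_t modrm = (uint8_t)(0xC0 | (((unsigned)src & 7u) << 3)
                              | ((unsigned)dst & 7u));
    out[0] = rex;
    out[1] = 0x89;
    out[2] = modrm;
}
```

For instance, `mov rax, r8` encodes as 4C 89 C0, while `mov rax, rbx`, which needs no extended register, still carries REX.W to select the 64-bit operand size (48 89 D8).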
Table 1. Processor operating modes.
Table 2 compares the register and stack resources available to an application in different modes. The left columns show the resources provided by the legacy x86 architecture, which are the only ones available in compatibility mode. The right columns show the resources available in 64-bit mode. The differences between the modes are shaded grey.
Table 2. Registers and stack available in different modes
As shown in Table 2, the legacy x86 architecture (in x86-64 this mode is called legacy mode) supports 8 general-purpose registers. In practice, however, only 4 of them are usually used for general purposes: EAX, EBX, ECX and EDX; the registers EBP, ESI, EDI and ESP have special purposes. The x86-64 architecture adds 8 general-purpose registers and widens the registers from 32 bits to 64 bits. This allows compilers to generate faster code: a 64-bit compiler can keep variables in registers more efficiently and minimize memory accesses by performing operations directly on general-purpose registers.
The x86-64 architecture supports the whole x86 instruction set and adds some new instructions to support long mode. The instructions are divided into several subsets:
• General-purpose instructions. These are the basic x86 integer instructions used in all programs. Most of them load, store and process data in general-purpose registers or memory. Some of them control the flow of execution, transferring control from one part of the program to another.
• 128-bit media instructions. These are the SSE and SSE2 (Streaming SIMD Extensions) instructions that load, store or process data in the 128-bit XMM registers. They perform integer and floating-point operations on vector (packed) and scalar data types. Because vector instructions perform the same operation on several data elements independently, they are called single-instruction, multiple-data (SIMD) instructions. They are used in media and scientific applications to process blocks of data.
• 64-bit media instructions. These are the MMX (multimedia extensions) and 3DNow! instructions. They load, store and process data in the 64-bit MMX registers. Like the 128-bit instructions described above, they perform integer and floating-point operations on vector (packed) and scalar data.
• x87 instructions. These floating-point instructions are used by legacy x87 applications and process data in the x87 registers.
Some instructions span two or more of the subsets described above. For example, this is the case for instructions that transfer data between general-purpose registers and XMM or MMX registers.
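The SIMD idea behind the 128-bit media instructions can be illustrated with compiler intrinsics. The sketch below assumes a compiler that provides `<emmintrin.h>` (GCC, Clang and Visual C++ do) and falls back to plain C on other targets; `add_floats` and `HAVE_SSE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE 1   /* SSE2 is always present on x86-64 */
#else
#define HAVE_SSE 0
#endif

/* out[i] = a[i] + b[i] for i < n. With SSE, four floats are added by a
   single instruction operating on a 128-bit XMM register; the remaining
   elements (and non-SSE builds) use an ordinary scalar loop. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             /* 4 packed floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* one SIMD add */
    }
#endif
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```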
Let's consider in detail the x86-64 operating modes shown in Table 1. In most cases the default address and operand sizes can be overridden by an instruction prefix.
Let's start with long mode. It is an extension of the legacy protected mode and consists of two submodes: 64-bit mode and compatibility mode. 64-bit mode supports all the new features and register extensions introduced in x86-64. Compatibility mode provides binary compatibility with existing 16-bit and 32-bit code. Long mode supports neither legacy real mode nor legacy virtual-8086 mode, and it does not support hardware task switching.
Since 64-bit mode provides a 64-bit address space, it requires a new 64-bit operating system. Meanwhile, existing applications can run without recompilation in compatibility mode under an OS running in 64-bit mode. Instruction addressing in 64-bit mode uses the 64-bit instruction pointer (RIP) and a new addressing model with a single flat address space for code, stack and data.
64-bit mode supports the extended registers through a new group of instruction prefixes, REX. In 64-bit mode the default address size is 64 bits, although x86-64 implementations may support fewer virtual address bits. The default operand size is 32 bits; for most instructions it can be changed to 64 bits with a REX prefix.
64-bit mode provides data addressing relative to the 64-bit RIP register. The legacy x86 architecture supported IP-relative addressing only in control-transfer instructions. RIP-relative addressing makes position-independent code and code that accesses global data more efficient.
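A minimal sketch of how such an operand is resolved. In 64-bit mode the ModRM encoding that legacy x86 used for a plain 32-bit absolute address (mod = 00, rm = 101) instead means RIP plus a signed 32-bit displacement, measured from the end of the current instruction. Because the code refers to data by offset rather than by absolute address, it works unchanged wherever it is loaded. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Effective address of a [RIP + disp32] operand: RIP already points to
   the NEXT instruction, so the displacement is added to insn_addr + len. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len, int32_t disp32)
{
    assert(insn_len > 0 && insn_len <= 15); /* x86 instructions are at most 15 bytes */
    /* Sign-extend the displacement; unsigned wraparound handles negatives. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

For example, the 7-byte instruction `lea rax, [rip+0x10]` (bytes 48 8D 05 10 00 00 00) located at 0x401000 refers to address 0x401017.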
Some opcodes were redefined to support the extended registers and 64-bit addressing.
Compatibility mode is intended for running existing 16-bit and 32-bit programs under a 64-bit OS. Applications run in compatibility mode with a 32-bit or 16-bit address size and can access up to 4Gb of virtual address space. Instruction prefixes can switch between 16-bit and 32-bit address and operand sizes.
From the application's point of view, compatibility mode looks like legacy x86 protected mode, but from the OS's point of view (address translation, interrupt and exception handling) the 64-bit mechanisms are used.
Legacy mode provides binary compatibility not only with 16-bit and 32-bit applications but also with 16-bit and 32-bit operating systems. It includes three modes:
• Protected mode. Supports 16-bit and 32-bit programs with segmented memory, privilege levels and virtual memory. The address space is 4Gb.
• Virtual-8086 mode. Supports 16-bit real-mode applications running as tasks under protected mode. The address space is 1Mb.
• Real mode. Supports 16-bit programs with simple register-based addressing of segmented memory. Virtual memory and privileges are not supported; 1Mb of memory is available.
Legacy mode is used only when a 16-bit or 32-bit OS is running.
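The 1Mb limit of real and virtual-8086 modes comes from their address arithmetic: a 16-bit segment value is multiplied by 16 and added to a 16-bit offset, giving a 20-bit address. A minimal C sketch, assuming the original 8086 behaviour in which addresses above 1Mb wrap around to zero; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* segment:offset -> linear address (segment * 16 + offset), masked to
   20 bits to model the 8086 wraparound at 1Mb (A20 line disabled). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Inverse direction: split a linear address below 1Mb into the canonical
   segment:offset pair whose offset lies in the range 0-15. */
void real_mode_canonical(uint32_t linear, uint16_t *segment, uint16_t *offset)
{
    assert(linear < 0x100000u);               /* must fit in 1Mb */
    *segment = (uint16_t)(linear >> 4);
    *offset  = (uint16_t)(linear & 0xFu);
}
```

For example, B800:0000 maps to linear address 0xB8000, while FFFF:0010 wraps around to 0.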
1.2. The architecture's advantages
Let's outline the main advantages of AMD x86-64 architecture.
• 64-bit address space.
• Extended register set.
• An instruction set familiar to developers.
• Ability to run legacy 32-bit applications under a 64-bit OS.
• Ability to use a 32-bit OS.
1.3. The architecture's disadvantages
The new AMD x86-64 architecture has not introduced crucial disadvantages compared with the 32-bit architecture. We can point out only slightly higher memory requirements of programs because of the larger size of addresses and operands. However, this does not significantly affect code size or main-memory requirements.
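Much of the extra memory comes from pointers doubling in size, together with the alignment padding they bring. A small C illustration; the struct and function are hypothetical examples, and exact sizes depend on the compiler's ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A typical linked-list node: an int payload and a pointer. */
struct list_node {
    int value;
    struct list_node *next;
};

/* Prints the node layout and returns the padding inserted before `next`.
   With 4-byte pointers (typical IA-32 ABIs) there is no padding and the
   node takes 8 bytes; with 8-byte, 8-byte-aligned pointers (x86-64) four
   padding bytes follow `value` and the node grows to 16 bytes. */
size_t describe_list_node(void)
{
    size_t padding = offsetof(struct list_node, next) - sizeof(int);
    assert(padding < sizeof(void *));   /* padding never reaches a full pointer */
    printf("pointer: %zu bytes, padding: %zu bytes, node: %zu bytes\n",
           sizeof(void *), padding, sizeof(struct list_node));
    return padding;
}
```

A 32-bit build typically reports 4, 0 and 8 bytes; an x86-64 build typically reports 8, 4 and 16, so the node doubles in size although its payload is unchanged.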
The real issue is rather that AMD x86-64 has not introduced anything significantly new, so there is no dramatic performance gain: on average, you can expect a 5-15% performance increase after recompiling a program.
2. AMD64 program model
Nearly all modern operating systems now have versions for the AMD64 architecture. For example, Microsoft offers Windows XP 64-bit, Windows Server 2003 64-bit and Windows Vista 64-bit. The leading UNIX system developers also provide 64-bit versions, for example Debian Linux 3.1 x86-64. But this does not mean that all the code of such a system is 64-bit. Some OS code and many applications can still remain 32-bit, since AMD64 provides backward compatibility.
The 64-bit version of Windows, for example, uses a special subsystem, WoW64 (Windows-on-Windows 64), which translates calls made by 32-bit applications into calls to the resources of the 64-bit OS. Let's consider in detail the AMD64 programming model available to a programmer in 64-bit Windows [3, 4], called Win64 for short.
Let's begin with the address space. Although a 64-bit processor can theoretically address 16 exabytes (2^64 bytes), Win64 currently supports 16 terabytes (2^44 bytes). There are several reasons for this. Existing processors can provide access to only 1 terabyte (2^40 bytes) of physical memory. The architecture (but not current hardware) can extend this space up to 4 petabytes (2^52 bytes). In any case, a larger address space would require a great deal of memory for the page tables that describe it. The resulting limits are shown in Table 3.
Parameter | 32-bit mode | 64-bit mode
Total process address space | 4Gb | 16Tb
Address space available to a 32-bit process | 2Gb (3Gb if the system is booted with the /3GB switch) | 4Gb if the application is linked with /LARGEADDRESSAWARE (2Gb otherwise)
Address space available to a 64-bit process | Not possible | 8Tb
Paged pool | 470Mb | 128Gb
Non-paged pool | 256Mb | 128Gb
System page table entries (PTE) | 660Mb-900Mb | 128Gb
Table 3. Main memory limitations in Windows
As in Win32, the address range is divided into user and system addresses. Each process receives 8Tb, and 8Tb remain for the system (compared with 2Gb and 2Gb, respectively, in Win32). Different Windows versions have different limitations, shown in Table 4.
Edition | 32-bit version (physical memory, CPUs) | 64-bit version (physical memory, CPUs)
Windows XP Home | 4 Gb, 1 CPU | Not available
Windows XP Professional | 4 Gb, 1-2 CPU | 128 Gb, 1-2 CPU
Windows Server 2003, Standard | 4 Gb, 1-4 CPU | 32 Gb, 1-4 CPU
Windows Server 2003, Enterprise | 64 Gb, 1-8 CPU | 1 Tb, 1-8 CPU
Windows Server 2003, Datacenter | 64 Gb, 8-32 CPU | 1 Tb, 8-64 CPU
Windows Server 2008, Datacenter | 64 Gb, 2-64 CPU | 2 Tb, 2-64 CPU
Windows Server 2008, Enterprise | 64 Gb, 1-8 CPU | 2 Tb, 1-8 CPU
Windows Server 2008, Standard | 4 Gb, 1-4 CPU | 32 Gb, 1-4 CPU
Windows Server 2008, Web Server | 4 Gb, 1-4 CPU | 32 Gb, 1-4 CPU
Vista Home Basic | 4 Gb, 1 CPU | 8 Gb, 1 CPU
Vista Home Premium | 4 Gb, 1-2 CPU | 16 Gb, 1-2 CPU
Vista Business | 4 Gb, 1-2 CPU | 128 Gb, 1-2 CPU
Vista Enterprise | 4 Gb, 1-2 CPU | 128 Gb, 1-2 CPU
Vista Ultimate | 4 Gb, 1-2 CPU | 128 Gb, 1-2 CPU
Table 4. Limitations of different Windows versions
As in Win32, the page size is 4Kb. The first 64Kb of the address space are never mapped, i.e. the lowest valid address is 0x10000. Unlike in Win32, system DLLs are loaded above the 4Gb boundary.
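The layout figures above can be expressed as a few constants. A minimal C sketch; the macro and function names are hypothetical, and the 8Tb user range applies to the Windows versions discussed in this article (later Windows versions enlarged it).

```c
#include <assert.h>
#include <stdint.h>

/* Win64 user-mode layout as described above: 4Kb pages, the first 64Kb
   never mapped, and an 8Tb user address range. */
#define WIN64_PAGE_SIZE        0x1000ull     /* 4Kb */
#define WIN64_LOWEST_USER_ADDR 0x10000ull    /* first 64Kb are never mapped */
#define WIN64_USER_SPACE_END   (1ull << 43)  /* 8Tb = 0x80000000000 */

/* Rounds an allocation size up to a whole number of 4Kb pages. */
uint64_t round_up_to_page(uint64_t bytes)
{
    assert(bytes <= UINT64_MAX - (WIN64_PAGE_SIZE - 1)); /* avoid overflow */
    return (bytes + WIN64_PAGE_SIZE - 1) & ~(WIN64_PAGE_SIZE - 1);
}

/* Nonzero if addr falls inside the user range of this Win64 layout. */
int is_plausible_user_address(uint64_t addr)
{
    return addr >= WIN64_LOWEST_USER_ADDR && addr < WIN64_USER_SPACE_END;
}
```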
All processors implementing AMD64 support the No-Execute (NX) bit, which Windows uses to implement the hardware-based Data Execution Prevention (DEP) technology that forbids executing user data as code. This increases program security by preventing the exploitation of errors such as executing a data buffer as code.
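A program can check for hardware NX support (and for long mode itself) with the CPUID instruction: function 0x80000001 reports NX in bit 20 and long mode (LM) in bit 29 of EDX. The sketch assumes GCC or Clang on x86, whose `<cpuid.h>` provides `__get_cpuid`; as a simplification, other compilers simply report no support here (Visual C++ would use `__cpuid` from `<intrin.h>` instead). The function and macro names are hypothetical. Note that the CPU flag only indicates hardware support; whether DEP is actually enforced is decided by the OS.

```c
#include <assert.h>
#include <stdint.h>

/* EDX feature bits of CPUID function 0x80000001. */
#define CPUID_EXT_EDX_NX (1u << 20)  /* No-Execute page protection */
#define CPUID_EXT_EDX_LM (1u << 29)  /* Long Mode (AMD64 / EM64T) */

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* EDX of CPUID 0x80000001, or 0 if the CPU does not implement that leaf. */
static uint32_t ext_feature_edx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}
#else
/* Simplified fallback: no CPUID access, report no features. */
static uint32_t ext_feature_edx(void) { return 0; }
#endif

/* Tests one feature flag in an EDX value. */
int has_ext_feature(uint32_t edx, uint32_t mask)
{
    assert(mask != 0);
    return (edx & mask) != 0;
}

int cpu_supports_nx(void)        { return has_ext_feature(ext_feature_edx(), CPUID_EXT_EDX_NX); }
int cpu_supports_long_mode(void) { return has_ext_feature(ext_feature_edx(), CPUID_EXT_EDX_LM); }
```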
A notable feature of AMD64 compilers is that they can efficiently use registers instead of the stack to pass parameters to functions. This allowed the Win64 designers to get rid of multiple calling conventions. In Win32 you can use different conventions (ways of passing parameters): __stdcall, __cdecl, __fastcall, etc. In Win64 there is only one calling convention. Here is how the first four integer arguments are passed in registers:
• RCX: first argument
• RDX: second argument
• R8: third argument
• R9: fourth argument
Arguments beyond the first four are passed on the stack. Floating-point arguments among the first four are passed in XMM0-XMM3 instead of the integer registers, and the caller also reserves stack space for all four register arguments.
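The register assignment can be written down as a small lookup. In Win64 each of the first four argument positions has both an integer register and an XMM register, and the argument's type decides which one is used, so for `f(int a, double b)` the double goes in XMM1, not XMM0. The helper below is a hypothetical illustration, not a Windows API, and it ignores special cases such as large structures passed by reference and variadic functions.

```c
#include <assert.h>

/* Possible locations of a Win64 function argument. */
enum win64_arg_loc {
    LOC_RCX, LOC_RDX, LOC_R8, LOC_R9,
    LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3,
    LOC_STACK
};

/* Location of argument number `index` (0-based): the slot is chosen by
   position; the register class is chosen by the argument's type. */
enum win64_arg_loc win64_arg_location(unsigned index, int is_floating_point)
{
    static const enum win64_arg_loc int_regs[4] = { LOC_RCX, LOC_RDX, LOC_R8, LOC_R9 };
    static const enum win64_arg_loc fp_regs[4]  = { LOC_XMM0, LOC_XMM1, LOC_XMM2, LOC_XMM3 };
    if (index >= 4)
        return LOC_STACK;
    return is_floating_point ? fp_regs[index] : int_regs[index];
}

/* Printable name of a location. */
const char *win64_arg_loc_name(enum win64_arg_loc loc)
{
    static const char *const names[] = {
        "RCX", "RDX", "R8", "R9", "XMM0", "XMM1", "XMM2", "XMM3", "stack"
    };
    assert((unsigned)loc < sizeof names / sizeof names[0]);
    return names[loc];
}
```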
Because the calling conventions differ, 64-bit and 32-bit code cannot be mixed in one program. In other words, if an application is compiled for 64-bit mode, all the DLLs it uses must be 64-bit too.
When writing 64-bit code, you can get an additional performance gain through special optimizations. This topic is covered in detail in AMD's optimization guide [5].
3. Porting applications to AMD64
One of the purposes of high-level languages is to minimize the dependence of program code on the architecture and to provide maximum portability between hardware platforms. For example, correctly written C++ programs are, in theory, independent of the hardware platform. Ideally, to build an existing 32-bit application for the AMD64 platform it would be enough to switch to the 64-bit compiler [6] and simply recompile the program. In practice, everything is more complicated.
Software containing assembly code for 32-bit processors still exists, and many programs written in high-level languages contain inline assembler blocks. That is why a large project often cannot simply be recompiled.
There are two obvious ways to solve this problem. First, you can decline to port the application to the new platform. This can be a very reasonable decision since, for example, Windows-family OSs provide good backward compatibility thanks to the WoW64 technology. The second option is to rewrite the code, and it seems reasonable to rewrite it in a high-level language. Note, by the way, that the Visual C++ compiler no longer supports inline assembler blocks when compiling 64-bit code [7].
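Rewriting typically means replacing an inline assembler block with portable C or a compiler intrinsic. A sketch for one common case, finding the lowest set bit (the job of the BSF instruction); it assumes Visual C++'s `_BitScanForward64` or GCC/Clang's `__builtin_ctzll` and falls back to a plain loop otherwise. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Index of the lowest set bit of x (x must be nonzero). 32-bit code might
   have wrapped BSF in an __asm block; x64 Visual C++ rejects inline
   assembler, so an intrinsic or plain C is used instead. */
unsigned lowest_set_bit(uint64_t x)
{
    assert(x != 0);
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanForward64(&index, x);
    return (unsigned)index;
#elif defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(x);
#else
    unsigned index = 0;          /* portable fallback: shift until bit 0 is set */
    while ((x & 1u) == 0) {
        x >>= 1;
        ++index;
    }
    return index;
#endif
}
```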
Assembly code is not the only obstacle to moving to 64-bit systems. Porting programs to 64-bit systems also exposes various errors related to the change in the data model (the sizes of fundamental types). Moreover, some errors become apparent only when the program uses large amounts of memory, which were unavailable in 32-bit systems. Such errors are described well in the article "20 issues of porting C++ code on the 64-bit platform" [8].
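A minimal illustration of one such data-model error, assuming an LLP64 (Win64) or LP64 (64-bit Unix) compiler on which `int` and `unsigned` stay 32 bits while pointers grow to 64. An address below 4Gb survives the truncation unchanged, which is why such a bug may stay hidden until the program actually uses memory above 4Gb. The function names and example addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What `unsigned addr = (unsigned)ptr;` does on LLP64/LP64 targets:
   unsigned is still 32 bits, so the upper half of the address is lost.
   On IA-32 the same code happened to work. */
uint64_t truncated_address(uint64_t address)
{
    uint32_t as_unsigned = (uint32_t)address;
    return as_unsigned;
}

/* The portable fix: uintptr_t is wide enough to hold any object pointer,
   so converting to it and back preserves the pointer. */
int pointer_roundtrips(void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;
}
```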
Everything said above applies mostly to C/C++ applications. Managed code (C#) fares better, although some minor problems can occur there as well. Unfortunately, large software systems are often built on libraries written in C/C++, so a large C# project most likely contains C/C++ modules or libraries that may be unsafe and contain vulnerabilities.
To test and verify code ported to a 64-bit platform, you can use various specialized methods and tools [9]. For example, static analyzers such as Viva64 (for Windows systems) and PC-Lint (for Unix systems) can give good results. To learn more about these tools, read the article "Comparison of analyzers' diagnostic abilities while testing 64-bit code" [10].
Conclusion
Undoubtedly, the AMD64 architecture proposed by AMD has proved to be in demand on the market. Its advantage is that it allows a smooth transition to 64-bit programs without losing compatibility with legacy 32-bit applications. But there is nothing revolutionary in AMD64.
As experiments show, migrating 32-bit programs to AMD64 allows you, first, to solve tasks that are much more memory-demanding and, second, to gain about 10% in performance "for free", without changing the code, thanks to the compiler optimizing the application for the new architecture.
We may conclude that the AMD64 architecture has postponed the problem of limited main memory for many years but has not solved the problem of increasing the performance of modern personal computers. The future still belongs to multi-core and multiprocessor systems.
References
1. Intel Software Developer's Manual. Volume 1: Basic Architecture.
http://www.viva64.com/go.php?url=212
2. AMD x86-64 Architecture Programmer's Manual. Volume 1: Application Programming.
http://www.viva64.com/go.php?url=213
3. Mike Wall. Tricks for Porting Applications to 64-Bit Windows on AMD64 Architecture.
http://www.viva64.com/go.php?url=214
4. Matt Pietrek. Everything You Need To Know To Start Programming 64-Bit Windows Systems.
http://www.viva64.com/go.php?url=215
5. Software Optimization Guide for AMD Athlon 64 and AMD Opteron Processors.
http://www.viva64.com/go.php?url=59
6. Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 Platforms.
http://www.viva64.com/go.php?url=216
7. Daniel Pistelli. Moving to Windows Vista x64. http://www.viva64.com/go.php?url=217
8. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform.
http://www.viva64.com/art-1-2-599168895.html
9. Andrey Karpov. Problems of testing 64-bit applications.
http://www.viva64.com/art-1-2-1289354852.html
10. Andrey Karpov. Comparison of analyzers' diagnostic abilities while testing 64-bit code.
http://www.viva64.com/art-1-2-914146540.html