Linux Huge Pages
Why? How? When?
Agenda
• What are you talking about?
• Linux kernel map
• Memory Allocation
• Paging Model
• Page Fault
• Swapping
• Why Huge Pages
• How to configure
• When to configure
• Summary
Premises
• This is mainly about x86-64 (Intel and AMD CPUs produced after 2004)
• Huge pages differ among hardware architectures; those differences are out of our scope
• We will not explore the MMU, the TLB and all the internals of virtual memory management
• Some images are outdated (e.g.: Linux kernel 2.6 while the current version is 5.5), but they illustrate very well the aspects discussed in this presentation
What are you talking about?
This is the Linux kernel map for version 2.6.36. Although it is about 10 years old, it gives us the big picture.
Memory Allocation
[Figure: memory allocation]
Paging Model
[Figure: paging model]
Page Fault
[Figure: page fault]
Swapping
[Figure: swapping]
Why Huge Pages
• As we have seen, memory management is a complicated process involving many 'round-trips'
• Huge pages allocate memory in larger blocks at once, cutting the 'round-trips' associated with small pages
• Huge pages allocated through HugeTLB cannot be swapped out
• A set of 4 KB pages can turn into a single 2 MB page (with PAE or in 64-bit mode), a 4 MB page (32-bit mode without PAE) or even a 1 GB page

Number of Pages (4 KB)   Number of Huge Pages   Huge Page Equivalence
512                      1                      2 MB (2,048 KB)
1,024                    1                      4 MB (4,096 KB)
262,144                  1                      1 GB (1,024 MB or 1,048,576 KB)
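The equivalences in the table are plain divisions of the huge page size by the 4 KB base page size. A minimal Python sketch, assuming binary units (1 KB = 1024 bytes); the names used here are illustrative and not part of any tool:

```python
BASE_PAGE_SIZE = 4 * 1024  # 4 KB, the standard x86-64 page size

# Huge page sizes discussed in this presentation, in bytes.
HUGE_PAGE_SIZES = {
    "2 MB": 2 * 1024**2,
    "4 MB": 4 * 1024**2,
    "1 GB": 1024**3,
}


def base_pages_per_huge_page(huge_page_size, base_page_size=BASE_PAGE_SIZE):
    """Return how many base pages fit in one huge page."""
    return huge_page_size // base_page_size


for label, size in HUGE_PAGE_SIZES.items():
    print(f"{label}: {base_pages_per_huge_page(size)} pages of 4 KB")
```

Each 1 GB huge page therefore stands in for 262,144 page table entries of 4 KB, which is where the reduction in 'round-trips' comes from.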
• There are 2 huge page variants
  • HugeTLB file system (hugetlbfs)
    • Works as a pseudo filesystem where you need to define the allocation manually
    • We will use this approach
  • Transparent Huge Pages (THP)
    • Works transparently: the Linux kernel decides on its own whether or not the application gets huge pages, but it is not recommended for latency-sensitive applications
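Before choosing between the two variants, it can help to see how Transparent Huge Pages is currently set. On kernels built with THP support, the mode is exposed in /sys/kernel/mm/transparent_hugepage/enabled, with the active value in square brackets (for example "always [madvise] never"). A small Python sketch; the function names are illustrative, and the file is assumed to be absent on kernels without THP:

```python
from pathlib import Path

THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode, i.e. the word wrapped in square brackets."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def current_thp_mode(path=THP_ENABLED_PATH):
    """Return the THP mode of this machine, or None if the file is missing."""
    try:
        return parse_thp_mode(Path(path).read_text())
    except OSError:
        return None


print(current_thp_mode())
```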
How to Configure
• Checking if it is possible to enable huge pages
netto@bella:~$ getconf PAGESIZE
4096
netto@bella:~$ grep -E 'pse|pdpe' /proc/cpuinfo | tail -1
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2
smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln
pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
• getconf returns the standard page size, in bytes, for the CPU architecture
• /proc/cpuinfo contains the CPU-related data
• pse => supports 2 MB huge pages
• pdpe1gb => supports 1 GB huge pages
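The same check can be scripted. The sketch below is one possible way to do it in Python: it reads the first "flags" line of /proc/cpuinfo and looks for the exact flag names, so that pse36 is not mistaken for pse (a plain substring match would accept it). The function names are illustrative.

```python
def huge_page_support(cpuinfo_text):
    """Map each huge page size to True/False based on the CPU flags line."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first ':' is a space-separated flag list.
            flags = set(line.split(":", 1)[1].split())
            break
    return {"2MB": "pse" in flags, "1GB": "pdpe1gb" in flags}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return the contents of /proc/cpuinfo, or an empty string if unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


print(huge_page_support(read_cpuinfo()))
```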
• Installing the required packages to configure huge pages (as root)
• WARNING: your distribution might require a slightly different setup (e.g.: different package manager/names, fewer steps)
Red Hat / CentOS
root@bella:~$ yum -y install libhugetlbfs libhugetlbfs-utils
Debian / Ubuntu
root@bella:~$ apt-get -y install hugepages
• In the following case, we can select which huge page size is more convenient for the application
# this is the pseudo directory where huge pages will be mapped; it must already exist
# Red Hat configuration differs a little
root@bella:~$ mkdir -p /dev/hugepages
# this can be converted to an /etc/fstab entry
root@bella:~$ mount -t hugetlbfs -o gid=<group id>,pagesize=<2M or 1G>,... none /dev/hugepages
• Add to sysctl.conf
# formula: <memory required for your scenario> / <huge page size (2 MB or 1 GB)>
# there are situations like Oracle DB where it is recommended to allocate huge pages only for the SGA
vm.nr_hugepages = <number of pages>
# the same gid used on mount; it must be the group your application runs under
vm.hugetlb_shm_group = <group id>
• Reboot
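The value for vm.nr_hugepages is the memory your scenario requires divided by the chosen huge page size, rounded up to a whole page. A minimal Python sketch of that arithmetic; the 8 GB requirement and the group id 1001 are hypothetical values chosen only for illustration:

```python
# Huge page sizes accepted by the pagesize= mount option, in bytes.
HUGE_PAGE_SIZES = {"2M": 2 * 1024**2, "1G": 1024**3}


def huge_pages_needed(required_bytes, page_size="2M"):
    """Round the memory requirement up to a whole number of huge pages."""
    size = HUGE_PAGE_SIZES[page_size]
    return (required_bytes + size - 1) // size


def sysctl_lines(required_bytes, group_id, page_size="2M"):
    """Build the two sysctl.conf entries shown above."""
    pages = huge_pages_needed(required_bytes, page_size)
    return [
        f"vm.nr_hugepages = {pages}",
        f"vm.hugetlb_shm_group = {group_id}",
    ]


# Hypothetical scenario: 8 GB for the application, running under group id 1001.
for line in sysctl_lines(8 * 1024**3, group_id=1001):
    print(line)
```

Rounding up matters: a requirement that is not a multiple of the huge page size still needs the last, partially used page.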
# if huge pages are correctly set up, at least one pool will be displayed
netto@bella:~$ hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152        0        0        0        *
1073741824        1        1        1
# huge pages are enabled if HugePages_Total is > 0
netto@bella:~$ cat /proc/meminfo
...
HugePages_Total: <huge pages pool size>
HugePages_Free: <number of huge pages in the pool that are not yet allocated>
HugePages_Rsvd: <number of huge pages that are reserved but not yet allocated>
HugePages_Surp: <number of surplus huge pages in the pool above vm.nr_hugepages>
Hugepagesize: 2048 kB
Hugetlb: 1048576 kB
DirectMap4k: 572400 kB
DirectMap2M: 12943360 kB
DirectMap1G: 19922944 kB
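The /proc/meminfo check can also be automated. The following Python sketch extracts the HugePages_* counters and Hugepagesize and applies the "HugePages_Total > 0" rule from the comment above; the function names are illustrative, and the parsing assumes the usual "Name: value [kB]" layout of /proc/meminfo:

```python
def parse_hugepage_counters(meminfo_text):
    """Return the HugePages_* counters and Hugepagesize (in kB) as integers."""
    counters = {}
    for line in meminfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("HugePages_") or name == "Hugepagesize":
            # The first token is the number; a trailing "kB" unit is ignored.
            counters[name] = int(value.split()[0])
    return counters


def huge_pages_enabled(counters):
    """Huge pages are enabled when the pool holds at least one page."""
    return counters.get("HugePages_Total", 0) > 0


def read_meminfo(path="/proc/meminfo"):
    """Return the contents of /proc/meminfo, or an empty string if unreadable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


counters = parse_hugepage_counters(read_meminfo())
print(counters, huge_pages_enabled(counters))
```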
Application          Where                               Syntax
Oracle JDK/OpenJDK   Command line argument               -XX:+UseLargePages
MySQL                my.cnf, inside the [mysqld] block   large_pages=ON
PHP                  php.ini, opcache block              opcache.huge_code_pages=1
Python               mmap module, madvise()              MADV_HUGEPAGE
PostgreSQL           postgresql.conf                     huge_pages = on
Docker               Command line argument               --device=/dev/hugepages:/dev/hugepages
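For the Python row, the sketch below shows one way the hint could be issued. It assumes Python 3.8 or newer on Linux, where mmap objects have a madvise() method and the mmap module exposes MADV_HUGEPAGE; note that this hint targets Transparent Huge Pages and the kernel is free to ignore it. The function name and the 2 MB size constant are illustrative.

```python
import mmap

HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MB huge pages


def map_with_huge_page_hint(size=4 * HUGE_PAGE_SIZE):
    """Create an anonymous mapping and ask the kernel to back it with huge pages.

    Returns (mapping, hinted); hinted is False when the hint is unavailable
    on this platform or rejected by the kernel.
    """
    mapping = mmap.mmap(-1, size)
    hinted = False
    if hasattr(mmap, "MADV_HUGEPAGE") and hasattr(mapping, "madvise"):
        try:
            mapping.madvise(mmap.MADV_HUGEPAGE)
            hinted = True
        except OSError:
            # The kernel may reject the hint, e.g. when THP is not compiled in.
            pass
    return mapping, hinted


mapping, hinted = map_with_huge_page_hint()
mapping[:5] = b"hello"  # touching the memory triggers the actual allocation
print("hint accepted:", hinted)
mapping.close()
```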
When to Configure
Advantages
• Huge pages can reduce pressure on the TLB/MMU
• Huge pages are not swappable
• Any data-intensive application that properly uses mmap(), madvise(), shmget(), shmat() and some other calls can benefit from it
• Any memory-bound application can benefit from it
• Useful when latency/response time is critical
Disadvantages
• Internal and external memory fragmentation will be aggravated if not configured properly
• "Swappability" avoids quick memory starvation, at some performance cost
• It is an extension beyond POSIX: other Unix-like systems such as Solaris and FreeBSD, and even Windows, have a similar feature with a totally different setup
• NUMA (non-uniform memory access) systems may not get all the benefits of a UMA system (hardware with uniform/unified memory management)
• Transparent Huge Pages is not recommended in general (it has very specific use cases)
• Many other advantages and disadvantages can come up but, most importantly: test!
• It might be required to raise the memory limits in /etc/security/limits.conf
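To see whether the limits need raising, the current values can be inspected from Python. This sketch assumes the relevant limit is the locked-memory one (the "memlock" item in /etc/security/limits.conf, RLIMIT_MEMLOCK in the resource module); other limits may apply to your application. The function names are illustrative.

```python
import resource


def _to_bytes(value):
    """Translate RLIM_INFINITY into None; other values are byte counts."""
    return None if value == resource.RLIM_INFINITY else value


def memlock_limits():
    """Return the (soft, hard) locked-memory limits; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return _to_bytes(soft), _to_bytes(hard)


def fits_in_memlock(required_bytes, limit):
    """Check a memory requirement against a limit returned by memlock_limits()."""
    return limit is None or required_bytes <= limit


soft, hard = memlock_limits()
print("soft:", soft, "hard:", hard)
```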
Thank you!
Geraldo Netto
geraldo.netto@gmail.com
