
TLB misses - The Missing Issue of Adaptive Radix Tree?


Efficient main-memory index structures are crucial to main-memory database systems. Adaptive Radix Tree (ART) is the most recent in-memory index structure. ART is designed to avoid cache misses, leverage SIMD data parallelism, minimize branch mispredictions, and have a small memory footprint. When an in-memory index structure like ART incurs very few cache misses and branch mispredictions, it is natural to ask whether misses in the Translation Lookaside Buffer (TLB) matter. In this paper, we try to confirm whether this is the case and, if the answer is positive, what measures we can take to alleviate the problem and how effective they are.

Published in: Technology


  1. TLB misses - The Missing Issue of Adaptive Radix Tree?
     Petrie Wong, Ziqiang Feng, Wenjian Xu, Eric Lo, Ben Kao
     Department of Computer Science, The University of Hong Kong
     Department of Computing, The Hong Kong Polytechnic University
  2. TLB misses - the Missing Issue of Adaptive Radix Tree? - presented by Petrie Wong
     Motivation
     • In-memory databases
       • H-Store
       • Hekaton
     • Efficient in-memory index structures
       • Cache-Sensitive B+-Tree (CSB+-Tree)
       • Fast Architecture Sensitive Tree (FAST)
       • Adaptive Radix Tree (ART)
  3. Why Adaptive Radix Tree (V. Leis, A. Kemper et al., ICDE'13)
     • Outperforms existing index structures
       • in both search and update
     • Has a small memory footprint
     • Avoids cache misses
     • Leverages SIMD data parallelism
     • Reduces branch mispredictions
     • Adopts a radix tree structure
  4. What is Adaptive Radix Tree
     [Figure: example ART built from Node4 (key array and pointer array), Node48 (256-entry index array and 48 child pointers), and Node256 (pointer array indexed 00 to FF) nodes, with Data at the leaves]
     • small node type (Node4) for nodes with few child pointers
     • large node type (Node256) for nodes with many child pointers
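The node layouts named on this slide can be sketched as below. This is an illustrative sketch only, not the authors' implementation: the class and method names are hypothetical, Python lists stand in for fixed-size arrays, and node growth (for example from Node4 to a larger type) is left out.

```python
class Node4:
    """Small node: parallel key array and pointer array, at most 4 children."""

    def __init__(self):
        self.keys = []       # key bytes, same order as self.children
        self.children = []   # child pointers

    def add(self, byte, child):
        if len(self.keys) >= 4:
            raise ValueError("Node4 is full")
        self.keys.append(byte)
        self.children.append(child)

    def find(self, byte):
        # Linear scan of the key array; the matching position indexes the pointer array.
        for i, k in enumerate(self.keys):
            if k == byte:
                return self.children[i]
        return None


class Node48:
    """Medium node: 256-entry index array pointing into at most 48 child slots."""

    def __init__(self):
        self.index = [None] * 256   # key byte -> slot in self.children
        self.children = []

    def add(self, byte, child):
        if len(self.children) >= 48:
            raise ValueError("Node48 is full")
        self.index[byte] = len(self.children)
        self.children.append(child)

    def find(self, byte):
        slot = self.index[byte]
        return None if slot is None else self.children[slot]


class Node256:
    """Large node: pointer array indexed directly by the key byte."""

    def __init__(self):
        self.children = [None] * 256

    def add(self, byte, child):
        self.children[byte] = child

    def find(self, byte):
        return self.children[byte]
```

The trade-off is the one the slide states: Node4 is compact when a node has few children, while Node256 spends more space to make lookup a single array access.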
  5. Whether TLB miss matters in ART?
     • Translation Look-aside Buffer (TLB)
       • a cache for page table entries
       • a fast way to translate a virtual memory address (for the program) to a physical memory address (for the CPU)
       • consulted when the CPU executes an instruction
     • In-memory index structures like ART
       • few cache misses, few branch mispredictions, SIMD-friendly
       • would misses in the TLB become a bottleneck?
     • If the answer is positive
       • what measures can alleviate them
       • how effective those measures are
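To make the idea of a TLB as a cache of page table entries concrete, here is a minimal simulation sketch. It assumes a fully associative TLB with LRU replacement, which is a simplification (real TLBs are typically set-associative and multi-level); the function name `count_tlb_misses` is hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # regular 4KB page


def count_tlb_misses(addresses, tlb_entries, page_size=PAGE_SIZE):
    """Count misses for a sequence of virtual addresses.

    Assumption: fully associative TLB with LRU replacement.
    """
    tlb = OrderedDict()  # page number -> True, ordered from least to most recently used
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: a page table walk would be needed
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses
```

With a larger page size, more addresses fall into the same page, so the same access sequence touches fewer distinct TLB entries.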
  6. Whether TLB miss matters in ART?
     • Experiment to show
       • stall time % due to TLB misses
     • System specification
       • Intel Core i7 2630QM CPU
       • 2.00 GHz clock rate, 2.9 GHz turbo frequency
       • each core: 32KB L1i cache, 32KB L1d cache, 256KB unified L2 cache
       • shared 6MB L3 cache, 16GB 1600 RAM
  7. Whether TLB miss matters in ART?
     • Data
       • 1,000,000 integer keys
       • Dense: from 1 to n (19MB in RAM)
       • Sparse: random numbers in the 32-bit domain (22MB in RAM)
       • neither fits into the 6MB L3 cache
  8. Whether TLB miss matters in ART?
     • Workload
       • 256M lookups
       • Varying skewness: from zipf=0 (each key is uniformly accessed) to zipf=3 (few very hot keys and many non-hot keys)
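A workload with varying skew like the one described here could be generated along the following lines. This is a sketch under stated assumptions: the slides do not say how the Zipf workload was produced, so the bounded rank-based distribution and the names `zipf_probabilities` and `sample_lookups` are illustrative choices.

```python
import random


def zipf_probabilities(n, s):
    """Access probability of the key at each rank 1..n, proportional to 1 / rank**s.

    s = 0 gives a uniform distribution; larger s concentrates accesses on a few hot keys.
    """
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_lookups(keys, s, count, seed=0):
    """Draw `count` lookup keys; keys[0] is treated as the hottest key."""
    rng = random.Random(seed)
    return rng.choices(keys, weights=zipf_probabilities(len(keys), s), k=count)
```

For example, `sample_lookups(list(range(1, 1_000_001)), s=1.2, count=1000)` would draw a small moderately skewed batch over the dense key set.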
  9. Whether TLB miss matters in ART? (very skewed)
     • No, when key access is very skewed (Zipf = 2 to 3)
       • few very hot search keys
       • they occupy very few page table entries in the TLB
       • very few TLB misses are incurred (0% to 2% of stall time)
       • TLB misses do not matter
     [Figure: stall time due to TLB miss / index lookup latency (%) vs. Zipf (0 to 3), Dense and Sparse; region highlighted: 0% to 2%]
  10. Whether TLB miss matters in ART? (uniform)
     • No, when the workload is not skewed (Zipf = 0 to 1)
       • each key is uniformly accessed
       • no spatial locality
       • lots of cache misses, which dominate the latency
       • TLB misses matter little (5% to 7%)
     [Figure: stall time due to TLB miss / index lookup latency (%) vs. Zipf (0 to 3), Dense and Sparse; region highlighted: 5% to 7%]
  11. Whether TLB miss matters in ART?
     • YES, when the workload possesses realistic skewness (Zipf = 1 to 2)
       • key accesses have a certain spatial locality
       • cache misses are not high
       • TLB misses matter now (up to 23%)
     [Figure: stall time due to TLB miss / index lookup latency (%) vs. Zipf (0 to 3), Dense and Sparse; peak highlighted: up to 23%]
  12. What are the measures that we can take to alleviate?
     • use of huge page
     • workload-conscious node-to-page reorganization
  13. What are the measures that we can take to alleviate?
     • use of huge page
     • workload-conscious node-to-page reorganization
  14. What is Huge Page?
     • In memory allocation
       • eliminates fragmentation over the whole memory space
       • by cutting the memory space into pages
     • Regular page size (in most processors, e.g. Intel Sandy Bridge - Xeon E5)
       • 4KB
       • the OS's default value
     • Huge page size (e.g. Sandy Bridge)
       • 2MB, 1GB
       • a good tactic to reduce TLB misses
  15. Why Huge Page?
     • If huge pages are applied to ART
       • fewer pages are spanned by ART nodes
       • less pressure on the TLB
       • fewer TLB misses
       • throughput increases
  16. Why Huge Page?
     • Page table entries
       • besides being stored in the TLB
       • occupy space in the L1/L2/L3 caches and RAM
     • So, with fewer page table entries
       • less space is occupied in the processor's caches
       • fewer cache misses
       • throughput increases
     [Figure: L2 cache contents (Page Table Entries, ART Data, Others) when using regular pages vs. when using huge pages]
  17. Does Huge Page always Help?
     • But...
       • the number of TLB entries differs by page size
       • there are fewer huge page entries than regular page entries
     • In Xeon E5
       • 64 DTLB and 512 STLB entries for regular pages
       • 32 DTLB entries for huge pages
     • Fewer TLB entries are available for huge pages
       • throughput may decrease
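The trade-off between page size and TLB entry count can be checked with simple arithmetic using the figures quoted in the slides (19MB dense index, 4KB and 2MB pages, 64/512/32 TLB entries). The sketch below assumes the index is packed contiguously, so the page counts are lower bounds; the helper names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def pages_spanned(index_bytes, page_bytes):
    """Minimum number of pages needed to hold the index (ceiling division)."""
    return -(-index_bytes // page_bytes)


def tlb_reach(entries, page_bytes):
    """Amount of memory the TLB can map without a miss."""
    return entries * page_bytes


# Dense data set from the slides: 19MB in RAM.
regular_pages = pages_spanned(19 * MB, 4 * KB)   # 4864 pages
huge_pages = pages_spanned(19 * MB, 2 * MB)      # 10 pages

# TLB entry counts quoted for Xeon E5.
dtlb_regular = tlb_reach(64, 4 * KB)    # 256KB
stlb_regular = tlb_reach(512, 4 * KB)   # 2MB
dtlb_huge = tlb_reach(32, 2 * MB)       # 64MB
```

Under these assumptions, 32 huge-page entries still cover more memory than 512 regular-page entries, which is consistent with huge pages helping in most of the measured cases even though fewer entries are available.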
  18. Can Huge Page Help?
     • Yes
     • When the workload is uniform to moderately skewed (Zipf < 2)
       • TLB misses and cache misses are reduced
       • throughput increases as expected
     • When the workload is extremely skewed (Zipf > 2)
       • there are already very few TLB misses and cache misses
       • no further improvement
     [Figure: throughput improvement (%) vs. Zipf (0 to 3), Dense and Sparse]
  19. What are the measures that we can take to alleviate?
     • use of huge page
     • workload-conscious node-to-page reorganization
  20. What is Workload-Conscious Node-to-Page Reorganization?
     • Tree nodes in ART are allocated by
       • dynamic memory allocation
       • the OS's default scheme
       • which eliminates fragmentation over the whole memory space
     • Workload-conscious allocation (R. Stoica and A. Ailamaki et al., DaMoN'13)
       • takes over the OS's control
       • organizes the hot ART nodes into the same page
  21. Why Workload-Conscious Node-to-Page Reorganization
     • OLTP workloads are skewed
       • some keys are hot and accessed frequently
     • If all hot nodes are put into one (huge) page
       • the page table entry of the hot page will be kept in the TLB
       • no TLB miss when accessing hot keys
       • throughput increases
  22. How Workload-Conscious Node-to-Page Reorganization
     • During query execution
       • log key accesses
     • Analyze the access logs
       • sort the keys by their access frequencies
     • Node-to-page reorganization
       • according to access frequencies
       • hot nodes are placed in the same page
     [Figure: nodes from pages P1 and P2 regrouped into pages Phot and Pcold]
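The three steps on this slide (log, sort by frequency, place hot nodes together) can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it assumes the log already records node identifiers, that every node has the same size (36 bytes, the Node4 size quoted in the slides), and that pages are filled greedily from hottest to coldest. The function name `reorganize` is hypothetical.

```python
from collections import Counter


def reorganize(access_log, node_size=36, page_size=4096):
    """Map each logged node id to a page index, hottest nodes first.

    Assumptions: uniform node size; nodes that never appear in the log
    are not placed here and would go to cold pages afterwards.
    """
    freq = Counter(access_log)
    # Sort by descending access count; break ties by node id for determinism.
    ordered = [node for node, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]
    nodes_per_page = page_size // node_size
    return {node: i // nodes_per_page for i, node in enumerate(ordered)}
```

With 4KB pages and 36-byte nodes, the 113 hottest nodes share page 0, so lookups of hot keys touch very few distinct page table entries.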
  23. Can Workload-Conscious Node-to-Page Reorganization Help?
     • Yes, when
       • data is sparse and the workload is skewed
     • Sparse data
       • each node contains few children
       • small nodes (Node4, 36 bytes each) are used
       • many nodes, not densely packed
       • more space, more pages
       • more page table entries
       • TLB misses matter
     [Figure: throughput (lookups/s) vs. Zipf (0 to 3), ART with reorganization vs. ART; alongside the ART node-type diagram (Node4, Node48, Node256)]
  24. Can Workload-Conscious Node-to-Page Reorganization Help?
     • Sparse data
       • when workload-conscious reorganization is applied
       • all hot nodes can be put into a few pages
       • fewer page table entries need to be cached (for hot nodes)
       • fewer TLB misses, and throughput increases
     [Figure: throughput (lookups/s) vs. Zipf (0 to 3), ART with reorganization vs. ART]
  25. Can Workload-Conscious Node-to-Page Reorganization Help?
     • No, when
       • data is dense
       • huge pages are used
     • Few pages are needed
       • all page table entries can stay in the TLB
       • giving almost no TLB misses
       • which makes node-to-page reorganization immaterial
     [Figure: throughput (lookups/s) vs. Zipf (0 to 3), ART with reorganization vs. ART]
  26. Summary
     • TLB misses do matter when the access workload possesses realistic skew
     • The use of huge pages provides a 1-32% lookup throughput improvement over the use of regular pages
     • Workload-conscious node-to-page reorganization does help when the data to be indexed is sparse
  27. Thank you
