SlideShare a Scribd company logo
1 of 23
Download to read offline
Faster persistent data structures through hashing

                    Johan Tibell
               johan.tibell@gmail.com


                    2011-02-15
Motivating problem: Twitter data analys



   “I’m computing a communication graph from Twitter data and
   then scan it daily to allocate social capital to nodes behaving in a
   good karmic manner. The graph is culled from 100 million tweets
   and has about 3 million nodes.”

   We need a data structure that is
       fast when used with string keys, and
       doesn’t use too much memory.
Persistent maps in Haskell




       Data.Map is the most commonly used map type.
       It’s implemented using size balanced trees.
       Keys can be of any type, as long as values of the type can be
       ordered.
Real world performance of Data.Map




      Good in theory: no more than O(log n) comparisons.
      Not great in practice: up to O(log n) comparisons!
      Many common types are expensive to compare e.g String,
      ByteString, and Text.
      Given a string of length k, we need O(k ∗ log n) comparisons
      to look up an entry.
Hash tables




      Hash tables perform well with string keys: O(k) amortized
      lookup time for strings of length k.
      However, we want persistent maps, not mutable hash tables.
Milan Straka’s idea: Patricia trees as sparse arrays




       We can use hashing without using hash tables!
       A Patricia tree implements a persistent, sparse array.
       Patricia trees are as twice fast as size balanced trees, but only
       work with Int keys.
       Use hashing to derive an Int from an arbitrary key.
Implementation tricks




      Patricia trees implement a sparse, persistent array of size 232
      (or 264 ).
      Hashing using this many buckets makes collisions rare: for 224
      entries we expect about 32,000 single collisions.
      Linked lists are a perfectly adequate collision resolution
      strategy.
First attempt at an implementation

  −− D e f i n e d i n t h e c o n t a i n e r s p a c k a g e .
  data IntMap a
      = Nil
      | Tip {−# UNPACK #−} ! Key a
      | Bin {−# UNPACK #−} ! P r e f i x
                  {−# UNPACK #−} ! Mask
                  ! ( IntMap a ) ! ( IntMap a )

   type P r e f i x = I n t
   type Mask        = Int
   type Key         = Int

   newtype HashMap k v = HashMap ( IntMap [ ( k , v ) ] )
A more memory efficient implementation

   data HashMap k v
       = Nil
       | Tip {−# UNPACK #−}        ! Hash
             {−# UNPACK #−}        ! ( FL . F u l l L i s t k v )
       | Bin {−# UNPACK #−}        ! Prefix
             {−# UNPACK #−}        ! Mask
             ! ( HashMap k v )     ! ( HashMap k v )

   type P r e f i x = I n t
   type Mask        = Int
   type Hash        = Int

   data F u l l L i s t k v = FL ! k ! v ! ( L i s t k v )
   data L i s t k v = N i l | Cons ! k ! v ! ( L i s t k v )
Reducing the memory footprint




      List k v uses 2 fewer words per key/value pair than [( k, v )] .
       FullList can be unpacked into the Tip constructor as it’s a
      product type, saving 2 more words.
      Always unpack word sized types, like Int, unless you really
      need them to be lazy.
Benchmarks




  Keys: 212 random 8-byte ByteStrings

             Map    HashMap
    insert   1.00     0.43
   lookup    1.00     0.28

  delete performs like insert.
Can we do better?




      We still need to perform O(min(W , log n)) Int comparisons,
      where W is the number of bits in a word.
      The memory overhead per key/value pair is still quite high.
Borrowing from our neighbours




      Clojure uses a hash-array mapped trie (HAMT) data structure
      to implement persistent maps.
      Described in the paper “Ideal Hash Trees” by Bagwell (2001).
      Originally a mutable data structure implemented in C++.
      Clojure’s persistent version was created by Rich Hickey.
Hash-array mapped tries in Clojure




      Shallow tree with high branching factor.
      Each node, except the leaf nodes, contains an array of up to
      32 elements.
      5 bits of the hash are used to index the array at each level.
      A clever trick, using bit population count, is used to represent
      spare arrays.
The Haskell definition of a HAMT

   data HashMap k v
       = Empty
       | BitmapIndexed
                {−# UNPACK #−} ! Bitmap
                {−# UNPACK #−} ! ( Array ( HashMap k v ) )
       | L e a f {−# UNPACK #−} ! ( L e a f k v )
       | F u l l {−# UNPACK #−} ! ( Array ( HashMap k v ) )
       | C o l l i s i o n {−# UNPACK #−} ! Hash
                           {−# UNPACK #−} ! ( Array ( L e a f k v ) )

   type Bitmap = Word

   data Array a = Array ! ( Array# a )
                        {−# UNPACK #−} ! I n t
Making it fast




       The initial implementation by Edward Z. Yang: correct but
       didn’t perform well.
       Improved performance by
           replacing use of Data.Vector by a specialized array type,
           paying careful attention to strictness, and
           using GHC’s new INLINABLE pragma.
Benchmarks




  Keys: 212 random 8-byte ByteStrings

             Map    HashMap    HashMap (HAMT)
    insert   1.00     0.43          1.21
   lookup    1.00     0.28          0.21
Where is all the time spent?




       Most time in insert is spent copying small arrays.
       Array copying is implemented using indexArray# and
       writeArray#, which results in poor performance.
       When cloning an array, we are force to first fill the new array
       with dummy elements, followed by copying over the elements
       from the old array.
A better array copy




      Daniel Peebles have implemented a set of new primops for
      copying array in GHC.
      The first implementation showed a 20% performance
      improvement for insert.
      Copying arrays is still slow so there might be room for big
      improvements still.
Other possible performance improvements



      Even faster array copying using SSE instructions, inline
      memory allocation, and CMM inlining.
      Use dedicated bit population count instruction on
      architectures where it’s available.
      Clojure uses a clever trick to unpack keys and values directly
      into the arrays; keys are stored at even positions and values at
      odd positions.
      GHC 7.2 will use 1 less word for Array.
Optimize common cases



      In many cases maps are created in one go from a sequence of
      key/value pairs.
      We can optimize for this case by repeatedly mutating the
      HAMT and freeze it when we’re done.

  Keys: 212 random 8-byte ByteStrings

      fromList/pure     1.00
   fromList/mutating    0.50
Abstracting over collection types




       We will soon have two map types worth using (one ordered
       and one unordered).
       We want to write function that work with both types, without
       a O(n) conversion cost.
       Use type families to abstract over different concrete
       implementation.
Summary




     Hashing allows us to create more efficient data structures.
     There are new interesting data structures out there that have,
     or could have, an efficient persistent implementation.

More Related Content

What's hot

Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1Jatin Miglani
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)PyData
 
Application of hashing in better alg design tanmay
Application of hashing in better alg design tanmayApplication of hashing in better alg design tanmay
Application of hashing in better alg design tanmayTanmay 'Unsinkable'
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in RFlorian Uhlitz
 
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYAPYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYAMaulik Borsaniya
 
Real World Haskell: Lecture 5
Real World Haskell: Lecture 5Real World Haskell: Lecture 5
Real World Haskell: Lecture 5Bryan O'Sullivan
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesAndrew Ferlitsch
 
Algorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsAlgorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsChristopher Conlan
 
Monad presentation scala as a category
Monad presentation   scala as a categoryMonad presentation   scala as a category
Monad presentation scala as a categorysamthemonad
 
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonGraph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonChristopher Conlan
 
Python 培训讲义
Python 培训讲义Python 培训讲义
Python 培训讲义leejd
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply FunctionSakthi Dasans
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
Functional Programming by Examples using Haskell
Functional Programming by Examples using HaskellFunctional Programming by Examples using Haskell
Functional Programming by Examples using Haskellgoncharenko
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Alexander Hendorf
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 

What's hot (19)

iPython
iPythoniPython
iPython
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)Introduction to NumPy (PyData SV 2013)
Introduction to NumPy (PyData SV 2013)
 
Application of hashing in better alg design tanmay
Application of hashing in better alg design tanmayApplication of hashing in better alg design tanmay
Application of hashing in better alg design tanmay
 
Next Generation Programming in R
Next Generation Programming in RNext Generation Programming in R
Next Generation Programming in R
 
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYAPYTHON-Chapter 4-Plotting and Data Science  PyLab - MAULIK BORSANIYA
PYTHON-Chapter 4-Plotting and Data Science PyLab - MAULIK BORSANIYA
 
Real World Haskell: Lecture 5
Real World Haskell: Lecture 5Real World Haskell: Lecture 5
Real World Haskell: Lecture 5
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
Algorithms 101 for Data Scientists
Algorithms 101 for Data ScientistsAlgorithms 101 for Data Scientists
Algorithms 101 for Data Scientists
 
Monad presentation scala as a category
Monad presentation   scala as a categoryMonad presentation   scala as a category
Monad presentation scala as a category
 
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahonGraph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
Graph Algorithms, Sparse Algebra, and the GraphBLAS with Janice McMahon
 
Python 培训讲义
Python 培训讲义Python 培训讲义
Python 培训讲义
 
4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function4 R Tutorial DPLYR Apply Function
4 R Tutorial DPLYR Apply Function
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Functional Programming by Examples using Haskell
Functional Programming by Examples using HaskellFunctional Programming by Examples using Haskell
Functional Programming by Examples using Haskell
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]Introduction to Pandas and Time Series Analysis [PyCon DE]
Introduction to Pandas and Time Series Analysis [PyCon DE]
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 

Viewers also liked

Efficient Immutable Data Structures (Okasaki for Dummies)
Efficient Immutable Data Structures (Okasaki for Dummies)Efficient Immutable Data Structures (Okasaki for Dummies)
Efficient Immutable Data Structures (Okasaki for Dummies)Tom Faulhaber
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakPyData
 
Security in the Real World - JavaOne 2013
Security in the Real World - JavaOne 2013Security in the Real World - JavaOne 2013
Security in the Real World - JavaOne 2013MattKilner
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationJafar Nesargi
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashingAmi Ranjit
 
Best for b trees
Best for b treesBest for b trees
Best for b treesDineshRaaja
 
BTree, Data Structures
BTree, Data StructuresBTree, Data Structures
BTree, Data StructuresJibrael Jos
 
Haskell is Not For Production and Other Tales
Haskell is Not For Production and Other TalesHaskell is Not For Production and Other Tales
Haskell is Not For Production and Other TalesKatie Ots
 
Concept of hashing
Concept of hashingConcept of hashing
Concept of hashingRafi Dar
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data StructureAnuj Modi
 
Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2SHAKOOR AB
 

Viewers also liked (20)

Efficient Immutable Data Structures (Okasaki for Dummies)
Efficient Immutable Data Structures (Okasaki for Dummies)Efficient Immutable Data Structures (Okasaki for Dummies)
Efficient Immutable Data Structures (Okasaki for Dummies)
 
Computer
ComputerComputer
Computer
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
 
Security in the Real World - JavaOne 2013
Security in the Real World - JavaOne 2013Security in the Real World - JavaOne 2013
Security in the Real World - JavaOne 2013
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Hashing
HashingHashing
Hashing
 
Ds 8
Ds 8Ds 8
Ds 8
 
Chapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organizationChapter 4 record storage and primary file organization
Chapter 4 record storage and primary file organization
 
Hashing
HashingHashing
Hashing
 
Indexing and-hashing
Indexing and-hashingIndexing and-hashing
Indexing and-hashing
 
Best for b trees
Best for b treesBest for b trees
Best for b trees
 
BTree, Data Structures
BTree, Data StructuresBTree, Data Structures
BTree, Data Structures
 
File organisation
File organisationFile organisation
File organisation
 
Haskell is Not For Production and Other Tales
Haskell is Not For Production and Other TalesHaskell is Not For Production and Other Tales
Haskell is Not For Production and Other Tales
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
Concept of hashing
Concept of hashingConcept of hashing
Concept of hashing
 
b+ tree
b+ treeb+ tree
b+ tree
 
B trees dbms
B trees dbmsB trees dbms
B trees dbms
 
B trees in Data Structure
B trees in Data StructureB trees in Data Structure
B trees in Data Structure
 
Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2
 

Similar to Faster persistent data structures through hashing and Patricia trees

Sienna 9 hashing
Sienna 9 hashingSienna 9 hashing
Sienna 9 hashingchidabdu
 
Structures de données exotiques
Structures de données exotiquesStructures de données exotiques
Structures de données exotiquesSamir Bessalah
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptxAgonySingh
 
K mer index of dna sequence based on hash
K mer index of dna sequence based on hashK mer index of dna sequence based on hash
K mer index of dna sequence based on hashijcsa
 
Ctcompare: Comparing Multiple Code Trees for Similarity
Ctcompare: Comparing Multiple Code Trees for SimilarityCtcompare: Comparing Multiple Code Trees for Similarity
Ctcompare: Comparing Multiple Code Trees for SimilarityDoctorWkt
 
An important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfAn important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfARORACOCKERY2111
 
computer notes - Data Structures - 35
computer notes - Data Structures - 35computer notes - Data Structures - 35
computer notes - Data Structures - 35ecomputernotes
 
Map WorksheetTeamMap methodNodeMapSortedMap.docx
Map WorksheetTeamMap methodNodeMapSortedMap.docxMap WorksheetTeamMap methodNodeMapSortedMap.docx
Map WorksheetTeamMap methodNodeMapSortedMap.docxinfantsuk
 
R programming slides
R  programming slidesR  programming slides
R programming slidesPankaj Saini
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMfnothaft
 
introduction to trees,graphs,hashing
introduction to trees,graphs,hashingintroduction to trees,graphs,hashing
introduction to trees,graphs,hashingAkhil Prem
 
Paper description of "Reformer"
Paper description of "Reformer"Paper description of "Reformer"
Paper description of "Reformer"GenkiYasumoto
 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec IISajid Marwat
 
similarity1 (6).ppt
similarity1 (6).pptsimilarity1 (6).ppt
similarity1 (6).pptssuserf11a32
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxjainaaru59
 
Class 9: Consistent Hashing
Class 9: Consistent HashingClass 9: Consistent Hashing
Class 9: Consistent HashingDavid Evans
 
Hashing.pptx
Hashing.pptxHashing.pptx
Hashing.pptxkratika64
 
Computer notes - Hashing
Computer notes - HashingComputer notes - Hashing
Computer notes - Hashingecomputernotes
 

Similar to Faster persistent data structures through hashing and Patricia trees (20)

Sienna 9 hashing
Sienna 9 hashingSienna 9 hashing
Sienna 9 hashing
 
I1
I1I1
I1
 
Structures de données exotiques
Structures de données exotiquesStructures de données exotiques
Structures de données exotiques
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
K mer index of dna sequence based on hash
K mer index of dna sequence based on hashK mer index of dna sequence based on hash
K mer index of dna sequence based on hash
 
Programming Hp33s talk v3
Programming Hp33s talk v3Programming Hp33s talk v3
Programming Hp33s talk v3
 
Ctcompare: Comparing Multiple Code Trees for Similarity
Ctcompare: Comparing Multiple Code Trees for SimilarityCtcompare: Comparing Multiple Code Trees for Similarity
Ctcompare: Comparing Multiple Code Trees for Similarity
 
An important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdfAn important part of electrical engineering is PCB design. One impor.pdf
An important part of electrical engineering is PCB design. One impor.pdf
 
computer notes - Data Structures - 35
computer notes - Data Structures - 35computer notes - Data Structures - 35
computer notes - Data Structures - 35
 
Map WorksheetTeamMap methodNodeMapSortedMap.docx
Map WorksheetTeamMap methodNodeMapSortedMap.docxMap WorksheetTeamMap methodNodeMapSortedMap.docx
Map WorksheetTeamMap methodNodeMapSortedMap.docx
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
Scaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
 
introduction to trees,graphs,hashing
introduction to trees,graphs,hashingintroduction to trees,graphs,hashing
introduction to trees,graphs,hashing
 
Paper description of "Reformer"
Paper description of "Reformer"Paper description of "Reformer"
Paper description of "Reformer"
 
Advance algorithm hashing lec II
Advance algorithm hashing lec IIAdvance algorithm hashing lec II
Advance algorithm hashing lec II
 
similarity1 (6).ppt
similarity1 (6).pptsimilarity1 (6).ppt
similarity1 (6).ppt
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptx
 
Class 9: Consistent Hashing
Class 9: Consistent HashingClass 9: Consistent Hashing
Class 9: Consistent Hashing
 
Hashing.pptx
Hashing.pptxHashing.pptx
Hashing.pptx
 
Computer notes - Hashing
Computer notes - HashingComputer notes - Hashing
Computer notes - Hashing
 

Recently uploaded

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 

Recently uploaded (20)

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 

Faster persistent data structures through hashing and Patricia trees

  • 1. Faster persistent data structures through hashing Johan Tibell johan.tibell@gmail.com 2011-02-15
  • 2. Motivating problem: Twitter data analys “I’m computing a communication graph from Twitter data and then scan it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes.” We need a data structure that is fast when used with string keys, and doesn’t use too much memory.
  • 3. Persistent maps in Haskell Data.Map is the most commonly used map type. It’s implemented using size balanced trees. Keys can be of any type, as long as values of the type can be ordered.
  • 4. Real world performance of Data.Map Good in theory: no more than O(log n) comparisons. Not great in practice: up to O(log n) comparisons! Many common types are expensive to compare e.g String, ByteString, and Text. Given a string of length k, we need O(k ∗ log n) comparisons to look up an entry.
  • 5. Hash tables Hash tables perform well with string keys: O(k) amortized lookup time for strings of length k. However, we want persistent maps, not mutable hash tables.
  • 6. Milan Straka’s idea: Patricia trees as sparse arrays We can use hashing without using hash tables! A Patricia tree implements a persistent, sparse array. Patricia trees are as twice fast as size balanced trees, but only work with Int keys. Use hashing to derive an Int from an arbitrary key.
  • 7. Implementation tricks Patricia trees implement a sparse, persistent array of size 232 (or 264 ). Hashing using this many buckets makes collisions rare: for 224 entries we expect about 32,000 single collisions. Linked lists are a perfectly adequate collision resolution strategy.
  • 8. First attempt at an implementation −− D e f i n e d i n t h e c o n t a i n e r s p a c k a g e . data IntMap a = Nil | Tip {−# UNPACK #−} ! Key a | Bin {−# UNPACK #−} ! P r e f i x {−# UNPACK #−} ! Mask ! ( IntMap a ) ! ( IntMap a ) type P r e f i x = I n t type Mask = Int type Key = Int newtype HashMap k v = HashMap ( IntMap [ ( k , v ) ] )
  • 9. A more memory efficient implementation data HashMap k v = Nil | Tip {−# UNPACK #−} ! Hash {−# UNPACK #−} ! ( FL . F u l l L i s t k v ) | Bin {−# UNPACK #−} ! Prefix {−# UNPACK #−} ! Mask ! ( HashMap k v ) ! ( HashMap k v ) type P r e f i x = I n t type Mask = Int type Hash = Int data F u l l L i s t k v = FL ! k ! v ! ( L i s t k v ) data L i s t k v = N i l | Cons ! k ! v ! ( L i s t k v )
  • 10. Reducing the memory footprint List k v uses 2 fewer words per key/value pair than [( k, v )] . FullList can be unpacked into the Tip constructor as it’s a product type, saving 2 more words. Always unpack word sized types, like Int, unless you really need them to be lazy.
  • 11. Benchmarks Keys: 212 random 8-byte ByteStrings Map HashMap insert 1.00 0.43 lookup 1.00 0.28 delete performs like insert.
  • 12. Can we do better? We still need to perform O(min(W , log n)) Int comparisons, where W is the number of bits in a word. The memory overhead per key/value pair is still quite high.
  • 13. Borrowing from our neighbours Clojure uses a hash-array mapped trie (HAMT) data structure to implement persistent maps. Described in the paper “Ideal Hash Trees” by Bagwell (2001). Originally a mutable data structure implemented in C++. Clojure’s persistent version was created by Rich Hickey.
  • 14. Hash-array mapped tries in Clojure Shallow tree with high branching factor. Each node, except the leaf nodes, contains an array of up to 32 elements. 5 bits of the hash are used to index the array at each level. A clever trick, using bit population count, is used to represent spare arrays.
  • 15. The Haskell definition of a HAMT data HashMap k v = Empty | BitmapIndexed {−# UNPACK #−} ! Bitmap {−# UNPACK #−} ! ( Array ( HashMap k v ) ) | L e a f {−# UNPACK #−} ! ( L e a f k v ) | F u l l {−# UNPACK #−} ! ( Array ( HashMap k v ) ) | C o l l i s i o n {−# UNPACK #−} ! Hash {−# UNPACK #−} ! ( Array ( L e a f k v ) ) type Bitmap = Word data Array a = Array ! ( Array# a ) {−# UNPACK #−} ! I n t
  • 16. Making it fast The initial implementation by Edward Z. Yang: correct but didn’t perform well. Improved performance by replacing use of Data.Vector by a specialized array type, paying careful attention to strictness, and using GHC’s new INLINABLE pragma.
  • 17. Benchmarks Keys: 212 random 8-byte ByteStrings Map HashMap HashMap (HAMT) insert 1.00 0.43 1.21 lookup 1.00 0.28 0.21
  • 18. Where is all the time spent? Most time in insert is spent copying small arrays. Array copying is implemented using indexArray# and writeArray#, which results in poor performance. When cloning an array, we are force to first fill the new array with dummy elements, followed by copying over the elements from the old array.
  • 19. A better array copy Daniel Peebles have implemented a set of new primops for copying array in GHC. The first implementation showed a 20% performance improvement for insert. Copying arrays is still slow so there might be room for big improvements still.
  • 20. Other possible performance improvements Even faster array copying using SSE instructions, inline memory allocation, and CMM inlining. Use dedicated bit population count instruction on architectures where it’s available. Clojure uses a clever trick to unpack keys and values directly into the arrays; keys are stored at even positions and values at odd positions. GHC 7.2 will use 1 less word for Array.
  • 21. Optimize common cases In many cases maps are created in one go from a sequence of key/value pairs. We can optimize for this case by repeatedly mutating the HAMT and freeze it when we’re done. Keys: 212 random 8-byte ByteStrings fromList/pure 1.00 fromList/mutating 0.50
  • 22. Abstracting over collection types We will soon have two map types worth using (one ordered and one unordered). We want to write function that work with both types, without a O(n) conversion cost. Use type families to abstract over different concrete implementation.
  • 23. Summary Hashing allows us to create more efficient data structures. There are new interesting data structures out there that have, or could have, an efficient persistent implementation.