SlideShare uma empresa Scribd logo
1 de 20
Baixar para ler offline
Hash Functions                                                                                        FTW*
Fast Hashing, Bloom Filters & Hash-Oriented Storage



                                                            Sunny Gleason


* For   the win (see urbandictionary FTW[1]); this expression has nothing to do with hash functions
What’s in this Presentation

• Hash Function Survey
• Hash Performance
• Bloom Filters
• HashFile : Hash Storage
Hash Functions
int getIntHash(byte[] data); // 32-bit
long getLongHash(byte[] data) // 64-bit

int v1 = hash(“foo”); int v2 = hash(“goo”);

int hash(byte[] value) { // a simple hash
    int h = 0;
    for (byte b: value) { h = (h<<5) ^ (h>>27) ^ b; }
    return h % PRIME;
}
Hash Functions

• Goal : v1 has many bit differences from v2
• Desirable Properties:
 • Uniform Distribution - no collisions
 • Very Fast Computation
Hash Applications
Goal: O(1) access
   • Hash Table
   • Hash Set
   • Bloom Filter
Popular Hash Functions
• FNV Hash
• DJB Hash
• Jenkins Hash
• Murmur2
• New (Promising?): CrapWow
• Awesome & Slow: SHA-1, MD5 etc.
Evaluating Hash Functions
• Hash Function “Zoo”
• Quality of: CRC32 DJB    Jenkins FNV
  Murmur2 SHA1
• Performance:            !"#$%&'()*(+",-'%./%0'/%1',23$%
  (MM ops/s)       '#"
                   '!"
                   &#"
                   &!"
                   %#"                                      *+,-.,/"
                   %!"                                      012312%"
                   $#"                                      456$"
                   $!"
                    #"
                    !"
                           %#("      ('"        )"
A Strawman “Set”
• N keys, K bytes per key
• Allocate array of size K * N bytes
• Utilize array storage as:
 • a heap or tree: O(lg N) insert/delete/
    remove
  • a hash: O(1) insert/delete/remove
• What if we don’t have room for K*N
  bytes?
Bloom Filter
• Key Point: give up on storing all the keys
• Store r bits per key instead of K bytes
• Allocate bit vector of size: M = r * N,
  where N is expected number of entries
• Use multiple hash functions of key to
  determine which bits to set
• Premise: if hash functions are well-
  distributed, few collisions, high accuracy
Bloom Filter
Tuning Bloom Filters
Let r = M bits / N keys (r: num bits/key)
Let k = 0.7 * r      (k: num hashes to use)
Let p = 0.6185 ** r (p: probability of false positives)

Working backwards, we can use desired false
positive rate p to tune the data structure space
consumption:

r = 8, p = 2.1e-2      r = 16, p = 4.5e-4
r = 24, p = 9.8e-6     r = 32, p = 2.1e-7
r = 40, p = 4.5e-9     r = 48, p = 9.6e-11
Bloom Filter Performance
  100MM entries, 8bits/key :    833k ops/s
  100MM entries, 32bits/key :   256k ops/s
  1BN entries, 8bits/key :      714k ops/s
  1BN entries, 32bits/key :     185k ops/s

  Hypothesis : difference between 100MM and
  1BN is due to locality of memory access in
  smaller bit vector
Hash-Oriented Storage
•   HashFile : 64-bit clone of djb’s constant db
    “CDB”

•   Plain ol’ Key/Value storage:
     add(byte[] k, byte[] v), byte[] lookup(byte[] k)

•   Constant aka “Immutable” Data Store
     create(), add(k, v) ... , build() ... before lookup(k)

•   Use properties of hash table to achieve
    O(1) disk seeks per lookup
HashFile Structure
• Header (fixed width): table pointers,
  contains offests of hash tables and count of
  elements per table
• Body (variable width): contains
  concatenation of all keys and values (with
  data lengths)
• Footer (fixed width): hash “tables”
  containing long hash values of keys
  alongside long offsets into body
HashFile Diagram
    HEADER                    BODY                      FOOTER
p1s3p2s4p3s2p4s1   k1v1k2v2k3v3k4v4k5v5k6v6k7v7   hk7o7hk3o3hk4o4hk1o1




       •   Create: initialize empty header, start appending
           keys/values while recording offsets and hash values
           of keys

       •   Build: take list of hash values and offsets and turn
           them into hash tables, backfill header with values

       •   Lookup: compute hash(key), compute offset into
           table (hash modulo size of table), use table to find
           offset into body, return the value from body
HashFile Performance
• Spec: ≤ 2 disk seeks per lookup
• Number of seeks independent of number
  of entries
• X25E SSD: 1BN 8-byte keys, values (41GB):
  650μs lookup w/ cold cache, up to 700x
  faster as filesystem cache warms, 0.9μs
  when in-memory
• With 100MM entries (4GB), cold cache is
  ~600μs (from locality), 0.6μs warm
Conclusions

• Be aware of different Hash Functions and
  their collision / performance tradeoffs
• Bloom Filters are extremely useful for fast,
  large-scale set membership
• HashFile provides excellent performance in
  cases where a static K/V store suffices
Future Work
• Implement cWow hash in Java
• Extend HashFile with configurable hash,
  pointer, and key/value lengths to conserve
  space (reduce 24 bytes-per-KV overhead)
• Implement a read-write (non-constant)
  version of HashFile
• Bloom Filter that spills to SSD
Thank You!
...Any questions? :)
References
• GitHub Project: g414-hash (hash
  function, bloom filter, HashFile
  implementations)
• Wikipedia: Hash Function, Bloom Filter
• Non-Cryptographic Hash Function Zoo
• DJB CDB, sg-cdb (java implementation)

Mais conteúdo relacionado

Mais procurados

Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Sparksamthemonad
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisRizky Abdilah
 
Weather of the Century: Visualization
Weather of the Century: VisualizationWeather of the Century: Visualization
Weather of the Century: VisualizationMongoDB
 
MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011Chris Westin
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
A survey on Heap Exploitation
A survey on Heap Exploitation A survey on Heap Exploitation
A survey on Heap Exploitation Alireza Karimi
 
Cloud flare jgc bigo meetup rolling hashes
Cloud flare jgc   bigo meetup rolling hashesCloud flare jgc   bigo meetup rolling hashes
Cloud flare jgc bigo meetup rolling hashesCloudflare
 
The Ring programming language version 1.10 book - Part 45 of 212
The Ring programming language version 1.10 book - Part 45 of 212The Ring programming language version 1.10 book - Part 45 of 212
The Ring programming language version 1.10 book - Part 45 of 212Mahmoud Samir Fayed
 
Functional Programming
Functional ProgrammingFunctional Programming
Functional Programmingchriseidhof
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Altinity Ltd
 
Analysis of Algorithms-Heapsort
Analysis of Algorithms-HeapsortAnalysis of Algorithms-Heapsort
Analysis of Algorithms-HeapsortReetesh Gupta
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドTatsuya Sasaki
 
Python Development (MongoSF)
Python Development (MongoSF)Python Development (MongoSF)
Python Development (MongoSF)Mike Dirolf
 
The Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationThe Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationMongoDB
 
The Weather of the Century
The Weather of the CenturyThe Weather of the Century
The Weather of the CenturyMongoDB
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOAltinity Ltd
 

Mais procurados (20)

Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Haskell
HaskellHaskell
Haskell
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Heap sort
Heap sort Heap sort
Heap sort
 
Weather of the Century: Visualization
Weather of the Century: VisualizationWeather of the Century: Visualization
Weather of the Century: Visualization
 
MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011MongoDB Aggregation MongoSF May 2011
MongoDB Aggregation MongoSF May 2011
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
A survey on Heap Exploitation
A survey on Heap Exploitation A survey on Heap Exploitation
A survey on Heap Exploitation
 
Cloud flare jgc bigo meetup rolling hashes
Cloud flare jgc   bigo meetup rolling hashesCloud flare jgc   bigo meetup rolling hashes
Cloud flare jgc bigo meetup rolling hashes
 
The Ring programming language version 1.10 book - Part 45 of 212
The Ring programming language version 1.10 book - Part 45 of 212The Ring programming language version 1.10 book - Part 45 of 212
The Ring programming language version 1.10 book - Part 45 of 212
 
Functional Programming
Functional ProgrammingFunctional Programming
Functional Programming
 
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
Webinar slides: MORE secrets of ClickHouse Query Performance. By Robert Hodge...
 
Analysis of Algorithms-Heapsort
Analysis of Algorithms-HeapsortAnalysis of Algorithms-Heapsort
Analysis of Algorithms-Heapsort
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッド
 
How does one go from binary data to HDF files efficiently?
How does one go from binary data to HDF files efficiently?How does one go from binary data to HDF files efficiently?
How does one go from binary data to HDF files efficiently?
 
Python Development (MongoSF)
Python Development (MongoSF)Python Development (MongoSF)
Python Development (MongoSF)
 
The Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: VisualizationThe Weather of the Century Part 3: Visualization
The Weather of the Century Part 3: Visualization
 
The Weather of the Century
The Weather of the CenturyThe Weather of the Century
The Weather of the Century
 
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEOTricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
 

Destaque

Hokusai - Sketching streams in real time
Hokusai - Sketching streams in real timeHokusai - Sketching streams in real time
Hokusai - Sketching streams in real timeSergiy Matusevych
 
Accelerating NoSQL
Accelerating NoSQLAccelerating NoSQL
Accelerating NoSQLsunnygleason
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 

Destaque (6)

Ada-Sketch and friends
Ada-Sketch and friendsAda-Sketch and friends
Ada-Sketch and friends
 
InnoDB Magic
InnoDB MagicInnoDB Magic
InnoDB Magic
 
Hokusai - Sketching streams in real time
Hokusai - Sketching streams in real timeHokusai - Sketching streams in real time
Hokusai - Sketching streams in real time
 
Accelerating NoSQL
Accelerating NoSQLAccelerating NoSQL
Accelerating NoSQL
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 

Semelhante a Hash Functions FTW

Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hash table
Hash tableHash table
Hash tableVu Tran
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011Mandi Walls
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
Block ciphers &amp; public key cryptography
Block ciphers &amp; public key cryptographyBlock ciphers &amp; public key cryptography
Block ciphers &amp; public key cryptographyRAMPRAKASHT1
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesHaohui Mai
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
Hashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT iHashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT icajiwol341
 
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based ShardingWebinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based ShardingMongoDB
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Yandex
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovNikolay Samokhvalov
 
ImplementingCryptoSecurityARMCortex_Doin
ImplementingCryptoSecurityARMCortex_DoinImplementingCryptoSecurityARMCortex_Doin
ImplementingCryptoSecurityARMCortex_DoinJonny Doin
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go ProgrammingLin Yo-An
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep DiveAmazon Web Services
 

Semelhante a Hash Functions FTW (20)

Deep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDBDeep Dive: Amazon DynamoDB
Deep Dive: Amazon DynamoDB
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
 
Hash table
Hash tableHash table
Hash table
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011R for Pirates. ESCCONF October 27, 2011
R for Pirates. ESCCONF October 27, 2011
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Block ciphers &amp; public key cryptography
Block ciphers &amp; public key cryptographyBlock ciphers &amp; public key cryptography
Block ciphers &amp; public key cryptography
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
Hashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT iHashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT i
 
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based ShardingWebinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
 
ImplementingCryptoSecurityARMCortex_Doin
ImplementingCryptoSecurityARMCortex_DoinImplementingCryptoSecurityARMCortex_Doin
ImplementingCryptoSecurityARMCortex_Doin
 
Modern C++
Modern C++Modern C++
Modern C++
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 

Último

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Último (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Hash Functions FTW

  • 1. Hash Functions FTW* Fast Hashing, Bloom Filters & Hash-Oriented Storage Sunny Gleason * For the win (see urbandictionary FTW[1]); this expression has nothing to do with hash functions
  • 2. What’s in this Presentation • Hash Function Survey • Hash Performance • Bloom Filters • HashFile : Hash Storage
  • 3. Hash Functions int getIntHash(byte[] data); // 32-bit long getLongHash(byte[] data) // 64-bit int v1 = hash(“foo”); int v2 = hash(“goo”); int hash(byte[] value) { // a simple hash int h = 0; for (byte b: value) { h = (h<<5) ^ (h>>27) ^ b; } return h % PRIME; }
  • 4. Hash Functions • Goal : v1 has many bit differences from v2 • Desirable Properties: • Uniform Distribution - no collisions • Very Fast Computation
  • 5. Hash Applications Goal: O(1) access • Hash Table • Hash Set • Bloom Filter
  • 6. Popular Hash Functions • FNV Hash • DJB Hash • Jenkins Hash • Murmur2 • New (Promising?): CrapWow • Awesome & Slow: SHA-1, MD5 etc.
  • 7. Evaluating Hash Functions • Hash Function “Zoo” • Quality of: CRC32 DJB Jenkins FNV Murmur2 SHA1 • Performance: !"#$%&'()*(+",-'%./%0'/%1',23$% (MM ops/s) '#" '!" &#" &!" %#" *+,-.,/" %!" 012312%" $#" 456$" $!" #" !" %#(" ('" )"
  • 8. A Strawman “Set” • N keys, K bytes per key • Allocate array of size K * N bytes • Utilize array storage as: • a heap or tree: O(lg N) insert/delete/ remove • a hash: O(1) insert/delete/remove • What if we don’t have room for K*N bytes?
  • 9. Bloom Filter • Key Point: give up on storing all the keys • Store r bits per key instead of K bytes • Allocate bit vector of size: M = r * N, where N is expected number of entries • Use multiple hash functions of key to determine which bits to set • Premise: if hash functions are well- distributed, few collisions, high accuracy
  • 11. Tuning Bloom Filters Let r = M bits / N keys (r: num bits/key) Let k = 0.7 * r (k: num hashes to use) Let p = 0.6185 ** r (p: probability of false positives) Working backwards, we can use desired false positive rate p to tune the data structure space consumption: r = 8, p = 2.1e-2 r = 16, p = 4.5e-4 r = 24, p = 9.8e-6 r = 32, p = 2.1e-7 r = 40, p = 4.5e-9 r = 48, p = 9.6e-11
  • 12. Bloom Filter Performance 100MM entries, 8bits/key : 833k ops/s 100MM entries, 32bits/key : 256k ops/s 1BN entries, 8bits/key : 714k ops/s 1BN entries, 32bits/key : 185k ops/s Hypothesis : difference between 100MM and 1BN is due to locality of memory access in smaller bit vector
  • 13. Hash-Oriented Storage • HashFile : 64-bit clone of djb’s constant db “CDB” • Plain ol’ Key/Value storage: add(byte[] k, byte[] v), byte[] lookup(byte[] k) • Constant aka “Immutable” Data Store create(), add(k, v) ... , build() ... before lookup(k) • Use properties of hash table to achieve O(1) disk seeks per lookup
  • 14. HashFile Structure • Header (fixed width): table pointers, contains offests of hash tables and count of elements per table • Body (variable width): contains concatenation of all keys and values (with data lengths) • Footer (fixed width): hash “tables” containing long hash values of keys alongside long offsets into body
  • 15. HashFile Diagram HEADER BODY FOOTER p1s3p2s4p3s2p4s1 k1v1k2v2k3v3k4v4k5v5k6v6k7v7 hk7o7hk3o3hk4o4hk1o1 • Create: initialize empty header, start appending keys/values while recording offsets and hash values of keys • Build: take list of hash values and offsets and turn them into hash tables, backfill header with values • Lookup: compute hash(key), compute offset into table (hash modulo size of table), use table to find offset into body, return the value from body
  • 16. HashFile Performance • Spec: ≤ 2 disk seeks per lookup • Number of seeks independent of number of entries • X25E SSD: 1BN 8-byte keys, values (41GB): 650μs lookup w/ cold cache, up to 700x faster as filesystem cache warms, 0.9μs when in-memory • With 100MM entries (4GB), cold cache is ~600μs (from locality), 0.6μs warm
  • 17. Conclusions • Be aware of different Hash Functions and their collision / performance tradeoffs • Bloom Filters are extremely useful for fast, large-scale set membership • HashFile provides excellent performance in cases where a static K/V store suffices
  • 18. Future Work • Implement cWow hash in Java • Extend HashFile with configurable hash, pointer, and key/value lengths to conserve space (reduce 24 bytes-per-KV overhead) • Implement a read-write (non-constant) version of HashFile • Bloom Filter that spills to SSD
  • 20. References • GitHub Project: g414-hash (hash function, bloom filter, HashFile implementations) • Wikipedia: Hash Function, Bloom Filter • Non-Cryptographic Hash Function Zoo • DJB CDB, sg-cdb (java implementation)