Building Content-Based
Publish/Subscribe Systems
with Distributed Hash Tables



 David Tam, Reza Azimi, Hans-Arno Jacobsen
          University of Toronto, Canada
                 September 8, 2003
Introduction:
Publish/Subscribe Systems

  Push model (a.k.a. event notification)
      subscribe, publish, match
  Applications:
      stock market, auctions (e.g. eBay), news
  Two types:
      topic-based: ≈ Usenet newsgroup topics
      content-based: attribute-value pairs,
         e.g. (attr1 = value1) ∧ (attr2 = value2) ∧ (attr3 > value3)
         (a small matching sketch follows)
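Content-based matching can be pictured as evaluating a conjunction of per-attribute predicates against a publication's attribute-value pairs. A minimal sketch, with made-up attribute names (a stock-quote flavour) rather than anything taken from the slides:

```python
# Illustrative sketch only (attribute names and values are invented): a
# content-based subscription as a conjunction of attribute predicates.
from operator import eq, gt

# Each predicate: attribute -> (comparison operator, reference value)
subscription = {
    "symbol":   (eq, "IBM"),    # attr1 = value1
    "exchange": (eq, "NYSE"),   # attr2 = value2
    "price":    (gt, 80.0),     # attr3 > value3
}

publication = {"symbol": "IBM", "exchange": "NYSE", "price": 84.5, "volume": 1000}

def matches(pub, sub):
    """A publication matches if every predicate in the subscription holds."""
    return all(attr in pub and op(pub[attr], ref)
               for attr, (op, ref) in sub.items())

print(matches(publication, subscription))  # True
```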
The Problem:
Content-Based Publish/Subscribe

  Traditionally centralized
      scalability?
  More recently: distributed
      e.g. SIENA
         small set of brokers
         not P2P
  How about fully distributed?
      exploit P2P
      1000s of brokers
Proposed Solution:
Use Distributed Hash Tables
  DHTs
      hash buckets are mapped to P2P nodes
  Why DHTs?
      scalability, fault tolerance, load balancing
  Challenges
      subscribing, publishing, and matching must be distributed
      yet co-ordinated and light-weight; matching is the hard part
      (a minimal DHT-mapping sketch follows)
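As a rough picture of what "hash buckets are mapped to P2P nodes" means, here is a minimal sketch that assigns each key to the node whose ID is numerically closest, loosely in the spirit of Pastry; the node naming and closeness rule are assumptions for illustration, not the DHT's actual interface.

```python
# Minimal sketch (assumed node naming and closeness rule, not Pastry's API):
# each hash key is owned by the node whose ID is numerically closest to it.
import hashlib

def sha1_int(text: str) -> int:
    return int(hashlib.sha1(text.encode()).hexdigest(), 16)

NODE_IDS = sorted(sha1_int(f"node-{i}") for i in range(1000))  # 1000s of brokers

def home_node(key: int) -> int:
    """Return the ID of the node responsible for (closest to) this key."""
    return min(NODE_IDS, key=lambda nid: abs(nid - key))

print(hex(home_node(sha1_int("attr1=value1"))))
```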
Basic Scheme
   A matching publisher & subscriber must come up with
   the same hash keys based on the content

   [Figure, shown over several animation steps: the buckets of a Distributed
   Hash Table are spread across the distributed publish/subscribe system.
   The subscriber hashes its subscription and stores it at the key's home
   node; the publisher hashes the matching content to the same key, so its
   publication reaches the same home node, which matches it against the
   stored subscription and forwards the publication to the subscriber.]
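The whole scheme rests on the publisher and the subscriber deriving identical keys from the same content, so both requests land on the same home node. A minimal sketch of that invariant (the key format is an assumption):

```python
# Minimal sketch: both sides hash the same attribute-value content with the
# same function, so subscription and publication reach the same home node.
import hashlib

def content_key(attr: str, value: str) -> int:
    return int(hashlib.sha1(f"{attr}={value}".encode()).hexdigest(), 16)

sub_key = content_key("symbol", "IBM")   # computed at the subscriber
pub_key = content_key("symbol", "IBM")   # computed at the publisher
assert sub_key == pub_key                # same key -> same home node,
                                         # which matches and notifies
```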
Naïve Approach

   [Figure: a publication carrying Attr1–Attr7, all with values, and a
   subscription specifying only some of them (Attr1, Attr4, Attr6); each
   side feeds its attribute-value pairs through the hash function to
   produce keys.]

   Publisher must produce keys for all possible attribute combinations:
        2^N keys for each publication
   Bottleneck at hash bucket node
        subscribing, publishing, matching
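To see where the 2^N comes from: since a subscriber may specify any subset of attributes, the naive publisher must emit a key for every non-empty attribute combination. A small sketch of that enumeration, with N = 7 attributes mirroring the slide's figure:

```python
# Minimal sketch of the naive blow-up: one key per non-empty attribute subset.
from itertools import combinations

publication = {f"attr{i}": f"value{i}" for i in range(1, 8)}  # N = 7

def all_subset_keys(pub):
    attrs = sorted(pub)
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            yield tuple((a, pub[a]) for a in subset)  # hashed in the real system

print(sum(1 for _ in all_subset_keys(publication)))  # 127 == 2**7 - 1
```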
Our Approach
  Domain Schema
     eliminates the 2^N problem
     similar to an RDBMS schema
     set of attribute names
     set of value constraints
     set of indices
        create hash keys for indices only
        choose groups of attributes that are common,
        but whose value combinations are rare
     well-known to all publishers and subscribers
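A domain schema could be written down as simply as the structure below; the field names and the example value constraint are illustrative, not the paper's notation.

```python
# Illustrative sketch of a domain schema: attribute names, value constraints,
# and the indices (attribute groups) for which hash keys are composed.
DOMAIN_SCHEMA = {
    "attributes": [f"attr{i}" for i in range(1, 8)],
    "constraints": {"attr3": ("int", 0, 1_000_000)},  # assumed example constraint
    # Indices: attribute groups that are common across events but whose
    # value combinations are rare, keeping each hash bucket small.
    "indices": [("attr1",), ("attr1", "attr4"), ("attr6", "attr7")],
}
```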
Hash Key Composition
    Indices: {attr1}, {attr1, attr4}, {attr6, attr7}

    [Figure: the publication (Attr1–Attr7, all valued) is hashed once per
    index, producing Key1, Key2, and Key3; the subscription (Attr1, Attr4,
    Attr6 specified) produces Key1 and Key2 for the indices it covers.]

    Possible false positives
         because a key match covers only the index attributes (partial matching)
         filtered out by the system
    Possible duplicate notifications
         because one subscription may register multiple keys
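A sketch of how keys might be composed from the indices above; the concatenation format and the rule that a subscription keys every index it fully covers are assumptions made for illustration.

```python
# Minimal sketch: a publication yields one key per index; a subscription
# yields a key for every index whose attributes it constrains.
import hashlib

INDICES = [("attr1",), ("attr1", "attr4"), ("attr6", "attr7")]

def index_key(index, values):
    material = "|".join(f"{a}={values[a]}" for a in index)
    return hashlib.sha1(material.encode()).hexdigest()[:8]

def publication_keys(pub):
    return [index_key(idx, pub) for idx in INDICES]           # Key1, Key2, Key3

def subscription_keys(sub):
    return [index_key(idx, sub) for idx in INDICES
            if all(a in sub for a in idx)]                    # covered indices only

pub = {f"attr{i}": f"value{i}" for i in range(1, 8)}
sub = {"attr1": "value1", "attr4": "value4", "attr6": "value6"}
print(publication_keys(pub))   # 3 keys
print(subscription_keys(sub))  # 2 keys -> one event may notify twice (duplicates)
```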
Our Approach (cont’d)
  Multicast Trees
      eliminate the bottleneck at hash bucket nodes
      distributed subscribing, publishing, matching

      [Figure, shown over several animation steps: a multicast tree rooted
      at the home node (hash bucket node). Existing subscribers form the
      tree; new subscribers attach to the nearest tree node, possibly
      through non-subscriber nodes that only forward traffic.]
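A toy sketch of the tree mechanics: the join rule below mimics the Scribe idea of stopping at the first node already on the tree, but the data structures and function names are invented for illustration, not Scribe's real API.

```python
# Minimal sketch: the subscribers of one hash key form a multicast tree rooted
# at that key's home node. Joins stop at the first on-tree node, and
# publications fan out down the tree, so the root is not a bottleneck.
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.on_tree = False

def join(path_to_root):
    """path_to_root: new subscriber first, home node (root) last."""
    for child, parent in zip(path_to_root, path_to_root[1:]):
        child.on_tree = True
        if child not in parent.children:
            parent.children.append(child)
        if parent.on_tree:        # spliced into the existing tree; stop here
            return
        parent.on_tree = True     # non-subscriber forwarder joins the tree

def deliver(node, publication):
    """The home node pushes a matching publication down the tree."""
    print(f"{node.name} receives {publication}")
    for child in node.children:
        deliver(child, publication)

root, fwd, sub1 = Node("home"), Node("forwarder"), Node("sub1")
root.on_tree = True
join([sub1, fwd, root])   # sub1 attaches via a non-subscriber forwarder
deliver(root, "event")
```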
Handling Range Queries
  Hash function ruins locality

      [Figure: adjacent input intervals 1, 2, 3, 4 hash to scattered
      outputs such as 40, 90, 120, 170.]

  Divide the range of values into intervals
      hash on interval labels
          e.g. RAM attribute: intervals [0, 128), [128, 256), [256, 512), [512, ∞)
               labelled A, B, C, D
          For RAM > 384, submit hash keys:
             RAM = C
             RAM = D
      intervals can be sized according to the value's probability distribution
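A minimal sketch of the interval-label rewrite, using the RAM boundaries from the slide; the half-open-interval convention is an assumption.

```python
# Minimal sketch: rewrite a range predicate into the interval labels it
# overlaps and submit one hash key per label (boundaries from the RAM example).
BOUNDS = [0, 128, 256, 512, float("inf")]
LABELS = ["A", "B", "C", "D"]

def labels_for_range(lo, hi=float("inf")):
    """Labels of intervals [left, right) overlapping the query range (lo, hi)."""
    return [lab for lab, left, right in zip(LABELS, BOUNDS, BOUNDS[1:])
            if left < hi and right > lo]

print(labels_for_range(384))  # ['C', 'D'] -> submit keys RAM=C and RAM=D
```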
Implementation & Evaluation
  Main Goal: scalability
  Metric: message traffic
  Built using:
      Pastry DHT
      Scribe multicast trees
  Workload Generator: uniformly random distributions
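The slides do not give the generator's parameters; as a hedge, here is one plausible shape of a uniformly random workload, with made-up attribute domains and subscription sizes.

```python
# Illustrative sketch only: publications drawn uniformly from each attribute's
# domain, subscriptions constraining a random subset of attributes.
import random

DOMAINS = {f"attr{i}": range(100) for i in range(1, 8)}  # assumed value ranges

def random_publication():
    return {a: random.choice(vals) for a, vals in DOMAINS.items()}

def random_subscription(num_attrs=3):
    chosen = random.sample(sorted(DOMAINS), num_attrs)
    return {a: random.choice(DOMAINS[a]) for a in chosen}

print(random_publication())
print(random_subscription())
```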
Event Scalability: 1000 nodes

   [Results chart omitted.]

   Need a well-designed schema with low false positives
Node Scalability: 40,000 subs & pubs

   [Results chart omitted.]
Range Query Scalability: 1000 nodes

   [Results chart omitted.]

  Multicast tree benefits
       e.g. 1 range attribute vs. 0 range attributes, at 40,000 subs & pubs:
           expected 2.33× the messages, but measured only 1.6×
       subscription costs decrease
Range Query Scalability: 40,000 subs & pubs

   [Results chart omitted.]
Conclusion
  Method: DHT + domain schema
  Scales to 1000s of nodes
  Multicast trees are important
  Interesting point in the design space
      some restrictions on how content is expressed
          must adhere to the domain schema



Future Work
  range query techniques
  examine multicast tree in detail
  locality-sensitive workload distributions
  real-world workloads
  detailed modelling of P2P network
  fault-tolerance
