Scaling Cloud-Scale Translytics Workloads
with Omid and Phoenix
Ohad Shacham
Yahoo Research
Edward Bortnikov
Yahoo Research
Yonatan Gottesman
Yahoo Research
Agenda
Translytics = Transactions + Analytics
Cloud-Scale Use Cases
Doing it in the HBase-Omid-Phoenix World
Omid and Phoenix Deep Dive
Real-Time Data Processing on the Rise
The Applications Perspective
Event-to-action/insight latency becomes king
Stream processing, asynchronous execution
Data consistency becomes nontrivial
Complex processing patterns (online reporting to AI)
Data integration across multiple feeds and schemas
OLTP World
Analytics World
Translytics Platforms Vision
The best of all worlds: OLTP and Analytics all-in-one
Enable complex, consistent, real-time data processing
Simple APIs with strong guarantees
Built to scale on top of NoSQL data platforms
OLTP Coming to NoSQL
Traditional NoSQL guarantees row-level atomicity
Translytics applications often bundle reads and writes
Asynchronous design patterns drive concurrency
Without ACID guarantees, chaos rules!
ACID transactions
Multiple data accesses in a single logical operation
Atomic
“All or nothing” – no partial effect observable
Consistent
The DB transitions from one valid state to another
Isolated
Appear to execute in isolation
Durable
Committed data cannot disappear
Use Case: Audience Targeting for Ads
Advertisers optimize campaigns to reach the right user audiences
Ad-tech platforms build and sell audience segments (identity sets)
Segmentation is based on user features (demographics, behavior, …)
Algorithms vary from rule-based heuristics to AI classification
Timeliness directly affects revenue
Real-Time Targeting Platform
Storm for Compute
Audience segmentation algorithms embedded in bolts
HBase for Storage
User Profiles (U), Segments (S), and U ↔ S relationships
Kafka for Messaging
Scale: trillions of touchpoints/month
Challenge: Keeping the Data Consistent
Shared data is accessed in parallel by multiple bolts
Access patterns are complex
User profile update: read+compute+write
User↔Segment mapping update: two writes
Segment query (scan): read multiple rows
HBase read/write API does not provide atomic guarantees
Omid Comes to Help
Transaction Processing layer for Apache HBase
Apache Incubation (started 2015, graduation planned 2019)
Easy-to-use API (good old NoSQL)
Popular consistency model (snapshot isolation)
Battle tested (in prod @Yahoo since 2015, new customers onboarding)
Omid Programming
TransactionManager tm = HBaseTransactionManager.newInstance();
TTable txTable = new TTable("MY_TX_TABLE");
Transaction tx = tm.begin(); // Control path
Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
row1.add(family, qualifier, Bytes.toBytes("val1"));
txTable.put(tx, row1); // Data path
Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));
row2.add(family, qualifier, Bytes.toBytes("val2"));
txTable.put(tx, row2); // Data path
tm.commit(tx); // Control path
SQL Coming to NoSQL
NoSQL API is simple but crude and non-standardized
Hard to manage complex schemas (low-level data abstraction)
Hard to implement analytics queries (low-level access primitives)
Hard to optimize for speed (server-side programming required)
Hard to integrate with relational data sources
Use Case: Real-Time Ad Inventory Ingestion
Advertisers deploy campaign content & metadata in the marketplace
SQL-speaking external client
Complex schema (many campaign types and optimization goals)
High scalability (growing market)
Campaign operations run multidimensional inventory analytics
Aggregate queries by advertiser, product, time, etc.
ML pipeline learns recommendation models for new campaigns
NoSQL-style access to data
Phoenix Comes to Help
OLTP and Real-Time Analytics for HBase
Query optimizer transforms SQL to native HBase API calls
Standard SQL interface with JDBC APIs
High level data abstractions (e.g., secondary indexes)
High performance (leverages server-side coprocessors)
Phoenix/Omid Integration
Phoenix is designed for public-cloud scale (>10K query servers)
Omid is extremely scalable (>600k tps), low-latency (<5ms), and HA
New Omid release (1.0.1) - SQL features, improved performance
Supports secondary indexes, extended Snapshot Isolation, downstream filters
Phoenix releases 4.15 and 5.1 include Omid as a Phoenix transaction processing (TP) backend
Phoenix refactored to support multiple TP backends (Omid is default)
Phoenix/Omid Integration performance
1M initial inserts, 1 KB per row
Omid in synchronous post-commit mode
Why do we care?
SQL transactions
SELECT * FROM my_table; -- This will start a transaction
UPSERT INTO my_table VALUES (1,'A');
SELECT count(*) FROM my_table WHERE k=1;
DELETE FROM my_other_table WHERE k=2;
!commit -- Other transactions will now see your updates and you will see theirs
Why do we care?
Non-transactional secondary index update might break consistency
[Diagram: Write (k1, [v1,v2,v3]); Table holds (k1, [v1,v2,v3]); Index holds (v1, k1)]
Why do we care?
Updating the secondary index fails
Out of handlers
Many JIRAs discuss this issue
[Diagram: Write (k1, [v1,v2,v3]); Table holds (k1, [v1,v2,v3]); Index shows no entry]
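The failure mode on the slides above can be sketched with two in-memory maps. This is a toy model, not Phoenix code: the class IndexedTable, its fields, and the indexAvailable flag (standing in for a failure such as "out of handlers") are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a table with one secondary index; not Phoenix's implementation. */
final class IndexedTable {
    final Map<String, String> table = new HashMap<>(); // row key -> indexed value
    final Map<String, String> index = new HashMap<>(); // indexed value -> row key
    boolean indexAvailable = true;                      // set to false to simulate a failing index write

    private void putIndex(Map<String, String> target, String value, String key) {
        if (!indexAvailable) {
            throw new IllegalStateException("index write failed");
        }
        target.put(value, key);
    }

    /** Two independent writes: if the second one fails, the table is ahead of the index. */
    void putNonTransactional(String key, String value) {
        table.put(key, value);
        putIndex(index, value, key);
    }

    /** Stage the index write first and publish both writes together, or neither. */
    boolean putAtomically(String key, String value) {
        Map<String, String> stagedIndex = new HashMap<>();
        try {
            putIndex(stagedIndex, value, key);
        } catch (IllegalStateException e) {
            return false; // abort: nothing was published
        }
        table.put(key, value);
        index.putAll(stagedIndex);
        return true;
    }
}
```

With indexAvailable set to false, putNonTransactional leaves a row in table that index cannot find, while putAtomically leaves both maps untouched.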
Transactions and snapshot isolation
Aborts only on write-write conflicts
[Timeline: begin (read point), read(x), write(y), write(x), read(y), commit (write point)]
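A minimal sketch of the rule "aborts only on write-write conflicts", assuming a single in-memory store with a counter in place of a timestamp oracle. The names SnapshotStore and Tx are hypothetical and this is not Omid's implementation; it only shows that a transaction reads as of its begin timestamp and aborts at commit if a key it wrote was committed by someone else after that timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory store illustrating snapshot isolation; not Omid's implementation. */
final class SnapshotStore {
    static final class Tx {
        final long readTs;                                   // read point, fixed at begin
        final Map<String, String> writes = new HashMap<>();  // buffered until commit
        Tx(long readTs) { this.readTs = readTs; }
    }

    private long clock = 0; // stands in for the timestamp oracle
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>(); // key -> commit ts -> value
    private final Map<String, Long> lastCommit = new HashMap<>();                // key -> latest commit ts

    synchronized Tx begin() { return new Tx(++clock); }

    /** Returns the latest version committed at or before the transaction's read point. */
    synchronized String read(Tx tx, String key) {
        TreeMap<Long, String> v = versions.get(key);
        if (v == null) return null;
        Map.Entry<Long, String> e = v.floorEntry(tx.readTs);
        return e == null ? null : e.getValue();
    }

    void write(Tx tx, String key, String value) { tx.writes.put(key, value); }

    /** Commits unless some written key was committed by another transaction after readTs. */
    synchronized boolean commit(Tx tx) {
        for (String key : tx.writes.keySet()) {
            Long c = lastCommit.get(key);
            if (c != null && c > tx.readTs) return false; // write-write conflict: abort
        }
        long commitTs = ++clock; // write point
        for (Map.Entry<String, String> w : tx.writes.entrySet()) {
            versions.computeIfAbsent(w.getKey(), k -> new TreeMap<>()).put(commitTs, w.getValue());
            lastCommit.put(w.getKey(), commitTs);
        }
        return true;
    }
}
```

Two concurrent transactions that both write x cannot both commit here; a transaction that only reads x and writes y never conflicts with a writer of x.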
Omid architecture
[Diagram: Client, Transaction Manager (Conflict Detection), Data tables, Commit Table. Labels: Begin/Commit, Results/Timestamp, Read/Write, Verify commit, Persist Commit]
Omid low latency (LL) architecture
[Diagram: Client, Transaction Manager, Data tables, Commit Table. Labels: Begin/Commit, Results/Timestamp, Read/Write, Verify commit, Persist Commit]
Execution example
[Diagram: the client sends Begin to the Transaction Manager and receives t1 (tr = t1). It issues Write (k1, v1, t1) and Write (k2, v2, t1), so the data tables hold (k1, v1, t1) and (k2, v2, t1). Read (k', last committed t' < t1). Commit Table shown alongside.]
Execution example
[Diagram: the client (tr = t1) sends Commit: t1, {k1, k2} to the Transaction Manager and receives t2 (tc = t2). Write (t1, t2) places (t1, t2) in the Commit Table. The data tables hold (k1, v1, t1) and (k2, v2, t1).]
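Execution example
[Diagram: a client with tr = t3 issues Read (k1, t3) against the data tables, which hold (k1, v1, t1) and (k2, v2, t1), and then Read (t1) against the Commit Table, which holds (t1, t2). Annotation: "Bottleneck!". TSO shown alongside.]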
Post-Commit
[Diagram: the client (tr = t1, tc = t2) updates the commit cells, so the data tables hold (k1,v1,t1,t2) and (k2,v2,t1,t2), and issues Delete(t1) against the Commit Table entry (t1, t2). TSO shown alongside.]
Using Commit Cells
[Diagram: a client with tr = t3 issues Read (k1, t3); the data tables hold (k1,v1,t1,t2) and (k2,v2,t1,t2). TSO and Commit Table shown alongside.]
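The read path across the last three slides can be sketched as follows, assuming one version per key and plain maps in place of HBase tables. CommitCellDemo, Cell, and the commitTableLookups counter are hypothetical names for illustration; the sketch is not Omid's code. A reader first looks for a commit timestamp stored next to the data (the commit cell) and falls back to the commit table only when it is absent.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of commit-table lookups versus commit cells; not Omid's implementation. */
final class CommitCellDemo {
    static final class Cell {
        final String value;
        final long writeTs; // timestamp of the writing transaction (tr)
        Long commitTs;      // the "commit cell"; null until post-commit fills it in
        Cell(String value, long writeTs) { this.value = value; this.writeTs = writeTs; }
    }

    final Map<String, Cell> data = new HashMap<>();      // key -> single version, for brevity
    final Map<Long, Long> commitTable = new HashMap<>(); // writeTs -> commitTs
    int commitTableLookups = 0;                          // counts fallbacks to the commit table

    void write(String key, String value, long writeTs) { data.put(key, new Cell(value, writeTs)); }

    void commit(long writeTs, long commitTs) { commitTable.put(writeTs, commitTs); }

    /** Copies the commit timestamp into the written cells, then drops the commit-table entry. */
    void postCommit(long writeTs) {
        Long commitTs = commitTable.get(writeTs);
        if (commitTs == null) return;
        for (Cell c : data.values()) {
            if (c.writeTs == writeTs) c.commitTs = commitTs;
        }
        commitTable.remove(writeTs);
    }

    /** Returns the value if its writer committed before readTs, else null. */
    String read(String key, long readTs) {
        Cell c = data.get(key);
        if (c == null) return null;
        Long commitTs = c.commitTs;
        if (commitTs == null) {          // no commit cell yet: consult the commit table
            commitTableLookups++;
            commitTs = commitTable.get(c.writeTs);
        }
        return (commitTs != null && commitTs < readTs) ? c.value : null;
    }
}
```

Following the slide's numbering (t1 = 1, t2 = 2, t3 = 3), a read at t3 needs a commit-table lookup before postCommit(1) runs and none afterwards.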
Durability
[Diagram: Client, Transaction Manager, Data tables, Commit Table. Labels: Begin/Commit, Results/Timestamp, Read/Write, Verify commit, Persist Commit, HBase table]
What about high availability?
[Diagram: Client, Transaction Manager, Data tables, Commit Table. Labels: Begin/Commit, Results/Timestamp, Read/Write, Verify commit, Persist Commit. Annotation: "Single point of failure"]
High availability
[Diagram: Client, Data tables, Commit Table, and two Transaction Manager (TSO) instances. Labels: Begin/Commit, Results/Timestamp, Read/Write, Verify commit, Persist Commit, Recovery state, Force abort]
Benchmark: single-write transaction workload
Easily scales beyond 500K tps
Latency problem solved
TSO latency bottleneck!
New scenarios for Omid
Secondary Indexes
Atomic Updates
How can we update metadata?
On-the-Fly Index Creation
What should we do with in-flight transactions?
Extended Snapshot Isolation
Read-Your-Own-Writes Queries
Does not match snapshot isolation
Secondary index: creation and maintenance
[Timeline: T1, T2, T3, "CREATE INDEX started", T4, "CREATE INDEX complete", T5, T6]
Secondary index: creation and maintenance
[Timeline: T1, T2, T3, "CREATE INDEX started", T4, "CREATE INDEX complete", T5, T6. Annotations: Bulk-Insert into index; Abort (enforced upon commit); Added by a coprocessor (twice); Index update (stored procedure)]
Extended snapshot isolation
CREATE TABLE T (ID INT);
...
BEGIN;
INSERT INTO T SELECT ID+10 FROM T;
INSERT INTO T SELECT ID+100 FROM T;
COMMIT;
Moving snapshot implementation
[Diagram labels: Checkpoint for Statement 1; Checkpoint for Statement 2; Writes by Statement 1]
Timestamps allocated by TM in blocks.
Client promotes the checkpoint.
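A minimal sketch of the moving-snapshot idea, assuming each statement writes with the transaction's current checkpoint timestamp and reads only versions older than that checkpoint. MovingSnapshotDemo, Row, and insertSelectPlus are hypothetical names for illustration, and the list-based table is not how Omid or Phoenix store data. The point is that a statement such as INSERT INTO T SELECT ID+10 FROM T sees the writes of earlier statements in the same transaction but not its own, so it terminates.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of per-statement checkpoints inside one transaction; not Omid's implementation. */
final class MovingSnapshotDemo {
    static final class Row {
        final int id;
        final long ts; // timestamp (checkpoint) under which the row was written
        Row(int id, long ts) { this.id = id; this.ts = ts; }
    }

    final List<Row> table = new ArrayList<>();
    long checkpoint; // current checkpoint, taken from the block of timestamps given to the transaction

    MovingSnapshotDemo(long txStartTs) { this.checkpoint = txStartTs; }

    /** Models INSERT INTO T SELECT ID + delta FROM T as one statement. */
    void insertSelectPlus(int delta) {
        // The table grows while it is scanned; rows written at the current
        // checkpoint are skipped, so the statement never reads its own writes.
        for (int i = 0; i < table.size(); i++) {
            Row r = table.get(i);
            if (r.ts < checkpoint) {
                table.add(new Row(r.id + delta, checkpoint));
            }
        }
        checkpoint++; // the client promotes the checkpoint for the next statement
    }

    List<Integer> ids() {
        List<Integer> out = new ArrayList<>();
        for (Row r : table) out.add(r.id);
        return out;
    }
}
```

Starting from a single committed row with ID 1, the two statements from the previous slide produce IDs 1, 11, 101, and 111. Without the timestamp filter, the first statement would keep reading the rows it has just inserted.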
Summary
Apache Phoenix is a relational database layer for HBase
Apache Phoenix needs a scalable and HA transaction processing (TP) service
Omid is a battle-tested, highly scalable, low-latency TP service
Phoenix-Omid integration provides an efficient OLTP for Hadoop
Cloud-scale use cases in Yahoo

Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 

Último

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

  • 1. Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix. Ohad Shacham, Edward Bortnikov, and Yonatan Gottesman (Yahoo Research)
  • 2. Agenda: Translytics = Transactions + Analytics; Cloud-Scale Use Cases; Doing it in the HBase-Omid-Phoenix World; Omid and Phoenix Deep Dive
  • 4. The Applications Perspective: Event-to-action/insight latency becomes king; Stream processing, asynchronous execution; Data consistency becomes nontrivial; Complex processing patterns (online reporting to AI); Data integration across multiple feeds and schemas. Diagram labels: OLTP World, Analytics World
  • 5. Translytics Platforms Vision: The best of all worlds, OLTP and Analytics all-in-one; Enable complex, consistent, real-time data processing; Simple APIs with strong guarantees; Built to scale on top of NoSQL data platforms
  • 6. OLTP Coming to NoSQL: Traditional NoSQL guarantees row-level atomicity; Translytics applications often bundle reads and writes; Asynchronous design patterns drive concurrency; Without ACID guarantees, chaos rules!
  • 7. ACID transactions: Multiple data accesses in a single logical operation. Atomic: “all or nothing”, no partial effect observable. Consistent: the DB transitions from one valid state to another. Isolated: transactions appear to execute in isolation. Durable: committed data cannot disappear
  • 8. Use Case: Audience Targeting for Ads. Advertisers optimize campaigns to reach the right user audiences; Ad-tech platforms build and sell audience segments (identity sets); Segmentation is based on user features (demographics, behavior, …); Algorithms vary from rule-based heuristics to AI classification; Timeliness directly affects revenue
  • 9. Real-Time Targeting Platform: Storm for Compute (audience segmentation algorithms embedded in bolts); HBase for Storage (User Profiles (U), Segments (S), and U ↔ S relationships); Kafka for Messaging; Scale: trillions of touchpoints/month
  • 10. Challenge: Keeping the Data Consistent. Shared data is accessed in parallel by multiple bolts. Access patterns are complex: user profile update (read+compute+write); User↔Segment mapping update (two writes); segment query (scan, reads multiple rows). The HBase read/write API does not provide atomicity guarantees for such multi-step accesses
  • 11. Omid Comes to Help: Transaction Processing layer for Apache HBase; Apache Incubation (started 2015, graduation planned 2019); Easy-to-use API (good old NoSQL); Popular consistency model (snapshot isolation); Battle tested (in prod @Yahoo since 2015, new customers onboarding)
  • 12. Omid Programming:
      // family and qualifier are byte[] column identifiers defined elsewhere
      TransactionManager tm = HBaseTransactionManager.newInstance();
      TTable txTable = new TTable("MY_TX_TABLE");
      Transaction tx = tm.begin(); // Control path
      Put row1 = new Put(Bytes.toBytes("EXAMPLE_ROW1"));
      row1.add(family, qualifier, Bytes.toBytes("val1"));
      txTable.put(tx, row1); // Data path
      Put row2 = new Put(Bytes.toBytes("EXAMPLE_ROW2"));
      row2.add(family, qualifier, Bytes.toBytes("val2"));
      txTable.put(tx, row2); // Data path
      tm.commit(tx); // Control path
  • 13. SQL Coming to NoSQL: NoSQL API is simple but crude and non-standardized; Hard to manage complex schemas (low-level data abstraction); Hard to implement analytics queries (low-level access primitives); Hard to optimize for speed (server-side programming required); Hard to integrate with relational data sources
  • 14. Use Case: Real-Time Ad Inventory Ingestion. Advertisers deploy campaign content & metadata in the marketplace: SQL-speaking external client; Complex schema (many campaign types and optimization goals); High scalability (growing market). Campaign operations run multidimensional inventory analytics: Aggregate queries by advertiser, product, time, etc. ML pipeline learns recommendation models for new campaigns: NoSQL-style access to data
  • 15. Phoenix Comes to Help: OLTP and Real-Time Analytics for HBase; Query optimizer transforms SQL to native HBase API calls; Standard SQL interface with JDBC APIs; High-level data abstractions (e.g., secondary indexes); High performance (leverages server-side coprocessors)
  • 16. Phoenix/Omid Integration: Phoenix is designed for public-cloud scale (>10K query servers); Omid is extremely scalable (>600k tps), low-latency (<5ms), and HA; New Omid release (1.0.1): SQL features, improved performance; Supports secondary indexes, extended Snapshot Isolation, downstream filters; Phoenix releases 4.15 and 5.1 include Omid as Phoenix Tps; Phoenix refactored to support multiple TP backends (Omid is default)
  • 17. Phoenix/Omid Integration performance: 1M initial inserts, 1 KB per row; Omid in Sync post-commit mode
  • 18. Why do we care? SQL transactions:
      SELECT * FROM my_table; -- This will start a transaction
      UPSERT INTO my_table VALUES (1,'A');
      SELECT count(*) FROM my_table WHERE k=1;
      DELETE FROM my_other_table WHERE k=2;
      !commit -- Other transactions will now see your updates and you will see theirs
  • 19. Why do we care? A non-transactional secondary index update might break consistency. Diagram: Write (k1, [v1,v2,v3]); Table: (k1, [v1,v2,v3]); Index: (v1, k1)
  • 20. Why do we care? Updating the secondary index fails; Out of handlers; Many JIRAs discuss this issue. Diagram: Write (k1, [v1,v2,v3]); Table: (k1, [v1,v2,v3]); Index: no entry
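The failure mode on slides 19 and 20 can be reproduced with a toy model. This is a minimal sketch, not Phoenix code: `ToyIndexedTable`, its two in-memory maps, and the `indexWriteFails` flag are all hypothetical names introduced for illustration. It contrasts two independent writes with an all-or-nothing update that stages both writes first.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy table with a secondary index; both are plain in-memory maps (an assumption). */
class ToyIndexedTable {
    final Map<String, String> table = new HashMap<>(); // key -> value
    final Map<String, String> index = new HashMap<>(); // value -> key

    /** Two independent writes: a failure between them leaves the index stale. */
    void putNonTransactional(String key, String value, boolean indexWriteFails) {
        table.put(key, value);
        if (indexWriteFails) {
            throw new IllegalStateException("index write failed");
        }
        index.put(value, key);
    }

    /** All-or-nothing: stage both writes and apply them only if neither failed. */
    void putTransactional(String key, String value, boolean indexWriteFails) {
        Map<String, String> stagedTable = new HashMap<>();
        Map<String, String> stagedIndex = new HashMap<>();
        stagedTable.put(key, value);
        if (indexWriteFails) {
            // Abort: the staged writes are discarded, nothing becomes visible.
            throw new IllegalStateException("index write failed, transaction aborted");
        }
        stagedIndex.put(value, key);
        table.putAll(stagedTable);
        index.putAll(stagedIndex);
    }

    /** True when every table row has a matching index entry and vice versa. */
    boolean consistent() {
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (!e.getKey().equals(index.get(e.getValue()))) {
                return false;
            }
        }
        return index.size() == table.size();
    }
}
```

With the non-transactional path, a failed index write leaves a row in the table that the index cannot find; with the staged path, the failed update leaves both maps untouched.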
  • 21. Transactions and snapshot isolation: Aborts only on write-write conflicts. Diagram labels: Read point, Write point; begin, commit; read(x), write(y); write(x), read(y)
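The rule on slide 21 (abort only on write-write conflicts) can be sketched as a commit-time check. This is a simplified illustration under stated assumptions, not Omid's implementation: `ToyConflictDetector`, its single logical clock, and the `-1` abort marker are hypothetical. A transaction aborts if any key in its write set was committed by another transaction after its own read timestamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy transaction manager: hands out timestamps and detects write-write conflicts. */
class ToyConflictDetector {
    private long clock = 0;
    // key -> commit timestamp of the last committed transaction that wrote it
    private final Map<String, Long> lastCommit = new HashMap<>();

    /** Begin: returns the read (start) timestamp of the new transaction. */
    long begin() {
        return ++clock;
    }

    /**
     * Commit: returns the commit timestamp, or -1 if the transaction must abort
     * because some key in its write set was committed after its read timestamp.
     */
    long tryCommit(long readTs, Set<String> writeSet) {
        for (String key : writeSet) {
            Long committed = lastCommit.get(key);
            if (committed != null && committed > readTs) {
                return -1; // write-write conflict
            }
        }
        long commitTs = ++clock;
        for (String key : writeSet) {
            lastCommit.put(key, commitTs);
        }
        return commitTs;
    }
}
```

Only write sets are compared, so two concurrent transactions shaped like the slide's example (one does read(x), write(y), the other write(x), read(y)) both commit in this model: their write sets are disjoint even though each reads what the other writes.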
  • 22. Omid architecture. Diagram labels: Client; Transaction Manager; Data (×3); Commit Table; Begin/Commit; Results/Timestamp; Read/Write; Persist Commit; Verify commit; Conflict Detection
  • 23. Omid low latency (LL) architecture. Diagram labels: Client; Transaction Manager; Data (×3); Commit Table; Begin/Commit; Results/Timestamp; Read/Write; Persist Commit; Verify commit
  • 24. Execution example (tr = t1). Client: Begin; Transaction Manager: t1. Client: Write (k1, v1, t1); Write (k2, v2, t1); Read (k’, last committed t’ < t1). Data: (k1, v1, t1), (k2, v2, t1)
  • 25. Execution example (tr = t1, tc = t2). Client: Commit: t1, {k1, k2}; Transaction Manager: t2. Write (t1, t2) into the Commit Table. Data: (k1, v1, t1), (k2, v2, t1); Commit Table: (t1, t2)
  • 26. Execution example (tr = t3). Client: Read (k1, t3) from Data, which holds (k1, v1, t1), (k2, v2, t1); Read (t1) from the Commit Table, which holds (t1, t2). Bottleneck! Diagram label: TSO
  • 27. Post-Commit (tr = t1, tc = t2). Update commit cells: Data holds (k1,v1,t1,t2), (k2,v2,t1,t2); Delete(t1) on the Commit Table entry (t1, t2). Diagram label: TSO
  • 28. Using Commit Cells (tr = t3). Client: Read (k1, t3); Data holds (k1,v1,t1,t2), (k2,v2,t1,t2). Diagram label: TSO
  • 29. Durability. Diagram labels: Client; Transaction Manager; Data (×3); Commit Table (HBase table); Begin/Commit; Results/Timestamp; Read/Write; Persist Commit; Verify commit
  • 30. What about high availability? Single point of failure. Diagram labels: Client; Transaction Manager; Data (×3); Commit Table; Begin/Commit; Results/Timestamp; Read/Write; Persist Commit; Verify commit
  • 31. High availability. Diagram labels: Client; Transaction Manager (TSO) (×2); Recovery state; Data (×3); Commit Table; Begin/Commit; Results/Timestamp; Read/Write; Persist Commit; Verify commit; Force abort
  • 32. Benchmark: single-write transaction workload. Easily scales beyond 500K tps; Latency problem solved; TSO latency bottleneck!
  • 33. New scenarios for Omid. Secondary Indexes: Atomic Updates (How can we update metadata?); On-the-Fly Index Creation (What should we do with in-flight transactions?). Extended Snapshot Isolation: Read-Your-Own-Writes Queries (does not match snapshot isolation)
  • 34. Secondary index: creation and maintenance. Timeline: T1, T2, T3, CREATE INDEX started, T4, CREATE INDEX complete, T5, T6
  • 35. Secondary index: creation and maintenance. Timeline: T1, T2, T3, CREATE INDEX started, T4, CREATE INDEX complete, T5, T6. Annotations: Bulk-Insert into index; Abort (enforced upon commit); Added by a coprocessor (×2); Index update (stored procedure)
  • 36. Extended snapshot isolation:
      BEGIN;
      INSERT INTO T SELECT ID+10 FROM T;
      INSERT INTO T SELECT ID+100 FROM T;
      COMMIT;
      CREATE TABLE T (ID INT); ...
  • 37. Moving snapshot implementation. Diagram labels: Checkpoint for Statement 1; Checkpoint for Statement 2; Writes by Statement 1. Timestamps allocated by TM in blocks. Client promotes the checkpoint.
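The checkpoint idea on slides 36 and 37 can be sketched for the two `INSERT INTO T SELECT ...` statements. This is an illustrative model, not the Omid or Phoenix code: `ToyMovingSnapshot`, `Row`, and `insertSelectPlus` are hypothetical, and it assumes the timestamp block handed out by the TM is large enough for one checkpoint per statement. Each statement stamps its writes with the current checkpoint and reads only rows with a smaller timestamp, so it sees earlier statements' writes but never its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one transaction whose snapshot moves forward between statements. */
class ToyMovingSnapshot {
    /** A row version: the ID value plus the timestamp it was written with. */
    static final class Row {
        final int id;
        final long ts;

        Row(int id, long ts) {
            this.id = id;
            this.ts = ts;
        }
    }

    private final List<Row> table = new ArrayList<>();
    private long checkpoint; // stamped on writes; reads see only ts < checkpoint

    /** Rows committed before the transaction began get a timestamp below startTs. */
    ToyMovingSnapshot(long startTs, int... committedIds) {
        this.checkpoint = startTs;
        for (int id : committedIds) {
            table.add(new Row(id, startTs - 1));
        }
    }

    /** Runs INSERT INTO T SELECT ID+delta FROM T as one statement. */
    void insertSelectPlus(int delta) {
        // Scan the live table; rows appended by this statement carry
        // ts == checkpoint, so the visibility filter skips them.
        for (int i = 0; i < table.size(); i++) {
            Row r = table.get(i);
            if (r.ts < checkpoint) {
                table.add(new Row(r.id + delta, checkpoint));
            }
        }
        checkpoint++; // the client promotes the checkpoint for the next statement
    }

    /** IDs in insertion order. */
    List<Integer> ids() {
        List<Integer> out = new ArrayList<>();
        for (Row r : table) {
            out.add(r.id);
        }
        return out;
    }
}
```

Starting from a table that holds only ID 1, the first statement adds 11 and the second adds 101 and 111. Without the checkpoint filter, the first statement would keep reading the rows it just inserted and never terminate.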
  • 38. Summary: Apache Phoenix is a relational database layer for HBase; Apache Phoenix needs a scalable and HA Tps; Omid is a battle-tested, highly scalable, low-latency Tps; Phoenix-Omid integration provides an efficient OLTP for Hadoop; Cloud-scale use cases in Yahoo