Mais conteĂșdo relacionado Semelhante a Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite (20) Mais de VMware Tanzu (20) Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite2. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Christian Tzolov
Spring Software Engineer at Pivotal
Spring Cloud Data Flow
Apache Committer, Crunch PMC member
Apache {Crunch, HAWQ, Zeppelin, Calcite...}
2
blog.tzolov.net
@christzolov
@tzolov
3. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
âWhat happens if an established NoSQL database
decides to implement a reasonably standard SQL;
The only predictable outcome for such an
eventuality is plenty of argument.â
Martin Fowler, P.J.Sadalage, NoSQL Distilled, 2012
3
5. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Safe Harbor Statement
The following is intended to outline the general direction of Pivotal's offerings. It is
intended for information purposes only and may not be incorporated into any
contract. Any information regarding pre-release of Pivotal offerings, future updates
or other planned modifications is subject to ongoing evaluation by Pivotal and is
subject to change. This information is provided without warranty or any kind,
express or implied, and is not a commitment to deliver any material, code, or
functionality, and should not be relied upon in making purchasing decisions
regarding Pivotal's offerings. These purchasing decisions should only be based on
features currently available. The development, release, and timing of any features
or functionality described for Pivotal's offerings in this presentation remain at the
sole discretion of Pivotal. Pivotal has no obligation to update forward looking
information in this presentation.
5
6. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Demo?
In a minute ;)
6
8. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
8
9. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
One Size Fit All*
âThe traditional DBMS architecture has been used to support many
data-centric applications with widely varying characteristics and
requirements⊠We argue that this concept is no longer applicable to the
database market, and that the commercial world will fracture into a
collection of independent database engines, some of which may be
unified by a common front-end parser.â
Michael Stonebraker, 2005
9
10. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Collection of Independent Databases
10
How to ingest and keep
consistent across DBs?
How to access the data
across multiple DBs?
12. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Federation: Unified Query Interface
12
13. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Federation: Two Mapping Tasks
13
15. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Geode Model Mapping
âą PDX serialization
âą Primitive Types
âą Arrays (employ[0])
âą Nested Objects
(parent[âchild.childChildâŠâ])
15
Cache
Region 1
Region K
ValKey
v1k1
v2k2
Schema
Table 1
Table K
C
o
l
1
C
o
l
2
C
o
l
N
V(M,1)
R
o
w
M
V(M,2)
V(M,N
)
V(2,1)
R
o
w
2
V(2,2) V(2,N)
V(1,1)
R
o
w
1
V(1,2) V(1,N)
2. Regions into Tables
1. Cache into Schema
3. Key/Value into
Table Row
4. Column Types inferred
from PDX fields
17. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SQL for NoSQL?
âą Extended Relational Algebra - already present in most NoSql data
âą Relational Expression Optimization â Desirable but hard to implement
17
18. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Query = Relational Algebra
18
SELECT b.totalPrice, c.firstName
FROM BookOrder AS b
INNER JOIN BookCustomer AS c ON b.customerNumber = c.customerNumber
WHERE b.totalPrice > 0
âŠ
}
Scan Scan
Join
Filter
Project
BookCustomer [c] BookOrder [b]
(on customerNumber)
(b.totalPrice > 0)
(c.firstName, b.totalPrice)
Scan Scan
Join
Project
BookCustomer [c] BookOrder [b]
(on
customerNumber)
(totalPrice > 0)
(c.firstName, b.totalPrice)
Project
(firstName,
customerNumber)
Filter
(totalPrice,
customerNumber)Project
<<Optimization>>
<<SQLParsing>>
19. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
âNativeâ Geode Operations
19
Relational Operation Supported on Geode
SORT/LIMIT YES (no Offset)
PROJECT YES
FILTER YES
AGGREGATE
SUM, AVG, MAX, MIN,
COUNT, GROUP BY,
DISTINCT
JOIN NO
21. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Calcite
âą Java Framework
âą SQL Query Parser, Validator and Optimizer(s)
âą JDBC drivers - local and remote
âą Agnostic to data storage and processing
âą Powered by Calcite:
21
22. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Geode Adapter
22
Geode API and OQL
SQL/JDBC
Convert relational expressions into OQL queriesGeode Adapter
(Geode Client)
Geode ServerGeode ServerGeode Server
Data Data Data
Push down all supported expressions to Geode OQL
and fall back to Calcite Enumerable for the rest
Enumerable
Adapter
Apache Calcite Parse SQL into relational expression and optimizes
Calcite-Geode-Embedded(e.g.JDBC.jar)
23. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Apache Geode Concepts
Cache Server
Cache
Region 1
Region N
ValKey
v1k1
v2k2
Cache - In-memory collection of
Regions
Region - consistent, distributed Map
(key-value),
Partitioned or Replicated
CacheServer â process
connected to the distributed
system with created Cache
ClientLocator
Client âread and modify the content
of the distributed system
Locator â tracks system members and
provides membership information
âŠ
Listeners
FunctionsFunctions â distributed,
concurrent data processing
Listener â event handler.
Registers for one or more events
and notified when they occur
24. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Key/Value vs. OQL
âto get extreme scale we use GemFire for what it does best: key-value
storage. Then, when we canât design our way to using what GemFire does
best, we can use OQL and treat it like an object data-baseâ
Mike Stolz, âScaling Data Services with Pivotal GemFireÂźâ, 2018
25. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Query Execution Flow
25
Enumerable
Interpreter
Prepare
SQL,
Relational,
Planner
Geode
Adapter
Binder
JDBC
Geode Cluster
1
2
3
4
5
6 7
7
7
2. Parse SQL, convert to rel. expressions. Validate
and Optimize them
3. Start building a physical plan from the relation
expressions
4. Implement the Geode relations and encode
them as Expression tree
5. Pass the Expression tree to the Interpreter to
generate Java code
6. Generate and Compile a Binder instance that
on âbind()â call runs Geodesâ query method
1. On new SQL query JDBC delegates to Prepare
to prepare the query execution
7. JDBC uses the newly compiled Binder to
perform the query on the Geode Cluster
Calcite Framework
Geode Adapter
2
26. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Query Mapping - Example
26
SELECT author
FROM BookMaster
WHERE retailCost > 0
LIMIT 1
âŠ
}
SELECT author
FROM BookMaster
WHERE retailCost > 0
LIMIT 1 OFFSET 1
âŠ
}
28. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
DB Tools Integration
28
Access Apache Geode /
GemFire over SQL/JDBC
Explore Geode & GemFire
Data with IntelliJ
SQL/Database tool
29. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Analytics
29
Advanced Apache
Geode Data Analytics
with Apache Zeppelin
over SQL/JDBC
30. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Data Federation
30
{Geode|Greenplum|...} Data Federation
31. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Spatial (OpenGIS /PostGIS)
31
32. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SQL Stream (in progress...)
32
https://calcite.apache.org/docs/stream.html
33. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SQL Stream (2)
33
â How many rows come out for each row in?
â Does each incoming value appear in one total, or more?
â What defines the âwindowâ, the set of rows that contribute to a given output row?
â Is the result a stream or a relation?
â tumbling window (GROUP BY)
â hopping window (multi GROUP BY)
â sliding window (window functions)
â cascading window (window
functions)
35. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Pros & Cons
âą Ad-hoc data exploration
âą JDBC integration with 3rd
party
tools
âą Data Federation, correlate Geode
with other data sources
âą SQL Streaming as CQ++
âą No-intrusive and extensible
approach
âą What happened with SQLFire?
âą Geode == Transactional System!
SQL+Geode <> Analytical System!
âą Key/Value vs. Full Scan
âą Overhead: SQL > OQL > Functions
âą Data at Rest (Table) vs. Data at
Motion (Stream)
36. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Future work
âą Add Geode Adapter to Calcite Project: [CALCITE-2059] â VOTE! ;)
âą Table Materialization
âą SQL Streaming (https://calcite.apache.org/docs/stream.html)
âą Pre-defined vs. Inferred Schema
âą âNativeâ JOIN operation implementation
âą Geode Indexes and Query statistics as Calcite metadata. Hint Indexes
âą Mixing Key/Value and OQL data access
âą Explore Geode functions as an alternative to OQL
âą Explore TX support
âą Performance Benchmark
âą Calcite Spatial (https://calcite.apache.org/docs/spatial.html)
37. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Credits
to Dan Baskette for inspiring this work
and to Roman Shaposhnik for helping
spread the word
37
38. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
References
38
[1 ] Apache Geode: http://geode.apache.org
[2] Geode Object Query Language (OQL): http://bit.ly/2BfXmNR
[3] Apache Calcite: https://calcite.apache.org
[4] Calcite - supported SQL language: https://calcite.apache.org/docs/reference.html
[5] Apache Geode Adapter for Apache Calcite: https://github.com/tzolov/calcite/tree/geode-1.3
[6] Relational Algebra Operations: http://bit.ly/2zX8cMc
[7] Apache Phoenix - http://phoenix.apache.org
[8] Big Data is Four Different Problems, 2016, M.Stonebraker (video): http://bit.ly/2jDmKpq
[9] Architecture of a Database System, 2007 (J.M. Hellerstein, M. Stonebraker, J. Hamilton):
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf
[10] One Size Fits All (Michael Stonebraker): http://ieeexplore.ieee.org/document/1410100/?denied
39. Unless otherwise indicated, these slides are © 2013-2016 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Bonus: Geode Adapter Implementation
39
GeodeAdapterRel
+ implement(implContext)
GeodeAdapterConvention
Convention.Impl(âMyAdapterâ)
Common interface for all MyAdapter Relation
Expressions. Provides implementation callback
method called as part of physical plan
implementation
ImplContext
+ implParm1
+ implParm2 âŠ
RelNode
GeodeAdapterTable
+ toRel(optTable)
+ asQueryable(provider,âŠ)
GeodeAdapterQueryable
+ myQuery(params) :
Enumetator
TranslatableTable
<<instance of>>
AbstractQueryableTable
AbstractTableQueryable <<create>>
Can convert
queries in
Expression
myQuery() implements the call to your DB
It is called by the auto generated code. It must
return an Enumberable instance
GeodeAdapterScan
+ register(planer) {
Registers all MyAdapter Rules
}
<<create>>
GeodeAdapterToEnumerableConvertorRule
operands: (RelNode.class,
MyAdapterConvention, EnumerableConvention) ConverterRue
TableScan
GeodeAdapterToEnumerableConvertor
+ implement(EnumerableRelImplementor) {
ctx = new MyAdapterRel.ImplContext()
getImputs().implement(ctx)
return BlockBuild.append( MY_QUERY_REF,
Expressions.constant(ctx.implParms1),
Expressions.constant(ctx.implParms2) âŠ
EnumerableRel
ConvertorImpl
<<create on match >>
GeodeAdapterProject
GeodeAdapterFilter
GeodeAdapterXXX
RelOptRule
GeodeAdapterProjectRu
GeodeAdapterFilterRule
GeodeAdapterXXXRule
<<create on match >>
Recursively call the implement on each MyAdapter
Relation Expression
Encode the myQuery(params) call as
Expressions
MY_QUERY_REF = Types.lookupMethod(
MyAdapterQueryable.class,
âmyQueryâ,
String.class
String.class);
1
3
4
5
2
6
7
8
9
Calcite Framework
GeodeAdapter components
40. Learn More. Stay Connected.
Vote for [CALCITE-2059] and donât miss:
Simplifying Apache Geode with Spring Data
Exploring Data-Driven, Cognitive Capabilities in Pivotal Cloud Foundry
Orchestrating Data Microservices with Spring Cloud Data Flow
40
#springone@s1p