SlideShare uma empresa Scribd logo
1 de 23
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Optimization of Incremental Queries in
the Cloud
József Makai, Gábor Szárnyas, Ákos Horváth,
István Ráth, Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
INCQUERY-D: DISTRIBUTED
INCREMENTAL MODEL QUERIES
Incremental Query Evaluation by RETE
 AUTOSAR well-formedness validation rule
Communication
channel
Logical signal Mapping Physical signal
Invalid model fragment
 Instance model
Valid model fragment
Fill the input nodesFill the worker nodesRead the result setModify the modelPropagate the changes
Read the changes in the
result set (deltas)
Incremental Query Evaluation by RETE
join
join
antijoin
Result set
Communication
channel
Logical signal Mapping Physical signal
Goals of IncQuery-D
 Objectives
o Distributed incremental pattern matching
o Adaptation of IncQuery tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
 Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result
set of the query (opposed to just one match)
Database
shard 0
INCQUERY-D Architecture
Server 1
Database
shard 1
Server 2
Database
shard 2
Server 3
Database
shard 3
Transaction
Server 0
Rete net
Indexer
layer
INCQUERY-D
Distributed query evaluation network
Distributed indexer Model access adapter
Distributed indexing,
notification
Distributed persistent
storage
Distributed production network
• Each intermediate node can be allocated
to a different host
• Remote internode communication
INCQUERY-D Architecture
Server 1
Database
shard 1
Server 2
Database
shard 2
Server 3
Database
shard 3
Transaction
In-memory
EMF model
Database
shard 0
Server 0
Indexer
layer
INCQUERY-D
Indexer Indexer Indexer Indexer
Join
Join
Antijoin
Akka
Triple store (4store),
Document DB (Mongo),
RDF over Column family
(Cumulus)
RETE Deployment Process
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
pattern routeSensor(sensor: Sensor) = {
TrackElement.sensor(switch,sensor);
Switch(switch);
SwitchPosition. switch(sp, switch);
SwitchPosition(sp);
Route.switchPosition(route, sp);
Route(route);
neg find head(route, sensor);
}
pattern head(R, Sen) = {
Route.routeDefinition(R, Sen);
}
route: Route sp: SwitchPosition
Switch:Switchsensor:Sensor
switchPosition
switch
sensor
routeDefinition
RETE Deployment Process
 Construct language-
independent constraints
 Resolution of
o syntactic sugar
o type information
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
Variables route sp switch
Parameter sensor
Constraints
Edge: SwitchPosition.switch
Edge: TrackElement.sensor
Edge: Route.switchPosition
Negation: head
RETE Deployment Process
 Construct RETE structure
(platform independently)
 Optimizations:
o Model statistics
o Expected usage profile
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
join
join
join
RETE Deployment Process
 Architecture model
(Cloud infrastructure)
o Virtual Machines
• Memory limits
• CPU speed
• Storage capacity
o Communication Channels
• Bandwidth
 Specified by a textual DSL
(Xtext)
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
1 2
3 4
RETE Deployment Process
Machine Allocated Nodes
1 In1, In2, Join2
2 In3
3 In4
4 Join1, Join3
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
1 2
3 4
Join1
Join3
Join2
In1 In2 In3 In4
Allocation can be optimized for
query performance and other
beneficial system characteristics!
RETE Deployment Process
 Configuration scripts for
o Deployment
o Communication
middleware
 Derived by automated
code generation
o Using Eclipse technology:
EMF-IncQuery + Xtend
Query
Language
Query
Predicates
RETE
Structure
Platform
Description
Allocation /
Mapping
Deployment
Descriptor
ALLOCATION OPTIMIZATION IN
INCQUERY-D
Motivation for Allocation Optimization
 Considering data-intensive
systems
o Over usage of resources
o Cost of the system
o Overhead of network
communication
Job Job
t
Local job
execution time
t’
Data transmission time
is significant component
in global execution time
~
Job
Job
Job
Network links can have
different capacities
4000 MB
Process
2000 MB
Process
500 MB
Process
2400 MB
$$$
Poor utilization leads
to expensive system
The Allocation Problem
 Inputs
 Allocation constraints
 Output: Valid allocation
 Optimization targets
500 MB
3200 MB
2400 MB600 MB
Worker node
Input nodeInput node
Production node
1 2
3
4
5000 MB6000 MB
1 2
• Rete network for the query
organized to processes
• Resource consumption
Available infrastructure with
important resource parameters
Opt. Target: Communication Minimization
1 × 1,000,000
3 × 200,000 3 × 200,000
Communication = 2,200,000
6000 MB
5000 MB
1
2500 MB
3200 MB
2400 MB600 MB
Worker node
Input nodeInput node
Production node
1,000,000200,000
200,000
1 2
3
4
3 × 1,000,000
1 × 200,000
1 × 200,000
Communication = 3,400,000
5000 MB
6000 MB
1
2
Largest volume of data is
sent through faster local link
Opt. Target: Cost Minimization
500 MB
3200 MB
2400 MB600 MB
Worker node
Input nodeInput node
Production node
1 2
3
4
4000 MB
$5
4000 MB
$5
6500 MB
$7
1
2
3
Cost = 10
4000 MB
$5
4000 MB
$5
6500 MB
$7
1
2
3
Cost = 12
Heuristics in Optimization
Worker node
Production
node
Input node
Worker node
Input nodeInput node
Worker node
Production
node
Production
node
Worker node
Model
database
Number of model
elements
?? MB
Input node
Memory consumption of
Rete nodes and processes
1 1 1
1 1 1
1
Memory usage of Input
nodes can be estimated
Communication
intensity of network
communication
channels2 2
2
2
2 2
3 3
3
3 3
4 4
Performance Impact of Optimization
61K 213K 867K 3M 13M
Model size (number of elements)
Time(sec)
First evaluation time of a complex query
28
45
72
114
182
290
463
739
Max. memory
Naive
optimization
Communication
optimization
739
616
194
144
2 minutes gain!
This approach
doesn’t work for
larger models!
Network Traffic Statistics
300
349 371
1020
248 280
347
875
14
2
74
90
24
20
190
234
0
200
400
600
800
1000
1200
vm0 vm1 vm2 total vm0 vm1 vm2 total
Network Traffic in Megabytes
Remote Local
Unoptimized Optimized
 Unoptimized:
o Remote Traffic:
1020
o Local Traffic: 90
o Total Traffic: 1110
 Optimized:
o Remote Traffic:
875
o Local Traffic: 234
o Total Traffic: 1109
Conclusion and Future Work
 Results
o Novel approach for application-specific resource allocation optimization for
distributed Rete
o CPLEX-based implementation for IncQuery-D
o Preliminary evaluation results
• Significant improvements for local resource management
• Performance gains especially over slow / inhomogeneous networks
• Efficient optimization execution (supported by runtime cutoff in CPLEX)
 Future work
o Hadoop / YARN support (new IncQuery-D developments)
• Support configuration optimization for other Hadoop-based cloud apps
o Static allocation  Dynamic reallocation
• Take existing configuration as a starting constraint set
• Optimize for changed workload conditions
New INCQUERY-D Architecture
Docker container 1
Database
shard 1
Docker container 2
Database
shard 2
Docker container 3
Database
shard 3
Transaction
In-memory
EMF model
Database
shard 0
Docker container 0
Indexer
layer
New INCQUERY-D: “Hadoop over Docker”
Indexer Indexer Indexer Indexer
Join
Join
Antijoin
• YARN resource
management
• ZooKeeper
monitoring
Akka actors
embedded into long-
running Hadoop jobs

Mais conteúdo relacionado

Semelhante a Optimization of Incremental Queries CloudMDE2015

RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesIan Foster
 
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) ToolsA Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) ToolsVishal Sharma, Ph.D.
 
IncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph QueriesIncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph QueriesGábor Szárnyas
 
Network Planning & Design: An Art or a Science?
Network Planning & Design: An Art or a Science?Network Planning & Design: An Art or a Science?
Network Planning & Design: An Art or a Science?Vishal Sharma, Ph.D.
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolHenry Muccini
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENEWorkshop
 
Sharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph QueriesSharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph QueriesGábor Szárnyas
 
Design and Implementation of JPEG CODEC using NoC
Design and Implementation of JPEG CODEC using NoCDesign and Implementation of JPEG CODEC using NoC
Design and Implementation of JPEG CODEC using NoCIRJET Journal
 
Abstract + Poster (MSc Thesis)
Abstract + Poster (MSc Thesis)Abstract + Poster (MSc Thesis)
Abstract + Poster (MSc Thesis)Louis Abalu
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...Daniel Varro
 
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Deepak Shankar
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsiigeeks1234
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsiigeeks1234
 
Me,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiMe,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiigeeks1234
 
P9 addressing signal_integrity_ in_ew_2015_final
P9 addressing signal_integrity_ in_ew_2015_finalP9 addressing signal_integrity_ in_ew_2015_final
P9 addressing signal_integrity_ in_ew_2015_finalAamir Habib
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudGábor Szárnyas
 

Semelhante a Optimization of Incremental Queries CloudMDE2015 (20)

RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) ToolsA Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
A Survey of Recent Advances in Network Planning/Traffic Engineering (TE) Tools
 
IncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph QueriesIncQuery-D: Distributed Incremental Graph Queries
IncQuery-D: Distributed Incremental Graph Queries
 
Network Planning & Design: An Art or a Science?
Network Planning & Design: An Art or a Science?Network Planning & Design: An Art or a Science?
Network Planning & Design: An Art or a Science?
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
SERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_schoolSERENE 2014 School: Daniel varro serene2014_school
SERENE 2014 School: Daniel varro serene2014_school
 
SERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the CloudSERENE 2014 School: Incremental Model Queries over the Cloud
SERENE 2014 School: Incremental Model Queries over the Cloud
 
Sharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph QueriesSharded Joins for Scalable Incremental Graph Queries
Sharded Joins for Scalable Incremental Graph Queries
 
Thesis Giani UIC Slides EN
Thesis Giani UIC Slides ENThesis Giani UIC Slides EN
Thesis Giani UIC Slides EN
 
Design and Implementation of JPEG CODEC using NoC
Design and Implementation of JPEG CODEC using NoCDesign and Implementation of JPEG CODEC using NoC
Design and Implementation of JPEG CODEC using NoC
 
Abstract + Poster (MSc Thesis)
Abstract + Poster (MSc Thesis)Abstract + Poster (MSc Thesis)
Abstract + Poster (MSc Thesis)
 
Link_NwkingforDevOps
Link_NwkingforDevOpsLink_NwkingforDevOps
Link_NwkingforDevOps
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
 
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
 
Me,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiMe,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsi
 
P9 addressing signal_integrity_ in_ew_2015_final
P9 addressing signal_integrity_ in_ew_2015_finalP9 addressing signal_integrity_ in_ew_2015_final
P9 addressing signal_integrity_ in_ew_2015_final
 
Features
FeaturesFeatures
Features
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the Cloud
 

Último

Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 

Último (20)

Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 

Optimization of Incremental Queries CloudMDE2015

  • 1. Budapest University of Technology and Economics Department of Measurement and Information Systems Optimization of Incremental Queries in the Cloud József Makai, Gábor Szárnyas, Ákos Horváth, István Ráth, Dániel Varró Budapest University of Technology and Economics Fault Tolerant Systems Research Group
  • 3. Incremental Query Evaluation by RETE  AUTOSAR well-formedness validation rule Communication channel Logical signal Mapping Physical signal Invalid model fragment  Instance model Valid model fragment
  • 4. Fill the input nodesFill the worker nodesRead the result setModify the modelPropagate the changes Read the changes in the result set (deltas) Incremental Query Evaluation by RETE join join antijoin Result set Communication channel Logical signal Mapping Physical signal
  • 5. Goals of IncQuery-D  Objectives o Distributed incremental pattern matching o Adaptation of IncQuery tooling to graph DBs o Executed over cloud infrastructure (COTS hardware)  Achieve scalability by avoiding memory bottleneck o Sharding separately • Data • Indexers • Query network o In memory: • Index + Query Assumptions • All Rete nodes fit on a server node • Indexers can be filled efficiently • Modification size ≪ model size • The application requires the complete result set of the query (opposed to just one match)
  • 6. Database shard 0 INCQUERY-D Architecture Server 1 Database shard 1 Server 2 Database shard 2 Server 3 Database shard 3 Transaction Server 0 Rete net Indexer layer INCQUERY-D Distributed query evaluation network Distributed indexer Model access adapter Distributed indexing, notification Distributed persistent storage Distributed production network • Each intermediate node can be allocated to a different host • Remote internode communication
  • 7. INCQUERY-D Architecture Server 1 Database shard 1 Server 2 Database shard 2 Server 3 Database shard 3 Transaction In-memory EMF model Database shard 0 Server 0 Indexer layer INCQUERY-D Indexer Indexer Indexer Indexer Join Join Antijoin Akka Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus)
  • 8. RETE Deployment Process Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor pattern routeSensor(sensor: Sensor) = { TrackElement.sensor(switch,sensor); Switch(switch); SwitchPosition. switch(sp, switch); SwitchPosition(sp); Route.switchPosition(route, sp); Route(route); neg find head(route, sensor); } pattern head(R, Sen) = { Route.routeDefinition(R, Sen); } route: Route sp: SwitchPosition Switch:Switchsensor:Sensor switchPosition switch sensor routeDefinition
  • 9. RETE Deployment Process  Construct language- independent constraints  Resolution of o syntactic sugar o type information Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor Variables route sp switch Parameter sensor Constraints Edge: SwitchPosition.switch Edge: TrackElement.sensor Edge: Route.switchPosition Negation: head
  • 10. RETE Deployment Process  Construct RETE structure (platform independently)  Optimizations: o Model statistics o Expected usage profile Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor join join join
  • 11. RETE Deployment Process  Architecture model (Cloud infrastructure) o Virtual Machines • Memory limits • CPU speed • Storage capacity o Communication Channels • Bandwidth  Specified by a textual DSL (Xtext) Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor 1 2 3 4
  • 12. RETE Deployment Process Machine Allocated Nodes 1 In1, In2, Join2 2 In3 3 In4 4 Join1, Join3 Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor 1 2 3 4 Join1 Join3 Join2 In1 In2 In3 In4 Allocation can be optimized for query performance and other beneficial system characteristics!
  • 13. RETE Deployment Process  Configuration scripts for o Deployment o Communication middleware  Derived by automated code generation o Using Eclipse technology: EMF-IncQuery + Xtend Query Language Query Predicates RETE Structure Platform Description Allocation / Mapping Deployment Descriptor
  • 15. Motivation for Allocation Optimization  Considering data-intensive systems o Over usage of resources o Cost of the system o Overhead of network communication Job Job t Local job execution time t’ Data transmission time is significant component in global execution time ~ Job Job Job Network links can have different capacities 4000 MB Process 2000 MB Process 500 MB Process 2400 MB $$$ Poor utilization leads to expensive system
  • 16. The Allocation Problem  Inputs  Allocation constraints  Output: Valid allocation  Optimization targets 500 MB 3200 MB 2400 MB600 MB Worker node Input nodeInput node Production node 1 2 3 4 5000 MB6000 MB 1 2 • Rete network for the query organized to processes • Resource consumption Available infrastructure with important resource parameters
  • 17. Opt. Target: Communication Minimization 1 × 1,000,000 3 × 200,000 3 × 200,000 Communication = 2,200,000 6000 MB 5000 MB 1 2500 MB 3200 MB 2400 MB600 MB Worker node Input nodeInput node Production node 1,000,000200,000 200,000 1 2 3 4 3 × 1,000,000 1 × 200,000 1 × 200,000 Communication = 3,400,000 5000 MB 6000 MB 1 2 Largest volume of data is sent through faster local link
  • 18. Opt. Target: Cost Minimization 500 MB 3200 MB 2400 MB600 MB Worker node Input nodeInput node Production node 1 2 3 4 4000 MB $5 4000 MB $5 6500 MB $7 1 2 3 Cost = 10 4000 MB $5 4000 MB $5 6500 MB $7 1 2 3 Cost = 12
  • 19. Heuristics in Optimization Worker node Production node Input node Worker node Input nodeInput node Worker node Production node Production node Worker node Model database Number of model elements ?? MB Input node Memory consumption of Rete nodes and processes 1 1 1 1 1 1 1 Memory usage of Input nodes can be estimated Communication intensity of network communication channels2 2 2 2 2 2 3 3 3 3 3 4 4
  • 20. Performance Impact of Optimization 61K 213K 867K 3M 13M Model size (number of elements) Time(sec) First evaluation time of a complex query 28 45 72 114 182 290 463 739 Max. memory Naive optimization Communication optimization 739 616 194 144 2 minutes gain! This approach doesn’t work for larger models!
  • 21. Network Traffic Statistics 300 349 371 1020 248 280 347 875 14 2 74 90 24 20 190 234 0 200 400 600 800 1000 1200 vm0 vm1 vm2 total vm0 vm1 vm2 total Network Traffic in Megabytes Remote Local Unoptimized Optimized  Unoptimized: o Remote Traffic: 1020 o Local Traffic: 90 o Total Traffic: 1110  Optimized: o Remote Traffic: 875 o Local Traffic: 234 o Total Traffic: 1109
  • 22. Conclusion and Future Work  Results o Novel approach for application-specific resource allocation optimization for distributed Rete o CPLEX-based implementation for IncQuery-D o Preliminary evaluation results • Significant improvements for local resource management • Performance gains especially over slow / inhomogeneous networks • Efficient optimization execution (supported by runtime cutoff in CPLEX)  Future work o Hadoop / YARN support (new IncQuery-D developments) • Support configuration optimization for other Hadoop-based cloud apps o Static allocation  Dynamic reallocation • Take existing configuration as a starting constraint set • Optimize for changed workload conditions
  • 23. New INCQUERY-D Architecture Docker container 1 Database shard 1 Docker container 2 Database shard 2 Docker container 3 Database shard 3 Transaction In-memory EMF model Database shard 0 Docker container 0 Indexer layer New INCQUERY-D: “Hadoop over Docker” Indexer Indexer Indexer Indexer Join Join Antijoin • YARN resource management • ZooKeeper monitoring Akka actors embedded into long- running Hadoop jobs

Notas do Editor

  1. Ez szuper jól bemutatja azokat a fogalmakat, amivel mi is dolgozunk a végén, szóval ezt hasznos lenne bemutatni.
  2. Kulcsgondolatok: Erőforrások túlhasználását el kell kerülni, de a rossz kihasználtság meg drága rendszerhez vezet Adatküldés ideje jelentős összetevő a globális végrehajtási időben, hálózati linkek is különböző sebességűek lehetnek  erre optimalizálunk
  3. Ennél el kell majd mondani mit jelentenek a számok az egyes “éleken”.
  4. Normalized tuple-t használjuk becsléshez. Egy node-nál ez következőképpen néz ki: megnézzük mennyi adat várható bemeneti csatornákon (először input node-nál, ahol biztosan tudjuk is azt) Abból közelítjük memória fogyasztást processeknek lineáris regresszióval Kiszámoljuk node típusa és bemeneti adat mennyiségének függvényében a kimenő csatornákra jutó adat mennyiségét (mindegyiken ugyanannyi lesz), input node-ra ezt is tudjuk tutira, mert mindent továbbít Ezt végezzük szintről szintre, háló szélességi bejárásával Ezt kellene itt összefoglalni