DAW: Duplicate-AWare Federated Query Processing over the Web of Data
Muhammad Saleem1, Axel-Cyrille Ngonga Ngomo1, Josiane Xavier Parreira2, Helena F. Deus3, Manfred Hauswirth2
1Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
lastname@informatik.uni-leipzig.de
2Digital Enterprise Research Institute (DERI), National University of Ireland, Galway
firstname.lastname@deri.org
International Semantic Web Conference (ISWC), October 21-25, 2013, Sydney, Australia
Motivation
Data sources: S1, S2, S3, S4 (RDF)
Federation pipeline:
• Parser: get the individual triple patterns
• Source Selection: identify the capable sources for each individual triple pattern
• Federator Optimizer: generate an optimized sub-query execution plan
• Execute the sub-queries
• Integrator: integrate the sub-query results
Motivation
SELECT ?v1 ?v2
WHERE
{
?uri <p1> ?v1. # Triple Pattern 1 (TP1)
?uri <p2> ?v2. # Triple Pattern 2 (TP2)
}
Source selection algorithm over sources S1, S2, S3, S4 (RDF)
Triple pattern-wise source selection:
TP1 = {S1, S2, S3}
TP2 = {S1, S2, S4}
Total triple pattern-wise selected sources = 6
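A minimal sketch of triple pattern-wise source selection for the query above. It assumes a hypothetical predicate index (`SOURCE_PREDICATES`) in which S1 and S2 contain both p1 and p2, S3 only p1, and S4 only p2; these contents are chosen only to reproduce the selection on the slide and are not taken from the source.

```python
# Hypothetical capability index: which predicates each source contains.
# The contents are an assumption chosen to reproduce the example above.
SOURCE_PREDICATES = {
    "S1": {"p1", "p2"},
    "S2": {"p1", "p2"},
    "S3": {"p1"},
    "S4": {"p2"},
}


def select_sources(triple_patterns, source_predicates):
    """For every triple pattern, keep the sources whose index lists its predicate."""
    selection = {}
    for tp_id, (_s, p, _o) in triple_patterns.items():
        selection[tp_id] = sorted(
            src for src, preds in source_predicates.items() if p in preds
        )
    return selection


query = {
    "TP1": ("?uri", "p1", "?v1"),
    "TP2": ("?uri", "p2", "?v2"),
}

selection = select_sources(query, SOURCE_PREDICATES)
total = sum(len(srcs) for srcs in selection.values())
print(selection)  # {'TP1': ['S1', 'S2', 'S3'], 'TP2': ['S1', 'S2', 'S4']}
print(total)      # 6
```

Every selected source is contacted for its triple pattern, which is why overlapping sources lead to duplicate results.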
Motivation
Retrieved results for TP1 (?uri <p1> ?v1) and TP2 (?uri <p2> ?v2)
Triple pattern-wise source selection and skipping:
TP1 = {S1, S2, S3}
TP2 = {S1, S2, S4}
Min. number of new triples (threshold) = 20
Total triple pattern-wise skipped sources = 2
Total triple pattern-wise selected sources = 4
Problem Statement
• Data duplication in LOD datasets
– E.g. DrugBank and Neurocommons are duplicated in the DERI Health Care and Life Sciences Knowledge Base
• Retrieving duplicate results increases query execution time and network traffic
• How can the overlap between data sources be estimated before the sub-queries are federated?
Sketches
• Data structures that provide dataset summaries
– Min-wise Independent Permutations (MIPs)
– Bloom filters
• Estimate the overlap among different ID sets
• MIPs provide a good trade-off between estimation error and space requirements
• MIPs of different lengths can be compared
• Sketches alone cannot be used in SPARQL federation
– SPARQL queries become highly selective when the subject, predicate, or object of a triple pattern is bound
Min-wise Independent Permutations
ID set: {21, 3, 12, 24, 8, 17}
• Apply the permutations to all IDs:
h1 = (7x + 3) mod 51 → 48, 24, 36, 18, 8, 20
h2 = (5x + 6) mod 51 → 9, 21, 15, 24, 46, 40
hN = (3x + 9) mod 51 → 21, 18, 45, 30, 33, 9
• Create the MIP vector from the minima of the permutations: (8, 9, 9)
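The permutation step can be checked in a few lines of Python. The ID set {21, 3, 12, 24, 8, 17} is reconstructed from the permuted values on the slide, and `mip_vector` is a hypothetical helper name. Note that hN = (3x + 9) mod 51 is not a true permutation of 0..50, because 3 and 51 share a factor; a practical implementation would typically choose a prime U.

```python
U = 51  # universe size used by the slide's hash functions

# (a_i, b_i) pairs for h_i(x) = (a_i * x + b_i) mod U, taken from the slide
HASHES = [(7, 3), (5, 6), (3, 9)]


def mip_vector(ids, hashes, universe):
    """MIP vector: for each permutation h_i, the minimum of h_i(x) over the ID set."""
    return [min((a * x + b) % universe for x in ids) for a, b in hashes]


id_set = [21, 3, 12, 24, 8, 17]
print([[(a * x + b) % U for x in id_set] for a, b in HASHES])
# [[48, 24, 36, 18, 8, 20], [9, 21, 15, 24, 46, 40], [21, 18, 45, 30, 33, 9]]
print(mip_vector(id_set, HASHES, U))  # [8, 9, 9]
```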
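MIPs estimated operations
Triples T1(s,p,o), T2(s,p,o), T3(s,p,o) and T4(s,p,o), T5(s,p,o), T6(s,p,o) are mapped to IDs with h(concat(s,o)) and summarized as MIP vectors VA and VB:
VA = (8, 9, 30, 24, 36, 9)
VB = (8, 24, 20, 48, 36, 13)
Union (VA, VB) = (8, 9, 20, 24, 36, 9)
Resemblance (VA, VB) = 2/6 => 0.33
Overlap (VA, VB) = 0.33 * (6 + 6) / (1 + 0.33) => 3
$$h_i(x) = (a_i \cdot x + b_i) \bmod U$$
$$\mathrm{Resemblance}(S_A, S_B) = \frac{|S_A \cap S_B|}{|S_A \cup S_B|} \approx \frac{|V_A \cap V_B|}{N}$$
$$\mathrm{Overlap}(S_A, S_B) \approx \frac{\mathrm{Resemblance}(V_A, V_B) \times (|S_A| + |S_B|)}{\mathrm{Resemblance}(V_A, V_B) + 1}$$
Error of the estimation: $O(1/\sqrt{N})$, where N is the length of the MIP vectors
$$|S'_i| = \begin{cases} |S_i| & \text{if neither subject nor object is bound} \\ |S_i| \times \mathrm{avgSbjSel}(S, p) & \text{if subject is bound} \\ |S_i| \times \mathrm{avgObjSel}(S, p) & \text{if object is bound} \end{cases}$$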
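The estimated operations can be sketched as follows. As is usual for min-hash sketches, the union of two MIP vectors is taken as their element-wise minimum; comparing vectors of different lengths on their common prefix is an assumption about how "MIPs of different lengths can be compared". All function names are hypothetical.

```python
def mip_union(va, vb):
    """MIP vector of S_A ∪ S_B: element-wise minimum (over the common prefix)."""
    return [min(a, b) for a, b in zip(va, vb)]


def resemblance(va, vb):
    """|V_A ∩ V_B| / N: fraction of positions where the two MIP vectors agree."""
    n = min(len(va), len(vb))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(va[:n], vb[:n])) / n


def overlap(va, vb, size_a, size_b):
    """Estimated |S_A ∩ S_B| = R * (|S_A| + |S_B|) / (R + 1)."""
    r = resemblance(va, vb)
    return r * (size_a + size_b) / (r + 1)


VA = [8, 9, 30, 24, 36, 9]
VB = [8, 24, 20, 48, 36, 13]
print(mip_union(VA, VB))                # [8, 9, 20, 24, 36, 9]
print(round(resemblance(VA, VB), 2))    # 0.33
print(round(overlap(VA, VB, 6, 6), 2))  # 3.0
```

The overlap formula follows from |S_A ∩ S_B| = R · |S_A ∪ S_B| and |S_A ∪ S_B| = |S_A| + |S_B| − |S_A ∩ S_B|.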
DAW
• Combines MIPs with compact data summaries
• Uses average selectivity values for bound subjects and objects
• Can be combined with any existing SPARQL endpoint federation system
• Can be used for partial result retrieval
DAW Index
[] a sd:Service ;
   sd:endpointUrl <http://localhost:8890/sparql> ;
   sd:capability [
      sd:predicate diseasome:name ;
      sd:totalTriples 147 ;
      sd:avgSbjSel "0.0068" ;
      sd:avgObjSel "0.0069" ;
      sd:MIPs "-6908232 -7090543 -6892373 -7064247 ..." ; ] ;
   sd:capability [
      sd:predicate diseasome:chromosomalLocation ;
      sd:totalTriples 160 ;
      sd:avgSbjSel "0.0062" ;
      sd:avgObjSel "0.0072" ;
      sd:MIPs "-7056448 -7056410 -6845713 -6966021 ..." ; ] ;
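A sketch of how one capability entry could be computed from a source's triples. Several choices here are assumptions, not the authors' implementation: CRC32 as the stand-in for h(concat(s,o)), the prime U = 2^31 − 1, the hypothetical (a_i, b_i) pairs, and defining average selectivity as 1 / (number of distinct subjects or objects). That last definition is at least consistent with the example values above (1/147 ≈ 0.0068, 1/160 ≈ 0.0062). The MIP values produced will not match the negative numbers in the index, which come from a different hash.

```python
import zlib
from collections import defaultdict

U = 2_147_483_647  # assumption: the prime 2^31 - 1 as hash universe
HASHES = [(48271, 11), (16807, 7), (69621, 3)]  # hypothetical (a_i, b_i); N = 3


def triple_id(s, o):
    """Integer ID from h(concat(s, o)); CRC32 is an assumed stand-in hash."""
    return zlib.crc32((s + o).encode("utf-8"))


def build_capabilities(triples):
    """Group (s, p, o) triples by predicate; compute one DAW-style summary each."""
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((s, o))
    caps = {}
    for p, pairs in by_pred.items():
        subjects = {s for s, _ in pairs}
        objects = {o for _, o in pairs}
        ids = {triple_id(s, o) for s, o in pairs}
        caps[p] = {
            "totalTriples": len(pairs),
            # assumed definition: average selectivity = 1 / distinct values
            "avgSbjSel": 1 / len(subjects),
            "avgObjSel": 1 / len(objects),
            "MIPs": [min((a * x + b) % U for x in ids) for a, b in HASHES],
        }
    return caps


# Hypothetical toy data
triples = [
    ("d1", "name", "A"),
    ("d2", "name", "B"),
    ("d2", "name", "C"),
    ("d1", "loc", "X"),
]
caps = build_capabilities(triples)
print(caps["name"]["totalTriples"], caps["name"]["avgSbjSel"])  # 3 0.5
```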
Triple Pattern-wise source ranking and skipping
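The ranking-and-skipping figure did not survive extraction, so the following is only a simplified greedy sketch under stated assumptions, not DAW's exact algorithm. Sources are ranked by the estimated size |S'_i| from the MIPs slide. The expected number of new triples is the source size minus its estimated overlap with the union of already selected sources, with duplicates assumed to be spread uniformly so the duplicate fraction can be scaled by selectivity. A source is skipped when fewer than `threshold` new triples are expected (20 in the motivating example). When both subject and object are bound, only the subject selectivity is applied; that is also an assumption. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """Per-source summary for one predicate (fields mirror the DAW index)."""
    source: str
    total_triples: int
    avg_sbj_sel: float
    avg_obj_sel: float
    mips: list = field(default_factory=list)


def effective_size(cap, subject_bound=False, object_bound=False):
    """|S'_i|: scale by the average selectivity of the bound position."""
    if subject_bound:
        return cap.total_triples * cap.avg_sbj_sel
    if object_bound:
        return cap.total_triples * cap.avg_obj_sel
    return float(cap.total_triples)


def resemblance(va, vb):
    """Fraction of agreeing positions over the common prefix of two MIP vectors."""
    n = min(len(va), len(vb))
    if n == 0:
        return 0.0
    return sum(a == b for a, b in zip(va[:n], vb[:n])) / n


def rank_and_skip(caps, threshold, subject_bound=False, object_bound=False):
    """Greedy ranking and skipping; returns (selected, skipped) source names."""
    ranked = sorted(caps, key=lambda c: effective_size(c, subject_bound, object_bound),
                    reverse=True)
    selected, skipped = [], []
    union_mips, union_size = None, 0.0
    for cap in ranked:
        est_overlap = 0.0
        if union_mips is not None:
            r = resemblance(union_mips, cap.mips)
            est_overlap = r * (union_size + cap.total_triples) / (r + 1)
        dup_fraction = min(1.0, est_overlap / cap.total_triples) if cap.total_triples else 1.0
        new_triples = effective_size(cap, subject_bound, object_bound) * (1.0 - dup_fraction)
        if new_triples < threshold:
            skipped.append(cap.source)
            continue
        selected.append(cap.source)
        if union_mips is None:
            union_mips, union_size = list(cap.mips), float(cap.total_triples)
        else:
            # MIP vector of the union is the element-wise minimum
            union_mips = [min(a, b) for a, b in zip(union_mips, cap.mips)]
            union_size += cap.total_triples - est_overlap
    return selected, skipped


# Hypothetical sources: S2 duplicates S1, S3 is disjoint from both
caps = [
    Capability("S1", 100, 0.01, 0.02, [1, 2, 3, 4]),
    Capability("S2", 80, 0.01, 0.02, [1, 2, 3, 4]),
    Capability("S3", 50, 0.01, 0.02, [9, 9, 9, 9]),
]
print(rank_and_skip(caps, threshold=20))  # (['S1', 'S3'], ['S2'])
```

Raising the threshold skips more sources and saves more requests, at the cost of recall; this is the trade-off named in the future work.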
Evaluation Setup
Dataset | Total Size (MB) | Index Size (MB) | No. of Slices | Discrepancy | No. of Dup. Slices | Index Gen. Time (sec)
Diseasome | 18.62 | 0.17 | 10 | 1500 | 1 | 4
Geo | 274.14 | 1.63 | 10 | 50000 | 2 | 133
LinkedMDB | 448.93 | 1.66 | 10 | 100000 | 1 | 201
Publication | 39.07 | 0.2 | 10 | 2500 | 1 | 6
Queries Distribution
Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total
Diseasome | 5 | 5 | 5 | 4 | 5 | 2 | 26
Geo | 5 | 5 | 5 | - | - | - | 15
LinkedMDB | 5 | - | - | - | - | - | 5
Publication | 5 | 5 | 5 | 7 | 7 | 4 | 33
Total | 20 | 15 | 15 | 11 | 12 | 6 | 79
EndPoint | CPU | RAM | Hard Disk
1 | 2.2 GHz i3 | 4GB | 300GB
2 | 2.9 GHz i7 | 16GB | 256GB SSD
3 | 2.6 GHz i5 | 4GB | 150GB
4 | 2.53 GHz i5 | 4GB | 300GB
5 | 2.3 GHz i5 | 4GB | 500GB
6 | 2.53 GHz i5 | 4GB | 300GB
7 | 2.9 GHz i7 | 8GB | 450GB
8 | 2.6 GHz i5 | 8GB | 400GB
9 | 2.6 GHz i5 | 8GB | 400GB
10 | 2.9 GHz i7 | 16GB | 500GB
• A slice generator tool [1] was used to generate random slices and duplicates
• We extended FedX, SPLENDID, and DARQ with DAW
[1] http://goo.gl/trjGSJ
Triple Pattern-wise sources skipped
DARQ
Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total | Recall
Diseasome | 14(35) | 30(77) | 40(107) | 35(65) | 65(125) | 30(50) | 214(459) | 100%
Geo | 22(40) | 23(55) | 37(101) | - | - | - | 82(196) | 99.99%
LinkedMDB | 22(38) | - | - | - | - | - | 22(38) | 100%
Publication | 9(30) | 10(37) | 15(86) | 14(60) | 21(120) | 32(102) | 101(435) | 100%
Total | 67(143) | 63(169) | 92(294) | 49(125) | 86(245) | 62(152) | 419(1128) |
FedX and SPLENDID
Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total | Recall
Diseasome | 7(28) | 30(77) | 40(107) | 35(65) | 65(125) | 30(50) | 207(452) | 100%
Geo | 19(37) | 23(55) | 37(101) | - | - | - | 79(193) | 99.99%
LinkedMDB | 15(31) | - | - | - | - | - | 15(31) | 100%
Publication | 3(24) | 10(37) | 15(86) | 14(60) | 21(120) | 32(102) | 95(429) | 100%
Total | 44(120) | 63(169) | 92(294) | 49(125) | 86(245) | 62(152) | 396(1105) |
FedX Extension with DAW
[Chart: execution time (sec) of FedX vs. DAW per query type; Diseasome and Publication: STP, S-1, S-2, P-1, P-2, P-3; Geo Data: STP, S-1, S-2; Movie: STP]
Overall Performance Evaluation (average execution time in sec; gain of DAW in %)
System | Diseasome | Publication | Geo Data | Movie | Overall
FedX (average) | 2.44 | 1.48 | 4.60 | 1.74 | 2.44
DAW (average) | 1.98 | 1.67 | 3.92 | 1.61 | 2.20
Gain % | 18.79 | -12.38 | 14.71 | 7.59 | 9.76
SPLENDID Extension with DAW
[Chart: execution time (sec) of SPLENDID vs. DAW per query type; Diseasome and Publication: STP, S-1, S-2, P-1, P-2, P-3; Geo: STP, S-1, S-2; Movie: STP]
Overall Performance Evaluation (average execution time in sec; gain of DAW in %)
System | Diseasome | Publication | Geo Data | Movie | Overall
SPLENDID (average) | 3.78 | 2.18 | 7.27 | 1.9 | 3.71
DAW (average) | 3.04 | 2.37 | 6.22 | 1.688 | 3.30
Gain % | 19.48 | -8.94 | 14.40 | 11.16 | 11.11
DARQ Extension with DAW
[Chart: execution time (sec) of DARQ vs. DAW per query type; Diseasome and Publication: STP, S-1, S-2, P-1, P-2, P-3; Geo: STP, S-1, S-2; Movie: STP]
Over all performance Evaluation
Diseaso
me
Publicati
on Geo Data Movie Overall
Average Gain % Average Gain % Average Gain % Average Gain % Average Gain %
DARQ 8.27
23.34
5.26
6.14
23.44
16.31
1.96
13.88
9.59
16.46
DAW 6.34 4.94 19.62 1.688 8.01
Source Ranking vs Recall
[Charts: recall (%) vs. number of ranked sources, Optimal vs. DAW, for Diseasome and Publication]
Conclusion and Future Work
• A sub-query can retrieve results that have already been retrieved by another sub-query
– Resources are wasted
– Query runtime increases
– Extra network traffic is generated
• Sketches alone cannot be used, due to the expressive nature of SPARQL queries
• We applied MIPs to RDF predicates, along with compact data summaries
• Performance improvement (average execution time)
– FedX: 9.76 %
– SPLENDID: 11.11 %
– DARQ: 16.46 %
• Future work: explore the effect of MIP sizes and threshold values to find the optimal trade-off between execution time and recall
saleem@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany

 

Último (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 

DAW: Duplicate-AWare Federated Query Processing over the Web of Data

  • 1. DAW: Duplicate-AWare Federated Query Processing over the Web of Data
    Muhammad Saleem1, Axel-Cyrille Ngonga Ngomo1, Josiane Xavier Parreira2, Helena F. Deus3, Manfred Hauswirth2
    1 Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany, lastname@informatik.uni-leipzig.de
    2 Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, firstname.lastname@deri.org
    International Semantic Web Conference (ISWC), October 21-25, 2013, Sydney, Australia
  • 2. Motivation
    Figure: federated query processing over the data sources S1, S2, S3, S4 (RDF)
    – Parser: get individual triple patterns
    – Source Selection: identify capable sources for individual triple patterns
    – Optimizer: generate an optimized sub-query execution plan
    – Federator: execute sub-queries
    – Integrator: integrate sub-query results
  • 3. Motivation
    SELECT ?v1 ?v2
    WHERE
    {
      ?uri <p1> ?v1 .   # Triple Pattern 1 (TP1)
      ?uri <p2> ?v2 .   # Triple Pattern 2 (TP2)
    }
    Triple pattern-wise source selection (source selection algorithm over S1, S2, S3, S4):
    – TP1 = {S1, S2, S3}
    – TP2 = {S1, S2, S4}
    Total triple pattern-wise selected sources = 6
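Triple pattern-wise source selection can be sketched as follows. This is a minimal illustration, assuming (hypothetically) that every source advertises the set of predicates it contains; the predicate sets for S1 to S4 are invented so that the result matches the slide, and `select_sources` is a hypothetical helper, not part of any of the federation engines discussed here.

```python
from typing import Dict, List, Set, Tuple

TriplePattern = Tuple[str, str, str]

def select_sources(patterns: List[TriplePattern],
                   capabilities: Dict[str, Set[str]]) -> List[List[str]]:
    """For each triple pattern, list the sources that can answer it.

    A source is relevant if it advertises the pattern's predicate; an unbound
    predicate (a variable such as ?p) makes every source relevant.
    """
    selection = []
    for _, predicate, _ in patterns:
        if predicate.startswith("?"):
            selection.append(sorted(capabilities))
        else:
            selection.append(sorted(src for src, preds in capabilities.items()
                                    if predicate in preds))
    return selection

# Hypothetical predicate sets chosen to reproduce the slide's example.
capabilities = {
    "S1": {"<p1>", "<p2>"},
    "S2": {"<p1>", "<p2>"},
    "S3": {"<p1>"},
    "S4": {"<p2>"},
}
query = [("?uri", "<p1>", "?v1"), ("?uri", "<p2>", "?v2")]

selected = select_sources(query, capabilities)
total = sum(len(sources) for sources in selected)
print(selected)  # [['S1', 'S2', 'S3'], ['S1', 'S2', 'S4']]
print(total)     # 6
```

Each source is counted once per triple pattern it is selected for, which is why S1 and S2 contribute twice to the total of 6.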
  • 4. Motivation
    Triple pattern-wise source selection and skipping, given the results retrieved for TP1 (?uri <p1> ?v1) and TP2 (?uri <p2> ?v2)
    – Candidate sources: TP1 = {S1, S2, S3}, TP2 = {S1, S2, S4}
    – Min. number of new triples (threshold) = 20
    – Total triple pattern-wise selected sources = 4
    – Total triple pattern-wise skipped sources = 2
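The skipping rule can be illustrated with a plain threshold filter. This is a sketch only: the per-source counts of new (non-duplicate) triples are invented, the slide does not say which two sources are skipped, and how DAW estimates these counts is not modeled here.

```python
from typing import Dict, List, Tuple

THRESHOLD = 20  # minimum number of new triples a source must contribute

# Hypothetical estimates of new triples per source, for each triple pattern.
estimated_new: Dict[str, Dict[str, int]] = {
    "TP1": {"S1": 120, "S2": 35, "S3": 4},
    "TP2": {"S1": 80, "S2": 12, "S4": 40},
}

def skip_sources(estimates: Dict[str, Dict[str, int]],
                 threshold: int) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each pattern's candidate sources into selected and skipped ones."""
    selected = {tp: [s for s, n in est.items() if n >= threshold]
                for tp, est in estimates.items()}
    skipped = {tp: [s for s, n in est.items() if n < threshold]
               for tp, est in estimates.items()}
    return selected, skipped

selected, skipped = skip_sources(estimated_new, THRESHOLD)
print(sum(map(len, selected.values())))  # 4
print(sum(map(len, skipped.values())))   # 2
```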
  • 5. Problem Statement
    • Data duplication in LOD datasets
      – E.g. DrugBank and Neurocommons are duplicated in the DERI Health Care and Life Sciences Knowledge Base
    • Retrieving duplicate results increases query execution time and network traffic
    • How can the overlap between data sources be estimated before sub-queries are federated?
  • 6. Sketches
    • Data structures that provide dataset summaries
      – Min-wise Independent Permutations (MIPs)
      – Bloom filters
    • Estimate the overlap among different ID sets
    • MIPs provide a good trade-off between estimation error and space requirements
    • MIPs of different lengths can be compared
    • Sketches alone cannot be used in SPARQL federation
      – SPARQL queries are highly selective when the subject, predicate, or object of a triple pattern is bound
  • 7. Min-wise Independent Permutations
    • Each triple T(s,p,o) (T1 to T6 in the example) is mapped to an ID with h(concat(s,o)), giving an ID set
    • Apply N permutations to all IDs, e.g. h1 = (7x + 3) mod 51, h2 = (5x + 6) mod 51, ..., hN = (3x + 9) mod 51
    • Create the MIP vector from the minima of the permutations
    • MIPs estimated operations, for VA = (8, 9, 30, 24, 36, 9) and VB = (8, 24, 20, 48, 36, 13):
      – Union(VA, VB) = (8, 9, 20, 24, 36, 9), the element-wise minimum
      – Resemblance(VA, VB) = 2/6 ≈ 0.33
      – Overlap(VA, VB) = 0.33 × (6 + 6) / (1 + 0.33) ≈ 3
    $$h_i(x) = (a_i x + b_i) \bmod U$$
    $$\mathrm{Resemblance}(S_A, S_B) = \frac{|S_A \cap S_B|}{|S_A \cup S_B|} \approx \frac{|V_A \cap V_B|}{N}$$
    where |V_A ∩ V_B| counts the positions at which the two MIP vectors agree.
    $$\mathrm{Overlap}(S_A, S_B) \approx \frac{\mathrm{Resemblance}(V_A, V_B) \times (|S_A| + |S_B|)}{\mathrm{Resemblance}(V_A, V_B) + 1}$$
    $$\text{Error of estimation} = O\left(1/\sqrt{N}\right)$$
    $$|S'_i| = \begin{cases} |S_i| & \text{if neither subject nor object is bound} \\ |S_i| \times \mathrm{avgSbjSel}(S_p) & \text{if subject is bound} \\ |S_i| \times \mathrm{avgObjSel}(S_p) & \text{if object is bound} \end{cases}$$
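The MIP operations above can be sketched in Python. This is an illustration under stated assumptions, not DAW's code: the permutations use a prime modulus U = 2^31 - 1 (so that (a·x + b) mod U is a bijection whenever a ≠ 0; the slide's U = 51 is only a toy value), the (a_i, b_i) pairs are drawn with a fixed seed, and vectors of different lengths are compared on their common prefix, which assumes both were built with the same permutations in the same order.

```python
import random
from typing import Iterable, List, Sequence, Tuple

# Assumed prime modulus: with a != 0, x -> (a*x + b) mod U is a permutation of 0..U-1.
UNIVERSE = 2**31 - 1

def make_permutations(n: int, universe: int = UNIVERSE, seed: int = 7) -> List[Tuple[int, int]]:
    """Draw n (a_i, b_i) pairs for h_i(x) = (a_i * x + b_i) mod U."""
    rng = random.Random(seed)
    return [(rng.randrange(1, universe), rng.randrange(universe)) for _ in range(n)]

def mip_vector(ids: Iterable[int], perms: Sequence[Tuple[int, int]],
               universe: int = UNIVERSE) -> List[int]:
    """Entry i is the minimum of h_i over the whole ID set."""
    id_list = list(ids)
    return [min((a * x + b) % universe for x in id_list) for a, b in perms]

def mip_union(va: Sequence[int], vb: Sequence[int]) -> List[int]:
    """Element-wise minimum: the MIP vector of S_A union S_B (common prefix only)."""
    return [min(x, y) for x, y in zip(va, vb)]

def resemblance(va: Sequence[int], vb: Sequence[int]) -> float:
    """Share of the first min(len) positions where both vectors hold the same minimum."""
    n = min(len(va), len(vb))
    return sum(1 for x, y in zip(va, vb) if x == y) / n

def overlap(va: Sequence[int], vb: Sequence[int], size_a: float, size_b: float) -> float:
    """Estimated |S_A intersect S_B| = R * (|S_A| + |S_B|) / (R + 1)."""
    r = resemblance(va, vb)
    return r * (size_a + size_b) / (r + 1)

# The two vectors from the slide.
VA = [8, 9, 30, 24, 36, 9]
VB = [8, 24, 20, 48, 36, 13]
print(mip_union(VA, VB))      # [8, 9, 20, 24, 36, 9]
print(resemblance(VA, VB))    # 0.3333333333333333
print(overlap(VA, VB, 6, 6))  # about 3
```

Because the element-wise minimum of two MIP vectors is exactly the MIP vector of the union of the two ID sets, a source selector can keep one running union vector for the sources it has already chosen and compare each new candidate against it.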
  • 8. DAW
    • A combination of MIPs with compact data summaries
    • Uses average selectivity values for bound subjects and objects
    • Can be combined with any existing SPARQL endpoint federation system
    • Can be used for partial result retrieval
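The selectivity-based size estimate |S'_i| from slide 7 can be written as a small function. This is a sketch: the function name is hypothetical, and the slides only cover the unbound, bound-subject, and bound-object cases, so multiplying both factors when subject and object are both bound is an independence assumption of ours.

```python
def estimated_pattern_size(total_triples: int, avg_sbj_sel: float, avg_obj_sel: float,
                           subject_bound: bool, object_bound: bool) -> float:
    """Scale a predicate's triple count by the average selectivity of its bound positions.

    Slide cases: unbound -> |S_i|; bound subject -> |S_i| * avgSbjSel;
    bound object -> |S_i| * avgObjSel. Applying both factors when subject and
    object are bound is an assumption (independence), not taken from the slides.
    """
    size = float(total_triples)
    if subject_bound:
        size *= avg_sbj_sel
    if object_bound:
        size *= avg_obj_sel
    return size

# Values of the diseasome:name capability in the DAW index example.
print(estimated_pattern_size(147, 0.0068, 0.0069, subject_bound=False, object_bound=False))  # 147.0
print(estimated_pattern_size(147, 0.0068, 0.0069, subject_bound=True, object_bound=False))   # about 1
```

With a bound subject, the 147 diseasome:name triples shrink to roughly one expected match, which is why raw MIP overlaps over the whole predicate would overstate what such a triple pattern can return.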
  • 9. DAW Index
    [] a sd:Service ;
       sd:endpointUrl <http://localhost:8890/sparql> ;
       sd:capability [
           sd:predicate diseasome:name ;
           sd:totalTriples 147 ;
           sd:avgSbjSel "0.0068" ;
           sd:avgObjSel "0.0069" ;
           sd:MIPs "-6908232 -7090543 -6892373 -7064247 ..." ;
       ] ;
       sd:capability [
           sd:predicate diseasome:chromosomalLocation ;
           sd:totalTriples 160 ;
           sd:avgSbjSel "0.0062" ;
           sd:avgObjSel "0.0072" ;
           sd:MIPs "-7056448 -7056410 -6845713 -6966021 ..." ;
       ] ;
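One capability entry of such an index could be computed roughly as below. Several parts are assumptions rather than DAW's documented procedure: the triple ID is a SHA-1 of subject and object (joined with a separator of our choosing) reduced modulo a prime, the permutations are drawn with a fixed seed, and avgSbjSel/avgObjSel are taken as 1 / (number of distinct subjects/objects), the expected fraction of a predicate's triples that match one bound value. That definition would give 1/147 ≈ 0.0068 for diseasome:name if every subject occurred once, but the slides do not state it. The negative MIP values in the index indicate that DAW uses a different hash.

```python
import hashlib
import random
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
UNIVERSE = 2**31 - 1  # prime modulus for the hypothetical permutations

def triple_id(s: str, o: str) -> int:
    """Hypothetical h(concat(s, o)): SHA-1 of subject and object, reduced mod UNIVERSE.

    A separator is inserted so that ("ab", "c") and ("a", "bc") get different IDs.
    """
    digest = hashlib.sha1(f"{s}\x00{o}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % UNIVERSE

def make_perms(n: int, seed: int = 1) -> List[Tuple[int, int]]:
    """Draw n (a, b) pairs for h(x) = (a*x + b) mod UNIVERSE, with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, UNIVERSE), rng.randrange(UNIVERSE)) for _ in range(n)]

def capability(triples: List[Triple], predicate: str,
               perms: List[Tuple[int, int]]) -> Optional[Dict[str, object]]:
    """Summarise one predicate of one source, mirroring the fields of sd:capability."""
    rows = [(s, o) for s, p, o in triples if p == predicate]
    if not rows:
        return None  # the source has no triples for this predicate
    ids = [triple_id(s, o) for s, o in rows]
    return {
        "predicate": predicate,
        "totalTriples": len(rows),
        # Assumed definition: 1 / number of distinct subjects (resp. objects).
        "avgSbjSel": 1 / len({s for s, _ in rows}),
        "avgObjSel": 1 / len({o for _, o in rows}),
        "MIPs": [min((a * x + b) % UNIVERSE for x in ids) for a, b in perms],
    }

triples = [("d1", "name", "A"), ("d2", "name", "B"), ("d2", "name", "C"), ("d3", "loc", "X")]
cap = capability(triples, "name", make_perms(4))
print(cap["totalTriples"], cap["avgSbjSel"])  # 3 0.5
```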
  • 10. Triple Pattern-wise source ranking and skipping
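The ranking and skipping procedure itself is not reproduced in this transcript. The sketch below shows one plausible greedy variant, built only from the operations on slide 7, and should not be read as DAW's exact algorithm: sources are visited from largest to smallest estimated size, the first is always kept, and each further source is kept only if its estimated new triples (its size minus its estimated overlap with the running union) reach the threshold. The `Candidate` type and `rank_and_skip` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Candidate:
    name: str            # source (endpoint) identifier
    size: float          # estimated triples for the pattern, e.g. |S'_i|
    mips: Sequence[int]  # MIP vector of the source's ID set for the predicate

def resemblance(va: Sequence[int], vb: Sequence[int]) -> float:
    """Share of positions (common prefix) where both MIP vectors agree."""
    n = min(len(va), len(vb))
    return sum(1 for x, y in zip(va, vb) if x == y) / n

def rank_and_skip(candidates: List[Candidate],
                  threshold: float) -> Tuple[List[str], List[str]]:
    """Greedy sketch: keep a source only if its estimated new triples reach the threshold."""
    selected: List[str] = []
    skipped: List[str] = []
    union_mips: List[int] = []
    union_size = 0.0
    for cand in sorted(candidates, key=lambda c: c.size, reverse=True):
        if not selected:
            # The largest source is always queried.
            selected.append(cand.name)
            union_mips, union_size = list(cand.mips), cand.size
            continue
        r = resemblance(union_mips, cand.mips)
        duplicates = r * (union_size + cand.size) / (r + 1)  # overlap estimate
        new_triples = cand.size - duplicates
        if new_triples >= threshold:
            selected.append(cand.name)
            union_mips = [min(x, y) for x, y in zip(union_mips, cand.mips)]
            union_size += new_triples
        else:
            skipped.append(cand.name)
    return selected, skipped

candidates = [
    Candidate("S1", 100, [1, 2, 3, 4]),
    Candidate("S2", 100, [1, 2, 3, 4]),  # same MIPs as S1: likely a duplicate
    Candidate("S3", 50, [5, 6, 7, 8]),   # no agreement with S1
]
print(rank_and_skip(candidates, threshold=20))  # (['S1', 'S3'], ['S2'])
```

Ordering by size is a design choice in this sketch: querying large sources first makes later duplicates cheap to detect, because their MIPs are compared against a union that already covers most of the data.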
  • 11. Evaluation Setup
    Datasets:
    Dataset | Total Size (MB) | Index Size (bytes) | No. of Slices | Discrepancy | No. of Dup. Slices | Index Gen. Time (sec)
    Diseasome | 18.62 | 0.17 | 10 | 1500 | 1 | 4
    Geo | 274.14 | 1.63 | 10 | 50000 | 2 | 133
    LinkedMDB | 448.93 | 1.66 | 10 | 100000 | 1 | 201
    Publication | 39.07 | 0.2 | 10 | 2500 | 1 | 6
    Queries distribution:
    Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total
    Diseasome | 5 | 5 | 5 | 4 | 5 | 2 | 26
    Geo | 5 | 5 | 5 | - | - | - | 15
    LinkedMDB | 5 | - | - | - | - | - | 5
    Publication | 5 | 5 | 5 | 7 | 7 | 4 | 33
    Total | 20 | 15 | 15 | 11 | 12 | 6 | 79
    Endpoints:
    Endpoint | CPU (GHz) | RAM | Hard Disk
    1 | 2.2, i3 | 4GB | 300GB
    2 | 2.9, i7 | 16GB | 256GB SSD
    3 | 2.6, i5 | 4GB | 150GB
    4 | 2.53, i5 | 4GB | 300GB
    5 | 2.3, i5 | 4GB | 500GB
    6 | 2.53, i5 | 4GB | 300GB
    7 | 2.9, i7 | 8GB | 450GB
    8 | 2.6, i5 | 8GB | 400GB
    9 | 2.6, i5 | 8GB | 400GB
    10 | 2.9, i7 | 16GB | 500GB
    • Slice generator tool [1] for random slicing and duplicates
    • We have extended FedX, SPLENDID, and DARQ with DAW
    [1] http://goo.gl/trjGSJ
  • 12. Triple Pattern-wise sources skipped
    DARQ:
    Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total | Recall
    Diseasome | 14(35) | 30(77) | 40(107) | 35(65) | 65(125) | 30(50) | 214(459) | 100%
    Geo | 22(40) | 23(55) | 37(101) | - | - | - | 82(196) | 99.99%
    LinkedMDB | 22(38) | - | - | - | - | - | 22(38) | 100%
    Publication | 9(30) | 10(37) | 15(86) | 14(60) | 21(120) | 32(102) | 101(435) | 100%
    Total | 67(143) | 63(169) | 92(294) | 49(125) | 86(245) | 62(152) | 419(1128) |
    FedX and SPLENDID:
    Dataset | STP | S-1 | S-2 | P-1 | P-2 | P-3 | Total | Recall
    Diseasome | 7(28) | 30(77) | 40(107) | 35(65) | 65(125) | 30(50) | 207(452) | 100%
    Geo | 19(37) | 23(55) | 37(101) | - | - | - | 79(193) | 99.99%
    LinkedMDB | 15(31) | - | - | - | - | - | 15(31) | 100%
    Publication | 3(24) | 10(37) | 15(86) | 14(60) | 21(120) | 32(102) | 95(429) | 100%
    Total | 44(120) | 63(169) | 92(294) | 49(125) | 86(245) | 62(152) | 396(1105) |
  • 14. FedX Extension with DAW
    Figure: execution time (sec) of FedX and DAW per query type (STP, S-1, S-2, P-1, P-2, P-3) for Diseasome, Publication, Geo Data, and Movie
    Overall performance evaluation (average execution time in sec, gain in %):
    Dataset | FedX | DAW | Gain %
    Diseasome | 2.44 | 1.98 | 18.79
    Publication | 1.48 | 1.67 | -12.38
    Geo Data | 4.60 | 3.92 | 14.71
    Movie | 1.74 | 1.61 | 7.59
    Overall | 2.44 | 2.20 | 9.76
  • 15. SPLENDID Extension with DAW
    Figure: execution time (sec) of SPLENDID and DAW per query type (STP, S-1, S-2, P-1, P-2, P-3) for Diseasome, Publication, Geo, and Movie
    Overall performance evaluation (average execution time in sec, gain in %):
    Dataset | SPLENDID | DAW | Gain %
    Diseasome | 3.78 | 3.04 | 19.48
    Publication | 2.18 | 2.37 | -8.94
    Geo Data | 7.27 | 6.22 | 14.40
    Movie | 1.9 | 1.688 | 11.16
    Overall | 3.71 | 3.30 | 11.11
  • 16. DARQ Extension with DAW
    Figure: execution time (sec) of DARQ and DAW per query type (STP, S-1, S-2, P-1, P-2, P-3) for Diseasome, Publication, Geo, and Movie
    Overall performance evaluation (average execution time in sec, gain in %):
    Dataset | DARQ | DAW | Gain %
    Diseasome | 8.27 | 6.34 | 23.34
    Publication | 5.26 | 4.94 | 6.14
    Geo Data | 23.44 | 19.62 | 16.31
    Movie | 1.96 | 1.688 | 13.88
    Overall | 9.59 | 8.01 | 16.46
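The gain percentages are not defined on the slides. Assuming gain means the relative reduction in average execution time, (T_baseline - T_DAW) / T_baseline × 100, the overall figures of slides 14 to 16 are reproduced to within about 0.1 percentage points; the small differences may come from rounding of the averages or from averaging per-query gains. `gain_percent` is a hypothetical helper.

```python
# Overall averages (sec) and reported gains (%) from the overall results of slides 14 to 16.
REPORTED = {
    "FedX": (2.44, 2.20, 9.76),
    "SPLENDID": (3.71, 3.30, 11.11),
    "DARQ": (9.59, 8.01, 16.46),
}

def gain_percent(baseline: float, with_daw: float) -> float:
    """Relative reduction in execution time, in percent (assumed definition)."""
    return (baseline - with_daw) / baseline * 100.0

for system, (baseline, with_daw, reported) in REPORTED.items():
    print(f"{system}: computed {gain_percent(baseline, with_daw):.2f} %, reported {reported} %")
```

A negative value, as for Publication with FedX and SPLENDID, means the DAW-extended system was slower on average for that dataset.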
  • 17. Source Ranking vs Recall
    Figure: recall (%) against the number of ranked sources, Optimal vs. DAW, for Diseasome and Publication
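A recall-versus-ranked-sources curve of this kind can be computed as sketched below, assuming recall is the number of distinct results retrieved from the first k ranked sources divided by the number of distinct results available from all relevant sources. The result sets are invented, and the slide's "Optimal" ranking is not modeled.

```python
from typing import Dict, List, Set

def recall_curve(ranking: List[str], results: Dict[str, Set[str]]) -> List[float]:
    """Recall (%) after querying the first k ranked sources, for k = 1..len(ranking)."""
    all_results: Set[str] = set().union(*results.values())
    seen: Set[str] = set()
    curve = []
    for source in ranking:
        seen |= results[source]  # duplicates add nothing to the distinct results
        curve.append(100.0 * len(seen) / len(all_results))
    return curve

# Hypothetical per-source result sets: S2 only repeats what S1 already returns.
results = {"S1": {"t1", "t2", "t3"}, "S2": {"t2", "t3"}, "S3": {"t4"}}
print(recall_curve(["S1", "S3", "S2"], results))  # [75.0, 100.0, 100.0]
print(recall_curve(["S2", "S3", "S1"], results))  # [50.0, 75.0, 100.0]
```

A ranking that pushes duplicate-heavy sources to the end reaches full recall with fewer sources, which is the behaviour the Optimal and DAW curves are compared on.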
  • 18. Conclusion and Future Work
    • A sub-query can retrieve results that have already been retrieved by another sub-query
      – Resources are wasted
      – Query runtime increases
      – Extra traffic is generated
    • Sketches alone cannot be used due to the expressive nature of SPARQL queries
    • We applied MIPs to RDF predicates, together with compact data summaries
    • Performance improvement
      – FedX: 9.76 %
      – SPLENDID: 11.11 %
      – DARQ: 16.46 %
    • Future work: explore the effect of MIP sizes and threshold values to find the optimal trade-off between execution time and recall
    saleem@informatik.uni-leipzig.de
    AKSW, University of Leipzig, Germany