2. 2nd SEALS Yardsticks for Ontology Management • Conformance and interoperability results • Scalability results • Conclusions
3. Conformance evaluation • Ontology language conformance – The ability to adhere to existing ontology language specifications • Goal: to evaluate the conformance of semantic technologies with regard to ontology representation languages • Diagram: Step 1: Import + Export — Tool X imports O1 as O1' and exports it as O1''; O1 = O1'' + α - α'
4. Metrics • Execution informs about the correct execution: – OK. No execution problem – FAIL. Some execution problem – Platform Error (P.E.). Platform exception • Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α' • Conformance informs whether the ontology has been processed correctly, with no addition or loss of information (Oi = Oi'?): – SAME if Execution is OK and Information added and Information lost are void – DIFFERENT if Execution is OK but Information added or Information lost are not void – NO if Execution is FAIL or P.E.
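To make the added/lost-information metric concrete, here is a minimal sketch assuming the OWL API 3 (which several of the evaluated tools are built on): the original ontology and the tool's import + export result are loaded and their axiom sets compared. The file names and the helper class are illustrative, not part of the SEALS platform.

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;

import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class ConformanceCheck {

    /** Compares O1 with O1'' (the tool's import + export result) and returns SAME or DIFFERENT. */
    public static String compare(File original, File exported) throws OWLOntologyCreationException {
        // Separate managers avoid "ontology already exists" clashes when both files share an ontology IRI.
        OWLOntology o1  = OWLManager.createOWLOntologyManager().loadOntologyFromOntologyDocument(original);
        OWLOntology o1e = OWLManager.createOWLOntologyManager().loadOntologyFromOntologyDocument(exported);

        Set<OWLAxiom> added = new HashSet<OWLAxiom>(o1e.getAxioms());   // alpha: axioms only in O1''
        added.removeAll(o1.getAxioms());
        Set<OWLAxiom> lost = new HashSet<OWLAxiom>(o1.getAxioms());     // alpha': axioms only in O1
        lost.removeAll(o1e.getAxioms());

        // SAME if nothing was added or lost; an execution failure would be reported as NO before this point.
        return (added.isEmpty() && lost.isEmpty()) ? "SAME" : "DIFFERENT";
    }

    public static void main(String[] args) throws OWLOntologyCreationException {
        System.out.println(compare(new File("original.owl"), new File("exported.owl")));
    }
}
```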
5. Interoperability evaluation • Ontology language interoperability – The ability to interchange ontologies and use them • Goal: to evaluate the interoperability of semantic technologies in terms of their ability to interchange ontologies and use them • Diagram: Step 1: Import + Export in Tool X (O1 → O1' → O1''), Step 2: Import + Export in Tool Y (O1'' → O1''' → O1''''); O1 = O1'' + α - α' and O1'' = O1'''' + β - β', so for the whole interchange O1 = O1'''' + α - α' + β - β'
6. Metrics • Execution informs about the correct execution: – OK. No execution problem – FAIL. Some execution problem – Platform Error (P.E.). Platform exception – Not Executed (N.E.). Second step not executed • Information added or lost, in terms of triples, axioms, etc.: Oi = Oi' + α - α' • Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (Oi = Oi'?): – SAME if Execution is OK and Information added and Information lost are void – DIFFERENT if Execution is OK but Information added or Information lost are not void – NO if Execution is FAIL, N.E., or P.E.
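The interchange metric composes two of these import + export steps. The sketch below builds on the ConformanceCheck helper above and uses a hypothetical ToolAdapter interface (not part of any SEALS API) to stand in for each tool's import + export step.

```java
import java.io.File;

/** Hypothetical wrapper for one tool's import + export step: read the ontology, load it, write it back out. */
interface ToolAdapter {
    File importExport(File ontologyFile) throws Exception;
}

public class InterchangeCheck {

    /** Step 1: tool X produces O1''; Step 2: tool Y produces O1''''; then O1 is compared with O1''''. */
    public static String interchange(File o1, ToolAdapter toolX, ToolAdapter toolY) {
        try {
            File o1pp   = toolX.importExport(o1);      // O1''
            File o1pppp = toolY.importExport(o1pp);    // O1''''
            return ConformanceCheck.compare(o1, o1pppp);
        } catch (Exception e) {
            return "NO";                               // FAIL, N.E. or P.E. in either step
        }
    }
}
```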
7. Test suites used
Name | Definition | Nº Tests
RDF(S) Import Test Suite | Manual | 82
OWL Lite Import Test Suite | Manual | 82
OWL DL Import Test Suite | Keyword-driven generator | 561
OWL Full Import Test Suite | Manual | 90
OWL Content Pattern | Expressive generator | 81
OWL Content Pattern Expressive | Expressive generator | 81
OWL Content Pattern Full Expressive | Expressive generator | 81
9. Evaluation Execution • Evaluations automatically performed with the SEALS Platform – http://www.seals-project.eu/ • Evaluation materials available: – Test Data – Results – Metadata • Diagram: Conformance, Interoperability and Scalability test suites, each producing raw results and their interpretation
11. RDF(S) conformance results • Jena and Sesame behave identically (no problems) • The behaviour of the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) has significantly changed – They transform ontologies to OWL 2 – Some problems remain, fewer in newer versions • Protégé OWL improves
12. OWL Lite conformance results • Jena and Sesame behave identically (no problems) • The OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) improve – They transform ontologies to OWL 2 • Protégé OWL improves
13. OWL DL conformance results • Jena and Sesame behave identically (no problems) • OWL API and Protégé 4 improve • NeOn Toolkit worsens • Protégé OWL behaves identically • Robustness increases
14. Content pattern conformance results • New issues identified in the OWL API-based tools (NeOn Toolkit, OWL API and Protégé 4) • New issue identified in Protégé 4 • No new issues
15. Interoperability results (1st vs. 2nd Evaluation Campaign) • Same analysis as in conformance • OWL DL: new issue found in interchanges from Protégé 4 to Protégé OWL • Conclusions: – RDF-based tools have no interoperability problems – OWL-based tools have no interoperability problems with OWL Lite but have some with OWL DL – Tools based on the OWL API cannot interoperate using RDF(S) (they convert ontologies into OWL 2)
16. 2nd SEALS Yardsticks for Ontology Management • Conformance and interoperability results • Scalability results • Conclusions
18. Execution settings Test suites: • Real World. Complex ontologies from biological and medical domains • Real World NCI. Thesaurus subsets (1.5–2 times bigger) • LUBM. Synthetic ontologies Execution environment: • Win7 64-bit, Intel Core 2 Duo CPU, 2.40 GHz, 4.00 GB RAM (Real World Ontologies test collections) • WinServer 64-bit, AMD Dual Core, 2.60 GHz (4 processors), 8.00 GB RAM (LUBM Ontologies test collection) Constraint: • 30-minute threshold per test case
23. 2nd SEALS Yardsticks for Ontology Management • Conformance and interoperability results • Scalability results • Conclusions
24. Conclusions – Test data • Test suites are not exhaustive – The new test suites helped detect new issues • A more expressive test suite does not imply detecting more issues • We used existing ontologies as input for the test data generator – This requires a previous analysis of the ontologies to detect defects – We found ontologies with issues that we had to correct
25. Conclusions – Results • Tools have improved their conformance, interoperability, and robustness • High influence of development decisions – The OWL API radically changed the way it deals with RDF ontologies • We need tools for easy evaluation • We need stronger regression testing • The automated generator defined test cases that a person would never have thought of but which identified new tool issues • Using bigger ontologies for conformance and interoperability testing makes it much more difficult to find problems in the tools
28. Advanced reasoning system • Description logic based system (DLBS) • Standard reasoning services – Classification – Class satisfiability – Ontology satisfiability – Logical entailment
30. Evaluation criteria • Interoperability – The capability of the software product to interact with one or more specified systems – A system must • conform to the standard input formats • be able to perform standard inference services • Performance – The capability of the software to provide appropriate performance, relative to the amount of resources used, under stated conditions
31. Evaluation metrics • Interoperability – Number of tests passed without parsing errors – Number of inference tests passed • Performance – Loading time – Inference time
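As a rough illustration of how the two performance metrics could be taken, here is a small sketch assuming the OWL API 3 and any OWLReasonerFactory (for example HermiT's); ontology satisfiability stands in for the inference step, and the class and method names are illustrative rather than SEALS code.

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;

import java.io.File;

public class PerformanceMetrics {

    /** Measures loading time and inference time (ontology satisfiability used as the sample inference). */
    public static void measure(OWLReasonerFactory factory, File ontologyFile) throws Exception {
        long t0 = System.nanoTime();
        OWLOntology onto = OWLManager.createOWLOntologyManager().loadOntologyFromOntologyDocument(ontologyFile);
        long loadingMs = (System.nanoTime() - t0) / 1000000L;

        long t1 = System.nanoTime();
        OWLReasoner reasoner = factory.createReasoner(onto);
        boolean satisfiable = reasoner.isConsistent();
        long inferenceMs = (System.nanoTime() - t1) / 1000000L;
        reasoner.dispose();

        System.out.println("loading = " + loadingMs + " ms, inference = " + inferenceMs
                + " ms, satisfiable = " + satisfiable);
    }
}
```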
32. Class satisfiability evaluation • Standard inference service that is widely used in ontology engineering • Goal: to assess both a DLBS's interoperability and performance • Input – OWL ontology – One or several class IRIs • Output – TRUE: the evaluation outcome coincides with the expected result – FALSE: the evaluation outcome differs from the expected result – ERROR: indicates an I/O error – UNKNOWN: indicates that the system is unable to compute the inference in the given time frame
34. Ontology satisfiability evaluation • Standard inference service typically carried out before performing any other reasoning task • Goal: to assess both a DLBS's interoperability and performance • Input – OWL ontology • Output – TRUE: the evaluation outcome coincides with the expected result – FALSE: the evaluation outcome differs from the expected result – ERROR: indicates an I/O error – UNKNOWN: indicates that the system is unable to compute the inference in the given time frame
36. Classification evaluation • Inference service that is typically carried out after testing ontology satisfiability and prior to performing any other reasoning task • Goal: to assess both a DLBS's interoperability and performance • Input – OWL ontology • Output – OWL ontology – ERROR: indicates an I/O error – UNKNOWN: indicates that the system is unable to compute the inference in the given time frame
38. Logical entailment evaluation • Standard inference service that is the basis for query answering • Goal: to assess both a DLBS's interoperability and performance • Input – Two OWL ontologies • Output – TRUE: the evaluation outcome coincides with the expected result – FALSE: the evaluation outcome differs from the expected result – ERROR: indicates an I/O error – UNKNOWN: indicates that the system is unable to compute the inference in the given time frame
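All four evaluations share the same outcome vocabulary. The sketch below shows one way the TRUE / FALSE / ERROR / UNKNOWN mapping could be produced for a single class-satisfiability test; the executor-based timeout handling and the helper names are assumptions, not the SEALS implementation.

```java
import org.semanticweb.owlapi.model.OWLClass;
import org.semanticweb.owlapi.reasoner.OWLReasoner;

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InferenceOutcome {

    /** Runs one class-satisfiability test and maps the result to TRUE / FALSE / ERROR / UNKNOWN. */
    public static String classSatisfiability(final OWLReasoner reasoner, final OWLClass cls,
                                             boolean expected, long timeoutMinutes) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        Future<Boolean> run = exec.submit(new Callable<Boolean>() {
            public Boolean call() {
                return reasoner.isSatisfiable(cls);
            }
        });
        try {
            boolean actual = run.get(timeoutMinutes, TimeUnit.MINUTES);
            return actual == expected ? "TRUE" : "FALSE";   // coincides with / differs from the expected result
        } catch (TimeoutException te) {
            return "UNKNOWN";                                // no answer within the given time frame
        } catch (Exception e) {
            return "ERROR";                                  // I/O or reasoner failure
        } finally {
            run.cancel(true);
            exec.shutdownNow();
        }
    }
}
```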
40. Storage and reasoning systems evaluation component • The SRS component is intended to evaluate description logic based systems (DLBS) – implementing the OWL API 3, the de facto standard for DLBS – implementing the SRS SEALS DLBS interface • SRS supports test data in all syntactic formats supported by the OWL API 3 • SRS saves the evaluation results and interpretations in MathML 3 format
41. DLBS interface • Java methods to be implemented by system developers – OWLOntology loadOntology(IRI iri) – boolean isSatisfiable(OWLOntology onto, OWLClass class) – boolean isSatisfiable(OWLOntology onto) – OWLOntology classifyOntology(OWLOntology onto) – URI saveOntology(OWLOntology onto, IRI iri) – boolean entails(OWLOntology onto1, OWLOntology onto2)
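As a sketch of what an implementation of these methods could look like when delegating to an OWL API 3 reasoner (e.g. HermiT or FaCT++ through their OWLReasonerFactory), the class below fills in the six operations; the class name, the reasoner factory and the choice of inferred-axiom generators are assumptions, since the slide only lists the signatures.

```java
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;
import org.semanticweb.owlapi.reasoner.InferenceType;
import org.semanticweb.owlapi.reasoner.OWLReasoner;
import org.semanticweb.owlapi.reasoner.OWLReasonerFactory;
import org.semanticweb.owlapi.util.InferredAxiomGenerator;
import org.semanticweb.owlapi.util.InferredOntologyGenerator;
import org.semanticweb.owlapi.util.InferredSubClassAxiomGenerator;

import java.net.URI;
import java.util.Collections;
import java.util.List;

/** Sketch of the six DLBS methods, delegating to an OWL API reasoner supplied via its factory. */
public class ExampleDlbsAdapter {
    private final OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    private final OWLReasonerFactory factory;   // e.g. HermiT's or FaCT++'s OWLReasonerFactory

    public ExampleDlbsAdapter(OWLReasonerFactory factory) { this.factory = factory; }

    public OWLOntology loadOntology(IRI iri) throws OWLOntologyCreationException {
        return manager.loadOntology(iri);
    }

    public boolean isSatisfiable(OWLOntology onto, OWLClass cls) {
        return factory.createReasoner(onto).isSatisfiable(cls);          // class satisfiability
    }

    public boolean isSatisfiable(OWLOntology onto) {
        return factory.createReasoner(onto).isConsistent();              // ontology satisfiability
    }

    public OWLOntology classifyOntology(OWLOntology onto) throws OWLOntologyCreationException {
        OWLReasoner reasoner = factory.createReasoner(onto);
        reasoner.precomputeInferences(InferenceType.CLASS_HIERARCHY);
        List<InferredAxiomGenerator<? extends OWLAxiom>> gens =
                Collections.<InferredAxiomGenerator<? extends OWLAxiom>>singletonList(
                        new InferredSubClassAxiomGenerator());
        OWLOntology result = manager.createOntology();                   // holds the inferred class hierarchy
        new InferredOntologyGenerator(reasoner, gens).fillOntology(manager, result);
        return result;
    }

    public URI saveOntology(OWLOntology onto, IRI iri) throws OWLOntologyStorageException {
        manager.saveOntology(onto, iri);
        return iri.toURI();
    }

    public boolean entails(OWLOntology onto1, OWLOntology onto2) {
        OWLReasoner reasoner = factory.createReasoner(onto1);
        return reasoner.isEntailed(onto2.getLogicalAxioms());            // does onto1 entail every axiom of onto2?
    }
}
```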
42. Testing Data • The ontologies from the Gardiner evaluation suite – over 300 ontologies of varying expressivity and size • Various versions of the GALEN ontology • Various ontologies that have been created in EU-funded projects, such as SEMINTEC, VICODI and AEO • 155 entailment tests from the OWL 2 test cases repository
43. Evaluation setup • 3 DLBSs – FaCT++: C++ implementation of the FaCT OWL DL reasoner – HermiT: Java-based OWL DL reasoner utilizing novel hypertableau algorithms – jcel: Java-based OWL 2 EL reasoner – FaCT++C: evaluated without the OWL prepareReasoner() call – HermiTC: evaluated without the OWL prepareReasoner() call • 2 AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ machines with 2 GB of main memory – DLBSs were allowed to allocate up to 1 GB
61. Conclusion • Errors: – datatypes not supported by the systems – syntax-related: a system was unable to register a role or a concept – expressivity errors • Execution time is dominated by a small number of hard problems
64. OAEI & SEALS • OAEI : Ontology Alignment Evalua-on Ini-a-ve – Organized as annual campaign from 2005 to 2012 – Included in Ontology Matching workshop at ISWC – Different tracks (evalua-on scenarios) organized by different researchers • Star-ng in 2010: Support from SEALS – OAEI 2010, OAEI 2011, and OAEI 2011.5 6/6/1264
66. OAEI tracks (Jose Aguirre, Jerome Euzenat, INRIA Grenoble) • Benchmark – Matching different versions of the same ontology – Scalability: size, runtimes • Conference • MultiFarm • Anatomy • Large BioMed
67. OAEI tracks (Ondřej Šváb-Zamazal, Vojtěch Svátek, Prague University of Economics) • Benchmark • Conference – Same domain, different ontologies – Manually generated reference alignment • MultiFarm • Anatomy • Large BioMed
68. OAEI tracks (Christian Meilicke, University of Mannheim; Cassia Trojahn, INRIA Grenoble) • Benchmark • Conference • MultiFarm: Multilingual Ontology Matching – Based on Conference – Test cases for Spanish, German, French, Russian, Portuguese, Czech, Dutch, Chinese • Anatomy • Large BioMed
69. OAEI tracks (Christian Meilicke, Heiner Stuckenschmidt, University of Mannheim) • Benchmark • Conference • MultiFarm • Anatomy – Matching mouse anatomy to human anatomy – Runtimes • Large BioMed
70. OAEI tracks (Ernesto Jimenez Ruiz, Bernardo Cuenca Grau, Ian Horrocks, University of Oxford) • Benchmark • Conference • MultiFarm • Anatomy • Large BioMed – Very large dataset (FMA-NCI) – Includes coherence analysis
72. Questions? Write a mail to Christian Meilicke: christian@informatik.uni-mannheim.de
73. IWEST 2012 workshop located at ESWC 2012 — Semantic Search Systems Evaluation Campaign
74. Two-phase approach • Semantic search tools evaluation demands a user-in-the-loop phase – usability criterion • Two phases: – User-in-the-loop – Automated
75. Evaluation criteria by phase Each phase will address a different subset of criteria. • Automated phase: query expressiveness, scalability, performance • User-in-the-loop phase: usability, query expressiveness
76. Participants
Tool | Description | UITL | Auto
K-Search | Form-based | x | x
Ginseng | Natural language with constrained vocabulary and grammar | x |
NLP-Reduce | Natural language for full English questions, sentence fragments, and keywords | x |
Jena Arq | SPARQL query engine; automated phase baseline | | x
RDF.Net Query | SPARQL-based | | x
Semantic Crystal | Graph-based | x |
Affective Graphs | Graph-based | x |
77. Usability Evaluation Setup • Data: Mooney Natural Language Learning Data • Subjects: 20 (10 expert users, 10 casual users) – Each subject evaluated the 5 participating tools • Task: formulate 5 questions in each tool's interface • Data collected: success rate, input time, number of attempts, response time, user satisfaction questionnaires, demographics
78. Questions 1) Give me all the capitals of the USA? (1 concept, 1 relation) 2) What are the cities in states through which the Mississippi runs? (2 concepts, 2 relations) 3) Which states have a city named Columbia with a city population over 50,000? (comparative) 4) Which lakes are in the state with the highest point? (superlative) 5) Tell me which rivers do not traverse the state with the capital Nashville? (negation)
79. Automated Evaluation Setup • Data: EvoOnt dataset – Five sizes: 1K, 10K, 100K, 1M, 10M triples • Task: answer 10 questions per dataset size • Data collected: ontology load time, query time, number of results, result list • Analyses: precision, recall, F-measure, mean query time, mean time per result, etc.
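For reference, the set-based measures listed above reduce to a few lines; this is a generic sketch (the result identifiers and gold-standard set are placeholders), not the SEALS interpretation code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SearchMeasures {

    /** Precision, recall and F-measure of a returned result set against a gold-standard (expected) set. */
    public static double[] precisionRecallF(Set<String> returned, Set<String> expected) {
        Set<String> truePositives = new HashSet<String>(returned);
        truePositives.retainAll(expected);

        double p = returned.isEmpty() ? 0.0 : (double) truePositives.size() / returned.size();
        double r = expected.isEmpty() ? 0.0 : (double) truePositives.size() / expected.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] { p, r, f };
    }

    public static void main(String[] args) {
        Set<String> returned = new HashSet<String>(Arrays.asList("a", "b", "c"));
        Set<String> expected = new HashSet<String>(Arrays.asList("b", "c", "d"));
        System.out.println(Arrays.toString(precisionRecallF(returned, expected)));  // P = R = F = 0.666...
    }
}
```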
80. Configuration • All tools executed on the SEALS Platform • Each tool executed within a virtual machine
 | Linux | Windows
OS | Ubuntu 10.10 (64-bit) | Windows 7 (64-bit)
Num CPUs | 2 | 4
Memory (GB) | 4 | 4
Tools | Arq v2.8.2 and Arq v2.9.0 | RDF Query v0.5.1-beta
82. Graph-based tools most liked (highest ranks and average SUS scores) • Perceived by expert users as intuitive, allowing them to easily formulate more complex queries • Casual users enjoyed the fun and visually appealing interfaces, which created a pleasant search experience [Chart: System Usability Scale (SUS) questionnaire scores for Semantic-Crystal, Affective-Graphs, K-Search, Ginseng and NLP-Reduce, split by casual vs. expert users]
83. Form-based approach most liked by casual users • Perceived by casual users as a midpoint between NL and graph-based approaches • Allows more complex queries than NL does • Less complicated, and less query input time, than the graph-based approach • Together with graph-based: most liked by expert users [Chart: extended questionnaire scores for "The system's query language was easy to understand and use", per tool, split by casual vs. expert users]
84. Casual users liked the controlled-NL approach • Casual users: – liked guidance through suggestions – prefer to be 'controlled' by the language model, allowing only valid queries • Expert users: – found it restrictive and frustrating – prefer to have more flexibility and expressiveness rather than support and restriction [Chart: SUS questionnaire scores per tool, split by casual vs. expert users]
85. Free-NL challenge: the habitability problem • Free NL liked for its simplicity, familiarity, naturalness and the low query input time required • Faces the habitability problem: a mismatch between the users' query terms and the tools' ones • This led to the lowest success rate, the highest number of trials to get a satisfying answer and, in turn, very low user satisfaction [Chart: answer-found rate per tool, split by casual vs. expert users]
87. Overview • K-Search couldn't load the ontologies – external ontology import not supported – cyclic relations with concepts in remote ontologies not supported • Non-NL tools transform queries a priori • Native SPARQL tools exhibit differences in query approach (see load and query times)
88. Ontology load time • RDF Query loads the ontology on the fly; load times are therefore independent of dataset size • Arq loads the ontology into memory [Chart: ontology load time (ms, log scale) vs. dataset size (thousands of triples) for Arq v2.8.2, Arq v2.9.0 and RDF Query v0.5.1-beta]
89. Query time • RDF Query loads the ontology on the fly; query times therefore incorporate the load time – Expensive for more than one query in a session • Arq loads the ontology into memory – Query times largely independent of dataset size [Chart: mean query time (ms, log scale) vs. dataset size (thousands of triples) for Arq v2.8.2, Arq v2.9.0 and RDF Query v0.5.1-beta]
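The in-memory strategy that Arq follows can be sketched with a few lines of Jena (a 2.x version, matching the ARQ releases above); the data file and query are placeholders. Loading once and querying the same model repeatedly avoids paying the load time per query, which is exactly the trade-off visible in the two charts.

```java
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.FileManager;

public class ArqStyleQuery {
    public static void main(String[] args) {
        // Load the dataset into memory once: load time grows with dataset size,
        // but subsequent queries in the same session do not pay it again.
        Model model = FileManager.get().loadModel("dataset.rdf");        // placeholder file name

        String sparql = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10";         // placeholder query
        QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(sparql), model);
        try {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("s"));
            }
        } finally {
            qe.close();
        }
    }
}
```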
90. SEALS Semantic Web Service Tools Evaluation Campaign 2011 — Semantic Web Service Discovery Evaluation Results
91. Evaluation of SWS Discovery • Finding web services based on their semantic descriptions • For a given goal and a given set of service descriptions, the tool returns the match degree between the goal and each service • Measurement services are provided via the SEALS Platform to measure the rate of matching correctness
92. Campaign Overview – http://www.seals-project.eu/seals-evaluation-campaigns/2nd-seals-evaluation-campaigns/semantic-web-service-tools-evaluation-campaign-2011 • Goal – Which ontology/annotation is the best: WSMO-Lite, OWL-S or SAWSDL? • Assumptions: – Same corresponding test collections (TCs) – Same corresponding matchmaking algorithms (tools) – The corresponding tools will belong to the same provider – The level of performance of a tool for a specific TC is of secondary importance
103. Tools
WSMO-LITE-TC: WSMO-LITE-OU [1]
SAWSDL-TC: SAWSDL-OU [1], SAWSDL-URJC [2], SAWSDL-M0 [3]
OWLS-TC: OWLS-URJC [2], OWLS-M0 [3]
1. Ning Li, The Open University
2. Ziji Cong et al., University of Rey Juan Carlos
3. Matthias Klusch et al., German Research Center for Artificial Intelligence
105. Evaluation Execution • The evaluation workflow was executed on the SEALS Platform • All tools were executed within a virtual machine
OS | Windows 7 (64-bit)
Num CPUs | 4
Memory (GB) | 4
Tools | WSMO-LITE-OU, SAWSDL-OU
106. Partial Evaluation Results: WSMO-LITE vs. SAWSDL [Diagram: WSMO-LITE-OU evaluated on WSMO-LITE-TC and SAWSDL-OU evaluated on SAWSDL-TC, compared through a common measurement component M]
107. * This table only shows the results that are different
108. Analysis • Out of 42 goals, only 19 have different results in terms of precision and recall • On 17 of the 19 occasions, WSMO-Lite improves discovery precision over SAWSDL by specializing service semantics • WSMO-Lite performs worse than SAWSDL on discovery recall in 6 of the 19 occasions, while performing the same for the other 13
110. Lessons Learned • WSMO-LITE-OU tends to perform better than SAWSDL-OU in terms of precision, but slightly worse in recall • The only feature of WSMO-Lite used over SAWSDL was the service category (based on TC domains) – Services were filtered by service category in WSMO-LITE-OU and not in SAWSDL-OU • Further tests with additional tools and measures are needed for any conclusive results about WSMO-Lite vs. SAWSDL (many tools are not available yet)
111. Conclusions • This has been the first SWS evaluation campaign in the community focusing on the impact of the service ontology/annotation on performance • This comparison has been facilitated by the generation of WSMO-LITE-TC as a counterpart of SAWSDL-TC and OWLS-TC in the SEALS repository • The current comparison only involves two ontologies/annotations (WSMO-Lite and SAWSDL) • Raw and interpretation results are available in RDF via the SEALS repository (public access)