Practical SPARQL Benchmarking Revisited

Rob Vesse
rvesse@yarcdata.com
@RobVesse

1. Rewind to 2012
2. Limitations
3. Evolving the Framework
4. Examples
5. Future Work

 Presentation I gave at this conference in 2012
 Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking
 Highlighted some issues with SPARQL Benchmarking:
 Standard Benchmarks all have known deficiencies
 Lack of standardized methodology
 Best benchmark is the one you run with your data and workload
 Introduced the 1.x version of our SPARQL Query
Benchmarker tool
 Java tool and API for benchmarking
 Used a methodology based upon combination of the BSBM runner and Revelytix SP2B white
paper
 Reports various appropriate statistics
 Various configuration options to change what exactly is benchmarked e.g. whether results are
fully parsed and counted

 The 1.x tool was open sourced shortly after the 2012
conference under a 3 clause BSD License
 Available on SourceForge
 http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/
 Also as Maven artifacts (in Maven Central):
 Group ID: net.sf.sparql-query-bm
 Artifact IDs:
 cmd
 core
 Latest 1.x Version: 1.1.0

 The 1.x tool can only benchmark SPARQL queries
 SPARQL 1.1 has been standardized since the 1.x version of the tool was written and adds various additional SPARQL features that you may want to test:
 SPARQL Updates
 SPARQL Graph Store Protocol
 Queries are fixed
 No parameterization support
 Can't pass custom endpoint parameters in
 For example enable/disable reasoning
 Also no way to test endpoint specific extensions
 e.g. transactions

 Requires using HTTP endpoints to access the SPARQL
system to be tested
 Adds communication overheads to the results
 Sometimes this may be desirable
 No ability to test SPARQL operations in-memory
 i.e. can't test lower level APIs

 Only supports a single benchmarking methodology
 Methodology is hard coded
 Can't do things like run a subset of the provided operations on each run
 Or repeat an operation within a run
 Or retry an operation under specific failure conditions
 Configuration of the methodology is tightly coupled to the
methodology
 Many aspects are actually independent of the methodology

 Used a simplistic text based format
 One query file per line
 No way to specify additional parameters
 No way to assign a friendly name to queries
 Assigns each query the filename

 There is a progress monitoring API but it is limited
 E.g. it gets called after a query completes but not before it starts
 Makes it awkward/impossible to implement some kinds of monitoring
 e.g. crash detection, memory usage

 In the interests of speed over usability we rolled our own
command line arguments parser
 Means argument parsing is awkward to extend

 Earlier this year we found a compelling reason to rewrite
the tool and address the various limitations
 First 2.x release was made 9th June 2014
 Minor bug fix and maintenance releases since
 Releases available at:
 http://sourceforge.net/projects/sparql-query-bm/files/
 Code is now using Git
 http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git
 Mirrors available on GitHub for those who think that it is the one true source
 https://github.com/rvesse/sparql-query-bm
 Maven artifacts available through Maven Central as before:
 Group ID: net.sf.sparql-query-bm
 Artifact IDs: core, cmd and dist
 Latest 2.x version: 2.0.1

 Concept of Queries replaced with the general concept of
Operations
 Also divorces the definition of an operation from how to run said operation
 Makes it easier to change runtime behaviour of operations
 20 built-in operations provided
 API allows defining and plugging in new operations as
desired
 http://sparql-query-bm.sourceforge.net/javadoc/latest/core/

 Several kinds of query/update
 Fixed
 Parameterized
 Dataset Size
 Variants for both remote endpoints and in-memory
datasets
 Remote variants have additional NVP variants
 Allows adding custom parameters to the remote request
 Accounts for 13 of the built-in operations

 One for each graph store protocol operation:
 DELETE
 GET
 HEAD
 POST
 PUT
 Accounts for a further 5 of the built-in operations

 Sleep
 Do nothing for some period
 Useful for simulating quiet periods as part of testing
 Mix
 Allow grouping a set of operations into a single operation
 Lets you compose mixes from other mixes
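
The composition idea can be sketched as follows. This is a minimal illustration, not the framework's API: `SketchOperation`, `SleepSketch` and `MixSketch` are hypothetical names invented here. Because a mix is itself an operation, mixes can contain other mixes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; not the sparql-query-bm API.
interface SketchOperation {
    String name();
    void run();
}

// Does nothing for a fixed period, simulating a quiet spell between operations.
class SleepSketch implements SketchOperation {
    private final long millis;

    SleepSketch(long millis) {
        this.millis = millis;
    }

    @Override
    public String name() {
        return "sleep " + millis + "ms";
    }

    @Override
    public void run() {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}

// Groups operations into a single operation; a mix may contain other mixes.
class MixSketch implements SketchOperation {
    private final String name;
    private final List<SketchOperation> children;

    MixSketch(String name, SketchOperation... children) {
        this.name = name;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public void run() {
        for (SketchOperation op : children) {
            op.run();
        }
    }

    // Counts leaf operations, descending into nested mixes.
    int leafCount() {
        int count = 0;
        for (SketchOperation op : children) {
            count += (op instanceof MixSketch) ? ((MixSketch) op).leafCount() : 1;
        }
        return count;
    }
}

class MixSketchDemo {
    public static void main(String[] args) {
        MixSketch inner = new MixSketch("inner", new SleepSketch(5), new SleepSketch(5));
        MixSketch outer = new MixSketch("outer", inner, new SleepSketch(5));
        outer.run();
        System.out.println(outer.name() + " ran " + outer.leafCount() + " leaf operations");
    }
}
```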

 As already noted in-memory variants of some operations
are now available
 These run tests against a Dataset implementation
 Part of Apache Jena ARQ API
 Removes SPARQL Protocol and HTTP overhead from testing
 Of course, depending on the Dataset implementation, there may still be some communication overhead
 But this is likely to use lower level, back end native communication protocols instead

 Addresses the limitation of hard coded methodology
 Separates test running into three components:
 Overall runner
 Mix runner
 Operation runner
 Each has own API and can be customized as desired
 Various useful base/abstract implementations provided
 Four different test runners are provided:
 Benchmark
 Smoke
 Soak
 Stress

 Smoke
 Runs the mix once and indicates whether it passes/fails
 Pass is defined as all operations pass
 Soak
 Run the mix continuously for some period of time
 Test how a system reacts under continuous load
 Stress
 Run the mix with increasingly high load
 Test how a system reacts under increasing load
 AbstractRunner provides a basic framework and helper methods to make it easy to add custom runners or customize existing runners
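
The three behaviours described above might look roughly like this in code. This is a sketch under stated assumptions, not the framework's runners: `PassFailOp` and `RunnerSketches` are hypothetical names, and the stress schedule assumes the ramp-up factor multiplies the thread count at each step, which the slides do not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operation: returns true when it succeeds.
interface PassFailOp {
    boolean run();
}

class RunnerSketches {
    // Smoke: run the mix once; the run passes only if every operation passes.
    static boolean smoke(List<PassFailOp> mix) {
        boolean allPassed = true;
        for (PassFailOp op : mix) {
            allPassed &= op.run(); // non-short-circuit, so every operation still runs
        }
        return allPassed;
    }

    // Soak: run the mix repeatedly until the time budget is spent.
    // Returns the number of completed mix runs.
    static int soak(List<PassFailOp> mix, long budgetMillis) {
        long start = System.nanoTime();
        long budgetNanos = budgetMillis * 1_000_000L;
        int runs = 0;
        while (System.nanoTime() - start < budgetNanos) {
            for (PassFailOp op : mix) {
                op.run();
            }
            runs++;
        }
        return runs;
    }

    // Stress: thread counts to test at each step, assuming (hypothetically)
    // that load is multiplied by rampUpFactor until maxThreads is exceeded.
    static List<Integer> stressSchedule(int startThreads, int rampUpFactor, int maxThreads) {
        if (startThreads < 1 || rampUpFactor < 2) {
            throw new IllegalArgumentException("need startThreads >= 1 and rampUpFactor >= 2");
        }
        List<Integer> steps = new ArrayList<>();
        for (long threads = startThreads; threads <= maxThreads; threads *= rampUpFactor) {
            steps.add((int) threads);
        }
        return steps;
    }
}
```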

 Allows customizing how mixes and individual operations
are run
 Some alternative implementations built in:
 E.g. SamplingOperationMixRunner
 Runs a sample of the operations in the mix
 May include repeats
 E.g. RetryingOperationRunner
 Retries an operation if it doesn't succeed
 Easy to implement your own
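
To make the sampling idea concrete, here is a minimal sketch of picking a sample of operations from a mix, with or without repeats. It is not the code of SamplingOperationMixRunner; `SamplingSketch` and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SamplingSketch {
    // Picks sampleSize operations from the mix. With repeats allowed the same
    // operation can be chosen more than once; otherwise each is chosen at most once.
    static <T> List<T> sample(List<T> ops, int sampleSize, boolean allowRepeats, Random rng) {
        if (sampleSize > 0 && ops.isEmpty()) {
            throw new IllegalArgumentException("cannot sample from an empty mix");
        }
        if (!allowRepeats && sampleSize > ops.size()) {
            throw new IllegalArgumentException("sample larger than mix without repeats");
        }
        List<T> pool = new ArrayList<>(ops); // copy so the caller's mix is untouched
        List<T> picked = new ArrayList<>(sampleSize);
        for (int i = 0; i < sampleSize; i++) {
            int index = rng.nextInt(pool.size());
            picked.add(allowRepeats ? pool.get(index) : pool.remove(index));
        }
        return picked;
    }
}
```

Passing a seeded `Random` keeps the chosen sample reproducible between runs, which may matter when comparing results.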

 Separates test configuration from the test runner
 Interface with all common configuration defined
 Endpoints
 Timeouts
 Progress Listeners
 etc
 NB - Runners are typically defined such that they restrict
their input options to sub-interfaces that add runner
specific configuration e.g.
 Warm-ups for benchmarks
 Total runtime for soak testing
 Ramp up factor for stress testing
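
The relationship between common and runner specific configuration can be sketched with a generic bound on the runner. All names below (`OptionsSketch`, `SoakOptionsSketch`, `RunnerSketch` and so on) are hypothetical, as is the endpoint URL used in `main`; only the general shape, a runner parameterized over an options type, is taken from the slides.

```java
// Configuration common to every kind of test run (hypothetical subset).
interface OptionsSketch {
    String getQueryEndpoint();
    int getTimeoutSeconds();
}

// Sub-interface adding configuration that only soak testing needs.
interface SoakOptionsSketch extends OptionsSketch {
    int getMaxRuntimeMinutes();
}

// A runner restricts its input to the options type it understands.
interface RunnerSketch<T extends OptionsSketch> {
    String describe(T options);
}

class SoakRunnerSketch implements RunnerSketch<SoakOptionsSketch> {
    @Override
    public String describe(SoakOptionsSketch options) {
        return "soak " + options.getQueryEndpoint() + " for " + options.getMaxRuntimeMinutes()
                + " min (timeout " + options.getTimeoutSeconds() + "s)";
    }
}

class SimpleSoakOptions implements SoakOptionsSketch {
    private final String endpoint;
    private final int timeoutSeconds;
    private final int maxRuntimeMinutes;

    SimpleSoakOptions(String endpoint, int timeoutSeconds, int maxRuntimeMinutes) {
        this.endpoint = endpoint;
        this.timeoutSeconds = timeoutSeconds;
        this.maxRuntimeMinutes = maxRuntimeMinutes;
    }

    @Override
    public String getQueryEndpoint() {
        return endpoint;
    }

    @Override
    public int getTimeoutSeconds() {
        return timeoutSeconds;
    }

    @Override
    public int getMaxRuntimeMinutes() {
        return maxRuntimeMinutes;
    }
}

class OptionsSketchDemo {
    public static void main(String[] args) {
        // The endpoint URL is a made-up example.
        SoakOptionsSketch options = new SimpleSoakOptions("http://localhost:3030/ds/query", 30, 15);
        System.out.println(new SoakRunnerSketch().describe(options));
    }
}
```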

 Now using TSV as the file format
 Still wanted it to be simple enough that someone with zero RDF/SPARQL knowledge can configure a mix
 Each line is a series of parameters separated by a tab
character
 First parameter is an identifier for the type of the operation
 Used to decide how to interpret the remaining parameters
 Can define your own mix file format and register a loader
for it
 Possible to override the loader for a specific operation
identifier since this has an API
 Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory
dataset

query 806670-warmup1.rq 806670 Warmup Query 1
query 806670-nofilter.rq 806670 Query with No Filter
query 806670-filter3.rq 806670 Query with Filter (Variant 3)
param-query 806670-filter3-params.rq instances.tsv Parameterized Query with Filter (Variant 3)
query 806670-filter4.rq 806670 Query with Filter (Variant 4)
query 806670-filter4a.rq 806670 Query with Filter (Variant 4a - Zero Results)
param-query 806670-filter4-params.rq instances.tsv Parameterized Query with Filter (Variant 4)
query 806238-comment43.rq 806238 Query (Comment 43)
query 806238-comment43a.rq 806238 Query (Comment 43 - SELECT * sub-query)
query 806238-comment45.rq 806238 Query (Comment 45 - Multiple sub-queries)
query 806238-comment54.rq 806238 Query (Comment 54)
param-update load-full1m.ru graph-names.tsv Load 1M Dataset into named graph
param-query count-loaded.rq graph-names.tsv Count named graph
param-update drop-loaded.ru graph-names.tsv Drop named graph
query count.rq Count quads
checkpoint10 Checkpoint every 10 runs
sleep 180 3 minute sleep
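
How a tab separated mix file like the one above could be dispatched to per-identifier loaders is sketched below. This is an illustration only: `LineLoaderSketch` and `MixFileSketch` are hypothetical names and do not reflect the framework's loader API. Registering a second loader under an existing identifier replaces the first, which is the kind of override that lets a remote mix run against an in-memory dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: turns the remaining fields of a line into a
// description of the operation (a real loader would build the operation).
interface LineLoaderSketch {
    String load(String[] args);
}

class MixFileSketch {
    private final Map<String, LineLoaderSketch> loaders = new HashMap<>();

    // Registering under an existing identifier overrides the previous loader.
    void register(String typeId, LineLoaderSketch loader) {
        loaders.put(typeId, loader);
    }

    // Each non-blank line is a tab separated record; the first field selects the loader.
    List<String> parse(List<String> lines) {
        List<String> operations = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            LineLoaderSketch loader = loaders.get(fields[0]);
            if (loader == null) {
                throw new IllegalArgumentException("No loader for operation type: " + fields[0]);
            }
            operations.add(loader.load(Arrays.copyOfRange(fields, 1, fields.length)));
        }
        return operations;
    }
}
```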

 Now provides notifications before and after operation and
mix runs
 Improvements to how some of the built-in implementations handle multi-threaded output
 Makes it easier to distinguish where errors occurred when running multi-threaded benchmarks

 Now based upon the powerful open source Airline library
 https://github.com/airlift/airline
 Provides a command line interface to each built-in runner
 Also provides AbstractCommand with all standard options exposed
 Standardized exit codes across all commands
 Comprehensive built-in help
 Can help you define operation mixes
 ./operations
 ./operation --op param-query

 These are things we've done (or are currently doing) with
the framework that aren't in the open source releases
 However the 2.x framework makes these (hopefully) easy
to replicate yourself

 Many stores have rich REST APIs in addition to their SPARQL APIs
 Can be useful to include testing of these in your mixes
 Requires implementing two interfaces:
 Operation
 OperationCallable
 Abstract implementations of both are available to give you the boilerplate bits
 Internally we have 9 different custom operations defined
which test a subset of our REST API:
 Database Management
 Asynchronous Queries
 Import Management
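
The split between an operation's definition and the callable that executes it might be sketched as below. None of these types are the framework's `Operation` or `OperationCallable` interfaces: `RestClientSketch`, `OperationCallableSketch`, `RestOperationSketch` and the status path are all hypothetical, and the HTTP transport is abstracted away so the sketch runs without a server.

```java
// Hypothetical transport abstraction: returns the HTTP status code of a request.
interface RestClientSketch {
    int send(String method, String path);
}

// Hypothetical unit of execution: returns true when the operation succeeded.
interface OperationCallableSketch {
    boolean call();
}

// Hypothetical definition of an operation, separate from how it is run.
interface RestOperationSketch {
    String getName();
    OperationCallableSketch createCallable(RestClientSketch client);
}

// Example definition: checks a (made-up) database status resource.
class DatabaseStatusOperationSketch implements RestOperationSketch {
    private final String path;

    DatabaseStatusOperationSketch(String path) {
        this.path = path;
    }

    @Override
    public String getName() {
        return "GET " + path;
    }

    @Override
    public OperationCallableSketch createCallable(final RestClientSketch client) {
        return new OperationCallableSketch() {
            @Override
            public boolean call() {
                int status = client.send("GET", path);
                return status >= 200 && status < 300; // any 2xx counts as success
            }
        };
    }
}
```

Keeping the definition free of transport details means the same operation can be exercised against a fake client in tests and a real one in a benchmark.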

 One thing we're particularly interested in is how operations affect memory usage
 We added custom progress listeners that track and monitor memory usage
 Reports on min, max and average memory usage
 We also have another progress listener that tracks
processes to identify when a test run may have been
impacted by other activity on the system
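
A listener that tracks min, max and average memory usage could be sketched as follows. This is not our internal listener and not the framework's progress listener API: `ProgressListenerSketch` and `MemoryListenerSketch` are hypothetical. The sketch samples the heap of the JVM running the test, which is only meaningful for in-memory operations; monitoring a remote store would need some other source of measurements.

```java
// Hypothetical listener with notifications before and after each operation.
interface ProgressListenerSketch {
    void beforeOperation(String operationName);
    void afterOperation(String operationName);
}

class MemoryListenerSketch implements ProgressListenerSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total = 0;
    private long samples = 0;

    // Records one measurement; synchronized because runs may be multi-threaded.
    synchronized void record(long usedBytes) {
        min = Math.min(min, usedBytes);
        max = Math.max(max, usedBytes);
        total += usedBytes;
        samples++;
    }

    // Samples the heap currently used by this JVM.
    private void sampleJvm() {
        Runtime runtime = Runtime.getRuntime();
        record(runtime.totalMemory() - runtime.freeMemory());
    }

    @Override
    public void beforeOperation(String operationName) {
        sampleJvm();
    }

    @Override
    public void afterOperation(String operationName) {
        sampleJvm();
    }

    synchronized long getMin() {
        return min;
    }

    synchronized long getMax() {
        return max;
    }

    synchronized long getSamples() {
        return samples;
    }

    synchronized double getAverage() {
        return samples == 0 ? 0.0 : (double) total / samples;
    }
}
```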

// Retries an operation only when it failed with an authentication error
public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {

    // Allows a single retry by default
    public RetryOnAuthFailureOperationRunner() {
        this(1);
    }

    public RetryOnAuthFailureOperationRunner(int maxRetries) {
        super(maxRetries);
    }

    @Override
    protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options,
            Operation op, OperationRun run) {
        // Any other category of error is not retried
        return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
    }
}
 Extends the built-in RetryingOperationRunner
 Simply adds a constraint on retries by overriding the
shouldRetry() method
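
The general pattern behind this, a retry loop with an overridable hook deciding whether a failure is worth retrying, can be sketched in a self-contained form. This is not how RetryingOperationRunner is implemented; `AttemptSketch`, `RetrySketch` and `RetryOnSecurityFailureSketch` are hypothetical, and exceptions stand in for the error categories used above.

```java
// Hypothetical unit of work that may fail with an unchecked exception.
interface AttemptSketch<T> {
    T call();
}

class RetrySketch {
    // Hook deciding whether a failure is worth retrying; subclasses narrow this.
    protected boolean shouldRetry(RuntimeException failure) {
        return true;
    }

    // Runs the attempt, retrying up to maxRetries times while shouldRetry() allows it.
    <T> T runWithRetries(AttemptSketch<T> attempt, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return attempt.call();
            } catch (RuntimeException e) {
                if (retries >= maxRetries || !shouldRetry(e)) {
                    throw e; // out of retries, or not a retryable failure
                }
                retries++;
            }
        }
    }
}

// Only retries failures that look like security problems.
class RetryOnSecurityFailureSketch extends RetrySketch {
    @Override
    protected boolean shouldRetry(RuntimeException failure) {
        return failure instanceof SecurityException;
    }
}
```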

 Embrace Java 7 features fully
 Use ServiceLoader to automatically discover new operations and mix formats
 Make it even easier to customize runners
 i.e. provide more abstraction of the current implementations
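
Discovery through ServiceLoader could look something like the sketch below. This is future work, so nothing here exists in the framework: `OperationFactorySketch` and `DiscoverySketch` are hypothetical names. A provider jar would list its implementation classes in a `META-INF/services` file named after the interface, and with no providers on the classpath the discovered list is simply empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface that third-party jars would implement.
interface OperationFactorySketch {
    // Identifier used in mix files, e.g. "query".
    String typeId();
}

class DiscoverySketch {
    // Collects the identifiers of every factory found on the classpath.
    static List<String> discoverTypeIds() {
        List<String> typeIds = new ArrayList<>();
        for (OperationFactorySketch factory : ServiceLoader.load(OperationFactorySketch.class)) {
            typeIds.add(factory.typeId());
        }
        return typeIds;
    }

    public static void main(String[] args) {
        System.out.println("Discovered operation types: " + discoverTypeIds());
    }
}
```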

Questions?
rvesse@yarcdata.com
@RobVesse
