seevl: Data-driven music discovery

Presentation about seevl.net at the LA SemWeb meetup, October 2nd 2012

http://www.meetup.com/lasemweb/events/83232222/


  1. seevl: Data-driven music discovery
     Alexandre Passant, co-founder, CEO, MDG Web ltd
     http://seevl.net // @seevl // alex@seevl.net // @terraces
     LA SemWeb & WebSpeed Meet-up, 2 October 2012
     Cross Campus, Santa Monica
  2. a bit of background...
  3. • Knowledge Engineering
     • Social Web & Enterprise 2.0
     • Sensor Networks & Real-Time
  4. architecture
  5. [Graph diagram]
     Nodes: :alex, dbpedia:Bad_Brains, dbpedia:Hardcore_Punk, dbpedia:Beastie_Boys, dbpedia:Black_Flag_(band), dbpedia:Adam_Yauch, dbpedia:B._B._King, dbpedia:Category:American_vegetarians
     Edge labels: foaf:topic_interest, p:genre, p:associatedActs, p:currentMembers, skos:subject
  7. Our approach: SLADE
     • Semantic LAyer for Data Exploration
       • A framework to build data-driven apps
       • ETL from existing sources / APIs
       • Search, discovery, recommendations
       • Data access / API
       • Generic, config-based, domain-agnostic
  8. The pipeline
     [Pipeline diagram] Labels: Web data sources; Data-extraction and interlinking; Entity-centric semantic knowledge base (artists, genres, labels, locations...); Storage; REST-ful interface; Search, discovery and recommendation engine, on top of our graph-database; seevl products
  9. Challenges
     • Some technical challenges faced when building SLADE and seevl.net
       • Data models: Choosing the right schemas
       • Data access: SPARQL or API or ... ?
       • Scalability: Caching and optimisation strategies
       • User Experience: User-centric design
  10. data models
  11. RDF since day one
      • RDF?
        • Agile model (ideal when iterating)
        • Intuitive aspect of graph modelling
        • Standard toolkits (SPARQL / HTTP)
      • OWL? RDFS?
        • Minor use of inference (type, hierarchies)
  12. Artist data
      • Music Ontology
        • Label, Genres, Influences, Origins ...
        • Collaborations between artists
        • Activity period (add-on)
      • Additional models/mappings
        • e.g. Bio Vocabulary (birth/death), FOAF...
  13. Social activities
      • SIOC & SIOC-actions
        • Social graph / sub-graph
        • Action-centric activities (like, listen)
      • Inferring user's taste profile
        • Top artists, genres, labels
        • Using latest actions
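The taste-profile idea on the slide above can be sketched roughly as follows. This is a minimal illustration, assuming actions arrive as `(user, action, artist)` tuples in chronological order and that an artist-to-genres lookup is available; the function name, the data shapes and the `latest`/`top` parameters are hypothetical and not taken from the talk.

```python
from collections import Counter

def taste_profile(actions, artist_genres, user, latest=50, top=3):
    """Return a user's top artists and genres, counted over their latest actions.

    actions: list of (user, action, artist) tuples, oldest first (assumed shape).
    artist_genres: dict mapping an artist to a list of genres (assumed shape).
    """
    # Keep only this user's actions, then only the most recent ones.
    recent = [a for a in actions if a[0] == user][-latest:]
    artists = Counter(artist for _, _, artist in recent)
    genres = Counter(
        genre
        for _, _, artist in recent
        for genre in artist_genres.get(artist, ())
    )
    return {
        "artists": [name for name, _ in artists.most_common(top)],
        "genres": [name for name, _ in genres.most_common(top)],
    }
```

Labels could be counted the same way with a second lookup table; every action is weighted equally here, which is a simplification.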
  14. Similarity / Recsys
      • Graph-based similarities
        • Data-driven recommendations
        • Ranking using weight-factors
        • Explanations / tracking
      • The Similarity Ontology
        • Domain-agnostic
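A weighted, explainable graph similarity of the kind listed above might look like the sketch below. The weights, property names and function names are assumptions made for illustration; the slides do not give seevl's actual weight factors or scoring formula.

```python
# Hypothetical per-property weights; the talk does not state real values.
WEIGHTS = {"genre": 1.0, "label": 0.5, "origin": 0.25}

def similarity(a, b, weights=WEIGHTS):
    """Score two entity descriptions by the property values they share.

    Returns (score, explanation), where explanation lists the shared
    (property, value) pairs that contributed to the score.
    """
    score = 0.0
    explanation = []
    for prop, weight in weights.items():
        shared = sorted(set(a.get(prop, ())) & set(b.get(prop, ())))
        for value in shared:
            score += weight
            explanation.append((prop, value))
    return score, explanation

def recommend(seed, candidates, weights=WEIGHTS):
    """Rank candidates (name -> description) against a seed, best first."""
    scored = [
        (name, *similarity(seed, description, weights))
        for name, description in candidates.items()
    ]
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    # Drop candidates that share nothing with the seed.
    return [item for item in ranked if item[1] > 0]
```

Returning the shared pairs alongside the score is what makes a recommendation explainable ("recommended because both share genre X"), and nothing in the function depends on the music domain.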
  15. Provenance
      • Keep track of every statement in the ETL
        • Origin, type and time of extraction
      • With a low number of additional triples
        • Introducing "data-slices"
        • Multiple slices (=subgraphs) per resource
        • Quick updates (DELETE / INSERT)
  16. Provenance and graphs
      GRAPH svl:seevl_id/wikipedia/facts/extract
      {
        svl:seevl_id mo:genre svl:BntvuZAy .
        svl:seevl_id/wikipedia/extract dc:created "2012-10-25" ;
          rdfs:seeAlso wikipedia:Social_Distortion .
      }
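The "quick updates (DELETE / INSERT)" on a data-slice can be sketched as a SPARQL 1.1 Update that empties one named graph and refills it. This is only an illustration under assumptions: the `http://example.org/` namespace, the graph naming scheme and the function names are hypothetical, and the talk does not show the exact update statements seevl issues.

```python
from datetime import date

BASE = "http://example.org/"  # hypothetical namespace, not seevl's real one

def slice_graph(entity_id, source, kind):
    """Name the graph holding one slice: <entity>/<source>/<kind>."""
    return f"{BASE}{entity_id}/{source}/{kind}"

def replace_slice(entity_id, source, kind, triples, extracted=None):
    """Build a SPARQL 1.1 Update string that swaps out a single slice.

    triples: list of (subject, predicate, object) terms that are already
    serialised as SPARQL, e.g. "<http://...>" or '"a literal"'.
    """
    graph = slice_graph(entity_id, source, kind)
    extracted = extracted or date.today().isoformat()
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    # One extra triple records when the slice was extracted.
    provenance = (
        f'    <{graph}> <http://purl.org/dc/terms/created> "{extracted}" .'
    )
    return (
        f"DELETE WHERE {{ GRAPH <{graph}> {{ ?s ?p ?o }} }} ;\n"
        f"INSERT DATA {{ GRAPH <{graph}> {{\n{body}\n{provenance}\n}} }}"
    )
```

Because each slice lives in its own graph, refreshing the Wikipedia facts of one resource touches neither its other slices nor any other resource.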
  17. data access
  18. SPARQL
      • Pros
        • W3C Standard, Powerful
        • HTTP-based w/ SPARQL Protocol
        • SPARQL Update in 1.1
      • Cons
        • Learning curve for non-RDF people
  19. URI patterns + JSON-LD
      • Pre-defined URIs mapped to SPARQL query patterns, returning JSON-LD data
      • Search queries or resource descriptions
      • Content-negotiation or ?_format=json
      • GET and POST
      • POST => SPARQL UPDATE
      • GET => SPARQL SELECT / ASK
  20. JSON-LD
      • JSON for Linking Data
        • The best of both worlds
        • JSON serialization, works with any parser
        • Additional semantics (URIs, typed links, etc.) with JSON-LD parsers
        • Use of context/mappings to avoid URIs
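The "best of both worlds" point can be illustrated with a small document: a plain JSON parser reads it as ordinary key-value data, while the `@context` tells a JSON-LD-aware consumer which URI each short key stands for. The entity URI and the choice of mapped properties below are assumptions for illustration, and `expand_keys` is a deliberately naive stand-in, not the JSON-LD expansion algorithm.

```python
import json

doc = {
    "@context": {
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "genre": {"@id": "http://purl.org/ontology/mo/genre", "@type": "@id"},
    },
    "@id": "http://example.org/entity/some_artist",  # hypothetical URI
    "prefLabel": "Some Artist",
    "genre": "http://example.org/entity/BntvuZAy",  # hypothetical URI
}

def expand_keys(document):
    """Replace short keys with the URIs the context maps them to.

    Naive illustration only: handles flat documents and ignores typing.
    """
    context = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key.startswith("@"):
            expanded[key] = value  # keywords such as @id pass through
            continue
        mapping = context[key]
        uri = mapping["@id"] if isinstance(mapping, dict) else mapping
        expanded[uri] = value
    return expanded

# Any JSON parser can read the document without knowing about RDF.
plain = json.loads(json.dumps(doc))
```

A client that only needs a label reads `plain["prefLabel"]`; a client that wants the graph semantics resolves keys through the context.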
  21. Search
      • /entity/?property=value
        • JSON-LD mappings used in URI templates
        • Works with literals, dates, resources
        • Ranking algorithm / alpha-ranking
        • Patterns defined in a single config file
  22. Search (text)
      • /entity/?prefLabel=clash&type=artist&_sort=count_desc
      • Translated into
          SELECT ?x WHERE {
            ?x a mo:artist ; skos:prefLabel ?label .
            ?label bif:contains "clash" .
          }
  23. Search (relations)
      • /entity/?genre=BntvuZAy&type=artist
      • Translated into
          SELECT ?x WHERE {
            ?x a mo:artist ; mo:genre svl:BntvuZAy .
          }
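The translation from query-string parameters to a SPARQL pattern, as in the two search slides above, could be driven by a small mapping table along these lines. The `MAPPINGS` and `TYPES` tables and the function name are hypothetical stand-ins for the "single config file" the slides mention; `bif:contains` is a Virtuoso full-text extension, and prefix declarations, ranking and `_sort` handling are left out.

```python
import re

# Hypothetical config: short parameter names mapped to prefixed properties.
MAPPINGS = {
    "genre": {"property": "mo:genre", "kind": "resource"},
    "prefLabel": {"property": "skos:prefLabel", "kind": "text"},
}
TYPES = {"artist": "mo:artist"}

def search_to_sparql(params):
    """Translate query-string parameters into a SPARQL SELECT string."""
    patterns = []
    if "type" in params:
        patterns.append(f"?x a {TYPES[params['type']]} .")
    counter = 0
    for key, value in params.items():
        if key == "type" or key.startswith("_"):
            continue  # control parameters such as _sort are ignored here
        mapping = MAPPINGS[key]
        if mapping["kind"] == "resource":
            # Only plain identifiers are accepted, to avoid query injection.
            if not re.fullmatch(r"[A-Za-z0-9_]+", value):
                raise ValueError(f"invalid identifier: {value!r}")
            patterns.append(f"?x {mapping['property']} svl:{value} .")
        else:
            var = f"?v{counter}"
            counter += 1
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            patterns.append(f"?x {mapping['property']} {var} .")
            patterns.append(f'{var} bif:contains "{escaped}" .')
    body = "\n".join("  " + pattern for pattern in patterns)
    return "SELECT ?x WHERE {\n" + body + "\n}"
```

For example, `search_to_sparql({"genre": "BntvuZAy", "type": "artist"})` yields the same graph pattern as the relation search on slide 23, written as two triple patterns instead of one.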
  24. Resource description
      • Patterns mapped to resource URI to retrieve subset of the resource description
        • /entity/seevl_id/infos
        • /entity/seevl_id/facts
        • /entity/seevl_id/links
        • /entity/seevl_id/related(/related_id)
  25. scalability
  26. Is SPARQL fast enough?
      • SPARQL is very powerful, but can be slow
        • Some simple queries may lead to deep graph patterns or traversal queries depending on the modelling
        • FILTERS (e.g. text and date based queries) are expensive
        • Not all triple-stores are equal
  27. Splitting queries
      • "List all resources sharing common property-values with the current one, whatever that property is"
        • Fits in a single SPARQL query
        • Doesn't properly scale
      • Becoming faster when splitting the query and recomposing results via internal scripts
  28. SPARQL: splitting queries

                       Direct SPARQL      Property-slicing   Complete-slicing
      Artist           Queries  Time      Queries  Time      Queries  Time
      Ramones                1  139.97         20  109.51         66  37.84
      Johnny Cash            1  257.81         30  152.60        135  75.35
      U2                     1  155.53         22  122.91         70  44.03
      The Clash              1  146.43         20  110.84         79  42.61
      Bad Religion           1  104.08         23   86.49         97  47.35
      The Aggrolites         1  145.92         13  114.52         28  28.33
      Janis Joplin           1  230.88         27  151.00         98  62.81
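The splitting strategy can be sketched as many small lookups whose results are recomposed in application code. The slides do not define "property-slicing" and "complete-slicing" precisely; this sketch assumes the first means one query per property and the second one query per (property, value) pair. `values_of` and `subjects_with` are hypothetical callables standing in for those small SPARQL queries.

```python
from collections import Counter

def related_by_slicing(entity, properties, values_of, subjects_with):
    """Find resources sharing property values with `entity`.

    values_of(entity, prop) -> values of one property (one small query).
    subjects_with(prop, value) -> resources with that value (one small query).
    Returns (resource, number_of_shared_values) pairs, best first.
    """
    shared = Counter()
    for prop in properties:
        for value in values_of(entity, prop):
            for other in subjects_with(prop, value):
                if other != entity:
                    shared[other] += 1
    return shared.most_common()

# In-memory triples standing in for the triple store (illustrative data).
TRIPLES = [
    ("a", "genre", "g1"), ("a", "label", "l1"),
    ("b", "genre", "g1"), ("b", "label", "l1"),
    ("c", "genre", "g1"),
    ("d", "genre", "g2"),
]

def values_of(entity, prop):
    return [o for s, p, o in TRIPLES if s == entity and p == prop]

def subjects_with(prop, value):
    return [s for s, p, o in TRIPLES if p == prop and o == value]
```

Each small query has a fixed, shallow pattern, which would be consistent with the table above: more queries, but lower total time.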
  29. SPARQL + Redis
      • Started by using Memcache to store query results (e.g. "?x genre $y")
        • Good, but costly for the first user
      • Then, materialising results in-memory using Redis as a key-value cache system
        • Low indexing time (a few minutes on a laptop)
        • Increasing query-performance, real-time
  30. SPARQL + Redis
      • Redis
        • HSET to define entities (minimal data)
        • ZADD to store ordered sets of key-values, with our own ranking scheme
        • ZRANGE to retrieve w/ correct order
      • Everything in memory, instant query results
  31. SPARQL + Redis
      self.redis.hset(entity, uri, uri)
      self.redis.hset(entity, prefLabel, prefLabel)
      self.redis.hset(entity, description, description)
      self.redis.zadd('genre:BntvuZAy', entity, score)
      ...
      self.redis.zrange(pattern, min, max, withscores)
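The HSET / ZADD / ZRANGE pattern above can be tried without a Redis server using the sketch below. `FakeRedis` is a tiny in-memory stand-in written for this example (an assumption, not part of any library); it mimics only the three calls used, with the call shapes of redis-py 3.0 and later, where `zadd` takes a `{member: score}` mapping. The key layout and function names are illustrative.

```python
class FakeRedis:
    """In-memory stand-in covering only hset, zadd and zrange."""

    def __init__(self):
        self.hashes = {}
        self.zsets = {}

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)

    def zrange(self, name, start, end, desc=False, withscores=False):
        # Order by score, then member; only end >= 0 or end == -1 is handled.
        items = sorted(
            self.zsets.get(name, {}).items(),
            key=lambda kv: (kv[1], kv[0]),
            reverse=desc,
        )
        stop = None if end == -1 else end + 1
        page = items[start:stop]
        return page if withscores else [member for member, _ in page]

def index_entity(client, entity_id, uri, pref_label, genres, score):
    """Store minimal entity data, then rank the entity under each genre."""
    client.hset(entity_id, "uri", uri)
    client.hset(entity_id, "prefLabel", pref_label)
    for genre in genres:
        client.zadd(f"genre:{genre}", {entity_id: score})

def top_for_genre(client, genre, limit=10):
    """Return up to `limit` (entity, score) pairs, highest score first."""
    return client.zrange(f"genre:{genre}", 0, limit - 1, desc=True, withscores=True)
```

With a real server, a `redis.Redis(decode_responses=True)` client should be usable in place of `FakeRedis` for these three calls; without `decode_responses`, members come back as bytes.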
  32. user-experience
  33. User-experience
      • Interfaces for graph-based/semantic data
        • Don't need to be ugly!
        • As long as they're built for users first
      • Focus on vertical-UX, rather than SemWeb-UX
        • Check best practices in the domain
        • Involve HCI / non-SemWeb people
  34. take-away message
  35. Lessons learnt
      • Don't reinvent the wheel, check existing stacks and use what fits for the job
      • Make it simple for your developers, using REST-ful interfaces and design patterns
      • Accept compromises, be pragmatic
      • Think of users / create personas who are not SemWeb-geeks when designing the UX
  36. Questions?
      http://seevl.net // @seevl
      alex@seevl.net // @terraces