20130204 graph to-pacer-xml
1. GraphTO
February 2013, Mozilla Toronto
David Colebatch & Darrick Wiebe us@xnlogic.com
2. Agenda
• Who We Are
• Intro to GraphDB
• Intro to Patent-Grant Data
• Graph Concepts
• Pacer::Xml
3. Why?
• Data Set Size
• Connectivity of Data
• Semi-structure
• Evolution of SOA and REST
4. The Zone of SQL Adequacy
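[Figure: chart marking the region where an SQL database is adequate. Axis labels: Requirement of application, Performance, Data complexity. Application types plotted: Social, Geo, Salary List, Network / Cloud Management, ERP, MDM, CRM.]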
6. Relational Model vs. Graph
Each of these models
expresses the same thing
Person* Person-Friend Friend*
7. Graph db performance
• a sample social graph with ~1,000 persons
• average 50 friends per person
• pathExists(a,b) limited to depth 4
• caches warmed up to eliminate disk I/O

Database    # persons    query time
MySQL       1,000        2,000 ms
Neo4j       1,000        2 ms
Neo4j       1,000,000    2 ms
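
To make the benchmarked query concrete, here is a minimal plain-Ruby sketch of a depth-limited path check. It is an in-memory illustration only, not how MySQL or Neo4j execute the query; the method name `path_exists?`, the `friends` adjacency hash and the sample data are all assumptions made for this example.

```ruby
require 'set'

# Breadth-first search from `from` towards `to`, giving up after `max_depth` hops.
# `friends` is a hypothetical adjacency Hash: person => Array of friends.
def path_exists?(friends, from, to, max_depth = 4)
  return true if from == to

  visited = Set.new([from])
  frontier = [from]

  max_depth.times do
    next_frontier = []
    frontier.each do |person|
      friends.fetch(person, []).each do |friend|
        return true if friend == to
        # Set#add? returns nil when the element was already present.
        next_frontier << friend if visited.add?(friend)
      end
    end
    return false if next_frontier.empty?
    frontier = next_frontier
  end

  false
end

# A chain a - b - c - d - e - f, stored in both directions.
friends = {
  'a' => ['b'], 'b' => ['a', 'c'], 'c' => ['b', 'd'],
  'd' => ['c', 'e'], 'e' => ['d', 'f'], 'f' => ['e']
}

puts path_exists?(friends, 'a', 'e') # e is 4 hops from a
puts path_exists?(friends, 'a', 'f') # f is 5 hops from a, beyond the limit
```

Each round of the search only touches the neighbours of the current frontier, so the work depends on the local neighbourhood rather than on the total number of persons stored.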
14. US PTO Data
• Patent Grant Data in XML
• bi-weekly chunks
• Pacer::Xml has a handy loader as an example:
jruby-1.7.0 > g = PacerXml::Sample.load_100
Downloading a sample xml file from...
There are four trends underpinning the NoSQL, and specifically the GraphDB, movements:
1) The size of the data that we are managing is more than doubling every two years, with around 2.4 zettabytes expected by the end of this year (or 250 million years of the TV show “24”).
2) Data is more highly connected than ever before: FOAF on social networks; configuration management for a datacenter.
3) Schema-less data persistence: add a field to just one record, no problem. Sparkes on Toyota.
4) Application architecture has changed from flat files and batch processing to shared RDBMS, SOA + Web services.
*This is a somewhat contrived example, as “person” & “friend” would normally be one table with a self join.
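
As a rough illustration of "the same thing" expressed both ways, the sketch below stores one set of friendships first as a people table plus a self-referencing join table, then as nodes holding direct references. All names and sample rows here are hypothetical and written in plain Ruby, not taken from the slides.

```ruby
# Relational style: a people "table" plus a join "table" of (person_id, friend_id) rows.
people = { 1 => 'Alice', 2 => 'Bob', 3 => 'Carol' }
friendships = [[1, 2], [1, 3]]

# The self join: scan the join table, then look each friend up in people.
def friends_via_join(people, friendships, person_id)
  friendships.select { |pid, _fid| pid == person_id }
             .map { |_pid, fid| people.fetch(fid) }
end

# Graph style: each node holds direct references to its neighbours.
Node = Struct.new(:name, :friends)

alice = Node.new('Alice', [])
bob   = Node.new('Bob', [])
carol = Node.new('Carol', [])
alice.friends << bob << carol

puts friends_via_join(people, friendships, 1).inspect
puts alice.friends.map(&:name).inspect
```

Both calls answer the same question; the graph version simply follows references it already holds instead of matching ids across tables.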
A borrowed slide from Neo Technology.
Gephi - example of high-level graph visualization where you might be looking for clustering of data types and super nodes.
d3js.org - example of mixing high-level overview of relationships, with specific relationships on hover
A few options exist for graph query languages, some of which you may have heard of. SPARQL is a recursive acronym for “SPARQL Protocol and RDF Query Language”, where RDF stands for Resource Description Framework. Cypher and Gremlin are modern graph query languages with strong ties to the Neo4j community. Pacer is a Ruby gem that you can include in your projects to get jamming on embedded graph databases straight away.
Chris compared Traffic-based and Content-based message ranking approaches to discover Ego Networks. We don’t need to worry about the details here though. Chris has left us with a nice property graph which identifies official reporting relationships by an edge labelled “Directly_Reported_To”.
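
A property graph with labelled edges can be sketched in a few lines of plain Ruby. This is only an illustration of the idea, not Pacer's API and not Chris's actual data: the `Edge` struct, the helper methods, the vertex names and the "Emailed" label are assumptions; only the "Directly_Reported_To" label comes from the notes above.

```ruby
# An edge in a property graph: two vertex ids, a label, and free-form properties.
Edge = Struct.new(:from, :to, :label, :properties)

edges = [
  Edge.new('alice', 'bob',   'Directly_Reported_To', {}),
  Edge.new('bob',   'carol', 'Directly_Reported_To', {}),
  Edge.new('alice', 'carol', 'Emailed', { count: 12 }) # hypothetical second label
]

# One hop out from `vertex`, following only edges with the given label.
def out_vertices(edges, vertex, label)
  edges.select { |e| e.from == vertex && e.label == label }.map(&:to)
end

# Walk the reporting chain upward until nobody is left to report to.
def reporting_chain(edges, vertex)
  chain = []
  seen = [vertex]
  current = vertex
  loop do
    manager = out_vertices(edges, current, 'Directly_Reported_To').first
    break if manager.nil? || seen.include?(manager) # stop at the top, or on a cycle
    chain << manager
    seen << manager
    current = manager
  end
  chain
end

puts reporting_chain(edges, 'alice').inspect
```

Filtering by label is what lets the official reporting relationships be separated from any other kind of edge stored in the same graph.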