Meetup Feb 17th, 2014
Migrating from MongoDB to Neo4j

1
Agenda
• Intros
– name, what you do, interest in Neo4j?

• Case Study, Moving from MongoDB
– considerations, why and how
– steps taken, findings
– using the Batch Importer

• Group Discussion
– experiences from others?
source: http://neo4j.rubyforge.org/guides/why_graph_db.html
Case Study, Moving from
MongoDB

source: http://neo4j.rubyforge.org/guides/why_graph_db.html
Our Startup
– A mobile drink discovery platform: explore new drinks, post photos, learn new facts, follow other drink aficionados (whisky, beer, wine, cocktail experts)

4
Using MongoDB
– Pluses for us:
• flexible (by far, most substantial benefit)
• good documentation
• easy to host and integrate with our code

– Downsides for us:
• lots of collections needed (e.g. mapping collections for many-to-many relationships)
• queries with multiple joins

5
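The mapping-collection pain above can be sketched in plain Ruby (hashes standing in for hypothetical MongoDB collections; the names `users`, `drinks`, and `user_drink_likes` are illustrative, not from our codebase): a many-to-many "likes" relationship needs a third collection and a manual three-step join on every read.

```ruby
# Hypothetical in-memory stand-ins for three MongoDB collections.
users  = [{ "_id" => 1, "username" => "nick" }]
drinks = [{ "_id" => 10, "name" => "IPA" }, { "_id" => 11, "name" => "Stout" }]
user_drink_likes = [                      # the extra mapping collection
  { "user_id" => 1, "drink_id" => 10 },
  { "user_id" => 1, "drink_id" => 11 }
]

# "Which drinks does nick like?" takes three lookups (a manual join):
user      = users.find { |u| u["username"] == "nick" }
drink_ids = user_drink_likes.select { |m| m["user_id"] == user["_id"] }
                            .map    { |m| m["drink_id"] }
liked     = drinks.select { |d| drink_ids.include?(d["_id"]) }
                  .map    { |d| d["name"] }
# liked == ["IPA", "Stout"]
```

In a graph model the mapping collection disappears: the same question is a single relationship traversal from the user node.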
Relying on Redis
– Needed to cache a lot in Redis
– We cached
• user profile
• news feed

– Too much complexity
• another denormalized data model to manage
• more difficult to test
• increase in bugs and edge cases

– Redis is still awesome; we just relied on it too much

6
Evaluating Neo4j
– Our goals
• simplify our data model (less denormalization)
• speed up highly relational queries
• keep our flexibility (schemaless data model)

– Considerations
• how will we host?
• will it make our codebase more complex?
• support?
• easy to troubleshoot production issues?
7
How We Evaluated
1. We set up an instance on Amazon EC2 (though
Heroku was still an option as well)
2. Imported realistic production data with the Batch
Importer

3. Took our most popular, slowest query and tested it
4. Wrote more example queries for standard use cases
(creating nodes, relationships, etc), easy to use?
5. Ran a branch of our code with Neo4j for a month
8
How We Evaluated
1. Made sure we could get good support for the product
2. Determined the effort involved in hosting it on Amazon EC2 (though Heroku was also an option)
3. Determined the effort needed to import bulk data and change our data model
4. Audited each line of code and made a list of the types of queries we’d need; estimated the effort involved in updating our codebase
5. Imported production data, then took our most popular, slowest query and tested performance
6. Wrote other, more common queries and tested performance further (using Apache Benchmark)
7. Was the driver support (in our case, Ruby) okay and was it well written? Would it be maintained years from now?
8. Tested it out as a code branch for at least a month
9
Our Findings
1. So far so good (we’ve been testing for a few weeks now)
2. Set up an instance on Amazon EC2. Wasn’t that bad.
3. Complex queries were a lot faster
4. The Ruby driver (Neography) does the job, though it isn’t perfect
5. We plan to use Neo4j’s official Ruby library once version 3.0 is finished (which apparently won’t require JRuby)
10
Our Findings
6. We needed to create an abstraction layer in the code to simplify reads and writes with the database. Wasn’t that bad though.
7. Our data model got a lot more intuitive. No more mapping collections (yay)
8. We can now implement recommendations a lot more easily when we want to

9. No longer need to rely heavily on Redis and caching
11
Our Findings

10. We think about our data differently now
11. Managing the data model is actually fun

12
Tutorial on Batch Importer
1. Our example involves real data
2. We will use Ruby to generate .CSV files representing nodes and relationships
3. Beware: the existing documentation is “not good,” to put it lightly
4. We’re using the 2.0 version (precompiled binary)

https://github.com/jexp/batch-import/tree/20
13
Steps
1. Install Neo4j
2. Download a binary version of batch importer
3. Batch Importer requires .CSV files. One type of file
will import nodes, another will import relationships

4. Decide on fields that make nodes unique
1. ex: a user has a username, a drink has a name
2. this makes mapping node relationships later a lot easier too
14
.CSV Format for Nodes
• Tab separated columns
• Importing Nodes
– node property names in first row
– format is <field name>:<field type> (defaults to String)
– all rows after that are corresponding property values

• Importing Relationships
– separate .CSV file: source node’s unique field in the first column, target node’s unique field in the second column, the word “type” in the third column
– since we’re already using a unique index on nodes, it’s easy to relate them!
– can import multiple relationship types between two kinds of nodes in the same .CSV file

15
Creating Drink Nodes
• Example output (tab delimited)

16
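The screenshot for this slide isn’t shown; based on the generator script on the next slide, drink_nodes.csv plausibly looks like this (drink names are made up):

```
name:string:drink_name_index	type:label	name
Old Fashioned	Drink	Old Fashioned
Negroni	Drink	Negroni
```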
Creating Drink Nodes
namespace :export do
  require 'csv'

  task :generate_drink_nodes => :environment do
    # Note: the column separator must be a real tab ("\t"), not the letter "t"
    CSV.open("drink_nodes.csv", "wb", { :col_sep => "\t" }) do |csv|
      csv << ["name:string:drink_name_index", "type:label", "name"]
      Drink.all.each do |drink|
        csv << [drink.name, "Drink", drink.name]
      end
    end
  end
end
17
Running the Script
• Make sure all nodes and relationships are deleted from Neo4j:
– MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
• Stop your Neo4j server before importing
• Run the import command (using the binary batch importer we downloaded earlier):
– ./import.sh ~/neo4j-community-2.0/data/graph.db user_nodes.csv

18
Creating User Nodes
• Example output (tab delimited):

19
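The screenshot isn’t shown; based on the generator script on the next slide, user_nodes.csv plausibly looks like this (usernames and names are made up):

```
username:string:user_username_index	type:label	first_name	last_name
nickTribeca	User	Nick	M
janedoe	User	Jane	Doe
```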
Creating User Nodes
CSV.open("user_nodes.csv", "wb", { :col_sep => "\t" }) do |csv|
  csv << ["username:string:user_username_index",
          "type:label",
          "first_name",
          "last_name"]
  User.all.each do |user|
    csv << [user.username, "User", user.first_name, user.last_name]
  end
end
20
User to User Relationships
• NOTE: it’s easy to relate users to users since we
already have an index set up.
• Example output (tab delimited):

21
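The screenshot isn’t shown; based on the generator script on the next slide, user_rels.csv plausibly looks like this (usernames are made up):

```
username:string:user_username_index	username:string:user_username_index	type
nickTribeca	janedoe	FOLLOWS
janedoe	nickTribeca	FOLLOWS
```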
User to User Relationships
CSV.open("user_rels.csv", "wb", { :col_sep => "\t" }) do |csv|
  csv << ["username:string:user_username_index",
          "username:string:user_username_index",
          "type"]
  User.all.each do |user|
    user.following.each do |other_user|
      csv << [user.username, other_user.username, "FOLLOWS"]
    end
    user.followers.each do |other_user|
      csv << [other_user.username, user.username, "FOLLOWS"]
    end
  end
end
22
User to Drink Relationships
• Example output:

23
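The screenshot isn’t shown; based on the generator script on the next slide, user_drink_rels.csv plausibly looks like this (usernames and drink names are made up):

```
username:string:user_username_index	name:string:drink_name_index	type
nickTribeca	Old Fashioned	LIKED
nickTribeca	Negroni	DISLIKED
janedoe	Old Fashioned	JOURNALED
```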
User to Drink Relationships
CSV.open("user_drink_rels.csv", "wb", { :col_sep => "\t" }) do |csv|
  csv << ["username:string:user_username_index",
          "name:string:drink_name_index",
          "type"]
  User.all.each do |user|
    user.liked_drinks.each do |drink|
      csv << [user.username, drink.name, "LIKED"]
    end
    user.disliked_drinks.each do |drink|
      csv << [user.username, drink.name, "DISLIKED"]
    end
    user.drink_journal_entries.each do |entry|
      csv << [user.username, entry.drink.name, "JOURNALED"]
    end
  end
end
24
Test Your Data
• Test with some Cypher queries
– cheat sheet: http://docs.neo4j.org/refcard/2.0
– ex: MATCH (n:User)-[r:FOLLOWS]-(o) WHERE n.username='nickTribeca' RETURN n, r LIMIT 50
• Note: limit your results or the Data Browser will become too slow to use
25
That’s the Tutorial
• You can always migrate data yourself without the batch importer
– e.g. a script that queries MongoDB data and inserts it into Neo4j in real time using your API
• Using the Batch Importer is really fast though
• We found it faster to write and less error prone than writing our own script
26
Group Q&A
• Thanks for coming
• @seenickcode

• nicholas.manning@gmail.com for
questions

• Want to present? Let me know.

27
