CloudGraph ®

MySql to HBase in 5 Steps
Converting MySql or Oracle databases to Apache HBase™ with on-line
examples using the popular Wordnet® dictionary
Scott Cinnamond – TerraMeta Software Inc.
http://cloudgraph.org
What is Wordnet®?

• Large, complex lexical database of
English, stored here as a MySql database.
• Nouns, verbs, adjectives and adverbs
grouped into sets of cognitive synonyms
(synsets), each expressing a distinct
concept.
• Synsets are interlinked by means of
conceptual-semantic and lexical relations.
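The synset structure described above can be pictured as a small in-memory graph. This is a minimal sketch, assuming hypothetical `Synset` and `Relation` classes (they are not the classes generated from the Wordnet UML model). It only models typed links between synsets; word-level lexical links are left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of synsets and their relations; NOT the
// classes generated from the Wordnet UML model.
final class Synset {
    final String id;
    final String gloss;                                  // short definition of the concept
    final Set<String> lemmas = new LinkedHashSet<>();    // synonymous words in this set
    final List<Relation> relations = new ArrayList<>();  // outgoing typed links

    Synset(String id, String gloss, String... words) {
        this.id = id;
        this.gloss = gloss;
        for (String w : words) lemmas.add(w);
    }

    // Adds a directed, typed link, e.g. "hypernym" (a more general concept).
    void link(String type, Synset target) {
        relations.add(new Relation(type, target));
    }

    // Returns the synsets reachable over one link of the given type.
    List<Synset> related(String type) {
        List<Synset> out = new ArrayList<>();
        for (Relation r : relations) {
            if (r.type.equals(type)) out.add(r.target);
        }
        return out;
    }
}

final class Relation {
    final String type;
    final Synset target;

    Relation(String type, Synset target) {
        this.type = type;
        this.target = target;
    }
}
```

Because every synset can link to many others, and those link back in turn, the data is naturally a graph rather than a set of independent rows.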
HBase Conversion Steps
http://wordnet.cloudgraph.org

1) Model Creation: reverse engineer Wordnet DB
into UML®

2) Code Generation: provision persistence and
query-DSL Java code

3) HBase™ Table Mapping: map data graphs and
row keys to table(s)

4) Data Migration: MySql to HBase
5) Services / App Creation: build services,
web app
1.) Model Creation
Reverse engineer Wordnet DB into PlasmaSDO™ UML® Model

• Capture entities, properties, data types,
associations, enumerations and comments as UML
• Why UML? A popular, standards-based format;
editable and viewable with standard tools;
supports enterprise governance processes
• How? Maven build with plasma-maven-plugin
RDB tool (goal:RDB, action:reverse, dialect:mysql)
• Download working example at
https://github.com/cloudgraph/wordnet
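The reverse-engineering step reads the relational schema and turns tables, columns and foreign keys into model entities, properties and associations. The sketch below shows that idea with standard JDBC `DatabaseMetaData` calls; it is not how plasma-maven-plugin is implemented (the plugin emits a UML model), and the `Entity` type and the snake_case-to-class-name rule are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model element; the real tool writes UML rather than Java objects.
final class Entity {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>(); // column -> SQL type
    final List<String> associations = new ArrayList<>();        // "fkColumn -> TargetEntity"
    Entity(String name) { this.name = name; }
}

final class SchemaReader {
    // Reads tables, columns and foreign keys through standard JDBC metadata.
    static Map<String, Entity> read(Connection conn, String catalog) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        Map<String, Entity> entities = new LinkedHashMap<>();
        try (ResultSet t = md.getTables(catalog, null, "%", new String[] {"TABLE"})) {
            while (t.next()) {
                String table = t.getString("TABLE_NAME");
                entities.put(table, new Entity(toClassName(table)));
            }
        }
        for (Map.Entry<String, Entity> e : entities.entrySet()) {
            // Note: getColumns takes a LIKE pattern, so '_' in a table name acts as a
            // wildcard; production code would escape it via md.getSearchStringEscape().
            try (ResultSet c = md.getColumns(catalog, null, e.getKey(), "%")) {
                while (c.next()) {
                    e.getValue().properties.put(c.getString("COLUMN_NAME"), c.getString("TYPE_NAME"));
                }
            }
            // Each imported (foreign) key becomes an association to the referenced entity.
            try (ResultSet fk = md.getImportedKeys(catalog, null, e.getKey())) {
                while (fk.next()) {
                    e.getValue().associations.add(
                        fk.getString("FKCOLUMN_NAME") + " -> " + toClassName(fk.getString("PKTABLE_NAME")));
                }
            }
        }
        return entities;
    }

    // Maps snake_case table names to UpperCamelCase entity names, e.g. "word_sense" -> "WordSense".
    static String toClassName(String table) {
        StringBuilder sb = new StringBuilder();
        for (String part : table.split("_")) {
            if (part.isEmpty()) continue;
            sb.append(Character.toUpperCase(part.charAt(0)))
              .append(part.substring(1).toLowerCase());
        }
        return sb.toString();
    }
}
```

Table names such as `word_sense` are hypothetical; run against a real MySQL connection, `read` would return one `Entity` per table in the given catalog.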
Generated Wordnet Model
(diagram: a core subset of the model's 30 entities and enumerations)
2.) Code Generation
Provision SDO persistence and query-DSL Java code

• Generate Java API based on Wordnet UML
Model
• Why? One API usable across RDB, HBase and other
CloudGraph services; compile-time checking for
queries and all persistence logic
• How? Maven build with plasma-maven-plugin
SDO and DSL tools
• See generated API Javadocs on-line at
http://wordnet.cloudgraph.org
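The compile-time checking mentioned above comes from generating one typed class per model entity, so that query properties are fields rather than strings. A minimal sketch of that pattern follows; `QWord`, `StringProperty` and `Condition` are hypothetical names and this is not the PlasmaSDO query DSL. Rendering conditions as SQL-like text is only for display; a real DSL would bind parameters.

```java
// Hypothetical generated-style query DSL; NOT the actual PlasmaSDO API.
final class StringProperty {
    private final String path;
    StringProperty(String path) { this.path = path; }

    // Quotes are doubled only so the rendered text is readable and unambiguous.
    Condition eq(String value) { return new Condition(path + " = '" + value.replace("'", "''") + "'"); }
    Condition like(String pattern) { return new Condition(path + " LIKE '" + pattern.replace("'", "''") + "'"); }
}

final class Condition {
    final String expr;
    Condition(String expr) { this.expr = expr; }

    Condition and(Condition other) { return new Condition("(" + expr + " AND " + other.expr + ")"); }

    @Override public String toString() { return expr; }
}

// A generator would emit one class like this per entity in the UML model.
// A misspelled property (e.g. QWord.word.lema) fails at compile time, not at query time.
final class QWord {
    static final QWord word = new QWord();
    final StringProperty lemma = new StringProperty("lemma");
    private QWord() {}
}
```

For example, `QWord.word.lemma.like("do%").and(QWord.word.lemma.eq("dog"))` builds a typed condition that the compiler checks against the model.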
3.) HBase™ Table Mapping
Map data graphs and row keys to HBase™ table(s)

• Configure delimited, hashed, salted, formatted
and composite row keys, using XPath paths into
the target data graphs
• Map data graph roots to HBase tables
• Why? Automates row-key creation by extracting
data from anywhere in your data graphs
• How? CloudGraph Configuration XML. See
https://github.com/cloudgraph/wordnet
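To make the row-key options concrete, the sketch below builds a salted, delimited composite key from values extracted out of a nested data graph, and shows a hashed field. It is an assumption-laden illustration: CloudGraph configures this declaratively in XML, whereas here nested `Map`s stand in for the data graph, a slash-separated path stands in for XPath, and the two-digit salt and `|` delimiter are hypothetical choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;

// Hypothetical row-key builder; CloudGraph itself drives this from configuration XML.
final class RowKeys {
    static final char DELIM = '|';

    // Resolves a slash-separated path such as "synset/pos" against nested maps
    // (a simplified stand-in for XPath evaluation over a data graph).
    static String extract(Map<String, Object> graph, String path) {
        Object node = graph;
        for (String step : path.split("/")) {
            if (!(node instanceof Map)) throw new IllegalArgumentException("no such path: " + path);
            node = ((Map<?, ?>) node).get(step);
        }
        if (node == null) throw new IllegalArgumentException("no such path: " + path);
        return node.toString();
    }

    // Salt prefix: spreads otherwise sequential keys over `buckets` key ranges
    // so writes do not all land on one region.
    static String salt(String naturalKey, int buckets) {
        int bucket = Math.floorMod(naturalKey.hashCode(), buckets);
        return String.format("%02d", bucket);
    }

    // Hashed field: fixed width and evenly distributed, but no longer prefix-scannable.
    static String md5Hex(String value) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on standard JDKs
        }
    }

    // Composite key: salt | field1 | field2 ..., each field extracted by path.
    static String build(Map<String, Object> graph, List<String> paths, int buckets) {
        StringBuilder natural = new StringBuilder();
        for (String p : paths) {
            if (natural.length() > 0) natural.append(DELIM);
            natural.append(extract(graph, p));
        }
        String key = natural.toString();
        return salt(key, buckets) + DELIM + key;
    }
}
```

With a graph whose root has `lemma = "dog"` and a nested `synset` with `pos = "n"`, `build(graph, List.of("lemma", "synset/pos"), 16)` yields a key of the form `NN|dog|n`, where `NN` is the salt bucket.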
4.) Data Migration
MySql to HBase

• Create a standalone RDB-to-HBase
migration app that uses the generated
persistence and DSL query API to
incrementally call CloudGraph HBase and
RDB services
• Why? Wordnet data is large and highly
connected, so it must be incrementally
extracted, inserted and linked
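One common way to migrate highly connected data incrementally is two passes: first copy nodes page by page while recording old-id to new-key mappings, then replay the relations, which may point at nodes from any earlier page. The sketch below assumes that approach; `Source` and `Target` are hypothetical in-memory stand-ins for the RDB and HBase services, not the generated API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase, batched migration sketch; the real app would call
// CloudGraph RDB and HBase services through the generated persistence/DSL API.
final class Migrator {
    record Row(long id, String lemma) {}
    record Link(long fromId, long toId, String type) {}

    interface Source {
        List<Row> rows(int offset, int limit);    // one page of nodes, ordered by id
        List<Link> links(int offset, int limit);  // one page of relations
    }

    interface Target {
        String insert(Row row);                   // stores a node, returns its new row key
        void link(String fromKey, String toKey, String type);
    }

    // Phase 1 inserts nodes page by page and remembers old id -> new key.
    // Phase 2 replays relations once every endpoint is guaranteed to exist.
    static Map<Long, String> migrate(Source src, Target dst, int pageSize) {
        Map<Long, String> keys = new HashMap<>();
        for (int off = 0; ; off += pageSize) {
            List<Row> page = src.rows(off, pageSize);
            if (page.isEmpty()) break;
            for (Row r : page) keys.put(r.id(), dst.insert(r));
        }
        for (int off = 0; ; off += pageSize) {
            List<Link> page = src.links(off, pageSize);
            if (page.isEmpty()) break;
            for (Link l : page) {
                String from = keys.get(l.fromId());
                String to = keys.get(l.toId());
                if (from == null || to == null) throw new IllegalStateException("dangling link " + l);
                dst.link(from, to, l.type());
            }
        }
        return keys;
    }
}
```

This sketch keeps the id map in memory and pages by offset; for larger tables, keyset paging (`WHERE id > ?`) and a persistent id map would be safer choices.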
5.) Services / App Creation
Build services, web app

• Build simple POJO services using the
persistence and DSL query API
• Encapsulate Wordnet business logic
• Add adapter/wrapper structures
• Call the services from the web app
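A plain-Java service keeps Wordnet logic out of the web tier and hands the UI a small adapter object instead of raw graph data. This is a minimal sketch under assumptions: `WordRepository` is a hypothetical stand-in for queries made through the generated API, and `SenseView` is a hypothetical wrapper. The part-of-speech codes `n`, `v`, `a`, `s` and `r` are the standard WordNet codes.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical data-access abstraction standing in for the generated persistence/DSL API.
interface WordRepository {
    List<Map<String, String>> findSenses(String lemma); // e.g. {"pos": "n", "gloss": "..."}
}

// Adapter/wrapper: a small, web-friendly view of one word sense.
record SenseView(String partOfSpeech, String definition) {}

final class WordService {
    private final WordRepository repo;

    WordService(WordRepository repo) { this.repo = repo; }

    // Business logic lives here rather than in the web app:
    // normalize the input and translate codes into readable labels.
    List<SenseView> lookup(String input) {
        String lemma = input.trim().toLowerCase();
        return repo.findSenses(lemma).stream()
            .map(m -> new SenseView(label(m.get("pos")), m.get("gloss")))
            .collect(Collectors.toList());
    }

    // Maps WordNet part-of-speech codes to labels ("s" is a satellite adjective).
    static String label(String pos) {
        if (pos == null) return "unknown";
        switch (pos) {
            case "n": return "noun";
            case "v": return "verb";
            case "a": case "s": return "adjective";
            case "r": return "adverb";
            default: return "unknown";
        }
    }
}
```

Because `WordRepository` is an interface, the same service can be backed by the RDB or the HBase store, or by a stub in tests.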
Web App
http://wordnet.cloudgraph.org

• Auto-complete field triggers CloudGraph
HBase to use the HBase fuzzy row filter
API
• Find button returns all semantic and
lexical relations for the selected word,
including descriptions and example
sentences
• Resulting relation graphs typically contain
more than 100 nodes and return in less
than 200 milliseconds
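HBase's `FuzzyRowFilter` takes pairs of (fuzzy key, mask), where a mask byte of 0 marks a position that must match and 1 marks a "don't care" position. That is useful when a row key starts with bytes the caller cannot know, such as a salt. The standalone sketch below reproduces only that matching rule (treating the fuzzy key as a fixed-width prefix) so it can run without HBase; the `FuzzyMatch` class and the `NN|word` key layout are hypothetical.

```java
// Hypothetical, standalone illustration of fuzzy row-key matching; real code would
// hand the (pattern, mask) pair to HBase's FuzzyRowFilter instead.
final class FuzzyMatch {
    // mask[i] == 0: row[i] must equal pattern[i]; mask[i] == 1: any byte is accepted.
    // The pattern is treated as a fixed-width prefix of the row key.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) throw new IllegalArgumentException("length mismatch");
        if (row.length < pattern.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) return false;
        }
        return true;
    }
}
```

With keys laid out as a two-character salt, `|`, then the word, the pattern `??|dog` with mask `{1, 1, 0, 0, 0, 0}` accepts `07|dog|n` and also `07|dogwood`, which is the prefix behavior an auto-complete field needs, while rejecting `07|cat|n`.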
Conclusions
• Complex, highly recursive RDB models
can be easily converted and leveraged in
HBase and future CloudGraph services
• Large lexical data graphs can be returned
in a single query
• Data migration is difficult given the complex,
recursive model
Resources
• Download the complete CloudGraph Wordnet
example: https://github.com/cloudgraph/wordnet
• Run the example online:
http://wordnet.cloudgraph.org
• Project details, contact information:
http://cloudgraph.org
• Beta Source Repo:
https://github.com/terrameta/cloudgraph
• Production Source Repo (under construction):
https://github.com/cloudgraph
Status / Legal
• Project Status
– CloudGraph ® is currently under private beta testing
• Licensing
– CloudGraph ® 0.5.5 Community Edition (CE) is open source licensed
under version 2 of the GNU General Public License
• Trademarks
– WordNet ® is a registered trademark of Princeton University
– Apache HBase™ is a trademark of the Apache Software Foundation
– CloudGraph ® is a trademark of TerraMeta Software LLC, TerraMeta
Software Inc.
