SlideShare uma empresa Scribd logo
1 de 11
Apache Gora
●

What is it ?

●

Gora – Nutch

●

Supports

●

Data Access

●

API's

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – What is it ?
●

Provides for Big Data
–
–

Persistence

–
●

In memory data model
Data store abstraction

Supports persisting to
–
–

Key/value stores

–

Document stores

–
●

Column stores

RDBMS's

Supports use of Hadoop

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – What is it ?
●

Released via Apache 2 license

●

Written in Java

●

Offers a persistence framework

●

Designed for big data applications

●

Used by Nutch 2.x for web crawl data storage

●

Used for
–

Persistence

–

Indexing

–

Analytics

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Nutch
●

Nutch 2.x now uses Gora
–

Abstracted storage

–

Data store independence

–

Handles object to persistent mappings

–

Use various NoSql solutions

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Supports
●

Gora supports the following
–

Apache Accumulo

–

Apache Cassandra

–

Apache Hbase

–

Amazon DynamoDB

–

Pig

–

Hive

–

Cascading

–

MapReduce

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Data Access
●

Java API for data access
–

●

Independent of location

Core Gora API's
–

Store

–

Persistency

–

Query

–

MapReduce

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Store API
●

Java API – org.apache.gora.store.*
–

DataStore handles object persistence

–

DataStore methods process objects
●
●
●
●

Persist
Fetch
Query
Delete

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Persistency API
●

Java API – org.apache.gora.persistency.*
–

Core classes
●

●

●

BeanFactory
– Construct keys
Persistent
– Persist objects
State
– State managed through StateManager
–
–

NEW, CLEAN (UNMODIFIED)
DIRTY (MODIFIED), DELETED

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – Query API
●

Java API – org.apache.gora.query.*
–

Core classes
●

●

●

Query
– Constructed via DataStore
PartitionQuery
– Divide results of Query into partitions.
– Run queries on data nodes.
– Generate Hadoop InputSplits
Result

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Apache Gora – MapReduce API
●

Java API – org.apache.gora.mapreduce.*
–

GoraMapper

–

GoraReducer

–

ALL Record Counter

–

Reader

–

Writer

–

Hadoop / Avro
●
●
●

Serialise
De-serialise
Persistent

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz
Contact Us
●

Feel free to contact us at
–

www.semtech-solutions.co.nz

–

info@semtech-solutions.co.nz

●

We offer IT project consultancy

●

We are happy to hear about your problems

●

You can just pay for those hours that you need

●

To solve your problems

Mais conteúdo relacionado

Mais de Mike Frampton

Mais de Mike Frampton (20)

Apache Singa AI
Apache Singa AIApache Singa AI
Apache Singa AI
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
OrientDB
OrientDBOrientDB
OrientDB
 
Prometheus
PrometheusPrometheus
Prometheus
 
Apache Tephra
Apache TephraApache Tephra
Apache Tephra
 
Apache Kudu
Apache KuduApache Kudu
Apache Kudu
 
Apache Bahir
Apache BahirApache Bahir
Apache Bahir
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
JanusGraph DB
JanusGraph DBJanusGraph DB
JanusGraph DB
 
Apache Ignite
Apache IgniteApache Ignite
Apache Ignite
 
Apache Samza
Apache SamzaApache Samza
Apache Samza
 
Apache Flink
Apache FlinkApache Flink
Apache Flink
 
Apache Edgent
Apache EdgentApache Edgent
Apache Edgent
 
Apache CouchDB
Apache CouchDBApache CouchDB
Apache CouchDB
 
An introduction to Apache Mesos
An introduction to Apache MesosAn introduction to Apache Mesos
An introduction to Apache Mesos
 
An introduction to Pentaho
An introduction to PentahoAn introduction to Pentaho
An introduction to Pentaho
 
An introduction to Apache Thrift
An introduction to Apache ThriftAn introduction to Apache Thrift
An introduction to Apache Thrift
 
An introduction to Apache Cassandra
An introduction to Apache CassandraAn introduction to Apache Cassandra
An introduction to Apache Cassandra
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
An Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnAn Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop Yarn
 

Último

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 

Último (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

An introduction to Apache Gora

  • 1. Apache Gora ● What is it ? ● Gora – Nutch ● Supports ● Data Access ● API's www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Apache Gora – What is it ? ● Provides for Big Data – – Persistence – ● In memory data model Data store abstraction Supports persisting to – – Key/value stores – Document stores – ● Column stores RDBMS's Supports use of Hadoop www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 3. Apache Gora – What is it ? ● Released via Apache 2 license ● Written in Java ● Offers a persistence framework ● Designed for big data applications ● Used by Nutch 2.x for web crawl data storage ● Used for – Persistence – Indexing – Analytics www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 4. Apache Gora – Nutch ● Nutch 2.x now uses Gora – Abstracted storage – Data store independence – Handles object to persistent mappings – Use various NoSql solutions www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 5. Apache Gora – Supports ● Gora supports the following – Apache Accumulo – Apache Cassandra – Apache Hbase – Amazon DynamoDB – Pig – Hive – Cascading – MapReduce www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 6. Apache Gora – Data Access ● Java API for data access – ● Independent of location Core Gora API's – Store – Persistency – Query – MapReduce www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 7. Apache Gora – Store API ● Java API – org.apache.gora.store.* – DataStore handles object persistence – DataStore methods process objects ● ● ● ● Persist Fetch Query Delete www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 8. Apache Gora – Persistency API ● Java API – org.apache.gora.persistency.* – Core classes ● ● ● BeanFactory – Construct keys Persistent – Persist objects State – State managed through StateManager – – NEW, CLEAN (UNMODIFIED) DIRTY (MODIFIED), DELETED www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 9. Apache Gora – Query API ● Java API – org.apache.gora.query.* – Core classes ● ● ● Query – Constructed via DataStore PartitionQuery – Divide results of Query into partitions. – Run queries on data nodes. – Generate Hadoop InputSplits Result www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 10. Apache Gora – MapReduce API ● Java API – org.apache.gora.mapreduce.* – GoraMapper – GoraReducer – ALL Record Counter – Reader – Writer – Hadoop / Avro ● ● ● Serialise De-serialise Persistent www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 11. Contact Us ● Feel free to contact us at – www.semtech-solutions.co.nz – info@semtech-solutions.co.nz ● We offer IT project consultancy ● We are happy to hear about your problems ● You can just pay for those hours that you need ● To solve your problems