Accumulo: A Quick Introduction
James Salter
25 July 2013
About Me
• James Salter
• Former: PhD, University of Surrey
▫ Resource discovery in peer-to-peer networks
▫ Recommender systems
• Current: Applied Researcher
▫ Data mining algorithms, information fusion
▫ Hadoop
▫ Large graphs
▫ “other interesting things”
Outline
• What is Accumulo?
• Comparison with Relational Databases
• Architecture
• Potential Applications
Apache Hadoop
• Framework for distributed computing
• Clusters of commodity machines
• MapReduce
▫ Best-known sub-project
▫ Batch processing of bulk data
▫ (Potentially) large files of output
What is Accumulo?
• A distributed key/value store
▫ Runs in parallel across a Hadoop cluster
• Very scalable
▫ Trillions of records, tens of petabytes of data
• Cell-level security
▫ Every data item has a security label
• Open-source implementation of Google’s BigTable design
▫ Original development by NSA
▫ Now a top-level Apache project
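To make the idea of a security label on every data item more concrete, here is a minimal Python sketch. It is a toy model and not the Accumulo API: the `Cell` class, the `visible_cells` function, the label tokens (`pii`, `contact`), and the rule that a reader must hold every token in a label are all assumptions made for illustration. Real Accumulo visibility labels are Boolean expressions and are richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    # Hypothetical label: every token listed here must be held by the reader.
    required: frozenset = frozenset()


def visible_cells(cells, authorizations):
    """Return only the cells whose label is satisfied by the reader's tokens."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.required <= auths]


cells = [
    Cell("Alice", "Birthday", "12/03/45", frozenset({"pii"})),
    Cell("Alice", "Phone", "794838", frozenset({"pii", "contact"})),
    Cell("Alice", "Magazine", "1"),  # empty label: visible to every reader
]

# A reader holding only the "pii" token sees Birthday and Magazine, not Phone.
for cell in visible_cells(cells, {"pii"}):
    print(cell.row, cell.column, cell.value)
```

The point of the sketch is that filtering happens per cell rather than per table or per row, so two readers scanning the same row can see different columns.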
Relational schema to Accumulo

CustName  Birthday  Phone
Alice     12/03/45  794838
Bob       09/09/67
Mary      23/04/83  975838

CustName  ItemID  Quantity
Alice     17      1
Alice     89      5
Bob       92      1
Mary      12      1

ItemID  ItemName
12      DVD
17      Magazine
89      Ticket
92      Shirt

CustName  Birthday  Phone   DVD  Magazine  Ticket  Shirt
Alice     12/03/45  794838       1         5
Bob       09/09/67                                 1
Mary      23/04/83  975838  1
Relational schema to Accumulo

Row,Column        Value
{Alice,Birthday}  12/03/45
{Alice,Phone}     794838
{Alice,Magazine}  1
{Alice,Ticket}    5
{Bob,Birthday}    09/09/67
{Bob,Shirt}       1
...               ...

• Nulls are not stored
• Easy to add new columns, e.g. {Bob,Book}

CustName  Birthday  Phone   DVD  Magazine  Ticket  Shirt
Alice     12/03/45  794838       1         5
Bob       09/09/67                                 1
Mary      23/04/83  975838  1
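The mapping from a wide relational row to {Row,Column} pairs can be sketched in a few lines of Python. This is an illustrative sketch only: the `to_key_values` helper and the dictionary layout are assumptions, and the quantity used for {Bob,Book} is a made-up value, since the slide names the column but gives no value for it.

```python
def to_key_values(row_id, columns):
    """Flatten one wide row into ((row, column), value) pairs, skipping nulls."""
    return [((row_id, name), value)
            for name, value in columns.items()
            if value is not None]


# The wide customer table, with None standing in for the empty cells.
customers = {
    "Alice": {"Birthday": "12/03/45", "Phone": "794838", "DVD": None,
              "Magazine": 1, "Ticket": 5, "Shirt": None},
    "Bob": {"Birthday": "09/09/67", "Phone": None, "DVD": None,
            "Magazine": None, "Ticket": None, "Shirt": 1},
}

pairs = []
for name, columns in customers.items():
    pairs.extend(to_key_values(name, columns))

# Adding a new column is one more pair; no schema change is involved.
# The quantity 2 is a hypothetical value chosen for the example.
pairs.append((("Bob", "Book"), 2))

for key, value in pairs:
    print(key, value)
```

Only cells that hold a value produce a pair, which is why sparse rows such as Bob's cost very little to store.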
Table Structure
• Tables contain key/value pairs sorted by key
• Split into tablets, distributed across a cluster
▫ Tablets reflect a portion of the table’s keyspace
Key               Value
{Alice,Birthday}  12/03/45
{Alice,Magazine}  1
{Alice,Phone}     794838
{Alice,Ticket}    5

Key               Value
{Bob,Birthday}    09/09/67
{Bob,Shirt}       1
...               ...
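Sorting by key and splitting the keyspace into tablets can be sketched as follows. This is a simplified model, not Accumulo code: the `tablet_for` function, the single split point, and the convention that a split point is the inclusive upper bound of its tablet are assumptions made for the example.

```python
from bisect import bisect_left


def tablet_for(row_id, split_points):
    """Return the index of the tablet whose row range contains row_id.

    split_points is a sorted list of row IDs. In this sketch, tablet i holds
    the rows greater than split_points[i - 1] and less than or equal to
    split_points[i]; the last tablet holds everything after the final split.
    """
    return bisect_left(split_points, row_id)


# Unsorted input; the table keeps pairs sorted by (row, column).
pairs = {
    ("Bob", "Shirt"): "1",
    ("Alice", "Ticket"): "5",
    ("Alice", "Birthday"): "12/03/45",
    ("Bob", "Birthday"): "09/09/67",
    ("Alice", "Phone"): "794838",
    ("Alice", "Magazine"): "1",
}
split_points = ["Alice"]  # hypothetical split chosen to mirror the slide

tablets = [[] for _ in range(len(split_points) + 1)]
for key in sorted(pairs):
    tablets[tablet_for(key[0], split_points)].append((key, pairs[key]))

for index, tablet in enumerate(tablets):
    print("tablet", index, [key for key, _ in tablet])
```

Because keys are sorted and each tablet covers a contiguous range, locating the tablet for a row is a binary search over the split points rather than a scan of the table.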
Tablet Server
• Hosts one or more tablets
▫ Not necessarily for the same table
• Tablets store references to ISAM (Indexed Sequential Access Method) files in HDFS
▫ Key/values stored in ISAM files
[Diagram: a Tablet Server hosting three tablets (Table A, RowIDs g-n; Table F, RowIDs a-c; Table J, RowIDs x-zz), with three ISAM files stored in HDFS]
Master
• Detects Tablet Server failures
▫ Migrates tablets to other Tablet Servers
• Responsible for load balancing
▫ Assigns tablets to Tablet Servers
▫ Instructs Tablet Servers to migrate tablets
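The Master's reaction to a Tablet Server failure can be illustrated with a small Python sketch. It is a toy model under stated assumptions and does not describe how the real Master is implemented: the `reassign_after_failure` function, the server names, the tablet labels, and the "give each tablet to the least-loaded survivor" policy are all hypothetical.

```python
def reassign_after_failure(assignments, failed_server):
    """Move the failed server's tablets to the least-loaded surviving servers.

    assignments maps a server name to the list of tablets it hosts.
    A new mapping is returned; the input is left unchanged.
    """
    survivors = {server: list(tablets)
                 for server, tablets in assignments.items()
                 if server != failed_server}
    if not survivors:
        raise RuntimeError("no tablet servers left to host tablets")
    for tablet in assignments.get(failed_server, []):
        # Pick the survivor with the fewest tablets; break ties by name.
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(tablet)
    return survivors


# Hypothetical cluster state; tablet labels are "table:row range".
assignments = {
    "tserver1": ["A:g-n", "F:a-c"],
    "tserver2": ["J:x-zz"],
    "tserver3": [],
}

after = reassign_after_failure(assignments, "tserver1")
for server, tablets in sorted(after.items()):
    print(server, tablets)
```

Reassignment is cheap in this model because a tablet only carries references to files in HDFS; the data itself does not need to be copied between servers.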
Potential Applications
• Massive datastore
▫ Interactive retrieval of MapReduce results
• Graph database/graph mining
▫ Data input to Google Pregel clones (e.g. Giraph)
• Machine learning/classification
▫ Good for storing sparse feature vectors
• Not good for applications involving JOIN
▫ Limited joins possible – Intersecting Iterator
▫ Combine with Hive, Impala, etc.
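The sparse feature vector use case follows from the fact that empty cells are never stored. A minimal Python sketch, assuming one row per vector and one column per feature; the `to_cells` and `dot` helpers, the document IDs, and the feature names are hypothetical and not part of Accumulo.

```python
def to_cells(vector_id, features):
    """Keep only the non-zero features as ((vector_id, feature), value) pairs."""
    return {(vector_id, name): value
            for name, value in features.items()
            if value != 0}


def dot(cells, id_a, id_b):
    """Dot product of two stored vectors, touching only the stored cells."""
    a = {col: v for (row, col), v in cells.items() if row == id_a}
    b = {col: v for (row, col), v in cells.items() if row == id_b}
    return sum(v * b[col] for col, v in a.items() if col in b)


cells = {}
cells.update(to_cells("doc1", {"hadoop": 3, "graph": 0, "tablet": 1}))
cells.update(to_cells("doc2", {"hadoop": 2, "graph": 4, "tablet": 0}))

print(len(cells))                  # 4 cells stored out of 6 features
print(dot(cells, "doc1", "doc2"))  # 6, from the shared "hadoop" feature
```

With realistic feature spaces, where most entries are zero, storage grows with the number of non-zero features rather than with the size of the vocabulary.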
Conclusion
• Accumulo is a key-value datastore
• Data layout very different from Relational DBs
• Distributed architecture on top of Hadoop
• Many uses aside from “just” a simple store