SlideShare uma empresa Scribd logo
1 de 15
© Hortonworks Inc. 2014
Designing and Testing
Accumulo Iterators
Josh Elser
Member of Technical Staff
PMC, Apache Accumulo
10, November 2015
Page 1
Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
© Hortonworks Inc. 2014
Design
Page 2
How do I know if my Iterator works?
What can I do in an Iterator?
How are these methods even called?!
© Hortonworks Inc. 2014
Common Patterns
Only a certain subset of algorithms fit into Accumulo
Iterators well.
(Avoid shoving a square peg into a round hole.)
• Filtering
• Reduction
• Bounded aggregations
–Keep an upper bound on the number of elements being
aggregated to avoid memory issues
• Transformations
–Key sort-order must be retained
–Best limited to the Value only
Page 3
© Hortonworks Inc. 2014
Design
Josh’s Iterator Design Principles:
•Always make forward-progress
•Think functional – Avoid unnecessary state
•Operate only on the data you have
•Do one thing and do it efficiently
Page 4
© Hortonworks Inc. 2014
Design
Page 5
Make
Forward
Progress
Start
End
© Hortonworks Inc. 2014
Think about your Iterator like a function
Unnecessary State
Page 6
def sum(list):
sum = 0
for entry in list:
sum += entry
return sum
• Avoid holding onto state when
at all possible.
• Think in terms of a stream
rather than chunks of data.
• Beware of memory implications
when performing aggregations.
© Hortonworks Inc. 2014
Operate locally
Daily Reminder: Iterators have no calls
for implementing a safe cleanup.
• Iterators cannot properly handle I/O-related issues to
external systems.
• Slow-external calls result in slow Accumulo.
• Some problems are more-safely implemented outside of
an Accumulo Iterators. Not a Coprocessor/Container.
Page 7
© Hortonworks Inc. 2014
Simplicity
Avoid doing multiple things in a
single Iterator.
•Object Oriented Design 101
•Iterators can be tricky to debug on their own
•Configuring multiple iterators are a feature
Page 8
© Hortonworks Inc. 2014
Testing
You should always test your code
before running it in any environment
to ensure that it functions as intended.
Page 9
© Hortonworks Inc. 2014
Testing
HOW?
Page 10
© Hortonworks Inc. 2014
Testing
A framework designed for testing Iterators given input,
a Range, options, and expected output.
Page 11
© Hortonworks Inc. 2014
Testing
Page 12
Test
Test
Test
Test
Test
Iterator Class
Range
Iterator Options
Sorted Input Data
Verification of
output records
OR
True/False check
User-Provided
Framework
© Hortonworks Inc. 2014
Features
•Auto-Discovery of test cases
•JUnit Parameterized test integration
•Provided Generic Tests
–Default Constructor
–Re-Seek (teardown)
–Deep Copy Verification
Page 13
© Hortonworks Inc. 2014
Future Work
•More Iterator tests!
•A final resting place for the code
•Documentation
•Usability testing
Page 14
https://issues.apache.org/jira/browse/ACCUMULO-626
© Hortonworks Inc. 2014
Thanks!
jelser@hortonworks.com
Page 15

Mais conteúdo relacionado

Mais procurados

Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BIDataWorks Summit
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impalaNAVER D2
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Scott Leberknight
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep divehuguk
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014alanfgates
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impalamarkgrover
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera, Inc.
 
All of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL DeveloperAll of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL DeveloperJeff Smith
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentationhadooparchbook
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Cloudera, Inc.
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasyDataWorks Summit
 

Mais procurados (19)

Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013Presentations from the Cloudera Impala meetup on Aug 20 2013
Presentations from the Cloudera Impala meetup on Aug 20 2013
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 
Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0Cloudera Impala, updated for v1.0
Cloudera Impala, updated for v1.0
 
Cloudera Impala technical deep dive
Cloudera Impala technical deep diveCloudera Impala technical deep dive
Cloudera Impala technical deep dive
 
Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014Hive analytic workloads hadoop summit san jose 2014
Hive analytic workloads hadoop summit san jose 2014
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache HadoopCloudera Impala: A Modern SQL Engine for Apache Hadoop
Cloudera Impala: A Modern SQL Engine for Apache Hadoop
 
All of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL DeveloperAll of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL Developer
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Cloudera impala
Cloudera impalaCloudera impala
Cloudera impala
 
Impala Architecture presentation
Impala Architecture presentationImpala Architecture presentation
Impala Architecture presentation
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
 
SAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made EasySAM - Streaming Analytics Made Easy
SAM - Streaming Analytics Made Easy
 

Semelhante a Designing and Testing Accumulo Iterators

The View - Lotusscript coding best practices
The View - Lotusscript coding best practicesThe View - Lotusscript coding best practices
The View - Lotusscript coding best practicesBill Buchan
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018Rohan Rasane
 
Oracle Database Lifecycle Management
Oracle Database Lifecycle ManagementOracle Database Lifecycle Management
Oracle Database Lifecycle ManagementHari Srinivasan
 
Coherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gemsCoherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gemsharvraja
 
Oracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12cOracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12cKellyn Pot'Vin-Gorman
 
Using MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance ImprovementUsing MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance ImprovementMark Matthews
 
Agile Software Testing the Agilogy Way
Agile Software Testing the Agilogy WayAgile Software Testing the Agilogy Way
Agile Software Testing the Agilogy WayJordi Pradel
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_finalRamya Sunil
 
Improving the Design of Existing Software
Improving the Design of Existing SoftwareImproving the Design of Existing Software
Improving the Design of Existing SoftwareSteven Smith
 
Practical Software Testing Tools
Practical Software Testing ToolsPractical Software Testing Tools
Practical Software Testing ToolsDr Ganesh Iyer
 
Load testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew SiemerLoad testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew SiemerAndrew Siemer
 
QA standup - workload analysis
QA standup  - workload analysisQA standup  - workload analysis
QA standup - workload analysisSerghei Radov
 
EARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements SyntaxEARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements SyntaxTechWell
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven developmentEinar Ingebrigtsen
 
So You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft WebinarSo You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft WebinarXBOSoft
 
Putting Compilers to Work
Putting Compilers to WorkPutting Compilers to Work
Putting Compilers to WorkSingleStore
 
Leandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & RightLeandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & RightNeotys_Partner
 
Improving The Quality of Existing Software
Improving The Quality of Existing SoftwareImproving The Quality of Existing Software
Improving The Quality of Existing SoftwareSteven Smith
 

Semelhante a Designing and Testing Accumulo Iterators (20)

The View - Lotusscript coding best practices
The View - Lotusscript coding best practicesThe View - Lotusscript coding best practices
The View - Lotusscript coding best practices
 
Integration strategies best practices- Mulesoft meetup April 2018
Integration strategies   best practices- Mulesoft meetup April 2018Integration strategies   best practices- Mulesoft meetup April 2018
Integration strategies best practices- Mulesoft meetup April 2018
 
Oracle Database Lifecycle Management
Oracle Database Lifecycle ManagementOracle Database Lifecycle Management
Oracle Database Lifecycle Management
 
Coherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gemsCoherence 12.1.3 hidden gems
Coherence 12.1.3 hidden gems
 
Oracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12cOracle Open World Exadata Monitoring and Management with EM12c
Oracle Open World Exadata Monitoring and Management with EM12c
 
Using MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance ImprovementUsing MySQL Enterprise Monitor for Continuous Performance Improvement
Using MySQL Enterprise Monitor for Continuous Performance Improvement
 
Developer day v2
Developer day v2Developer day v2
Developer day v2
 
Agile Software Testing the Agilogy Way
Agile Software Testing the Agilogy WayAgile Software Testing the Agilogy Way
Agile Software Testing the Agilogy Way
 
Hadoop engineering bo_f_final
Hadoop engineering bo_f_finalHadoop engineering bo_f_final
Hadoop engineering bo_f_final
 
WMS Overview
WMS OverviewWMS Overview
WMS Overview
 
Improving the Design of Existing Software
Improving the Design of Existing SoftwareImproving the Design of Existing Software
Improving the Design of Existing Software
 
Practical Software Testing Tools
Practical Software Testing ToolsPractical Software Testing Tools
Practical Software Testing Tools
 
Load testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew SiemerLoad testing with Visual Studio and Azure - Andrew Siemer
Load testing with Visual Studio and Azure - Andrew Siemer
 
QA standup - workload analysis
QA standup  - workload analysisQA standup  - workload analysis
QA standup - workload analysis
 
EARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements SyntaxEARS: The Easy Approach to Requirements Syntax
EARS: The Easy Approach to Requirements Syntax
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven development
 
So You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft WebinarSo You Think You Can Write a Test Case - XBOSoft Webinar
So You Think You Can Write a Test Case - XBOSoft Webinar
 
Putting Compilers to Work
Putting Compilers to WorkPutting Compilers to Work
Putting Compilers to Work
 
Leandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & RightLeandro Melendez - Switching Performance Left & Right
Leandro Melendez - Switching Performance Left & Right
 
Improving The Quality of Existing Software
Improving The Quality of Existing SoftwareImproving The Quality of Existing Software
Improving The Quality of Existing Software
 

Mais de Josh Elser

Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBaseJosh Elser
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query ServerJosh Elser
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Effective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsEffective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsJosh Elser
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Josh Elser
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
De-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerDe-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerJosh Elser
 
Data-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloData-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloJosh Elser
 
RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010Josh Elser
 

Mais de Josh Elser (10)

Practical Kerberos with Apache HBase
Practical Kerberos with Apache HBasePractical Kerberos with Apache HBase
Practical Kerberos with Apache HBase
 
Apache Phoenix Query Server
Apache Phoenix Query ServerApache Phoenix Query Server
Apache Phoenix Query Server
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Effective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo IteratorsEffective Testing of Apache Accumulo Iterators
Effective Testing of Apache Accumulo Iterators
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
De-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServerDe-Mystifying the Apache Phoenix QueryServer
De-Mystifying the Apache Phoenix QueryServer
 
Data-Center Replication with Apache Accumulo
Data-Center Replication with Apache AccumuloData-Center Replication with Apache Accumulo
Data-Center Replication with Apache Accumulo
 
RPInventory 2-25-2010
RPInventory 2-25-2010RPInventory 2-25-2010
RPInventory 2-25-2010
 

Último

Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 

Último (20)

Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 

Designing and Testing Accumulo Iterators

  • 1. © Hortonworks Inc. 2014 Designing and Testing Accumulo Iterators Josh Elser Member of Technical Staff PMC, Apache Accumulo 10, November 2015 Page 1 Apache, Accumulo, and Apache Accumulo are trademarks of the Apache Software Foundation.
  • 2. © Hortonworks Inc. 2014 Design Page 2 How do I know if my Iterator works? What can I do in an Iterator? How are these methods even called?!
  • 3. © Hortonworks Inc. 2014 Common Patterns Only a certain subset of algorithms fit into Accumulo Iterators well. (Avoid shoving a square peg into a round hole.) • Filtering • Reduction • Bounded aggregations –Keep an upper bound on the number of elements being aggregated to avoid memory issues • Transformations –Key sort-order must be retained –Best limited to the Value only Page 3
  • 4. © Hortonworks Inc. 2014 Design Josh’s Iterator Design Principles: •Always make forward-progress •Think functional – Avoid unnecessary state •Operate only on the data you have •Do one thing and do it efficiently Page 4
  • 5. © Hortonworks Inc. 2014 Design Page 5 Make Forward Progress Start End
  • 6. © Hortonworks Inc. 2014 Think about your Iterator like a function Unnecessary State Page 6 def sum(list): sum = 0 for entry in list: sum += entry return sum • Avoid holding onto state when at all possible. • Think in terms of a stream rather than chunks of data. • Beware of memory implications when performing aggregations.
  • 7. © Hortonworks Inc. 2014 Operate locally Daily Reminder: Iterators have no calls for implementing a safe cleanup. • Iterators cannot properly handle I/O-related issues to external systems. • Slow-external calls result in slow Accumulo. • Some problems are more-safely implemented outside of an Accumulo Iterators. Not a Coprocessor/Container. Page 7
  • 8. © Hortonworks Inc. 2014 Simplicity Avoid doing multiple things in a single Iterator. •Object Oriented Design 101 •Iterators can be tricky to debug on their own •Configuring multiple iterators are a feature Page 8
  • 9. © Hortonworks Inc. 2014 Testing You should always test your code before running it in any environment to ensure that it functions as intended. Page 9
  • 10. © Hortonworks Inc. 2014 Testing HOW? Page 10
  • 11. © Hortonworks Inc. 2014 Testing A framework designed for testing Iterators given input, a Range, options, and expected output. Page 11
  • 12. © Hortonworks Inc. 2014 Testing Page 12 Test Test Test Test Test Iterator Class Range Iterator Options Sorted Input Data Verification of output records OR True/False check User-Provided Framework
  • 13. © Hortonworks Inc. 2014 Features •Auto-Discovery of test cases •JUnit Parameterized test integration •Provided Generic Tests –Default Constructor –Re-Seek (teardown) –Deep Copy Verification Page 13
  • 14. © Hortonworks Inc. 2014 Future Work •More Iterator tests! •A final resting place for the code •Documentation •Usability testing Page 14 https://issues.apache.org/jira/browse/ACCUMULO-626
  • 15. © Hortonworks Inc. 2014 Thanks! jelser@hortonworks.com Page 15