HBaseCon 2013: General Session

Welcome!
Michael Stack, Software Engineer, Cloudera & HBase PMC Chair
9:00-9:05am
Conference MC Michael Stack, Chair of the HBaseCon 2013 Program Committee, welcomes you to the conference and offers a preview of the day.

The Apache HBase Community: Best Ever and Getting Better
Amr Awadallah, CTO and Co-founder, Cloudera
9:05-9:15am
Amr comments on the explosion of interest in Apache HBase over the past few years, how that interest has influenced the Hadoop stack overall, and why Cloudera considers its involvement in the HBase community to be so important.

State of the Apache HBase Union
Michael Stack & Lars Hofhansl, Architect, Salesforce.com
9:15-9:40am
Release-managers-in-crime Michael and Lars offer a look back, and a look forward, at HBase releases and what they have brought us (and will bring us in the future).

The Apache HBase Ecosystem
Aaron Kimball, Chief Architect, WibiData
9:40-10:05am
Today, HBase stands where Apache Hadoop did years ago: a project with a growing and vibrant community in its own right. In this talk, Aaron will give an overview of some of the projects built on top of HBase that you’ll get a chance to learn about during the day. Each of these projects grew out of a need to use HBase for an application that requires real-time atomic access to data. As an example, he’ll present the motivations for Kiji and how it is helping organizations create amazing new applications using HBase and Hadoop.

Overview of Apache HBase at Facebook (Slides Not Available)
Liyin Tang, Software Engineer, Facebook & HBase PMC Member
10:05-10:30am
In this keynote, you’ll get an overview of how HBase is used at Facebook. Explore Facebook’s applications using HBase as an OLTP service, which require high reliability, efficiency, and scalability, and how HBase can tolerate small network glitches and rack failures. You’ll also learn the use cases for adopting HBase as a batch processing service and various optimizations to scale processing throughput. Finally, learn Facebook’s thoughts about the future of HBase.

Published in: Technology

HBaseCon 2013: General Session

  1. 1. Welcome • Michael Stack, Software Engineer, Cloudera & HBase PMC Chair
  2. 2. Goals of HBaseCon 2013 • 1. Bring the Apache HBase community together • 2. Encourage contributions to the HBase ecosystem • 3. Share challenges and solutions for HBase
  3. 3. HBaseCon 2013 Session Tracks (Tracks 1–4) • Operations • Internals • Ecosystem • Case Studies
  4. 4. HBaseCon 2013 Program Committee • Gary Helmling • Lars Hofhansl • Jonathan Hsieh • Doug Meil • Andrew Purtell • Enis Söztutar • Michael Stack (Chair) • Liyin Tang • Titles as listed on the slide: Architect; Engineer; Software Engineer; Chief Software Architect; Systems Architect; Member of Technical Staff; Software Engineer; Software Engineer
  5. 5. Thank You to Our Sponsors • Community Sponsor • Conference Sponsors • Media Sponsors
  6. 6. Visit Sponsors = Chance to Win
  7. 7. Conference Notes • Please fill out the overall conference survey • Reception is 5:40pm – 8:00pm in the Yerba Buena Foyer • Connecting to the internet • Wireless network = Marriott Conference • Passcode = db075b
  8. 8. The Apache HBase Community: Best Ever and Getting Better • Amr Awadallah, CTO and Co-founder, Cloudera • @awadallah
  9. 9. The Apache HBase Community Has Never Been Healthier • JIRA Activity • Commits Activity
  10. 10. The Market for HBase Skills is Bigger than Ever
  11. 11. The HBase Ecosystem is Rich and Expanding HBaseCon 2013 speakers from these companies this year (logos below the dotted line are net-new from 2012!)
  12. 12. Top 5 Reasons Cloudera Loves HBase • 1. Its vibrant community is a benchmark for the entire Apache Hadoop ecosystem. • 2. It’s a first-class citizen inside the Hadoop stack. • 3. It allows us to offer support services for which a lot of customers will pay good money. • 4. It draws top-drawer engineer talent to Cloudera. • 5. It gives us an excuse to host this tremendous conference and throw a big party for the community!
  13. 13. Thank You for Attending HBaseCon and Thank You for Contributing!
  14. 14. State of the Apache HBase Union • Michael Stack, Software Engineer, Cloudera & HBase PMC Chair, and Lars Hofhansl, Architect, Salesforce.com
  15. 15. We are your Release Managers! • Mr. (0.94.x) Lars Hofhansl • Michael Stack (0.95.x/0.96.x)
  16. 16. Introducing...
  17. 17. Your PMC...
  18. 18. Your Committers...
  19. 19. MVP …and the award goes to…
  20. 20. Diverse Team* *http://hbase.apache.org/team-list.html
  21. 21. Deploys • Multitenant multifarious feature store • a.k.a dumping ground • Stumbleupon, Y!, Salesforce • Reconciliation store • eBay • Timeseries • OpenTSDB, Salesforce, FB ODS • Lots-o-entities store • Flurry, Kiji, Genome • Lots-o-entities BLOBs, FB Messages
  22. 22. OLTP & OLAP
  23. 23. Dev Rate
  24. 24. # of Commits • Total Files: 2021 • Total Lines of Code: 832122 • Total Commits: 6615 • Authors: 39
  25. 25. JIRA: 2008-2013
  26. 26. JIRA: Adoption
  27. 27. JIRA: Opened vs Closed
  28. 28. New Committers (by First Commit) vs. Active Committers (One Commit/Month)
  29. 29. Commits/Month over Time (0.94/trunk)
  30. 30. • 419 jiras in 0.94.0 • 660 jiras in 0.94.1 – 0.94.8
  31. 31. • Frequent, small releases • Train model • 4–6 week cycles
  32. 32. • Wire compatible between releases • Upgrade possible to any point release • Test stability
  33. 33. • Focus on: • Performance (FB, Salesforce) • Stability
  34. 34. http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
  35. 35. >1000 So far...
  36. 36. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
  37. 37. http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
  38. 38. • Hadoop HDFS Fixes • Faster Recovery • Detection • Replay • Assign
  39. 39. • System tables • Filesystem • Up in zookeeper • Over the wire
  40. 40. RPC • Implements protobuf service • Specification! • Data on the side • Encoding • Compression PB DATA
  41. 41. Snapshots • By table • Snapshot, clone, restore, export • Inexpensive • Just metadata • Good for... • Backups • Replication • Offline processing
  42. 42. Compactions • Pluggable • Tiered • Striped • Trigger
  43. 43. Tests • Cluster test module • Standalone or cluster • Sizeable • x data • x runtime • “Borrows” test types from all over • Netflix “ChaosMonkey” • Apache Accumulo linked-list dataloss checker
  44. 44. Miscellaneous • Smarter load balancer • Revamped • Metrics • UI • Etc. • Hadoop 1.x and 2.x
  45. 45. HBase Ecosystem
  46. 46. Chasm
  47. 47. kiji.org • Entity-centric, simple model • Types, complex, compound types • Each cell is schema versioned • Works across MR & REST, etc. • Production users • Open-source
  48. 48. SQL: Phoenix • “A SQL skin over HBase” • Coprocessors, custom filters, JDBC driver • https://github.com/forcedotcom/phoenix
  49. 49. Next • 1.0?
  50. 50. Next • Related: QoS • Latency resilience/“Latency tolerance”* • Bring home the outliers * http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf
  51. 51. Next Faster scans (OLAP)
  52. 52. Next • More databasey!!! • Statistics • 2ndary indexing • Take 3 • Types • Serialization • Keep sort order
  53. 53. larsh@apache.org stack@apache.org
  54. 54. The Apache HBase Ecosystem • Aaron Kimball, Chief Architect, WibiData
  55. 55. About me
  56. 56. In the beginning… there was a search engine… and when building indexes got too hard, along came an elephant to help push:
  57. 57. The ASF is an ecosystem
  58. 58. And so is Hadoop
  59. 59. HBase: The new ecosystem Phoenix
  60. 60. Now powering…
  61. 61. Big Data Applications • Customer Relations • Mobile Web Applications • Hadoop & HBase • Storage • Analytics • Serving • Real-time model scoring • Investigative analytics
  62. 62. Major application targets MapReduce, Cascading, Crunch…
  63. 63. Big Data Apps are hard to build • Serialization & versioning • Deployment • Communication between teams • Front end, back end, short request, batch, real time… Every Java developer should be able to build Big Data Apps – today it’s too hard.
  64. 64. Kiji • Kiji is designed to help you build real-time Big Data Applications on Apache HBase • 100% Apache 2 licensed
  65. 65. Kiji architecture
  66. 66. Leading design decisions • Store your data in HBase • Encode it using Avro • An entity-centric table design • Manage a data dictionary around tables • Distribute writes across the cluster
  67. 67. Key features • Work with big data in rich types with schema evolution • Guides users to successful application design • Scala-based modeling language • Integration with front-end systems • Deployment of real-time model scoring
  68. 68. Kiji • Go to kiji.org and download the BentoBox – Zero-config Hadoop + HBase + Kiji instance – “Batteries included” • 15-minute quickstart guide and a tutorial with full source code
  69. 69. Come attend! • Want a deep dive on Kiji? KijiCon is tomorrow! • A 1-day workshop of tutorials & hacking • Register @ kijicon.eventbrite.com
  70. 70. Conclusions • Each month shows new peak interest in HBase • The ecosystem is growing • Open source technologists are working hand in hand to make HBase more accessible • We’d love your help in the community!
  71. 71. aaron@wibidata.com Build big data applications.
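
Slide 52 lists types and serialization that "keep sort order" among the next steps. As a rough illustration of that idea only (this is not HBase's implementation, and the helper names `encode_i64` and `decode_i64` are made up for this sketch), a signed 64-bit integer can be encoded so that plain byte-wise comparison of the encoded keys, which is how HBase orders row keys, matches numeric order:

```python
import struct

# Assumption for this sketch: values are signed 64-bit integers.
_BIAS = 1 << 63  # shifts the signed 64-bit range onto the unsigned range


def encode_i64(n: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes whose
    lexicographic byte order matches numeric order."""
    if not -_BIAS <= n < _BIAS:
        raise ValueError("value does not fit in a signed 64-bit integer")
    return struct.pack(">Q", n + _BIAS)


def decode_i64(b: bytes) -> int:
    """Inverse of encode_i64."""
    (u,) = struct.unpack(">Q", b)
    return u - _BIAS


values = [42, -7, 0, -(1 << 63), (1 << 63) - 1, 1]
# Sorting by the encoded bytes gives the same order as sorting the numbers.
assert sorted(values, key=encode_i64) == sorted(values)
```

A plain two's-complement big-endian encoding would sort negative numbers after positive ones under byte-wise comparison. Adding the bias is equivalent to flipping the sign bit, which avoids that.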
