2. Crossing the Chasm (Geoffrey A. Moore)
- Apache Hadoop grew rapidly, charting new territory in features, abstractions, APIs, scale, fault tolerance, multi-tenancy, operations, …
- A small number of early customers needed a new platform
- Providing Hadoop as a service made adoption easy
- Today: dramatic growth in adoption and customer base
  - Growth of the Hadoop stack and applications
  - New requirements and expectations: mission critical
(Chasm diagram: Early Adopters → Early Majority → Late Majority)
3. Crossing the Chasm: Overview
- How the chasm is being crossed:
  - Security
  - SLAs & predictability
  - Scalability
  - Availability & data integrity
  - Backward compatibility
  - Quality & testing
- Fundamental architectural improvements: Federation & MR.next
  - Adapt to changing, sometimes unforeseen, needs
  - Fuel innovation and rapid development
- The community effect
4. Security
- Early gains
  - No authorization or authentication requirements at first
  - Added permissions; client-side userid passed to the server (0.16)
    - Addresses accidental deletes by another user
  - Service authorization (0.18, 0.20)
- Issues: stronger authorization required
  - Shared clusters with multiple tenants
  - Critical data
  - New categories of users (financial); SOX compliance
- Our response
  - Authentication using Kerberos (0.20.203)
  - A 10 person-year effort by Yahoo!
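The Kerberos work above surfaces to operators as a small set of configuration switches. A minimal sketch of what enabling secure mode looks like in `core-site.xml` — the two property names are the standard Hadoop security settings, but the surrounding principal/keytab configuration that a real secure cluster needs is omitted here:

```xml
<!-- core-site.xml: switch from the default "simple" mode (trust the
     client-supplied userid) to Kerberos authentication, and turn on
     service-level authorization (ACLs controlling which users and
     groups may talk to which Hadoop services). -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>   <!-- default is "simple" -->
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>       <!-- policies live in hadoop-policy.xml -->
  </property>
</configuration>
```

The first property addresses the "passed client-side userid" weakness described on the slide; the second is the service authorization added in 0.18/0.20.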
5. SLAs and Predictability
- Issue: customers uncomfortable with shared clusters
  - Customers traditionally plan for peaks with dedicated hardware
  - Dedicated clusters had poor utilization
- Response: Capacity Scheduler (0.20)
  - Guaranteed capacities in a multi-tenant shared cluster, almost like dedicated hardware
  - Each organization is given queue(s) with a guaranteed capacity; the organization
    - controls who is allowed to submit jobs to its queues
    - sets the priorities of jobs within its queues
    - creates sub-queues (0.21) for finer-grained control within its capacity
  - Unused capacity is given to tasks in other queues
    - Better than a private cluster: access to unused capacity during a crunch
  - Resource limits for tasks deal with misbehaved apps
- Response: FairShare Scheduler (0.20)
  - Focus is fair share of resources, but it does have pools
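The guaranteed-capacity model above is expressed as per-queue percentages in the Capacity Scheduler's configuration. A sketch using the 0.20-era property names (the queue names "search" and "ads" are made up for illustration; queues themselves are declared separately via `mapred.queue.names`, and per-queue submit ACLs control who may submit):

```xml
<!-- capacity-scheduler.xml: two organizations share one cluster, each
     with a guaranteed slice. When one queue is idle, its unused slots
     can be borrowed by jobs in the other queue. -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.queue.search.capacity</name>
    <value>60</value>   <!-- percent of cluster guaranteed to "search" -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.ads.capacity</name>
    <value>40</value>   <!-- percent of cluster guaranteed to "ads" -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.search.supports-priority</name>
    <value>true</value> <!-- jobs inside the queue honor priorities -->
  </property>
</configuration>
```

This is why the slide calls it "almost like dedicated hardware": each tenant's percentage is a floor, not a ceiling.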
6. Scalability
- Early gains
  - Simple design allowed rapid improvements: single master, namespace in RAM, simpler locking
  - Cluster size improvements: 1K → 2K → 4K nodes
  - Vertical scaling: tuned GC + efficient memory usage
  - Archive file system reduces files and blocks (0.20)
- Current issues
  - Growth of files and storage is limited by the single NN (0.20); only an issue for very large clusters
  - JobTracker does not scale beyond 30K tasks; needs a redesign
- Our response
  - RW locks in the NN (0.22)
  - MR.next: complete rewrite of the MR servers (JT, TT) for 100K tasks (0.23)
  - Federation: horizontal scaling of the namespace to a billion files (0.23)
  - A NN that keeps only part of the namespace in memory: a trillion files (0.23.x)
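Why does keeping the namespace in RAM cap the file count? A back-of-envelope sketch, assuming the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); the file count and blocks-per-file ratio below are illustrative, not measurements:

```java
// Rough estimate of NameNode heap needed to hold a namespace in RAM.
// Each file plus its blocks is a handful of Java objects on the NN heap,
// so heap grows linearly with the number of files and blocks.
public class NameNodeHeapEstimate {
    static long estimateHeapBytes(long files, double blocksPerFile, long bytesPerObject) {
        long objects = (long) (files * (1 + blocksPerFile)); // file inodes + block objects
        return objects * bytesPerObject;
    }

    public static void main(String[] args) {
        // 100M files averaging 1.5 blocks each, ~150 bytes per object:
        long heap = estimateHeapBytes(100_000_000L, 1.5, 150);
        System.out.println(heap / (1L << 30) + " GiB"); // → "34 GiB"
    }
}
```

At a billion files the same arithmetic lands in the hundreds of gigabytes, which is why federation (many NNs, each holding a slice of the namespace) and a partial-namespace NN are the responses on this slide.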
7. HDFS Availability & Data Integrity: Early Gains
- Simple design, Java, storage fault tolerance
  - Java saved us from the pointer errors that lead to data corruption
  - Simplicity: a subset of POSIX; random writers not supported
  - Storage: rely on the OS's file system rather than raw disk
  - Storage fault tolerance: multiple replicas, active monitoring
- Single NameNode master
  - Persistent state: multiple copies + checkpoints
  - Restart on failure
- How well did it work?
  - Lost 650 blocks out of 329M on 10 clusters with 20K nodes in 2009
    - 82% from an abandoned-open-file append bug (fixed in 0.21)
    - 15% were files created with a single replica (data reliability not needed)
    - 3% due to roughly 7 bugs that were then fixed (0.21)
  - Over the last 18 months: 22 failures on 25 clusters
    - Only 8 would have benefited from HA failover!! (0.23 failures per cluster-year)
  - The NN is very robust and can take a lot of abuse
  - The NN is resilient against overload caused by misbehaving apps
8. HDFS Availability & Data Integrity: Response
- Data integrity
  - Append/flush/sync redesign (0.21)
  - The pipeline recruits new replicas rather than just removing them on failures (0.23)
- Improving availability of the NN
  - Faster HDFS restarts: NN bounce in 20 minutes (0.23)
  - Federation allows smaller NNs (0.23)
  - Federation will significantly improve NN isolation, and hence availability (0.23)
- Why did we wait this long for an HA NN?
  - The failure rates did not demand making this a high priority
  - Failover requires corner cases to be correctly addressed
    - Correct fencing of shared state during failover is critical; getting it wrong can corrupt data and reduce availability!!
  - Many factors impact availability, not just failover
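To make the fencing concern concrete, here is a minimal sketch (hypothetical code, not HDFS internals) of one standard technique: epoch-based fencing. The shared edit store remembers the highest epoch it has promised, so after a failover a "zombie" old active NN that still thinks it is in charge gets its stale writes rejected instead of corrupting the log:

```java
// Sketch of epoch-based fencing for shared namenode state.
// Each new active writer obtains a fresh, strictly larger epoch;
// the store rejects writes carrying any older epoch.
class FencedEditLog {
    private long promisedEpoch = 0;
    private final StringBuilder edits = new StringBuilder();

    /** A newly elected active NN calls this to obtain its epoch. */
    synchronized long becomeActive() {
        return ++promisedEpoch;
    }

    /** Append an edit; returns false if the writer has been fenced out. */
    synchronized boolean write(long epoch, String edit) {
        if (epoch < promisedEpoch) return false; // stale (old active) writer
        edits.append(edit).append('\n');
        return true;
    }

    synchronized String contents() {
        return edits.toString();
    }
}
```

The hard part alluded to on the slide is exactly the `epoch < promisedEpoch` check: without it, an old active that pauses (GC, network partition) and wakes up after failover can still scribble on shared state.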
9. HDFS Availability & Data Integrity: Response: HA NN
- Active work has started on an HA NN (failover)
  - HA NN detailed design (HDFS-1623)
  - Community effort: HDFS-1971, 1972, 1973, 1974, 1975, 2005, 2064, 1073
- HA prototype work
  - Backup NN (0.21)
  - Avatar NN (Facebook)
  - HA NN prototype using Linux HA (Yahoo!)
  - HA NN prototype with Backup NN and block report replicator (eBay)
- HA is the highest priority for 0.23.x
10. MapReduce: Fault Tolerance and Availability
- Early gains: fault tolerance of tasks and compute nodes
- Current issue: loss of the job queue if the JobTracker is restarted
- Our response: MR.next is designed with fault tolerance and availability
  - HA Resource Manager (0.23.x)
  - Loss of the Resource Manager: degraded mode; recover via restart or failover
    - Apps continue with their current resources
    - An App Manager can reschedule with its current resources
    - New apps cannot be submitted or launched; new resources cannot be allocated
  - Loss of an App Manager: recovers; the app is restarted and its state is recovered
  - Loss of tasks and nodes: recovers, as in old MapReduce
11. Backwards Compatibility
- Early gains
  - Early success stemmed from a philosophy of "ship early and often", resulting in changing APIs
  - Data and metadata compatibility were always maintained
  - The early customers paid the price; current customers reap the benefits of more mature interfaces
- Issues
  - Increased adoption leads to increased expectations of backwards compatibility
12. Backward Compatibility: Response
- Interface classification: audience and stability tags (0.21)
  - Patterned on enterprise-quality software process
- Evolve interfaces but maintain backward compatibility
  - Newer, forward-looking interfaces added; old interfaces maintained
- Test for compatibility
  - Run old jars of automation tests, and real Yahoo! applications
- Applications are adopting higher abstractions (Pig, Hive)
  - Insulates them from the lower primitive interfaces
- Wire compatibility (HADOOP-7347)
  - Maintain compatibility with the current protocol (Java serialization)
  - Adapters for addressing future discontinuities, e.g. a serialization or protocol change
  - Moved to Protocol Buffers for the data transfer protocol
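The "audience and stability tags" above are Java annotations on each interface (in Hadoop they live in `org.apache.hadoop.classification` as `InterfaceAudience` and `InterfaceStability`). The following self-contained sketch mimics the pattern with stand-in annotations and a hypothetical API class, so it runs without Hadoop on the classpath:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins for Hadoop's audience/stability tags, for illustration only.
public class InterfaceTags {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}     // safe for any downstream user to depend on

    @Documented @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}   // may change incompatibly between minor releases

    // Hypothetical API class carrying the tags; tools and reviewers can
    // then enforce rules like "Public+Stable APIs must not break".
    @Public @Evolving
    static class FileContext {
        String resolve(String path) { return "/" + path; }
    }

    public static void main(String[] args) {
        boolean isPublic = FileContext.class.isAnnotationPresent(Public.class);
        System.out.println("public API: " + isPublic); // → "public API: true"
    }
}
```

The value of the scheme is social as much as technical: a user who programs against a `Public`/`Stable` interface knows it will be maintained, while `Private` or `Evolving` interfaces are fair game for change.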
13. Testing & Quality
- Nightly testing
  - Against 1200 automated tests on 30 nodes
  - Against live data and live applications
- QE certification for release
  - Large variety and scale of tests on 500 nodes
  - Performance benchmarking
  - QE HIT integration testing of the whole stack
- Release testing
  - Sandbox cluster: 3 clusters, each with 400-1K nodes
    - Major releases: 2 months of testing on actual data; all production projects must sign off
  - Research clusters: 6 clusters (4K nodes) running non-revenue production jobs
    - Major releases: minimum 2 months before moving to production
    - 0.25 to 0.5 million jobs per week; if a release clears research, it is mostly fine in production
- Release
  - Production clusters: 11 clusters (4.5K nodes); revenue generating, stricter SLAs
14. Fundamental Architecture Changes that Cut Across Several Issues
(Diagram: coupled, one-to-one layering — job manager/resource scheduler over compute resources; HDFS Namesystem over storage resources)
- HDFS storage is mostly a separate layer already, but with one customer: one NN
  - Federation generalizes the layer
- MapReduce compute-resource scheduling is tightly coupled to MapReduce job management
  - MapReduce.next separates the layers
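The MR.next separation described above can be sketched in code: a generic resource layer that knows nothing about MapReduce, with per-application managers on top that decide what to run in the resources they are granted. All names here are hypothetical illustrations of the layering, not the real MR.next API:

```java
import java.util.ArrayList;
import java.util.List;

// Generic lower layer: hands out leases on node resources ("containers")
// without knowing what will run in them.
interface ResourceScheduler {
    List<Container> allocate(String appId, int numContainers);
}

class Container {
    final String node;
    Container(String node) { this.node = node; }
}

// Toy scheduler that round-robins containers across four nodes.
class SimpleScheduler implements ResourceScheduler {
    private int next = 0;
    public List<Container> allocate(String appId, int n) {
        List<Container> out = new ArrayList<>();
        for (int i = 0; i < n; i++) out.add(new Container("node-" + (next++ % 4)));
        return out;
    }
}

// Application-specific layer: one manager per MapReduce job decides what
// to launch in each granted container. An MPI or HBase manager could sit
// on the same scheduler, which is the point of the separation.
class MapReduceAppManager {
    private final ResourceScheduler scheduler;
    MapReduceAppManager(ResourceScheduler s) { this.scheduler = s; }
    int runMaps(int mapTasks) {
        return scheduler.allocate("mr-job-1", mapTasks).size();
    }
}
```

Because only `MapReduceAppManager` knows about MapReduce, the lower layer can host other frameworks and even multiple MR library versions side by side, as the next slide elaborates.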
15. Fundamental Architecture Changes that Cut Across Several Issues (continued)
(Diagram: layered, one-to-many — a generic resource scheduler over compute resources hosting MR apps with different MR library versions, MPI apps, and HBase; multiple federated HDFS Namesystems, including alternate NN implementations and MR tmp storage, over storage resources)
- Scalability, isolation, availability
- Generic lower layer: first-class support for new applications on top (MR tmp, HBase, MPI, …)
- Layering facilitates faster development of new work
  - A NN that caches the namespace: a few months of work
  - New implementations of the MR App Manager
- Compatibility: support multiple versions of MR
  - Tenants upgrade at their own pace, which is crucial for shared clusters
16. The Community Effect
- Some projects are done entirely by teams at Yahoo!, FB, or Cloudera, but several projects are joint work
  - Yahoo! & FB on NN scalability and concurrency, especially in the face of misbehaved apps
  - Edits log v2 and refactoring of the edits log (Cloudera and Yahoo!/Hortonworks): HDFS-1073, 2003, 1557, 1926
  - NN HA (Yahoo!/Hortonworks, Cloudera, FB, eBay): HDFS-1623, 1971, 1972, 1973, 1974, 1975, 2005
  - Features to support HBase: FB, Cloudera, Yahoo!, and the HBase community
- Expect to see rapid improvements in the very near future
  - Further scalability: a NN that caches part of the namespace
  - Improved I/O performance: DN performance improvements
  - Wire compatibility: wire protocols, operational improvements
  - New App Managers for MR.next
  - Continued improvement of management and operability
17. Hadoop is Successfully Crossing the Chasm
- Hadoop is used in enterprises for revenue-generating applications
- Apache Hadoop is improving at a rapid rate
  - Addressing many issues, including HA
  - Fundamental design improvements to fuel innovation
  - The might of a large, growing developer community
- Battle tested on large clusters and a variety of applications
  - At Yahoo!, Facebook, and the many other Hadoop customers
- Data integrity has been a focus from the early days
- A level of testing that even the large commercial vendors cannot match!
- Can you trust your data to anything less?
18. Q & A
Hortonworks @ Hadoop Summit:
- 1:45pm: Next Generation Apache Hadoop MapReduce (Community track), by Arun Murthy
- 2:15pm: Introducing HCatalog (Hadoop Table Manager) (Community track), by Alan Gates
- 4:00pm: Large Scale Math with Hadoop MapReduce (Applications and Research track), by Tsz-Wo Sze
- 4:30pm: HDFS Federation and Other Features (Community track), by Suresh Srinivas and Sanjay Radia
Sanjay Radia: I have been on the Hadoop project at Yahoo! for the last 4 years, and these 4 years have been a blast. I am very proud to be a Y! employee; Yahoo!'s leadership has made Hadoop the success it is today (and was forward thinking in spinning out Hortonworks). I am very excited to be part of the team at Hortonworks and to continue growing Apache Hadoop as an open-source project.
Early customers at Yahoo!: a key idea was to provide Hadoop as a service; otherwise adoption would not have been so rapid. The customer starts focusing on building his application on Hadoop rather than figuring out how to get the capex and convince IT to deploy Hadoop.
Some early decisions, why we made them, and how well they worked. Choices and tradeoffs were based on immediate customer needs, time, resources, and operational needs. Goal: gain acceptance, evolve, and grow the customer base. Then the fundamental architectural improvements that cut across various issues, concluding with the community effect.
Stronger security was driven by multiple tenants in shared clusters. Security will be critical for new enterprise users, especially those with financial data. A 10 person-year effort, done entirely by Yahoo!, and a major milestone for Apache Hadoop.
Early customers were happy with a new platform that let them solve problems they could not solve otherwise. But many new customers… Unused capacity is especially effective if applications peak at different times; mix production and non-production jobs. Fairness is not a goal: customers are guaranteed the capacity they have paid for. The NN under federation also provides isolation.
Scaling is enough to meet the needs of very large customers like FB and Y!; other customers should be more than okay. Before you need more namespace, it will be there: a NN that stores only a partial namespace in memory.
Data: can I read what I wrote, and is the service available? When I asked one of the original authors of GFS if there were any decisions they would revisit, the answer was random writers. Simplicity is key. On raw disk: file systems take time to stabilize, and by relying on the OS's file system we can take advantage of ext4, XFS, or ZFS.
Pipeline recovery: useful for slow appenders.
Certification happens once we are ready for release. HDFS (over 92% coverage): balancer, block replication, corruption, fsck, security, command lines, viewfs, faults and injection, checkpointing; Federation testing is ready to move to release testing. MR: distributed cache, capacity scheduler, speculative execution, block listing, decommissioning, UI testing; failures and injections; scale testing with 4 TTs per node. Performance: sort, scan, compression, GridMix v3, SLive, … HIT integration covers the whole stack (HDFS, MR, Oozie, Pig, Hive, HCat). Sandbox: customer application validation; release vote for 23.0. Research: a larger variety of jobs and load than the production clusters.
So far I have been explaining how we have been addressing specific issues. But we have also made fundamental changes to the architecture of both HDFS and MR, changes that cut across issues.
For HDFS the change was fairly simple. For MR it is a redesign: allocation of compute resources becomes a general layer, separating the scheduler from the app/job manager.
New append, security, Federation, MR.next: Yahoo!. Raid, HighTide: FB. HBase improvements: FB & Cloudera.
The fundamental design improvements will accelerate the rate of improvement. If there is a feature that customers need, it will be provided quickly.