Enviar pesquisa
Carregar
Hadoop 10th Birthday and Hadoop 3 Alpha
•
1 gostou
•
115 visualizações
Dataya Nolja
Seguir
데이터야놀자 2016 서동진
Leia menos
Leia mais
Dados e análise
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 14
Baixar agora
Baixar para ler offline
Recomendados
[Td 2015]microsoft 개발자들을 위한 달콤한 hadoop, hd insight(최종욱)
[Td 2015]microsoft 개발자들을 위한 달콤한 hadoop, hd insight(최종욱)
Sang Don Kim
NetApp AI Control Plane
NetApp AI Control Plane
SeungYong Baek
오픈소스 모니터링 알아보기(Learn about opensource monitoring)
오픈소스 모니터링 알아보기(Learn about opensource monitoring)
SeungYong Baek
[OpenStack Day in Korea 2015] Track 1-2 - Red hat Enterprise Linux OpenStack ...
[OpenStack Day in Korea 2015] Track 1-2 - Red hat Enterprise Linux OpenStack ...
OpenStack Korea Community
YARN overview
YARN overview
SeongHyun Jeong
Cloudera & Zookeeper
Cloudera & Zookeeper
Junyoung Park
sparklyr을 활용한 R 분산 처리
sparklyr을 활용한 R 분산 처리
Sang-bae Lim
Cloudera session seoul - Spark bootcamp
Cloudera session seoul - Spark bootcamp
Sang-bae Lim
Recomendados
[Td 2015]microsoft 개발자들을 위한 달콤한 hadoop, hd insight(최종욱)
[Td 2015]microsoft 개발자들을 위한 달콤한 hadoop, hd insight(최종욱)
Sang Don Kim
NetApp AI Control Plane
NetApp AI Control Plane
SeungYong Baek
오픈소스 모니터링 알아보기(Learn about opensource monitoring)
오픈소스 모니터링 알아보기(Learn about opensource monitoring)
SeungYong Baek
[OpenStack Day in Korea 2015] Track 1-2 - Red hat Enterprise Linux OpenStack ...
[OpenStack Day in Korea 2015] Track 1-2 - Red hat Enterprise Linux OpenStack ...
OpenStack Korea Community
YARN overview
YARN overview
SeongHyun Jeong
Cloudera & Zookeeper
Cloudera & Zookeeper
Junyoung Park
sparklyr을 활용한 R 분산 처리
sparklyr을 활용한 R 분산 처리
Sang-bae Lim
Cloudera session seoul - Spark bootcamp
Cloudera session seoul - Spark bootcamp
Sang-bae Lim
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
치완 박
On premise db & cloud database
On premise db & cloud database
Oracle Korea
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
SeungYong Baek
Mastering devops with oracle 강인호
Mastering devops with oracle 강인호
Inho Kang
Hadoop administration
Hadoop administration
Ryan Guhnguk Ahn
서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료
Teddy Choi
데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)
SeungYong Baek
올챙이 현재와 미래
올챙이 현재와 미래
cho hyun jong
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Nanha Park
Big data application architecture 요약2
Big data application architecture 요약2
Seong-Bok Lee
Open standard open cloud engine (3)
Open standard open cloud engine (3)
uEngine Solutions
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
SAMUEL SJ Cheon
Hadoop Introduction (1.0)
Hadoop Introduction (1.0)
Keeyong Han
ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이
Sangmin Lee
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
JooHyung Kim
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
NAVER D2
DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기
SeungYong Baek
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정
HyeonSeok Choi
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
SAMUEL SJ Cheon
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례
rockplace
How to Study Mathematics for ML
How to Study Mathematics for ML
Dataya Nolja
Music Data Start to End
Music Data Start to End
Dataya Nolja
Mais conteúdo relacionado
Semelhante a Hadoop 10th Birthday and Hadoop 3 Alpha
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
치완 박
On premise db & cloud database
On premise db & cloud database
Oracle Korea
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
SeungYong Baek
Mastering devops with oracle 강인호
Mastering devops with oracle 강인호
Inho Kang
Hadoop administration
Hadoop administration
Ryan Guhnguk Ahn
서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료
Teddy Choi
데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)
SeungYong Baek
올챙이 현재와 미래
올챙이 현재와 미래
cho hyun jong
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Nanha Park
Big data application architecture 요약2
Big data application architecture 요약2
Seong-Bok Lee
Open standard open cloud engine (3)
Open standard open cloud engine (3)
uEngine Solutions
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
SAMUEL SJ Cheon
Hadoop Introduction (1.0)
Hadoop Introduction (1.0)
Keeyong Han
ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이
Sangmin Lee
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
JooHyung Kim
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
NAVER D2
DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기
SeungYong Baek
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정
HyeonSeok Choi
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
SAMUEL SJ Cheon
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례
rockplace
Semelhante a Hadoop 10th Birthday and Hadoop 3 Alpha
(20)
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
[Open Technet Summit 2014] 쓰기 쉬운 Hadoop 기반 빅데이터 플랫폼 아키텍처 및 활용 방안
On premise db & cloud database
On premise db & cloud database
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
하둡 알아보기(Learn about Hadoop basic), NetApp FAS NFS Connector for Hadoop
Mastering devops with oracle 강인호
Mastering devops with oracle 강인호
Hadoop administration
Hadoop administration
서울 하둡 사용자 모임 발표자료
서울 하둡 사용자 모임 발표자료
데이터 레이크 알아보기(Learn about Data Lake)
데이터 레이크 알아보기(Learn about Data Lake)
올챙이 현재와 미래
올챙이 현재와 미래
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Deview 2013 :: Backend PaaS, CloudFoundry 뽀개기
Big data application architecture 요약2
Big data application architecture 요약2
Open standard open cloud engine (3)
Open standard open cloud engine (3)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
2017 red hat open stack(rhosp) function overview (samuel,2017-0516)
Hadoop Introduction (1.0)
Hadoop Introduction (1.0)
ApacheCon2011 에서는 무슨일이
ApacheCon2011 에서는 무슨일이
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
HashiTalk 2021 - Terraform 도입과 파이프라인 구축 및 운영
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
[225]빅데이터를 위한 분산 딥러닝 플랫폼 만들기
DevOps - CI/CD 알아보기
DevOps - CI/CD 알아보기
Java 초보자를 위한 hadoop 설정
Java 초보자를 위한 hadoop 설정
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
(Red hat]private cloud-osp-introduction(samuel)2017-0530(printed)
공개소프트웨어 기반 주요 클라우드 전환 사례
공개소프트웨어 기반 주요 클라우드 전환 사례
Mais de Dataya Nolja
How to Study Mathematics for ML
How to Study Mathematics for ML
Dataya Nolja
Music Data Start to End
Music Data Start to End
Dataya Nolja
Find a Leak Time in the Schedule
Find a Leak Time in the Schedule
Dataya Nolja
A Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML in
Dataya Nolja
Practice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty Work
Dataya Nolja
Predicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next Station
Dataya Nolja
Endless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data Collecting
Dataya Nolja
Log Design Case Study
Log Design Case Study
Dataya Nolja
Let's Play with Data Safely
Let's Play with Data Safely
Dataya Nolja
Things Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in Mind
Dataya Nolja
Things Happend between JDBC and MySQL
Things Happend between JDBC and MySQL
Dataya Nolja
Human-Machine Interaction and AI
Human-Machine Interaction and AI
Dataya Nolja
Julia 0.5 and TensorFlow
Julia 0.5 and TensorFlow
Dataya Nolja
Zeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon Valley
Dataya Nolja
Kakao Bank Powered by Open Sources
Kakao Bank Powered by Open Sources
Dataya Nolja
Open Source is My Job
Open Source is My Job
Dataya Nolja
Creating Value through Data Analysis
Creating Value through Data Analysis
Dataya Nolja
How to Make Money from Data - Global Cases
How to Make Money from Data - Global Cases
Dataya Nolja
Structured Streaming with Apache Spark
Structured Streaming with Apache Spark
Dataya Nolja
How to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its Difficulty
Dataya Nolja
Mais de Dataya Nolja
(20)
How to Study Mathematics for ML
How to Study Mathematics for ML
Music Data Start to End
Music Data Start to End
Find a Leak Time in the Schedule
Find a Leak Time in the Schedule
A Financial Company Story of Bringing Open Source and ML in
A Financial Company Story of Bringing Open Source and ML in
Practice, Practice, Practice and do the Dirty Work
Practice, Practice, Practice and do the Dirty Work
Predicting People Who May Get off at the Next Station
Predicting People Who May Get off at the Next Station
Endless Trial-and-Errors for Data Collecting
Endless Trial-and-Errors for Data Collecting
Log Design Case Study
Log Design Case Study
Let's Play with Data Safely
Let's Play with Data Safely
Things Data Scientists Should Keep in Mind
Things Data Scientists Should Keep in Mind
Things Happend between JDBC and MySQL
Things Happend between JDBC and MySQL
Human-Machine Interaction and AI
Human-Machine Interaction and AI
Julia 0.5 and TensorFlow
Julia 0.5 and TensorFlow
Zeppelin and Open Source Ecosystem and Silicon Valley
Zeppelin and Open Source Ecosystem and Silicon Valley
Kakao Bank Powered by Open Sources
Kakao Bank Powered by Open Sources
Open Source is My Job
Open Source is My Job
Creating Value through Data Analysis
Creating Value through Data Analysis
How to Make Money from Data - Global Cases
How to Make Money from Data - Global Cases
Structured Streaming with Apache Spark
Structured Streaming with Apache Spark
How to Create Value from Data, and Its Difficulty
How to Create Value from Data, and Its Difficulty
Hadoop 10th Birthday and Hadoop 3 Alpha
1.
1© Cloudera, Inc. All rights reserved. Hadoop 10th Birthday and Hadoop 3 Alpha Dongjin Seo – Cloudera Korea, SE
2.
2© Cloudera, Inc. All rights reserved. Agenda Ⅰ. Hadoop 10th
Birthday Ⅱ. Hadoop 3 Alpha
3.
3© Cloudera, Inc. All rights reserved. Apache Hadoop at 10 Apache Hadoop
4.
4© Cloudera, Inc. All rights reserved. Apache Hadoop’s Timeline [2002] • Doug Cutting and Mike Cafarella create Nutch, an open source web crawler [2003] • Google publishes its “Google File System” paper [2004] •
Cutting & Cafarella implement Nutch features that will become HDFS • Google publishes its “MapReduce” paper The Invention Years 2002 ~ 2004 The Incubation Years 2005 ~ 2007 The Coming-Out Years 2008 ~ 2009 The Rapid Adaption Years 2010 ~ 2015 [2005] • Cafarella spearheads an implementation of MapReduce in Nutch [2006] • Cutting joins Yahoo!; starts Hadoop subproject by carving code from Nutch • First Apache release of Hadoop [2007] • First Hadoop User Group meeting • Community contributions begin to rise steeply [2008] • Hadoop becomes a Top Level ASF project • Yahoo! launches world’s largest Hadoop application • Hive, Hadoop’s first SQL framework, becomes a Hadoop sub-project • Cloudera, first company to commercialize Hadoop, is founded • Initial Apache release of Pig [2009] • Cutting joins Cloudera as its chief architect [2010-11] • The extended Hadoop community busily builds out a plethora of new components (Crunch, Sqoop, Flume, Oozie, etc) [2012-15] • HDFS NameNode HA, YARN, significant new features for enterprise adoption • Impala joins the ecosystem • Spark becomes a Top Level ASF project • Kudu, the first native storage option for Hadoop since HBase, joins the ASF Incubator
5.
5© Cloudera, Inc. All rights reserved. 2006 2008 2009
2010 2011 2012 2013 Core Hadoop (HDFS, MapReduce) HBase ZooKeeper Solr Pig Core Hadoop Hive Mahout HBase ZooKeeper Solr Pig Core Hadoop Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig Core Hadoop Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop The stack is continually evolving and growing! 2007 Solr Pig Core Hadoop Knox Flink Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop 2014 2015- Kudu RecordService Ibis Falcon Knox Flink Parquet Sentry Spark Tez Impala Kafka Drill Flume Bigtop Oozie HCatalog Hue Sqoop Avro Hive Mahout HBase ZooKeeper Solr Pig YARN Core Hadoop Evolution of the Hadoop Platform
6.
6© Cloudera, Inc. All rights reserved. Why Did Hadoop Succeed? Open source community and license A large and diverse community of developers has historically made, and continues to make, the Hadoop ecosystem among the most active and engaged in history, while the Apache License lowers the barrier to entry for users. Extensibility/adaptability With the possible exception of Linux, no other complex platform has evolved on so many levels, and so quickly, to meet user requirements over time. A strong focus on systems The roots of Hadoop are in making distributed computing infrastructure more accessible by application developers. That continuing focus continues to bear fruit in areas like resource management and security.
7.
7© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Release!! HDFS Erasure Coding (HDFS-EC) Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes YARN Timeline Service v.2 Shell Script Rewrite MapReduce task-level native optimization Support for Multiple Stanby NameNodes Java 8 Minimum Runtime Version New Default Ports for Several Services Intra-DataNode Balancer Reworked daemon and task heap management
8.
8© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes HDFS Erasure Coding (HDFS-EC) Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes Erasure Coding ? -
Fault-tolerance를 위한 데이터 보존 기법 중 하나로 흔히 RAID-5에서 사용되는 기법입니다. 데이터 저장 시 EC Codec으로 데이터를 균일한 사이즈의 Data cell/Parity cell로 인코딩하며 이와 반대로 데이터 로드 시 Data cell과 Parity cell로 구성된 EC Group에서 유실된 cell에 대해서 해당 그룹에 남아있는 cell들로 부터 재구성하여 원본 데이터 복구하는 디코딩 작업을 실행 HDFS-EC ü EC의 Reed-Solomon algorithm을 수행하는 Intel ISA-L (Intelligence Stroage Acceleration Library) 사용하여 스토리지 성능, 처리량, 보안, 안정성 개선 ü EC는 Exclusive-OR 공식 기반이지만 Multiple failure 을 보장하지 못하는 불안정한 부분이 존재하여, Reed-Solomon algorithm을 적용하여 Multiple failures를 보장 ü Cloudera + Intel + Hadoop Community 합작하여 빌드 ü 각 개별 디렉토리에 ‘hdfs erasurecode -setPolicy’ 커맨드로 policy 적용 Erasure Coding의 필요성 ü 복제개수 3개는 장애대응에 용이하나 기회비용이 비쌈 (200% overhead in storage space and other resources (e.g. network bandwidth when writing the data)) ü 일반적인 Operations 에서는 추가적으로 복제된 블럭에 대해 접근을 잘 하지 않음 Erasure Coding의 효과 ü 더 적은 용량으로도 Fault-tolerance 보장 ü 3x replication 에 비해 storage cost ~50% 감소 효과
9.
9© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes YARN Timeline Service v.2 Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes YARN Timeline Service v.2 -
Event, Metrics와 같은 컨테이너 관련 정보 및 Map, Reduce Task 관련 Application 정보를 WebUI를 통해 확인 할 수 있도록 제공되는 서비스 1) Improving scalability and reliability of Timeline Service ü 확장성이 높고, 분산 저장 가능한 아키텍처(e.g. HBase)를 채택하여 신뢰성 향상 2) enhancing usability by introducing flows and aggregation ü YARN Application 의 단계별 논리적 Flow 제공 ü Flow 레벨에서의 metrics에 대한 Aggregation 지원 Shell Script Rewrite ü 오랫동안 가지고 있던 버그 수정 및 새로운 기능 추가를 위해 Hadoop 쉘스크립트가 수정 됨 ü 기존 쉘스크립트 버전에 대해 완벽하게 호환되지 않아 Hadoop 환경변수를 사용하거나 shell command를 이용하는 사용자에게 영향주는 부분에 대한 검토가 필요 more info: https://issues.apache.org/jira/browse/HADOOP-9902 https://issues.apache.org/jira/secure/attachment/12599817/more-info.txt
10.
10© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes MapReduce task-level native optimization Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes NativeTask ? - 데이터
프로세싱에 초첨을 맞춘 native computing unit으로 Hadoop MapReduce를 위한 고성능 C++ API & runtime NativeTask의 필요성 ü I/O bottleneck. Most Hadoop workloads are data intensive, so if no compression is used for input, mid-output, and output, I/O(disk, network) could be a bottleneck. ü Inefficient implementation.(Map side sort, Serialization/Deserialization, Shuffle, Data locality, Scheduling & starting overhead) ü Inflexible programming paradigm. - limits its performance NativeTask의 효과 ü NativeTask 를 map output collector에 적용하여 shuffle-intensive job의 성능을 30% 이상 향상 more info: https://issues.apache.org/jira/browse/MAPREDUCE-2841 Java side (to bypass normal java data flow)) Native side (Actual computation)) JNI delegate the data processing
11.
11© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes Support for Multiple Stanby NameNodes ü 기존
single Active Namenode, single Stanby Namenode로만 구성이 가능했던 아키텍처에서 다중 Stanby Namenode로 구성이 가능하도록 변경 ü 더 높은 수준의 High-Availability를 제공 ü Namenode 는 5개 초과 X, 3개를 추천함 à communication overheads 고려 more info: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop- hdfs/HDFSHighAvailabilityWithQJM.html Java 8 Minimum Runtime Version ü Java 7에 대한 오라클의 공식적인 지원이 종료(April 2015)되어 Java 8으로 변경할 수 밖에 없는 상황이 됨 ü 이에 따라 Hadoop 3의 최소 자바 버전은 Java 8 로 변경 됨
12.
12© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes New Default Ports for Several Services Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes ü Hadoop 시작 시
bind error를 피하기 위해 Namenode, Secondary NN, Datanode, KMS의 port를 변경 ü 대규모 클러스터에서의 rolling restart의 신뢰성 향상에 기여할 것으로 예상 Namenode ports: 50470 --> 9871, 50070 --> 9870, 8020 --> 9820 Secondary NN ports: 50091 --> 9869, 50090 --> 9868 Datanode ports: 50020 --> 9867, 50010 --> 9866, 50475 --> 9865, 50075 --> 9864 KMS port: 16000 --> 9600 more info: https://issues.apache.org/jira/browse/HDFS-9427 https://issues.apache.org/jira/browse/HADOOP-12811 Intra-DataNode Balancer ü 디스크 추가/변경으로 발생되는 Datanode 내의 디스크들에 대한 데이터 적재 불균형을 HDFS Command(diskbalancer)로 해결 more info: http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop- hdfs/HDFSCommands.html
13.
13© Cloudera, Inc. All rights reserved. Apache Hadoop 3 Alpha Major New Changes Release Date: 03 September, 2016 Changes: about 3,000 3.0.0-alpha1 Major new changes Reworked daemon and task heap management ü 호스트의 메모리
사이즈를 기반으로 자동으로 튜닝해주는 기능 추가 ü HADOOP_HEAPSIZE는 deprecated 됨 ü 기존에 비해 간단하게 map/reduce heap 사이즈 설정이 가능하게 되어 요구되는 heap사이즈를 task설정이나 Java 옵션으로 명시해야되는 수고가 없어졌음 more info: https://issues.apache.org/jira/browse/HADOOP-10950 https://issues.apache.org/jira/browse/MAPREDUCE-5785 Conclusion v Hadoop의 핵심인 Component(HDFS, MARECUDE)의 큰 변화는 또 다른 큰 발전을 위한 발돋음 ü HDFS 저장공간 활용도 증가 à ROI 증가, TOC 감소 à 초기 도입에 대한 장벽이 낮아짐 ü Multiple Stanby Namenode, heap 튜닝 기능 à 운영환경에 대한 용이성 증가 à 운영에 투자되는 비용을 클러스터 확장, 개선과 같은 곳에 투자할 수 있는 기회 증가 v 변화한 Hadoop과 연관 Component들의 변화가 기대 됨 ü HIVE 의 Query 속도 개선? ü HBase의 Throuthput 증가? ü 데이터 압축 효율성 증가? ü etc v 이러한 큰 변화 뒤에는 Community와 구성원들의 관심과 사랑이 있기에 가능
14.
14© Cloudera, Inc. All rights reserved. 감사합니다 djseo@cloudera.com
Baixar agora