SlideShare uma empresa Scribd logo
1 de 18
Status of Hadoop 0.23
Operations at Yahoo!
From Inception to Customer Validation

Charles Wimmer, Staff Site Reliability
        Engineer at LinkedIn
Summary of This Talk
● Includes
  ○ Operational changes required to support 0.23
● Does not include
  ○ Specifics about customer testing
  ○ Deployment into Research or Production clusters
Scope of This Change at Yahoo!
●   42,000+ Hadoop servers
●   20+ clusters
●   Three tiers: Sandbox, Research, Production
●   0.20.205.x
Overview of the Process
● Provide customers a 0.23 Sandbox cluster
● Provide customers enough data to test their
  applications
● Provide developer support to address
  application issues quickly
● Upgrade Research and Production clusters
  as applications are certified to work with 0.23
Test Cluster
●   420 Nodes
●   2 x Westmere 4 core processors
●   24G RAM
●   12 x 2T Disks
●   No Federation
Configuration
● Hierarchical Queues
● Memory Configuration
● Kerberos
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>BIZUNIT-A,BIZUNIT-U,BIZUNIT-C,unfunded</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>
Hierarchical Queues
<property>
 <name>yarn.scheduler.capacity.root.BIZUNIT-A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-C.capacity</name>
  <value>15</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
  <value>5</value>
</property>
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-a.
capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-b.
capacity</name>
  <value>50</value>
</property>
Hierarchical Queues
Memory Configuration
  <property> <name>yarn.
nodemanager.resource.memory-
mb</name>
  <value>21504</value>
  </property>
Kerberos Configuration
  <property>
   <name>yarn.resourcemanager.principal</name>
    <value>mapred/clustername-jt1.domain.name.com@REALM.
NAME.COM</value>
  </property>



  <property>
   <name>yarn.nodemanager.principal</name>
   <value>tt/_HOST@REALM.NAME.COM</value>
  </property>
Init Scripts
●   DataNode/NameNode
●   SecondaryNameNode
●   HistoryServer
●   NodeManager
●   ResourceManager
DataNode/NameNode
start_20(){
 . . .
}
start_next(){
 . . .
}
if [ -x /home/gs/hadoop/current/bin/hdfs ] ; then
   start_next $@
else
   start_20 $@
fi
SecondaryNameNode
function clean_checkpoint_dir {
 CHECKPOINT_DIR=/grid/0/tmp/hadoop-
hdfs/dfs/namesecondary/current
 if [ -d "$CHECKPOINT_DIR" ] ; then
   DELETE_DIR=`mktemp -p /grid/0/tmp -d delete-XXXXXX`
    if [ $? -eq 0 ] ; then
    echo "moving $CHECKPOINT_DIR to ${DELETE_DIR}/ "
    mv $CHECKPOINT_DIR ${DELETE_DIR}/
    cat<<EOF | at now+1min 2>/dev/null
if [ -d $DELETE_DIR ] ; then
    rm -rf --preserve-root $DELETE_DIR
fi
EOF
    fi
 fi
}
HistoryServer
case "$1" in
 start)
   su $HADOOP_USER -s /bin/sh -c 
"$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh
--config $HADOOP_CONF_DIR start
historyserver"
   RET=$?
   ;;
ResourceManager/NodeManager
case "$1" in
 start)
  su $HADOOP_USER -s /bin/sh -c 
"$HADOOP_PREFIX/sbin/yarn-daemon.sh --config
$HADOOP_CONF_DIR start $PROC"
  RET=$?
  ;;
Questions?
Charles Wimmer

@cwimmer

charles@wimmer.net

cwimmer@linkedin.com

Mais conteúdo relacionado

Mais procurados

My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backupAndrejs Vorobjovs
 
MySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesMySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesLenz Grimmer
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymenthyeongchae lee
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningLenz Grimmer
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MySQL Performance Tuning Variables
MySQL Performance Tuning VariablesMySQL Performance Tuning Variables
MySQL Performance Tuning VariablesFromDual GmbH
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016PostgreSQL-Consulting
 
[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩NHN FORWARD
 
Instroduce Hazelcast
Instroduce HazelcastInstroduce Hazelcast
Instroduce Hazelcastlunfu zhong
 
Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Zaar Hai
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryMongoDB
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained Mydbops
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRWoonsan Ko
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBMydbops
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesSperasoft
 
Postgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackPostgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackKamesh Pemmaraju
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Denish Patel
 

Mais procurados (20)

My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backup
 
MySQL Backup and Security Best Practices
MySQL Backup and Security Best PracticesMySQL Backup and Security Best Practices
MySQL Backup and Security Best Practices
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
Broker otw.pptx
Broker otw.pptxBroker otw.pptx
Broker otw.pptx
 
MySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery PlanningMySQL Server Backup, Restoration, and Disaster Recovery Planning
MySQL Server Backup, Restoration, and Disaster Recovery Planning
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
MySQL Performance Tuning Variables
MySQL Performance Tuning VariablesMySQL Performance Tuning Variables
MySQL Performance Tuning Variables
 
Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016Linux internals for Database administrators at Linux Piter 2016
Linux internals for Database administrators at Linux Piter 2016
 
SQL Server vs Postgres
SQL Server vs PostgresSQL Server vs Postgres
SQL Server vs Postgres
 
[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩[2019] 200만 동접 게임을 위한 MySQL 샤딩
[2019] 200만 동접 게임을 위한 MySQL 샤딩
 
Instroduce Hazelcast
Instroduce HazelcastInstroduce Hazelcast
Instroduce Hazelcast
 
Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?Apache ignite - a do-it-all key-value db?
Apache ignite - a do-it-all key-value db?
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
 
MySQL Timeout Variables Explained
MySQL Timeout Variables Explained MySQL Timeout Variables Explained
MySQL Timeout Variables Explained
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
Hosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCRHosting huge amount of binaries in JCR
Hosting huge amount of binaries in JCR
 
Parallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDBParallel Replication in MySQL and MariaDB
Parallel Replication in MySQL and MariaDB
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
 
Postgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStackPostgres Plus Cloud Database on OpenStack
Postgres Plus Cloud Database on OpenStack
 
Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4Out of the box replication in postgres 9.4
Out of the box replication in postgres 9.4
 

Semelhante a Status of Hadoop 0.23 Operations at Yahoo

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastJorge Lopez-Malla
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Matt Fuller
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQLMydbops
 
Trouble shooting apachecloudstack
Trouble shooting apachecloudstackTrouble shooting apachecloudstack
Trouble shooting apachecloudstackSailaja Sunil
 
Oracle database 12.2 new features
Oracle database 12.2 new featuresOracle database 12.2 new features
Oracle database 12.2 new featuresAlfredo Krieg
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensionsChiou-Nan Chen
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDoKC
 
REST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTREST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTChristian Gohmann
 
SQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesSQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesGuillermo Caicedo
 
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...Timofey Turenko
 
Craft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsCraft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsNate Iler
 
Oracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesOracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesBobby Curtis
 
Moving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupMoving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupGiulio Vian
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppet
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackRadhika Puthiyetath
 
Monitoring your technology stack with New Relic
Monitoring your technology stack with New RelicMonitoring your technology stack with New Relic
Monitoring your technology stack with New RelicRonald Bradford
 
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...European Collaboration Summit
 

Semelhante a Status of Hadoop 0.23 Operations at Yahoo (20)

Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit east
 
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
Trouble shooting apachecloudstack
Trouble shooting apachecloudstackTrouble shooting apachecloudstack
Trouble shooting apachecloudstack
 
Oracle database 12.2 new features
Oracle database 12.2 new featuresOracle database 12.2 new features
Oracle database 12.2 new features
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
3. v sphere big data extensions
3. v sphere big data extensions3. v sphere big data extensions
3. v sphere big data extensions
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
 
REST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using RESTREST in Piece - Administration of an Oracle Cluster/Database using REST
REST in Piece - Administration of an Oracle Cluster/Database using REST
 
Ci for all
Ci for allCi for all
Ci for all
 
SQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud FeaturesSQL Server 2014 Hybrid Cloud Features
SQL Server 2014 Hybrid Cloud Features
 
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
DB proxy server test: run tests on tens of virtual machines with Jenkins, Vag...
 
Craft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurationsCraft CMS: Beyond the Small Business; Advanced tools and configurations
Craft CMS: Beyond the Small Business; Advanced tools and configurations
 
Oracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best PracticesOracle GoldenGate 21c New Features and Best Practices
Oracle GoldenGate 21c New Features and Best Practices
 
Moving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway MeetupMoving a Windows environment to the cloud - DevOps Galway Meetup
Moving a Windows environment to the cloud - DevOps Galway Meetup
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NYPuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
PuppetDB: A Single Source for Storing Your Puppet Data - PUG NY
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache Cloudstack
 
Monitoring your technology stack with New Relic
Monitoring your technology stack with New RelicMonitoring your technology stack with New Relic
Monitoring your technology stack with New Relic
 
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
ECS19 - Patrick Curran, Eric Shupps - SHAREPOINT 24X7X365: ARCHITECTING FOR H...
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 

Último (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 

Status of Hadoop 0.23 Operations at Yahoo

  • 1. Status of Hadoop 0.23 Operations at Yahoo! From Inception to Customer Validation Charles Wimmer, Staff Site Reliability Engineer at LinkedIn
  • 2. Summary of This Talk ● Includes ○ Operational changes required to support 0.23 ● Does not include ○ Specifics about customer testing ○ Deployment into Research or Production clusters
  • 3. Scope of This Change at Yahoo! ● 42,000+ Hadoop servers ● 20+ clusters ● Three tiers: Sandbox, Research, Production ● 0.20.205.x
  • 4. Overview of the Process ● Provide customers a 0.23 Sandbox cluster ● Provide customers enough data to test their applications ● Provide developer support to address application issues quickly ● Upgrade Research and Production clusters as applications are certified to work with 0.23
  • 5. Test Cluster ● 420 Nodes ● 2 x Westmere 4 core processors ● 24G RAM ● 12 x 2T Disks ● No Federation
  • 6. Configuration ● Hierarchical Queues ● Memory Configuration ● Kerberos
  • 7. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.queues</name> <value>BIZUNIT-A,BIZUNIT-U,BIZUNIT-C,unfunded</value> </property> <property> <name>yarn.scheduler.capacity.root.capacity</name> <value>100</value> </property>
  • 8. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.BIZUNIT-A.capacity</name> <value>50</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.capacity</name> <value>30</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-C.capacity</name> <value>15</value> </property> <property> <name>yarn.scheduler.capacity.root.unfunded.capacity</name> <value>5</value> </property>
  • 9. Hierarchical Queues <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-a. capacity</name> <value>50</value> </property> <property> <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-b. capacity</name> <value>50</value> </property>
  • 11. Memory Configuration <property> <name>yarn. nodemanager.resource.memory- mb</name> <value>21504</value> </property>
  • 12. Kerberos Configuration <property> <name>yarn.resourcemanager.principal</name> <value>mapred/clustername-jt1.domain.name.com@REALM. NAME.COM</value> </property> <property> <name>yarn.nodemanager.principal</name> <value>tt/_HOST@REALM.NAME.COM</value> </property>
  • 13. Init Scripts ● DataNode/NameNode ● SecondaryNameNode ● HistoryServer ● NodeManager ● ResourceManager
  • 14. DataNode/NameNode start_20(){ . . . } start_next(){ . . . } if [ -x /home/gs/hadoop/current/bin/hdfs ] ; then start_next $@ else start_20 $@ fi
  • 15. SecondaryNameNode function clean_checkpoint_dir { CHECKPOINT_DIR=/grid/0/tmp/hadoop- hdfs/dfs/namesecondary/current if [ -d "$CHECKPOINT_DIR" ] ; then DELETE_DIR=`mktemp -p /grid/0/tmp -d delete-XXXXXX` if [ $? -eq 0 ] ; then echo "moving $CHECKPOINT_DIR to ${DELETE_DIR}/ " mv $CHECKPOINT_DIR ${DELETE_DIR}/ cat<<EOF | at now+1min 2>/dev/null if [ -d $DELETE_DIR ] ; then rm -rf --preserve-root $DELETE_DIR fi EOF fi fi }
  • 16. HistoryServer case "$1" in start) su $HADOOP_USER -s /bin/sh -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver" RET=$? ;;
  • 17. ResourceManager/NodeManager case "$1" in start) su $HADOOP_USER -s /bin/sh -c "$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start $PROC" RET=$? ;;