11. Technique: PSSH
PSSH provides parallel versions of OpenSSH and
related tools. Included are pssh, pscp, prsync,
pnuke, and pslurp.
https://code.google.com/p/parallel-ssh/
31. Hadoop Security - Without security
• From any machine that can access hadoop
• [root@hackserver opt]# su hdfs
• [hdfs@hackserver opt]$ hadoop fs -rmr /
• Say Goodbye to your data
Any resemblance to real events is purely coincidental
36. Hadoop Security - More security, but still in incubation
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_Knox_Gateway_Admin_Guide/content/ch01.html
53. Common Datanode Decommission Process
• Add the DataNodes' hostnames to the dfs.exclude file
• On the NameNode host, run hdfs dfsadmin -refreshNodes
• Check the Web UI to see whether the state has changed
to Decommission In Progress for the DataNodes being
decommissioned. (1 to 2 days)
• When all the DataNodes report their state
as Decommissioned, you can shut down the
decommissioned nodes.
• Replace the failed HDD, reboot the server, and reconfigure the
HDD from the RAID card. (20 mins)
• Mount the HDD
• Start the DataNode service
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_system-admin-guide/content/admin_decommission-slave-nodes-2-1.html
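The first two steps of the decommission process above can be scripted. The following is a minimal sketch, not the presenter's tooling: the helper names `add_to_exclude` and `refresh_nodes` are hypothetical, and the location of the exclude file depends on the cluster's `dfs.hosts.exclude` setting. The refresh helper defaults to a dry run so nothing is executed by accident.

```python
import subprocess
from pathlib import Path


def add_to_exclude(exclude_file, hostnames):
    """Append hostnames not already listed in the exclude file; return those added."""
    path = Path(exclude_file)
    text = path.read_text() if path.exists() else ""
    known = set(text.split())
    added = []
    for host in hostnames:
        if host not in known:
            known.add(host)
            added.append(host)
    if added:
        with path.open("a") as f:
            # Keep one hostname per line even if the file lacks a trailing newline.
            if text and not text.endswith("\n"):
                f.write("\n")
            for host in added:
                f.write(host + "\n")
    return added


def refresh_nodes(dry_run=True):
    """Build (and optionally run) the command that makes the NameNode re-read the file."""
    cmd = ["hdfs", "dfsadmin", "-refreshNodes"]
    if not dry_run:
        subprocess.run(cmd, check=True)  # must run on a host with HDFS admin rights
    return cmd
```

Calling `add_to_exclude` twice with the same hostname leaves the file unchanged the second time, which keeps the script safe to re-run.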
54. TrendMicro DataNode Decommission (HDD hot swap)
• Replace the failed HDD
• Stop the DataNode and unmount the broken mount point.
(5 mins)
• Reinitialize the HDD from the RAID card settings
• Check /var/log/messages; Linux will rescan automatically
• Remount the previously broken mount point
• Start the DataNode service
55. HBase Canary Tool
• Contributor: TrendMicro Scott Miao
• Purpose: Check the first region of every table on
each regionserver
https://issues.apache.org/jira/browse/HBASE-7525
57. HappyBase+Thrift
• What is HappyBase
• Purpose:
- Check the response time of every region on a regionserver
- Check the response time of every region in a table
http://happybase.readthedocs.org/en/latest/
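A per-region response-time check along these lines might look like the sketch below. It is an illustration under assumptions, not the presenter's script: the function names and the Thrift host are hypothetical, and it relies on HappyBase's `Table.regions()` and `Table.scan()`. A one-row scan from a region's start key only exercises that region if the region holds at least one row.

```python
import time


def time_regions(table):
    """Return [(region_name, seconds)] for a one-row scan starting in each region."""
    timings = []
    for region in table.regions():
        # The first region has an empty start key; None means "scan from the beginning".
        start_key = region["start_key"] or None
        began = time.perf_counter()
        next(iter(table.scan(row_start=start_key, limit=1)), None)
        timings.append((region["name"], time.perf_counter() - began))
    return timings


def check_cluster(host, port=9090):
    """Print the timing of every region of every table behind one Thrift server."""
    import happybase  # third-party package: pip install happybase

    connection = happybase.Connection(host, port=port)
    try:
        for table_name in connection.tables():
            for name, seconds in time_regions(connection.table(table_name)):
                print(table_name, name, f"{seconds:.3f}s")
    finally:
        connection.close()
```

For example, `check_cluster("thrift-host.example.com")` (a placeholder hostname) would walk all tables. Grouping the results by the `server_name` field of each region entry would give the per-regionserver view.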
61. Reduce EMR cost
• 100 nodes running for 1 hour cost the same as 1 node running for 100 hours
• AWS charges by the hour
• If job stability is not a concern, use Spot Instances to save
cost
• Use Reserved Instances to save cost
• Use EMR Auto Scaling
• Do a pilot run of your application to estimate how many machines
you need and of what size
• Get your monthly cost from the AWS calculator
• http://calculator.s3.amazonaws.com/index.html
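The node-hour arithmetic behind the first two bullets can be written out as a small sketch. It assumes the per-hour billing described on this slide, with every started hour charged in full; the function name and the rate are hypothetical and not a real AWS price.

```python
import math


def emr_cost(nodes, hours, hourly_rate):
    """Cost when every started hour of every node is billed in full."""
    return nodes * math.ceil(hours) * hourly_rate


RATE = 0.25  # hypothetical USD per node-hour, for illustration only

wide = emr_cost(nodes=100, hours=1, hourly_rate=RATE)    # 100 node-hours
narrow = emr_cost(nodes=1, hours=100, hourly_rate=RATE)  # also 100 node-hours
print(wide == narrow)  # same bill, but the wide cluster finishes 100x sooner

# Under hourly billing, a job that runs 1.1 hours is charged as 2 hours per node.
print(emr_cost(nodes=10, hours=1.1, hourly_rate=RATE))
```

This is why a pilot run matters: sizing the cluster so the job ends just under an hour boundary avoids paying for a mostly idle final hour.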
62. Datacenter Cost by Service (Storage)
• (Application Size / (Server HDD space * 0.75)) * Server Cost / 2
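A worked example of the storage formula, as a sketch. The slide's parentheses are unbalanced, so the grouping used here (servers needed, times server cost, divided by 2) is an assumption, and the division by 2 is kept as given without interpretation. All figures are made up.

```python
def storage_cost(app_size_tb, server_hdd_tb, server_cost, usable_fraction=0.75):
    """Storage cost share of one application, following the slide's formula."""
    servers_needed = app_size_tb / (server_hdd_tb * usable_fraction)
    return servers_needed * server_cost / 2


# Hypothetical numbers: 90 TB of application data, 24 TB of disk per server,
# 10,000 per server. Usable space is 18 TB per server, so 5 servers are needed.
print(storage_cost(app_size_tb=90, server_hdd_tb=24, server_cost=10_000))
```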
63. Datacenter Cost by Service (Computing)
• ((Used Map slots + Used Reduce slots) / (Total Map slots + Total Reduce slots)) * Total Server Cost / 2
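The computing formula can be checked with a small sketch as well. The function and parameter names are hypothetical, the figures are invented, and the division by 2 is again taken from the slide as given.

```python
def computing_cost(used_map, used_reduce, total_map, total_reduce, total_server_cost):
    """Computing cost share of one service, based on its share of MapReduce slots."""
    slot_share = (used_map + used_reduce) / (total_map + total_reduce)
    return slot_share * total_server_cost / 2


# Hypothetical numbers: the service uses 30 map and 10 reduce slots out of
# 120 map and 40 reduce slots (a 25% share) on a cluster costing 200,000.
print(computing_cost(30, 10, 120, 40, 200_000))
```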