fsimage and edits in CDH3 and CDH4
Tatsuo Kawasaki
tatsuo@cloudera.com
objective
HDFS metadata (fsimage and edits) is managed differently
in CDH3 and CDH4.
This presentation introduces these differences.

Please let me know if you find any issues.
HDFS metadata in CDH3
[root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
total 1100
-rw-r--r-- 1 hdfs hdfs 101 Jan 30 00:21 VERSION
-rw-r--r-- 1 hdfs hdfs    8 Jan 30 00:21 fstime
-rw-r--r-- 1 hdfs hdfs 57248 Jan 30 00:21 fsimage
-rw-r--r-- 1 hdfs hdfs 1048580 Jan 31 16:16 edits



after checkpoint
[root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
total 84
-rw-r--r-- 1 hdfs hdfs 101 Feb 5 14:37 VERSION
-rw-r--r-- 1 hdfs hdfs 8 Feb 5 14:37 fstime
-rw-r--r-- 1 hdfs hdfs 66760 Feb 5 14:37 fsimage
-rw-r--r-- 1 hdfs hdfs 4 Feb 5 14:37 edits
timeline (CDH3)
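NameNode files: fsimage, edits, fstime

t0  Initial state: fsimage, edits, fstime
t1  put file: the NameNode updates edits and updates the metadata in memory
t2  Checkpoint start: the NameNode creates edits.new; the Secondary NameNode gets fsimage and edits
t3  Checkpoint done: the Secondary NameNode has merged fsimage and edits into fsimage.ckpt, which is transferred to the NameNode
t4  The NameNode renames fsimage.ckpt to fsimage, renames edits.new to edits, and updates the time in fstime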
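
The file moves in the CDH3 timeline can be replayed on ordinary local files. This is only a sketch of the rename sequence, not NameNode code: the function name `finish_cdh3_checkpoint` is hypothetical, the merge itself is not modeled (the merged image is passed in as bytes), and the value written to fstime is a placeholder rather than the real fstime encoding.

```python
import os
import tempfile
import time


def finish_cdh3_checkpoint(name_dir: str, merged_image: bytes) -> list:
    """Replay the t2..t4 file moves from the timeline on plain local files."""
    def path(name):
        return os.path.join(name_dir, name)

    # t2: the NameNode starts logging to edits.new while the checkpoint runs.
    open(path("edits.new"), "wb").close()

    # t3: the merged image arrives from the Secondary NameNode as fsimage.ckpt.
    with open(path("fsimage.ckpt"), "wb") as f:
        f.write(merged_image)

    # t4: rename both files into place and record the checkpoint time.
    os.replace(path("fsimage.ckpt"), path("fsimage"))
    os.replace(path("edits.new"), path("edits"))
    with open(path("fstime"), "w") as f:
        f.write(str(int(time.time())))  # placeholder, not the real fstime format

    return sorted(os.listdir(name_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("fsimage", "edits", "fstime"):
            with open(os.path.join(d, name), "wb") as f:
                f.write(b"old")
        print(finish_cdh3_checkpoint(d, b"merged"))
```

After the call, only fsimage, edits and fstime remain in the directory, which matches the "after checkpoint" listing.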
Secondary NN Web UI (CDH3)
HDFS metadata in CDH4
After formatting HDFS
-bash-4.1$ ls -l /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
total 1040
-rw-r--r-- 1 hdfs hdfs 1048576 Feb 5 01:35 edits_inprogress_0000000000000000001
-rw-rw-r-- 1 hdfs hdfs 119 Feb 5 01:33 fsimage_0000000000000000000
-rw-rw-r-- 1 hdfs hdfs 62 Feb 5 01:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 hdfs hdfs    2 Feb 5 01:35 seen_txid
-rw-rw-r-- 1 hdfs hdfs 202 Feb 5 01:33 VERSION

-bash-4.1$ hexdump -C /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/seen_txid
00000000 31 0a                         |1.|
00000002




Note: the transaction ID is stored in seen_txid.
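
The hexdump shows that seen_txid holds the transaction ID as ASCII decimal digits followed by a newline (0a). A minimal sketch of decoding it, where `parse_seen_txid` is a hypothetical helper name and the input bytes are copied from the hexdump:

```python
def parse_seen_txid(raw: bytes) -> int:
    """Decode seen_txid content: ASCII decimal digits followed by a newline."""
    return int(raw.decode("ascii").strip())


# Bytes copied from the hexdump: 31 0a
print(parse_seen_txid(bytes.fromhex("310a")))  # prints 1
```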
try to add a new file
[training@localhost ~]$ hadoop fs -put /etc/hosts hosts
[training@localhost ~]$
oiv - fsimage viewer
-bash-4.1$ hdfs oiv -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000000 -o aaa
-bash-4.1$ cat aaa
drwxr-xr-x - hdfs supergroup          0 1969-12-31 19:00 /




The ‘hosts’ file has not been written to the fsimage before checkpointing.
oev – edits viewer
Notes: each record carries its transaction ID in <TXID>. TXID 20 (OP_START_LOG_SEGMENT) is the start transaction ID of this segment. The "Put transaction" callout marks the OP_ADD record (TXID 22).

-bash-4.1$ hdfs oev -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_0000000000000000020-0000000000000000027 -o bbb
-bash-4.1$ cat bbb
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
 <EDITS_VERSION>-40</EDITS_VERSION>
 <RECORD>
  <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
  <DATA>
    <TXID>20</TXID>
  </DATA>
 </RECORD>
 <RECORD>
  <OPCODE>OP_SET_GENSTAMP</OPCODE>
  <DATA>
    <TXID>21</TXID>
    <GENSTAMP>1003</GENSTAMP>
  </DATA>
 </RECORD>
 <RECORD>
  <OPCODE>OP_ADD</OPCODE>
  <DATA>
    <TXID>22</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/user/training/hosts._COPYING_</PATH>
    <REPLICATION>1</REPLICATION>
    <MTIME>1360046220628</MTIME>
    <ATIME>1360046220628</ATIME>
    <BLOCKSIZE>67108864</BLOCKSIZE>
    <CLIENT_NAME>DFSClient_NONMAPREDUCE_1911533003_1</CLIENT_NAME>
    <CLIENT_MACHINE>127.0.0.1</CLIENT_MACHINE>
    <PERMISSION_STATUS>
     <USERNAME>training</USERNAME>
     <GROUPNAME>supergroup</GROUPNAME>
     <MODE>420</MODE>
    </PERMISSION_STATUS>
  </DATA>
 </RECORD>
oev – edits viewer (cont)
File name: edits_0000000000000000020-0000000000000000027

      <RECORD>
        <OPCODE>OP_SET_GENSTAMP</OPCODE>
        <DATA>
         <TXID>23</TXID>
         <GENSTAMP>1004</GENSTAMP>
        </DATA>
       </RECORD>
      <RECORD>
        <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
        <DATA>
         <TXID>24</TXID>
         <PATH>/user/training/hosts._COPYING_</PATH>
         <BLOCK>
          <BLOCK_ID>-3498739165311848505</BLOCK_ID>
          <NUM_BYTES>0</NUM_BYTES>
          <GENSTAMP>1004</GENSTAMP>
         </BLOCK>
        </DATA>
       </RECORD>
       <RECORD>
        <OPCODE>OP_CLOSE</OPCODE>
        <DATA>
         <TXID>25</TXID>
         <LENGTH>0</LENGTH>
         <PATH>/user/training/hosts._COPYING_</PATH>
         <REPLICATION>1</REPLICATION>
         <MTIME>1360046220735</MTIME>
         <ATIME>1360046220628</ATIME>
         <BLOCKSIZE>67108864</BLOCKSIZE>
         <CLIENT_NAME></CLIENT_NAME>
         <CLIENT_MACHINE></CLIENT_MACHINE>
         <BLOCK>
          <BLOCK_ID>-3498739165311848505</BLOCK_ID>
          <NUM_BYTES>83</NUM_BYTES>
          <GENSTAMP>1004</GENSTAMP>
         </BLOCK>
oev – edits viewer (cont)
File name: edits_0000000000000000020-0000000000000000027
Note: TXID 27 (OP_END_LOG_SEGMENT) is the end transaction ID of this segment.

         <PERMISSION_STATUS>
          <USERNAME>training</USERNAME>
          <GROUPNAME>supergroup</GROUPNAME>
          <MODE>420</MODE>
         </PERMISSION_STATUS>
        </DATA>
       </RECORD>
       <RECORD>
        <OPCODE>OP_RENAME_OLD</OPCODE>
        <DATA>
         <TXID>26</TXID>
         <LENGTH>0</LENGTH>
         <SRC>/user/training/hosts._COPYING_</SRC>
         <DST>/user/training/hosts</DST>
         <TIMESTAMP>1360046220738</TIMESTAMP>
        </DATA>
       </RECORD>
       <RECORD>
        <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
        <DATA>
         <TXID>27</TXID>
        </DATA>
       </RECORD>
      </EDITS>
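
The XML written by oev is easy to post-process. The sketch below lists each record's TXID and OPCODE with Python's standard xml.etree.ElementTree. The SAMPLE document is an abbreviated copy of the records in this segment (only OPCODE, TXID and MODE are kept), and `summarize` is a hypothetical helper name. It also converts MODE from decimal to octal, since 420 decimal is the familiar 644.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the segment shown on these slides.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
 <EDITS_VERSION>-40</EDITS_VERSION>
 <RECORD><OPCODE>OP_START_LOG_SEGMENT</OPCODE><DATA><TXID>20</TXID></DATA></RECORD>
 <RECORD><OPCODE>OP_SET_GENSTAMP</OPCODE><DATA><TXID>21</TXID></DATA></RECORD>
 <RECORD><OPCODE>OP_ADD</OPCODE><DATA><TXID>22</TXID>
  <PERMISSION_STATUS><MODE>420</MODE></PERMISSION_STATUS></DATA></RECORD>
 <RECORD><OPCODE>OP_SET_GENSTAMP</OPCODE><DATA><TXID>23</TXID></DATA></RECORD>
 <RECORD><OPCODE>OP_UPDATE_BLOCKS</OPCODE><DATA><TXID>24</TXID></DATA></RECORD>
 <RECORD><OPCODE>OP_CLOSE</OPCODE><DATA><TXID>25</TXID>
  <PERMISSION_STATUS><MODE>420</MODE></PERMISSION_STATUS></DATA></RECORD>
 <RECORD><OPCODE>OP_RENAME_OLD</OPCODE><DATA><TXID>26</TXID></DATA></RECORD>
 <RECORD><OPCODE>OP_END_LOG_SEGMENT</OPCODE><DATA><TXID>27</TXID></DATA></RECORD>
</EDITS>"""


def summarize(xml_bytes: bytes) -> list:
    """Return (txid, opcode, octal mode or None) for every RECORD element."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for record in root.iter("RECORD"):
        opcode = record.findtext("OPCODE")
        txid = int(record.findtext("DATA/TXID"))
        mode = record.findtext("DATA/PERMISSION_STATUS/MODE")
        # MODE is printed in decimal; octal is the usual chmod notation.
        rows.append((txid, opcode, format(int(mode), "o") if mode else None))
    return rows


for row in summarize(SAMPLE):
    print(row)
```

The first and last TXIDs printed (20 and 27) are the two numbers in the file name edits_0000000000000000020-0000000000000000027.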
After checkpointing
-bash-4.1$ ls -l /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
total 1376
-rw-r--r-- 1 hdfs hdfs 1317 Feb 5 01:36 edits_0000000000000000001-0000000000000000019
-rw-r--r-- 1 hdfs hdfs 471 Feb 5 01:37 edits_0000000000000000020-0000000000000000027
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 01:38 edits_0000000000000000028-0000000000000000029
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 01:39 edits_0000000000000000030-0000000000000000031
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 01:40 edits_0000000000000000032-0000000000000000033
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 01:41 edits_0000000000000000034-0000000000000000035
(omitted)
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 02:53 edits_0000000000000000178-0000000000000000179
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 02:54 edits_0000000000000000180-0000000000000000181
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 02:55 edits_0000000000000000182-0000000000000000183
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 02:56 edits_0000000000000000184-0000000000000000185
-rw-r--r-- 1 hdfs hdfs 30 Feb 5 02:58 edits_0000000000000000186-0000000000000000187
-rw-r--r-- 1 hdfs hdfs 1048576 Feb 5 02:58 edits_inprogress_0000000000000000188
-rw-rw-r-- 1 hdfs hdfs 119 Feb 5 01:33 fsimage_0000000000000000000
-rw-rw-r-- 1 hdfs hdfs 62 Feb 5 01:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 hdfs hdfs 1211 Feb 5 02:58 fsimage_0000000000000000187
-rw-r--r-- 1 hdfs hdfs 62 Feb 5 02:58 fsimage_0000000000000000187.md5
-rw-r--r-- 1 hdfs hdfs     4 Feb 5 02:58 seen_txid

-bash-4.1$ hexdump -C /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/seen_txid
00000000 31 38 38 0a                      |188.|
00000004

Note: the transaction ID is stored in seen_txid.
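
The file names in this listing encode transaction ID ranges, so a listing can be sanity-checked without opening any file. The sketch below is an illustration only (`describe` and the regular expressions are hypothetical, not part of HDFS). It uses the tail of the listing and reports the finalized ranges, the in-progress start TXID and the fsimage TXIDs, raising an error if two finalized segments are not contiguous.

```python
import re

FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")
IMAGE = re.compile(r"^fsimage_(\d+)$")  # the .md5 companions do not match


def describe(names):
    """Return (finalized ranges, in-progress start txids, fsimage txids)."""
    ranges = sorted(
        (int(m.group(1)), int(m.group(2)))
        for m in map(FINALIZED.match, names) if m
    )
    for (_, prev_end), (start, _) in zip(ranges, ranges[1:]):
        if start != prev_end + 1:
            raise ValueError(f"gap between txid {prev_end} and {start}")
    in_progress = [int(m.group(1)) for m in map(IN_PROGRESS.match, names) if m]
    images = sorted(int(m.group(1)) for m in map(IMAGE.match, names) if m)
    return ranges, in_progress, images


# Tail of the listing after checkpointing.
names = [
    "edits_0000000000000000184-0000000000000000185",
    "edits_0000000000000000186-0000000000000000187",
    "edits_inprogress_0000000000000000188",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_0000000000000000187",
    "fsimage_0000000000000000187.md5",
    "seen_txid",
]
print(describe(names))
```

In this listing the newest fsimage is at TXID 187, and both the in-progress segment and seen_txid are at 188.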
oiv - fsimage viewer
-bash-4.1$ hdfs oiv -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000187 -o aaa
-bash-4.1$ cat aaa
drwxr-xr-x - hdfs     supergroup  0 2013-02-05 01:35 /
drwxr-xr-x - hdfs     supergroup  0 2013-02-05 01:35 /user
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var
drwxrwxrwt - hdfs     supergroup  0 2013-02-05 01:37 /user/training
-rw-r--r-- 1 training supergroup 83 2013-02-05 01:37 /user/training/hosts
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwx------ - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred/system
-rw------- 1 mapred   supergroup  4 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred/system/jobtracker.info

The ‘hosts’ file appears in the fsimage after the checkpoint.
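
To check for a specific path in this ls-style oiv output, each line can be split into its eight whitespace-separated fields. A minimal sketch, assuming the column layout shown in this output (permissions, replication, user, group, size, date, time, path); `parse_ls_line` is a hypothetical helper name.

```python
def parse_ls_line(line: str) -> dict:
    """Split one ls-style line of oiv output into named fields."""
    perms, repl, user, group, size, date, time_, path = line.split(None, 7)
    return {
        "perms": perms,
        # Directories show "-" in the replication column.
        "replication": None if repl == "-" else int(repl),
        "user": user,
        "group": group,
        "size": int(size),
        "mtime": f"{date} {time_}",
        "path": path,
    }


entry = parse_ls_line(
    "-rw-r--r-- 1 training supergroup     83 2013-02-05 01:37 /user/training/hosts"
)
print(entry["path"], entry["size"])
```

The size of 83 bytes agrees with NUM_BYTES in the OP_CLOSE record of the edits segment.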
oev – edits viewer
-bash-4.1$ hdfs oev -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_inprogress_0000000000000000188 -o bbb
-bash-4.1$ cat bbb
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
 <EDITS_VERSION>-40</EDITS_VERSION>
 <RECORD>
  <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
  <DATA>
   <TXID>188</TXID>
  </DATA>
 </RECORD>
</EDITS>

Note: the <TXID> value (188) is the transaction ID.
timeline (CDH4)-1
NameNode

t0  fsimage_0 and edits_inprogress_1 exist
    (10 transactions are written between t0 and t1)
t1  Log roll: edits_inprogress_1 is finalized and renamed to edits_1-10 (transactions 1-10);
    a new edits_inprogress_11 is created using the new transaction ID
    (22 transactions are written between t1 and t2)
t2  Log roll: edits_inprogress_11 is finalized and renamed to edits_11-32 (transactions 11-32);
    edits_inprogress_33 is created; edits_1-10 remains

Events that trigger a log roll:
1) NN startup
2) saveNamespace
3) Secondary NN checkpoint
4) A storage directory becomes available
5) Admin operation
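
The naming rule behind a log roll can be sketched in a few lines. This is an illustration under stated assumptions, not NameNode code: `roll` and `txid_name` are hypothetical names, and the 19-digit zero padding is taken from the file names in the directory listings.

```python
def txid_name(txid: int) -> str:
    """Zero-pad a transaction ID to 19 digits, as in the listings."""
    return f"{txid:019d}"


def roll(in_progress_start: int, last_txid: int):
    """Return (finalized segment name, new in-progress segment name)."""
    finalized = f"edits_{txid_name(in_progress_start)}-{txid_name(last_txid)}"
    new_in_progress = f"edits_inprogress_{txid_name(last_txid + 1)}"
    return finalized, new_in_progress


# The roll at t1 in the timeline: transactions 1-10 are finalized and
# the next segment starts at transaction 11.
print(roll(1, 10))
```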
timeline (CDH4)-2
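NameNode before the checkpoint: fsimage_0, edits_inprogress_33, edits_1-10, edits_11-32

t4  Checkpoint start: the log is rolled (edits_33-33 is finalized, edits_inprogress_34 is created);
    the Secondary NameNode gets the metadata from the NameNode and merges it into fsimage_ckpt_33
t5  Checkpoint done: fsimage_ckpt_33 is transferred to the NameNode
t6  The NameNode renames fsimage_ckpt_33 to fsimage_33

NameNode after the checkpoint: fsimage_0, fsimage_33, edits_inprogress_34, edits_1-10, edits_11-32, edits_33-33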
parameters (CDH4)
fsimage files (fsimage_0, fsimage_33)
   dfs.namenode.num.checkpoints.retained
      The number of image checkpoint files that will be retained

edits files (edits_inprogress_34, edits_1-10, edits_11-32, edits_33-33)
   dfs.namenode.num.extra.edits.retained
      The number of extra transactions that should be retained

Checkpoint triggers
   dfs.namenode.checkpoint.period
      The checkpoint interval, in seconds
   dfs.namenode.checkpoint.txns
      The number of transactions that triggers a checkpoint
   dfs.namenode.checkpoint.check.period
      How often, in seconds, the Secondary NameNode polls the NameNode

      *fstime is no longer necessary since it’s all encapsulated in the transaction IDs
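
How these parameters interact can be sketched as follows. This is a simplified model, not the Secondary NameNode implementation: it assumes a checkpoint is due when either the interval or the transaction threshold is reached, and that the newest N fsimage files are kept. The function names and all numeric values are hypothetical, and the extra-edits retention rule is not modeled.

```python
import re


def checkpoint_due(seconds_since_last: int, txns_since_last: int,
                   period_s: int, txns_limit: int) -> bool:
    """True when either dfs.namenode.checkpoint.period (seconds) or
    dfs.namenode.checkpoint.txns (transactions) has been reached."""
    return seconds_since_last >= period_s or txns_since_last >= txns_limit


def retained_images(names, num_retained: int) -> list:
    """Keep the newest num_retained fsimage files, ordered by transaction ID
    (a model of dfs.namenode.num.checkpoints.retained)."""
    images = sorted(
        (n for n in names if re.fullmatch(r"fsimage_\d+", n)),
        key=lambda n: int(n.split("_")[1]),
    )
    return images[-num_retained:] if num_retained > 0 else []


# Hypothetical settings: period of 3600 s, threshold of 100 transactions.
print(checkpoint_due(120, 150, period_s=3600, txns_limit=100))
print(retained_images(["fsimage_0", "fsimage_33", "edits_1-10"], 2))
```

In this model the check itself would run once per dfs.namenode.checkpoint.check.period.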
Secondary NN web UI (CDH4)
reference
• HDFS parameters
  • http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
• HDFS-1073
  • https://issues.apache.org/jira/secure/attachment/12478323/hdfs
    1073.pdf
• O’Reilly, Hadoop: The Definitive Guide, 3rd Edition

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

HDFS metadata (fsimage and edits): differences between CDH3 and CDH4

  • 1. fsimage and edits in CDH3 and CDH4 Tatsuo Kawasaki tatsuo@cloudera.com
  • 2. objective
      HDFS metadata (fsimage and edits) management differs between CDH3 and CDH4.
      This presentation introduces these differences.
      Please let me know if you find any issues.
  • 3. HDFS metadata in CDH3
      [root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
      total 1100
      -rw-r--r-- 1 hdfs hdfs     101 Jan 30 00:21 VERSION
      -rw-r--r-- 1 hdfs hdfs       8 Jan 30 00:21 fstime
      -rw-r--r-- 1 hdfs hdfs   57248 Jan 30 00:21 fsimage
      -rw-r--r-- 1 hdfs hdfs 1048580 Jan 31 16:16 edits
      after checkpoint
      [root@localhost ~]# ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
      total 84
      -rw-r--r-- 1 hdfs hdfs   101 Feb 5 14:37 VERSION
      -rw-r--r-- 1 hdfs hdfs     8 Feb 5 14:37 fstime
      -rw-r--r-- 1 hdfs hdfs 66760 Feb 5 14:37 fsimage
      -rw-r--r-- 1 hdfs hdfs     4 Feb 5 14:37 edits
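  • 4. timeline (CDH3)
      NameNode timeline: t0, t1 (put file), t2 (checkpoint start), t3 (checkpoint done), t4
      put file: the NameNode updates edits and updates the metadata in memory
      checkpoint start: the NameNode creates edits.new; the Secondary NameNode gets fsimage and edits
      Secondary NameNode: merges fsimage and edits into fsimage.ckpt, then transfers fsimage.ckpt to the NameNode
      checkpoint done: the NameNode renames fsimage.ckpt to fsimage, renames edits.new to edits, and updates the time in fstime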
  • 5. Secondary NN Web UI (CDH3)
  • 6. HDFS metadata in CDH4
      After formatting HDFS
      -bash-4.1$ ls -l /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
      total 1040
      -rw-r--r-- 1 hdfs hdfs 1048576 Feb 5 01:35 edits_inprogress_0000000000000000001
      -rw-rw-r-- 1 hdfs hdfs     119 Feb 5 01:33 fsimage_0000000000000000000
      -rw-rw-r-- 1 hdfs hdfs      62 Feb 5 01:33 fsimage_0000000000000000000.md5
      -rw-r--r-- 1 hdfs hdfs       2 Feb 5 01:35 seen_txid
      -rw-rw-r-- 1 hdfs hdfs     202 Feb 5 01:33 VERSION
      -bash-4.1$ hexdump -C /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/seen_txid
      00000000  31 0a  |1.|
      00000002
      The transaction ID is stored in seen_txid.
  • 7. try to add a new file
      [training@localhost ~]$ hadoop fs -put /etc/hosts hosts
      [training@localhost ~]$
  • 8. oiv - fsimage viewer
      -bash-4.1$ hdfs oiv -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000000 -o aaa
      -bash-4.1$ cat aaa
      drwxr-xr-x - hdfs supergroup 0 1969-12-31 19:00 /
      The ‘hosts’ file has not been written to the fsimage before checkpointing.
  • 9. oev – edits viewer
      -bash-4.1$ hdfs oev -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_0000000000000000020-0000000000000000027 -o bbb
      -bash-4.1$ cat bbb
      <?xml version="1.0" encoding="UTF-8"?>
      <EDITS>
        <EDITS_VERSION>-40</EDITS_VERSION>
        <RECORD>
          <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
          <DATA>
            <TXID>20</TXID>    (start transaction ID)
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_SET_GENSTAMP</OPCODE>
          <DATA>
            <TXID>21</TXID>    (transaction ID)
            <GENSTAMP>1003</GENSTAMP>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_ADD</OPCODE>    (put transaction)
          <DATA>
            <TXID>22</TXID>
            <LENGTH>0</LENGTH>
            <PATH>/user/training/hosts._COPYING_</PATH>
            <REPLICATION>1</REPLICATION>
            <MTIME>1360046220628</MTIME>
            <ATIME>1360046220628</ATIME>
            <BLOCKSIZE>67108864</BLOCKSIZE>
            <CLIENT_NAME>DFSClient_NONMAPREDUCE_1911533003_1</CLIENT_NAME>
            <CLIENT_MACHINE>127.0.0.1</CLIENT_MACHINE>
            <PERMISSION_STATUS>
              <USERNAME>training</USERNAME>
              <GROUPNAME>supergroup</GROUPNAME>
              <MODE>420</MODE>
            </PERMISSION_STATUS>
          </DATA>
        </RECORD>
  • 10. oev – edits viewer (cont)
      File name: edits_0000000000000000020-0000000000000000027
        <RECORD>
          <OPCODE>OP_SET_GENSTAMP</OPCODE>
          <DATA>
            <TXID>23</TXID>
            <GENSTAMP>1004</GENSTAMP>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_UPDATE_BLOCKS</OPCODE>
          <DATA>
            <TXID>24</TXID>
            <PATH>/user/training/hosts._COPYING_</PATH>
            <BLOCK>
              <BLOCK_ID>-3498739165311848505</BLOCK_ID>
              <NUM_BYTES>0</NUM_BYTES>
              <GENSTAMP>1004</GENSTAMP>
            </BLOCK>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_CLOSE</OPCODE>
          <DATA>
            <TXID>25</TXID>
            <LENGTH>0</LENGTH>
            <PATH>/user/training/hosts._COPYING_</PATH>
            <REPLICATION>1</REPLICATION>
            <MTIME>1360046220735</MTIME>
            <ATIME>1360046220628</ATIME>
            <BLOCKSIZE>67108864</BLOCKSIZE>
            <CLIENT_NAME></CLIENT_NAME>
            <CLIENT_MACHINE></CLIENT_MACHINE>
            <BLOCK>
              <BLOCK_ID>-3498739165311848505</BLOCK_ID>
              <NUM_BYTES>83</NUM_BYTES>
              <GENSTAMP>1004</GENSTAMP>
            </BLOCK>
  • 11. oev – edits viewer (cont)
      File name: edits_0000000000000000020-0000000000000000027
            <PERMISSION_STATUS>
              <USERNAME>training</USERNAME>
              <GROUPNAME>supergroup</GROUPNAME>
              <MODE>420</MODE>
            </PERMISSION_STATUS>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_RENAME_OLD</OPCODE>
          <DATA>
            <TXID>26</TXID>
            <LENGTH>0</LENGTH>
            <SRC>/user/training/hosts._COPYING_</SRC>
            <DST>/user/training/hosts</DST>
            <TIMESTAMP>1360046220738</TIMESTAMP>
          </DATA>
        </RECORD>
        <RECORD>
          <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
          <DATA>
            <TXID>27</TXID>    (end transaction ID)
          </DATA>
        </RECORD>
      </EDITS>
  • 12. After checkpointing
      -bash-4.1$ ls -l /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/
      total 1376
      -rw-r--r-- 1 hdfs hdfs    1317 Feb 5 01:36 edits_0000000000000000001-0000000000000000019
      -rw-r--r-- 1 hdfs hdfs     471 Feb 5 01:37 edits_0000000000000000020-0000000000000000027
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 01:38 edits_0000000000000000028-0000000000000000029
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 01:39 edits_0000000000000000030-0000000000000000031
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 01:40 edits_0000000000000000032-0000000000000000033
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 01:41 edits_0000000000000000034-0000000000000000035
      (omitted)
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 02:53 edits_0000000000000000178-0000000000000000179
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 02:54 edits_0000000000000000180-0000000000000000181
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 02:55 edits_0000000000000000182-0000000000000000183
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 02:56 edits_0000000000000000184-0000000000000000185
      -rw-r--r-- 1 hdfs hdfs      30 Feb 5 02:58 edits_0000000000000000186-0000000000000000187
      -rw-r--r-- 1 hdfs hdfs 1048576 Feb 5 02:58 edits_inprogress_0000000000000000188
      -rw-rw-r-- 1 hdfs hdfs     119 Feb 5 01:33 fsimage_0000000000000000000
      -rw-rw-r-- 1 hdfs hdfs      62 Feb 5 01:33 fsimage_0000000000000000000.md5
      -rw-r--r-- 1 hdfs hdfs    1211 Feb 5 02:58 fsimage_0000000000000000187
      -rw-r--r-- 1 hdfs hdfs      62 Feb 5 02:58 fsimage_0000000000000000187.md5
      -rw-r--r-- 1 hdfs hdfs       4 Feb 5 02:58 seen_txid
      -bash-4.1$ hexdump -C /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/seen_txid
      00000000  31 38 38 0a  |188.|
      00000004
      The transaction ID is stored in seen_txid.
  • 13. oiv - fsimage viewer
      -bash-4.1$ hdfs oiv -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000187 -o aaa
      -bash-4.1$ cat aaa
      drwxr-xr-x - hdfs     supergroup  0 2013-02-05 01:35 /
      drwxr-xr-x - hdfs     supergroup  0 2013-02-05 01:35 /user
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var
      drwxrwxrwt - hdfs     supergroup  0 2013-02-05 01:37 /user/training
      -rw-r--r-- 1 training supergroup 83 2013-02-05 01:37 /user/training/hosts
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred
      drwxr-xr-x - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred
      drwx------ - mapred   supergroup  0 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred/system
      -rw------- 1 mapred   supergroup  4 2013-02-05 01:35 /var/lib/hadoop-hdfs/cache/mapred/mapred/system/jobtracker.info
      The ‘hosts’ file has been written to the fsimage after the checkpoint.
  • 14. oev – edits viewer
      -bash-4.1$ hdfs oev -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/edits_inprogress_0000000000000000188 -o bbb
      -bash-4.1$ cat bbb
      <?xml version="1.0" encoding="UTF-8"?>
      <EDITS>
        <EDITS_VERSION>-40</EDITS_VERSION>
        <RECORD>
          <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
          <DATA>
            <TXID>188</TXID>    (transaction ID)
          </DATA>
        </RECORD>
      </EDITS>
  • 15. timeline (CDH4)-1
      NameNode timeline: t0, then 10 transactions, t1, then 22 transactions, t2
      t0: fsimage_0 and edits_inprogress_1
      t1: edits_inprogress_1 is finalized and renamed to edits_1-10 (transactions 1-10); a new edits_inprogress_11 is created using the new transaction ID
      t2: edits_inprogress_11 is finalized and renamed to edits_11-32 (transactions 11-32); a new edits_inprogress_33 is created
      Events that trigger a log roll:
      1) NN startup
      2) saveNamespace
      3) Secondary NN checkpoint
      4) a storage directory becomes available
      5) admin operation
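  • 16. timeline (CDH4)-2
      NameNode timeline: t4, t5, t6, with "CheckPoint start" and "CheckPoint Done" marked
      before the checkpoint: fsimage_0, edits_inprogress_33, edits_1-10, edits_11-32
      after the checkpoint: fsimage_0, fsimage_33, edits_inprogress_34, edits_1-10, edits_11-32, edits_33-33
      Secondary NameNode: get, merge into fsimage_ckpt_33, transfer fsimage_ckpt_33 to the NameNode
      NameNode: renames fsimage_ckpt_33 to fsimage_33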
  • 17. parameters (CDH4)
      Files in the example: fsimage_0, fsimage_33, edits_inprogress_34, edits_1-10, edits_11-32, edits_33-33
      dfs.namenode.num.checkpoints.retained: the number of image checkpoint files that will be retained
      dfs.namenode.num.extra.edits.retained: the number of extra transactions that should be retained
      dfs.namenode.checkpoint.period: checkpoint interval
      dfs.namenode.checkpoint.txns: checkpoint transaction count
      dfs.namenode.checkpoint.check.period: how often, in seconds, the Secondary NameNode polls the NameNode
      * fstime is no longer necessary since it’s all encapsulated in the transaction IDs
  • 18. Secondary NN web UI (CDH4)
  • 19. reference
      • HDFS parameters: http://archive.cloudera.com/cdh4/cdh/4/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
      • HDFS-1073: https://issues.apache.org/jira/secure/attachment/12478323/hdfs1073.pdf
      • O’Reilly, Hadoop: The Definitive Guide, 3rd edition
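
The CDH4 file naming shown in the "After checkpointing" listing can be read mechanically. The following is a minimal Python sketch, not part of the original slides: the helper names `summarize` and `edits_to_replay` are hypothetical, and the list of file names is a subset copied from that listing. It assumes that `fsimage_N` reflects every transaction up to and including N, which is consistent with the slides showing `fsimage_0000000000000000187` next to `edits_inprogress_0000000000000000188`.

```python
import re

# File names copied from the "After checkpointing" listing; the segments
# that the slide elides are left out here as well.
NAMES = [
    "edits_0000000000000000001-0000000000000000019",
    "edits_0000000000000000020-0000000000000000027",
    "edits_0000000000000000186-0000000000000000187",
    "edits_inprogress_0000000000000000188",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_0000000000000000187",
    "fsimage_0000000000000000187.md5",
    "seen_txid",
]

FINALIZED = re.compile(r"^edits_(\d+)-(\d+)$")
IN_PROGRESS = re.compile(r"^edits_inprogress_(\d+)$")
IMAGE = re.compile(r"^fsimage_(\d+)$")  # does not match the .md5 files


def summarize(names):
    """Group file names by role (hypothetical helper, not a Hadoop API).

    Returns (finalized segments as (first, last) txid pairs,
             first txid of the in-progress segment or None,
             fsimage txids), with both lists sorted.
    """
    segments, in_progress, images = [], None, []
    for name in names:
        if m := FINALIZED.match(name):
            segments.append((int(m.group(1)), int(m.group(2))))
        elif m := IN_PROGRESS.match(name):
            in_progress = int(m.group(1))
        elif m := IMAGE.match(name):
            images.append(int(m.group(1)))
    return sorted(segments), in_progress, sorted(images)


def edits_to_replay(segments, in_progress, image_txid):
    """List the edit segments holding transactions newer than image_txid.

    Assumption: fsimage_N already contains transactions 1..N, so only
    segments whose last txid is greater than N still matter.
    The in-progress segment is reported as (first_txid, None).
    """
    needed = [(first, last) for first, last in segments if last > image_txid]
    if in_progress is not None and in_progress > image_txid:
        needed.append((in_progress, None))
    return needed


if __name__ == "__main__":
    segments, in_progress, images = summarize(NAMES)
    print("finalized segments:", segments)
    print("in-progress segment starts at txid:", in_progress)
    print("fsimage txids:", images)
    print("edits newer than the latest fsimage:",
          edits_to_replay(segments, in_progress, images[-1]))
```

With the latest image in the listing, only the in-progress segment is newer than the fsimage, which matches the note that fstime is no longer needed because the ordering is carried by the transaction IDs.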