Constructing blocks and file system relationship in HDFS

I am using a 3-node Hadoop cluster running on Windows Azure HDInsight for this test.

In Hadoop, the fsck utility diagnoses the health of the HDFS file system. It walks the files under a given path and reports missing, corrupt, under-replicated, over-replicated, and mis-replicated blocks.

Let's run fsck on the root of the file system:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fsck /

FSCK started by avkash from /10.114.132.17 for path / at Thu Mar 07 05:27:39 GMT 2013
..........Status: HEALTHY
Total size: 552335333 B
Total dirs: 21
Total files: 10
Total blocks (validated): 12 (avg. block size 46027944 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 3
FSCK ended at Thu Mar 07 05:27:39 GMT 2013 in 8 milliseconds

The filesystem under path '/' is HEALTHY
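The summary figures are easy to cross-check. Here is a minimal Python sketch, assuming the summary lines have been copied into a string exactly as printed above; the helper name parse_fsck_summary is my own and is not part of Hadoop. It recomputes the average block size from the total size and the block count.

```python
import re

# Summary lines copied from the fsck output above.
FSCK_SUMMARY = """\
Total size: 552335333 B
Total dirs: 21
Total files: 10
Total blocks (validated): 12 (avg. block size 46027944 B)
Default replication factor: 3
Number of data-nodes: 3
"""


def parse_fsck_summary(text):
    """Map each 'Label: value' line to the first integer after the colon.

    Hypothetical helper for illustration; it only handles integer fields.
    """
    summary = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if match:
            summary[match.group(1).strip()] = int(match.group(2))
    return summary


summary = parse_fsck_summary(FSCK_SUMMARY)

# fsck reports the average as total size divided by block count, rounded down.
avg_block_size = summary["Total size"] // summary["Total blocks (validated)"]
print(avg_block_size)  # 46027944, matching the value fsck printed
```

The average is far below the configured block size because nine of the ten files are tiny, so most blocks are nowhere near full.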

 

Now let's list everything under the root (/) recursively to verify the file and directory counts:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fs -lsr /

drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/apps
-rw-r--r-- 3 avkash supergroup 4608 2013-03-04 21:16 /example/apps/cat.exe
-rw-r--r-- 3 avkash supergroup 5120 2013-03-04 21:16 /example/apps/wc.exe
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/data
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/data/gutenberg
-rw-r--r-- 3 avkash supergroup 1395667 2013-03-04 21:16 /example/data/gutenberg/davinci.txt
-rw-r--r-- 3 avkash supergroup 674762 2013-03-04 21:16 /example/data/gutenberg/outlineofscience.txt
-rw-r--r-- 3 avkash supergroup 1573044 2013-03-04 21:16 /example/data/gutenberg/ulysses.txt
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp/mapred
drwx------ - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp/mapred/system
-rw------- 3 avkash supergroup 4 2013-03-04 21:15 /hdfs/tmp/mapred/system/jobtracker.info
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive/warehouse
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive/warehouse/hivesampletable
-rw-r--r-- 3 avkash supergroup 5015508 2013-03-04 21:16 /hive/warehouse/hivesampletable/HiveSampleData.txt
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /tmp
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /tmp/hive-avkash
drwxrwxrwx - SYSTEM supergroup 0 2013-03-04 21:15 /uploads
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM/graph
-rw-r--r-- 3 avkash supergroup 80 2013-03-04 21:16 /user/SYSTEM/graph/catepillar_star.edge
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM/query
-rw-r--r-- 3 avkash supergroup 12 2013-03-04 21:16 /user/SYSTEM/query/catepillar_star_rwr.query
drwxr-xr-x - avkash supergroup 0 2013-03-05 07:37 /user/avkash
drwxr-xr-x - avkash supergroup 0 2013-03-04 23:00 /user/avkash/.Trash
-rw-r--r-- 3 avkash supergroup 543666528 2013-03-04 07:37 /user/avkash/data_w3c_large.txt
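Counting entries by hand is error-prone, so here is a small Python sketch that tallies directories and files from a recursive listing. It assumes the listing text has been captured into a string; count_entries is a name I made up for illustration. It relies only on the first character of the permission string: d for a directory, - for a regular file.

```python
def count_entries(listing):
    """Count directories and regular files in `hadoop fs -lsr` output.

    Hypothetical helper: a line starting with 'd' is a directory and a
    line starting with '-' is a regular file. Other lines are ignored.
    """
    dirs = files = 0
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("d"):
            dirs += 1
        elif line.startswith("-"):
            files += 1
    return dirs, files


# A four-line excerpt of the listing above.
sample = """\
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/apps
-rw-r--r-- 3 avkash supergroup 4608 2013-03-04 21:16 /example/apps/cat.exe
-rw-r--r-- 3 avkash supergroup 5120 2013-03-04 21:16 /example/apps/wc.exe
"""

dirs, files = count_entries(sample)
print(dirs, files)  # 2 2 for this excerpt
```

Keep in mind that the listing does not include the root directory itself, so the directory count from the listing is one less than the number fsck reports.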

The listing contains 20 directories and 10 files. Counting the root directory (/) itself gives the 21 directories that fsck reported. Now we can dig further and see how the 12 blocks map to the individual files:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fsck / -files -blocks -racks
FSCK started by avkash from /10.114.132.17 for path / at Thu Mar 07 05:35:44 GMT 2013
/

/example

/example/apps

/example/apps/cat.exe 4608 bytes, 1 block(s): OK

0. blk_9084981204553714951_1008 len=4608 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

/example/apps/wc.exe 5120 bytes, 1 block(s): OK
0. blk_-7951603158243426483_1009 len=5120 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/example/data

/example/data/gutenberg

/example/data/gutenberg/davinci.txt 1395667 bytes, 1 block(s): OK

0. blk_3859330889089858864_1005 len=1395667 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/example/data/gutenberg/outlineofscience.txt 674762 bytes, 1 block(s): OK

0. blk_-3790696559021810548_1006 len=674762 repl=3 [/fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010, /fd1/ud1/10.114.228.35:50010]

/example/data/gutenberg/ulysses.txt 1573044 bytes, 1 block(s): OK

0. blk_-8671592324971725227_1007 len=1573044 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/hdfs

/hdfs/tmp

/hdfs/tmp/mapred

/hdfs/tmp/mapred/system

/hdfs/tmp/mapred/system/jobtracker.info 4 bytes, 1 block(s): OK

0. blk_5997185491433558819_1003 len=4 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/hive

/hive/warehouse

/hive/warehouse/hivesampletable

/hive/warehouse/hivesampletable/HiveSampleData.txt 5015508 bytes, 1 block(s): OK

0. blk_44873054283747216_1004 len=5015508 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/tmp

/tmp/hive-avkash

/uploads

/user

/user/SYSTEM

/user/SYSTEM/graph

/user/SYSTEM/graph/catepillar_star.edge 80 bytes, 1 block(s): OK

0. blk_-6715685143024983574_1010 len=80 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/user/SYSTEM/query

/user/SYSTEM/query/catepillar_star_rwr.query 12 bytes, 1 block(s): OK

0. blk_8102317020509190444_1011 len=12 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

/user/avkash

/user/avkash/.Trash

/user/avkash/data_w3c_large.txt 543666528 bytes, 3 block(s): OK

0. blk_2005027737969478969_1012 len=268435456 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010]

1. blk_1970119524179712436_1012 len=268435456 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010]

2. blk_6223000007391223944_1012 len=6795616 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

Status: HEALTHY
Total size: 552335333 B
Total dirs: 21
Total files: 10
Total blocks (validated): 12 (avg. block size 46027944 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 3
FSCK ended at Thu Mar 07 05:35:44 GMT 2013 in 10 milliseconds

The filesystem under path '/' is HEALTHY
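Each block line in this output packs several fields together: the block index within the file, the block name, the length in bytes, the replication count, and a list of replica locations written as a rack path followed by the data node address. The following Python sketch pulls those fields apart. The regular expression and the name parse_block_line are my own, written against the output format shown in this post, so treat it as an illustration rather than a parser for every Hadoop version. I am also assuming the number after the last underscore is the block's generation stamp.

```python
import re

# Pattern written against the block lines shown in this post.
BLOCK_LINE = re.compile(
    r"(?P<index>\d+)\.\s+(?P<block>blk_-?\d+)_(?P<genstamp>\d+)\s+"
    r"len=(?P<length>\d+)\s+repl=(?P<repl>\d+)\s+\[(?P<locations>[^\]]*)\]"
)


def parse_block_line(line):
    """Split one fsck block line into its fields, or return None.

    Hypothetical helper for illustration only.
    """
    match = BLOCK_LINE.search(line)
    if match is None:
        return None
    locations = []
    for entry in match.group("locations").split(","):
        # "/fd0/ud0/10.114.236.28:50010" -> ("/fd0/ud0", "10.114.236.28:50010")
        rack, _, datanode = entry.strip().rpartition("/")
        locations.append((rack, datanode))
    return {
        "index": int(match.group("index")),
        "block": match.group("block"),
        "length": int(match.group("length")),
        "repl": int(match.group("repl")),
        "locations": locations,
    }


line = (
    "0. blk_8102317020509190444_1011 len=12 repl=3 "
    "[/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, "
    "/fd1/ud1/10.114.228.35:50010]"
)
info = parse_block_line(line)
print(info["block"], info["length"], info["repl"])
for rack, datanode in info["locations"]:
    print(rack, datanode)
```

Running this over every block line makes it easy to confirm that each block has one replica on each of the three data nodes.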

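The detailed fsck output shows where each of the 12 blocks is stored. Nine of the files are smaller than one block and occupy a single block each, while data_w3c_large.txt (543666528 bytes) is split into 3 blocks: two full blocks of 268435456 bytes (256 MB) and a final block of 6795616 bytes. Every block has repl=3, with one replica on each of the three data nodes.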
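The block counts follow directly from the file sizes. Here is a short Python sketch of that arithmetic, assuming a block size of 268435456 bytes, which I am inferring from the length of the two full blocks rather than reading from the cluster configuration. The file sizes are taken from the listing in this post.

```python
# Block size inferred from the two full blocks of data_w3c_large.txt (assumption).
BLOCK_SIZE = 268435456  # 256 MB

# File sizes in bytes, copied from the recursive listing.
FILE_SIZES = {
    "/example/apps/cat.exe": 4608,
    "/example/apps/wc.exe": 5120,
    "/example/data/gutenberg/davinci.txt": 1395667,
    "/example/data/gutenberg/outlineofscience.txt": 674762,
    "/example/data/gutenberg/ulysses.txt": 1573044,
    "/hdfs/tmp/mapred/system/jobtracker.info": 4,
    "/hive/warehouse/hivesampletable/HiveSampleData.txt": 5015508,
    "/user/SYSTEM/graph/catepillar_star.edge": 80,
    "/user/SYSTEM/query/catepillar_star_rwr.query": 12,
    "/user/avkash/data_w3c_large.txt": 543666528,
}


def block_count(size, block_size=BLOCK_SIZE):
    """Blocks needed for a file of `size` bytes (ceiling division)."""
    return -(-size // block_size)


total_blocks = sum(block_count(size) for size in FILE_SIZES.values())
print(total_blocks)  # 12, the same total that fsck reported

# The last block of the large file holds whatever is left after the full blocks.
large = FILE_SIZES["/user/avkash/data_w3c_large.txt"]
last_block_len = large - (block_count(large) - 1) * BLOCK_SIZE
print(last_block_len)  # 6795616
```

A block only occupies as much disk space as the data it holds, so the nine small files do not each consume 256 MB.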

 

Keywords: FSCK, Hadoop, HDFS, Blocks, File System, Replication, HDInsight
