Understanding HBase tables and HDFS file structure in Hadoop

Learn more on HBase here: http://hbase.apache.org/book.html

Let's first create an HBase table and add some data to it.

[cloudera@localhost ~]$ hbase shell
13/03/27 00:04:31 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.2-cdh4.2.0, rUnknown, Fri Feb 15 11:51:18 PST 2013

hbase(main):001:0> create 'students', 'name'
0 row(s) in 2.5020 seconds

=> Hbase::Table - students
hbase(main):002:0> list 'students'
TABLE
students
1 row(s) in 0.0540 seconds

=> ["students"]
hbase(main):003:0> put 'students', 'row1', 'name:id1', 'John'
0 row(s) in 0.0400 seconds

hbase(main):004:0> put 'students', 'row2', 'name:id2', 'Jim'
0 row(s) in 0.0070 seconds

hbase(main):005:0> put 'students', 'row3', 'name:id3', 'Will'
0 row(s) in 0.0070 seconds

hbase(main):006:0> put 'students', 'row4', 'name:id4', 'Henry'
0 row(s) in 0.0040 seconds

hbase(main):007:0> put 'students', 'row5', 'name:id5', 'Ken'
0 row(s) in 0.0440 seconds

hbase(main):008:0> scan 'students'
ROW COLUMN+CELL
row1 column=name:id1, timestamp=1364357135479, value=John
row2 column=name:id2, timestamp=1364357147587, value=Jim
row3 column=name:id3, timestamp=1364357161684, value=Will
row4 column=name:id4, timestamp=1364357173959, value=Henry
row5 column=name:id5, timestamp=1364357189836, value=Ken
5 row(s) in 0.0450 seconds

As the output above shows, the students table now exists with five rows, and list 'students' returns only that one table name.
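
The timestamp column in the scan output is a count of milliseconds since the Unix epoch, which is what HBase assigns to a cell when the client does not supply one. A small sketch for reading those values as UTC dates follows; the ts_to_utc helper is a hypothetical name for illustration, not an HBase or Hadoop command.

```shell
# ts_to_utc MILLIS: print an HBase cell timestamp as a UTC date and time.
ts_to_utc() {
  secs=$(( $1 / 1000 ))   # drop the millisecond part
  # GNU date accepts "-d @SECONDS"; BSD/macOS date accepts "-r SECONDS".
  date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
    date -u -r "$secs" '+%Y-%m-%d %H:%M:%S'
}

# First and last timestamps copied from the scan output above.
for ts in 1364357135479 1364357189836; do
  printf '%s -> %s UTC\n' "$ts" "$(ts_to_utc "$ts")"
done
```

The five puts fall within about a minute of each other, which matches the order in which they were typed into the shell.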

In my Hadoop cluster, HBase is configured to use the /hbase folder in HDFS, so let's check the disk usage under /hbase:

[cloudera@localhost ~]$ hdfs dfs -du /hbase
2868 /hbase/-ROOT-
245 /hbase/.META.
0 /hbase/.archive
0 /hbase/.corrupt
2424 /hbase/.logs
0 /hbase/.oldlogs
0 /hbase/.tmp
38 /hbase/hbase.id
3 /hbase/hbase.version
928 /hbase/students

In the listing above, students is a user table, whereas -ROOT- and .META. are HBase catalog tables. These are internal tables in which HBase keeps its catalog of the user tables and their regions. To see how each table is defined, let's run the describe command:

hbase(main):010:0> describe '-ROOT-'
DESCRIPTION                                                          ENABLED
{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true',             true
FAMILIES => [{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647',
MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true', IN_MEMORY => 'true',
BLOCKCACHE => 'true'}]}
1 row(s) in 0.0700 seconds

hbase(main):011:0> describe '.META.'
DESCRIPTION                                                          ENABLED
{NAME => '.META.', IS_META => 'true',                                true
FAMILIES => [{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647',
MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true', IN_MEMORY => 'true',
BLOCKCACHE => 'true'}]}
1 row(s) in 0.0470 seconds

hbase(main):012:0> describe 'students'
DESCRIPTION                                                          ENABLED
{NAME => 'students',                                                 true
FAMILIES => [{NAME => 'name', DATA_BLOCK_ENCODING => 'NONE',
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
BLOCKCACHE => 'true'}]}
1 row(s) in 0.0270 seconds
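
The three descriptors look alike at first glance, so it can help to pull out just the attributes that differ. The sketch below works on the descriptor text copied from the describe output above; the attr helper is a hypothetical name for illustration and is plain text processing with sed, not an HBase command.

```shell
# Descriptors copied from the describe output above, one line each.
meta_desc="{NAME => '.META.', IS_META => 'true', FAMILIES => [{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}"
students_desc="{NAME => 'students', FAMILIES => [{NAME => 'name', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}"

# attr KEY DESCRIPTOR: print the quoted value that follows "KEY => ".
# The leading "[ {]" keeps VERSIONS from also matching MIN_VERSIONS.
attr() {
  printf '%s\n' "$2" | sed -n "s/.*[ {]$1 => '\([^']*\)'.*/\1/p"
}

# Print the attributes side by side for the catalog table and the user table.
printf '%-12s %-8s %-8s\n' ATTRIBUTE .META. students
for key in VERSIONS BLOCKSIZE IN_MEMORY; do
  printf '%-12s %-8s %-8s\n' "$key" \
    "$(attr "$key" "$meta_desc")" "$(attr "$key" "$students_desc")"
done
```

In this output the catalog table keeps 10 versions, uses 8192-byte blocks and is marked IN_MEMORY, while students keeps 3 versions, uses 65536-byte blocks and is not held in memory.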

Now we can check the file structure of the user table students:

[cloudera@localhost ~]$ hdfs dfs -du /hbase/students
697 /hbase/students/.tableinfo.0000000001
0 /hbase/students/.tmp
231 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14

We can check the structure of the HBase catalog tables in the same way:

[cloudera@localhost ~]$ hdfs dfs -du /hbase/-ROOT-
727 /hbase/-ROOT-/.tableinfo.0000000001
0 /hbase/-ROOT-/.tmp
2141 /hbase/-ROOT-/70236052
[cloudera@localhost ~]$ hdfs dfs -du /hbase/.META.
245 /hbase/.META./1028785192
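
The per-table figures agree with the earlier /hbase listing: the entries under each table directory add up to the size reported for that directory one level up. The sketch below checks this arithmetic on the numbers copied from the listings above; sum_du is a hypothetical helper name, and it only assumes that the first column of the du output is a size in bytes.

```shell
# sum_du: add up the first column (bytes) of "hdfs dfs -du" output read from stdin.
sum_du() { awk '{ total += $1 } END { print total + 0 }'; }

# Listings copied from the output above.
students_total=$(sum_du <<'EOF'
697 /hbase/students/.tableinfo.0000000001
0 /hbase/students/.tmp
231 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
EOF
)
root_total=$(sum_du <<'EOF'
727 /hbase/-ROOT-/.tableinfo.0000000001
0 /hbase/-ROOT-/.tmp
2141 /hbase/-ROOT-/70236052
EOF
)

echo "students: $students_total bytes"   # /hbase listing reported 928
echo "-ROOT-:   $root_total bytes"       # /hbase listing reported 2868
```

On a live cluster the same helper could read directly from the command, for example hdfs dfs -du /hbase/students | sum_du.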

Digging further into the file structure of the students table leads us to its region information:

[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students
Found 3 items
-rw-r--r-- 1 hbase supergroup 697 2013-03-27 00:04 /hbase/students/.tableinfo.0000000001
drwxr-xr-x - hbase supergroup 0 2013-03-27 00:04 /hbase/students/.tmp
drwxr-xr-x - hbase supergroup 0 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/.tmp

The .tmp directory is empty. The region directory holds a .regioninfo file and a subdirectory for the column family name:

[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
Found 2 items
-rw-r--r-- 1 hbase supergroup 231 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/.regioninfo
drwxr-xr-x - hbase supergroup 0 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/name

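[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/name

The name directory is still empty, most likely because HBase first buffers new writes in memory (the MemStore) and in the write-ahead log under /hbase/.logs; store files appear in the column family directory only after a flush.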

[cloudera@localhost ~]$ hdfs dfs -cat /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/.regioninfo
=�&:9students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14students�}�

{NAME => 'students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14.', STARTKEY => '', ENDKEY => '', ENCODED => b2cd87df288adbb7e9ff2423ca532e14,}
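
The region name in the .regioninfo output ties the pieces together. Reading it as table name, start key, region id and encoded name (an interpretation based on the NAME, STARTKEY and ENCODED fields printed above), the sketch below splits it with plain shell parameter expansion and rebuilds the HDFS path of the region directory. The variable names are illustrative only.

```shell
# Region name copied from the .regioninfo output above.
region_name='students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14.'

table=${region_name%%,*}        # text before the first comma
rest=${region_name##*,}         # text after the last comma: "<id>.<encoded>."
region_id=${rest%%.*}           # digits before the first dot
encoded=${rest#*.}              # drop "<id>."
encoded=${encoded%.}            # drop the trailing dot
start_key=${region_name#*,}     # drop "<table>,"
start_key=${start_key%,*}       # drop ",<id>.<encoded>." (empty for this region)

echo "table:      $table"
echo "start key:  '$start_key'"
echo "region id:  $region_id"
echo "encoded:    $encoded"
echo "region dir: /hbase/$table/$encoded"
```

The rebuilt path matches the region directory listed earlier, and the empty start key agrees with STARTKEY => '' for a table that has a single region.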

This is how an HBase user table is laid out in HDFS: one directory per table, one directory per region inside it, and one directory per column family inside each region.

Keywords: Hadoop, HBase, Regions, RegionServer, Catalog, Cloudera, HDFS
