Learn more about HBase here: http://hbase.apache.org/book.html
Let's create an HBase table first and add some data to it.
[cloudera@localhost ~]$ hbase shell
13/03/27 00:04:31 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.2-cdh4.2.0, rUnknown, Fri Feb 15 11:51:18 PST 2013
hbase(main):001:0> create 'students', 'name'
0 row(s) in 2.5020 seconds
=> Hbase::Table - students
hbase(main):002:0> list 'students'
TABLE
students
1 row(s) in 0.0540 seconds
=> ["students"]
hbase(main):003:0> put 'students', 'row1', 'name:id1','John'
0 row(s) in 0.0400 seconds
hbase(main):004:0> put 'students', 'row2', 'name:id2','Jim'
0 row(s) in 0.0070 seconds
hbase(main):005:0> put 'students', 'row3', 'name:id3','Will'
0 row(s) in 0.0070 seconds
hbase(main):006:0> put 'students', 'row4', 'name:id4','Henry'
0 row(s) in 0.0040 seconds
hbase(main):007:0> put 'students', 'row5', 'name:id5','Ken'
0 row(s) in 0.0440 seconds
hbase(main):008:0> scan 'students'
ROW COLUMN+CELL
row1 column=name:id1, timestamp=1364357135479, value=John
row2 column=name:id2, timestamp=1364357147587, value=Jim
row3 column=name:id3, timestamp=1364357161684, value=Will
row4 column=name:id4, timestamp=1364357173959, value=Henry
row5 column=name:id5, timestamp=1364357189836, value=Ken
5 row(s) in 0.0450 seconds
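For reference, the same puts and scan can also be done from the Java client API. Below is a minimal sketch against the 0.94-era client API (HTable, Put, Scan) that ships with CDH 4.2; it assumes the students table already exists (created in the shell above) and that hbase-site.xml is on the classpath. The class name StudentsClientExample is just for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class StudentsClientExample {
    public static void main(String[] args) throws Exception {
        // Reads ZooKeeper/cluster settings from hbase-site.xml on the classpath
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "students");

        // Equivalent of: put 'students', 'row1', 'name:id1','John'
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("name"), Bytes.toBytes("id1"), Bytes.toBytes("John"));
        table.put(put);

        // Equivalent of: scan 'students'
        ResultScanner scanner = table.getScanner(new Scan());
        try {
            for (Result row : scanner) {
                for (KeyValue kv : row.raw()) {
                    System.out.println(Bytes.toString(row.getRow()) + " "
                            + Bytes.toString(kv.getFamily()) + ":"
                            + Bytes.toString(kv.getQualifier()) + " = "
                            + Bytes.toString(kv.getValue()));
                }
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}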
As you can see above, we have the table students, and only that one table name appears when running the list command.
In my Hadoop cluster, HBase is configured to use the /hbase folder in HDFS, so let's check the disk utilization under /hbase:
[cloudera@localhost ~]$ hdfs dfs -du /hbase
2868 /hbase/-ROOT-
245 /hbase/.META.
0 /hbase/.archive
0 /hbase/.corrupt
2424 /hbase/.logs
0 /hbase/.oldlogs
0 /hbase/.tmp
38 /hbase/hbase.id
3 /hbase/hbase.version
928 /hbase/students
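The same per-directory totals can also be pulled programmatically with the HDFS FileSystem API. Here is a small sketch; the /hbase path and the class name HBaseDiskUsage are placeholders matching my setup, so adjust them for your own hbase.rootdir.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HBaseDiskUsage {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Roughly what `hdfs dfs -du /hbase` prints: total size of each child path
        for (FileStatus child : fs.listStatus(new Path("/hbase"))) {
            ContentSummary summary = fs.getContentSummary(child.getPath());
            System.out.println(summary.getLength() + "\t" + child.getPath().toUri().getPath());
        }
        fs.close();
    }
}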
The students table above is a user table, whereas -ROOT- and .META. are HBase catalog tables. HBase uses these internal tables to keep catalog information about the regions of user tables. To understand the structure of each table, let's run the describe command:
hbase(main):010:0> describe '-ROOT-'
DESCRIPTION ENABLED
{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 't true
rue', FAMILIES => [{NAME => 'info', DATA_BLOCK_ENCO
DING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_
SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1
0', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_
DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCO
DE_ON_DISK => 'true', IN_MEMORY => 'true', BLOCKCAC
HE => 'true'}]}
1 row(s) in 0.0700 seconds
hbase(main):011:0> describe '.META.'
DESCRIPTION ENABLED
{NAME => '.META.', IS_META => 'true', FAMILIES => [ true
{NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLO
OMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPR
ESSION => 'NONE', VERSIONS => '10', TTL => '2147483
647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'f
alse', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true
', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0470 seconds
hbase(main):012:0> describe 'students'
DESCRIPTION ENABLED
{NAME => 'students', FAMILIES => [{NAME => 'name', true
DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE
', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPR
ESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147
483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE =
> '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0270 seconds
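Because .META. is itself just an HBase table, we can also scan it with the client API to see the catalog row that points at the region of students. This is a rough sketch assuming the 0.94 catalog layout, where each region of a user table has one row in .META. under the info column family (with qualifiers such as regioninfo and server); the class name ScanMetaExample is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanMetaExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // .META. has one row per user-table region; the row key embeds the
        // table name, start key, region id and the encoded region name
        HTable meta = new HTable(conf, ".META.");
        ResultScanner scanner = meta.getScanner(new Scan());
        try {
            for (Result region : scanner) {
                // e.g. students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14.
                System.out.println(Bytes.toString(region.getRow()));
                // info:server holds the region server currently hosting this region
                byte[] server = region.getValue(Bytes.toBytes("info"), Bytes.toBytes("server"));
                if (server != null) {
                    System.out.println("  hosted on: " + Bytes.toString(server));
                }
            }
        } finally {
            scanner.close();
            meta.close();
        }
    }
}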
Now we can check the file structure for the user table 'students' as below:
[cloudera@localhost ~]$ hdfs dfs -du /hbase/students
697 /hbase/students/.tableinfo.0000000001
0 /hbase/students/.tmp
231 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
We can also check the HBase system specific tables structure as well:
[cloudera@localhost ~]$ hdfs dfs -du /hbase/-ROOT-
727 /hbase/-ROOT-/.tableinfo.0000000001
0 /hbase/-ROOT-/.tmp
2141 /hbase/-ROOT-/70236052
[cloudera@localhost ~]$ hdfs dfs -du /hbase/.META.
245 /hbase/.META./1028785192
If we dig further into the file structure of the user table students, we can learn about its region info as below:
[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students
Found 3 items
-rw-r--r-- 1 hbase supergroup 697 2013-03-27 00:04 /hbase/students/.tableinfo.0000000001
drwxr-xr-x – hbase supergroup 0 2013-03-27 00:04 /hbase/students/.tmp
drwxr-xr-x – hbase supergroup 0 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/.tmp
The .tmp directory is empty. Inside the region directory, we can see the .regioninfo details for the table students:
[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/b2cd87df288adbb7e9ff2423ca532e14
Found 2 items
-rw-r--r-- 1 hbase supergroup 231 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/.regioninfo
drwxr-xr-x – hbase supergroup 0 2013-03-27 00:04 /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/name
[cloudera@localhost ~]$ hdfs dfs -ls /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/name
[cloudera@localhost ~]$ hdfs dfs -cat /hbase/students/b2cd87df288adbb7e9ff2423ca532e14/.regioninfo
=�&:9students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14students�}�
{NAME => 'students,,1364357097018.b2cd87df288adbb7e9ff2423ca532e14.', STARTKEY => '', ENDKEY => '', ENCODED => b2cd87df288adbb7e9ff2423ca532e14,}[cloudera@localhost ~]$
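The binary prefix of the .regioninfo file is a serialized HRegionInfo, followed by its human-readable text form. As a rough sketch, and assuming the 0.94-era Writable serialization (from HBase 0.96 onwards this file is protobuf-encoded, so the code below would not parse it), we can deserialize it directly from HDFS; the class name ReadRegionInfo is just illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadRegionInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path regioninfo = new Path(
                "/hbase/students/b2cd87df288adbb7e9ff2423ca532e14/.regioninfo");
        FSDataInputStream in = fs.open(regioninfo);
        try {
            // HRegionInfo is a Writable in 0.94, so readFields() restores it;
            // the trailing human-readable text in the file is simply not read
            HRegionInfo hri = new HRegionInfo();
            hri.readFields(in);
            System.out.println("region name : " + hri.getRegionNameAsString());
            System.out.println("start key   : " + Bytes.toString(hri.getStartKey()));
            System.out.println("end key     : " + Bytes.toString(hri.getEndKey()));
            System.out.println("encoded name: " + hri.getEncodedName());
        } finally {
            in.close();
            fs.close();
        }
    }
}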
This is how we can explore and understand the details of HBase user tables as they are stored in HDFS.
Keywords: Hadoop, HBase, Regions, RegionServer, Catalog, Cloudera, HDFS