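HBase Basic Command Summary (Hands-on Operations)
Enter HBase: hbase shell
Method 1: Operating HBase from the command-line shell
1. General commands
version    show version information
status     show the current cluster status
whoami     show the identity of the current user
help       show help
2. HBase DDL operations (object-level operations)
2.1 Namespaces (the equivalent of a database)
# 1. List all namespaces that have been created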
list_namespace
---------------------------
NAMESPACE
default
hbase
hbase_test
test_hbase
4 row(s)
Took 0.0631 seconds
---------------------------
# 2. Create a namespace
create_namespace "test_hbase"
# 3. List the tables in a given namespace (database)
list_namespace_tables "test_hbase"
---------------------------
TABLE
0 row(s)
Took 0.0301 seconds
=> []
---------------------------
# 4. Describe a namespace definition
describe_namespace "test_hbase"
---------------------------
DESCRIPTION
{NAME => 'test_hbase'}
Quota is disabled
---------------------------
# 5. Drop a namespace (the namespace must not contain any tables)
drop_namespace "test_hbase"
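The commands in this document write table names as "[namespace:]table", for example "test_hbase:test_table". The Java sketch below shows how such a name resolves. QualifiedTableName is a hypothetical helper and not part of the HBase API. It shows that a name without a namespace prefix belongs to the "default" namespace.

```java
import java.util.Objects;

/** Hypothetical helper (not the HBase API): splits "[namespace:]table". */
final class QualifiedTableName {
    static final String DEFAULT_NAMESPACE = "default";

    final String namespace;
    final String qualifier;

    private QualifiedTableName(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    static QualifiedTableName parse(String name) {
        Objects.requireNonNull(name, "name");
        int colon = name.indexOf(':');
        if (colon < 0) {
            // No prefix: the table lives in the "default" namespace.
            return new QualifiedTableName(DEFAULT_NAMESPACE, name);
        }
        String ns = name.substring(0, colon);
        String table = name.substring(colon + 1);
        if (ns.isEmpty() || table.isEmpty() || table.indexOf(':') >= 0) {
            throw new IllegalArgumentException("Malformed table name: " + name);
        }
        return new QualifiedTableName(ns, table);
    }

    @Override
    public String toString() {
        return namespace + ":" + qualifier;
    }

    public static void main(String[] args) {
        System.out.println(parse("test_hbase:test_table")); // test_hbase:test_table
        System.out.println(parse("student_info"));          // default:student_info
    }
}
```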
2.2 Tables
# 1. List all tables
list
---------------------------
TABLE
hbase_test:student_info
1 row(s)
Took 0.0202 seconds
=> ["hbase_test:student_info"]
---------------------------
# 2. Check whether a table exists
exists "test_hbase:test_table"
---------------------------
Table test_hbase:test_table does exist
Took 0.0114 seconds
=> true
---------------------------
# 3. Create a table
1. Full form:
create "test_hbase:test_table",{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', REPLICATION_SCOPE => '0'}
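Notes:
BLOOMFILTER (Bloom filter) commonly takes one of three values: ROW, ROWCOL, NONE (default: ROW)
ROW: the Bloom filter is checked against the row key only
ROWCOL: the Bloom filter is checked against the row key plus the column (family:qualifier)
NONE: no Bloom filter is used
TTL: time to live, in seconds
2. Short form: ✔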
create "test_hbase:test_table","base","sources"
# 4. Show the table definition
desc "test_hbase:test_table"
---------------------------
Table test_hbase:test_table is ENABLED
test_hbase:test_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'base', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'TRUE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'sources', BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false', VERSIONS => '3', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '655360', REPLICATION_SCOPE => '0'}
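---------------------------
# 5. Check and change the table state
is_enabled "test_hbase:test_table"    # is the table enabled?
is_disabled "test_hbase:test_table"   # is the table disabled?
enable "test_hbase:test_table"        # enable the table
disable "test_hbase:test_table"       # disable the table
# 6. Drop a table (only a disabled table can be dropped)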
disable "test_hbase:test_table"
drop "test_hbase:test_table"
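The BLOOMFILTER option set in the create statement above decides which key goes into the filter. ROW uses only the row key. ROWCOL uses the row key plus the column. The toy Bloom filter below illustrates the difference. It is not HBase's implementation: the bit count, the number of hashes and the double-hashing scheme are arbitrary assumptions. A Bloom filter answers either "definitely absent" or "possibly present". That answer lets a read skip store files that cannot contain the requested key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.zip.CRC32;

/** Toy Bloom filter for illustrating ROW vs ROWCOL; not HBase's implementation. */
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Double hashing: the i-th bit position is h1 + i * h2 (mod numBits).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        CRC32 crc = new CRC32();
        crc.update(key);
        int h2 = (int) crc.getValue() | 1; // odd, so never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
    }

    /** false = definitely absent; true = possibly present. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    // ROW: only the row key goes into the filter.
    static byte[] rowKey(String row) {
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // ROWCOL: row key plus "family:qualifier". The row is length-prefixed
    // so ("ab", "c") and ("a", "bc") produce different keys.
    static byte[] rowColKey(String row, String column) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] c = column.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + r.length + c.length).putInt(r.length).put(r).put(c).array();
    }

    public static void main(String[] args) {
        ToyBloomFilter row = new ToyBloomFilter(1024, 3);
        ToyBloomFilter rowCol = new ToyBloomFilter(1024, 3);
        row.add(rowKey("1"));
        rowCol.add(rowColKey("1", "sources:English"));
        System.out.println(row.mightContain(rowKey("1")));                          // true
        System.out.println(rowCol.mightContain(rowColKey("1", "sources:English"))); // true
    }
}
```

A ROW filter can only answer "might row 1 be in this file?". A ROWCOL filter can answer for a specific column of that row, at the cost of one filter entry per cell instead of one per row.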
3. HBase DML operations (data-level operations)
# 1. Insert data one column at a time (each put writes exactly one cell)
Syntax: put "table","row key","family:qualifier","value"
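Each put adds one cell. Cells are addressed by row key and then by "family:qualifier", and HBase keeps them sorted. The Java sketch below models this with nested TreeMaps. It is a simplification that ignores timestamps and versions, and ToyTable is a hypothetical name. It also shows why the shell prints Chinese values as escapes. The value 女 is stored as the UTF-8 bytes E5 A5 B3, which the shell displays as \xE5\xA5\xB3. The escape helper only approximates the shell's formatting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one table: row key -> "family:qualifier" -> value bytes, all sorted. */
final class ToyTable {
    private final TreeMap<String, TreeMap<String, byte[]>> rows = new TreeMap<>();

    // Like the shell's put: writes exactly one cell.
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .put(column, value.getBytes(StandardCharsets.UTF_8));
    }

    // Like get: all cells of one row, ordered by "family:qualifier"
    // (String order equals byte order here because the column names are ASCII).
    Map<String, byte[]> get(String row) {
        return rows.getOrDefault(row, new TreeMap<>());
    }

    /** Approximates the shell display: printable ASCII as-is, other bytes as \xHH. */
    static String escape(byte[] value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value) {
            int u = b & 0xFF;
            if (u >= 0x20 && u < 0x7F) sb.append((char) u);
            else sb.append(String.format("\\x%02X", u));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("1", "base:name", "\u80E1\u6843"); // "Hu Tao"
        t.put("1", "base:age", "17");
        t.put("1", "base:gender", "\u5973");     // "female"
        t.put("1", "sources:English", "82");
        t.put("1", "sources:Math", "90");
        for (Map.Entry<String, byte[]> e : t.get("1").entrySet()) {
            System.out.println(e.getKey() + " value=" + escape(e.getValue()));
        }
        // base:age value=17
        // base:gender value=\xE5\xA5\xB3
        // base:name value=\xE8\x83\xA1\xE6\xA1\x83
        // sources:English value=82
        // sources:Math value=90
    }
}
```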
Example (one cell per put):
put "test_hbase:test_table","1","base:name","胡桃"
put "test_hbase:test_table","1","base:age",17
put "test_hbase:test_table","1","base:gender","女"
put "test_hbase:test_table","1","sources:English",82
put "test_hbase:test_table","1","sources:Math",90
# 2. View all data in the table (full table scan)
scan "test_hbase:test_table"
---------------------------
ROW  COLUMN+CELL
 1   column=base:age, timestamp=2024-03-07T15:07:10.339, value=17
 1   column=base:gender, timestamp=2024-03-07T15:07:14.510, value=\xE5\xA5\xB3
 1   column=base:name, timestamp=2024-03-07T15:07:06.009, value=\xE8\x83\xA1\xE6\xA1\x83
 1   column=sources:English, timestamp=2024-03-07T15:07:17.987, value=86
 1   column=sources:Math, timestamp=2024-03-07T15:07:21.874, value=97
---------------------------
# 3. Count the records (rows) in the table
count "test_hbase:test_table"
---------------------------
1 row(s)
Took 0.0194 seconds
=> 1
---------------------------
# 4. Read column values
4.1 Get one row
get "test_hbase:test_table","1"
---------------------------
COLUMN            CELL
 base:age         timestamp=2024-03-07T15:36:03.061, value=17
 base:gender      timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
 base:name        timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
 sources:English  timestamp=2024-03-07T15:36:03.156, value=82
 sources:Math     timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.2 Get one column family of a row
get "test_hbase:test_table","1","sources"
---------------------------
COLUMN            CELL
 sources:English  timestamp=2024-03-07T15:36:03.156, value=82
 sources:Math     timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
4.3 Get a single column of a row
get "test_hbase:test_table","1","sources:English"
---------------------------
COLUMN            CELL
 sources:English  timestamp=2024-03-07T15:36:03.156, value=82
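---------------------------
# 5. Delete data
5.1 Delete a single cell
delete "test_hbase:test_table","1","base:name"    # deleteall with the same arguments also works
5.2 Delete an entire row
deleteall "test_hbase:test_table","2"
5.3 Delete by row-key prefix: ROWPREFIXFILTER deletes every row whose key starts with the given prefix; CACHE sets how many deletes are sent to the server per batch
deleteall "test_hbase:test_table",{ROWPREFIXFILTER=>"<prefix: timestamp TS | string STR>",CACHE=>100}
5.4 Delete all data in the table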
disable "test_hbase:test_table"
truncate "test_hbase:test_table"
# 6. Atomic increment (counters)
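-- The first incr must target a column that does not exist yet. Running it on an existing column that does not hold an 8-byte long fails with: Field is not a long, it's 10 bytes wide
-- After that, incr can be repeated on the newly created counter column
6.1 Basic syntax
Increment: incr "[namespace:]table","row key","family:new column",N
Query: get_counter "[namespace:]table","row key","family:new column"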
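The rules above follow from how a counter is stored. The scan output later in this example shows sources:count as \x00\x00\x00\x00\x00\x00\x00\x02, which is an 8-byte big-endian long. The Java sketch below is a simplified model of that behaviour and is not HBase code (ToyCounter is a hypothetical name). A missing cell counts as 0. An existing cell must already be exactly 8 bytes wide. A string such as "82" is rejected, and the error text mirrors HBase's.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Simplified model of incr/get_counter: a counter is an 8-byte big-endian long. */
final class ToyCounter {
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    static long decode(byte[] cell) {
        if (cell.length != Long.BYTES) {
            // Mirrors HBase's error text for a value that is not 8 bytes wide.
            throw new IllegalArgumentException(
                "Field is not a long, it's " + cell.length + " bytes wide");
        }
        return ByteBuffer.wrap(cell).getLong();
    }

    /** incr: a missing cell counts as 0; an existing cell must already be a long. */
    static byte[] incr(byte[] current, long delta) {
        long base = (current == null) ? 0L : decode(current);
        return encode(base + delta);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("\\x%02X", b & 0xFF));
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] count = incr(null, 2);                // first incr on a new column
        System.out.println(hex(count));              // \x00\x00\x00\x00\x00\x00\x00\x02
        System.out.println(decode(incr(count, 3)));  // 5, the value get_counter would show
        try {
            incr("82".getBytes(StandardCharsets.US_ASCII), 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());      // Field is not a long, it's 2 bytes wide
        }
    }
}
```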
6.2 Example
scan "test_hbase:test_table"
---------------------------
ROW  COLUMN+CELL
 1   column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
 1   column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
 1   column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
 1   column=sources:English, timestamp=2024-03-07T15:36:03.156, value=82
 1   column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=90
---------------------------
incr "test_hbase:test_table","1","sources:count",2
scan "test_hbase:test_table"
---------------------------
ROW  COLUMN+CELL
 1   column=base:age, timestamp=2024-03-07T15:36:03.061, value=17
 1   column=base:gender, timestamp=2024-03-07T15:36:03.115, value=\xE5\xA5\xB3
 1   column=base:name, timestamp=2024-03-07T15:36:03.001, value=\xE8\x83\xA1\xE6\xA1\x83
 1   column=sources:English, timestamp=2024-03-07T15:36:03.156, value=82
 1   column=sources:Math, timestamp=2024-03-07T15:36:03.192, value=90
 1   column=sources:count, timestamp=2024-03-11T20:01:16.651, value=\x00\x00\x00\x00\x00\x00\x00\x02
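---------------------------
# 7. Pre-splitting regions (an HBase optimization)
7.1 Pre-splitting
Strategy 1 (NUMREGIONS: number of regions; SPLITALGO: algorithm used to compute the split points):
create "test_hbase:test_split","t1","t2",{NUMREGIONS=>3,SPLITALGO=>"UniformSplit"}
Strategy 2 (SPLITS: explicit row-key split points, letters or digits):
### Resulting regions (row keys are compared byte by byte; start key inclusive, end key exclusive): [start, "100"), ["100", "200"), ["200", "300"), ["300", end)
create "test_hbase:test_rowkey_split","cf1","cf2",SPLITS=>["100","200","300"]
7.2 View the regions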
scan "hbase:meta",{STARTROW=>"test_hbase:test_rowkey_split",LIMIT=>10}
---------------------------
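# Storage on HDFS: besides .tabledesc and .tmp, the table directory holds one subdirectory per region (4 regions for 3 split points)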
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tabledesc
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B .tmp
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 28c38ce5ff401333122c00c05e521ae3
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 4493f765702cc8979678f14cbcff17ff
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 540c8c1f386356cab11f824e74d33fad
#drwxr-xr-x root supergroup 0 B Mar 11 20:31 0 0 B 867157c4f6ab39ba52ac6b3b58e6cbf4
---------------------------
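With SPLITS=>["100","200","300"], the region a row lands in is decided by unsigned byte-wise comparison, not by numeric value. The Java sketch below makes this concrete. SplitRegionFinder is a hypothetical name, and the code assumes the split keys are already sorted.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: locate the pre-split region of a row key. Regions are [start, end). */
final class SplitRegionFinder {
    // Unsigned lexicographic byte comparison; a shorter prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** 0-based region index; k sorted split keys give k + 1 regions. */
    static int regionIndex(String[] splitKeys, String rowKey) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        int index = 0;
        for (String split : splitKeys) {
            // A row equal to a split key belongs to the region starting at that key.
            if (compare(row, split.getBytes(StandardCharsets.UTF_8)) >= 0) index++;
            else break;
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splits = {"100", "200", "300"};
        for (String row : new String[] {"099", "100", "1000", "250", "5"}) {
            System.out.println(row + " -> region " + regionIndex(splits, row));
        }
        // 099 -> region 0, 100 -> region 1, 1000 -> region 1, 250 -> region 2, 5 -> region 3
    }
}
```

"1000" lands in the same region as "100", and "5" lands in the last region. Zero-padding numeric row keys to a fixed width (for example "005") keeps numeric order and byte order aligned.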
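4. Tools
## Minor compaction: merge a few small StoreFiles into a larger one
1. compact "[namespace:]table"
## Major compaction: merge all StoreFiles of each store into one (deleted and expired cells are physically removed at this point)
2. major_compact "[namespace:]table"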
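The difference between the two compactions can be modelled in a few lines of Java. This is a toy model with a hypothetical name, ToyCompaction. It ignores timestamps, versions and TTL. Each "store file" is a sorted map, a null value stands for a delete marker, and files are listed oldest first so newer entries win. A minor compaction must keep delete markers, because older files outside the merge may still hold the cells they mask. A major compaction rewrites every file of the store, so it can drop them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy compaction model: store file = sorted map; null value = delete marker. */
final class ToyCompaction {
    static TreeMap<String, String> compact(List<Map<String, String>> filesOldestFirst, boolean major) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> file : filesOldestFirst) {
            merged.putAll(file); // newer entries, delete markers included, overwrite older ones
        }
        if (major) {
            // All files were merged, so markers and the cells they mask can go.
            merged.values().removeIf(v -> v == null);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> f1 = new TreeMap<>();
        f1.put("1/base:name", "v1");
        f1.put("2/base:name", "v2");
        Map<String, String> f2 = new TreeMap<>();
        f2.put("2/base:name", null); // delete marker for row 2
        List<Map<String, String>> files = Arrays.asList(f1, f2);
        System.out.println(compact(files, false)); // {1/base:name=v1, 2/base:name=null}
        System.out.println(compact(files, true));  // {1/base:name=v1}
    }
}
```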
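Method 2: Operating HBase from Hive (mapping HBase data into Hive)
1. Import data into HBase
## Basic form
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="<separator>" \
-Dimporttsv.columns="HBASE_ROW_KEY,<family:qualifier>,..." \
"<namespace:table>" \
<file path>
## Example (run in the OS command shell, not inside hbase shell)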
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator="|" \
-Dimporttsv.columns=HBASE_ROW_KEY,base:name,base:age,sources:English,sources:Math \
test_hbase:test_table \
file:///root/file/hbase_file/students_for_import_2.csv
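-Dimporttsv.columns pairs the fields of each input line, by position, with the listed targets. HBASE_ROW_KEY marks the row-key field. The Java sketch below shows the same idea for one line. It is not ImportTsv's code, TsvLineMapper is a hypothetical name, and the sample line is invented because the contents of students_for_import_2.csv are not shown. String.split takes a regular expression, so a "|" separator has to be quoted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch of mapping one input line by an importtsv.columns-style list. */
final class TsvLineMapper {
    static Map<String, String> map(String line, String separator, String[] columns) {
        // Quote the separator ("|" is a regex metacharacter); -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "Expected " + columns.length + " fields but got " + fields.length + ": " + line);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            cells.put(columns[i], fields[i]); // HBASE_ROW_KEY entry holds the row key
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] columns = {"HBASE_ROW_KEY", "base:name", "base:age", "sources:English", "sources:Math"};
        // Hypothetical sample line.
        System.out.println(map("2|Alice|18|88|95", "|", columns));
        // {HBASE_ROW_KEY=2, base:name=Alice, base:age=18, sources:English=88, sources:Math=95}
    }
}
```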
2. Map an HBase table to a Hive table (run in Hive)
-- Create the Hive table; no data is copied, the HBase table is mapped into Hive
create external table yb12211.student_from_hbase(
stu_id int,
stu_name string,
stu_age int,
score_English int,
score_Math int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,base:name,base:age,sources:English,sources:Math")
tblproperties("hbase.table.name"="test_hbase:test_table");
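In the DDL above, hbase.columns.mapping is matched to the Hive columns by position. ":key" is the row key, and the number of entries must equal the number of Hive columns. The Java sketch below only illustrates that pairing and is not the Hive storage handler's code (HiveHBaseMapping is a hypothetical name). Hive lowercases column names, so score_English becomes score_english. The HBase qualifiers, such as sources:English, stay case-sensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: pair Hive columns with hbase.columns.mapping entries by position. */
final class HiveHBaseMapping {
    static Map<String, String> pair(String[] hiveColumns, String mapping) {
        String[] entries = mapping.split(",");
        if (entries.length != hiveColumns.length) {
            throw new IllegalArgumentException("Mapping has " + entries.length
                + " entries but the Hive table has " + hiveColumns.length + " columns");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < entries.length; i++) {
            result.put(hiveColumns[i], entries[i].trim()); // i-th Hive column <- i-th entry
        }
        return result;
    }

    public static void main(String[] args) {
        String[] hive = {"stu_id", "stu_name", "stu_age", "score_english", "score_math"};
        String mapping = ":key,base:name,base:age,sources:English,sources:Math";
        pair(hive, mapping).forEach((col, hb) -> System.out.println(col + " <- " + hb));
    }
}
```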
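Method 3: Operating HBase from Java (data migration)
1. Use cases
Java works with HBase through the HBase client API.
Its main use here is data migration:
1. Use the native HBase API together with the JDBC API to import data from a traditional relational database (MySQL) into HBase.
2. Use file streams to import data from ordinary files into HBase.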
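In both migration scenarios the core step is the same: each source record becomes one row key plus a set of "family:qualifier" cells. The stdlib-only Java sketch below shows that step for a relational row, for example one read from a JDBC ResultSet. It does not talk to MySQL or HBase. The class name, the student columns and the column-to-family assignment are all hypothetical. In a real client, each resulting cell would become one Put.addColumn(family, qualifier, value) call. The sketch skips SQL NULLs, a common choice because HBase rows are sparse.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the mapping step of a MySQL -> HBase migration (hypothetical layout). */
final class RowToCells {
    static final class HRow {
        final String rowKey;
        final Map<String, String> cells = new LinkedHashMap<>(); // "family:qualifier" -> value
        HRow(String rowKey) { this.rowKey = rowKey; }
    }

    static HRow convert(Map<String, Object> sqlRow, String keyColumn, Map<String, String> familyOf) {
        Object key = sqlRow.get(keyColumn);
        if (key == null) throw new IllegalArgumentException("Missing row key column: " + keyColumn);
        HRow row = new HRow(String.valueOf(key));
        for (Map.Entry<String, Object> e : sqlRow.entrySet()) {
            String column = e.getKey();
            if (column.equals(keyColumn) || e.getValue() == null) continue; // NULL: store nothing
            String family = familyOf.get(column);
            if (family == null) throw new IllegalArgumentException("No column family for: " + column);
            row.cells.put(family + ":" + column, String.valueOf(e.getValue()));
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical row, e.g. from "SELECT id, name, age, English, Math FROM student".
        Map<String, Object> sqlRow = new LinkedHashMap<>();
        sqlRow.put("id", 3);
        sqlRow.put("name", "Bob");
        sqlRow.put("age", 19);
        sqlRow.put("English", 75);
        sqlRow.put("Math", null);
        Map<String, String> familyOf = new LinkedHashMap<>();
        familyOf.put("name", "base");
        familyOf.put("age", "base");
        familyOf.put("English", "sources");
        familyOf.put("Math", "sources");
        HRow row = convert(sqlRow, "id", familyOf);
        System.out.println(row.rowKey + " " + row.cells);
        // 3 {base:name=Bob, base:age=19, sources:English=75}
    }
}
```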
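2. Preparation
2.1 Create the Maven project
Create the project from the Maven quickstart archetype (maven-archetype-quickstart).
2.2 Initial configuration
I. Remove the <url> element from the generated pom.xml
II. Configure <properties>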
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
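III. Sanity check: make sure every Java version setting agrees (1.8, i.e. Java 8)
IV. Dependencies (replace the contents of the generated <dependencies> section)
<!-- MySQL driver -->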
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.29</version>
</dependency>
<!-- HBase client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.3.5</version>
</dependency>
<!-- Hadoop -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>3.1.3</version>
</dependency>
<!-- ZooKeeper -->
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.6.3</version>
</dependency>
<!-- log4j logging -->
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<!-- JSON tool -->
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>fastjson</artifactId>
<version>2.0.47</version>
</dependency>
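3. Passing program arguments (verification)
Run configuration: passing arguments
Step 1: Click the green hammer (Build) icon, then choose Edit Configurations.
Step 2: Fill in the configuration details, such as the program arguments.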
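The arguments entered under Edit Configurations reach the program as the String[] passed to main. The Java sketch below reads them. The argument order (target table first, input file second) and the class name MigrationArgs are hypothetical, chosen to match the migration use case. Failing fast with a usage message makes a misconfigured run configuration obvious.

```java
/** Sketch: read program arguments set in the IDE run configuration (hypothetical order). */
final class MigrationArgs {
    final String tableName;  // e.g. test_hbase:test_table
    final String inputPath;  // e.g. a local file to import

    private MigrationArgs(String tableName, String inputPath) {
        this.tableName = tableName;
        this.inputPath = inputPath;
    }

    static MigrationArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: <namespace:table> <input-file>");
        }
        return new MigrationArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        MigrationArgs parsed = parse(args);
        System.out.println("table=" + parsed.tableName + ", input=" + parsed.inputPath);
    }
}
```

For example, the program arguments "test_hbase:test_table /root/file/hbase_file/students_for_import_2.csv" (space-separated) would print both values.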