Notes on importing multiple tables in one go with DataX

2025/7/11 9:48:12 Source: https://blog.csdn.net/qq_37182070/article/details/146974332

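I have been using DataX for table data migration for a long time, and it has served me well. Today, however, the development team handed over a large batch of tables to migrate. Importing them one at a time, each with its own hand-written DataX JSON job file, would be very slow, so I wrote a script that generates the job files from a template and runs the imports in bulk.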

1. Write the JSON file

# Write the JSON template
[worker@cs-nll sync_data]$ vim template.json
{
    "job": {
        "setting": {"speed": {"channel": 5}},
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "{{src_username}}",
                    "password": "{{src_password}}",
                    "column": ["*"],
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://{{src_host}}:{{src_port}}/{{src_db}}"],
                        "table": ["{{src_table}}"]
                    }],
                    "where": "{{where}}"
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "batchSize": "2048",
                    "batchByteSize": "33554432",
                    "username": "{{dest_username}}",
                    "password": "{{dest_password}}",
                    "column": ["*"],
                    "writeMode": "insert",
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://{{dest_host}}:{{dest_port}}/{{dest_db}}",
                        "table": ["{{dest_table}}"]
                    }]
                }
            }
        }]
    }
}
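
Once a job file has been generated from this template, a leftover {{name}} marker means one of the substitutions was missed, and DataX would receive the literal placeholder as a table name or credential. The following is a minimal sketch of a check for that case. The function name has_unrendered_placeholders is hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): succeeds when the
# given file still contains a {{name}} style placeholder.
has_unrendered_placeholders() {
    local job_file="$1"
    # -q: no output, only the exit status; -E: extended regex.
    # The braces are backslash-escaped so they match literally.
    grep -qE '\{\{[A-Za-z_]+\}\}' "$job_file"
}

# Example call, assuming a generated file named job_test2025.json exists:
#   if has_unrendered_placeholders "job_test2025.json"; then
#       echo "job_test2025.json still contains placeholders" >&2
#   fi
```

Running such a check before handing the file to DataX turns a confusing connection or SQL error into an immediate, readable message.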

2. Write the shell script

[worker@cs-nll sync_data]$ cat sync_data.sh
#!/bin/bash
set -e

# Source database
#src_jdbcUrl="jdbc:mysql://172.18.45.28:53306/cqshzl"
src_host="172.18.45.28"
src_port="53306"
src_db="cqshzl"
src_username="root"
# "&" is backslash-escaped because sed treats a bare "&" in the replacement as the matched text
src_password="LKfAtBW\&Oq0Y^H%M"

# Target database
#dest_jdbcUrl="jdbc:mysql://172.18.45.28:53306/cqshzl"
dest_host="172.18.45.28"
dest_port="53306"
dest_db="cqshzl"
dest_username="root"
dest_password="LKfAtBW\&Oq0Y^H%M"
#TABLES=("user" "order" "product" "log")

# Filter condition
year="2025"
where="YEAR(follow_end_time) ='$year'"

# Tables to migrate (format: source_table:target_table), separated by spaces
tables=("test:test$year" "personnel_manage_follow:personnel_manage_follow_$year")

pids=()
dest_tables=()

# Loop over the table pairs and generate one job file per pair
for table_pair in "${tables[@]}"; do
    # Split the pair into source and target table names
    IFS=':' read -ra TABLE <<< "$table_pair"
    src_table="${TABLE[0]}"
    dest_table="${TABLE[1]}"

    # Replace the placeholders in the template
    sed -e "s/{{src_table}}/$src_table/g" \
        -e "s/{{src_username}}/$src_username/g" \
        -e "s/{{src_password}}/$src_password/g" \
        -e "s/{{src_host}}/$src_host/g" \
        -e "s/{{src_port}}/$src_port/g" \
        -e "s/{{src_db}}/$src_db/g" \
        -e "s/{{dest_table}}/$dest_table/g" \
        -e "s/{{dest_username}}/$dest_username/g" \
        -e "s/{{dest_password}}/$dest_password/g" \
        -e "s/{{dest_host}}/$dest_host/g" \
        -e "s/{{dest_port}}/$dest_port/g" \
        -e "s/{{dest_db}}/$dest_db/g" \
        -e "s/{{where}}/$where/g" \
        template.json > "job_${dest_table}.json"

    # Start the DataX job in the background
    echo "Migrating table: $src_table => $dest_table ..."
    python /data/software/datax/bin/datax.py "job_${dest_table}.json" > "${dest_table}.log" 2>&1 &
    pids+=("$!")
    dest_tables+=("$dest_table")
done

echo "All sync jobs have been started, waiting for them to finish ..."

# $? right after "&" only says the job was launched, so wait for each job to get its real exit status
for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
        echo "Sync succeeded, see log ${dest_tables[$i]}.log"
    else
        echo "Sync failed, see log ${dest_tables[$i]}.log"
    fi
done

# Run the script
[worker@cs-nll sync_data]$ chmod +x sync_data.sh 
[worker@cs-nll sync_data]$ ./sync_data.sh
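
The password in sync_data.sh carries a hand-written backslash in front of "&", because sed expands a bare "&" in the replacement to the matched text. Values containing "/" or "\" would break the s/.../.../ commands in a similar way. As a rough sketch, a small helper could do this escaping automatically, so the real value is stored unescaped. The name escape_sed_replacement is hypothetical, and the helper assumes single-line values.

```shell
#!/bin/bash
# Hypothetical helper: escape a single-line value so it can be used as the
# replacement part of sed's s/pattern/replacement/ with "/" as delimiter.
escape_sed_replacement() {
    # Put a backslash in front of every backslash, "/" and "&".
    # "|" is used as the delimiter here so "/" needs no escaping itself.
    printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g'
}

# Example: the password is kept in its real form and escaped only for sed.
raw_password='LKfAtBW&Oq0Y^H%M'
safe_password=$(escape_sed_replacement "$raw_password")
printf '%s\n' '"password": "{{src_password}}"' \
    | sed -e "s/{{src_password}}/$safe_password/g"
```

With this approach the variable holds exactly what the database expects, and the sed-specific escaping stays in one place. JSON-level escaping (for example of double quotes inside a value) is a separate concern that this sketch does not cover.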

With that, batch table synchronization with DataX works, and the number of tables is controlled simply by editing the tables list.
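
The script starts one DataX process per table, all at the same time. With a long table list this may put more load on the source and target databases than intended. One possible way to cap it, sketched below, is to start the jobs in fixed-size batches and wait for each batch before launching the next. The function run_in_batches and the fake_job stand-in are hypothetical; in the real script the command would be a function that calls datax.py for one table.

```shell
#!/bin/bash
# Hypothetical wrapper: run "cmd item" for every item, at most max_jobs at a
# time. Returns success only if every job exited with status 0.
run_in_batches() {
    local max_jobs="$1" cmd="$2"
    shift 2
    local -a pids=()
    local failed=0 pid item
    for item in "$@"; do
        "$cmd" "$item" &          # start one job in the background
        pids+=("$!")
        if [ "${#pids[@]}" -ge "$max_jobs" ]; then
            # Batch is full: wait for every job in it before continuing.
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=$((failed + 1))
            done
            pids=()
        fi
    done
    # Wait for the last, possibly partial, batch.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    [ "$failed" -eq 0 ]
}

# Stand-in for a real sync job: fails only for the item named "bad".
fake_job() {
    [ "$1" != "bad" ]
}

run_in_batches 2 fake_job test2025 personnel_manage_follow_2025 \
    && echo "all jobs finished successfully"
```

Batching is the simplest scheme and works on older bash versions. Its drawback is that a slow job holds up the start of the next batch.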
