Clearing and Restoring Table Data in a Cassandra Cluster

Goal: the project team needs to clear the data of a table in the production Cassandra cluster. This experiment verifies whether TRUNCATE is a viable way to do it, and whether the data can be restored afterwards.
1. Environment preparation
A three-node cluster on Alibaba Cloud, with a replication factor of 2:
172.26.99.152
172.26.99.153
172.26.99.154
Install the Java JDK.
If an older version is left over, remove it first.
(1) Check whether a system-bundled JDK is already installed:
yum list installed | grep java
If a bundled JDK is present, uninstall the system Java environment:
yum -y remove java-1.7.0-openjdk*
yum -y remove tzdata-java.noarch
(2) List the Java packages available in the yum repository:
yum -y list java*
(3) Install the Java environment with yum (JDK 1.8.0 here; with 1.7, Cassandra will fail to start later):
yum install java-1.8.0
(4) Check the version of the newly installed Java:
java -version
mkdir /CAS
cd /CAS
tar xzvf apache-cassandra-3.11.1-bin.tar.gz
mv apache-cassandra-3.11.1 cassandra
useradd cassandra
passwd cassandra
chown -R cassandra.cassandra /CAS
chmod 755 -R /CAS/cassandra
su - cassandra
cd /CAS/cassandra/conf
vi cassandra.yaml
- seeds: "172.26.99.152" -- change this line from 127.0.0.1 to the IP of one or more cluster nodes. Listing every node is not recommended, because recovering a failed seed node is comparatively complex
listen_address: 172.26.99.152 -- change to the current node's own IP
rpc_address: 172.26.99.152 -- change to the current node's own IP
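These three edits have to be repeated on every node, with listen_address and rpc_address set to each node's own address. A minimal sketch of scripting them with sed, demonstrated on a sample fragment so the snippet is self-contained (on a real node, point CONF at /CAS/cassandra/conf/cassandra.yaml; the IPs below are this walkthrough's cluster addresses):

```shell
NODE_IP=172.26.99.153          # this node's own address; adjust per node
SEED_IP=172.26.99.152          # shared seed list, same on every node
CONF=$(mktemp)                 # stand-in for /CAS/cassandra/conf/cassandra.yaml

# Sample fragment with the stock defaults, as shipped in the tarball.
cat > "$CONF" <<'EOF'
          - seeds: "127.0.0.1"
listen_address: localhost
rpc_address: localhost
EOF

# Apply the three per-node edits in place.
sed -i \
  -e "s/- seeds: \".*\"/- seeds: \"${SEED_IP}\"/" \
  -e "s/^listen_address:.*/listen_address: ${NODE_IP}/" \
  -e "s/^rpc_address:.*/rpc_address: ${NODE_IP}/" \
  "$CONF"
cat "$CONF"
```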
vi cassandra-env.sh
Parameter to change in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=172.26.99.152" -- this line is commented out by default; uncomment it and set the hostname to the current node's IP
Configure $JAVA_HOME (the Java environment variable) and $CASSANDRA_HOME (the Cassandra environment variable).
A JDK installed via yum normally lives under /usr/lib/jvm/ (for example, /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64 in one case here).
(1) Open the environment-variable file and append:
cat >> /etc/profile <<'EOF'
#java path
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre
export JRE_HOME=$JAVA_HOME
export CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
#cassandra path
CASSANDRA_HOME=/CAS/cassandra
export CASSANDRA_HOME
EOF
(2) Make the configuration take effect:
source /etc/profile
The steps above follow https://blog.csdn.net/dengjiexian123/article/details/53033119, though JAVA_HOME is set somewhat differently here.
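One subtlety in the profile snippet: the heredoc delimiter should be quoted (<<'EOF'). With an unquoted <<EOF, the shell expands $JAVA_HOME and friends while writing the file, i.e. before those variables are defined, leaving empty values in /etc/profile. A small self-contained demonstration:

```shell
demo=$(mktemp)   # stand-in for /etc/profile

# Quoted delimiter: the variable reference is written literally,
# to be expanded later when the profile is sourced.
cat >> "$demo" <<'EOF'
export JRE_HOME=$JAVA_HOME
EOF

cat "$demo"
```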
Start Cassandra:
su - cassandra
cd /CAS/cassandra/bin
./cassandra
If the Java version is too old or the install did not succeed, startup aborts here with: Cassandra 3.0 and later require Java 8u40 or later.
If you instead see Unable to find java executable. Check JAVA_HOME and PATH environment variables., the first thing to check is that JAVA_HOME is correct, by running $JAVA_HOME/bin/java -version.
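The 8u40 requirement can also be checked up front by parsing the `java -version` banner. A hedged sketch using a sample banner string so the snippet is self-contained (on a node, capture the real one with `ver=$(java -version 2>&1 | head -n 1)`):

```shell
# Sample banner; substitute the real `java -version` output on the node.
ver='openjdk version "1.8.0_252"'

update=${ver##*_}     # strip everything up to the last underscore -> 252"
update=${update%\"}   # drop the trailing quote                    -> 252

if [ "$update" -ge 40 ]; then
  echo "JVM meets the Java 8u40 requirement"
fi
```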
[cassandra@node2 bin]$ ./cqlsh --request-timeout=9000 $HOSTNAME
Connected to Test Cluster at node2:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> desc keyspaces;
system_traces system_schema system_auth system system_distributed
cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
--------------------+----------------+-------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
(5 rows)
cqlsh>
cqlsh> create keyspace dbrsk WITH replication = {'class':'NetworkTopologyStrategy','datacenter1':2};
cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
--------------------+----------------+---------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
dbrsk | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
(6 rows)
On the source cluster, check the structure of the table under test and export its data:
cqlsh:dbrsk> desc t_card_info;
CREATE TABLE dbrsk.t_card_info (
bankcard text PRIMARY KEY,
bankname text,
cardname text,
cardtype text,
city text,
province text,
updatetime bigint
) WITH bloom_filter_fp_chance = 0.00075
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = 'card info'
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 0.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
cqlsh:dbrsk> copy t_card_info to '/tmp/t_card_info.csv';
Using 16 child processes
Starting copy of dbrsk.t_card_info with columns [bankcard, bankname, cardname, cardtype, city, province, updatetime].
Processed: 2726962 rows; Rate: 5524 rows/s; Avg. rate: 57918 rows/s
2726962 rows exported to 1 files in 47.165 seconds.
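Before re-importing, it is worth sanity-checking the export: COPY TO writes no header row by default, so the CSV line count should equal the "rows exported" figure (2726962 above). A tiny stand-in file with two made-up rows is used here to keep the snippet self-contained; on the node, just run `wc -l /tmp/t_card_info.csv`:

```shell
csv=$(mktemp)   # stand-in for /tmp/t_card_info.csv

# Two illustrative rows in the table's column order
# (bankcard, bankname, cardname, cardtype, city, province, updatetime);
# the values are invented for this demo.
printf '%s\n' \
  '6222020000000001,ICBC,gold card,debit,Beijing,Beijing,1594430000' \
  '6222020000000002,CCB,dragon card,credit,Guangzhou,Guangdong,1594430001' \
  > "$csv"

wc -l < "$csv"   # line count == exported row count (no header row)
```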
Create the table on the current cluster (using the schema captured above) and import the data:
cqlsh> use dbrsk;
cqlsh:dbrsk> copy t_card_info from '/tmp/t_card_info.csv';
Using 1 child processes
Starting copy of dbrsk.t_card_info with columns [bankcard, bankname, cardname, cardtype, city, province, updatetime].
Processed: 690000 rows; Rate: 10883 rows/s; Avg. rate: 11617 rows/s
Processed: 1410000 rows; Rate: 13012 rows/s; Avg. rate: 11813 rows/s
Processed: 2115000 rows; Rate: 10324 rows/s; Avg. rate: 11783 rows/s
Processed: 2726962 rows; Rate: 5305 rows/s; Avg. rate: 11893 rows/s
2726962 rows imported from 1 files in 3 minutes and 49.299 seconds (0 skipped).
Before the import:
[root@node2 data]# du -sh *
408K commitlog
1.4M data
4.0K hints
4.0K saved_caches
After the import:
[root@node2 data]# du -sh *
155M commitlog
98M data
4.0K hints
4.0K saved_caches
Run the TRUNCATE and inspect the effect:
cqlsh:dbrsk> truncate table t_card_info;
[root@node2 dbrsk]# cd t_card_info-9e129520c31c11eab89c515b68839f7c/
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# ls
backups snapshots
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# du -sh *
4.0K backups
103M snapshots
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# cd snapshots/
[root@node2 snapshots]# ls
truncated-1594434747140-t_card_info
[root@node2 snapshots]# cd truncated-1594434747140-t_card_info/
[root@node2 truncated-1594434747140-t_card_info]# ls
manifest.json mc-10-big-Statistics.db mc-11-big-Filter.db mc-12-big-Data.db mc-12-big-TOC.txt mc-9-big-Statistics.db
mc-10-big-CompressionInfo.db mc-10-big-Summary.db mc-11-big-Index.db mc-12-big-Digest.crc32 mc-9-big-CompressionInfo.db mc-9-big-Summary.db
mc-10-big-Data.db mc-10-big-TOC.txt mc-11-big-Statistics.db mc-12-big-Filter.db mc-9-big-Data.db mc-9-big-TOC.txt
mc-10-big-Digest.crc32 mc-11-big-CompressionInfo.db mc-11-big-Summary.db mc-12-big-Index.db mc-9-big-Digest.crc32 schema.cql
mc-10-big-Filter.db mc-11-big-Data.db mc-11-big-TOC.txt mc-12-big-Statistics.db mc-9-big-Filter.db
mc-10-big-Index.db mc-11-big-Digest.crc32 mc-12-big-CompressionInfo.db mc-12-big-Summary.db mc-9-big-Index.db
On the other nodes, disk usage is comparable:
[cassandra@node3 t_card_info-9e129520c31c11eab89c515b68839f7c]$ du -sh *
4.0K backups
101M snapshots
The data has been moved into the snapshots folder.
Running a repair does not clean up the data under snapshots:
./nodetool repair dbrsk
We then tried deleting the snapshots folder at the OS level; the database continued to work normally afterwards. (Note that nodetool listsnapshots and nodetool clearsnapshot are the supported ways to list and remove snapshots.)
Re-import the data, then test dropping the table:
cqlsh:dbrsk> drop table t_card_info ;
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# du -sh *
4.0K backups
103M snapshots
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# ls
backups snapshots
[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# cd snapshots/
[root@node2 snapshots]# ls
dropped-1594435864327-t_card_info
As shown, after DROP and TRUNCATE the table's data files are moved into snapshots/droppedxxxxxx and snapshots/truncatedxxxxxx under the table's directory, respectively.
So how can the data be restored?
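sstableloader expects its input directory to end in a <keyspace>/<table> layout, which is why the walkthrough below loads from /tmp/dbrsk/t_card_info: the snapshot's sstable files were staged there first. A sketch of that staging step, demonstrated with temporary stand-in paths (on a node, SNAP would be the truncated-…-t_card_info snapshot directory shown earlier):

```shell
# Stand-in for the real snapshot directory; two dummy sstable component
# files are created so the snippet runs end-to-end without a cluster.
SNAP=$(mktemp -d)
touch "$SNAP/mc-9-big-Data.db" "$SNAP/mc-9-big-Index.db"

# Staging layout required by sstableloader: <parent>/<keyspace>/<table>
STAGE="$(mktemp -d)/dbrsk/t_card_info"
mkdir -p "$STAGE"

# Copy the snapshot's sstable components into the staging directory.
cp "$SNAP"/mc-* "$STAGE"/
ls "$STAGE"
```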
[cassandra@node2 bin]$ ./sstableloader -d 172.26.99.152 /tmp/dbrsk/t_card_info
WARN 11:04:46,472 Only 31.813GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Skipping file mc-21-big-Data.db: table dbrsk.t_card_info doesn't exist
Skipping file mc-22-big-Data.db: table dbrsk.t_card_info doesn't exist
Skipping file mc-23-big-Data.db: table dbrsk.t_card_info doesn't exist
Skipping file mc-24-big-Data.db: table dbrsk.t_card_info doesn't exist
Summary statistics:
Connections per host : 1
Total files transferred : 0
Total bytes transferred : 0.000KiB
Total duration : 2934 ms
Average transfer rate : 0.000KiB/s
Peak transfer rate : 0.000KiB/s
When sstableloader is run against a table that does not exist, it skips every file (the "Skipping file" messages above) and transfers nothing. The table must be created manually first:
[cassandra@node2 bin]$ ./cqlsh --request-timeout=90000 $HOSTNAME
Connected to Test Cluster at node2:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> use dbrsk;
cqlsh:dbrsk> CREATE TABLE dbrsk.t_card_info (
... bankcard text PRIMARY KEY,
... bankname text,
... cardname text,
... cardtype text,
... city text,
... province text,
... updatetime bigint
... ) WITH bloom_filter_fp_chance = 0.00075
... AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
... AND comment = 'bank card info data'
... AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
... AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
... AND crc_check_chance = 0.0
... AND dclocal_read_repair_chance = 0.0
... AND default_time_to_live = 0
... AND gc_grace_seconds = 86400
... AND max_index_interval = 2048
... AND memtable_flush_period_in_ms = 0
... AND min_index_interval = 128
... AND read_repair_chance = 0.0
... AND speculative_retry = '99PERCENTILE';
cqlsh:dbrsk> exit
[cassandra@node2 bin]$ ./sstableloader -d 172.26.99.152 /tmp/dbrsk/t_card_info
WARN 11:05:57,753 Only 31.813GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /tmp/dbrsk/t_card_info/mc-21-big-Data.db /tmp/dbrsk/t_card_info/mc-22-big-Data.db /tmp/dbrsk/t_card_info/mc-23-big-Data.db /tmp/dbrsk/t_card_info/mc-24-big-Data.db to [/172.26.99.154, /172.26.99.152, /172.26.99.153]
progress: [/172.26.99.154]0:0/4 0 % [/172.26.99.152]0:1/4 6 % total: 4% 1.172MiB/s (avg: 1.172MiB/s)
progress: [/172.26.99.154]0:0/4 0 % [/172.26.99.152]0:1/4 6 % [/172.26.99.153]0:0/4 0 % total: 3% 65.484MiB/s (avg: 1.257MiB/s)
progress: [/172.26.99.154]0:0/4 0 % [/172.26.99.152]0:1/4 6 % [/172.26.99.153]0:0/4 0 % total: 3% 2.578MiB/s (avg: 1.260MiB/s)
……
progress: [/172.26.99.154]0:0/4 48 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 24% 245.959MiB/s (avg: 8.540MiB/s)
progress: [/172.26.99.154]0:0/4 49 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 24% 1012.976MiB/s (avg: 8.651MiB/s)
progress: [/172.26.99.154]0:0/4 51 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 25% 1.454GiB/s (avg: 8.803MiB/s)
progress: [/172.26.99.154]0:0/4 54 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 25% 161.665MiB/s (avg: 9.091MiB/s)
progress: [/172.26.99.154]0:0/4 56 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 1.643GiB/s (avg: 9.249MiB/s)
progress: [/172.26.99.154]0:0/4 56 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 134.745MiB/s (avg: 9.254MiB/s)
progress: [/172.26.99.154]0:0/4 58 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 1.702GiB/s (avg: 9.371MiB/s)
progress: [/172.26.99.154]0:0/4 58 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 23.592MiB/s (avg: 9.406MiB/s)
……
progress: [/172.26.99.154]0:4/4 100% [/172.26.99.152]0:4/4 100% [/172.26.99.153]0:4/4 100% total: 100% 0.000KiB/s (avg: 10.816MiB/s)
Summary statistics:
Connections per host : 1
Total files transferred : 8
Total bytes transferred : 133.156MiB
Total duration : 12314 ms
Average transfer rate : 10.813MiB/s
Peak transfer rate : 17.530MiB/s
Conclusion: the TRUNCATE operation is viable, and if something goes wrong the data can be restored, although the restore takes considerable time.
Neither TRUNCATE TABLE nor DROP TABLE in Cassandra releases disk space immediately; the data is moved into the table's snapshots folder instead.
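This snapshot-on-truncate/drop behaviour is governed by the auto_snapshot setting in cassandra.yaml, which defaults to true. It can be checked as below (a sample fragment is used so the snippet is self-contained; on a node, grep /CAS/cassandra/conf/cassandra.yaml instead):

```shell
conf=$(mktemp)                           # stand-in for cassandra.yaml
echo 'auto_snapshot: true' > "$conf"     # sample fragment with the default

# With auto_snapshot: true, TRUNCATE and DROP snapshot the data first;
# setting it to false skips the snapshot and frees space immediately.
grep '^auto_snapshot' "$conf"
```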