
Cleaning and Restoring Table Data in a Cassandra Cluster

Published: 2021-08-21 10:41:39 | Source: Yisu Cloud | Reads: 570 | Author: chen | Category: Relational Databases

This article walks through cleaning and restoring table data in a Cassandra cluster. The steps are straightforward; follow along from environment setup through the truncate and recovery experiment.

Goal: the project team needs to purge the data of one table in the production Cassandra cluster. This experiment verifies whether TRUNCATE is a viable way to do that.

1. Environment preparation

A three-node cluster on Alibaba Cloud, replication factor 2:

172.26.99.152 

172.26.99.153

172.26.99.154 

Install the Java JDK:

If an older JDK is left over, remove it first.

(1) Check whether an OS-bundled JDK is installed:

yum list installed |grep java

If a bundled JDK is present, uninstall it:

yum -y remove java-1.7.0-openjdk*

yum -y remove tzdata-java.noarch

(2) List the Java packages available in the yum repositories:

yum -y list java*

(3) Install Java with yum (JDK 1.8.0 here; with 1.7, Cassandra will fail to start later):

yum install java-1.8.0

(4) Check the version of the newly installed Java:

java -version
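Since Cassandra 3.x refuses to start on anything older than Java 8u40 (the exact error appears later in this article), the version check can be scripted. A minimal sketch; `java_ok` is a hypothetical helper that only understands `1.x.0_NN`-style version strings, not Java 9+ numbering:

```shell
# Hypothetical helper: succeed if a "1.x.0_NN" Java version string
# meets Cassandra 3.x's minimum of Java 8u40.
java_ok() {
  ver="$1"                              # e.g. "1.8.0_252"
  major=$(echo "$ver" | cut -d. -f2)    # -> "8"
  update=$(echo "$ver" | cut -d_ -f2)   # -> "252"
  [ "$major" -ge 8 ] && [ "$update" -ge 40 ]
}

# Typical use against the live JVM (uncomment on a real host):
# v=$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')
# java_ok "$v" && echo "Java OK for Cassandra 3.x" || echo "Upgrade Java"
```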

Unpack the Cassandra tarball under /CAS and create a dedicated cassandra user:

mkdir /CAS

cd /CAS

tar xzvf apache-cassandra-3.11.1-bin.tar.gz

mv apache-cassandra-3.11.1 cassandra

useradd cassandra 

passwd cassandra

chown -R cassandra:cassandra /CAS

chmod 755 -R /CAS/cassandra 

su - cassandra

cd /CAS/cassandra/conf

$ vi cassandra.yaml

- seeds: "172.26.99.152"  -- change from 127.0.0.1 to the IP of one or more cluster nodes; do not list every node, since replacing a failed seed node is comparatively involved

listen_address: 172.26.99.153  -- set to this node's own IP

rpc_address: 172.26.99.153  -- set to this node's own IP

$ vi cassandra-env.sh

Parameter to change in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=172.26.99.153"  -- commented out by default; uncomment it and set the hostname to this node's IP
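When rolling the same edits out to several nodes, the cassandra.yaml changes above can be scripted with sed. A minimal sketch, run here against a mock file rather than a real installation; SEED_IP and NODE_IP are example values for this demo:

```shell
# Example values -- substitute the real seed IP and this node's IP.
SEED_IP=172.26.99.152
NODE_IP=172.26.99.153

# Mock cassandra.yaml fragment with the stock defaults.
cat > /tmp/cassandra.yaml.demo <<'EOF'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
listen_address: localhost
rpc_address: localhost
EOF

# Apply the three edits described above.
sed -i \
  -e "s/- seeds: \"127.0.0.1\"/- seeds: \"$SEED_IP\"/" \
  -e "s/^listen_address:.*/listen_address: $NODE_IP/" \
  -e "s/^rpc_address:.*/rpc_address: $NODE_IP/" \
  /tmp/cassandra.yaml.demo

grep -E 'seeds|_address' /tmp/cassandra.yaml.demo
```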

Configure the $JAVA_HOME and $CASSANDRA_HOME environment variables.

A yum-installed JDK normally lives under /usr/lib/jvm/ (for example /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-0.b15.el6_8.x86_64).

(1) Append the following to the environment configuration file:

cat >> /etc/profile <<'EOF'

#java path

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre

export JRE_HOME=$JAVA_HOME

export CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib

export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

#cassandra path

CASSANDRA_HOME=/CAS/cassandra

export CASSANDRA_HOME

EOF

(2) Apply the changes:

source /etc/profile

The steps above follow https://blog.csdn.net/dengjiexian123/article/details/53033119, though the JAVA_HOME setting here differs slightly.

Start Cassandra:

su - cassandra

cd /CAS/cassandra/bin          

./cassandra

If the Java version is too low or Java was not installed correctly, startup fails with: Cassandra 3.0 and later require Java 8u40 or later.

If it fails with Unable to find java executable. Check JAVA_HOME and PATH environment variables., verify that JAVA_HOME is correct by running $JAVA_HOME/bin/java -version.

[cassandra@node2 bin]$ ./cqlsh --request-timeout=9000 $HOSTNAME

Connected to Test Cluster at node2:9042.

[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help.

cqlsh> desc keyspaces;

system_traces  system_schema  system_auth  system  system_distributed

cqlsh> SELECT * FROM system_schema.keyspaces;

 keyspace_name      | durable_writes | replication

--------------------+----------------+-------------------------------------------------------------------------------------

        system_auth |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}

      system_schema |           True |                             {'class': 'org.apache.cassandra.locator.LocalStrategy'}

 system_distributed |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}

             system |           True |                             {'class': 'org.apache.cassandra.locator.LocalStrategy'}

      system_traces |           True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}

(5 rows)

cqlsh> 

cqlsh> create keyspace dbrsk WITH replication = {'class':'NetworkTopologyStrategy','datacenter1':2};

cqlsh> SELECT * FROM system_schema.keyspaces;                                                      

 keyspace_name      | durable_writes | replication

--------------------+----------------+---------------------------------------------------------------------------------------

        system_auth |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}

      system_schema |           True |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}

 system_distributed |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}

             system |           True |                               {'class': 'org.apache.cassandra.locator.LocalStrategy'}

               dbrsk |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '2'}

      system_traces |           True |   {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}

(6 rows)

Check the structure of the table under test on the source cluster, and export its data:

cqlsh:dbrsk> desc t_card_info;

CREATE TABLE dbrsk.t_card_info (

    bankcard text PRIMARY KEY,

    bankname text,

    cardname text,

    cardtype text,

    city text,

    province text,

    updatetime bigint

) WITH bloom_filter_fp_chance = 0.00075

    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

    AND comment = '卡信息'

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance = 0.0

    AND dclocal_read_repair_chance = 0.0

    AND default_time_to_live = 0

    AND gc_grace_seconds = 86400

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99PERCENTILE';

cqlsh:dbrsk>  copy t_card_info to '/tmp/t_card_info.csv';

Using 16 child processes

Starting copy of dbrsk.t_card_info with columns [bankcard, bankname, cardname, cardtype, city, province, updatetime].

Processed: 2726962 rows; Rate:    5524 rows/s; Avg. rate:   57918 rows/s

2726962 rows exported to 1 files in 47.165 seconds.
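Before truncating the source table, a cheap sanity check is to compare the CSV's line count against the row count COPY reported, since COPY TO writes one line per row and no header by default. Demonstrated here on a tiny mock export; the file path and rows are invented for the example:

```shell
# Mock two-row export standing in for /tmp/t_card_info.csv.
printf '6222020200112233,ICBC,debit\n6228480402564890,ABC,credit\n' \
  > /tmp/t_card_info.demo.csv

# Line count should equal the row count COPY TO reported.
rows=$(wc -l < /tmp/t_card_info.demo.csv)
echo "exported rows: $rows"

# On the real export:
# wc -l < /tmp/t_card_info.csv    # should print 2726962
```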

Create the table in the current cluster and import the data:

cqlsh> use dbrsk;

cqlsh:dbrsk> copy t_card_info from '/tmp/t_card_info.csv';

Using 1 child processes

Starting copy of dbrsk.t_card_info with columns [bankcard, bankname, cardname, cardtype, city, province, updatetime].

Processed: 690000 rows; Rate:   10883 rows/s; Avg. rate:   11617 rows/s

Processed: 1410000 rows; Rate:   13012 rows/s; Avg. rate:   11813 rows/s

Processed: 2115000 rows; Rate:   10324 rows/s; Avg. rate:   11783 rows/s

Processed: 2726962 rows; Rate:    5305 rows/s; Avg. rate:   11893 rows/s

2726962 rows imported from 1 files in 3 minutes and 49.299 seconds (0 skipped).

Disk usage before the import:

[root@node2 data]# du -sh *

408K    commitlog

1.4M    data

4.0K    hints

4.0K    saved_caches

After the import:

[root@node2 data]# du -sh *

155M    commitlog

98M     data

4.0K    hints

4.0K    saved_caches

Run TRUNCATE and check the effect:

cqlsh:dbrsk> truncate table t_card_info;

[root@node2 dbrsk]# cd t_card_info-9e129520c31c11eab89c515b68839f7c/

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# ls

backups  snapshots

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# du -sh *

4.0K    backups

103M    snapshots

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# cd snapshots/

[root@node2 snapshots]# ls

truncated-1594434747140-t_card_info

[root@node2 snapshots]# cd truncated-1594434747140-t_card_info/

[root@node2 truncated-1594434747140-t_card_info]# ls

manifest.json                 mc-10-big-Statistics.db       mc-11-big-Filter.db           mc-12-big-Data.db        mc-12-big-TOC.txt            mc-9-big-Statistics.db

mc-10-big-CompressionInfo.db  mc-10-big-Summary.db          mc-11-big-Index.db            mc-12-big-Digest.crc32   mc-9-big-CompressionInfo.db  mc-9-big-Summary.db

mc-10-big-Data.db             mc-10-big-TOC.txt             mc-11-big-Statistics.db       mc-12-big-Filter.db      mc-9-big-Data.db             mc-9-big-TOC.txt

mc-10-big-Digest.crc32        mc-11-big-CompressionInfo.db  mc-11-big-Summary.db          mc-12-big-Index.db       mc-9-big-Digest.crc32        schema.cql

mc-10-big-Filter.db           mc-11-big-Data.db             mc-11-big-TOC.txt             mc-12-big-Statistics.db  mc-9-big-Filter.db

mc-10-big-Index.db            mc-11-big-Digest.crc32        mc-12-big-CompressionInfo.db  mc-12-big-Summary.db     mc-9-big-Index.db

On the other nodes, the space usage is comparable:

[cassandra@node3 t_card_info-9e129520c31c11eab89c515b68839f7c]$ du -sh *

4.0K    backups

101M    snapshots

The data has been moved into the snapshots directory.
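A quick way to see where the truncated data went is to walk the data directory for snapshots folders and size them, similar to the information `nodetool listsnapshots` reports. A minimal sketch against a mock directory tree; the paths and file are invented for the example:

```shell
# Mock data directory standing in for the real Cassandra data dir.
DATA_DIR=/tmp/cas-demo/data
SNAP="$DATA_DIR/dbrsk/t_card_info-abc/snapshots/truncated-1594434747140-t_card_info"
mkdir -p "$SNAP"
# Fake 64 KiB SSTable component so du has something to measure.
dd if=/dev/zero of="$SNAP/mc-9-big-Data.db" bs=1024 count=64 2>/dev/null

# Enumerate every snapshot under the data directory with its size.
find "$DATA_DIR" -type d -name snapshots | while read -r d; do
  du -sh "$d"/*
done
```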

Running repair does not clean up the data in snapshots:

./nodetool repair dbrsk

We then deleted the snapshots directory at the OS level; after the deletion the database kept working normally.
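Deleting the directory by hand works, but nodetool offers a supported equivalent, `nodetool clearsnapshot`, which removes a named snapshot. A dry-run sketch that only builds and prints the command, since this example is not attached to a live cluster; the tag is the one the TRUNCATE above created:

```shell
# Snapshot tag and keyspace from the TRUNCATE example above.
SNAPSHOT_TAG=truncated-1594434747140-t_card_info
KEYSPACE=dbrsk
DRY_RUN=1   # set to 0 on a real node to actually remove the snapshot

cmd="./nodetool clearsnapshot -t $SNAPSHOT_TAG $KEYSPACE"
if [ "$DRY_RUN" -eq 1 ]; then
  echo "would run: $cmd"
else
  $cmd
fi
```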

Re-import the data, then drop the table:

cqlsh:dbrsk> drop table t_card_info ;

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# du -sh *

4.0K    backups

103M    snapshots

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# ls

backups  snapshots

[root@node2 t_card_info-9e129520c31c11eab89c515b68839f7c]# cd snapshots/

[root@node2 snapshots]# ls

dropped-1594435864327-t_card_info

As the listings show, after DROP and TRUNCATE the table's data is moved into snapshots/dropped-xxxxxx and snapshots/truncated-xxxxxx under the table directory, respectively.

So how is the data restored?

[cassandra@node2 bin]$ ./sstableloader -d 172.26.99.152 /tmp/dbrsk/t_card_info

WARN  11:04:46,472 Only 31.813GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots

Established connection to initial hosts

Opening sstables and calculating sections to stream

Skipping file mc-21-big-Data.db: table dbrsk.t_card_info doesn't exist

Skipping file mc-22-big-Data.db: table dbrsk.t_card_info doesn't exist

Skipping file mc-23-big-Data.db: table dbrsk.t_card_info doesn't exist

Skipping file mc-24-big-Data.db: table dbrsk.t_card_info doesn't exist

Summary statistics: 

   Connections per host    : 1         

   Total files transferred : 0         

   Total bytes transferred : 0.000KiB  

   Total duration          : 2934 ms   

   Average transfer rate   : 0.000KiB/s

   Peak transfer rate      : 0.000KiB/s

If the target table does not exist, sstableloader simply skips every file (as above), so the table must first be recreated by hand:

[cassandra@node2 bin]$ ./cqlsh --request-timeout=90000 $HOSTNAME

Connected to Test Cluster at node2:9042.

[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help.

cqlsh> use dbrsk;

cqlsh:dbrsk> CREATE TABLE dbrsk.t_card_info (

        ...     bankcard text PRIMARY KEY,

        ...     bankname text,

        ...     cardname text,

        ...     cardtype text,

        ...     city text,

        ...     province text,

        ...     updatetime bigint

        ... ) WITH bloom_filter_fp_chance = 0.00075

        ...     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

        ...     AND comment = '銀行卡信息數(shù)據(jù)'

        ...     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

        ...     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

        ...     AND crc_check_chance = 0.0

        ...     AND dclocal_read_repair_chance = 0.0

        ...     AND default_time_to_live = 0

        ...     AND gc_grace_seconds = 86400

        ...     AND max_index_interval = 2048

        ...     AND memtable_flush_period_in_ms = 0

        ...     AND min_index_interval = 128

        ...     AND read_repair_chance = 0.0

        ...     AND speculative_retry = '99PERCENTILE';

cqlsh:dbrsk> exit

[cassandra@node2 bin]$ ./sstableloader -d 172.26.99.152 /tmp/dbrsk/t_card_info

WARN  11:05:57,753 Only 31.813GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots

Established connection to initial hosts

Opening sstables and calculating sections to stream

Streaming relevant part of /tmp/dbrsk/t_card_info/mc-21-big-Data.db /tmp/dbrsk/t_card_info/mc-22-big-Data.db /tmp/dbrsk/t_card_info/mc-23-big-Data.db /tmp/dbrsk/t_card_info/mc-24-big-Data.db to [/172.26.99.154, /172.26.99.152, /172.26.99.153]

progress: [/172.26.99.154]0:0/4 0  % [/172.26.99.152]0:1/4 6  % total: 4% 1.172MiB/s (avg: 1.172MiB/s)

progress: [/172.26.99.154]0:0/4 0  % [/172.26.99.152]0:1/4 6  % [/172.26.99.153]0:0/4 0  % total: 3% 65.484MiB/s (avg: 1.257MiB/s)

progress: [/172.26.99.154]0:0/4 0  % [/172.26.99.152]0:1/4 6  % [/172.26.99.153]0:0/4 0  % total: 3% 2.578MiB/s (avg: 1.260MiB/s)

……

progress: [/172.26.99.154]0:0/4 48 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 24% 245.959MiB/s (avg: 8.540MiB/s)

progress: [/172.26.99.154]0:0/4 49 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 24% 1012.976MiB/s (avg: 8.651MiB/s)

progress: [/172.26.99.154]0:0/4 51 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 25% 1.454GiB/s (avg: 8.803MiB/s)

progress: [/172.26.99.154]0:0/4 54 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 25% 161.665MiB/s (avg: 9.091MiB/s)

progress: [/172.26.99.154]0:0/4 56 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 1.643GiB/s (avg: 9.249MiB/s)

progress: [/172.26.99.154]0:0/4 56 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 134.745MiB/s (avg: 9.254MiB/s)

progress: [/172.26.99.154]0:0/4 58 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 1.702GiB/s (avg: 9.371MiB/s)

progress: [/172.26.99.154]0:0/4 58 % [/172.26.99.152]0:2/4 16 % [/172.26.99.153]0:1/4 13 % total: 26% 23.592MiB/s (avg: 9.406MiB/s)

……

progress: [/172.26.99.154]0:4/4 100% [/172.26.99.152]0:4/4 100% [/172.26.99.153]0:4/4 100% total: 100% 0.000KiB/s (avg: 10.816MiB/s)

Summary statistics: 

   Connections per host    : 1         

   Total files transferred : 8         

   Total bytes transferred : 133.156MiB

   Total duration          : 12314 ms  

   Average transfer rate   : 10.813MiB/s

   Peak transfer rate      : 17.530MiB/s

Conclusion: TRUNCATE is workable, and the data can be recovered if something goes wrong, although recovery takes a fairly long time.

Neither TRUNCATE TABLE nor DROP TABLE frees disk space in Cassandra; the data is moved into the table's snapshots directory instead.

That concludes this walkthrough of cleaning and restoring table data in a Cassandra cluster. Verify the behaviour in your own environment before relying on it in production.
