This article analyzes how an Old Master node is handled in PostgreSQL, a situation that often comes up in practice after a failover.
In a PostgreSQL HA environment built on streaming replication, a network or hardware failure can cause the Standby node to be promoted to Master. If the database on the Old Master is not damaged, then once the fault is cleared the Old Master can become a Standby of the New Master by running pg_rewind, without having to be rebuilt from a backup.
What does pg_rewind actually do when it runs?
In a PostgreSQL HA environment, once the Standby node is promoted to Master it switches to a new timeline, for example from 1 to 2. The Old Master stays on its original timeline, for example 1. Given that, how does pg_rewind let the Old Master read the relevant data from the New Master and become the new Standby?
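In short, there are the following steps:
1. Determine the checkpoint at which the New Master and the Old Master are still consistent, that is, the last checkpoint before their histories diverged. Up to that position the data on both nodes is identical. The divergence point can be obtained by reading the timeline history files of the New Master and Old Master nodes; these files are located under $PGDATA/pg_wal/ and are named XX.history.
2. Starting from the checkpoint found in the previous step, the Old Master reads the WAL records in its local log files, collects the blocks that changed after this checkpoint, and stores the block numbers and related information in a linked list.
3. Using the block information from step 2, copy the corresponding blocks from the New Master and overwrite the same blocks on the Old Master.
4. Copy all other files from the New Master apart from the data files, including the configuration files. (If the data files were copied in full, this would differ little from rebuilding the node from a backup.)
5. Start the database on the Old Master and apply the WAL records from the checkpoint onward.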
After a failover, the timeline of the New Master switches to n + 1. With pg_rewind, the Old Master can synchronize with the New Master starting from the divergence point and become the New Standby node.
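The divergence point mentioned in step 1 can be illustrated with a small sketch. It assumes both timeline histories have already been parsed into arrays; the `HistoryEntry` struct, `min_end` and `find_divergence` are hypothetical names made up for this example, and the convention that an `end` of 0 means "timeline still open" is an assumption of the sketch, not a statement about pg_rewind's internals.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's typedef */
typedef uint32_t TimeLineID;   /* stand-in for PostgreSQL's typedef */

/* Hypothetical in-memory form of one timeline history entry. */
typedef struct
{
    TimeLineID tli;     /* timeline ID */
    XLogRecPtr begin;   /* first WAL position on this timeline */
    XLogRecPtr end;     /* switch point to the next timeline; 0 = still open */
} HistoryEntry;

/* Minimum of two end positions, treating 0 as "no end yet". */
XLogRecPtr
min_end(XLogRecPtr a, XLogRecPtr b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/*
 * Walk both histories from the oldest entry and stop at the first mismatch.
 * On success, store the index of the last shared entry and the position
 * where that shared timeline ends on either side, then return 1.
 * Return 0 if the two histories have no common entry at all.
 */
int
find_divergence(const HistoryEntry *source, size_t nsource,
                const HistoryEntry *target, size_t ntarget,
                size_t *common_index, XLogRecPtr *diverge_at)
{
    size_t n = nsource < ntarget ? nsource : ntarget;
    size_t i = 0;

    while (i < n &&
           source[i].tli == target[i].tli &&
           source[i].begin == target[i].begin)
        i++;

    if (i == 0)
        return 0;

    *common_index = i - 1;
    *diverge_at = min_end(source[i - 1].end, target[i - 1].end);
    return 1;
}
```

With a New Master history of timeline 1 ending at some position followed by timeline 2, and an Old Master history that only contains an open timeline 1, the function reports index 0 and the position where timeline 1 ended on the New Master.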
XLogRecPtr
A 64-bit address into the WAL record address space.
/*
 * Pointer to a location in the XLOG. These pointers are 64 bits wide,
 * because we don't want them ever to overflow.
 */
typedef uint64 XLogRecPtr;
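pg_rewind prints these positions in the form %X/%X, that is, the upper and lower 32 bits of the 64-bit value in hexadecimal. A minimal sketch of that split and its inverse follows; `format_lsn` and `parse_lsn` are hypothetical helper names for this example, and the local typedef only stands in for the PostgreSQL one so the snippet compiles on its own.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's typedef */

/* Write lsn into buf as "hi/lo" in hexadecimal; returns snprintf's result. */
int
format_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Parse "hi/lo" back into a 64-bit position; returns 1 on success, 0 otherwise. */
int
parse_lsn(const char *text, XLogRecPtr *lsn)
{
    uint32_t hi;
    uint32_t lo;

    if (sscanf(text, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
        return 0;

    *lsn = ((XLogRecPtr) hi << 32) | lo;
    return 1;
}
```

The same shift-and-cast pattern appears in the printf calls of the main function shown later in this article.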
TimeLineID
The timeline ID.
typedef uint32 TimeLineID;
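The timeline ID also shows up in file names under $PGDATA/pg_wal/: the history file is named after the timeline in eight hexadecimal digits, and each WAL segment file name starts with the timeline it belongs to. The sketch below builds both names; `history_file_name` and `wal_segment_name` are hypothetical helpers, and the 16 MB segment size is an assumption (it is the default, but a cluster can be initialized with a different size).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's typedef */
typedef uint32_t TimeLineID;   /* stand-in for PostgreSQL's typedef */

/* Assumption: the default WAL segment size of 16 MB. */
#define WAL_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)

/* Name of the history file for a timeline, such as "00000002.history". */
int
history_file_name(char *buf, size_t buflen, TimeLineID tli)
{
    return snprintf(buf, buflen, "%08" PRIX32 ".history", tli);
}

/* Name of the WAL segment file that contains lsn on timeline tli. */
int
wal_segment_name(char *buf, size_t buflen, TimeLineID tli, XLogRecPtr lsn)
{
    uint64_t segno = lsn / WAL_SEGMENT_SIZE;
    uint64_t segs_per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

    return snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
                    tli,
                    (uint32_t) (segno / segs_per_id),
                    (uint32_t) (segno % segs_per_id));
}
```

This is why a promoted node starts writing segment files whose names begin with the new timeline, while the Old Master keeps files that begin with the old one.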
The source code of pg_rewind is fairly simple; see the comments for details.
int
main(int argc, char **argv)
{
    // Command-line options
    static struct option long_options[] = {
        {"help", no_argument, NULL, '?'},
        {"target-pgdata", required_argument, NULL, 'D'},
        {"source-pgdata", required_argument, NULL, 1},
        {"source-server", required_argument, NULL, 2},
        {"version", no_argument, NULL, 'V'},
        {"dry-run", no_argument, NULL, 'n'},
        {"no-sync", no_argument, NULL, 'N'},
        {"progress", no_argument, NULL, 'P'},
        {"debug", no_argument, NULL, 3},
        {NULL, 0, NULL, 0}
    };
    int         option_index;       // option index
    int         c;                  // option character code
    XLogRecPtr  divergerec;         // divergence point
    int         lastcommontliIndex;
    XLogRecPtr  chkptrec;           // checkpoint record location
    TimeLineID  chkpttli;           // timeline of that checkpoint
    XLogRecPtr  chkptredo;          // checkpoint REDO location
    size_t      size;
    char       *buffer;             // buffer
    bool        rewind_needed;      // whether a rewind is needed
    XLogRecPtr  endrec;             // end position
    TimeLineID  endtli;             // end timeline
    ControlFileData ControlFile_new;    // new control file

    set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_rewind"));
    progname = get_progname(argv[0]);

    /* Process command-line arguments */
    if (argc > 1)
    {
        if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
        {
            usage(progname);
            exit(0);
        }
        if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
        {
            puts("pg_rewind (PostgreSQL) " PG_VERSION);
            exit(0);
        }
    }

    while ((c = getopt_long(argc, argv, "D:nNP", long_options, &option_index)) != -1)
    {
        switch (c)
        {
            case '?':
                fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
                exit(1);

            case 'P':
                showprogress = true;
                break;

            case 'n':
                dry_run = true;
                break;

            case 'N':
                do_sync = false;
                break;

            case 3:
                debug = true;
                break;

            case 'D':           /* -D or --target-pgdata */
                datadir_target = pg_strdup(optarg);
                break;

            case 1:             /* --source-pgdata */
                datadir_source = pg_strdup(optarg);
                break;

            case 2:             /* --source-server */
                connstr_source = pg_strdup(optarg);
                break;
        }
    }

    if (datadir_source == NULL && connstr_source == NULL)
    {
        fprintf(stderr, _("%s: no source specified (--source-pgdata or --source-server)\n"), progname);
        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
        exit(1);
    }

    if (datadir_source != NULL && connstr_source != NULL)
    {
        fprintf(stderr, _("%s: only one of --source-pgdata or --source-server can be specified\n"), progname);
        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
        exit(1);
    }

    if (datadir_target == NULL)
    {
        fprintf(stderr, _("%s: no target data directory specified (--target-pgdata)\n"), progname);
        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
        exit(1);
    }

    if (optind < argc)
    {
        fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
                progname, argv[optind]);
        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
        exit(1);
    }

    /*
     * Don't allow pg_rewind to be run as root, to avoid overwriting the
     * ownership of files in the data directory. We need only check for root
     * -- any other user won't have sufficient permissions to modify files in
     * the data directory.
     */
#ifndef WIN32
    if (geteuid() == 0)
    {
        // Running as root
        fprintf(stderr, _("cannot be executed by \"root\"\n"));
        fprintf(stderr, _("You must run %s as the PostgreSQL superuser.\n"), progname);
        exit(1);
    }
#endif

    get_restricted_token(progname);

    /* Set mask based on PGDATA permissions */
    if (!GetDataDirectoryCreatePerm(datadir_target))
    {
        fprintf(stderr, _("%s: could not read permissions of directory \"%s\": %s\n"),
                progname, datadir_target, strerror(errno));
        exit(1);
    }

    umask(pg_mode_mask);

    /* Connect to remote server */
    if (connstr_source)
        libpqConnect(connstr_source);

    /*
     * Ok, we have all the options and we're ready to start. Read in all the
     * information we need from both clusters.
     */
    // Read the target control file
    buffer = slurpFile(datadir_target, "global/pg_control", &size);
    digestControlFile(&ControlFile_target, buffer, size);
    pg_free(buffer);

    // Read the source control file
    buffer = fetchFile("global/pg_control", &size);
    digestControlFile(&ControlFile_source, buffer, size);
    pg_free(buffer);

    sanityChecks();

    /*
     * If both clusters are already on the same timeline, there's nothing to
     * do.
     */
    if (ControlFile_target.checkPointCopy.ThisTimeLineID ==
        ControlFile_source.checkPointCopy.ThisTimeLineID)
    {
        printf(_("source and target cluster are on the same timeline\n"));
        rewind_needed = false;
    }
    else
    {
        // Find the divergence point
        findCommonAncestorTimeline(&divergerec, &lastcommontliIndex);
        printf(_("servers diverged at WAL location %X/%X on timeline %u\n"),
               (uint32) (divergerec >> 32), (uint32) divergerec,
               targetHistory[lastcommontliIndex].tli);

        /*
         * Check for the possibility that the target is in fact a direct
         * ancestor of the source. In that case, there is no divergent history
         * in the target that needs rewinding.
         */
        if (ControlFile_target.checkPoint >= divergerec)
        {
            // Target checkpoint >= divergence point: a rewind is needed
            rewind_needed = true;
        }
        else
        {
            // Target checkpoint < divergence point
            XLogRecPtr  chkptendrec;

            /* Read the checkpoint record on the target to see where it ends. */
            chkptendrec = readOneRecord(datadir_target,
                                        ControlFile_target.checkPoint,
                                        targetNentries - 1);

            /*
             * If the histories diverged exactly at the end of the shutdown
             * checkpoint record on the target, there are no WAL records in
             * the target that don't belong in the source's history, and no
             * rewind is needed.
             */
            if (chkptendrec == divergerec)
                rewind_needed = false;
            else
                rewind_needed = true;
        }
    }

    if (!rewind_needed)
    {
        // No rewind needed: exit
        printf(_("no rewind required\n"));
        exit(0);
    }

    // Find the last checkpoint on the target before the divergence point
    findLastCheckpoint(datadir_target, divergerec, lastcommontliIndex,
                       &chkptrec, &chkpttli, &chkptredo);
    printf(_("rewinding from last common checkpoint at %X/%X on timeline %u\n"),
           (uint32) (chkptrec >> 32), (uint32) chkptrec, chkpttli);

    /*
     * Build the filemap, by comparing the source and target data directories.
     */
    filemap_create();
    pg_log(PG_PROGRESS, "reading source file list\n");
    fetchSourceFileList();
    pg_log(PG_PROGRESS, "reading target file list\n");
    traverse_datadir(datadir_target, &process_target_file);

    /*
     * Read the target WAL from last checkpoint before the point of fork, to
     * extract all the pages that were modified on the target cluster after
     * the fork. We can stop reading after reaching the final shutdown record.
     * XXX: If we supported rewinding a server that was not shut down cleanly,
     * we would need to replay until the end of WAL here.
     */
    // Add the pages modified in the target WAL to the filemap
    pg_log(PG_PROGRESS, "reading WAL in target\n");
    extractPageMap(datadir_target, chkptrec, lastcommontliIndex,
                   ControlFile_target.checkPoint);
    filemap_finalize();

    if (showprogress)
        calculate_totals();

    /* this is too verbose even for verbose mode */
    // In debug mode, print the filemap
    if (debug)
        print_filemap();

    /*
     * Ok, we're ready to start copying things over.
     */
    if (showprogress)
    {
        pg_log(PG_PROGRESS, "need to copy %lu MB (total source directory size is %lu MB)\n",
               (unsigned long) (filemap->fetch_size / (1024 * 1024)),
               (unsigned long) (filemap->total_size / (1024 * 1024)));

        fetch_size = filemap->fetch_size;
        fetch_done = 0;
    }

    /*
     * This is the point of no return. Once we start copying things, we have
     * modified the target directory and there is no turning back!
     */
    // Execute the filemap: copy the required blocks and files
    executeFileMap();

    progress_report(true);

    // Create the backup_label file and update the control file
    pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
    createBackupLabel(chkptredo, chkpttli, chkptrec);

    /*
     * Update control file of target. Make it ready to perform archive
     * recovery when restarting.
     *
     * minRecoveryPoint is set to the current WAL insert location in the
     * source server. Like in an online backup, it's important that we recover
     * all the WAL that was generated while we copied the files over.
     */
    memcpy(&ControlFile_new, &ControlFile_source, sizeof(ControlFileData));

    if (connstr_source)
    {
        // Current WAL insert location on the source
        endrec = libpqGetCurrentXlogInsertLocation();
        // Timeline of the source
        endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
    }
    else
    {
        endrec = ControlFile_source.checkPoint;
        endtli = ControlFile_source.checkPointCopy.ThisTimeLineID;
    }

    // Update the control file
    ControlFile_new.minRecoveryPoint = endrec;
    ControlFile_new.minRecoveryPointTLI = endtli;
    ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
    update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);

    pg_log(PG_PROGRESS, "syncing target data directory\n");
    // Flush the target data directory to disk
    syncTargetDirectory();

    printf(_("Done!\n"));

    return 0;
}
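The branch in main that decides whether a rewind is required, once the two clusters are known to be on different timelines, can be isolated as a pure function. The sketch below mirrors only that comparison logic; `rewind_is_needed` and its parameter names are hypothetical, and the same-timeline shortcut that main checks first is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* stand-in for PostgreSQL's typedef */

/*
 * target_checkpoint:     position of the target's latest checkpoint record
 * target_checkpoint_end: position just past that checkpoint record
 * diverge_at:            position where the two histories split
 */
bool
rewind_is_needed(XLogRecPtr target_checkpoint,
                 XLogRecPtr target_checkpoint_end,
                 XLogRecPtr diverge_at)
{
    /* The target wrote a checkpoint at or after the split: it has WAL of its own. */
    if (target_checkpoint >= diverge_at)
        return true;

    /*
     * The split falls exactly at the end of the target's last checkpoint
     * record, so the target holds no WAL that the source lacks.
     */
    return target_checkpoint_end != diverge_at;
}
```

Keeping the decision free of file access makes the three cases (checkpoint after the split, split exactly at the end of the checkpoint record, split somewhere later) easy to reason about separately from the I/O in main.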