
Optimizing MongoDB Queries for Export Scenarios #1

Published: 2020-06-18 08:15:17 Source: Internet Views: 579 Author: acoder2013 Category: MongoDB databases

Original link: https://github.com/aCoder2013/blog/issues/1 Please credit the source when reposting.

Introduction

A while ago I ran into a scenario that amounted to a data export. It got slower and slower as it ran: exporting 1 million records took 40 to 60 minutes, and the logs showed the time per batch climbing steadily as well.

Cause

Looking at the code, the export runs in batches, much like front-end pagination, and is implemented with skip + limit. So what is wrong with this approach? Let's google the documentation for these two methods:

The cursor.skip() method is often expensive because it requires the server to walk from the
beginning of the collection or index to get the offset or skip position before beginning to return
results. As the offset (e.g. pageNumber above) increases, cursor.skip() will become slower and
more CPU intensive. With larger collections, cursor.skip() may become IO bound.

Put simply, skip() gets slower as the page number grows. For an export like ours, though, there should be no need to repeat that work on every page. My understanding was that we should be able to get hold of a pointer and walk through the data step by step, and a quick Google search confirmed that this is indeed possible.

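We can add a method to the persistence layer that returns a cursor for the upper layer to iterate over. That way we no longer re-traverse results that have already been exported, which takes the work from O(N²) down to O(N). We can also specify a batchSize, which sets how many documents are fetched from MongoDB at a time. Note that a batch is also capped by size, at 4 MB.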
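To make the O(N²) versus O(N) claim concrete, here is a rough cost model rather than a benchmark. It assumes the server walks `skip + limit` entries for every page and ignores indexes, caching, and network cost. The class and method names are made up for illustration.

```java
// Rough cost model (an assumption, not a measurement): count how many
// documents the server has to walk for each export strategy.
class ExportCostModel {

    // skip+limit paging: each page first walks past everything already
    // exported (skip), then returns up to pageSize documents.
    static long skipLimitCost(long total, long pageSize) {
        long walked = 0;
        for (long skip = 0; skip < total; skip += pageSize) {
            long returned = Math.min(pageSize, total - skip);
            walked += skip + returned;
        }
        return walked;
    }

    // Single cursor: every document is visited exactly once.
    static long cursorCost(long total) {
        return total;
    }

    public static void main(String[] args) {
        long total = 1_000_000L;
        long pageSize = 1_000L;
        System.out.println("skip+limit walks " + skipLimitCost(total, pageSize) + " documents");
        System.out.println("cursor walks     " + cursorCost(total) + " documents");
    }
}
```

Under these assumptions, exporting 1,000,000 documents in pages of 1,000 walks 500,500,000 documents with skip + limit, against 1,000,000 with a single cursor.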

/**
 * <p>Limits the number of elements returned in one batch. A cursor
 * typically fetches a batch of result objects and store them
 * locally.</p>
 *
 * <p>If {@code batchSize} is positive, it represents the size of each batch of objects retrieved. It can be adjusted to optimize
 * performance and limit data transfer.</p>
 *
 * <p>If {@code batchSize} is negative, it will limit of number objects returned, that fit within the max batch size limit (usually
 * 4MB), and cursor will be closed. For example if {@code batchSize} is -10, then the server will return a maximum of 10 documents and
 * as many as can fit in 4MB, then close the cursor. Note that this feature is different from limit() in that documents must fit within
 * a maximum size, and it removes the need to send a request to close the cursor server-side.</p>
 */

For example, I set it to 8000 here, so the Mongo client fetches that many documents per batch by default.

[Figure: screenshot from the original post]

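A simple local test showed a dramatic improvement. Exporting 300,000 records with the previous approach, the later pages averaged about 500 ms each and the whole export took 60039 ms. With the optimized approach, each batch averaged between 100 ms and 200 ms and the whole export took 16667 ms (including time spent in business logic).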

Usage

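DBCursor dbCursor = collection.find(query).batchSize(8000);
try {
  while (dbCursor.hasNext()) {
    DBObject nextItem = dbCursor.next();
    // business logic
    ...
    //
  }
} finally {
  // release the server-side cursor even if the business logic throws
  dbCursor.close();
}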

Now let's look at what hasNext does internally.

    @Override
    public boolean hasNext() {
        if (closed) {
            throw new IllegalStateException("Cursor has been closed");
        }

        if (nextBatch != null) {
            return true;
        }

        if (limitReached()) {
            return false;
        }

        while (serverCursor != null) {
            // this sends a command to MongoDB to fetch more data
            getMore();
            if (nextBatch != null) {
                return true;
            }
        }

        return false;
    }

    private void getMore() {
        Connection connection = connectionSource.getConnection();
        try {
            if (serverIsAtLeastVersionThreeDotTwo(connection.getDescription())) {
                try {
                    // this issues the getMore command; the documents are read from the reply's `nextBatch` field
                    initFromCommandResult(connection.command(namespace.getDatabaseName(),
                                                             asGetMoreCommandDocument(),
                                                             false,
                                                             new NoOpFieldNameValidator(),
                                                             CommandResultDocumentCodec.create(decoder, "nextBatch")));
                } catch (MongoCommandException e) {
                    throw translateCommandException(e, serverCursor);
                }
            } else {
                initFromQueryResult(connection.getMore(namespace, serverCursor.getId(),
                                                       getNumberToReturn(limit, batchSize, count),
                                                       decoder));
            }
            if (limitReached()) {
                killCursor(connection);
            }
        } finally {
            connection.release();
        }
    }

Finally, initFromCommandResult takes the result and parses it into BSON objects.
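The hasNext/getMore flow can be imitated with a small in-memory model. This is only a sketch and not the driver's code: the "server" is a plain list, `fetchBatch()` is a hypothetical stand-in for the getMore round trip, and the class name is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified in-memory model of a batching cursor (NOT the driver's code).
class BatchCursorModel implements Iterator<Integer> {
    private final List<Integer> server;      // pretend server-side result set
    private final int batchSize;
    private int serverPosition = 0;          // how far the server-side cursor has advanced
    private List<Integer> nextBatch = new ArrayList<>();
    private int indexInBatch = 0;
    int roundTrips = 0;                      // number of simulated fetches

    BatchCursorModel(List<Integer> server, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.server = server;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInBatch < nextBatch.size()) {
            return true;                     // still serving the local batch
        }
        if (serverPosition >= server.size()) {
            return false;                    // server cursor is exhausted
        }
        fetchBatch();                        // stands in for getMore
        return indexInBatch < nextBatch.size();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return nextBatch.get(indexInBatch++);
    }

    // Continues from the current server position; nothing is re-scanned.
    private void fetchBatch() {
        int end = Math.min(serverPosition + batchSize, server.size());
        nextBatch = new ArrayList<>(server.subList(serverPosition, end));
        serverPosition = end;
        indexInBatch = 0;
        roundTrips++;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 300_000; i++) {
            docs.add(i);
        }
        BatchCursorModel cursor = new BatchCursorModel(docs, 8000);
        long seen = 0;
        while (cursor.hasNext()) {
            cursor.next();
            seen++;
        }
        System.out.println(seen + " documents in " + cursor.roundTrips + " fetches");
    }
}
```

In this model, 300,000 documents with a batch size of 8000 take 38 fetches, and each fetch resumes where the previous one stopped instead of starting over from the beginning.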

Summary

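When writing code day to day, it is best to add timing instrumentation to every method, every interface, or at an even finer granularity. It can be logged at debug level; logging frameworks such as log4j/logback let you change the level dynamically, so you can check timings whenever you need to and optimize in a more targeted way. For the scenario in this article, first check whether the code logic is at fault, then check whether the database is the problem (for example a missing index or too much data), and only then work out a targeted optimization, rather than diving straight into writing code.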
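A minimal sketch of such a timing probe follows. To stay dependency-free it uses java.util.logging at FINE level as a stand-in for a log4j/logback debug logger, which is an assumption of this example; the `Timed` class and `timed` helper are hypothetical names.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: wraps a task and logs how long it took.
class Timed {
    private static final Logger LOG = Logger.getLogger(Timed.class.getName());

    // Runs the task, logs its elapsed time at FINE (roughly "debug"),
    // and returns the task's result unchanged.
    static <T> T timed(String label, Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // The message is only built when FINE is enabled for this logger.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine(label + " took " + elapsedMs + " ms");
            }
        }
    }

    public static void main(String[] args) {
        int sum = timed("sum 1..100", () -> {
            int s = 0;
            for (int i = 1; i <= 100; i++) {
                s += i;
            }
            return s;
        });
        System.out.println(sum);
    }
}
```

Wrapping each export batch in a call like `timed("export batch", ...)` would make a steadily rising per-batch time visible in the logs once the logger level is lowered.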
