Python怎么使用LRU緩存策略進(jìn)行緩存

發(fā)布時間：2023-05-17 14:07:26 來源：億速云閱讀：115 作者：iii 欄目：編程語言

本文小編為大家詳細(xì)介紹“Python怎么使用LRU緩存策略進(jìn)行緩存”，內(nèi)容詳細(xì)，步驟清晰，細(xì)節(jié)處理妥當(dāng)，希望這篇“Python怎么使用LRU緩存策略進(jìn)行緩存”文章能幫助大家解決疑惑，下面跟著小編的思路慢慢深入，一起來學(xué)習(xí)新知識吧。

一、Python 緩存

① 緩存作用

緩存是一種優(yōu)化技術(shù)，可以在應(yīng)用程序中使用它來將最近或經(jīng)常使用的數(shù)據(jù)保存在內(nèi)存中，通過這種方式來訪問數(shù)據(jù)的速度比直接讀取磁盤文件的高很多。
假設(shè)我們搭建了一個新聞聚合網(wǎng)站，類似于 Feedly，其獲取不同來源的新聞然后聚合展示。當(dāng)用戶瀏覽新聞的時候，后臺程序會將文章下載然后顯示到用戶屏幕上。如果不使用緩存技術(shù)的話，當(dāng)用戶多次切換瀏覽相同文章的時候，必須多次下載，效率低下且很不友好。
更好的做法就是在獲取每篇文章之后，在本地進(jìn)行內(nèi)容的存儲，比如存儲在數(shù)據(jù)庫中；然后，當(dāng)用戶下次打開同一篇文章的時候，后臺程序可以從本地存儲打開內(nèi)容，而不是再次下載源文件，即這種技術(shù)稱為緩存。

② 使用 Python 字典實現(xiàn)緩存

以新聞聚合網(wǎng)站為例，不必每次都去下載文章內(nèi)容，而是先檢查緩存數(shù)據(jù)中是否存在對應(yīng)的內(nèi)容，只有當(dāng)沒有時，才會讓服務(wù)器下載文章。

如下的示例程序，就是使用 Python 字典實現(xiàn)緩存的，將文章的 URL 作為鍵，并將其內(nèi)容作為值；執(zhí)行之后，可以看到當(dāng)?shù)诙螆?zhí)行 get_article 函數(shù)的時候，直接就返回結(jié)果并沒有讓服務(wù)器下載：

import requests

cache = dict()

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def get_article(url):
    print("Getting article...")
    if url not in cache:
        cache[url] = get_article_from_server(url)
    return cache[url]

get_article("https://www.escapelife.site/love-python.html")
get_article("https://www.escapelife.site/love-python.html")

將此代碼保存到一個 caching.py 文件中，安裝 requests 庫，然后運行腳本：

# 安裝依賴
$ pip install requests

# 執(zhí)行腳本
$ python python_caching.py
Getting article...
Fetching article from server...
Getting article...

盡管調(diào)用 get_article() 兩次（第 17 行和第 18 行）字符串“Fetching article from server…”，但仍然只輸出一次。發(fā)生這種情況的原因是，在第一次訪問文章之后，將其 URL 和內(nèi)容放入緩存字典中，第二次時代碼不需要再次從服務(wù)器獲取項目。

③ 使用字典來做緩存的弊端

上面這種緩存實現(xiàn)存在一個非常大的問題，那就是字典的內(nèi)容將會無限增長，即大量用戶連續(xù)瀏覽文章的時候，后臺程序?qū)⒉粩嘞蜃值渲腥胄枰鎯Φ膬?nèi)容，服務(wù)器內(nèi)存被擠爆，最終導(dǎo)致應(yīng)用程序崩潰。

上面這種緩存實現(xiàn)存在一個非常大的問題，那就是字典的內(nèi)容將會無限增長，即大量用戶連續(xù)瀏覽文章的時候，后臺程序?qū)⒉粩嘞蜃值渲腥胄枰鎯Φ膬?nèi)容，服務(wù)器內(nèi)存被擠爆，最終導(dǎo)致應(yīng)用程序崩潰。
要解決上述這個問題，就需要有一種策略來決定哪些文章應(yīng)該留在內(nèi)存中，哪些文章應(yīng)該被刪除掉，這些緩存策略其實是一些算法，它們用于管理緩存的信息，并選擇丟棄哪些項以為新項騰出空間。
當(dāng)然這里不必去實現(xiàn)管理緩存的算法，只需要使用不同的策略來從緩存中移除項，并防止其增長超過其最大大小。五種常見的緩存算法如下所示：

緩存策略	英文名稱	淘汰條件	在什么時候最有用
先進(jìn)先出算法(FIFO)	First-In/First-Out	淘汰最舊的條目	較新的條目最有可能被重用
后進(jìn)先出算法(LIFO)	Last-In/First-Out	淘汰最新的條目	較舊的條目最有可能被重用
最近最少使用算法(LRU)	Least Recently Used	淘汰最近使用最少的條目	最近使用的條目最有可能被重用
最近最多使用算法(MRU)	Most Recently Used	淘汰最近使用最多的條目	最近不用的條目最有可能被重用
最近最少命中算法(LFU)	Least Frequently Used	淘汰最不經(jīng)常訪問的條目	命中率很高的條目更有可能被重用

看了上述五種緩存算法，是不是看到 LRU 和 LFU 的時候有點懵，主要是通過中文對應(yīng)的解釋很難理解其真實的含義，看看英文的話就不難理解了。LRU 和 LFU 算法的不同之處在于：

LRU 基于訪問時間的淘汰規(guī)則，根據(jù)數(shù)據(jù)的歷史訪問記錄來進(jìn)行淘汰數(shù)據(jù)，如果數(shù)據(jù)最近被訪問過，那么將來被訪問的幾率也更高；
LFU 基于訪問次數(shù)的淘汰規(guī)則，根據(jù)數(shù)據(jù)的歷史訪問頻率來淘汰數(shù)據(jù)，如果數(shù)據(jù)過去被訪問多次，那么將來被訪問的頻率也更高；

比如，以十分鐘為一個節(jié)點，每分鐘進(jìn)行一次頁面調(diào)度，當(dāng)所需的頁面走向為 2 1 2 4 2 3 4 時，且調(diào)頁面 4 時會發(fā)生缺頁中斷；若按 LRU 算法的話，應(yīng)換頁面 1（十分鐘內(nèi)頁面 1 最久未被使用），但按 LFU 算法的話，應(yīng)換頁面 3（十分鐘內(nèi)頁面 3 只使用一次）。

二、深入理解 LRU 算法

① 查看 LRU 緩存的特點

使用 LRU 策略實現(xiàn)的緩存是按照使用順序進(jìn)行排序的，每次訪問條目時，LRU 算法就會將其移到緩存的頂部。通過這種方式，算法可以通過查看列表的底部，快速識別出最長時間未使用的條目。

如下所示，用戶從網(wǎng)絡(luò)上請求第一篇文章的 LRU 策略存儲記錄：

Python怎么使用LRU緩存策略進(jìn)行緩存

在將文章提供給用戶之前，緩存如何將其存儲在最近的槽中？如下所示，用戶請求第二篇文章時發(fā)生的情況，第二篇文章存儲到最上層的位置，即第二篇文章采用了最近的位置，將第一篇文章推到列表下方：

Python怎么使用LRU緩存策略進(jìn)行緩存

LRU 策略假定使用的對象越新，將來使用該對象的可能性就越大，因此它嘗試將該對象保留在緩存中的時間最長，即如果發(fā)生條目淘汰的話，會優(yōu)先淘汰第一篇文檔的緩存存儲記錄。

② 查看 LRU 緩存的結(jié)構(gòu)

在 Python 中實現(xiàn) LRU 緩存的一種方法就是使用雙向鏈表（doubly linked list）和哈希映射（hash map），雙向鏈表的頭元素將指向最近使用的條目，而其尾部將指向最近使用最少的條目。LRU 緩存實現(xiàn)邏輯結(jié)構(gòu)如下：

Python怎么使用LRU緩存策略進(jìn)行緩存

通過使用哈希映射，可以將每個條目映射到雙鏈表中的特定位置，從而確保對緩存中的每個項的訪問。這個策略非?？?，訪問最近最少使用的項和更新緩存的復(fù)雜度均為 O(1) 操作。

而從 Python3.2 版本開始，Python 新增 @lru_cache 這個裝飾器用于實現(xiàn) LRU 策略，從此可以使用這個裝飾器來裝飾函數(shù)并緩存其計算結(jié)果。

三、使用 lru_cache 裝飾器

① @lru_cache 裝飾器的實現(xiàn)原理

有很多方法可以實現(xiàn)應(yīng)用程序的快速響應(yīng)，而使用緩存就是一種非常常見的方法。如果能夠正確使用緩存的話，可以使響應(yīng)變得更快且減少計算資源的額外負(fù)載。
在 Python 中 functools 模塊自帶了 @lru_cache 這個裝飾器來做緩存，其能夠使用最近最少使用（LRU）策略來緩存函數(shù)的計算結(jié)果，這是一種簡單但功能強大的技術(shù)：

實現(xiàn) @lru_cache 裝飾器；
了解 LRU 策略的工作運作原理；
使用 @lru_cache 裝飾器來提高性能；
擴展 @lru_cache 裝飾器的功能并使其在特定時間后過期。

Python怎么使用LRU緩存策略進(jìn)行緩存

就像先前實現(xiàn)的緩存方案一樣，Python 中的 @lru_cache 裝飾器存儲也是使用字典來做為存儲對象的，它將函數(shù)的執(zhí)行結(jié)果緩存在字典的 key 里面，該 key 由對該函數(shù)的調(diào)用（包括函數(shù)的參數(shù)）組成，這就意味著這些函數(shù)的參數(shù)必須是可哈希的，裝飾器才能正常工作。

② 斐波拉契數(shù)列

我們都應(yīng)該知道斐波拉契數(shù)列的計算方式，常見的解決方式就是使用遞歸的思路：

0、1、1、2、3、5, 8、13、21、34 ……；
2 是上兩項的和 ->（1+1）；
3 是上兩項的和 ->（1+2）；
5 是上兩項的和 ->（2+3）。

Python怎么使用LRU緩存策略進(jìn)行緩存

遞歸的計算簡潔并且直觀，但是由于存在大量重復(fù)計算，實際運行效率很低，并且會占用較多的內(nèi)存。但是這里并不是需要關(guān)注的重點，只是來作為演示示例而已：

# 匿名函數(shù)
fib = lambda n: 1 if n <= 1 else fib(n-1) + fib(n-2)

# 將時間復(fù)雜度降低到線性
fib = lambda n, a=1, b=1: a if n == 0 else fib(n-1, b, a+b)

# 保證了匿名函數(shù)的匿名性
fib = lambda n, fib: 1 if n <= 1 else fib(n-1, fib) + fib(n-2, fib)

③ 使用 @lru_cache 緩存輸出結(jié)果

使用 @lru_cache 裝飾器來緩存的話，可以將函數(shù)調(diào)用結(jié)果存儲在內(nèi)存中，以便再次請求時直接返回結(jié)果：

from functools import lru_cache

@lru_cache
def fib(n):
    if n==1 or n==2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

④ 限制 @lru_cache 裝飾器大小

Python 的 @lru_cache 裝飾器提供了一個 maxsize 屬性，該屬性定義了在緩存開始淘汰舊條目之前的最大條目數(shù)，默認(rèn)情況下，maxsize 設(shè)置為 128。

如果將 maxsize 設(shè)置為 None 的話，則緩存將無限期增長，并且不會驅(qū)逐任何條目。

from functools import lru_cache

@lru_cache(maxsize=16)
def fib(n):
    if n==1 or n==2:
        return 1
    else:
        return fib(n-1) + fib(n-2)

# 查看緩存列表
>>> print(steps_to.cache_info())
CacheInfo(hits=52, misses=30, maxsize=16, currsize=16)

⑤ 使用 @lru_cache 實現(xiàn) LRU 緩存

就像在前面實現(xiàn)的緩存解決方案一樣，@lru_cache 在底層使用一個字典，它將函數(shù)的結(jié)果緩存在一個鍵下，該鍵包含對函數(shù)的調(diào)用，包括提供的參數(shù)。這意味著這些參數(shù)必須是可哈希的，才能讓 decorator 工作。

示例：玩樓梯：

想象一下，你想通過一次跳上一個、兩個或三個樓梯來確定到達(dá)樓梯中的一個特定樓梯的所有不同方式，到第四個樓梯有多少條路？所有不同的組合如下所示：

Python怎么使用LRU緩存策略進(jìn)行緩存

可以這樣描述，為了到達(dá)當(dāng)前的樓梯，你可以從下面的一個、兩個或三個樓梯跳下去，將能夠到達(dá)這些點的跳躍組合的數(shù)量相加，便能夠獲得到達(dá)當(dāng)前位置的所有可能方法。

例如到達(dá)第四個樓梯的組合數(shù)量將等于你到達(dá)第三、第二和第一個樓梯的不同方式的總數(shù)。如下所示，有七種不同的方法可以到達(dá)第四層樓梯：

Python怎么使用LRU緩存策略進(jìn)行緩存

注意給定階梯的解是如何建立在較小子問題的答案之上的，在這種情況下，為了確定到達(dá)第四個樓梯的不同路徑，可以將到達(dá)第三個樓梯的四種路徑、到達(dá)第二個樓梯的兩種路徑以及到達(dá)第一個樓梯的一種路徑相加。這種方法稱為遞歸，下面是一個實現(xiàn)這個遞歸的函數(shù)：

def steps_to(stair):
    if stair == 1:
        # You can reach the first stair with only a single step
        # from the floor.
        return 1
    elif stair == 2:
        # You can reach the second stair by jumping from the
        # floor with a single two-stair hop or by jumping a single
        # stair a couple of times.
        return 2
    elif stair == 3:
        # You can reach the third stair using four possible
        # combinations:
        # 1. Jumping all the way from the floor
        # 2. Jumping two stairs, then one
        # 3. Jumping one stair, then two
        # 4. Jumping one stair three times
        return 4
    else:
        # You can reach your current stair from three different places:
        # 1. From three stairs down
        # 2. From two stairs down
        # 2. From one stair down
        #
        # If you add up the number of ways of getting to those
        # those three positions, then you should have your solution.
        return (
            steps_to(stair - 3)
            + steps_to(stair - 2)
            + steps_to(stair - 1)
        )
print(steps_to(4))

將此代碼保存到一個名為 stairs.py 的文件中，并使用以下命令運行它：

$ python stairs.py
7

太棒了，這個代碼適用于 4 個樓梯，但是數(shù)一下要走多少步才能到達(dá)樓梯上更高的地方呢？將第 33 行中的樓梯數(shù)更改為 30，并重新運行腳本：

$ python stairs.py
53798080

可以看到結(jié)果超過 5300 萬個組合，這可真的有點多。

時間代碼：

當(dāng)找到第 30 個樓梯的解決方案時，腳本花了相當(dāng)多的時間來完成。要獲得基線，可以度量代碼運行的時間，要做到這一點，可以使用 Python 的 timeit module，在第 33 行之后添加以下代碼：

setup_code = "from __main__ import steps_to"
36stmt = "steps_to(30)"
37times = repeat(setup=setup_code, stmt=stmt, repeat=3, number=10)
38print(f"Minimum execution time: {min(times)}")

還需要在代碼的頂部導(dǎo)入 timeit module：

from timeit import repeat

以下是對這些新增內(nèi)容的逐行解釋：

第 35 行導(dǎo)入 steps_to() 的名稱，以便 time.com .repeat() 知道如何調(diào)用它；
第 36 行用希望到達(dá)的樓梯數(shù)（在本例中為 30）準(zhǔn)備對函數(shù)的調(diào)用，這是將要執(zhí)行和計時的語句；
第 37 行使用設(shè)置代碼和語句調(diào)用 time.repeat()，這將調(diào)用該函數(shù) 10 次，返回每次執(zhí)行所需的秒數(shù)；
第 38 行標(biāo)識并打印返回的最短時間。現(xiàn)在再次運行腳本：

$ python stairs.py
53798080
Minimum execution time: 40.014977024000004

可以看到的秒數(shù)取決于特定硬件，在我的系統(tǒng)上，腳本花了 40 秒，這對于 30 級樓梯來說是相當(dāng)慢的。

使用記憶來改進(jìn)解決方案：

這種遞歸實現(xiàn)通過將其分解為相互構(gòu)建的更小的步驟來解決這個問題，如下所示是一個樹，其中每個節(jié)點表示對 steps_to() 的特定調(diào)用：

Python怎么使用LRU緩存策略進(jìn)行緩存

注意需要如何使用相同的參數(shù)多次調(diào)用 steps_to()，例如 steps_to(5) 計算兩次，steps_to(4) 計算四次，steps_to(3) 計算七次，steps_to(2) 計算六次，多次調(diào)用同一個函數(shù)會增加不必要的計算周期，結(jié)果總是相同的。

為了解決這個問題，可以使用一種叫做記憶的技術(shù)，這種方法將函數(shù)的結(jié)果存儲在內(nèi)存中，然后在需要時引用它，從而確保函數(shù)不會為相同的輸入運行多次，這個場景聽起來像是使用 Python 的 @lru_cache 裝飾器的絕佳機會。

只要做兩個改變，就可以大大提高算法的運行時間：

從 functools module 導(dǎo)入 @lru_cache 裝飾器；
使用 @lru_cache 裝飾 steps_to()。

下面是兩個更新后的腳本頂部的樣子：

from functools import lru_cache
from timeit import repeat
 
@lru_cache
def steps_to(stair):
	if stair == 1:

運行更新后的腳本產(chǎn)生如下結(jié)果：

$ python stairs.py
53798080
Minimum execution time: 7.999999999987184e-07

緩存函數(shù)的結(jié)果會將運行時從 40 秒降低到 0.0008 毫秒，這是一個了不起的進(jìn)步。@lru_cache 裝飾器存儲了每個不同輸入的 steps_to() 的結(jié)果，每次代碼調(diào)用帶有相同參數(shù)的函數(shù)時，它都直接從內(nèi)存中返回正確的結(jié)果，而不是重新計算一遍答案，這解釋了使用 @lru_cache 時性能的巨大提升。

⑥ 解包 @lru_cache 的功能

有了@lru_cache 裝飾器，就可以將每個調(diào)用和應(yīng)答存儲在內(nèi)存中，以便以后再次請求時進(jìn)行訪問，但是在內(nèi)存耗盡之前，可以節(jié)省多少次調(diào)用呢？

Python 的 @lru_cache 裝飾器提供了一個 maxsize 屬性，它定義了在緩存開始清除舊條目之前的最大條目數(shù)，缺省情況下，maxsize 設(shè)置為 128，如果將 maxsize 設(shè)置為 None，那么緩存將無限增長，并且不會驅(qū)逐任何條目。如果在內(nèi)存中存儲大量不同的調(diào)用，這可能會成為一個問題。

如下是 @lru_cache 使用 maxsize 屬性：

from functools import lru_cache
from timeit import repeat

@lru_cache(maxsize=16)
def steps_to(stair):
    if stair == 1:

在本例中，將緩存限制為最多 16 個條目，當(dāng)一個新調(diào)用傳入時，decorator 的實現(xiàn)將會從現(xiàn)有的 16 個條目中刪除最近最少使用的條目，為新條目騰出位置。

要查看添加到代碼中的新內(nèi)容會發(fā)生什么，可以使用 @lru_cache 裝飾器提供的 cache_info() 來檢查命中和未命中的次數(shù)以及當(dāng)前緩存的大小。為了清晰起見，刪除乘以函數(shù)運行時的代碼，以下是修改后的最終腳本：

from functools import lru_cache
from timeit import repeat

@lru_cache(maxsize=16)
def steps_to(stair):
    if stair == 1:
        # You can reach the first stair with only a single step
        # from the floor.
        return 1
    elif stair == 2:
        # You can reach the second stair by jumping from the
        # floor with a single two-stair hop or by jumping a single
        # stair a couple of times.
        return 2
    elif stair == 3:
        # You can reach the third stair using four possible
        # combinations:
        # 1. Jumping all the way from the floor
        # 2. Jumping two stairs, then one
        # 3. Jumping one stair, then two
        # 4. Jumping one stair three times
        return 4
    else:
        # You can reach your current stair from three different places:
        # 1. From three stairs down
        # 2. From two stairs down
        # 2. From one stair down
        #
        # If you add up the number of ways of getting to those
        # those three positions, then you should have your solution.
        return (
            steps_to(stair - 3)
            + steps_to(stair - 2)
            + steps_to(stair - 1)
        )
print(steps_to(30))
print(steps_to.cache_info())

如果再次調(diào)用腳本，可以看到如下結(jié)果：

$ python stairs.py
53798080
CacheInfo(hits=52, misses=30, maxsize=16, currsize=16)

可以使用 cache_info() 返回的信息來了解緩存是如何執(zhí)行的，并對其進(jìn)行微調(diào)，以找到速度和存儲之間的適當(dāng)平衡。下面是 cache_info() 提供的屬性的詳細(xì)說明：

hits=52 是 @lru_cache 直接從內(nèi)存中返回的調(diào)用數(shù)，因為它們存在于緩存中；
misses =30 是被計算的不是來自內(nèi)存的調(diào)用數(shù)，因為試圖找到到達(dá)第 30 級樓梯的臺階數(shù)，所以每次調(diào)用都在第一次調(diào)用時錯過了緩存是有道理的；
maxsize =16 是用裝飾器的 maxsize 屬性定義的緩存的大?。?/p>
currsize =16 是當(dāng)前緩存的大小，在本例中它表明緩存已滿。

如果需要從緩存中刪除所有條目，那么可以使用 @lru_cache 提供的 cache_clear()。

四、添加緩存過期

假設(shè)想要開發(fā)一個腳本來監(jiān)視 Real Python 并在任何包含單詞 Python 的文章中打印字符數(shù)。真正的 Python 提供了一個 Atom feed，因此可以使用 feedparser 庫來解析提要，并使用請求庫來加載本文的內(nèi)容。

如下是監(jiān)控腳本的實現(xiàn)：

import feedparser
import requests
import ssl
import time

if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def monitor(url):
    maxlen = 45
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )
        time.sleep(5)
monitor("https://realpython.com/atom.xml")

將此腳本保存到一個名為 monitor.py 的文件中，安裝 feedparser 和請求庫，然后運行該腳本，它將持續(xù)運行，直到在終端窗口中按 Ctrl+C 停止它：

$ pip install feedparser requests
$ python monitor.py

Checking feed...
Fetching article from server...
The Real Python Podcast – Episode #28: Using ... 29520
Fetching article from server...
Python Community Interview With David Amos 54256
Fetching article from server...
Working With Linked Lists in Python 37099
Fetching article from server...
Python Practice Problems: Get Ready for Your ... 164888
Fetching article from server...
The Real Python Podcast – Episode #27: Prepar... 30784

Checking feed...
Fetching article from server...
The Real Python Podcast – Episode #28: Using ... 29520
Fetching article from server...
Python Community Interview With David Amos 54256
Fetching article from server...
Working With Linked Lists in Python 37099
Fetching article from server...
Python Practice Problems: Get Ready for Your ... 164888
Fetching article from server...
The Real Python Podcast – Episode #27: Prepar... 30784

代碼解釋：

第 6 行和第 7 行：當(dāng) feedparser 試圖訪問通過 HTTPS 提供的內(nèi)容時，這是一個解決方案；
第 16 行：monitor() 將無限循環(huán)；
第 18 行：使用 feedparser，代碼從真正的 Python 加載并解析提要；
第 20 行：循環(huán)遍歷列表中的前 5 個條目；
第 21 到 31 行：如果單詞 python 是標(biāo)題的一部分，那么代碼將連同文章的長度一起打印它；
第 33 行：代碼在繼續(xù)之前休眠了 5 秒鐘；
第 35 行：這一行通過將 Real Python 提要的 URL 傳遞給 monitor() 來啟動監(jiān)視過程。

每當(dāng)腳本加載一篇文章時，“Fetching article from server…”的消息就會打印到控制臺，如果讓腳本運行足夠長的時間，那么將看到這條消息是如何反復(fù)顯示的，即使在加載相同的鏈接時也是如此。

這是一個很好的機會來緩存文章的內(nèi)容，并避免每五秒鐘訪問一次網(wǎng)絡(luò)，可以使用 @lru_cache 裝飾器，但是如果文章的內(nèi)容被更新，會發(fā)生什么呢？第一次訪問文章時，裝飾器將存儲文章的內(nèi)容，并在以后每次返回相同的數(shù)據(jù)；如果更新了帖子，那么監(jiān)視器腳本將永遠(yuǎn)無法實現(xiàn)它，因為它將提取存儲在緩存中的舊副本。要解決這個問題，可以將緩存條目設(shè)置為過期。

from functools import lru_cache, wraps
from datetime import datetime, timedelta

def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime
            return func(*args, **kwargs)
        return wrapped_func
    return wrapper_cache

@timed_lru_cache(10)
def get_article_from_server(url):
    ...

代碼解釋：

第 4 行：@timed_lru_cache 裝飾器將支持緩存中條目的生命周期(以秒為單位)和緩存的最大大??；

第 6 行：代碼用 lru_cache 裝飾器包裝了裝飾函數(shù)，這允許使用 lru_cache 已經(jīng)提供的緩存功能；

第 7 行和第 8 行：這兩行用兩個表示緩存生命周期和它將過期的實際日期的屬性來修飾函數(shù)；

第 12 到 14 行：在訪問緩存中的條目之前，裝飾器檢查當(dāng)前日期是否超過了過期日期，如果是這種情況，那么它將清除緩存并重新計算生存期和過期日期。

請注意，當(dāng)條目過期時，此裝飾器如何清除與該函數(shù)關(guān)聯(lián)的整個緩存，生存期適用于整個緩存，而不適用于單個項目，此策略的更復(fù)雜實現(xiàn)將根據(jù)條目的單個生存期將其逐出。

在程序中，如果想要實現(xiàn)不同緩存策略，可以查看 cachetools 這個庫，該庫提供了幾個集合和修飾符，涵蓋了一些最流行的緩存策略。

使用新裝飾器緩存文章：

現(xiàn)在可以將新的 @timed_lru_cache 裝飾器與監(jiān)視器腳本一起使用，以防止每次訪問時獲取文章的內(nèi)容。為了簡單起見，把代碼放在一個腳本中，可以得到以下結(jié)果：

import feedparser
import requests
import ssl
import time

from functools import lru_cache, wraps
from datetime import datetime, timedelta

if hasattr(ssl, "_create_unverified_context"):
    ssl._create_default_https_context = ssl._create_unverified_context

def timed_lru_cache(seconds: int, maxsize: int = 128):
    def wrapper_cache(func):
        func = lru_cache(maxsize=maxsize)(func)
        func.lifetime = timedelta(seconds=seconds)
        func.expiration = datetime.utcnow() + func.lifetime

        @wraps(func)
        def wrapped_func(*args, **kwargs):
            if datetime.utcnow() >= func.expiration:
                func.cache_clear()
                func.expiration = datetime.utcnow() + func.lifetime

            return func(*args, **kwargs)

        return wrapped_func

    return wrapper_cache

@timed_lru_cache(10)
def get_article_from_server(url):
    print("Fetching article from server...")
    response = requests.get(url)
    return response.text

def monitor(url):
    maxlen = 45
    while True:
        print("\nChecking feed...")
        feed = feedparser.parse(url)

        for entry in feed.entries[:5]:
            if "python" in entry.title.lower():
                truncated_title = (
                    entry.title[:maxlen] + "..."
                    if len(entry.title) > maxlen
                    else entry.title
                )
                print(
                    "Match found:",
                    truncated_title,
                    len(get_article_from_server(entry.link)),
                )
        time.sleep(5)
monitor("https://realpython.com/atom.xml")

請注意第 30 行如何使用 @timed_lru_cache 裝飾 get_article_from_server() 并指定 10 秒的有效性。在獲取文章后的 10 秒內(nèi)，任何試圖從服務(wù)器訪問同一篇文章的嘗試都將從緩存中返回內(nèi)容，而不會到達(dá)網(wǎng)絡(luò)。

運行腳本并查看結(jié)果：

$ python monitor.py

Checking feed...
Fetching article from server...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Fetching article from server...
Match found: Python Community Interview With David Amos 54254
Fetching article from server...
Match found: Working With Linked Lists in Python 37100
Fetching article from server...
Match found: Python Practice Problems: Get Ready for Your ... 164887
Fetching article from server...
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Match found: Python Community Interview With David Amos 54254
Match found: Working With Linked Lists in Python 37100
Match found: Python Practice Problems: Get Ready for Your ... 164887
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Match found: Python Community Interview With David Amos 54254
Match found: Working With Linked Lists in Python 37100
Match found: Python Practice Problems: Get Ready for Your ... 164887
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

Checking feed...
Fetching article from server...
Match found: The Real Python Podcast – Episode #28: Using ... 29521
Fetching article from server...
Match found: Python Community Interview With David Amos 54254
Fetching article from server...
Match found: Working With Linked Lists in Python 37099
Fetching article from server...
Match found: Python Practice Problems: Get Ready for Your ... 164888
Fetching article from server...
Match found: The Real Python Podcast – Episode #27: Prepar... 30783

請注意，代碼在第一次訪問匹配的文章時是如何打印“Fetching article from server…”這條消息的。之后，根據(jù)網(wǎng)絡(luò)速度和計算能力，腳本將從緩存中檢索文章一兩次，然后再次訪問服務(wù)器。

該腳本試圖每 5 秒訪問這些文章，緩存每 10 秒過期一次。對于實際的應(yīng)用程序來說，這些時間可能太短，因此可以通過調(diào)整這些配置來獲得顯著的改進(jìn)。

五、@lru_cache 裝飾器的官方實現(xiàn)

簡單理解，其實就是一個裝飾器：

def lru_cache(maxsize=128, typed=False):
    if isinstance(maxsize, int):
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError('Expected first argument to be an integer, a callable, or None')

    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    return decorating_function

_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    sentinel = object()          # unique object used to signal cache misses
    make_key = _make_key         # build a key from the function arguments
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # names for the link fields

    cache = {}  # 存儲也使用的字典
    hits = misses = 0
    full = False
    cache_get = cache.get
    cache_len = cache.__len__
    lock = RLock()                      # 因為雙向鏈表的更新不是線程安全的所以需要加鎖
    root = []                           # 雙向鏈表
    root[:] = [root, root, None, None]  # 初始化雙向鏈表

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # No caching -- just a statistics update
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result

    elif maxsize is None:
        def wrapper(*args, **kwds):
            # Simple caching without ordering or size limit
            nonlocal hits, misses
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result

    else:
        def wrapper(*args, **kwds):
            # Size limited caching that tracks accesses by recency
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # Move the link to the front of the circular queue
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    pass
                elif full:
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    oldresult = root[RESULT]
                    root[KEY] = root[RESULT] = None
                    del cache[oldkey]
                    cache[key] = oldroot
                else:
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """Report cache statistics"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """Clear the cache and cache statistics"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

讀到這里，這篇“Python怎么使用LRU緩存策略進(jìn)行緩存”文章已經(jīng)介紹完畢，想要掌握這篇文章的知識點還需要大家自己動手實踐使用過才能領(lǐng)會，如果想了解更多相關(guān)內(nèi)容的文章，歡迎關(guān)注億速云行業(yè)資訊頻道。

向AI問一下細(xì)節(jié)

Python怎么使用LRU緩存策略進(jìn)行緩存

一、Python 緩存

① 緩存作用

② 使用 Python 字典實現(xiàn)緩存

二、深入理解 LRU 算法

① 查看 LRU 緩存的特點

② 查看 LRU 緩存的結(jié)構(gòu)

三、使用 lru_cache 裝飾器

① @lru_cache 裝飾器的實現(xiàn)原理

② 斐波拉契數(shù)列

③ 使用 @lru_cache 緩存輸出結(jié)果

④ 限制 @lru_cache 裝飾器大小

⑤ 使用 @lru_cache 實現(xiàn) LRU 緩存

⑥ 解包 @lru_cache 的功能

四、添加緩存過期

五、@lru_cache 裝飾器的官方實現(xiàn)

猜你喜歡

最新資訊

相關(guān)推薦

相關(guān)標(biāo)簽

二、深入理解 LRU 算法

五、@lru_cache 裝飾器的官方實現(xiàn)