Caching

LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you are often requesting the same completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM provider.

import langchain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()
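
The cache is configured globally through langchain.llm_cache, but individual models can opt out of it. A minimal sketch, assuming your version of LangChain exposes the cache field on chat models (it is defined on the base language model class in recent releases):

# Assumption: the `cache` field is available on ChatOpenAI in your LangChain version.
# A model constructed with cache=False bypasses the global langchain.llm_cache.
uncached_llm = ChatOpenAI(cache=False)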

In Memory Cache

from langchain.cache import InMemoryCache

langchain.llm_cache = InMemoryCache()

%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
    CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
    Wall time: 4.83 s

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
    CPU times: user 238 µs, sys: 143 µs, total: 381 µs
    Wall time: 1.76 ms

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
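
The in-memory cache lives only for the lifetime of the Python process, and it can be emptied explicitly. A minimal sketch, assuming the clear() method that the cache classes implement:

# Drop everything stored so far; the next identical prompt will call the API again.
langchain.llm_cache.clear()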

SQLite Cache

rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

langchain.llm_cache = SQLiteCache(database_path=".langchain.db")
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
    CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
    Wall time: 825 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
    CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
    Wall time: 2.67 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
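
Because the SQLite cache is written to .langchain.db on disk, cached completions also survive restarts of the Python process. LangChain ships other cache backends as well; the sketch below assumes a locally running Redis server and the redis client package, and uses RedisCache from langchain.cache:

import langchain
from redis import Redis
from langchain.cache import RedisCache

# Point the global LLM cache at Redis so multiple processes can share cached completions.
langchain.llm_cache = RedisCache(redis_=Redis())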