Time-weighted vector store retriever (time_weighted_vectorstore)

This retriever uses a combination of semantic similarity and time decay.

The scoring algorithm is:

semantic_similarity + (1.0 - decay_rate) ^ hours_passed

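Notably, hours_passed refers to the hours elapsed since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects stay "fresh".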
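
As a rough illustration of the formula above, the score can be computed by hand with the standard library alone. This is a minimal sketch, not the retriever's actual implementation: the function name `combined_score`, its parameters, and the similarity value 0.8 are assumptions made up for the example.

```python
from datetime import datetime, timedelta


def combined_score(
    semantic_similarity: float,
    decay_rate: float,
    last_accessed_at: datetime,
    now: datetime,
) -> float:
    """Hypothetical helper: semantic_similarity + (1.0 - decay_rate) ** hours_passed."""
    # hours_passed counts from the last access, not from creation.
    hours_passed = (now - last_accessed_at).total_seconds() / 3600
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed


now = datetime(2023, 5, 13, 21, 0)

# Accessed just now: the recency term is (1 - 0.5) ** 0 = 1.
fresh = combined_score(0.8, 0.5, last_accessed_at=now, now=now)

# Accessed two hours ago: the recency term is (1 - 0.5) ** 2 = 0.25.
stale = combined_score(0.8, 0.5, last_accessed_at=now - timedelta(hours=2), now=now)

print(fresh, stale)
```

With the same semantic similarity, the object that was accessed more recently scores higher, which is why frequently accessed objects stay near the top.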

import faiss

from datetime import datetime, timedelta
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.schema import Document
from langchain.vectorstores import FAISS

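Low Decay Rate

A low decay rate (here, to be extreme, we set it close to 0) means memories will be "remembered" for longer. A decay rate of 0 means memories are never forgotten, making this retriever equivalent to a vector lookup.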

# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
    ['d7f85756-2371-4bdf-9140-052780a0f9b3']
# "Hello World" is returned first because it is most salient, and the decay rate is close to 0, meaning it's still recent enough
retriever.get_relevant_documents("hello world")
    [Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 678341), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]

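High Decay Rate

With a high decay rate (e.g., several 9's), the recency score quickly drops to 0! If you set it all the way to 1, recency is 0 for all objects, once again making this equivalent to a vector lookup.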

# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.999, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
    ['40011466-5bbe-4101-bfd1-e22e7f505de2']
# "Hello Foo" is returned first because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
    [Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 494798), 'created_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 178722), 'buffer_idx': 1})]
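
The difference between the two decay rates can be reproduced without an embedding model. The sketch below applies the scoring formula to two stand-in documents; the similarity values (1.0 and 0.5) are invented for illustration and are not what OpenAIEmbeddings or FAISS would return, and `recency` and `top_document` are hypothetical helper names.

```python
def recency(decay_rate: float, hours_passed: float) -> float:
    """Recency term of the score: (1.0 - decay_rate) ** hours_passed."""
    return (1.0 - decay_rate) ** hours_passed


# Stand-ins for the two documents above. The similarity values are assumptions.
docs = [
    {"text": "hello world", "similarity": 1.0, "hours_passed": 24.0},  # last accessed yesterday
    {"text": "hello foo", "similarity": 0.5, "hours_passed": 0.0},     # just added
]


def top_document(decay_rate: float) -> str:
    """Return the text of the document with the highest combined score."""
    best = max(
        docs,
        key=lambda d: d["similarity"] + recency(decay_rate, d["hours_passed"]),
    )
    return best["text"]


low = top_document(1e-25)   # recency stays at about 1 for both documents
high = top_document(0.999)  # recency of the day-old document collapses toward 0

print(low)
print(high)
```

With the near-zero decay rate both recency terms are effectively 1, so semantic similarity decides the ranking. With a decay rate of 0.999 the day-old document loses essentially all of its recency score, and the freshly added document wins despite its lower similarity.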

Virtual Time

Using some utilities in LangChain, you can mock out the time component.

from langchain.utils import mock_now
import datetime
# Notice the last access time is that date time
with mock_now(datetime.datetime(2011, 2, 3, 10, 11)):
    print(retriever.get_relevant_documents("hello world"))
    [Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2011, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]