时间加权向量存储检索器
此检索器结合了语义相似度和时间衰减进行检索。
评分算法为:
语义相似度 + (1.0 - 衰减率) ^ 经过的小时数
值得注意的是,经过的小时数
是指自从检索器中的对象上次访问以来经过的小时数,而不是自从它被创建以来的小时数。这意味着经常访问的对象保持“新鲜”。
import faiss
from datetime import datetime, timedelta
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.schema import Document
from langchain.vectorstores import FAISS
低衰减率 (Low Decay Rate)
低 低衰减率 (Low Decay Rate)
(在这里,为了极端起见,我们将它设置为接近 0)意味着记忆将会更长时间地 "记住"。衰减率
为 0 意味着记忆永远不会被遗忘,使得这个检索器等效于向量查找。
Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.0000000000000000000000001, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
['d7f85756-2371-4bdf-9140-052780a0f9b3']
"Hello World" is returned first because it is most salient, and the decay rate is close to 0., meaning it's still recent enough
retriever.get_relevant_documents("hello world")
[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 678341), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]
高衰减率 (High Decay Rate)
使用高 高衰减率 (High Decay Rate)
(例如,多个 9),最近分数
迅速降为 0!如果将其全部设置为 1,对于所有对象,最近性
都是 0,再次使得这等效于向量查找。
Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=.999, k=1)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents([Document(page_content="hello world", metadata={"last_accessed_at": yesterday})])
retriever.add_documents([Document(page_content="hello foo")])
['40011466-5bbe-4101-bfd1-e22e7f505de2']
"Hello Foo" is returned first because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 494798), 'created_at': datetime.datetime(2023, 4, 16, 22, 9, 2, 178722), 'buffer_idx': 1})]
虚拟时间 (Virtual Time)
使用 LangChain 中的一些实用工具,您可以模拟出时间组件
from langchain.utils import mock_now
import datetime
Notice the last access time is that date time
with mock_now(datetime.datetime(2011, 2, 3, 10, 11)):
print(retriever.get_relevant_documents("hello world"))
[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2011, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 5, 13, 21, 0, 27, 279596), 'buffer_idx': 0})]