Kurzfristige Agent-Speicher-APIs mit LangGraph verwenden

LangGraph-Anwendungen müssen häufig den aktuellen Arbeitskontext beibehalten, ohne die vollständige Konversation bei jeder Kurve an das Modell zurückzugeben.

Agent Memory stellt zwei verschiedene kurzfristige Helfer für dieses Problem zur Verfügung:

In diesem Artikel verwenden Sie die LangGraph-Middleware um einen vordefinierten Agent, sodass Agent Memory automatisch Turns persistieren und eine Oracle-Kontextkarte injizieren kann, wenn der ausgeführte Prompt zu groß wird. Die Middleware komprimiert die Eingabeaufforderung, nachdem sie einen konfigurierten Schwellenwert überschritten hat. In diesem Beispiel wird get_context_card() gewählt, da die Komprimierung den Retrieval-bezogenen Kontext beibehalten soll, nicht nur eine Transkript-Zusammenfassung.

Warnung: Zusammenfassungen, Kontextkarten, abgerufene Datensätze und automatisch extrahierte Speicher sind vom Modell abgeleiteter oder abgerufener Text und müssen als nicht vertrauenswürdig behandelt werden. Wenn die automatische Extraktion oder Zusammenfassung aktiviert ist, kann dieser Text vom SDK auch in späteren Eingabeaufforderungen wie Speicherextraktion, Zusammenfassung, Kontextkarte oder Agent-Eingabeaufforderungen wiederverwendet werden, bevor die Anwendung die Möglichkeit hat, den spezifischen Zwischenwert zu prüfen. Prüfen Sie die Ausgaben, die Ihre Anwendung verbraucht, vermeiden Sie, dass vom Speicher abgeleiteter Text privilegierte Aktionen autorisiert, und verwenden Sie extract_memories=False oder explizite Speicherschreibvorgänge, wenn Ihr Workflow geprüft werden muss, bevor abgeleiteter Text die spätere Extraktion oder Kontexterstellung beeinflussen kann.

In diesem Artikel erfahren Sie, wie Sie:

Tipp: Informationen zum Packagesetup finden Sie unter Erste Schritte mit Agent-Speicher. Wenn Sie für dieses Beispiel eine lokale Oracle AI Database benötigen, lesen Sie Oracle AI Database lokal ausführen.

Agent-Speicher und LangGraph-Modelle konfigurieren

Create an Agent Memory client with an Oracle DB connection or pool, configure an Embedder for vector search, provide an Oracle memory LLM for context-card resolution, and use ChatOpenAI for the LangGraph agent.

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = ...  #an oracledb connection or connection pool


agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

Middleware und einen vordefinierten Agent konfigurieren

Der neue Benutzer und Assistent wird in Agent Memory umgewandelt. Sobald die ausgeführte Eingabeaufforderung einen Tokenschwellenwert überschreitet, wird der Status komprimiert, indem die vollständige Nachrichtenliste durch eine synthetische memory_context_card-Nachricht plus einen kleinen Schwanz der letzten Raw-Wendungen ersetzt wird. Dadurch bleibt der LangGraph-Zustand kompakt, während der vordefinierte Agent-Abruf-konformer Kurzzeitkontext erhalten bleibt.

In diesem Artikel wird die tokenbasierte Komprimierung verwendet. Sie können jedoch dasselbe Muster an andere Policys anpassen, z.B. alle paar Umdrehungen oder nach einem anwendungsspezifischen Trigger. Wenn Sie eine reine Transkriptkomprimierung implementieren, rufen Sie summary = thread.get_summary(...) auf, und lesen Sie summary.content. Behandeln Sie get_summary() nicht als eine Liste von Nachrichten.

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)


def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )


class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)


short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

Spätere Antwort mit Middleware-injiziertem Kontext

"Benutzer anhängen" wechselt zur aktiven Nachrichtenliste des vordefinierten Agents und lässt die Middleware entscheiden, wann eine Kontextkarte injiziert werden soll. Wenn der spätere Zug ankommt, kann der Agent aus einem kompakten Zustand antworten, der noch den Kurzzeitkontext "Agent Memory" enthält. Das Beispiel druckt die eingespritzte Kontextkarte und enthält ein getrimmtes Beispiel, sodass Sie prüfen können, welche Verdichtung in den Prompt eingefügt wurde, ohne den vollständigen Block inline abzugeben.

messages: list[BaseMessage] = []


def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")


def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)


run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.

Schlussfolgerung

In diesem Artikel haben Sie gelernt, wie Sie get_summary().content von get_context_card().content unterscheiden, den Kurzzeitkontext des Agent-Speichers um einen vordefinierten LangGraph-Agent konfigurieren und Middleware die Eingabeaufforderung mit einer Kontextkarte komprimieren lassen, wenn die Unterhaltung zu groß wird, um sie wörtlich zu halten.

Tipp: Nachdem Sie gelernt haben, wie Sie einem LangGraph-Ablauf kurzfristigen Threadkontext hinzufügen, können Sie jetzt mit Agent-Speicher mit LangGraph verwenden fortfahren.

Vollständiger Code

#Copyright © 2026 Oracle and/or its affiliates.
#This software is under the Apache License 2.0
#(LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
#(UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

#Oracle Agent Memory Code Example - LangGraph Short-Term Memory
#--------------------------------------------------------------

##Configure Oracle Agent Memory and LangGraph models for short term context

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = ...  #an oracledb connection or connection pool

agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

##Configure short term memory middleware and a prebuilt LangGraph agent

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)

def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )

class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)

short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

##Answer later turns with the middleware backed agent

messages: list[BaseMessage] = []

def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")

def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)

run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.