Usa API a breve termine memoria agente con LangGraph

Le applicazioni LangGraph spesso hanno bisogno di preservare il contesto di lavoro recente senza passare la conversazione completa al modello ad ogni turno.

Agent Memory espone due diversi aiutanti a breve termine per questo problema:

get_summary() restituisce un oggetto OracleSummary il cui content comprime la trascrizione del thread. Preferiscilo quando la compattazione richiede solo la compressione della trascrizione.
get_context_card() restituisce un oggetto OracleContextCard il cui content è un blocco di contesto pronto per il prompt con riepilogo dei thread, argomenti di recupero, record permanenti rilevanti e messaggi raw recenti. Preferiscilo quando la compattazione dovrebbe mantenere il contesto di recupero-consapevole per il turno corrente.

In questo articolo, verrà utilizzato il middleware LangGraph intorno a un agente predefinito in modo che la memoria dell'agente possa rendere persistenti i turni automaticamente e iniettare una scheda di contesto Oracle quando il prompt in esecuzione diventa troppo grande. Il middleware compatta il prompt dopo aver superato una soglia configurata. Questo esempio sceglie get_context_card() perché la compattazione deve preservare il contesto che riconosce il recupero, non solo un riepilogo della trascrizione.

Avvertenza: i sintetici, le schede contesto, i record recuperati e le memorie estratte automaticamente sono testo derivato dal modello o recuperato e devono essere considerati non attendibili. Quando l'estrazione o il riepilogo automatico è abilitato, tale testo può essere riutilizzato anche dall'SDK in prompt successivi, quali l'estrazione della memoria, il riepilogo, la scheda di contesto o i prompt agente, prima che l'applicazione abbia l'opportunità di rivedere il valore intermedio specifico. Esaminare gli output consumati dall'applicazione, evitare di lasciare che il testo derivato dalla memoria autorizzi azioni con privilegi e utilizzare extract_memories=False o scritture di memoria esplicite quando il flusso di lavoro richiede la revisione prima che il testo derivato possa influenzare l'estrazione successiva o la costruzione del contesto.

In questo articolo imparerai come:

configurare la memoria agente con un modello Embedder, un LLM di memoria Oracle e un modello ChatOpenAI LangGraph
avvolgere un agente LangGraph predefinito con middleware che persiste nuovi turni e inietta l'output get_context_card().content quando la pressione del token aumenta e viene avviata la compattazione
risposta a turni successivi con contesto a breve termine dal thread di memoria dell'agente invece di reinviare la trascrizione completa

Suggerimento: per l'impostazione dei package, vedere Introduzione alla memoria agente. Se hai bisogno di un Oracle AI Database locale per questo esempio, vedere Esegui Oracle AI Database localmente.

Configura memoria agente e modelli LangGraph

Creare un client di memoria agente con una connessione o un pool Oracle DB, configurare un valore Embedder per la ricerca vettoriale, fornire un LLM di memoria Oracle per la risoluzione delle schede di contesto e utilizzare ChatOpenAI per l'agente LangGraph.

from typing import Any

import oracledb

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = oracledb.SessionPool(
    user="YOUR DB USER",
    password="YOUR DB PASSWORD",
    dsn="YOUR DB CONNECT STRING",
)

agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

Configura middleware e un agente predefinito

Il middleware rimane nuovo utente e l'assistente si trasforma in memoria dell'agente. Una volta che il prompt in esecuzione supera la soglia di un token, compatta lo stato sostituendo l'elenco completo dei messaggi con un messaggio memory_context_card sintetico più una piccola coda delle curve raw più recenti. Ciò mantiene lo stato LangGraph compatto pur offrendo all'agente predefinito un contesto a breve termine consapevole del recupero.

Questo articolo utilizza la compattazione basata su token, ma è possibile adattare lo stesso pattern ad altri criteri, ad esempio compattando ogni pochi giri o dopo un trigger specifico dell'applicazione. Se si implementa la compattazione solo trascrizione, chiamare summary = thread.get_summary(...) e leggere summary.content; non considerare get_summary() come un elenco di messaggi.

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)


def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )


class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)


short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

Risposta in un secondo momento con contesto iniettato da Middleware

L'aggiunta dell'utente passa all'elenco dei messaggi in esecuzione dell'agente predefinito e consente al middleware di decidere quando inserire una scheda contesto. Quando arriva il turno successivo, l'agente può rispondere da uno stato compatto che contiene ancora il contesto a breve termine della memoria dell'agente. L'esempio stampa la scheda di contesto iniettata e include un campione tagliato in modo da poter ispezionare la compattazione inserita nel prompt senza scaricare l'intero blocco in linea.

messages: list[BaseMessage] = []


def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")


def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)


run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.

Conclusione

In questo articolo, è stato descritto come distinguere get_summary().content da get_context_card().content, configurare il contesto a breve termine della memoria agente attorno a un agente LangGraph predefinito e consentire al middleware di compattare il prompt con una scheda contesto quando la conversazione diventa troppo grande per mantenerla parola per parola.

Suggerimento: dopo aver appreso come aggiungere un contesto thread a breve termine a un flusso LangGraph, è ora possibile passare a Usa memoria agente con LangGraph.

Codice completo

#Copyright © 2026 Oracle and/or its affiliates.
#This software is under the Apache License 2.0
#(LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
#(UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

#Oracle Agent Memory Code Example - LangGraph Short-Term Memory
#--------------------------------------------------------------

##Configure Oracle Agent Memory and LangGraph models for short term context

from typing import Any

import oracledb

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = oracledb.SessionPool(
    user="YOUR DB USER",
    password="YOUR DB PASSWORD",
    dsn="YOUR DB CONNECT STRING",
)

agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

##Configure short term memory middleware and a prebuilt LangGraph agent

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)

def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )

class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)

short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

##Answer later turns with the middleware backed agent

messages: list[BaseMessage] = []

def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")

def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)

run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.