Utiliser des API à court terme de mémoire d'agent avec LangGraph

Les applications LangGraph doivent souvent préserver le contexte de travail récent sans transmettre la conversation complète au modèle à chaque tour.

La mémoire de l'agent présente deux aides à court terme différentes pour ce problème :

Dans cet article, vous allez utiliser le middleware LangGraph autour d'un agent prédéfini afin que la mémoire de l'agent puisse persister automatiquement et injecter une carte de contexte Oracle lorsque l'invite d'exécution devient trop volumineuse. Le middleware compacte l'invite après avoir dépassé un seuil configuré. Cet exemple choisit get_context_card() car le compactage doit conserver le contexte de récupération, et pas seulement un récapitulatif de transcription.

Avertissement : Les résumés, les cartes de contexte, les enregistrements extraits et les mémoires extraites automatiquement sont du texte dérivé du modèle ou extrait et doivent être traités comme non sécurisés. Lorsque l'extraction automatique ou l'agrégation est activée, ce texte peut également être réutilisé par le kit SDK dans des invites ultérieures, telles que l'extraction de mémoire, l'agrégation, la carte contextuelle ou les invites d'agent, avant que l'application ne puisse vérifier la valeur intermédiaire spécifique. Vérifiez les sorties consommées par votre application, évitez de laisser du texte dérivé de la mémoire autoriser des actions privilégiées et utilisez extract_memories=False ou des écritures de mémoire explicites lorsque votre workflow nécessite une révision avant que le texte dérivé puisse influencer l'extraction ultérieure ou la construction du contexte.

Dans cet article, vous apprendrez à :

A savoir : Pour la configuration des packages, reportez-vous à Introduction à la mémoire de l'agent. Si vous avez besoin d'une instance Oracle AI Database locale pour cet exemple, reportez-vous à Exécution locale d'Oracle AI Database.

Configurer la mémoire de l'agent et les modèles LangGraph

Créez un client de mémoire d'agent avec une connexion ou un pool Oracle DB, configurez une valeur Embedder pour la recherche vectorielle, fournissez un LLM de mémoire Oracle pour la résolution de carte contextuelle et utilisez ChatOpenAI pour l'agent LangGraph.

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = ...  #an oracledb connection or connection pool


agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

Configurer le middleware et un agent prédéfini

Le middleware conserve les nouveaux utilisateurs et l'assistant se transforme en mémoire d'agent. Une fois que l'invite d'exécution dépasse un seuil de jeton, elle compacte l'état en remplaçant la liste complète des messages par un message memory_context_card synthétique plus une petite queue des derniers virages bruts. Cela permet de garder l'état LangGraph compact tout en fournissant le contexte à court terme prédéfini de récupération d'agent.

Cet article utilise le compactage par jeton, mais vous pouvez adapter le même modèle à d'autres stratégies, telles que le compactage de quelques tours ou après un déclencheur propre à l'application. Si vous implémentez le compactage par transcription uniquement, appelez summary = thread.get_summary(...) et lisez summary.content. Ne traitez pas get_summary() comme une liste de messages.

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)


def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )


class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)


short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

Réponse ultérieure avec contexte d'injection de middleware

L'utilisateur d'ajout se tourne vers la liste des messages en cours d'exécution de l'agent prédéfini et laisse le middleware décider quand injecter une carte de contexte. Lorsque le tour suivant arrive, l'agent peut répondre à partir d'un état compact qui contient toujours le contexte à court terme de la mémoire de l'agent. Cet exemple imprime la carte de contexte injectée et inclut un échantillon tronqué afin que vous puissiez inspecter le compactage inséré dans l'invite sans vider le bloc complet en ligne.

messages: list[BaseMessage] = []


def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")


def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)


run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.

Conclusion

Dans cet article, vous avez appris à distinguer get_summary().content de get_context_card().content, à configurer le contexte à court terme de la mémoire de l'agent autour d'un agent LangGraph prédéfini et à laisser le middleware compacter l'invite avec une carte de contexte lorsque la conversation devient trop volumineuse pour rester mot pour mot.

A savoir : Après avoir appris à ajouter un contexte de thread à court terme à un flux LangGraph, vous pouvez maintenant passer à Utiliser la mémoire d'agent avec LangGraph.

Code complet

#Copyright © 2026 Oracle and/or its affiliates.
#This software is under the Apache License 2.0
#(LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
#(UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

#Oracle Agent Memory Code Example - LangGraph Short-Term Memory
#--------------------------------------------------------------

##Configure Oracle Agent Memory and LangGraph models for short term context

from typing import Any

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, RemoveMessage
from langchain_core.messages.utils import count_tokens_approximately
from langchain_openai import ChatOpenAI
from langgraph.graph.message import REMOVE_ALL_MESSAGES
from langgraph.runtime import Runtime

from oracleagentmemory.core.embedders.embedder import Embedder
from oracleagentmemory.core.llms.llm import Llm
from oracleagentmemory.core.oracleagentmemory import OracleAgentMemory

embedder = Embedder(
    model="YOUR_EMBEDDING_MODEL",
    api_base="YOUR_EMBEDDING_BASE_URL",
    api_key="YOUR_EMBEDDING_API_KEY",
)
memory_llm = Llm(
    model="YOUR_MEMORY_LLM_MODEL",
    api_base="YOUR_MEMORY_LLM_BASE_URL",
    api_key="YOUR_MEMORY_LLM_API_KEY",
    temperature=0,
)
langgraph_llm = ChatOpenAI(
    model="YOUR_CHAT_MODEL",
    base_url="YOUR_CHAT_BASE_URL",
    api_key="YOUR_CHAT_API_KEY",
    temperature=0,
)
db_pool = ...  #an oracledb connection or connection pool

agent_memory = OracleAgentMemory(
    connection=db_pool,
    embedder=embedder,
    llm=memory_llm,
)
thread_id = "langgraph_short_term_demo"
user_id = "user_123"
agent_id = "assistant_456"

##Configure short term memory middleware and a prebuilt LangGraph agent

def _message_text(message: BaseMessage | Any) -> str:
    content = getattr(message, "content", "")
    if isinstance(content, str):
        return content
    return str(content)

def _is_context_card_message(message: BaseMessage) -> bool:
    return isinstance(message, HumanMessage) and (
        getattr(message, "name", None) == "memory_context_card"
    )

class OracleShortTermMemoryMiddleware(AgentMiddleware):
    """Persist LangGraph turns and compact prompts with an OracleAgentMemory context card.

    Notes
    -----
    - ``before_model()`` receives the current LangGraph message state for this turn.
      After compaction, that state already includes the synthetic ``memory_context_card``
      message returned by a previous ``before_model()`` call.
    - The middleware strips that synthetic message back out before persisting or
      measuring token usage so OracleAgentMemory only stores real user/assistant turns
      and the compaction threshold is based on the organic conversation.
    - When compaction triggers, the middleware replaces the message history with one
      context-card message plus the most recent raw turns. On the next turn, that
      same injected message is seen again and filtered out before recomputing the
      next compacted prompt.
    """

    def __init__(
        self,
        memory: OracleAgentMemory,
        thread_id: str,
        user_id: str,
        agent_id: str,
        compaction_token_trigger: int,
        kept_message_count: int,
    ) -> None:
        self._thread = memory.create_thread(
            thread_id=thread_id,
            user_id=user_id,
            agent_id=agent_id,
            context_summary_update_frequency=4,
        )
        self._compaction_token_trigger = int(compaction_token_trigger)
        self._kept_message_count = int(kept_message_count)
        self._persisted_message_ids: set[str] = set()

    def before_model(
        self,
        state: dict[str, Any],
        runtime: Runtime[Any],
    ) -> dict[str, Any] | None:
        del runtime
        messages = list(state["messages"])
        #^ This will contain the context card message once the compaction occurs
        raw_messages = [message for message in messages if not _is_context_card_message(message)]
        self._persist_new_messages(raw_messages)

        #we exclude the context card from the token counting
        if count_tokens_approximately(raw_messages) < self._compaction_token_trigger:
            return None

        context_card = self._thread.get_context_card().content
        if not context_card:
            context_card = "<context_card>\n  No relevant short-term context yet.\n</context_card>"
        return {
            "messages": [
                RemoveMessage(id=REMOVE_ALL_MESSAGES),  #Clear existing message state.
                HumanMessage(content=context_card, name="memory_context_card"),
                *raw_messages[-self._kept_message_count :],
            ]
        }

    def _persist_new_messages(self, messages: list[BaseMessage]) -> None:
        persisted: list[dict[str, str]] = []
        for message in messages:
            #Persist only the conversational roles that map directly to short-
            #term memory turns. Tool/system/synthetic messages are skipped here.
            role = (
                "user"
                if isinstance(message, HumanMessage)
                else "assistant" if isinstance(message, AIMessage) else None
            )
            if role is None:
                continue

            content = _message_text(message).strip()
            if not content:
                continue

            #LangGraph messages usually have stable IDs. When they do not, fall back
            #to a content-derived key so the same turn is not persisted repeatedly if
            #the caller reuses the returned message list across later invocations.
            message_id = str(getattr(message, "id", "") or f"{role}:{hash(content)}")
            if message_id in self._persisted_message_ids:
                continue

            #Track what this middleware instance has already written so each real turn
            #is added to Oracle once even though later turns may still carry the same
            #messages in the LangGraph state.
            self._persisted_message_ids.add(message_id)
            persisted.append({"role": role, "content": content})

        if persisted:
            self._thread.add_messages(persisted)

short_term_middleware = OracleShortTermMemoryMiddleware(
    memory=agent_memory,
    thread_id=thread_id,
    user_id=user_id,
    agent_id=agent_id,
    compaction_token_trigger=120,
    kept_message_count=3,
)
agent = create_agent(
    model=langgraph_llm,
    tools=[],
    middleware=[short_term_middleware],
)

##Answer later turns with the middleware backed agent

messages: list[BaseMessage] = []

def print_current_context_card(messages: list[BaseMessage]) -> None:
    for message in messages:
        if _is_context_card_message(message):
            print(_message_text(message))
            return
    print("<context_card>\n  No injected context card yet.\n</context_card>")

def run_turn(user_text: str) -> str:
    messages.append(HumanMessage(content=user_text))
    result = agent.invoke({"messages": messages})
    messages[:] = list(result["messages"])
    assistant_message = next(
        message for message in reversed(messages) if isinstance(message, AIMessage)
    )
    return _message_text(assistant_message)

run_turn(
    "I'm Maya. I'm migrating our nightly invoice reconciliation workflow "
    "from cron jobs to LangGraph."
)
run_turn("The failing step right now is ledger enrichment after reconciliation.")
final_answer = run_turn(
    "What workflow am I migrating, which step is failing, and who am I?"
)

print_current_context_card(messages)
#<context_card>
#<topics>
#<topic>invoice reconciliation migration</topic>
#<topic>ledger enrichment failure</topic>
#...
#</topics>
#<summary>
#Maya is migrating the nightly invoice reconciliation workflow from cron jobs
#to LangGraph. The failing step is ledger enrichment after reconciliation.
#</summary>
#...
#</context_card>
print(final_answer)
#You're Maya, migrating your nightly invoice reconciliation workflow from cron jobs
#to LangGraph, and the ledger-enrichment step after reconciliation is currently failing.