Using Short-Term Memory Compression for Conversations

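With the Conversations API, when short-term memory compression is enabled, OCI Generative AI automatically compresses earlier conversation history into a smaller representation as the conversation grows. This preserves important context while reducing token usage and latency.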

You don't need to manage compression when you send requests. Keep sending requests with the same conversation ID, and the service handles the compression.

Example:

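# Assumes `client` and `conversation1` were created earlier;
# every turn below reuses the same conversation ID.

# first turn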
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="I'm planning a team offsite. We prefer outdoor activities, a moderate budget, and vegetarian-friendly food options.",
    conversation=conversation1.id
)

# second turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="We also need the location to be within a two-hour drive from San Francisco.",
    conversation=conversation1.id
)

# third turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Please avoid destinations that are usually crowded on weekends.",
    conversation=conversation1.id
)

# fourth turn
response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Now recommend three offsite options based on those preferences.",
    conversation=conversation1.id
)

As the conversation grows, OCI Generative AI can automatically compress earlier turns while retaining the key details needed for later responses.
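
The repeated calls in the example can also be wrapped in a small loop. The following is a minimal sketch, not part of the service API: `send_turns` is a hypothetical helper name, and it assumes the client exposes `responses.create` and that each response has an `output_text` attribute, as in the OpenAI Python SDK. The caller only passes the same conversation ID on every turn; any compression of earlier turns is left to the service.

```python
from typing import Iterable, List


def send_turns(client, model: str, conversation_id: str, prompts: Iterable[str]) -> List[str]:
    """Send each prompt as one turn of the same conversation and collect the replies.

    Hypothetical helper for illustration; `client` is assumed to follow the
    OpenAI Python SDK shape (client.responses.create, response.output_text).
    """
    replies = []
    for prompt in prompts:
        # Reusing the same conversation ID is all the caller does here;
        # no compression logic appears on the client side.
        response = client.responses.create(
            model=model,
            input=prompt,
            conversation=conversation_id,
        )
        # Assumed convenience accessor for the reply text.
        replies.append(response.output_text)
    return replies
```

For instance, `send_turns(client, "openai.gpt-oss-120b", conversation1.id, prompts)` would send the four turns above in order when `prompts` holds the four input strings.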
