OCI Generative AI Now Supports AI Guardrails for On-Demand Mode

You can now enable AI guardrails for content moderation (CM), prompt injection (PI), and personally identifiable information (PII) in OCI Generative AI. This feature is available through the API for on-demand chat and text embedding models in commercial regions.

Key Features

  • Content Moderation
    Aims to classify harmful content, such as hate speech, harassment, violence, and explicit material, in prompts and responses using an internal model. Includes two binary categories: OVERALL (unsafe language) and BLOCKLIST (predefined blocked words).
  • Prompt Injection Defense
    Aims to detect malicious instructions in prompts and embedded contexts (for example, hidden within documents) to help prevent unauthorized overrides, providing a binary risk score (0.0 for safe, 1.0 for risky).
  • Personally Identifiable Information and Privacy Protection
    Aims to identify sensitive data such as names (PERSON), email addresses (EMAIL), telephone numbers (TELEPHONE_NUMBER), and more. Results include details such as the detected text, its label, location (offset and length), and confidence score. For example, if Jane Smith is in the data, then you might get {"length": 10, "offset": 0, "text": "Jane Smith", "label": "PERSON", "score": 0.9990621507167816}.
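Because PII results include each span's offset, length, label, and confidence score, callers can redact detected entities in their own code. The following is a minimal sketch using the entity format shown above; the `redact_pii` function name and the `min_score` threshold are illustrative, not part of the service API.

```python
# Redact PII spans reported by a guardrails result.
# The entity fields (text, label, offset, length, score) follow the
# example format above; the function and threshold are illustrative.

def redact_pii(text, entities, min_score=0.5):
    """Replace each detected PII span with its label, e.g. <PERSON>."""
    # Apply replacements right-to-left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["offset"], reverse=True):
        if ent["score"] < min_score:
            continue
        start, end = ent["offset"], ent["offset"] + ent["length"]
        text = text[:start] + "<" + ent["label"] + ">" + text[end:]
    return text

entities = [
    {"length": 10, "offset": 0, "text": "Jane Smith",
     "label": "PERSON", "score": 0.9990621507167816},
]
print(redact_pii("Jane Smith emailed support.", entities))
# -> <PERSON> emailed support.
```

Sorting by offset in reverse order means earlier replacements never shift the offsets of spans still to be processed, even when spans have different lengths.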
Usage Options
  • On-Demand Models (API Only)
    For real-time evaluation without endpoints, use the ApplyGuardrails API to check inputs alongside inference. This applies to all chat and embedding models offered by OCI Generative AI in commercial regions. It returns detailed results, such as moderation categories, PII entities, and the PI score, for programmatic handling; it does not block content by default.
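Because results are returned without blocking by default, the caller decides how to act on them. Below is a minimal sketch of such handling for a result already parsed into a dict; the field names (`prompt_injection_score`, `moderation_categories`) and the threshold are illustrative placeholders, not the exact ApplyGuardrails response schema.

```python
# Decide whether to proceed with a request based on guardrail results.
# Field names below are illustrative placeholders, not the exact
# ApplyGuardrails response schema.

PI_THRESHOLD = 0.5  # PI risk scores are binary: 0.0 (safe) or 1.0 (risky)

def should_block(result):
    """Return (blocked, reason) for a parsed guardrails result."""
    if result.get("prompt_injection_score", 0.0) >= PI_THRESHOLD:
        return True, "prompt injection detected"
    for category in result.get("moderation_categories", []):
        # OVERALL flags unsafe language; BLOCKLIST flags predefined words.
        if category in ("OVERALL", "BLOCKLIST"):
            return True, "content moderation: " + category
    return False, ""

blocked, reason = should_block({
    "prompt_injection_score": 1.0,
    "moderation_categories": [],
})
print(blocked, reason)  # -> True prompt injection detected
```

See the Generative AI documentation for the actual response shape returned by the ApplyGuardrails operation.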

For API examples and setup, see the Generative AI documentation.

Important

Disclaimer

Our Content Moderation (CM) and Prompt Injection (PI) guardrails have been evaluated on a range of multilingual benchmark datasets. However, actual performance may vary depending on the specific languages, domains, data distributions, and usage patterns present in customer-provided data. Because the content is generated by AI, it may contain errors or omissions. Accordingly, it is intended for informational purposes only and should not be considered professional advice, and OCI makes no guarantee that identical performance characteristics will be observed in all real-world deployments. The OCI Responsible AI team is continuously improving these models.

Our content moderation capabilities have been evaluated against RTPLX, one of the largest publicly available multilingual benchmarking datasets, covering more than 38 languages. However, these results should be interpreted with appropriate caution because the content is generated by AI and may contain errors or omissions. Multilingual evaluations are inherently bounded by the scope, representativeness, and annotation practices of public datasets, and performance observed on RTPLX may not fully generalize to all real-world contexts, domains, dialects, or usage patterns. Accordingly, the findings are intended for informational purposes only and should not be considered professional advice.