Hosting an Endpoint in Generative AI

Before you can use a Generative AI custom model for inference, you must create an endpoint for that model. You can also create endpoints for the pretrained models that are available in the playground. An endpoint is a designated point on a hosting dedicated AI cluster where a model accepts user requests and sends back responses, such as the model's generated text.

Note

  • Custom models are listed in the playground's model list after you create an active endpoint for them.
  • Each custom model can have more than one endpoint.
  • Each hosting dedicated AI cluster can host many endpoints. To see how many more endpoints a cluster can host, check the remaining endpoints on the dedicated AI cluster's detail page. If you no longer need a custom model's endpoint, you can delete that endpoint and use the freed capacity on its dedicated AI cluster to host a new endpoint.
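
The relationships in the note above can be illustrated with a small, self-contained Python sketch. This is a toy in-memory model, not the OCI SDK or the service's actual behavior: the class name HostingCluster, its methods, the endpoint and model names, and the capacity of 3 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostingCluster:
    """Toy stand-in for a hosting dedicated AI cluster (not the OCI SDK)."""
    name: str
    capacity: int  # hypothetical endpoint limit, chosen for illustration
    endpoints: dict = field(default_factory=dict)  # endpoint name -> model name

    def remaining(self) -> int:
        """Number of endpoints this cluster can still host."""
        return self.capacity - len(self.endpoints)

    def create_endpoint(self, endpoint_name: str, model_name: str) -> None:
        """Host a new endpoint for a model, if the cluster has room."""
        if endpoint_name in self.endpoints:
            raise ValueError(f"endpoint {endpoint_name!r} already exists")
        if self.remaining() == 0:
            raise RuntimeError(f"cluster {self.name!r} has no remaining endpoints")
        self.endpoints[endpoint_name] = model_name

    def delete_endpoint(self, endpoint_name: str) -> None:
        """Remove an endpoint, freeing its slot for a new one."""
        del self.endpoints[endpoint_name]

    def endpoints_for(self, model_name: str) -> list:
        """All endpoints on this cluster that serve the given model."""
        return sorted(n for n, m in self.endpoints.items() if m == model_name)


cluster = HostingCluster(name="hosting-cluster-1", capacity=3)

# One custom model can have more than one endpoint.
cluster.create_endpoint("ep-a", "custom-model-1")
cluster.create_endpoint("ep-b", "custom-model-1")
cluster.create_endpoint("ep-c", "custom-model-2")
remaining_when_full = cluster.remaining()

# Deleting an unneeded endpoint frees room to host a new one.
cluster.delete_endpoint("ep-c")
cluster.create_endpoint("ep-d", "custom-model-3")
```

In this sketch, remaining() plays the role of the remaining-endpoints figure shown on the cluster's detail page, and the delete-then-create sequence mirrors reusing a cluster after removing an endpoint you no longer need.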

You can perform the following tasks to create and manage endpoints for custom models: