Web Crawler Connector

Governed, real-time access to web content, protected by Content Intelligence security models, is now available to your AI Agent teams.  This connector offers:

  • Hybrid lexical and semantic search
  • Configurable starting URL and crawl depth
  • Include/exclude file patterns
  • Basic authentication (username/password)
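
To make the interplay between crawl depth and include/exclude patterns concrete, here is a minimal sketch of how such a filter could behave.  It is an illustration only, not the connector's implementation: the function name should_crawl, the glob-style pattern syntax, the rule that exclusions take precedence, and the treatment of an empty include list as "include everything" are all assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url, depth, max_depth, include, exclude):
    """Return True if a URL passes the depth limit and the pattern filters.

    Hypothetical helper: glob-style patterns are matched against the URL
    path, and exclude patterns win over include patterns (both assumptions).
    """
    # Pages deeper than the configured crawl depth are never visited.
    if depth > max_depth:
        return False

    path = urlparse(url).path

    # Any matching exclude pattern rejects the page outright.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False

    # An empty include list is treated as "include everything".
    return not include or any(fnmatchcase(path, pattern) for pattern in include)
```

For example, with include patterns ["*.html"] and exclude patterns ["*/private/*"], a page at /docs/guide.html within the depth limit would be crawled, while /private/a.html would be skipped.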

Business Benefit

  • Faster knowledge discovery and decision-making
  • Reduced manual search and content triage
  • Improved content reuse and consistency

Steps to enable and configure

Create one or more Web Crawler connectors

  • Access AI Agent Studio
  • Navigate to Credentials > Connectors
  • Add a new connector
  • Select "WebCrawler"
  • Fill in all of the required fields

Schedule the process to crawl

Schedule the "Content Intelligence SharePoint Connector Job" scheduled process to run regularly so that your site is crawled on a recurring basis.  If you are crawling a large site, note that the first execution may take considerable time.  Subsequent runs are incremental and should therefore complete more quickly.  While you are first setting up the connector, run this job ad hoc until everything is configured correctly and your first crawl has completed.  Afterwards, run it no more frequently than every 30 minutes so that it has time to complete between executions.  This same scheduled process also performs the SharePoint connector's sync, so you only need to schedule it once to cover both.

(Optional) Update user groups of crawled pages

If you want the crawled pages to have different user groups from the one selected in the connector setup, you will need to access authoring to review and update the user groups.  You can also mass update user groups.  You can access authoring by going to https://<your domain>/fscmUI/redwood/knowledgeauthoring/main?contentType=<connector code>.  Replace the placeholders <your domain> and <connector code> in the example URL with your actual values.
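
If you manage several connectors, building the authoring links from the URL pattern above can save some typing.  The following is a small sketch; the function name authoring_url and the sample values are hypothetical, and only the URL pattern itself comes from this document.

```python
from urllib.parse import quote

# URL pattern taken from the text above; the two fields are the placeholders.
AUTHORING_TEMPLATE = (
    "https://{domain}/fscmUI/redwood/knowledgeauthoring/main?contentType={code}"
)


def authoring_url(domain, connector_code):
    """Build the authoring URL for one connector (hypothetical helper)."""
    # Percent-encode the connector code so special characters stay valid
    # inside the query string.
    return AUTHORING_TEMPLATE.format(domain=domain, code=quote(connector_code, safe=""))


# Sample values are placeholders, not real environment data.
print(authoring_url("example.com", "WEB_DOCS"))
```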

Use connector in one or more agent teams

  • Create a tool for every Web Crawler connector that you have configured
  • Add your connector tool(s) to each agent that will benefit from them
  • In your agent's prompt, tell it to use the tool in the circumstances where you want that data consulted

Tips and considerations

Ingested and indexed content counts toward the total size of Content Intelligence content, which has a 4 TB limit.
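
Before crawling a large site, a rough back-of-the-envelope estimate can show how much of that limit a crawl might consume.  This sketch rests on assumptions: it reads the limit as decimal terabytes, it treats raw page size as a stand-in for stored size (the indexed size may differ), and the page count and average page size are made-up inputs.

```python
# Assumption: the limit is interpreted as 4 decimal terabytes.
LIMIT_BYTES = 4 * 10**12


def estimated_share_of_limit(page_count, avg_page_bytes):
    """Return the estimated fraction of the limit used by a crawl.

    Hypothetical helper; raw page size is only a proxy for stored size.
    """
    return page_count * avg_page_bytes / LIMIT_BYTES


# Made-up inputs: one million pages averaging 200 KB each.
share = estimated_share_of_limit(1_000_000, 200_000)
print(f"Estimated share of the limit: {share:.1%}")
```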