Search bots crawl site pages and index them based on the content available in the HTML markup. Because Commerce Store Accelerator is a JavaScript-based single-page application, the full HTML markup for a given page is not available until the JavaScript for the page has been processed. To provide a search bot with the markup it needs, a Commerce Store Accelerator application pre-generates the HTML markup for its pages, using the PhantomJS headless browser on the server side to execute the JavaScript. The pre-generated content is stored in the SEORepository. Commerce Store Accelerator uses a servlet filter called SEOFilter to identify search bot requests; the filter then retrieves the pre-generated HTML content from the repository and returns it to the search bot. If the requested page does not exist in the repository, its HTML content is generated, stored in the repository, and then served to the search bot.
The SEOFilter, which is of class atg.filter.dspjsp.SEOFilter, uses two components, /atg/repository/seo/UserAgentDetector and /atg/repository/seo/PhantomjsRenderer, to identify search bot requests and render the complete markup for the requested URL. Both are configured as init-params on the SEOFilter element in the application’s web.xml file.
The /atg/repository/seo/UserAgentDetector component is a request-scoped component of class atg.repository.seo.UserAgentDetector. It has a browserType property, set to /atg/dynamo/servlet/pipeline/BrowserTypes/Robot, that defines the pattern an incoming request must match in order to be considered a search bot request.
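The pattern check described above can be pictured with a minimal sketch. It assumes the match is made against the User-Agent request header, and it uses a made-up pattern list; the class BotDetectorSketch and its method are hypothetical and do not reproduce the atg.repository.seo.UserAgentDetector API.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the ATG UserAgentDetector implementation.
final class BotDetectorSketch {

    // Assumed pattern list. In the product, the pattern comes from the
    // component that the browserType property points to.
    private static final Pattern ROBOT = Pattern.compile(
            "googlebot|bingbot|slurp|crawler|spider", Pattern.CASE_INSENSITIVE);

    // Returns true when the User-Agent value contains one of the assumed patterns.
    static boolean isSearchBot(String userAgent) {
        return userAgent != null && ROBOT.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        System.out.println(isSearchBot(
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
        System.out.println(isSearchBot(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"));
    }
}
```

A request that does not match would continue through the normal single-page application path, while a matching request would be handed to the pre-rendered markup lookup.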
The /atg/repository/seo/PhantomjsRenderer component is of class atg.repository.seo.PhantomjsRenderer, which is an implementation of the atg.repository.seo.MarkupRenderer interface. The MarkupRenderer interface requires that all implementing classes have a getHtmlContent() method that gets the complete HTML markup for a page. The PhantomjsRenderer class’s implementation of this method uses PhantomJS to perform the rendering. The PhantomjsRenderer component has the following properties:
waitTime: The amount of time, in milliseconds, that Commerce Store Accelerator waits for PhantomJS to render the HTML for a page. Set this property to accommodate the maximum amount of time a complex page on your site takes to render in PhantomJS. The default is 3000 milliseconds. Whatever markup is rendered at the end of the waitTime period is retrieved.
phantomExecutablePath: The path to the PhantomJS executable file.
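To make the role of the two properties concrete, the following sketch shows one way they could feed a PhantomJS command line of the general form phantomjs script.js arguments. The class PhantomCommandSketch, the render.js script name, and the argument order are all assumptions for illustration; the source does not describe how PhantomjsRenderer actually invokes PhantomJS.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the PhantomjsRenderer implementation.
final class PhantomCommandSketch {

    // Builds a command line: <executable> <script> <url> <waitTime in ms>.
    // The script path and the argument order are assumptions.
    static List<String> buildCommand(String phantomExecutablePath,
                                     String scriptPath,
                                     String pageUrl,
                                     long waitTimeMillis) {
        if (waitTimeMillis < 0) {
            throw new IllegalArgumentException("waitTime must not be negative");
        }
        return Arrays.asList(
                phantomExecutablePath,
                scriptPath,
                pageUrl,
                Long.toString(waitTimeMillis));
    }

    public static void main(String[] args) {
        // 3000 is the documented default for waitTime; the paths are made up.
        List<String> command = buildCommand(
                "/usr/local/bin/phantomjs", "render.js",
                "http://localhost:8080/home", 3000L);
        System.out.println(String.join(" ", command));
    }
}
```

A list built this way could be passed to java.lang.ProcessBuilder, with the hypothetical script waiting for the given number of milliseconds before writing out the page markup.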
In order to boost performance, Commerce Store Accelerator pages are pre-rendered and their content is stored in the SEORepository. When a search bot requests content for a page, Commerce Store Accelerator attempts to retrieve the content from the SEORepository first. If the page does not exist in the repository, Commerce Store Accelerator uses the PhantomJS headless browser to render the full HTML markup for the page, stores that markup in the SEORepository for future reference, and then returns the markup to the search bot.
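The lookup-then-render flow described above follows a common read-through cache pattern, sketched here under simplifying assumptions: an in-memory map stands in for the SEORepository and a plain function stands in for PhantomJS rendering. The class SeoCacheSketch is hypothetical and is not part of the product.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; a map replaces the SEORepository.
final class SeoCacheSketch {

    // Stored markup, keyed by page URL.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Stand-in for the PhantomJS rendering step.
    private final Function<String, String> renderer;

    SeoCacheSketch(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    // Returns stored markup if present; otherwise renders, stores, and returns it.
    String htmlFor(String url) {
        return store.computeIfAbsent(url, renderer);
    }

    public static void main(String[] args) {
        SeoCacheSketch cache = new SeoCacheSketch(url -> {
            System.out.println("rendering " + url);
            return "<html><body>" + url + "</body></html>";
        });
        cache.htmlFor("/home"); // renders and stores
        cache.htmlFor("/home"); // served from the store, no second render
    }
}
```

The point of the pattern is that the expensive rendering step runs at most once per URL; later search bot requests for the same page are answered from stored markup.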
In order to populate the SEORepository with pre-generated pages, Commerce Store Accelerator uses two components. The /atg/endeca/sitemap/SiteLinksGenerator component, which is of class atg.endeca.sitemap.SiteLinksGenerator, generates a complete list of site links for the application. Then the /atg/repository/seo/SitemapPageCacheRenderer component, which is of class atg.repository.seo.SitemapPageCacheRenderer, invokes the PhantomJS headless browser for each link to generate the complete HTML markup and stores the pre-generated content in the SEORepository with the page URL as the key.
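The two-step population process can be sketched as a simple loop, assuming the link list has already been generated. Here a list of strings stands in for the output of SiteLinksGenerator, a function stands in for the PhantomJS call, and a map stands in for the SEORepository; the class SitemapPrerenderSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the SitemapPageCacheRenderer implementation.
final class SitemapPrerenderSketch {

    // Renders every link once and stores the markup with the page URL as the key.
    static Map<String, String> prerender(List<String> siteLinks,
                                         Function<String, String> renderer) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String url : siteLinks) {
            store.put(url, renderer.apply(url));
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("/home", "/category/shoes", "/product/123");
        Map<String, String> store =
                prerender(links, url -> "<html><body>" + url + "</body></html>");
        store.forEach((url, html) -> System.out.println(url + " -> " + html));
    }
}
```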