atg.repository.search.indexing
Class BulkLoaderImpl

java.lang.Object
  extended by atg.nucleus.logging.VariableArgumentApplicationLoggingImpl
      extended by atg.nucleus.GenericService
          extended by atg.repository.search.indexing.LoaderImpl
              extended by atg.repository.search.indexing.BulkLoaderImpl
All Implemented Interfaces:
NameContextBindingListener, NameContextElement, NameResolver, AdminableService, ApplicationLogging, atg.nucleus.logging.ApplicationLoggingSender, atg.nucleus.logging.TraceApplicationLogging, VariableArgumentApplicationLogging, ComponentNameResolver, Service, ServiceListener, BulkLoader, java.util.EventListener

public class BulkLoaderImpl
extends LoaderImpl
implements BulkLoader

Implements BulkLoader to do a full load of a repository.

Created: February 16 2005


Field Summary
static java.lang.String CLASS_VERSION
          Class version string
 
Fields inherited from class atg.repository.search.indexing.LoaderImpl
DELETE_DOCUMENT, GET_GENERATOR_PROPERTY, GET_META_PROPERTY, GET_SUB_PROPERTY_VALUE, GET_TEXT_PROPERTY, MY_RESOURCE_NAME, PROCESS_PROPERTIES, REGENERATE_FOR_CHANGE, sResourceBundle, START_QUERY
 
Fields inherited from class atg.nucleus.GenericService
SERVICE_INFO_KEY
 
Fields inherited from interface atg.nucleus.logging.TraceApplicationLogging
DEFAULT_LOG_TRACE_STATUS
 
Fields inherited from interface atg.nucleus.logging.ApplicationLogging
DEFAULT_LOG_DEBUG_STATUS, DEFAULT_LOG_ERROR_STATUS, DEFAULT_LOG_INFO_STATUS, DEFAULT_LOG_WARNING_STATUS
 
Constructor Summary
BulkLoaderImpl()
           
 
Method Summary
 boolean anyItemOfType(java.util.Set pItems, java.util.Set pItemTypes, java.util.Set pIgnoredTypeCache)
          Return true if set of AssetVersion items (pItems) contains at least one instance of a repository item which is of any type, or any subtype, of the types specified in pItemTypes.
 java.util.List assetVersionsToRepositoryItems(java.util.Collection pAssetVersions)
          Return a potentially empty or null list of repository items from the collection of AssetVersion objects.
 atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig)
          Do a bulk load of the top-level repository items for the given output configuration.
 atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig, Query pQuery)
          Do a bulk load, using the query pQuery.
 atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig, Query pQuery, atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession)
          Do a bulk load, using the query pQuery.
 atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig, Query pQuery, atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession, boolean pDeleteGenerations, atg.search.index.IndexInfo pIndexInfo)
          Do a bulk load, using the query pQuery.
 atg.repository.search.indexing.BulkLoaderResults bulkLoadWithRetry(IndexingOutputConfig pOutputConfig, atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession, int pTimeoutMinutes, atg.search.index.IndexInfo pIndexInfo)
          Attempt a bulk load, retrying if someone else is currently updating.
 void deleteGenerationChanges(IndexingOutputConfig pOutputConfig, java.lang.String pContentId, int pGeneration)
          Delete changes for IndexingOutputConfig and all generations in generation set.
 atg.repository.search.indexing.DocumentSubmitterSession generateTestDocument(IndexingOutputConfig pOutputConfig, DocumentSubmitter pDocSubmitter, RepositoryItem pItem, atg.search.index.IndexInfo pIndexInfo)
          Generate a single output document for the specified item and return the session.
 java.util.Collection getDevelopmentLines(IndexingOutputConfig pOutputConfig)
          Return the development lines to load.
 int getGcCallingRangeSize()
          Get range size for calling garbage collector, if any.
 atg.xml.jaxb.JaxbInvoker getJaxbUnmarshaller()
          The jaxb unmarshaller used by IndexingOutputConfigs.
protected  Query getPagedQuery(RepositoryView pView, RepositoryItemGroup pGroup, java.lang.Object pLastRepositoryId)
          Get the batched query.
protected  QueryOptions getPagedQueryOptions(java.lang.String pRepoIdPropName)
          Get the query options for paged queries.
 int getPagedQuerySize()
          Get query page size for the top-level query, if any.
protected  RepositoryPropertyDescriptor getRepositoryIdPropertyDescriptor(RepositoryItemDescriptor pItemDesc)
          Return the property descriptor for the repository id.
 boolean isItemOfTypeInSet(java.util.Set pTypes, RepositoryItemDescriptor pType)
          Return true if the item descriptor is of a type, or subtype of a type, in the set of types.
protected  void loadIteration(IndexingOutputConfig pOutputConfig, Context pContext, Query pQuery, atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession, RepositoryItem[] pItems)
           
protected  void loadPagedIteration(IndexingOutputConfig pOutputConfig, Context pContext, Query pQuery, atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession)
          Perform a page iteration over the set of top-level repository items.
 void postIndexingCleanup(IndexingOutputConfig pOutputConfig, boolean pSuccess, atg.repository.search.indexing.BulkLoaderResults pResults)
          Called after a SearchAdmin indexing invocation to perform post-index housekeeping.
 void setGcCallingRangeSize(int pGcCallingRangeSize)
          Set range size for calling garbage collector, if any.
 void setJaxbUnmarshaller(atg.xml.jaxb.JaxbInvoker pJaxbUnmarshaller)
          The jaxb unmarshaller used by IndexingOutputConfigs.
 void setPagedQuerySize(int pPagedQuerySize)
          Set query page size for the top-level query, if any.
protected  java.util.Map uriMapFromAssetVersions(java.util.Set pAssetVersions)
          Return a set of version manager, versionless asset URIs corresponding to the specified AssetVersion objects.
 
Methods inherited from class atg.repository.search.indexing.LoaderImpl
addActiveContext, addUniqueParamsToURL, adjustIncrementalQueue, afterSessionStart, beforeSessionEnd, cancelActiveContext, claimGeneration, createAdminServlet, createContext, createEmptyOutputDocumentContent, deleteDocumentItem, doStopService, getActiveContextStatuses, getArrayFromMultiRepoItems, getConfigStatePersister, getDocumentsPerTransaction, getIncrementalItemQueue, getLoggingInfoStatusCount, getTransactionManager, getUpdateActivityTimeMillis, getUpdateLastActivityTimeInSeparateThread, initContext, isConnectionRelated, isEchoDocumentsToStdout, isPrettyPrint, isPropertyLoggingDebug, isSubTypeOfItemDesc, isTransactionPerDocument, nextQueuedUpdateActivity, outputAndSubmitDocument, popMembershipContexts, processItem, processMetaProperty, processParentProperties, processProperties, processProperties, processTextProperty, pushMembershipContexts, queueUpdateActivity, releaseContext, releaseGeneration, removeActiveContext, resetContextAfterDocument, setConfigStatePersister, setDocumentsPerTransaction, setEchoDocumentsToStdout, setIncrementalItemQueue, setLoggingInfoStatusCount, setNextIncrementalGeneration, setPrettyPrint, setPropertyLoggingDebug, setTransactionPerDocument, setupContextForNewDocument, setUpdateActivityTimeMillis, setUpdateLastActivityTimeInSeparateThread, splitPropertyValue, updateActivity
 
Methods inherited from class atg.nucleus.GenericService
addLogListener, doStartService, getAbsoluteName, getAdminServlet, getLoggingForVlogging, getLogListenerCount, getLogListeners, getName, getNameContext, getNucleus, getRoot, getServiceConfiguration, getServiceInfo, isLoggingDebug, isLoggingError, isLoggingInfo, isLoggingTrace, isLoggingWarning, isRunning, logDebug, logDebug, logDebug, logError, logError, logError, logInfo, logInfo, logInfo, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, nameContextElementBound, nameContextElementUnbound, removeLogListener, reResolveThis, resolveName, resolveName, resolveName, resolveName, sendLogEvent, setLoggingDebug, setLoggingError, setLoggingInfo, setLoggingTrace, setLoggingWarning, setNucleus, setServiceInfo, startService, stopService
 
Methods inherited from class atg.nucleus.logging.VariableArgumentApplicationLoggingImpl
vlogDebug, vlogDebug, vlogDebug, vlogDebug, vlogError, vlogError, vlogError, vlogError, vlogInfo, vlogInfo, vlogInfo, vlogInfo, vlogTrace, vlogTrace, vlogTrace, vlogTrace, vlogWarning, vlogWarning, vlogWarning, vlogWarning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLASS_VERSION

public static java.lang.String CLASS_VERSION
Class version string

Constructor Detail

BulkLoaderImpl

public BulkLoaderImpl()
Method Detail

setJaxbUnmarshaller

public void setJaxbUnmarshaller(atg.xml.jaxb.JaxbInvoker pJaxbUnmarshaller)
The jaxb unmarshaller used by IndexingOutputConfigs.


getJaxbUnmarshaller

public atg.xml.jaxb.JaxbInvoker getJaxbUnmarshaller()
The jaxb unmarshaller used by IndexingOutputConfigs.

Specified by:
getJaxbUnmarshaller in interface BulkLoader

setPagedQuerySize

public void setPagedQuerySize(int pPagedQuerySize)
Set query page size for the top-level query, if any. A value of 0 means query for all the items at once.


getPagedQuerySize

public int getPagedQuerySize()
Get query page size for the top-level query, if any. A value of 0 means query for all the items at once.


setGcCallingRangeSize

public void setGcCallingRangeSize(int pGcCallingRangeSize)
Set range size for calling garbage collector, if any. A value of 0 means don't call gc at all.


getGcCallingRangeSize

public int getGcCallingRangeSize()
Get range size for calling garbage collector, if any. A value of 0 means don't call gc at all.
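
The range-size semantics can be illustrated with a minimal sketch. It assumes that "range size" means the number of processed items between garbage collector calls, which the description above does not state explicitly. GcRangeSketch and processAll are hypothetical names and are not part of this class.

```java
// Hypothetical illustration only; not the BulkLoaderImpl implementation.
class GcRangeSketch {

    // Processes itemCount items and requests a GC after every
    // gcCallingRangeSize items. A range size of 0 disables the calls.
    // Returns the number of times a GC was requested.
    static int processAll(int itemCount, int gcCallingRangeSize) {
        int gcCalls = 0;
        for (int processed = 1; processed <= itemCount; processed++) {
            // ... index one item here ...
            if (gcCallingRangeSize > 0 && processed % gcCallingRangeSize == 0) {
                System.gc(); // only a hint; the JVM may ignore it
                gcCalls++;
            }
        }
        return gcCalls;
    }
}
```

Under this assumption, 10 items with a range size of 3 request a collection after items 3, 6 and 9, and a range size of 0 never requests one.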


bulkLoadWithRetry

public atg.repository.search.indexing.BulkLoaderResults bulkLoadWithRetry(IndexingOutputConfig pOutputConfig,
                                                                          atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession,
                                                                          int pTimeoutMinutes,
                                                                          atg.search.index.IndexInfo pIndexInfo)
                                                                   throws IndexingException
Attempt a bulk load, retrying if someone else is currently updating.

Specified by:
bulkLoadWithRetry in interface BulkLoader
Parameters:
pOutputConfig - the output configuration to bulk load
pDocSubSession - the existing document submitter session. If null, a new document submitter session is created.
pTimeoutMinutes - the timeout, in minutes
pIndexInfo - the IndexInfo as supplied by the SearchAdmin
Throws:
IndexingException
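
The description of bulkLoadWithRetry says only that the load is retried while someone else is updating, bounded by a timeout. The following is a minimal sketch of that retry-until-deadline pattern, not the actual implementation. RetrySketch, tryLoad and the pause interval are hypothetical, and milliseconds are used instead of minutes so the example runs quickly.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the BulkLoaderImpl implementation.
class RetrySketch {

    // tryLoad returns true when the load ran, false when another party
    // is currently updating. Retries until tryLoad succeeds or the
    // timeout elapses. Returns true if the load eventually ran.
    static boolean runWithRetry(BooleanSupplier tryLoad, long timeoutMillis, long pauseMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (tryLoad.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while someone else was updating
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A caller would typically treat a false result as a failure to obtain the update, much as the real method reports problems through IndexingException.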

bulkLoad

public atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig)
                                                          throws IndexingException
Do a bulk load of the top-level repository items for the given output configuration. This overload takes no query argument.

Specified by:
bulkLoad in interface BulkLoader
Parameters:
pOutputConfig - The output configuration
Returns:
an object representing the result of the load operation
Throws:
IndexingException

bulkLoad

public atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig,
                                                                 Query pQuery)
                                                          throws IndexingException
Do a bulk load, using the query pQuery.

Specified by:
bulkLoad in interface BulkLoader
Parameters:
pOutputConfig - The output configuration
pQuery - the query to use for bulk loading
Returns:
an object representing the result of the load operation
Throws:
IndexingException

getPagedQueryOptions

protected QueryOptions getPagedQueryOptions(java.lang.String pRepoIdPropName)
Get the query options for paged queries.

Parameters:
pRepoIdPropName - The repository id property name.

getPagedQuery

protected Query getPagedQuery(RepositoryView pView,
                              RepositoryItemGroup pGroup,
                              java.lang.Object pLastRepositoryId)
Get the batched query.

Parameters:
pView - the top-level repository item view.
pGroup - the top-level repository item group (if any, may be null)
pLastRepositoryId - the repository ID of the last repository item processed. Should be null the first time this method is invoked.
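
The pLastRepositoryId parameter suggests paging keyed on the last repository ID seen. The following is a minimal sketch of that idea over a plain sorted set of IDs, assuming the pages are ordered by repository ID. PagedQuerySketch, nextPage and allPages are hypothetical names, and no ATG repository classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical illustration only; not the BulkLoaderImpl implementation.
class PagedQuerySketch {

    // Returns up to pageSize IDs strictly greater than lastId.
    // lastId is null on the first call; pageSize 0 means "no limit".
    static List<String> nextPage(NavigableSet<String> ids, String lastId, int pageSize) {
        NavigableSet<String> remaining = (lastId == null) ? ids : ids.tailSet(lastId, false);
        List<String> page = new ArrayList<>();
        for (String id : remaining) {
            if (pageSize > 0 && page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Walks the whole set page by page, feeding the last ID of each
    // page into the next call.
    static List<List<String>> allPages(NavigableSet<String> ids, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String lastId = null;
        while (true) {
            List<String> page = nextPage(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            if (pageSize <= 0) {
                break; // everything was fetched in a single query
            }
            lastId = page.get(page.size() - 1);
        }
        return pages;
    }
}
```

Keying each page on the last ID avoids re-reading earlier rows, and a page size of 0 degenerates to a single query for all items, matching the pagedQuerySize description.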

getRepositoryIdPropertyDescriptor

protected RepositoryPropertyDescriptor getRepositoryIdPropertyDescriptor(RepositoryItemDescriptor pItemDesc)
Return the property descriptor for the repository id.

Returns:
the repository id property descriptor, or null if no matching property could be found.

loadPagedIteration

protected void loadPagedIteration(IndexingOutputConfig pOutputConfig,
                                  Context pContext,
                                  Query pQuery,
                                  atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession)
                           throws IndexingException,
                                  RepositoryException
Perform a page iteration over the set of top-level repository items. Attempts to query pagedQuerySize items at a time, rather than getting all the top-level repository items in one fell swoop.

Throws:
IndexingException
RepositoryException

loadIteration

protected void loadIteration(IndexingOutputConfig pOutputConfig,
                             Context pContext,
                             Query pQuery,
                             atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession,
                             RepositoryItem[] pItems)
                      throws IndexingException,
                             RepositoryException
Parameters:
pOutputConfig - the output configuration
pContext - the indexing context
pQuery - optional query to select top-level items (may be null)
pDocSubSession - the document submitter session
pItems - an optional array of items to index, if already calculated
Throws:
IndexingException
RepositoryException

assetVersionsToRepositoryItems

public java.util.List assetVersionsToRepositoryItems(java.util.Collection pAssetVersions)
Return a potentially empty or null list of repository items from the collection of AssetVersion objects.

Parameters:
pAssetVersions - the AssetVersion collection
Returns:
a potentially empty or null list of repository items

generateTestDocument

public atg.repository.search.indexing.DocumentSubmitterSession generateTestDocument(IndexingOutputConfig pOutputConfig,
                                                                                    DocumentSubmitter pDocSubmitter,
                                                                                    RepositoryItem pItem,
                                                                                    atg.search.index.IndexInfo pIndexInfo)
Generate a single output document for the specified item and return the session.

Parameters:
pOutputConfig - the output configuration
pDocSubmitter - the document submitter
pItem - the repository item to generate the document for
pIndexInfo - the IndexInfo object

bulkLoad

public atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig,
                                                                 Query pQuery,
                                                                 atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession)
                                                          throws IndexingException
Do a bulk load, using the query pQuery.

Specified by:
bulkLoad in interface BulkLoader
Parameters:
pOutputConfig - The output configuration
pQuery - the query to use for the top-level items
pDocSubSession - the existing document submitter session. If null, a new document submitter session is created.
Returns:
an object representing the result of the load operation
Throws:
IndexingException

bulkLoad

public atg.repository.search.indexing.BulkLoaderResults bulkLoad(IndexingOutputConfig pOutputConfig,
                                                                 Query pQuery,
                                                                 atg.repository.search.indexing.DocumentSubmitterSession pDocSubSession,
                                                                 boolean pDeleteGenerations,
                                                                 atg.search.index.IndexInfo pIndexInfo)
                                                          throws IndexingException
Do a bulk load, using the query pQuery.

Specified by:
bulkLoad in interface BulkLoader
Parameters:
pOutputConfig - The output configuration
pQuery - the query to use for the top-level items
pDocSubSession - the existing document submitter session. If null, a new document submitter session is created.
pDeleteGenerations - If true, delete incremental generations up to and including the last generation indexed in this invocation. Pass true if this method is not being invoked by the SearchAdmin, or if no other similar process will call postIndexingCleanup() for this operation.
pIndexInfo - the IndexInfo as supplied by the SearchAdmin
Returns:
an object representing the result of the load operation
Throws:
IndexingException

deleteGenerationChanges

public void deleteGenerationChanges(IndexingOutputConfig pOutputConfig,
                                    java.lang.String pContentId,
                                    int pGeneration)
Delete changes for IndexingOutputConfig and all generations in generation set.

Parameters:
pOutputConfig - the output config with associated change generations
pGeneration - the generation whose changes are deleted

postIndexingCleanup

public void postIndexingCleanup(IndexingOutputConfig pOutputConfig,
                                boolean pSuccess,
                                atg.repository.search.indexing.BulkLoaderResults pResults)
Called after a SearchAdmin indexing invocation to perform post-index housekeeping.

Specified by:
postIndexingCleanup in interface BulkLoader
Parameters:
pOutputConfig - the IndexingOutputConfig
pSuccess - True if SearchAdmin indexing process completed successfully, false if rolled back
pResults - The local bulk indexing job results

getDevelopmentLines

public java.util.Collection getDevelopmentLines(IndexingOutputConfig pOutputConfig)
Return the development lines to load.

Returns:
the development lines to load

isItemOfTypeInSet

public boolean isItemOfTypeInSet(java.util.Set pTypes,
                                 RepositoryItemDescriptor pType)
Return true if the item descriptor is of a type, or subtype of a type, in the set of types.

Parameters:
pTypes - the set of types
pType - the item descriptor to test
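
The "type, or subtype of a type" check can be sketched as a walk up a supertype chain. This is a minimal illustration that uses plain strings in place of RepositoryItemDescriptor objects. TypeInSetSketch, isTypeInSet and the superTypeOf map are hypothetical, and the hierarchy is assumed to be acyclic.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the BulkLoaderImpl implementation.
class TypeInSetSketch {

    // superTypeOf maps a type name to its direct supertype name;
    // root types have no entry. Returns true if type, or any of its
    // supertypes, is contained in types.
    static boolean isTypeInSet(Set<String> types, String type, Map<String, String> superTypeOf) {
        for (String t = type; t != null; t = superTypeOf.get(t)) {
            if (types.contains(t)) {
                return true;
            }
        }
        return false;
    }
}
```

With a hierarchy in which "clothing-sku" extends "sku", a set containing "sku" matches both "sku" and "clothing-sku" but not an unrelated type such as "product".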

anyItemOfType

public boolean anyItemOfType(java.util.Set pItems,
                             java.util.Set pItemTypes,
                             java.util.Set pIgnoredTypeCache)
Return true if set of AssetVersion items (pItems) contains at least one instance of a repository item which is of any type, or any subtype, of the types specified in pItemTypes.

Parameters:
pItems - a set of AssetVersion items from a development line
pItemTypes - a set of item descriptors
pIgnoredTypeCache - a set of ignored types, or null

uriMapFromAssetVersions

protected java.util.Map uriMapFromAssetVersions(java.util.Set pAssetVersions)
Return a set of version manager, versionless asset URIs corresponding to the specified AssetVersion objects.

Returns:
a possibly empty set of versionless atg.versionmanager.VersionManagerURI objects