public class SecureOptions extends OptionContainer
Modifier and Type | Class and Description |
---|---|
static class |
SecureOptions.ColorObfuscatedTextRemediationOption |
static class |
SecureOptions.DecryptionStatusOption |
static class |
SecureOptions.Fields
A container for Microsoft Word fields.
|
static class |
SecureOptions.HeadersFootersBehaviorOption |
static class |
SecureOptions.OutputTypeOption |
static class |
SecureOptions.ProcessingStatusOption |
static class |
SecureOptions.Properties
A container for document properties.
|
static class |
SecureOptions.ToTextEncodingOption |
Modifier and Type | Field and Description |
---|---|
static ScrubOption |
AlternativeText
Text that is used as an alternative to displaying a graphic image in constrained viewing environments.
|
static ScrubOption |
AppsForOffice
Apps for Office allow for integration of 3rd party applications into the Office applications.
|
static FileListOption |
AssembleFileList
List of PowerPoint files to be assembled into a new PowerPoint file.
|
static ScrubOption |
AudioVideoFilePaths
Embedded audio and video objects that reference their data through a local or network share path.
|
static ScrubOption |
AuthorHistory
Hidden author history in Microsoft Word document.
|
static AnalyzeOption |
AuthorHistoryContainsPaths
Invisible author history contains paths.
|
static AnalyzeOption |
AuthorHistoryContainsShares
Invisible author history contains network share names.
|
static BooleanOption |
BrokenPDFCorrection
Enables correction of PDFs that have a malformed internal structure.
|
static BooleanOption |
ChangeStartingPageNumber
Modify the page number a document starts at.
|
static ObjectListOption |
CheckboxActions
List of actions to perform on named checkboxes in the document while scrubbing.
|
static ScrubOption |
ClippedText
Some characters are hidden because they fall outside the current clipping path.
|
static ScrubOption |
ColorObfuscatedText
Some characters are visually obscured due to the font color matching the background color.
|
static SecureOptions.ColorObfuscatedTextRemediationOption |
ColorObfuscatedTextRemediation
Option that affects how remediation of color obfuscated text is performed.
|
static ScrubOption |
Comments
Author or reviewer comments in the document.
|
static ScrubOption |
ContentProperties
Document properties categorized as content properties.
|
static ScrubOption |
CustomProperties
Document properties categorized as custom properties.
|
static ScrubOption |
CustomXML
Any custom XML data.
|
static ScrubOption |
DatabaseQueries
Database connection and query information.
|
static ObjectOption |
DebugInfoCollector
Oracle internal option.
|
static SecureOptions.DecryptionStatusOption |
DecryptionStatus
Provides information on if and how decryption took place.
|
static ScrubOption |
DefaultScrubBehavior
The default scrub behavior.
|
static ScrubOption |
DocumentVariables
Programmatic variables that can be stored in PowerPoint documents.
|
static HandlerOption |
ElementHandler
Element handler that receives the text and elements.
|
static ScrubOption |
EmbeddedObjects
Data from other applications embedded in the document.
|
static StringOption |
EmbeddingExportBaseFileName
Base part of the file name for exported embeddings.
|
static DirectoryOption |
EmbeddingExportDirectory
Directory to receive exported embeddings.
|
static FileFormatListOption |
EmbeddingExportList
List of file types that will be exported.
|
static IntegerOption |
EmbeddingRecurseDepth
Maximum depth to which embeddings should be recursed.
|
static FileFormatListOption |
EmbeddingRecurseList
List of file types that will be recursively processed.
|
static AnalyzeOption |
Encryption
The document is encrypted.
|
static AnalyzeOption |
ExcelDataModel
Indicates the Excel workbook contains a relational data source and corresponding connection information to other data sources.
|
static BooleanOption |
ExcludeProcessingInfoElement
Do not include the processinginfo element in XML output.
|
static FileOption |
ExportDocument
Document that will contain exported data.
|
static IntegerOption |
ExportMaximumReplacementSize
The maximum number of bytes that may be provided to overwrite the exported document.
|
static FileFormatListOption |
ExportPossibleReplacementFormats
List of formats that may replace the exported document.
|
static BooleanOption |
ExportReplace
The exported document should be replaced.
|
static FileOption |
ExportReplacementDocument
File to replace the exported document with.
|
static FileFormatOption |
ExportReplacementFormat
File format of the ExportReplacementDocument.
|
static IntegerOption |
ExtremeCellHorizontalGapAllowance
Number of columns allowed between cells that are treated as a contiguous range when determining extreme ranges.
|
static AnalyzeOption |
ExtremeCells
Indicates the document contains one or more ranges of spreadsheet cells that are located an extreme distance from other cell ranges.
|
static IntegerOption |
ExtremeCellVerticalGapAllowance
Number of rows allowed between cells that are treated as a contiguous range when determining extreme ranges.
|
static AnalyzeOption |
ExtremeIndenting
Certain indenting, margin and other settings result in text that does not display or print.
|
static AnalyzeOption |
ExtremeObjects
Indicates the document contains one or more objects that are positioned an extreme distance outside the standard viewing area.
|
static ScrubOption |
FastSaveData
Text or other data that was 'deleted' but still exists in the file.
|
static BooleanOption |
FilterHyphensAtEndOfLine
Detect and remove soft and hard hyphens found at the end of a line.
|
static BooleanOption |
FilterOverprintedText
Detect and remove duplicate, overprinted text from extracted output.
|
static BooleanOption |
GenerateAcrobatHighlightPositions
Generate the character highlight positions associated with the start of each word when extracting from PDF documents.
|
static BooleanOption |
GenerateGraphicDataFingerprint
Generate a fingerprint element for each embedded graphic in the document.
|
static BooleanOption |
GenerateSlideAppearanceFingerprint
Generate a fingerprint element for each slide based on the text, images, colors, shape positions, and applied master.
|
static BooleanOption |
GenerateSlideContentFingerprint
Generate a fingerprint element based on the text and image content found for each slide.
|
static ScrubOption |
GPSData
GPS location information.
|
static ScrubOption |
HeadersFooters
Headers and footers.
|
static SecureOptions.HeadersFootersBehaviorOption |
HeadersFootersBehavior
Headers and footers behavior list.
|
static StringListOption |
HeadersFootersReplace
Headers and footers replace list.
|
static StringListOption |
HeadersFootersSearch
Headers and footers search list.
|
static AnalyzeOption |
HiddenCells
Hidden spreadsheet columns, rows, or worksheets.
|
static ScrubOption |
HiddenSlides
Slides that have been hidden from presentation and printing.
|
static ScrubOption |
HiddenText
Text that has been hidden by the author.
|
static ScrubOption |
HybridExcel9597BookStream
A redundant storage of Excel workbooks created for backwards compatibility with Excel 95.
|
static BooleanOption |
IncludeLocators
Include locator elements in output.
|
static AnalyzeOption |
InvalidXML
Found XML elements that are invalid against the schema.
|
static BooleanOption |
JustAnalyze
Ignore all action settings and just analyze.
|
static BooleanOption |
JustAssemble
Assemble the source PowerPoint file list into a single PowerPoint document, merging all slides.
|
static BooleanOption |
JustDisassemble
Disassemble the source PowerPoint document into individual PowerPoint documents containing one slide each.
|
static BooleanOption |
JustIdentify
Ignore all other settings and just identify the file format of the source document.
|
static ScrubOption |
LinkedObjects
Links to files from other applications.
|
static ObjectListOption |
LocatorActions
List of locator-based actions to perform on the document while scrubbing.
|
static BooleanOption |
LoggedError
An error occurred and was logged while processing the document.
|
static BooleanOption |
LoggedWarning
A warning occurred and was logged while processing the document.
|
static ObjectOption |
Logger
Logger which should receive logging messages.
|
static ScrubOption |
MacrosAndCode
Macros and other executable code.
|
static ScrubOption |
MeetingMinutes
Meeting minutes entered using the PowerPoint Meeting Minder feature.
|
static ScrubOption |
OfficeGUIDProperty
A document property that provides a globally unique identifier (GUID) of the document and originating computer.
|
static AnalyzeOption |
OfficeXMLAlternateContentParts
This document contains parts that represent some level of disclosure risk if not scrubbed or further analyzed.
|
static BooleanOption |
OfficeXMLCanonicalization
Enable the process that canonicalizes Office XML. Note: ScrubOption OfficeXMLFeatures must be set to canonicalize the file.
|
static BooleanOption |
OfficeXMLFeatures
Enable the features that inspect and sanitize Office XML vulnerabilities.
|
static BooleanOption |
OfficeXMLPartValidation
Enable the process that validates all Office parts found in Office Open XML formats.
|
static BooleanOption |
OfficeXMLRenameNamespacePrefix
Rename namespace prefixes in all XML inside an MS Office file. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.
|
static AnalyzeOption |
OfficeXMLRogueParts
This document contains parts that are not referenced or required by the document and that represent a significant unintentional disclosure risk if not scrubbed or further analyzed.
|
static AnalyzeOption |
OfficeXMLUnanalyzedParts
This document contains parts that are understood but not analyzed by the Clean Content analysis process.
|
static AnalyzeOption |
OfficeXMLUnexpectedParts
This document contains parts that are not processed by the Clean Content analysis process.
|
static ScrubOption |
OutlookProperties
Document properties added to Office document email attachments by Microsoft Outlook.
|
static SecureOptions.OutputTypeOption |
OutputType
Controls how the extracted data is returned to the developer.
|
static AnalyzeOption |
OverlappedObjects
Indicates the document contains one or more objects that have been overlapped by another object.
|
static ScrubOption |
OverlappedText
Some characters are hidden because they have been overlapped by a rectangular shape or image.
|
static StringListOption |
PasswordList
This option contains a list of passwords to be verified against password protected documents.
|
static ScrubOption |
PDF3DArtworkAnnotations
.
|
static ScrubOption |
PDFActions
PDF supports a set of interactive features called actions that range from jumping to a particular destination in the document to submitting the data of an interactive form to a server.
|
static ScrubOption |
PDFAlternateImages
Alternate versions of an image that may be used by readers.
|
static ScrubOption |
PDFAlternatePresentations
Alternate Presentations can be used to view a PDF document in an alternative way more consistent with a presentation rendition.
|
static ScrubOption |
PDFAnnotations
PDF supports a set of interactive features called annotations that allow numerous types of content to be associated with a page location or provide user interaction.
|
static ScrubOption |
PDFDeprecatedPostscriptObjects
PostScript objects embedded inside PDF documents.
|
static AnalyzeOption |
PDFDigitalSignatures
Digital signatures are used to authenticate the identity of the author and the contents of the document.
|
static ScrubOption |
PDFEmbeddedSearchIndex
Indicates that the document contains an embedded search index provided to make text searches faster within Adobe Acrobat.
|
static ScrubOption |
PDFFileAttachmentAnnotations
.
|
static ScrubOption |
PDFGoTo3DViewActions
The GoTo3D View action controls the view of a 3D annotation.
|
static ScrubOption |
PDFGoToActions
The GoTo action causes the Viewer software to change the current view of the document to a specific location within the document.
|
static ScrubOption |
PDFGoToEActions
The GoToE (Go to embedded file) action causes the Viewer software to change the current view to a specific location in another PDF file that is embedded in this or another PDF file.
|
static ScrubOption |
PDFGoToRActions
The GoToR (Go to remote location) action causes the Viewer software to change the current view to a specific location in another PDF file.
|
static ScrubOption |
PDFGraphicalMarkupAnnotations
.
|
static ScrubOption |
PDFHideActions
The Hide action causes the Viewer software to change the visibility of annotations and form fields.
|
static ScrubOption |
PDFImportDataActions
The Import Data action imports Forms Data Format (FDF), XFDF, or XML into the interactive form fields of the PDF document.
|
static ScrubOption |
PDFJavaScriptActions
The JavaScript action causes JavaScript code to be executed by the JavaScript interpreter supported by the PDF Viewer.
|
static ScrubOption |
PDFLaunchActions
The Launch action launches an application or opens or prints a document.
|
static ScrubOption |
PDFLegalAttestation
Information that specifies the existence of content that may result in unexpected rendering of a document.
|
static ScrubOption |
PDFLineMarkupAnnotations
.
|
static ScrubOption |
PDFLinkAnnotations
.
|
static IntegerOption |
PDFMinimumImageDimensionRequiredToProcess
The minimum pixel width and height required to process an image inside a PDF.
|
static ScrubOption |
PDFMovieActions
The Movie action causes the Viewer software to play a movie object that is stored as an external file.
|
static ScrubOption |
PDFMovieAnnotations
.
|
static ScrubOption |
PDFNamedActions
The Named action causes the Viewer software to change the current view of the document to a specific named location in the current document.
|
static ScrubOption |
PDFOtherPrivateApplicationData
Indicates that the document contains private application data other than an embedded search index.
|
static ScrubOption |
PDFPrintersMarkAnnotations
.
|
static ScrubOption |
PDFPrivateApplicationData
Private data stored in PDF documents by applications using the PDF Page-Piece dictionary construct.
|
static ScrubOption |
PDFProjectionAnnotations
.
|
static ScrubOption |
PDFRedactionAnnotations
.
|
static ScrubOption |
PDFRenditionActions
The Rendition action controls the playback of multimedia content.
|
static ScrubOption |
PDFResetFormActions
The Reset Form action resets a selected set of interactive form fields.
|
static ScrubOption |
PDFRichMediaActions
The Rich Media action identifies a rich media annotation and specifies a command to be sent to that annotation handler.
|
static ScrubOption |
PDFRichMediaAnnotations
.
|
static ScrubOption |
PDFScreenAnnotations
.
|
static ScrubOption |
PDFSetOCGStateActions
The Set OCG State action sets the state of one or more optional content groups.
|
static ScrubOption |
PDFSoundActions
The Sound action causes the Viewer software to play a sound object.
|
static ScrubOption |
PDFSoundAnnotations
.
|
static ScrubOption |
PDFSubmitFormActions
The Submit Form action transmits the names and values of selected form fields to a specified URL.
|
static ScrubOption |
PDFTextAndFreeTextAnnotations
.
|
static ScrubOption |
PDFTextMarkupAnnotations
.
|
static ScrubOption |
PDFThreadActions
The Thread action causes the Viewer software to change the current view of the document to a specific location in an article thread within the document.
|
static ScrubOption |
PDFThumbnailImages
Thumbnail images are small images that provide a representation of either a PDF page or an externally referenced file.
|
static ScrubOption |
PDFTransitionActions
The Transition action is used in a sequence of actions to define transition appearances during the sequence.
|
static ScrubOption |
PDFTrapNetworkAnnotations
.
|
static ScrubOption |
PDFUnknownActions
Any action that is not in the list of supported actions is treated as an Unknown action.
|
static ScrubOption |
PDFURIActions
The URI action causes the Viewer software to resolve and open a resource described by a Uniform Resource Identifier.
|
static ScrubOption |
PDFWatermarkAnnotations
.
|
static ScrubOption |
PDFWebCaptureInformation
Data stored in PDF documents used to import content from external Web pages.
|
static ScrubOption |
PresentationNotes
Notes associated with a slide presentation.
|
static ScrubOption |
PrinterInformation
Printer information in the document.
|
static AnalyzeOption |
PrinterInformationContainsShares
Printer information that includes network share names.
|
static SecureOptions.ProcessingStatusOption |
ProcessingStatus
Describes why the document could not be processed.
|
static BooleanOption |
PropertiesOnly
Extract only properties from the document.
|
static IntegerOption |
RequestTimeout
Amount of time in milliseconds a request can execute before being timed out.
|
static FileOption |
ResultDocument
Document that will contain the extracted data.
|
static FileOption |
ResultTransform
The XSLT document with which to process the result XML.
|
static ScrubOption |
RoutingSlip
Email routing information.
|
static ScrubOption |
Scenarios
Scenarios are an Excel feature that allow for multiple data models.
|
static FileOption |
ScrubbedDocument
The scrubbed document.
|
static FileFormatOption |
ScrubbedFormat
The new file format for the scrubbed document.
|
static ScrubOption |
SensitiveContentLinks
Sensitive paths or URIs to external content that is to be included in this file.
|
static ScrubOption |
SensitiveHyperlinks
Hyperlinks containing either fully qualified local paths or network share names.
|
static ScrubOption |
SensitiveIncludeFields
INCLUDETEXT and INCLUDEPICTURE fields containing either fully qualified local paths or network share names.
|
static StringListOption |
SensitiveLinksRegex
List of regular expressions against which hyperlinks and content links should be tested to determine their sensitivity.
|
static BooleanOption |
SimulatePowerPointAnimationsDuringAssembly
Simulate PowerPoint animations during assembly.
|
static ScrubOption |
SizeObfuscatedText
Some characters' sizes are outside a certain normal range.
|
static IntegerOption |
SizeObfuscatedTextMaximum
Maximum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.
|
static IntegerOption |
SizeObfuscatedTextMinimum
Minimum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.
|
static ScrubOption |
SmartTags
Tags applied to text that matches a defined pattern, allowing specific actions to be executed based on the category of the smart tag.
|
static FileOption |
SourceDocument
The document to process.
|
static FileFormatOption |
SourceFormat
The file format of the source document.
|
static IntegerOption |
StartingPageNumber
The page number used when modifying a document's starting page number.
|
static ScrubOption |
StatisticProperties
Document properties categorized as statistics properties.
|
static ScrubOption |
StructuredDocumentTags
Word's Structured Document Tags.
|
static ScrubOption |
SummaryProperties
Document properties categorized as summary properties.
|
static ScrubOption |
TemplateName
If a template other than Normal.dot is used, the document will contain a full path to the template file.
|
static BooleanOption |
TimeoutUsingThreadStop
If set to 'true', requests in tight infinite loops will be stopped using the deprecated Thread.stop method.
|
static SecureOptions.ToTextEncodingOption |
ToTextEncoding
Controls the encoding when extracted data is returned as text.
|
static ScrubOption |
TrackedChanges
Tracked changes in the document.
|
static BooleanOption |
TransformResult
Perform an XML transform on the result document.
|
static BooleanOption |
UnhideHiddenCells
Unhide hidden spreadsheet cells.
|
static ScrubOption |
UninitializedDocfileData
Uninitialized data segments found in the Docfile format leveraged by Office 2003 and below and many other formats.
|
static AnalyzeOption |
UnknownXML
Found XML elements in unknown namespaces.
|
static ScrubOption |
UserNames
The names of users associated with the document.
|
static BooleanOption |
ValidateEmbeddedContent
Enable the process that validates all embedded content found in Office Open XML formats.
|
static ScrubOption |
Versions
Version information in Word documents.
|
static BooleanOption |
WasException
An exception occured while processing the document.
|
static BooleanOption |
WasIdentified
The source document was identified.
|
static BooleanOption |
WasProcessed
The source document was scrubbed, analyzed or extracted.
|
static BooleanOption |
WasSupported
The source document's file format is supported.
|
static BooleanOption |
WasTimeout
Document took longer than the request's RequestTimeout value to process.
|
static ScrubOption |
WeakProtections
Weak or easily breakable protections and passwords.
|
static ScrubOption |
XMLBoundedSpaces
Bounded whitespace can be used to indent text. Note: ScrubOption OfficeXMLFeatures must be set to scrub bounded spaces.
|
static ScrubOption |
XMLCDATA
XML CDATA refers to character data. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML CDATA.
|
static ScrubOption |
XMLComment
XML comments are used to provide semantic information to the human reader. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML comments.
|
static ScrubOption |
XMLExternalEntity
XML external entities are references to external files. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML external entities.
|
static ScrubOption |
XMLPI
XML processing instructions can be used to pass information to applications. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML processing instructions.
|
static ScrubOption |
XMLRenameNamespacePrefix
XML namespace prefixes are used to avoid name conflicts in XML. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.
|
static ScrubOption |
XMLUnknownNamespace
An XML namespace in the document that is not part of the whitelisted namespace list. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML unknown namespaces.
|
static ScrubOption |
XMLUnusedNamespaces
XML namespaces are used to avoid name conflicts in XML. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML unused namespaces.
|
static ScrubOption |
XMPMetadataStreams
XMP metadata streams are leveraged to store metadata properties using the Extensible Metadata Platform standard.
|
Modifier and Type | Method and Description |
---|---|
static Option |
deepMapOptionId(int id) |
OptionContainer[] |
getAllContainers()
Get all the option containers in this container
|
Option[] |
getAllOptions()
Gets all the options in this container
|
static SecureOptions |
getInstance()
Gets the one and only instance of this class.
|
Option |
mapOptionId(int id)
Maps an integer id of an option in this list to the option itself.
|
Option |
mapOptionId(java.lang.String id)
Maps a string id of an option in this list to the option itself.
|
getId, writeXML
public static final FileOption SourceDocument
This option gives the developer a number of ways to provide the document to analyze, scrub or extract.
public static final FileFormatOption SourceFormat
This result provides the file format of the source document.
public static final BooleanOption WasIdentified
The source document was identified.
Default value is
false
public static final BooleanOption WasSupported
The source document's file format is supported and processing was attempted.
Default value is
false
public static final BooleanOption WasProcessed
The source document was scrubbed, analyzed or extracted. Will be set to false if no component could be found to process the source document.
Default value is
false
public static final BooleanOption WasException
An exception occurred while processing the document. This is somewhat redundant, since the developer will receive the exception itself, but it is included so the SecureResult can stand alone to completely describe the result of processing a document.
Default value is
false
public static final BooleanOption WasTimeout
The document took longer than the request's RequestTimeout value to process or was interrupted.
Default value is
false
public static final SecureOptions.ProcessingStatusOption ProcessingStatus
An enumeration of the possible reasons the document could not be processed.
Only the following values are allowed:
ProcessingStatusOption.Processed
The document was processed successfully
ProcessingStatusOption.NotIdentified
The document could not be identified
ProcessingStatusOption.NotSupported
The document's file format is not supported
ProcessingStatusOption.CausedException
Processing the document caused an exception
ProcessingStatusOption.Timeout
Processing the document timed out before it could complete
Default value is
PROCESSINGSTATUS_Processed
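As a rough illustration of how the boolean results above relate to this enumeration, the sketch below derives a status from a set of flags. The `Status` enum and `derive` method are hypothetical stand-ins, not part of the Clean Content API, and the precedence order among the flags is an assumption made for the example.

```java
// Hypothetical stand-in types for illustration; not the Clean Content API.
class ProcessingStatusSketch {
    enum Status { PROCESSED, NOT_IDENTIFIED, NOT_SUPPORTED, CAUSED_EXCEPTION, TIMEOUT }

    // Assumed precedence: identification, then format support, then timeout, then exception.
    static Status derive(boolean wasIdentified, boolean wasSupported,
                         boolean wasTimeout, boolean wasException) {
        if (!wasIdentified) return Status.NOT_IDENTIFIED;
        if (!wasSupported) return Status.NOT_SUPPORTED;
        if (wasTimeout) return Status.TIMEOUT;
        if (wasException) return Status.CAUSED_EXCEPTION;
        return Status.PROCESSED;
    }

    public static void main(String[] args) {
        System.out.println(derive(true, true, false, false));
        System.out.println(derive(true, false, false, false));
    }
}
```

A caller could switch on such a value to decide whether to retry, report, or skip a document.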
public static final SecureOptions.DecryptionStatusOption DecryptionStatus
An enumeration of the possible outcomes of decryption.
Only the following values are allowed:
DecryptionStatusOption.NotEncrypted
The document contained no encryption
DecryptionStatusOption.DecryptedWithDefaultPassword
Parts of the document were encrypted and were successfully decrypted with the default password
DecryptionStatusOption.DecryptedWithPasswordList
Parts of the document were encrypted and were successfully decrypted with a password from the list provided
DecryptionStatusOption.DecryptionFailed
Parts of the document were encrypted but could not be decrypted with any of the default or provided passwords
DecryptionStatusOption.DecryptionNotSupported
Parts of the document were encrypted using an unsupported encryption method
Default value is
DECRYPTIONSTATUS_NotEncrypted
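The outcomes listed above suggest an order of attempts: a default password first, then each entry of the provided password list. The sketch below models that order only; the `Status` enum, the `decrypt` method, and the `opens` predicate are hypothetical names invented for this example, and the actual order used by Clean Content is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the decryption outcomes; not the Clean Content API.
class DecryptionStatusSketch {
    enum Status {
        NOT_ENCRYPTED, DECRYPTED_WITH_DEFAULT_PASSWORD, DECRYPTED_WITH_PASSWORD_LIST,
        DECRYPTION_FAILED, DECRYPTION_NOT_SUPPORTED
    }

    // 'opens' stands in for an attempt to decrypt the document with one password.
    static Status decrypt(boolean encrypted, boolean methodSupported, String defaultPassword,
                          List<String> passwordList, Predicate<String> opens) {
        if (!encrypted) return Status.NOT_ENCRYPTED;
        if (!methodSupported) return Status.DECRYPTION_NOT_SUPPORTED;
        if (opens.test(defaultPassword)) return Status.DECRYPTED_WITH_DEFAULT_PASSWORD;
        for (String candidate : passwordList) {
            if (opens.test(candidate)) return Status.DECRYPTED_WITH_PASSWORD_LIST;
        }
        return Status.DECRYPTION_FAILED;
    }

    public static void main(String[] args) {
        List<String> passwords = Arrays.asList("alpha", "secret");
        // The document in this example opens only with "secret".
        System.out.println(decrypt(true, true, "default", passwords, "secret"::equals));
    }
}
```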
public static final BooleanOption LoggedError
An error occurred and was logged while processing the document. Errors include exceptions that end processing (WasException will also be true) and other conditions that don't cause exceptions but may lead to a major loss of functionality. See the log for details.
Default value is
false
public static final BooleanOption LoggedWarning
A warning occurred and was logged while processing the document. Warnings include conditions that may lead to small losses of functionality. See the log for details.
Default value is
false
public static final IntegerOption RequestTimeout
The amount of time in milliseconds a request can execute before being timed out. Timeouts are useful for the extremely rare cases where malformed documents cause infinite loops within the Clean Content code. While it is tempting to set this number low, since most documents process in much less than 100 ms, very large or complex documents can take a significant amount of time to process, hence the 2 minute default for this option. A value of zero may be used to disable the timeout for the request, but this is not recommended.
Default value is
120000
public static final BooleanOption TimeoutUsingThreadStop
When a malformed document pushes Clean Content into an infinite loop, a monitoring thread attempts to interrupt the request thread after the timeout period given by the RequestTimeout option. One of two things will then occur: 1) if the request is in a loop that can be interrupted, the request will be stopped and the SecureRequest execute method will return; 2) if, in the very rare case, the request is in a tight loop and this option is set to 'true', the monitoring thread will use the deprecated Thread.stop method to kill the thread. Anyone setting this option to 'true' must understand the implications of having the Java thread running the request destroyed. See the Java API documentation for java.lang.Thread for details.
Default value is
false
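The interruptible case described above can be illustrated with standard `java.util.concurrent` classes. This is a minimal sketch of the general timeout-and-interrupt pattern, not Clean Content's implementation; `runWithTimeout` and `INTERRUPTIBLE_LOOP` are hypothetical names, and treating a timeout of zero as "wait indefinitely" mirrors the RequestTimeout description only by assumption. It deliberately does not use `Thread.stop`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Generic timeout-and-interrupt pattern; hypothetical names, not the Clean Content API.
class RequestTimeoutSketch {
    // A simulated runaway request that still responds to interruption.
    static final Runnable INTERRUPTIBLE_LOOP = () -> {
        try {
            while (true) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit the loop
        }
    };

    // Returns true if the task finished, false if it timed out and was interrupted.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            if (timeoutMillis == 0) {
                future.get(); // zero: no timeout, wait until the task completes
            } else {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            }
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> { }, 1000));
        System.out.println(runWithTimeout(INTERRUPTIBLE_LOOP, 100));
    }
}
```

A loop that never blocks or checks its interrupt flag would not respond to `cancel(true)`, which is the tight-loop case the option addresses.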
public static final BooleanOption JustIdentify
When this option is true the only action that will be taken is to identify the file format of the source document.
Default value is
false
public static final ObjectListOption CheckboxActions
A List of actions to perform on named checkboxes in the document while scrubbing.
public static final ObjectOption DebugInfoCollector
Oracle internal option
public static final HandlerOption ElementHandler
This option allows the developer to provide an object that implements the ElementHandler interface. This object will receive the text and elements during the execute method in ExtractRequest. This option is only valid if the OutputType option is set to OUTPUTTYPE_TOHANDLER.
public static final FileOption ResultDocument
This option gives the developer a number of ways to provide the file that will receive the plain text or XML rendition of the extracted text and elements. This option is only valid if the OutputType option is set to OUTPUTTYPE_TOXML or OUTPUTTYPE_TOTEXT.
public static final BooleanOption TransformResult
If set to true, the contents of the XML result will be processed with XSLT using the document specified in the ResultTransform option before being written. This option is valid only when OutputType is set to TOXML.
Default value is
false
public static final FileOption ResultTransform
The XSLT document with which to process the report XML. This option is valid only when OutputType is set to TOXML.
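To show what XSLT processing of an XML result involves, the sketch below applies a stylesheet with the standard `javax.xml.transform` API. It is independent of Clean Content: the `transform` helper, the sample XML, and the sample stylesheet are made up for illustration and do not reflect the real result schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Standalone XSLT example; the XML and stylesheet below are invented samples.
class ResultTransformSketch {
    static final String SAMPLE_XML = "<result><text>hello</text></result>";
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"/result/text\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the transformed output.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```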
public static final BooleanOption PropertiesOnly
Extract only properties from the document while skipping the body text and structure.
Default value is
false
public static final FileFormatListOption EmbeddingRecurseList
This option defines a list of file types that when found as embeddings (embedded images, OLE embeddings, etc.) should be recursively processed. The embeddings will be processed using the same options as the main document.
public static final IntegerOption EmbeddingRecurseDepth
This option sets a limit as to how 'deep' embedding recursion will go. Setting this value to 0 will disable embedding recursion even for file formats defined in the EmbeddingRecurseList. Setting this value to 1 will allow one level of recursion and so on.
Default value is
0
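The interaction between EmbeddingRecurseList and EmbeddingRecurseDepth can be pictured as a depth-limited walk over embedded documents. The sketch below is a toy model under stated assumptions: the `Doc` class, the string format labels, and the `visit` method are hypothetical and do not correspond to Clean Content types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of depth-limited embedding recursion; not the Clean Content API.
class EmbeddingRecurseSketch {
    // Hypothetical document: a name, a format label, and its embeddings.
    static final class Doc {
        final String name;
        final String format;
        final List<Doc> embeddings = new ArrayList<>();

        Doc(String name, String format) {
            this.name = name;
            this.format = format;
        }

        Doc embed(Doc child) {
            embeddings.add(child);
            return this;
        }
    }

    // Returns the names of the documents processed, starting with the root.
    static List<String> visit(Doc root, List<String> recurseFormats, int recurseDepth) {
        List<String> visited = new ArrayList<>();
        process(root, recurseFormats, recurseDepth, visited);
        return visited;
    }

    private static void process(Doc doc, List<String> recurseFormats,
                                int remainingDepth, List<String> visited) {
        visited.add(doc.name);
        if (remainingDepth <= 0) {
            return; // depth 0 disables recursion even for listed formats
        }
        for (Doc child : doc.embeddings) {
            if (recurseFormats.contains(child.format)) {
                process(child, recurseFormats, remainingDepth - 1, visited);
            }
        }
    }

    public static void main(String[] args) {
        Doc root = new Doc("root", "docx")
                .embed(new Doc("sheet", "xlsx").embed(new Doc("image", "png")));
        List<String> formats = Arrays.asList("xlsx", "png");
        System.out.println(visit(root, formats, 0));
        System.out.println(visit(root, formats, 1));
        System.out.println(visit(root, formats, 2));
    }
}
```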
public static final FileFormatListOption EmbeddingExportList
This option defines a list of file types that when found as embeddings (embedded images, OLE embeddings, etc.) will be exported as stand alone files.
public static final DirectoryOption EmbeddingExportDirectory
This option defines the directory where exported embedding (embedded images, OLE embeddings, etc.) files should be placed. File naming is format specific and cannot be modified at this time. This value defaults to the process's current directory.
public static final StringOption EmbeddingExportBaseFileName
This option defines the beginning of the file name used when exporting embeddings (embedded images, OLE embeddings, etc.) to EmbeddingExportDirectory. The rest of the file name and the file's extension is format specific.
Default value is
null
public static final FileOption ExportDocument
This option should only be used in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
This option defines where exported data such as embeddings and fast save data should be placed. Within the element handler methods startEmbeddedObject and startFastSaveData this option will be set on the exportOptions field to describe where that particular data will be saved and to allow the developer to override that location.
public static final BooleanOption ExportReplace
This option should only be used in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
When this option is set to true the ExportReplacementDocument and ExportReplacementFormat options are used to replace the exported document. Commonly used to replace embeddings.
Default value is
false
public static final IntegerOption ExportMaximumReplacementSize
This option should only be used in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
This option defines the maximum number of bytes, provided through the ExportReplacementDocument option, that may be provided to overwrite the exported document. If ExportReplacementDocument is larger than this then an exception will be thrown. Note that this value will not necessarily be the same as the size of the exported document due to compression and other factors. If this option is zero (0) then the replacement image may be of any size.
Default value is
0
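The size rule described above can be sketched as a simple guard. This is an illustration only: the method name `checkReplacementSize` is hypothetical, and `IllegalArgumentException` is an assumed stand-in because the description does not name the exception type that is actually thrown.

```java
// Hypothetical illustration of the documented size rule; not Clean Content code.
final class ReplacementSizeSketch {
    // A maximum of 0 means "no limit"; otherwise a larger replacement is rejected.
    static void checkReplacementSize(long replacementBytes, int maximumBytes) {
        if (maximumBytes != 0 && replacementBytes > maximumBytes) {
            // Assumed exception type; the real exception class is not documented here.
            throw new IllegalArgumentException(
                "Replacement is " + replacementBytes + " bytes but the maximum is " + maximumBytes);
        }
    }

    public static void main(String[] args) {
        checkReplacementSize(5_000, 0);      // accepted: 0 disables the limit
        checkReplacementSize(4_096, 4_096);  // accepted: exactly at the limit
        try {
            checkReplacementSize(4_097, 4_096);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```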
public static final FileFormatListOption ExportPossibleReplacementFormats
This option should be used only in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
This option provides a list of file formats that may be provided through the ExportReplacementDocument option. Providing a file not in one of these formats will cause unexpected results.
public static final FileFormatOption ExportReplacementFormat
This option should be used only in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
This option defines the file format of the file which should overwrite the exported document. ExportReplacementDocument must be in this format.
public static final FileOption ExportReplacementDocument
This option should be used only in the startEmbeddedContent and processEmbeddedContent methods of an ElementHandler.
This option defines the file which should overwrite the exported document. This file must be ExportMaximumReplacementSize bytes or less and ExportReplacementFormat must describe its format.
public static final BooleanOption GenerateAcrobatHighlightPositions
This option enables the extraction of character highlight positions at the start of each word when extracting from PDF documents. This information can be used to create an Adobe highlight file for the purpose of highlighting select text when viewing the original PDF document in Acrobat. The details of the Adobe Highlight file format can be found in the Adobe technical note titled HighlightFileFormat.pdf, available from Adobe. The highlight positions are provided by the PdfHL element in the extracted XML. These positions are character positions as defined in the Adobe technical note. Note that the Adobe character counting algorithm does not necessarily increment by 1 for each subsequent character. However, Acrobat highlights on full word boundaries even when a partial range is provided. For this reason it is reasonable to highlight select words by providing the position and a length equal to 1 or the number of characters to highlight.
Default value is
false
public static final BooleanOption FilterHyphensAtEndOfLine
This option enables the detection and removal of soft and hard hyphens found at the end of a line during the extraction process. It is not uncommon for applications that generate PDF to use either a soft or hard hyphen to hyphenate a word when wrapping from one line to another. This feature is dependent on Clean Content's ability to infer line boundaries since they are not stored within a PDF document. Lines are inferred by monitoring position changes during text operations. It would be ideal to only remove soft hyphens during this process but unfortunately many applications use hard hyphens for hyphenation when generating the PDF document. Use of this feature can result in the removal of legitimate hard hyphens from the extracted output. This option defaults to 'off' for this reason. This feature primarily benefits applications when searching the text output without the use of intelligent hyphen monitoring.
Default value is
false
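The trade-off described above, where removing end-of-line hyphens also removes legitimate hard hyphens, can be seen in a small sketch. The class and method names are hypothetical and the input is assumed to be a list of already inferred lines; this is not how Clean Content infers lines or filters hyphens internally.

```java
import java.util.List;

// Hypothetical illustration of end-of-line hyphen filtering; not Clean Content code.
final class HyphenFilterSketch {
    private static final char SOFT_HYPHEN = '\u00AD';
    private static final char HARD_HYPHEN = '-';

    // Joins lines with a space, except that a trailing soft or hard hyphen is
    // dropped and the next line is appended directly to the broken word.
    static String joinLines(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean last = (i == lines.size() - 1);
            if (!last && !line.isEmpty()) {
                char end = line.charAt(line.length() - 1);
                if (end == SOFT_HYPHEN || end == HARD_HYPHEN) {
                    out.append(line, 0, line.length() - 1);
                    continue;
                }
            }
            out.append(line);
            if (!last) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinLines(List.of("The infor-", "mation is key")));
        // A legitimate hard hyphen at a line end is lost as well.
        System.out.println(joinLines(List.of("a well-", "known fact")));
    }
}
```

The second call shows why the option is off by default: "well-known" becomes "wellknown" because the filter cannot tell a hyphenation break from a real hyphen.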
public static final BooleanOption FilterOverprintedText
This option enables the detection and removal of duplicate, overprinted text from the extracted output. It is not uncommon to see PDF documents with duplicate characters drawn very nearly on top of themselves for the purpose of supporting certain types of character attributes that may include bolded, embossed, shadowed, or 3D text characteristics. Unfortunately, the overprinting may occur at character, intra-word, word, or line boundaries. This can have the unfortunate side effect of breaking valid words into a stream of unintelligible characters which in turn has adverse consequences on text searching. This feature addresses this problem by monitoring the position of every drawn piece of text within a line for overprinting situations. Most common use cases are covered though there are valid cases that are not detected when spaces are a part of one text operation but not another, causing the match algorithm to fail. Additionally, this feature is disabled on any text that is drawn using a font that lacks valid character width metrics.
Default value is
false
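The idea of position-based duplicate detection can be sketched as follows. Everything here is an assumption made for illustration: the `Run` type, the exact-text comparison, and the single `tolerance` parameter are hypothetical and do not describe the actual match algorithm used by Clean Content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overprint filtering; not Clean Content code.
final class OverprintFilterSketch {
    // One text drawing operation with the position where drawing starts.
    static final class Run {
        final String text;
        final double x;
        final double y;

        Run(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Keeps a run unless an identical run was already kept at nearly the same position.
    static List<Run> filter(List<Run> runs, double tolerance) {
        List<Run> kept = new ArrayList<>();
        for (Run candidate : runs) {
            boolean duplicate = false;
            for (Run k : kept) {
                if (k.text.equals(candidate.text)
                        && Math.abs(k.x - candidate.x) <= tolerance
                        && Math.abs(k.y - candidate.y) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    // Concatenates the text of the given runs in drawing order.
    static String text(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) {
            sb.append(r.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
            new Run("Total", 100.0, 200.0),
            new Run("Total", 100.4, 199.7), // shadow copy drawn slightly offset
            new Run(" due", 130.0, 200.0));
        System.out.println(text(filter(runs, 0.5)));
    }
}
```

The sketch also reproduces the documented limitation: if one operation draws "Total due" and the overprint draws only "Total", the texts differ, the match fails, and both runs are kept.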
public static final BooleanOption BrokenPDFCorrection
Major PDF readers such as Adobe's tolerate many deviations from the standard PDF specification, whereas the Clean Content parser strictly follows the specification when reading PDF streams. As a result, many readers can open broken PDF documents because they overlay the broken streams to correct the malformed internal structure. This option enables the correction of such broken PDF documents if parsing fails for the given input PDF. Clean Content will try to recover as many PDFs as it can, but some broken streams are too malformed for auto-correction, so there are limitations to PDF correction.
Default value is
false
public static final BooleanOption GenerateSlideContentFingerprint
This option enables the generation of a fingerprint element with a type attribute of 'SlideContent'. This element is generated by Clean Content during analysis of a presentation. The value attribute provides the fingerprint as a 128 bit MD5 hash. The fingerprint for SlideContent is generated based on the text and images found on the slide. This allows the fingerprint to be consistent regardless of modifications due to positions, colors, shapes, masters, and other slide attributes. The SlideAppearance fingerprint is an extension of the SlideContent fingerprint that includes consideration for the applicable slide master, slide background, and the position and select formatting of slide content, including shapes. Numerous presentation features are excluded from the fingerprint calculation in order to improve the consistency of the fingerprint across different versions of PowerPoint. This fingerprint can be leveraged to identify slides across a diverse document set that are substantially similar in content but may vary with respect to formatting.
Default value is
false
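As a rough illustration of a content-based fingerprint, the sketch below feeds slide text and image bytes into a 128 bit MD5 digest and renders it as hexadecimal. The class name, the UTF-8 text encoding, and the order in which inputs are hashed are assumptions made for this example; the exact inputs and normalization Clean Content uses are not specified here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical illustration of a content fingerprint; not the Clean Content algorithm.
final class SlideFingerprintSketch {
    // Hashes only text and image bytes, so position, color, and shape changes do not affect the result.
    static String fingerprint(List<String> texts, List<byte[]> images) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String text : texts) {
                md5.update(text.getBytes(StandardCharsets.UTF_8));
            }
            for (byte[] image : images) {
                md5.update(image);
            }
            StringBuilder hex = new StringBuilder(32);
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> images = List.of(new byte[] {1, 2, 3});
        System.out.println(fingerprint(List.of("Q3 Results", "Revenue up"), images));
    }
}
```

Two slides with the same text and image bytes produce the same 32 character value regardless of how the content is laid out, which is the property the description relies on.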
public static final BooleanOption GenerateSlideAppearanceFingerprint
This option enables the generation of a fingerprint element with a type attribute of 'SlideAppearance'. This element is generated by Clean Content during analysis of a presentation. The value attribute provides the fingerprint as a 128 bit MD5 hash. The SlideAppearance fingerprint is an extension of the SlideContent fingerprint and includes consideration for the slide background and the position and select formatting of slide content, including shapes. Numerous presentation features are excluded from the fingerprint calculation in order to improve the consistency of the fingerprint across different versions of PowerPoint. This fingerprint can be leveraged to identify slides that are substantially similar in both content and appearance.
Default value is
false
public static final BooleanOption GenerateGraphicDataFingerprint
This option enables the generation of a fingerprint element with a type attribute of 'GraphicData'. This element is generated by Clean Content during analysis of embedded objects that are of type 'Graphic'. The value attribute provides the fingerprint as a 128 bit MD5 hash. This fingerprint can be leveraged to identify documents that contain a particular embedded image.
Default value is
false
public static final BooleanOption ExcludeProcessingInfoElement
Do not include the processinginfo element in XML output. This option is for testing only! Removal of the processing info element allows QA processes that produce XML output at different times and with different source documents to easily compare resulting XML.
Default value is
false
public static final BooleanOption IncludeLocators
Include locator elements in output.
Default value is
false
public static final ObjectListOption LocatorActions
A List of locator-based actions to perform on the document while scrubbing.
public static final StringListOption PasswordList
This option contains a list of passwords to be tried against password protected documents.
public static final FileFormatOption ScrubbedFormat
This result is set when the format of the scrubbed document differs from that of the source document. In many cases the extension of the scrubbed document must be changed in order for the document to be successfully opened by its application. This happens in Office 2007 when macros are removed from documents. For example, Microsoft Word 2007 documents with macros (.docm files) must be changed to .docx when macros are removed or Word will not open them. The new extension can be retrieved using the getExtension method on the file format returned by this option.
public static final FileListOption AssembleFileList
This option is used when the JustAssemble option is set to true. The set of files defined by this option will be assembled into a new PowerPoint document.
public static final BooleanOption ChangeStartingPageNumber
When this option is true the StartingPageNumber option is used to modify the page number a document starts at.
Applies to Microsoft Word 2007 and above
Default value is
false
public static final IntegerOption StartingPageNumber
When the option ChangeStartingPageNumber is true this option is used to modify the page number a document starts at.
Applies to Microsoft Word 2007 and above
Default value is
1
public static final IntegerOption PDFMinimumImageDimensionRequiredToProcess
This option allows any image found inside a PDF document to be ignored during extraction unless both the x and y pixel dimensions of the image are greater than or equal to this value. This option is useful to prevent extracting small images commonly used to generate drawing artifacts like table borders, underlines, shading, and patterns.
Applies to Adobe Acrobat (PDF)
Default value is
96
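The threshold rule can be written as a one-line predicate. The class and method names below are hypothetical; only the rule that both dimensions must be greater than or equal to the configured value comes from the description.

```java
// Hypothetical illustration of the minimum image dimension rule; not Clean Content code.
final class ImageSizeFilterSketch {
    // An image is processed only when BOTH pixel dimensions reach the minimum.
    static boolean shouldProcess(int widthPx, int heightPx, int minimumDimension) {
        return widthPx >= minimumDimension && heightPx >= minimumDimension;
    }

    public static void main(String[] args) {
        int minimum = 96; // the documented default
        System.out.println(shouldProcess(640, 480, minimum)); // ordinary picture
        System.out.println(shouldProcess(600, 1, minimum));   // thin strip such as a table border
    }
}
```

Because both dimensions are checked, a long but very thin image such as a rule line is skipped even though one of its dimensions is large.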
public static final BooleanOption JustDisassemble
When this option is true the input document will be disassembled into a set of new documents. At this time this only applies to disassembling a PowerPoint document into multiple PowerPoint documents, each containing one slide. The resulting files will be placed in the embedding export directory by default.
Default value is
false
public static final BooleanOption JustAssemble
When this option is true the AssembleFileList option defines the list of PowerPoint documents to be assembled into a single PowerPoint document. The source document defined by the SourceFile option must contain a PowerPoint document that will be used as the source for document wide defaults. At this time this only applies to assembling a set of PowerPoint documents into a single PowerPoint document. The resulting file will be placed in the embedding export directory by default.
Default value is
false
public static final BooleanOption JustAnalyze
When this option is true all scrub targets with actions of SCRUB will act as if they are set to ANALYZE. This allows for an analysis with no copying of the source document and no chance anything will be scrubbed.
Default value is
false
public static final BooleanOption OfficeXMLPartValidation
The Office Open XML file formats, generated by Office 2007 and above, follow a specification that describes how a collection of related parts define an Office Document. Each part is stored as a unique file in the collection, and parts may reference other parts to define the structure of the document. Many of these parts are deeply inspected during the Clean Content analysis process, however this option activates additional analysis, extraction and scrubbing behavior that covers every part in the document in one way or another. When this option is set to True the following additional behaviors are active. The extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each questionable part using an OfficeXMLPartRisk element that provides further information about the part. There are 4 categories of parts that carry some level of disclosure risk: Rogue, Unexpected, Unanalyzed, and Alternate Content parts. Each of these is documented as a specific analysis target. Those analysis targets must be set to ANALYZE when this option is enabled in order to report that particular risk in the extracted output. Rogue parts will automatically be scrubbed whether this option is enabled or disabled because rogue parts serve no known valid purpose in the document. Unexpected parts will not be scrubbed since doing so might break the document structure. Unanalyzed parts will only be scrubbed if they are removable due to a specific scrub target (i.e. Printer Settings). The Choice portion of Alternate Content is always scrubbed whether this option is enabled or disabled. Alternate Content parts that are referenced within the Choice portion are removed unless they are required in another valid context whether this option is enabled or disabled.
Default value is
false
public static final BooleanOption ValidateEmbeddedContent
This feature is an add-on to OfficeXMLPartValidation. It enables scrubbing of rogue content present inside Office Open XML documents. The Office Open XML file formats, generated by Office 2007 and above, follow a specification that describes how a collection of related parts define an Office Document. Each part is stored as a unique file in the collection, and parts may reference other parts to define the structure of the document. Many of these parts are deeply inspected during the Clean Content analysis process, however this option activates additional analysis, extraction and scrubbing behavior that covers every part in the document in one way or another. When this option is set to True the following additional behaviors are active. The extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each questionable part using an OfficeXMLPartRisk element that provides further information about the part. This content falls under the category of Rogue parts present in the document. Rogue parts will automatically be scrubbed because rogue parts serve no known valid purpose in the document.
Default value is
false
public static final BooleanOption OfficeXMLCanonicalization
Canonical XML is a normal form of XML, intended to allow relatively simple comparison of pairs of XML documents for equivalence; for this purpose, the Canonical XML transformation removes non-meaningful differences between the documents. Canonicalization involves UTF-8 encoding, attribute normalization, handling of special characters, replacement of entity references, and more. Note: the OfficeXMLFeatures option must be enabled to canonicalize the file.
Default value is
false
public static final BooleanOption OfficeXMLFeatures
When this option is enabled, Clean Content processes Office 2007 and above file formats for XML comments, XML external entities, XML CDATA, and XML unknown namespaces. Clean Content reports the existence of these features only when this option is set, and the scrub options for these features also work only when this option is set.
Default value is
false
public static final ScrubOption XMLBoundedSpaces
Bounded whitespace can be used to indent text. Note: the OfficeXMLFeatures option must be enabled to scrub bounded spaces.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final BooleanOption OfficeXMLRenameNamespacePrefix
Namespace prefixes can contain sensitive information. It is therefore recommended to rename namespace prefixes to neutral prefixes. Note: the OfficeXMLFeatures option must be enabled to rename namespace prefixes.
Default value is
false
public static final BooleanOption UnhideHiddenCells
Unhide hidden sheets, rows, and columns found in spreadsheets.
Default value is
false
public static final SecureOptions.OutputTypeOption OutputType
This option controls how the extracted data is returned to the developer.
Only the following values are allowed:
OutputTypeOption.ToHandler
Output to the handler provided in the ElementHandler option
OutputTypeOption.ToText
Output text to the file provided in the ResultDocument option. Text output is in the encoding defined by the ToTextEncoding option, which defaults to Unicode UTF-16. The byte order is the platform's native order, the line separator is the platform's native line separator, and for UTF-16 output the first character is always the Unicode Byte Order Mark (BOM).
OutputTypeOption.ToXML
Output XML to file provided in the ResultDocument option. XML will be in the namespace 'http://www.bitform.net/xml/schema/elements.xsd'.
OutputTypeOption.NoOutput
Do not output any data. This value is used to disable extraction.
Default value is
OUTPUTTYPE_NoOutput
public static final SecureOptions.ToTextEncodingOption ToTextEncoding
This option controls the encoding of extracted data when the OutputType option is set to ToText.
Only the following values are allowed:
ToTextEncodingOption.UTF16
Output text in UTF-16
ToTextEncodingOption.UTF8
Output text in UTF-8
Default value is
TOTEXTENCODING_UTF16
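For applications that need to recognize or reproduce the default ToText byte layout, the following sketch shows what "UTF-16 in native byte order with a leading BOM" looks like in plain Java. The class and method names are hypothetical and the code only illustrates the encoding described above; it does not call Clean Content.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of BOM-prefixed UTF-16 output; not Clean Content code.
final class Utf16OutputSketch {
    // Encodes text as UTF-16 in the given byte order, starting with the Byte Order Mark U+FEFF.
    static byte[] encode(String text, ByteOrder order) {
        Charset charset = (order == ByteOrder.BIG_ENDIAN)
            ? StandardCharsets.UTF_16BE
            : StandardCharsets.UTF_16LE;
        return ("\uFEFF" + text).getBytes(charset);
    }

    // Uses the byte order of the platform the JVM is running on.
    static byte[] encodeNative(String text) {
        return encode(text, ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        byte[] bytes = encodeNative("line one" + System.lineSeparator() + "line two");
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            hex.append(String.format("%02X ", bytes[i]));
        }
        System.out.println("first bytes: " + hex.toString().trim());
    }
}
```

A consumer can inspect the first two bytes (FF FE for little-endian, FE FF for big-endian) to decide how to decode the rest of the file.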
public static final ObjectOption Logger
Logger which should receive logging messages.
public static final FileOption ScrubbedDocument
This option gives the application a number of ways to specify the document that is produced as a result of scrubbing the source document.
public static final ScrubOption AlternativeText
Each graphic image and shape in a document may include an optional piece of text that can be used in place of the image when viewing the document in a constrained environment.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption AppsForOffice
Apps for Office allow for integration of 3rd party applications into the Office applications using web technologies. There are two types of Web extensions; content and taskpane. Web extensions enable 3rd party applications to tightly integrate into Office using web based interfaces like JavaScript, HTML5, CSS3. A Web extension runs inside of a web page frame within Office. The web page is served by some web server and the page has access to the Office document object model allowing rich feature connections between document content and the 3rd party web app. Content extensions contribute to content directly within a frame of the document. Taskpane extensions enable user interactions that enhance the authoring process but don’t directly generate document content (for example a dictionary app).
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLComment
XML comments are used to provide semantic information to the human reader. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML comments.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLPI
XML processing instructions can be used to pass information to applications. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML processing instructions.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLCDATA
CDATA is defined as blocks of text that are not parsed by the parser, but are otherwise recognized as markup. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML CDATA.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLUnknownNamespace
CC stores a list of namespaces that have internal schema definitions. Many namespaces cannot be mapped to the whitelisted namespace list and thus have no schema definition within CC. These namespaces are flagged as unknown namespaces. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML unknown namespaces.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLExternalEntity
CC reports whether external entity references exist in the document, and the user can decide to remove them. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML external entities.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLRenameNamespacePrefix
When using prefixes in XML, a namespace for the prefix must be defined. XML namespace prefixes are used to avoid name conflicts in XML. Note: the OfficeXMLFeatures option must be enabled to rename namespace prefixes.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMLUnusedNamespaces
An XML document can have multiple namespaces defined that are not being used. Note: the OfficeXMLFeatures option must be enabled to extract and scrub XML unused namespaces.
Applies to Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption AudioVideoFilePaths
Microsoft PowerPoint supports linking to audio and video files using the 'Insert > Movies and Sounds > Movie from File' and 'Insert > Movies and Sounds > Sound from File' commands. Use of this feature results in storing a potentially sensitive link to a local or network file path. Note that this type of path can also be removed only when it is considered sensitive by using the Sensitive Content Links target.
Applies to Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption AuthorHistory
Up to the last 10 authors that saved the document are stored in an area of the document that is inaccessible using the Word application. In Word 97 and Word 2000 this information also contains the paths where the document was saved and may include sensitive user logon or network share information.
Applies to Microsoft Word 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final AnalyzeOption AuthorHistoryContainsPaths
The hidden author history contains the last 10 fully qualified path names where the document was saved.
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption AuthorHistoryContainsShares
The hidden author history contains network share names. This information can provide dangerous insight into an organization's internal network.
Default value is
AnalyzeOption.Action.ANALYZE
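To make the distinction between a saved path and a network share name concrete, the sketch below extracts the `\\server\share` prefix from a Windows UNC path. The class name, the regular expression, and the sample paths are illustrative assumptions; this is not the detection logic used by Clean Content.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of spotting a network share in a saved path; not Clean Content code.
final class SharePathSketch {
    // Matches a leading UNC prefix: two backslashes, a server name, one backslash, a share name.
    private static final Pattern UNC_PREFIX =
        Pattern.compile("^\\\\\\\\([^\\\\]+)\\\\([^\\\\]+)");

    // Returns the \\server\share prefix if the path starts with one.
    static Optional<String> shareName(String path) {
        Matcher m = UNC_PREFIX.matcher(path);
        return m.find() ? Optional.of(m.group(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(shareName("\\\\fileserver\\finance\\q3.doc"));
        System.out.println(shareName("C:\\Users\\jdoe\\q3.doc"));
    }
}
```

A local drive path still reveals a user name, while a UNC path additionally reveals a server and share, which is why the two conditions are reported separately.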
public static final ScrubOption ClippedText
The PDF file format allows a clipping path to be established that limits the region of the page affected by painting operations including text drawing. The page boundary inherently establishes the initial clipping region and it can be adjusted from there as needed. This target detects the existence of text that is drawn outside the current clipping region and is therefore not visible.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
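The geometric idea can be sketched with axis-aligned rectangles. This is a deliberate simplification: real PDF clipping paths can be arbitrary shapes, and the `Rect` type and `isClipped` method are hypothetical names introduced only for this example.

```java
// Hypothetical illustration of a clipping test; real PDF clipping paths may be arbitrary shapes.
final class ClippedTextSketch {
    // Axis-aligned rectangle in page coordinates.
    static final class Rect {
        final double x;
        final double y;
        final double width;
        final double height;

        Rect(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        boolean intersects(Rect other) {
            return x < other.x + other.width && other.x < x + width
                && y < other.y + other.height && other.y < y + height;
        }
    }

    // Text is treated as clipped when its bounding box has no overlap with the clip region.
    static boolean isClipped(Rect textBox, Rect clip) {
        return !textBox.intersects(clip);
    }

    public static void main(String[] args) {
        Rect page = new Rect(0, 0, 612, 792); // initial clip region: a US Letter page in points
        System.out.println(isClipped(new Rect(72, 700, 100, 12), page));
        System.out.println(isClipped(new Rect(700, 700, 100, 12), page));
    }
}
```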
public static final ScrubOption ColorObfuscatedText
The font color of some document text closely matches the background color of the text resulting in text that is not visible in the authoring application. This feature targets the more common ways to obfuscate text by setting the text color to match a solid background color and includes consideration for numerous cases where the background is inherited from underlying objects. Complex backgrounds that include underlying images, objects, shapes, and transparency may inadvertently generate false positives and false negatives.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
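One simple way to express "closely matches" for a solid background is a distance threshold in RGB space, sketched below. The Euclidean metric, the threshold value, and the names used are assumptions for illustration only; the description does not state how Clean Content compares colors.

```java
// Hypothetical illustration of a color similarity check; not the Clean Content algorithm.
final class ColorMatchSketch {
    // Compares two 0xRRGGBB colors using Euclidean distance in RGB space.
    static boolean closelyMatches(int fontRgb, int backgroundRgb, double threshold) {
        int dr = ((fontRgb >> 16) & 0xFF) - ((backgroundRgb >> 16) & 0xFF);
        int dg = ((fontRgb >> 8) & 0xFF) - ((backgroundRgb >> 8) & 0xFF);
        int db = (fontRgb & 0xFF) - (backgroundRgb & 0xFF);
        return Math.sqrt(dr * dr + dg * dg + db * db) <= threshold;
    }

    public static void main(String[] args) {
        double threshold = 16.0; // assumed value, chosen only for this example
        System.out.println(closelyMatches(0xFAFAFA, 0xFFFFFF, threshold)); // near-white on white
        System.out.println(closelyMatches(0x000000, 0xFFFFFF, threshold)); // black on white
    }
}
```

A check of this kind only works when a single background color is known, which is consistent with the note that images, shapes, and transparency can produce false positives and false negatives.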
public static final SecureOptions.ColorObfuscatedTextRemediationOption ColorObfuscatedTextRemediation
Option that affects how remediation of color obfuscated text is performed.
Only the following values are allowed:
ColorObfuscatedTextOption.AdjustColor
When the Color Obfuscated Text option is set to scrub, the color obfuscated text will be exposed by adjusting either the font color or the applicable background color depending on the context. In rare circumstances the obfuscation cannot be removed because modifying the font or background color would risk adding new obfuscation elsewhere in the document.
ColorObfuscatedTextOption.RemoveText
When the Color Obfuscated Text option is set to scrub, the color obfuscated text will be remediated by either removing it or replacing it with spaces.
Default value is
COLOROBFUSCATEDTEXTREMEDIATION_AdjustColor
public static final ScrubOption Comments
Microsoft Office supports adding user comments to a document through the 'Insert > Comment' command. Comments often contain private or sensitive information.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above, Microsoft Excel 2007 and above binary, Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption ContentProperties
Content properties are viewable in Office using the 'File > Properties > Contents' command. They are document properties that provide a view into some of the content within the document. These properties include: Title and Headings in Word documents, Sheet Names and Named Ranges in Excel documents, and Fonts Used, Design Template, and Slide Titles in PowerPoint documents.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption CustomProperties
Custom document properties can be created using the 'File > Properties > Custom' command. They may include user defined properties or application generated properties. Custom properties include: Checked by, Client, Date completed, Department, Destination, Disposition, Division, Document number, Editor, Forward to, Group, Language, Mailstop, Matter, Office, Owner, Project, Publisher, Purpose, Received from, Recorded by, Recorded date, Reference, Source, Status, Telephone number, Typist, and all other user defined properties and application generated properties.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption CustomXML
Custom XML data added to the document through various means.
Applies to Microsoft Word 2007 and above, Microsoft Office 2007 and above, Microsoft Word 2003, Microsoft Excel 97 thru 2003, Microsoft Word 97 thru 2003, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption DatabaseQueries
Microsoft Office supports powerful connectivity to databases that results in database connection and query information being stored in Office documents. This information may include a path or URL to a database server, the database username, database password and SQL query strings, all of which can be highly sensitive information.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption DefaultScrubBehavior
Defines the behavior of a ScrubOption that has the value of DEFAULT. Setting this option to DEFAULT itself has the same effect as setting it to NONE.
Applies to All formats
Default value is
ScrubOption.Action.ANALYZE
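The way a DEFAULT action is resolved can be sketched as a small function. The local `Action` enum only mirrors the action names mentioned in this class (DEFAULT, NONE, ANALYZE, SCRUB), and the `resolve` method is hypothetical. Applying the JustAnalyze downgrade after the default has been resolved is an assumption about ordering that the descriptions do not spell out.

```java
// Hypothetical illustration of resolving the effective action of a scrub target; not Clean Content code.
final class EffectiveActionSketch {
    // Local mirror of the action names used in this documentation.
    enum Action { DEFAULT, NONE, ANALYZE, SCRUB }

    static Action resolve(Action target, Action defaultScrubBehavior, boolean justAnalyze) {
        Action effective = target;
        if (effective == Action.DEFAULT) {
            // DefaultScrubBehavior set to DEFAULT behaves like NONE.
            effective = (defaultScrubBehavior == Action.DEFAULT) ? Action.NONE : defaultScrubBehavior;
        }
        if (justAnalyze && effective == Action.SCRUB) {
            // With JustAnalyze, targets set to SCRUB act as if set to ANALYZE (ordering assumed).
            effective = Action.ANALYZE;
        }
        return effective;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Action.DEFAULT, Action.ANALYZE, false));
        System.out.println(resolve(Action.DEFAULT, Action.DEFAULT, false));
        System.out.println(resolve(Action.SCRUB, Action.ANALYZE, true));
    }
}
```

An explicitly set target is never overridden by DefaultScrubBehavior; only targets left at DEFAULT pick up its value.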
public static final ScrubOption DocumentVariables
Document variables are named pieces of data that can be attached to PowerPoint documents.
Applies to Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption EmbeddedObjects
The Office embedded object feature (Insert > Object..) allows embedding an object into the document that is created and served by another application. The resulting object data may then contain any of the hidden and sensitive data issues found in the serving application. Adobe PDF documents may include attached documents through the embedded files feature of the PDF format. Files embedded in a PDF document are detected under this analysis option.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above, Microsoft Excel 2007 and above binary, Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final AnalyzeOption Encryption
The document is encrypted and most analysis and scrubbing requests cannot be accomplished. This is distinguished from ScrubOptions.WeakProtection in that it cannot be easily circumvented short of brute force or dictionary based password attacks. However, using the Microsoft Office encryption feature (Tools > Options > Security > Password to open) does not encrypt the entire document, potentially leaving document properties and embeddings into Word and Excel unencrypted. Both Office and PDF documents can be encrypted with a default password. Clean Content will test the default password and decrypt the document when used on PowerPoint and PDF documents.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption ExcelDataModel
Indicates the Excel workbook contains a relational data source and corresponding connection information to other data sources. Office Excel 2013 introduced the Data Model extension to allow integrating data from multiple tables, effectively building a relational data source inside an Excel workbook. The data model leverages a binary stream that stores a tabular data model of all data that has been imported into the data model. It also includes the definition of each data source, including connection information required for external data sources (connection strings and potentially passwords), as well as relationships between tables, user-defined hierarchical relationships between columns, and calculated columns that are a function of existing columns. Scrubbing of this data is not supported due to the complexities of disconnecting dependencies from tables, queries, pivot tables. Detection is provided to allow the risk to be surfaced and reviewed.
Applies to Microsoft Excel 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption ExtremeCells
The Extreme Cells target indicates that ranges of spreadsheet cells within the document are located an extreme distance from other cell ranges. The definition of an extreme cell range can be controlled by setting two options: Extreme Cell Horizontal Gap Allowance and Extreme Cell Vertical Gap Allowance.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final IntegerOption ExtremeCellHorizontalGapAllowance
This option defines the maximum number of columns allowed between two cell ranges before they are treated as being two non-contiguous cell ranges. When an otherwise contiguous block of cells is separated by a greater number of columns, the cells may be treated as extreme cells during analysis.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
10
public static final IntegerOption ExtremeCellVerticalGapAllowance
This option defines the maximum number of rows allowed between two cell ranges before they are treated as being two non-contiguous cell ranges. When an otherwise contiguous block of cells is separated by a greater number of rows, the cells may be treated as extreme cells during analysis.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
40
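The gap test described by the two allowance options can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical and are not part of the Clean Content API, and it assumes that the "gap" is the count of empty columns (or rows) between two ranges and that a gap equal to the allowance still counts as contiguous.

```java
// Hypothetical illustration; none of these names come from the Clean Content API.
final class ExtremeCellGapSketch {
    // Default values documented for the two allowance options.
    static final int DEFAULT_HORIZONTAL_GAP = 10;
    static final int DEFAULT_VERTICAL_GAP = 40;

    /**
     * Returns true when the number of empty columns (or rows) between two
     * cell ranges exceeds the allowance, so the ranges count as non-contiguous.
     * Assumption: a gap exactly equal to the allowance is still contiguous.
     */
    static boolean isNonContiguous(int lastIndexOfFirstRange,
                                   int firstIndexOfSecondRange,
                                   int allowance) {
        // Number of empty columns/rows strictly between the two ranges.
        int gap = firstIndexOfSecondRange - lastIndexOfFirstRange - 1;
        return gap > allowance;
    }

    public static void main(String[] args) {
        // First range ends at column 3, second starts at column 14: 10 empty columns.
        System.out.println(isNonContiguous(3, 14, DEFAULT_HORIZONTAL_GAP)); // false
        // Second range starts at column 15 instead: 11 empty columns.
        System.out.println(isNonContiguous(3, 15, DEFAULT_HORIZONTAL_GAP)); // true
    }
}
```

Raising either allowance makes the analysis more tolerant of sparse layouts; lowering it causes more ranges to be reported as extreme cells.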
public static final AnalyzeOption ExtremeIndenting
The Extreme Indenting target indicates that indent, margin, gutter or other settings could result in text that is off the page or outside a table or column. Such text will not display or print. Note that the existence of the Extreme Indenting target does not guarantee that text is hidden; only that text may be hidden.
Applies to Microsoft Word 2007 and above, Microsoft Word 97 thru 2003
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption ExtremeObjects
The Extreme Objects target identifies embedded, linked, and graphic objects that have been positioned in such a way that a majority of the object may fall outside the reasonable viewing area when viewed or printed in the authoring application. This may include objects positioned outside the slide or speaker note frame in PowerPoint, and in an extreme cell range in Excel documents. Extreme objects are reported but modifications can only be made upon author review in the authoring application.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above, Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption FastSaveData
The fast save feature in Microsoft Word and PowerPoint is set using the 'Tools > Options > Save > Allow fast saves' command. When fast save is activated deleted text and data can remain in the file even though it is no longer visible or accessible from within the application. Adobe PDF documents may also include earlier revisions of nearly any type of content through the Incremental Update feature of the file format.
Applies to Microsoft Word 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption HeadersFooters
Headers and footers in documents, spreadsheets and presentations. When this option is set to Scrub, the scrubbing behavior may be modified using the
HeadersFootersSearch
,
HeadersFootersBehavior
and
HeadersFootersReplace
options.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above, Microsoft Excel 2007 and above binary, Microsoft PowerPoint 97 thru 2003, Microsoft Excel 97 thru 2003
Default value is
ScrubOption.Action.ANALYZE
public static final StringListOption HeadersFootersSearch
This option is a list of regular expressions that will be used to test the text of each header or footer. When the first match is found the behavior defined by the corresponding item in the
HeadersFootersBehavior
list is executed against that header or footer. If no match is found the header or footer will be scrubbed in its entirety. This option is only valid if the
HeadersFooters
scrub target is set to Scrub. If this option is set, both the
HeadersFootersBehavior
and
HeadersFootersReplace
lists must be set and the lengths of all three lists must be the same.
public static final SecureOptions.HeadersFootersBehaviorOption HeadersFootersBehavior
This option is a list of behaviors to perform that maps one to one with the regular expressions in the
HeadersFootersSearch
list. See the
HeadersFootersSearch
option for more details. If the behavior is Replace, the corresponding item in the
HeadersFootersReplace
list will be used as the replacement text.
Default value is
HEADERSFOOTERSBEHAVIOR_Scrub
public static final StringListOption HeadersFootersReplace
This option is a list of strings that maps one to one with the behaviors in the
HeadersFootersBehavior
list. A given item is ignored (and may be null or an empty string) unless the associated item in the
HeadersFootersBehavior
list is set to Replace.
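The relationship between the HeadersFootersSearch, HeadersFootersBehavior and HeadersFootersReplace lists can be sketched as follows. This is a self-contained model of the documented rules, not the Clean Content implementation: the class, enum and method names are hypothetical, only the Scrub and Replace behaviors named above are modeled, and it assumes that a partial regular expression match triggers a rule and that Replace substitutes the entire header or footer text.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the three parallel lists; not the Clean Content API.
final class HeaderFooterRuleSketch {
    // Only the two behaviors named in the option descriptions are modeled here.
    enum Behavior { SCRUB, REPLACE }

    /** Returns the text left in the header or footer after the first matching rule is applied. */
    static String apply(String text,
                        List<String> search,
                        List<Behavior> behavior,
                        List<String> replace) {
        // The three lists must be the same length.
        if (search.size() != behavior.size() || search.size() != replace.size()) {
            throw new IllegalArgumentException(
                "search, behavior and replace lists must have the same length");
        }
        for (int i = 0; i < search.size(); i++) {
            // Assumption: a partial match (find) is enough to trigger the rule.
            if (Pattern.compile(search.get(i)).matcher(text).find()) {
                // Assumption: Replace substitutes the whole header/footer text.
                return behavior.get(i) == Behavior.REPLACE ? replace.get(i) : "";
            }
        }
        // No expression matched: the header or footer is scrubbed in its entirety.
        return "";
    }

    public static void main(String[] args) {
        List<String> search = List.of("(?i)draft", "^Page \\d+$");
        List<Behavior> behavior = List.of(Behavior.REPLACE, Behavior.SCRUB);
        List<String> replace = List.of("FINAL", ""); // second entry is ignored
        System.out.println(apply("DRAFT - internal", search, behavior, replace)); // FINAL
    }
}
```

Because only the first matching expression is applied, the order of the entries matters: more specific expressions should appear before more general ones.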
public static final AnalyzeOption HiddenCells
Spreadsheet rows, columns, or worksheets that have been hidden. Hidden cells may contain sensitive data that requires user review prior to release. Hidden cells can be identified during analysis and can be made visible by setting the Unhide Hidden Cells option. Hidden cells are not deleted or cleared when cleaned since they may be required to resolve references from visible cells.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption HiddenSlides
The PowerPoint hidden slide feature (Slide Show > Hide Slide) allows individual slides to be hidden during the slide show and printing of the presentation. Hidden slides may contain information that is not intended for general release.
Applies to Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption HiddenText
Text that has been intentionally hidden (Format > Font... > Font > Hidden) by the user may contain sensitive information that should be reviewed or removed before distributing the document.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption HybridExcel9597BookStream
Microsoft substantially changed the Excel format between Excel 95 and Excel 97. In order to maintain backwards compatibility with Excel 95 it was possible to store both versions of the file inside the XLS document. This target detects and optionally scrubs the 'Book' stream that holds the Excel 95 version of the workbook.
Applies to Microsoft Excel 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final BooleanOption SimulatePowerPointAnimationsDuringAssembly
This option applies to the assembly of PowerPoint 2007 and above (PPTX). When set, this option will cause slides that originally contained animation to be expanded into a series of slides that simulate the entrance and exit of animated elements by hiding and restoring slide elements.
Applies to Microsoft PowerPoint 2007 and above
Default value is
false
public static final AnalyzeOption InvalidXML
Many applications that use XML formats, especially Microsoft Office, do not strictly follow the XML format's schema when writing out documents. This target indicates that one or more invalid elements have been found and ignored.
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption UnknownXML
Many applications that use XML formats, especially Microsoft Office, have situations where any element may appear or a particular namespace may be ignored. This target indicates that such an element is in a namespace that is not known and therefore cannot be validated.
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption LinkedObjects
The Office linked object feature (Insert > Object...) allows linking to an external file that is managed and rendered by another application. These links can expose local and network path information.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 2007 and above, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption MacrosAndCode
Microsoft Office includes support for Visual Basic, which can be used to create everything from simple macros to data entry forms to full-blown applications. Visual Basic can also be used to create macro viruses that travel with documents. Adobe PDF documents may contain code in the form of JavaScript.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft Excel 2007 and above binary, Microsoft PowerPoint 2007 and above, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption MeetingMinutes
Meeting minutes can be attached to PowerPoint documents with the PowerPoint Meeting Minder feature and are typically associated with an action item list. The action item list is included in the presentation as part of a slide or series of slides. The associated minutes are accessible only through the Meeting Minder user interface.
Applies to Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption OfficeGUIDProperty
The Office GUID property is a document property created by versions of Microsoft Office prior to the release of Office 2000. This globally unique identifier (GUID) can be used to identify the computer from which the document originated.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final AnalyzeOption OfficeXMLRogueParts
This target identifies the existence of parts that are not referenced or required by the document. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each rogue part using an OfficeXMLPartRisk element that provides further information about the part. Parts of this type are always removed when the OfficeXMLPartValidation option is enabled.
Applies to Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption OfficeXMLUnexpectedParts
This target identifies the existence of parts that may represent a disclosure risk if the offending part is not further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each unexpected part using an OfficeXMLPartRisk element that provides further information about the part.
Applies to Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption OfficeXMLUnanalyzedParts
This target identifies the existence of parts that may represent a disclosure risk if the offending part is not scrubbed from the document or further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each unanalyzed part using an OfficeXMLPartRisk element that provides further information about the part.
Applies to Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final AnalyzeOption OfficeXMLAlternateContentParts
This target identifies the existence of parts that may represent a disclosure risk if the offending part is not scrubbed from the document or further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each Alternate Content Choice part using an OfficeXMLPartRisk element that provides further information about the part.
Applies to Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption OutlookProperties
Outlook properties are custom document properties that may be added by Microsoft Outlook to Office documents when they are sent as attachments. These properties include the author, email address, subject of the email, and review cycle identifiers associated with the attachment.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final AnalyzeOption OverlappedObjects
The Overlapped Objects target identifies embedded, linked, and graphic objects that have been covered by another object thus obscuring some portion of the underlying object. At least 50% of an object must be covered to be treated as overlapped. Overlapped objects are reported but modifications can only be made upon author review in the authoring application.
Applies to Microsoft Excel 2007 and above, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 2007 and above, Microsoft PowerPoint 97 thru 2003
Default value is
AnalyzeOption.Action.ANALYZE
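The 50% coverage rule for Overlapped Objects can be illustrated with a small geometry sketch. This is not the Clean Content implementation: the class and method names are hypothetical, and it assumes axis-aligned bounding rectangles and a single covering object.

```java
// Hypothetical geometry sketch; not part of the Clean Content API.
final class OverlapSketch {
    /** Axis-aligned rectangle given by its top-left corner and size. */
    static final class Rect {
        final double x, y, width, height;

        Rect(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        double area() {
            return width * height;
        }
    }

    /** Fraction of {@code under} that is covered by {@code over}, from 0.0 to 1.0. */
    static double coveredFraction(Rect under, Rect over) {
        // Width and height of the intersection; non-positive means no overlap.
        double w = Math.min(under.x + under.width, over.x + over.width) - Math.max(under.x, over.x);
        double h = Math.min(under.y + under.height, over.y + over.height) - Math.max(under.y, over.y);
        if (w <= 0 || h <= 0 || under.area() <= 0) {
            return 0.0;
        }
        return (w * h) / under.area();
    }

    /** Applies the documented threshold: at least 50% of the object must be covered. */
    static boolean isOverlapped(Rect under, Rect over) {
        return coveredFraction(under, over) >= 0.5;
    }

    public static void main(String[] args) {
        Rect picture = new Rect(0, 0, 10, 10);
        System.out.println(isOverlapped(picture, new Rect(5, 0, 10, 10))); // true, exactly half covered
        System.out.println(isOverlapped(picture, new Rect(6, 0, 10, 10))); // false, 40% covered
    }
}
```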
public static final ScrubOption OverlappedText
Text may be covered by graphics elements that are drawn after the text operations. This target detects specific use cases where that may occur including rectangles and thick lines that are a known source of poor PDF text redaction. Detection of overlapped text is limited to specific use cases due to the complexity of the transparent imaging model. However, the common cases associated with poor text redaction are covered.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption PDFActions
The PDF format supports a set of interactive features called actions. Example actions include jumping to a particular destination in a document, thread, or URI location, launching an external file, playing a sound or movie, importing or submitting form data, executing JavaScript code, and numerous other interactive features. Actions can be associated with outline items, annotations, form fields, pages, or the document as a whole and can be triggered based on specific user or document interactions like opening the document, viewing a page, or selecting an outline item. Each triggering event can execute one or more actions in sequence. Each type of action is given its own scrub target while this target is provided to cover all actions in a single target.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption PDFGoToActions
The GoTo action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific location within the document.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFGoToRActions
The GoToR (Go to remote location) action can be executed from a variety of triggering events and causes the Viewer software to change the current view to a specific location in another PDF file.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFGoToEActions
The GoToE (Go to embedded location) action can be executed from a variety of triggering events and causes the Viewer software to change the current view to a specific location in another PDF file that is embedded in this or another PDF file.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFLaunchActions
The Launch action can be executed from a variety of triggering events and causes the Viewer software to launch an application or open or print a document.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFThreadActions
The Thread action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific location in an article thread within the document.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFURIActions
The URI action can be executed from a variety of triggering events and causes the Viewer software to resolve and open a resource described by a Uniform Resource Identifier.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFSoundActions
The Sound action can be executed from a variety of triggering events and causes the Viewer software to play the associated sound object.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFMovieActions
The Movie action can be executed from a variety of triggering events and causes the Viewer software to play the associated movie object that is stored as an external file.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFHideActions
The Hide action can be executed from a variety of triggering events and causes the Viewer software to change the visibility of annotations and form fields.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFNamedActions
The Named action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific named location in the current document. The supported named locations include NextPage, PrevPage, FirstPage, LastPage.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFSetOCGStateActions
The Set OCG State action can be executed from a variety of triggering events and sets the state of one or more optional content groups. Optional content refers to sections of content in a PDF document that can be selectively viewed or hidden. Optional content features are typically seen in interactive PDF documents like CAD drawings or maps.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFRenditionActions
The Rendition action can be executed from a variety of triggering events and controls the playback of multimedia content. The rendition action was introduced in PDF 1.5 to allow a far richer mechanism to control multimedia playback than supported by the earlier release Movie and Sound actions. Rendition actions can make use of extensive options to describe the location and sequence of multimedia content, the player to be used, allow for JavaScript execution to further control the playback, as well as many other parameters. Rendition actions are closely tied to a Screen annotation that specifies the region of a page where media clips are played.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFGoTo3DViewActions
The GoTo3D View action can be executed from a variety of triggering events and controls the view of a 3D annotation. PDF supports a rich collection of features to define and view three-dimensional objects, such as those used by CAD software. This action targets a 3D annotation and can change how the 3D artwork appears to the user by setting parameters such as lighting, rendering, and projection that control the virtual camera illustrating the 3D artwork.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFRichMediaActions
The Rich Media action can be executed from a variety of triggering events and identifies a rich media annotation and specifies a command to be sent to that annotation handler.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFJavaScriptActions
The JavaScript action can be executed from a variety of triggering events and causes JavaScript code to be executed by the JavaScript interpreter supported by the PDF Viewer. This is often used to dynamically control the view of a PDF document, particularly forms.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFSubmitFormActions
The Submit Form action can be executed from a variety of triggering events and transmits the names and values of selected form fields to a specified URL (uniform resource locator).
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFResetFormActions
The Reset Form action resets a selected set of interactive form fields causing their current values to return to a default value. It can be executed from a variety of triggering events.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFImportDataActions
The Import Data action imports Forms Data Format (FDF), XFDF, or XML into the interactive form fields of the PDF document and can be executed from a variety of triggering events.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFTransitionActions
The Transition action is used in a sequence of actions to define transition appearances during the sequence. It can be executed from a variety of triggering events.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFUnknownActions
Clean Content supports scrub targets for all PDF actions defined through Version 1.7 and the supplement to ISO 32000. Any PDF action that is not in the list of supported actions is treated as an Unknown action. The most likely occurrence of an Unknown action is either due to a PDF file specification update supporting new actions or due to an attempt to create a custom action.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFAlternateImages
Alternate images are additional versions of an image that may be used by readers though there is no clear description on when or why.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFDeprecatedPostscriptObjects
PostScript objects embedded inside PDF documents. Including these objects in PDF documents is no longer recommended.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFAlternatePresentations
Alternate Presentations allow a PDF document to be viewed in a slide-show-like manner. PDF 1.4 allowed a page to be viewed for a specified duration before moving into an automatic or user-enabled page transition phase. PDF 1.5 allowed for a more extensive, JavaScript-driven alternate presentation rendering. This PDF feature is seldom used and has been deprecated by ISO 32000-1. This target addresses both forms.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption PDFPrivateApplicationData
The PDF file format supports storing private data in PDF documents to allow extended functionality to be created by an application. This data is stored in the Page-Piece dictionary construct described in the PDF Reference manual. For example, it is common for applications such as Adobe Illustrator and Adobe Photoshop to store additional data using this feature. The Embedded Search Index feature supported by Adobe Acrobat is also enabled using this approach. The PDF Private Application Data target provides a general target for detecting and removing any private application data found in PDF documents that leverage the PieceInfo entry to store a Page-Piece construct.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption PDFEmbeddedSearchIndex
Adobe Acrobat supports an option to embed a search index into a PDF document. The search index makes user searches faster, particularly in large documents. This index is a private data structure supported by Adobe and may retain content from previous versions of the document. This scrub target is a child of the more general PDF Private Application Data target in order to allow this target to be scrubbed while leaving other private application data if desired.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFOtherPrivateApplicationData
The PDF file format supports storing private data in PDF documents to allow extended functionality to be created by an application. This scrub target specifically addresses private application data other than the Embedded Search Index private application data. The Embedded Search Index data is addressed by a specific target in order to provide explicit control over that use case.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFWebCaptureInformation
The PDF file format supports creating information from web or local files using a method called Web Capture. Content can be retrieved from the referenced external files, either once or through additional updates. The original web capture information is maintained in the PDF file.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFLegalAttestation
The PDF file format supports including information that describes the existence of any content that may result in unexpected rendering of a document. This information is commonly included in documents that also include a document certification signature. It can be used by PDF applications to determine the trustworthiness of a document. The information primarily indicates the use of certain PDF features such as JavaScript, launch actions, URIs, and multimedia objects that may result in a document that will render differently in different environments.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final AnalyzeOption PDFDigitalSignatures
Digital signatures are used to authenticate the identity of the author and the contents of the document and may come in three forms. Digital signatures can be used for approval signatures, modification detection and prevention, and to enable usage rights that are not available without the required signature.
Applies to Adobe Acrobat (PDF)
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption PDFThumbnailImages
Thumbnail images are typically used to provide a representation of each page in a PDF document that allows viewers to quickly render an image of each page. They can also be associated with an external file reference. Thumbnails have been deprecated from use in PDF as of ISO 32000-1 and can safely be scrubbed from files.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFAnnotations
The PDF format supports a set of interactive features called annotations. Example annotations include text, file attachments, watermarks, redaction, rich-media and numerous other interactive features. Each type of annotation has been categorized into a scrub target in order to provide finer control over detection and removal of the various types of annotations. This target is provided to cover all annotations in a single target.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption PDFTextAndFreeTextAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFLineMarkupAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFTextMarkupAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFGraphicalMarkupAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFFileAttachmentAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFScreenAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFPrintersMarkAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFWatermarkAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFRedactionAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFProjectionAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDF3DArtworkAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFSoundAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFMovieAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFLinkAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFRichMediaAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PDFTrapNetworkAnnotations
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PresentationNotes
The PowerPoint notes feature allows notes to be associated with each slide. Notes may contain general content or internal commentary that should be reviewed or removed prior to distributing a presentation.
Applies to Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption PrinterInformation
Printer setup information is often stored within a Microsoft Word or Excel document. In the case of network printers, this information may include potentially sensitive network share information and less sensitive printer model names.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final AnalyzeOption PrinterInformationContainsShares
The printer information described in
ScrubOptions.PrinterInformation
contained network share information. This information can provide dangerous insight into an enterprise's internal network.
Default value is
AnalyzeOption.Action.ANALYZE
public static final ScrubOption RoutingSlip
The email routing feature of Microsoft Office (File > Send To > Routing Recipient) stores the email addresses and user names of recipients in the document.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption Scenarios
Microsoft Excel supports entering multiple data models within specific areas of a spreadsheet (Tools > Scenario...). Once a specific scenario is selected the remaining scenarios may expose data models that should not be exposed once the document is released to an outside party.
Applies to Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.ANALYZE
public static final ScrubOption SensitiveContentLinks
Microsoft Office and Acrobat PDF include a number of features that allow referencing an external document that is then pulled into the primary document while maintaining the original link. In Microsoft Office 2007 and above, the insert picture feature is one example: the inserted picture can optionally retain the link to the original file. Microsoft PowerPoint versions up to 2003 allow external links to audio and video files. Microsoft Word (through 2003) uses an include field to provide non-OLE based linking to external files (Insert > Field > IncludeText and Insert > Field > IncludePicture). Any of these examples may contain fully qualified local paths or network paths. A content link is considered sensitive if it begins with 'file:', begins with a drive letter followed by a colon, begins with two backslashes, or matches any of the regular expressions defined using the Sensitive Links Regular Expressions option. Note that OLE based linking is handled by the Linked Objects target.
Applies to Microsoft PowerPoint 97 thru 2003, Microsoft Word 97 thru 2003, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption SensitiveHyperlinks
The Adobe PDF link annotation feature and the Office hyperlink feature (Insert > Hyperlink) allow the creation of links to various locations. Two of the possibilities, fully qualified local paths and network paths, can provide unwanted insight into an organization's internal structure. A hyperlink is considered sensitive if it begins with 'file:', begins with a drive letter followed by a colon, begins with two backslashes, or matches any of the regular expressions defined using the Sensitive Links Regular Expressions option.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above, Microsoft Excel 2007 and above binary, Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final StringListOption SensitiveLinksRegex
This option allows additional regex-based tests to be run against hyperlinks and content links to determine their sensitivity. A match against any of the regular expressions will cause the link to be classified 'sensitive'. Hyperlinks classified this way will be reported or scrubbed depending on the value of the SensitiveHyperlinks target. Content links classified this way will be reported or scrubbed depending on the value of the SensitiveContentLinks target. Any link that begins with a single alphabetic drive letter followed by a colon, or with the file: URI scheme, is automatically considered sensitive.
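The sensitivity rules described for SensitiveContentLinks, SensitiveHyperlinks, and SensitiveLinksRegex can be illustrated with a small standalone sketch. It is not the library's implementation: the class `SensitiveLinkSketch` and the method `isSensitive` are hypothetical names, and both the case-sensitive prefix test and the use of `find()` (rather than a whole-string match) for the extra regular expressions are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class SensitiveLinkSketch {
    // A single alphabetic drive letter followed by a colon, e.g. "C:".
    private static final Pattern DRIVE_LETTER = Pattern.compile("^[A-Za-z]:");

    /**
     * Returns true if the link begins with "file:", a drive letter and colon,
     * or two backslashes, or if any extra pattern is found in the link.
     */
    static boolean isSensitive(String link, List<Pattern> extraPatterns) {
        if (link == null || link.isEmpty()) {
            return false;
        }
        if (link.startsWith("file:")) {            // file: URI scheme (case handling is an assumption)
            return true;
        }
        if (DRIVE_LETTER.matcher(link).find()) {   // local path such as C:\...
            return true;
        }
        if (link.startsWith("\\\\")) {             // network share such as \\server\share
            return true;
        }
        for (Pattern p : extraPatterns) {          // user-supplied regular expressions
            if (p.matcher(link).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example pattern; the host name is made up for the demonstration.
        List<Pattern> extra = Arrays.asList(Pattern.compile("intranet\\.example\\.com"));
        System.out.println(isSensitive("file:///C:/reports/q3.docx", extra));        // true
        System.out.println(isSensitive("C:\\Users\\alice\\budget.xlsx", extra));     // true
        System.out.println(isSensitive("\\\\fileserver\\share\\plan.doc", extra));   // true
        System.out.println(isSensitive("https://www.example.com/", extra));          // false
        System.out.println(isSensitive("https://intranet.example.com/wiki", extra)); // true
    }
}
```

Only the last link in the demonstration is classified by the extra regular expression; the first three are caught by the built-in prefix rules regardless of which patterns are supplied.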
public static final ScrubOption SensitiveIncludeFields
The Microsoft Word include field feature provides non-OLE based linking to external files (Insert > Field > IncludeText and Insert > Field > IncludePicture). These fields may contain fully qualified local paths or network paths.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption SizeObfuscatedText
The sizes of some of the characters in the document are below the value defined by SizeObfuscatedTextMinimum or above the value defined by SizeObfuscatedTextMaximum.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above, Microsoft PowerPoint 97 thru 2003, Microsoft PowerPoint 2007 and above
Default value is
ScrubOption.Action.ANALYZE
public static final IntegerOption SizeObfuscatedTextMinimum
Character sizes below this value (expressed in points) will be flagged by the SizeObfuscatedText target and will be reset to this value if SizeObfuscatedText is set to SCRUB.
Default value is
4
public static final IntegerOption SizeObfuscatedTextMaximum
Character sizes above this value (expressed in points) will be flagged by the SizeObfuscatedText target and will be reset to this value if SizeObfuscatedText is set to SCRUB.
Default value is
96
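The interaction between SizeObfuscatedText and its two limits can be sketched as a simple range check followed by a clamp. The class `SizeObfuscationSketch` and its methods are hypothetical and only restate the rule in the descriptions above: sizes strictly below the minimum or strictly above the maximum are flagged, and scrubbing resets them to the nearest limit. The constants mirror the documented defaults of 4 and 96 points.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class SizeObfuscationSketch {
    static final int DEFAULT_MINIMUM_POINTS = 4;   // documented default of SizeObfuscatedTextMinimum
    static final int DEFAULT_MAXIMUM_POINTS = 96;  // documented default of SizeObfuscatedTextMaximum

    /** True if the size is strictly below the minimum or strictly above the maximum. */
    static boolean isFlagged(double sizeInPoints, int minimum, int maximum) {
        return sizeInPoints < minimum || sizeInPoints > maximum;
    }

    /** Size after scrubbing: out-of-range values are reset to the limit they violated. */
    static double scrubbedSize(double sizeInPoints, int minimum, int maximum) {
        if (sizeInPoints < minimum) {
            return minimum;
        }
        if (sizeInPoints > maximum) {
            return maximum;
        }
        return sizeInPoints;
    }

    public static void main(String[] args) {
        double[] sizes = {1.0, 4.0, 12.0, 96.0, 200.0};
        for (double size : sizes) {
            System.out.println(size + " pt: flagged="
                    + isFlagged(size, DEFAULT_MINIMUM_POINTS, DEFAULT_MAXIMUM_POINTS)
                    + ", after scrub="
                    + scrubbedSize(size, DEFAULT_MINIMUM_POINTS, DEFAULT_MAXIMUM_POINTS));
        }
    }
}
```

With the default limits, 1 pt text would be flagged and reset to 4 pt, 200 pt text would be flagged and reset to 96 pt, and sizes exactly equal to either limit are left alone.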
public static final ScrubOption SmartTags
Smart tags are a feature of Office that allows specific actions to be associated with text content that matches a pattern associated with each category of smart tags. For example, stock ticker symbols can be recognized and tagged in order to make related actions available to the user whenever a ticker symbol is encountered.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above binary, Microsoft PowerPoint 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption StatisticProperties
Statistic properties (File > Properties > Statistics) are document properties that include: Created, Modified, Accessed, Printed, Last saved by, Revision number, Total editing time, Pages, Paragraphs, Lines, Words, Characters, Bytes, Notes, Hidden Slides, Multimedia clips, and Presentation format. Additional application maintained properties in this category include: Application name, Hyperlinks changed flag, Links up to date flag, and Scale flag. Some or all of these properties should be reviewed or removed prior to document distribution.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption StructuredDocumentTags
Structured Document Tags are a feature of Word 2007 and above that allows user input through gadgets such as date pickers and picture pickers.
Applies to Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption SummaryProperties
Summary properties (File > Properties > Summary) are document properties that include: Title, Subject, Author, Manager, Company, Category, Keywords, Comment, Hyperlink Base, Template, and Preview Picture. Some or all of these properties should be reviewed or removed prior to document distribution.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption TemplateName
If a template other than Normal.dot is used, the document will contain a full path to the template file. This can expose local path or network share information.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption TrackedChanges
The change tracking feature of Microsoft Office tracks insertions, deletions and formatting changes made to the document. Such changes contain deleted text and author and date information that may be unintentionally left in the document upon distribution.
Applies to Microsoft Word 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption UninitializedDocfileData
The Microsoft Office binary file formats, among many other formats, leverage the Docfile file format (aka Structured Storage or Microsoft Compound File Binary File Format) to store a collection of data streams within a single file. This file allocation method allows data sectors to be allocated and freed as needed by the application (i.e. Word, Excel, and PowerPoint). This scrub target detects and optionally scrubs data sectors that are not currently in use but contain uninitialized (non-zero) data, including extra data sectors that may have been concatenated to the end of a valid file but are not intended to be part of the actual file.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Docfile
Default value is
ScrubOption.Action.DEFAULT
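The detection step described for UninitializedDocfileData can be pictured as scanning unused sectors for non-zero bytes. The sketch below is not the library's implementation: `FreeSectorScanSketch` and its methods are hypothetical, the list of free sector numbers is assumed to be supplied by the caller (deriving it requires parsing the file's allocation table, which is omitted here), and the offset formula assumes the compound file layout in which the header occupies the first sector-sized block so that sector n starts at byte (n + 1) * sectorSize.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class FreeSectorScanSketch {

    /** True if any byte in [offset, offset + length) is non-zero (bounded by the file length). */
    static boolean hasNonZeroData(byte[] file, int offset, int length) {
        int end = Math.min(file.length, offset + length);
        for (int i = offset; i < end; i++) {
            if (file[i] != 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns the free sector numbers that still contain non-zero data. */
    static List<Integer> dirtyFreeSectors(byte[] file, int sectorSize, int[] freeSectors) {
        List<Integer> dirty = new ArrayList<>();
        for (int sector : freeSectors) {
            // Assumption: the header fills the first sector-sized block of the file.
            int offset = (sector + 1) * sectorSize;
            if (hasNonZeroData(file, offset, sectorSize)) {
                dirty.add(sector);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        int sectorSize = 512;
        byte[] file = new byte[sectorSize * 4];   // header block plus sectors 0, 1, and 2
        file[sectorSize * 2 + 10] = 0x41;         // stale byte left behind in sector 1
        int[] freeSectors = {1, 2};               // assumed to be unused by the application
        System.out.println(dirtyFreeSectors(file, sectorSize, freeSectors)); // [1]
    }
}
```

A scrub step would then overwrite each reported sector with zeros; that part is left out because the description above only says the data is "optionally" scrubbed and does not say how.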
public static final ScrubOption UserNames
A number of Office features cause user names to be saved in the document including the document properties Author and Last Saved By, document routing recipients, Word comment and tracked change authors, Excel scenario authors, file sharing participants, and the last user to edit a Microsoft Excel document or view a Microsoft PowerPoint document.
Applies to Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft PowerPoint 97 thru 2003, Microsoft Word 2007 and above, Microsoft Excel 2007 and above, Microsoft PowerPoint 2007 and above, Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption Versions
The versioning feature (File > Versions) in Microsoft Word allows multiple historical versions of a document to be saved within a single file. Versioning is useful during document creation but potentially sensitive once a document is released.
Applies to Microsoft Word 97 thru 2003
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption WeakProtections
Weak protections are features of an application that appear to provide a strong level of protection against specific user actions on the document but in fact can be easily removed from the file without access to a password. A protection is only considered weak if it requires a password to remove the protection. Protections that don't require passwords are considered simple but not weak since they don't imply any additional password based strength.
Applies to Microsoft Word 2007 and above, Microsoft Word 97 thru 2003, Microsoft Excel 97 thru 2003, Microsoft Excel 2007 and above
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption XMPMetadataStreams
Extensible Metadata Platform streams are used by a number of formats, including PDF, to associate metadata properties with an entire document or objects within a document. In PDF an XMP stream can be associated with the document and specific pages, drawing and image objects, and color profiles. Note that PDF often replicates a set of standard document properties into an XMP stream as well as its own internal property storage format. This type of metadata typically contains standard properties like Author and Title, but can be extended to include any type of metadata.
Applies to Adobe Acrobat (PDF)
Default value is
ScrubOption.Action.DEFAULT
public static final ScrubOption GPSData
Metadata may contain location information about the source of the document or the location of its authors or consumers.
Applies to Extensible Metadata Platform
Default value is
ScrubOption.Action.DEFAULT
public static SecureOptions getInstance()
public static Option deepMapOptionId(int id)
public Option mapOptionId(int id)
OptionContainer
mapOptionId
in class OptionContainer
id
- The unique string identifier for an option
public final Option[] getAllOptions()
OptionContainer
getAllOptions
in class OptionContainer
public final OptionContainer[] getAllContainers()
OptionContainer
getAllContainers
in class OptionContainer
public final Option mapOptionId(java.lang.String id)
OptionContainer
mapOptionId
in class OptionContainer
id
- The unique string identifier for an option
Copyright © 2021 Oracle. All rights reserved. Restricted and confidential property of Oracle. Solely for use by recipient under agreement forbidding disclosure.