CleanContent::SecureOptions Class Reference

A class containing all the options available in this API.

Classes

class  ColorObfuscatedTextRemediationOption
 Enumeration class that includes possible values for the ColorObfuscatedTextRemediation option.
class  DecryptionStatusOption
 Enumeration class that includes possible values for the DecryptionStatus option.
class  Fields
 A container for Microsoft Word fields. Each field includes options that allow scrubbing and modification of that field.
class  HeadersFootersBehaviorOption
 Enumeration class that includes possible values for the HeadersFootersBehavior option.
class  OutputTypeOption
 Enumeration class that includes possible values for the OutputType option.
class  ProcessingStatusOption
 Enumeration class that includes possible values for the ProcessingStatus option.
class  Properties
 A container for document properties. Each property includes options that allow scrubbing and modification of that property.
class  ToTextEncodingOption
 Enumeration class that includes possible values for the ToTextEncoding option.

Static Public Member Functions

static Option MapOptionId (int uid)
 Map an option uid to an option object.

Static Public Attributes

static ScrubOption AlternativeText = new ScrubOption(1063)
 Text that is used as an alternative to displaying a graphic image in constrained viewing environments.
static ScrubOption AppsForOffice = new ScrubOption(1191)
 Apps for Office allow for integration of 3rd party applications into the Office applications.
static FileListOption AssembleFileList = new FileListOption(1102)
 List of PowerPoint files to be assembled into a new PowerPoint file. Input and output are currently limited to PowerPoint 97-2003.
static ScrubOption AudioVideoFilePaths = new ScrubOption(1011)
 Embedded audio and video objects that reference their data through a local or network share path.
static ScrubOption AuthorHistory = new ScrubOption(1012)
 Hidden author history in a Microsoft Word document.
static AnalyzeOption AuthorHistoryContainsPaths = new AnalyzeOption(1014)
 Invisible author history contains paths.
static AnalyzeOption AuthorHistoryContainsShares = new AnalyzeOption(1013)
 Invisible author history contains network share names.
static BooleanOption BrokenPDFCorrection = new BooleanOption(1199)
 Enables correction of PDFs that have a malformed internal structure.
static BooleanOption ChangeStartingPageNumber = new BooleanOption(1122)
 Modify the page number a document starts at.
static ObjectListOption CheckboxActions = new ObjectListOption(1133)
 List of actions to perform on named checkboxes in the document while scrubbing.
static ScrubOption ClippedText = new ScrubOption(1161)
 Some characters are hidden because they fall outside the current clipping path.
static ScrubOption ColorObfuscatedText = new ScrubOption(1090)
 Some characters are visually obscured due to the font color matching the background color.
static ColorObfuscatedTextRemediationOption ColorObfuscatedTextRemediation = new ColorObfuscatedTextRemediationOption(1091)
 Option that affects how remediation of color obfuscated text is performed.
static ScrubOption Comments = new ScrubOption(1015)
 Author or reviewer comments in the document.
static ScrubOption ContentProperties = new ScrubOption(1016)
 Document properties categorized as content properties.
static ScrubOption CustomProperties = new ScrubOption(1017)
 Document properties categorized as custom properties.
static ScrubOption CustomXML = new ScrubOption(1115)
 Any custom XML data.
static ScrubOption DatabaseQueries = new ScrubOption(1018)
 Database connection and query information.
static ObjectOption DebugInfoCollector = new ObjectOption(1134)
 Oracle internal option.
static DecryptionStatusOption DecryptionStatus = new DecryptionStatusOption(1126)
 Provides information on whether and how decryption took place.
static ScrubOption DefaultScrubBehavior = new ScrubOption(1080)
 The default scrub behavior.
static ScrubOption DocumentVariables = new ScrubOption(1043)
 Programmatic variables that can be stored in PowerPoint documents.
static HandlerOption ElementHandler = new HandlerOption(1050)
 Element handler that receives the text and elements.
static ScrubOption EmbeddedObjects = new ScrubOption(1019)
 Data from other applications embedded in the document.
static StringOption EmbeddingExportBaseFileName = new StringOption(1058)
 Base part of the file name for exported embeddings.
static DirectoryOption EmbeddingExportDirectory = new DirectoryOption(1057)
 Directory to receive exported embeddings.
static FileFormatListOption EmbeddingExportList = new FileFormatListOption(1056)
 List of file types that will be exported.
static IntegerOption EmbeddingRecurseDepth = new IntegerOption(1055)
 Maximum depth to which embeddings should be recursed.
static FileFormatListOption EmbeddingRecurseList = new FileFormatListOption(1054)
 List of file types that will be recursively processed.
static AnalyzeOption Encryption = new AnalyzeOption(1020)
 The document is encrypted.
static AnalyzeOption ExcelDataModel = new AnalyzeOption(1192)
 Indicates the Excel workbook contains a relational data source and corresponding connection information to other data sources.
static BooleanOption ExcludeProcessingInfoElement = new BooleanOption(1118)
 Do not include the processinginfo element in XML output. For Testing Only!
static FileOption ExportDocument = new FileOption(1059)
 Document that will contain exported data.
static IntegerOption ExportMaximumReplacementSize = new IntegerOption(1071)
 The maximum number of bytes that may be provided to overwrite the exported document.
static FileFormatListOption ExportPossibleReplacementFormats = new FileFormatListOption(1070)
 List of formats that may replace the exported document.
static BooleanOption ExportReplace = new BooleanOption(1072)
 The exported document should be replaced.
static FileOption ExportReplacementDocument = new FileOption(1073)
 File to replace the exported document with.
static FileFormatOption ExportReplacementFormat = new FileFormatOption(1074)
 File format of the ExportReplacementDocument.
static IntegerOption ExtremeCellHorizontalGapAllowance = new IntegerOption(1094)
 Number of columns allowed between cells that are treated as a contiguous range when determining extreme ranges.
static AnalyzeOption ExtremeCells = new AnalyzeOption(1093)
 Indicates the document contains one or more ranges of spreadsheet cells that are located an extreme distance from other cell ranges.
static IntegerOption ExtremeCellVerticalGapAllowance = new IntegerOption(1095)
 Number of rows allowed between cells that are treated as a contiguous range when determining extreme ranges.
static AnalyzeOption ExtremeIndenting = new AnalyzeOption(1092)
 Certain indenting, margin and other settings result in text that does not display or print.
static AnalyzeOption ExtremeObjects = new AnalyzeOption(1096)
 Indicates the document contains one or more objects that are positioned an extreme distance outside the standard viewing area.
static ScrubOption FastSaveData = new ScrubOption(1021)
 Text or other data that was 'deleted' but still exists in the file.
static BooleanOption FilterHyphensAtEndOfLine = new BooleanOption(1107)
 Detect and remove soft and hard hyphens found at the end of a line.
static BooleanOption FilterOverprintedText = new BooleanOption(1108)
 Detect and remove duplicate, overprinted text from extracted output.
static BooleanOption GenerateAcrobatHighlightPositions = new BooleanOption(1105)
 Generate the character highlight positions associated with the start of each word when extracting from PDF documents.
static BooleanOption GenerateGraphicDataFingerprint = new BooleanOption(1113)
 Generate a fingerprint element for each embedded graphic in the document.
static BooleanOption GenerateSlideAppearanceFingerprint = new BooleanOption(1112)
 Generate a fingerprint element for each slide based on the text, images, colors, shape positions, and applied master.
static BooleanOption GenerateSlideContentFingerprint = new BooleanOption(1109)
 Generate a fingerprint element based on the text and image content found for each slide.
static ScrubOption GPSData = new ScrubOption(1193)
 GPS location information.
static ScrubOption HeadersFooters = new ScrubOption(1098)
 Headers and footers.
static HeadersFootersBehaviorOption HeadersFootersBehavior = new HeadersFootersBehaviorOption(1101)
 Headers and footers behavior list.
static StringListOption HeadersFootersReplace = new StringListOption(1100)
 Headers and footers replace list.
static StringListOption HeadersFootersSearch = new StringListOption(1099)
 Headers and footers search list.
static AnalyzeOption HiddenCells = new AnalyzeOption(1022)
 Hidden spreadsheet columns, rows, or worksheets.
static ScrubOption HiddenSlides = new ScrubOption(1023)
 Slides that have been hidden from presentation and printing.
static ScrubOption HiddenText = new ScrubOption(1024)
 Text that has been hidden by the author.
static ScrubOption HybridExcel9597BookStream = new ScrubOption(1195)
 A redundant storage of Excel workbooks created for backwards compatibility with Excel 95.
static BooleanOption IncludeLocators = new BooleanOption(1121)
 Include locator elements in output.
static AnalyzeOption InvalidXML = new AnalyzeOption(1159)
 Found XML elements that are invalid against the schema.
static BooleanOption JustAnalyze = new BooleanOption(1001)
 Ignore all action settings and just analyze.
static BooleanOption JustAssemble = new BooleanOption(1104)
 Assemble the source PowerPoint file list into a single PowerPoint document, merging all slides.
static BooleanOption JustDisassemble = new BooleanOption(1103)
 Disassemble the source PowerPoint document into individual PowerPoint documents containing one slide each.
static BooleanOption JustIdentify = new BooleanOption(1114)
 Ignore all other settings and just identify the file format of the source document.
static ScrubOption LinkedObjects = new ScrubOption(1025)
 Links to files from other applications.
static ObjectListOption LocatorActions = new ObjectListOption(1120)
 List of locator-based actions to perform on the document while scrubbing.
static BooleanOption LoggedError = new BooleanOption(1110)
 An error occurred and was logged while processing the document.
static BooleanOption LoggedWarning = new BooleanOption(1111)
 A warning occurred and was logged while processing the document.
static ObjectOption Logger = new ObjectOption(1106)
static ScrubOption MacrosAndCode = new ScrubOption(1026)
 Macros and other executable code.
static ScrubOption MeetingMinutes = new ScrubOption(1044)
 Meeting minutes entered using the PowerPoint Meeting Minder feature.
static ScrubOption OfficeGUIDProperty = new ScrubOption(1027)
 A document property that provides a globally unique identifier (GUID) of the document and originating computer.
static AnalyzeOption OfficeXMLAlternateContentParts = new AnalyzeOption(1131)
 This document contains parts that represent some level of disclosure risk if not scrubbed or further analyzed.
static BooleanOption OfficeXMLCanonicalization = new BooleanOption(1171)
 Enable the process that canonicalizes Office XML files. Note: the OfficeXMLFeatures option must be set to canonicalize the file.
static BooleanOption OfficeXMLFeatures = new BooleanOption(1212)
 Enable the features that inspect and sanitize Office XML vulnerabilities.
static BooleanOption OfficeXMLPartValidation = new BooleanOption(1132)
 Enable the process that validates all Office parts found in Office Open XML formats.
static BooleanOption OfficeXMLRenameNamespacePrefix = new BooleanOption(1196)
 Rename namespace prefixes in all XML inside an MS Office file. Note: the OfficeXMLFeatures option must be set to rename namespace prefixes.
static AnalyzeOption OfficeXMLRogueParts = new AnalyzeOption(1128)
 This document contains parts that are not referenced or required by the document and that represent a significant unintentional disclosure risk if not scrubbed or further analyzed.
static AnalyzeOption OfficeXMLUnanalyzedParts = new AnalyzeOption(1130)
 This document contains parts that are understood but not analyzed by the Clean Content analysis process.
static AnalyzeOption OfficeXMLUnexpectedParts = new AnalyzeOption(1129)
 This document contains parts that are not processed by the Clean Content analysis process.
static ScrubOption OutlookProperties = new ScrubOption(1028)
 Document properties added to Office document email attachments by Microsoft Outlook.
static OutputTypeOption OutputType = new OutputTypeOption(1052)
 Controls how the extracted data is returned to the developer.
static AnalyzeOption OverlappedObjects = new AnalyzeOption(1097)
 Indicates the document contains one or more objects that have been overlapped by another object.
static ScrubOption OverlappedText = new ScrubOption(1162)
 Some characters are hidden because they have been overlapped by a rectangular shape or image.
static StringListOption PasswordList = new StringListOption(1124)
 This option contains a list of passwords to be verified against password protected documents.
static ScrubOption PDF3DArtworkAnnotations = new ScrubOption(1178)
static ScrubOption PDFActions = new ScrubOption(1135)
 PDF supports a set of interactive features called actions that range from jumping to a particular destination in the document to submitting the data of an interactive form to a server. Individual targets are defined for each specific type of action. This target covers the entire set of actions as a single target.
static ScrubOption PDFAlternateImages = new ScrubOption(1188)
 Alternate versions of an image that may be used by readers.
static ScrubOption PDFAlternatePresentations = new ScrubOption(1185)
 Alternate Presentations can be used to view a PDF document in an alternative way more consistent with a presentation rendition.
static ScrubOption PDFAnnotations = new ScrubOption(1166)
 PDF supports a set of interactive features called annotations that allow numerous types of content to be associated with a page location or provide user interaction. This target covers the entire set of annotations as a single target.
static ScrubOption PDFDeprecatedPostscriptObjects = new ScrubOption(1189)
 Postscript objects embedded inside PDF documents.
static AnalyzeOption PDFDigitalSignatures = new AnalyzeOption(1165)
 Digital signatures are used to authenticate the identity of the author and the contents of the document.
static ScrubOption PDFEmbeddedSearchIndex = new ScrubOption(1157)
 Indicates that the document contains an embedded search index provided to make text searches faster within Adobe Acrobat.
static ScrubOption PDFFileAttachmentAnnotations = new ScrubOption(1172)
static ScrubOption PDFGoTo3DViewActions = new ScrubOption(1148)
 The GoTo3D View action controls the view of a 3D annotation.
static ScrubOption PDFGoToActions = new ScrubOption(1136)
 The GoTo action causes the Viewer software to change the current view of the document to a specific location within the document.
static ScrubOption PDFGoToEActions = new ScrubOption(1138)
 The GoToE (Go to embedded file) action causes the Viewer software to change the current view to a specific location in another PDF file that is embedded in this or another PDF file.
static ScrubOption PDFGoToRActions = new ScrubOption(1137)
 The GoToR (Go to remote location) action causes the Viewer software to change the current view to a specific location in another PDF file.
static ScrubOption PDFGraphicalMarkupAnnotations = new ScrubOption(1170)
static ScrubOption PDFHideActions = new ScrubOption(1144)
 The Hide action causes the Viewer software to change the visibility of annotations and form fields.
static ScrubOption PDFImportDataActions = new ScrubOption(1153)
 The Import Data action imports Forms Data Format (FDF), XFDF, or XML into the interactive form fields of the PDF document.
static ScrubOption PDFJavaScriptActions = new ScrubOption(1150)
 The JavaScript action causes JavaScript code to be executed by the JavaScript interpreter supported by the PDF Viewer.
static ScrubOption PDFLaunchActions = new ScrubOption(1139)
 The Launch action launches an application or opens or prints a document.
static ScrubOption PDFLegalAttestation = new ScrubOption(1164)
 Information that specifies the existence of content that may result in unexpected rendering of a document.
static ScrubOption PDFLineMarkupAnnotations = new ScrubOption(1168)
static ScrubOption PDFLinkAnnotations = new ScrubOption(1181)
static IntegerOption PDFMinimumImageDimensionRequiredToProcess = new IntegerOption(1194)
 The minimum pixel width and height required to process an image inside a PDF.
static ScrubOption PDFMovieActions = new ScrubOption(1143)
 The Movie action causes the Viewer software to play a movie object that is stored as an external file.
static ScrubOption PDFMovieAnnotations = new ScrubOption(1180)
static ScrubOption PDFNamedActions = new ScrubOption(1145)
 The Named action causes the Viewer software to change the current view of the document to a specific named location in the current document.
static ScrubOption PDFOtherPrivateApplicationData = new ScrubOption(1158)
 Indicates that the document contains private application data other than an embedded search index.
static ScrubOption PDFPrintersMarkAnnotations = new ScrubOption(1174)
static ScrubOption PDFPrivateApplicationData = new ScrubOption(1156)
 Private data stored in PDF documents by applications using the PDF Page-Piece dictionary construct.
static ScrubOption PDFProjectionAnnotations = new ScrubOption(1177)
static ScrubOption PDFRedactionAnnotations = new ScrubOption(1176)
static ScrubOption PDFRenditionActions = new ScrubOption(1147)
 The Rendition action controls the playback of multimedia content.
static ScrubOption PDFResetFormActions = new ScrubOption(1152)
 The Reset Form action resets a selected set of interactive form fields.
static ScrubOption PDFRichMediaActions = new ScrubOption(1149)
 The Rich Media action identifies a rich media annotation and specifies a command to be sent to that annotation handler. Rich media PDF constructs support playing a SWF file to provide enhanced rich media. The command defined in this action can either be an ActionScript or JavaScript function name.
static ScrubOption PDFRichMediaAnnotations = new ScrubOption(1182)
static ScrubOption PDFScreenAnnotations = new ScrubOption(1173)
static ScrubOption PDFSetOCGStateActions = new ScrubOption(1146)
 The Set OCG State action sets the state of one or more optional content groups.
static ScrubOption PDFSoundActions = new ScrubOption(1142)
 The Sound action causes the Viewer software to play a sound object.
static ScrubOption PDFSoundAnnotations = new ScrubOption(1179)
static ScrubOption PDFSubmitFormActions = new ScrubOption(1151)
 The Submit Form action transmits the names and values of selected form fields to a specified URL.
static ScrubOption PDFTextAndFreeTextAnnotations = new ScrubOption(1167)
static ScrubOption PDFTextMarkupAnnotations = new ScrubOption(1169)
static ScrubOption PDFThreadActions = new ScrubOption(1140)
 The Thread action causes the Viewer software to change the current view of the document to a specific location in an article thread within the document.
static ScrubOption PDFThumbnailImages = new ScrubOption(1186)
 Thumbnail images are small images that provide a representation of either a PDF page or an externally referenced file.
static ScrubOption PDFTransitionActions = new ScrubOption(1154)
 The Transition action is used in a sequence of actions to define transition appearances during the sequence.
static ScrubOption PDFTrapNetworkAnnotations = new ScrubOption(1183)
static ScrubOption PDFUnknownActions = new ScrubOption(1155)
 Any action that is not in the list of supported actions is treated as an Unknown action.
static ScrubOption PDFURIActions = new ScrubOption(1141)
 The URI action causes the Viewer software to resolve and open a resource described by a Uniform Resource Identifier.
static ScrubOption PDFWatermarkAnnotations = new ScrubOption(1175)
static ScrubOption PDFWebCaptureInformation = new ScrubOption(1163)
 Data stored in PDF documents used to import content from external Web pages.
static ScrubOption PresentationNotes = new ScrubOption(1029)
 Notes associated with a slide presentation.
static ScrubOption PrinterInformation = new ScrubOption(1030)
 Printer information in the document.
static AnalyzeOption PrinterInformationContainsShares = new AnalyzeOption(1032)
 Printer information that includes network share names.
static ProcessingStatusOption ProcessingStatus = new ProcessingStatusOption(1125)
 Describes why the document could not be processed.
static BooleanOption PropertiesOnly = new BooleanOption(1053)
 Extract only properties from the document.
static IntegerOption RequestTimeout = new IntegerOption(1088)
 Amount of time in milliseconds a request can execute before being timed out.
static FileOption ResultDocument = new FileOption(1051)
 Document that will contain the extracted data.
static FileOption ResultTransform = new FileOption(1062)
 The XSLT document with which to process the result XML.
static ScrubOption RoutingSlip = new ScrubOption(1031)
 Email routing information.
static ScrubOption Scenarios = new ScrubOption(1065)
 Scenarios are an Excel feature that allows for multiple data models.
static FileOption ScrubbedDocument = new FileOption(1008)
 The scrubbed document.
static FileFormatOption ScrubbedFormat = new FileFormatOption(1117)
 The new file format for the scrubbed document.
static ScrubOption SensitiveContentLinks = new ScrubOption(1184)
 Sensitive paths or URIs to external content that is to be included in this file.
static ScrubOption SensitiveHyperlinks = new ScrubOption(1034)
 Hyperlinks containing either fully qualified local paths or network share names.
static ScrubOption SensitiveIncludeFields = new ScrubOption(1035)
 INCLUDETEXT and INCLUDEPICTURE fields containing either fully qualified local paths or network share names.
static StringListOption SensitiveLinksRegex = new StringListOption(1046)
 List of regular expressions against which hyperlinks and content links should be tested to determine their sensitivity.
static BooleanOption SimulatePowerPointAnimationsDuringAssembly = new BooleanOption(1197)
 Simulate PowerPoint Animations During Assembly.
static ScrubOption SizeObfuscatedText = new ScrubOption(1081)
 The sizes of some characters are outside a certain normal range.
static IntegerOption SizeObfuscatedTextMaximum = new IntegerOption(1083)
 Maximum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.
static IntegerOption SizeObfuscatedTextMinimum = new IntegerOption(1082)
 Minimum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.
static ScrubOption SmartTags = new ScrubOption(1066)
 Tags applied to text that matches a defined pattern allowing specific actions to be executed based on the category of the smart tag.
static FileOption SourceDocument = new FileOption(1006)
 The document to process.
static FileFormatOption SourceFormat = new FileFormatOption(1009)
 The file format of the source document.
static IntegerOption StartingPageNumber = new IntegerOption(1123)
 The page number used when modifying a document's starting page number.
static ScrubOption StatisticProperties = new ScrubOption(1036)
 Document properties categorized as statistics properties.
static ScrubOption StructuredDocumentTags = new ScrubOption(1116)
 Word's Structured Document Tags.
static ScrubOption SummaryProperties = new ScrubOption(1037)
 Document properties categorized as summary properties.
static ScrubOption TemplateName = new ScrubOption(1038)
 If a template other than Normal.dot is used the document will contain a full path to the template file.
static BooleanOption TimeoutUsingThreadStop = new BooleanOption(1198)
 If set to 'true', requests in tight infinite loops will be stopped using the deprecated Thread.stop method.
static ToTextEncodingOption ToTextEncoding = new ToTextEncodingOption(1127)
 Controls the encoding when extracted data is returned as text.
static ScrubOption TrackedChanges = new ScrubOption(1039)
 Tracked changes in the document.
static BooleanOption TransformResult = new BooleanOption(1060)
 Perform an XML transform on the result document.
static BooleanOption UnhideHiddenCells = new BooleanOption(1002)
 Unhide hidden spreadsheet cells.
static ScrubOption UninitializedDocfileData = new ScrubOption(1160)
 Uninitialized data segments found in the Docfile format leveraged by Office 2003 and below and many other formats.
static AnalyzeOption UnknownXML = new AnalyzeOption(1190)
 Found XML elements in unknown namespaces.
static ScrubOption UserNames = new ScrubOption(1040)
 The names of users associated with the document.
static BooleanOption ValidateEmbeddedContent = new BooleanOption(1211)
 Enable the process that validates all embedded contents found in Office Open XML formats.
static ScrubOption Versions = new ScrubOption(1041)
 Version information in Word documents.
static BooleanOption WasException = new BooleanOption(1087)
 An exception occurred while processing the document.
static BooleanOption WasIdentified = new BooleanOption(1086)
 The source document was identified.
static BooleanOption WasProcessed = new BooleanOption(1010)
 The source document was scrubbed, analyzed or extracted.
static BooleanOption WasSupported = new BooleanOption(1084)
 The source document's file format is supported.
static BooleanOption WasTimeout = new BooleanOption(1089)
 The document took longer than the request's RequestTimeout value to process.
static ScrubOption WeakProtections = new ScrubOption(1042)
 Weak or easily breakable protections and passwords.
static ScrubOption XMLBoundedSpaces = new ScrubOption(1209)
 Bounded whitespace can be used to indent text. Note: the OfficeXMLFeatures option must be set to scrub bounded spaces.
static ScrubOption XMLCDATA = new ScrubOption(1203)
 XML CDATA refers to character data. Note: the OfficeXMLFeatures option must be set to extract and scrub XML CDATA.
static ScrubOption XMLComment = new ScrubOption(1201)
 XML comments are used to provide semantic information to the human reader. Note: the OfficeXMLFeatures option must be set to extract and scrub XML comments.
static ScrubOption XMLExternalEntity = new ScrubOption(1205)
 XML external entities are references to external files. Note: the OfficeXMLFeatures option must be set to extract and scrub XML external entities.
static ScrubOption XMLPI = new ScrubOption(1202)
 XML processing instructions can be used to pass information to applications. Note: the OfficeXMLFeatures option must be set to extract and scrub XML processing instructions.
static ScrubOption XMLRenameNamespacePrefix = new ScrubOption(1207)
 XML namespace prefixes are used to avoid name conflicts in XML. Note: the OfficeXMLFeatures option must be set to rename namespace prefixes.
static ScrubOption XMLUnknownNamespace = new ScrubOption(1204)
 An XML namespace in the document that is not part of the whitelisted namespace list. Note: the OfficeXMLFeatures option must be set to extract and scrub unknown XML namespaces.
static ScrubOption XMLUnusedNamespaces = new ScrubOption(1208)
 XML namespaces are used to avoid name conflicts in XML. Note: the OfficeXMLFeatures option must be set to extract and scrub unused XML namespaces.
static ScrubOption XMPMetadataStreams = new ScrubOption(1187)
 XMP Metadata streams are leveraged to store metadata properties using the Extensible Metadata Platform standard.

Properties

static Option[] AllOptions [get]
 A list of all available options.

Detailed Description

A class containing all the options available in this API.


Member Function Documentation

static Option CleanContent::SecureOptions::MapOptionId ( int  uid  )  [static]

Map an option uid to an option object.
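
The lookup described above can be pictured as a table keyed by uid. The following is only a minimal sketch under stated assumptions: `OptionRegistrySketch`, `SketchOption`, and `mapOptionId` are hypothetical stand-ins written for illustration, not the library's types, and returning null for an unknown uid is an assumption that this reference does not confirm. The uids are copied from the attribute list on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a uid-to-option lookup; not the real SecureOptions class.
class OptionRegistrySketch {

    // Stand-in for the API's Option type; only the uid and a name are modeled.
    static final class SketchOption {
        final int uid;
        final String name;

        SketchOption(int uid, String name) {
            this.uid = uid;
            this.name = name;
        }
    }

    // Declared first so it is initialized before the register() calls below run.
    private static final Map<Integer, SketchOption> BY_UID = new HashMap<>();

    private static SketchOption register(int uid, String name) {
        SketchOption option = new SketchOption(uid, name);
        BY_UID.put(uid, option);
        return option;
    }

    // uids taken from the static attribute list in this reference.
    static final SketchOption JUST_ANALYZE = register(1001, "JustAnalyze");
    static final SketchOption SOURCE_DOCUMENT = register(1006, "SourceDocument");
    static final SketchOption ALTERNATIVE_TEXT = register(1063, "AlternativeText");

    // Assumption: an unknown uid yields null; the real MapOptionId may behave differently.
    static SketchOption mapOptionId(int uid) {
        return BY_UID.get(uid);
    }

    public static void main(String[] args) {
        System.out.println(mapOptionId(1063).name); // prints "AlternativeText"
        System.out.println(mapOptionId(9999));      // prints "null"
    }
}
```

A map keeps the lookup constant-time regardless of how many options are registered; whether the actual implementation uses a map, a switch, or a scan over AllOptions is not stated here.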


Member Data Documentation

Text that is used as an alternative to displaying a graphic image in constrained viewing environments.

Each graphic image and shape in a document may include an optional piece of text that can be used in place of the image when viewing the document in a constrained environment.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.DEFAULT

Apps for Office allow for integration of 3rd party applications into the Office applications.

Apps for Office allow for integration of 3rd party applications into the Office applications using web technologies. There are two types of Web extensions: content and taskpane. Web extensions enable 3rd party applications to tightly integrate into Office using web-based interfaces such as JavaScript, HTML5, and CSS3. A Web extension runs inside a web page frame within Office. The web page is served by a web server and has access to the Office document object model, allowing rich feature connections between document content and the 3rd party web app. Content extensions contribute content directly within a frame of the document. Taskpane extensions enable user interactions that enhance the authoring process but don’t directly generate document content (for example, a dictionary app).

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

List of PowerPoint files to be assembled into a new PowerPoint file. Input and output are currently limited to PowerPoint 97-2003.

This option is used when the JustAssemble option is set to true. The set of files defined by this option will be assembled into a new PowerPoint document.

Embedded audio and video objects that reference their data through a local or network share path.

Microsoft PowerPoint supports linking to audio and video files using the 'Insert > Movies and Sounds > Movie from File' and 'Insert > Movies and Sounds > Sound from File' commands. Use of this feature results in storing a potentially sensitive link to a local or network file path. Note that this type of path can instead be removed only when it is considered sensitive by using the Sensitive Content Links target.

Applies to:

  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.DEFAULT

Hidden author history in a Microsoft Word document.

Up to the last 10 authors that saved the document are stored in an area of the document that is inaccessible using the Word application. In Word 97 and Word 2000 this information also contains the paths where the document was saved and may include sensitive user logon or network share information.

Applies to:

  • Microsoft Word 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Invisible author history contains paths.

The hidden author history contains the last 10 fully qualified path names where the document was saved.

Default value is AnalyzeOption.Action.ANALYZE

Invisible author history contains network share names.

The hidden author history contains network share names. This information can provide dangerous insight into an organization's internal network.

Default value is AnalyzeOption.Action.ANALYZE

Enables correction of PDFs that have a malformed internal structure.

Major PDF readers such as Adobe's tolerate many deviations from the standard PDF specification, whereas the Clean Content parser follows the specification strictly when reading PDF streams. As a result, many readers can open broken PDF documents because they work around the broken streams to correct the malformed internal structure. This option enables correction of such broken PDF documents if parsing fails for the given input PDF. Clean Content will try to recover as many PDFs as it can, but some broken streams may be too malformed for auto-correction, so PDF correction has limitations.

Default value is false

Modify the page number a document starts at.

When this option is true the StartingPageNumber option is used to modify the page number a document starts at.

Applies to:

  • Microsoft Word 2007 and above

Default value is false

List of actions to perform on named checkboxes in the document while scrubbing.

A List of actions to perform on named checkboxes in the document while scrubbing.

Some characters are hidden because they fall outside the current clipping path.

The PDF file format allows a clipping path to be established that limits the region of the page affected by painting operations including text drawing. The page boundary inherently establishes the initial clipping region and it can be adjusted from there as needed. This target detects the existence of text that is drawn outside the current clipping region and is therefore not visible.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

Some characters are visually obscured due to the font color matching the background color.

The font color of some document text closely matches the background color of the text resulting in text that is not visible in the authoring application. This feature targets the more common ways to obfuscate text by setting the text color to match a solid background color and includes consideration for numerous cases where the background is inherited from underlying objects. Complex backgrounds that include underlying images, objects, shapes, and transparency may inadvertently generate false positives and false negatives.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

Option that affects how remediation of color obfuscated text is performed.

Option that affects how remediation of color obfuscated text is performed.

Default value is AdjustColor

Author or reviewer comments in the document.

Microsoft Office supports adding user comments to a document through the 'Insert > Comment' command. Comments often contain private or sensitive information.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft Excel 2007 and above binary
  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

Document properties categorized as content properties.

Content properties are viewable in Office using the 'File > Properties > Contents' command. They are document properties that provide a view into some of the content within the document. These properties include: Title and Headings in Word documents, Sheet Names and Named Ranges in Excel documents, and Fonts Used, Design Template, and Slide Titles in PowerPoint documents.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.DEFAULT

Document properties categorized as custom properties.

Custom document properties can be created using the 'File > Properties > Custom' command. They may include user defined properties or application generated properties. Custom properties include: Checked by, Client, Date completed, Department, Destination, Disposition, Division, Document number, Editor, Forward to, Group, Language, Mailstop, Matter, Office, Owner, Project, Publisher, Purpose, Received from, Recorded by, Recorded date, Reference, Source, Status, Telephone number, Typist, and all other user defined properties and application generated properties.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Any custom XML data.

Custom XML data added to the document through various means.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Office 2007 and above
  • Microsoft Word 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft Word 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Database connection and query information.

Microsoft Office supports powerful connectivity to databases that results in database connection and query information being stored in Office documents. This information may include a path or URL to a database server, the database username, database password and SQL query strings, all of which can be highly sensitive information.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

Oracle internal option.

Oracle internal option

Provides information on if and how decryption took place.

An enumeration of the possible outcomes of decryption.

Default value is NotEncrypted

The default scrub behavior.

Defines the behavior of a ScrubOption that has the value of DEFAULT. Setting this option to DEFAULT itself has the same effect as setting it to NONE.

Applies to:

  • All formats

Default value is ScrubOption.Action.ANALYZE
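
The fallback rule described for this option can be sketched as follows. This is a minimal illustration, not the Clean Content implementation: the nested Action enum and the resolve method are hypothetical stand-ins that only reuse the action names mentioned in this reference (NONE, ANALYZE, SCRUB, DEFAULT).

```java
class DefaultActionSketch {
    // Hypothetical stand-in for the action values named in this reference.
    enum Action { NONE, ANALYZE, SCRUB, DEFAULT }

    // Resolve the action a scrub target effectively uses.
    // 'target' is the target's own setting; 'defaultBehavior' is the value
    // of the default scrub behavior option.
    static Action resolve(Action target, Action defaultBehavior) {
        if (target != Action.DEFAULT) {
            return target; // an explicit setting on the target wins
        }
        // DEFAULT on the default option itself has the same effect as NONE.
        return defaultBehavior == Action.DEFAULT ? Action.NONE : defaultBehavior;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Action.DEFAULT, Action.ANALYZE)); // prints ANALYZE
        System.out.println(resolve(Action.SCRUB, Action.ANALYZE));   // prints SCRUB
        System.out.println(resolve(Action.DEFAULT, Action.DEFAULT)); // prints NONE
    }
}
```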

Programmatic variables that can be stored in PowerPoint documents.

Document variables are named pieces of data that can be attached to PowerPoint documents.

Applies to:

  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Element handler that receives the text and elements.

This option allows the developer to provide an object that implements the ElementHandler interface. This object will receive the text and elements during the execute method in ExtractRequest. This option is only valid if the OutputType option is set to OUTPUTTYPE_TOHANDLER.

Data from other applications embedded in the document.

The Office embedded object feature (Insert > Object...) allows embedding an object into the document that is created and served by another application. The resulting object data may then contain any of the hidden and sensitive data issues found in the serving application. Adobe PDF documents may include attached documents through the embedded files feature of the PDF format. Files embedded in a PDF document are detected under this analysis option.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft Excel 2007 and above binary
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Base part of the file name for exported embeddings.

This option defines the beginning of the file name used when exporting embeddings (embedded images, OLE embeddings, etc.) to EmbeddingExportDirectory. The rest of the file name and the file's extension is format specific.

Default value is null

Directory to receive exported embeddings.

This option defines the directory where exported embedding (embedded images, OLE embeddings, etc.) files should be placed. File naming is format specific and cannot be modified at this time. This value defaults to the process's current directory.

List of file types that will be exported.

This option defines a list of file types that when found as embeddings (embedded images, OLE embeddings, etc.) will be exported as stand alone files.

Maximum depth to which embeddings should be recursed.

This option sets a limit as to how 'deep' embedding recursion will go. Setting this value to 0 will disable embedding recursion even for file formats defined in the EmbeddingRecurseList. Setting this value to 1 will allow one level of recursion and so on.

Default value is 0

List of file types that will be recursively processed.

This option defines a list of file types that when found as embeddings (embedded images, OLE embeddings, etc.) should be recursively processed. The embeddings will be processed using the same options as the main document.
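
How the recursion depth limit and the recurse list interact can be sketched as below. This is an illustrative model only: the Doc class, the type strings, and the process method are hypothetical and are not part of the Clean Content API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class EmbeddingRecursionSketch {
    // Hypothetical model of a document and the embeddings it contains.
    static final class Doc {
        final String name;
        final String type;
        final List<Doc> embeddings;

        Doc(String name, String type, List<Doc> embeddings) {
            this.name = name;
            this.type = type;
            this.embeddings = embeddings;
        }
    }

    // Returns the names of the documents that get processed for a given
    // maximum depth and set of file types that may be recursed into.
    static List<String> process(Doc root, int maxDepth, Set<String> recurseTypes) {
        List<String> processed = new ArrayList<>();
        visit(root, 0, maxDepth, recurseTypes, processed);
        return processed;
    }

    private static void visit(Doc doc, int depth, int maxDepth,
                              Set<String> recurseTypes, List<String> processed) {
        processed.add(doc.name);
        if (depth >= maxDepth) {
            return; // a maximum depth of 0 disables recursion entirely
        }
        for (Doc child : doc.embeddings) {
            if (recurseTypes.contains(child.type)) {
                visit(child, depth + 1, maxDepth, recurseTypes, processed);
            }
        }
    }

    public static void main(String[] args) {
        Doc deck = new Doc("deck.ppt", "ppt", List.of());
        Doc sheet = new Doc("budget.xls", "xls", List.of(deck));
        Doc image = new Doc("logo.png", "png", List.of());
        Doc root = new Doc("report.doc", "doc", List.of(sheet, image));
        System.out.println(process(root, 1, Set.of("xls", "ppt")));
    }
}
```

With a maximum depth of 1 only the spreadsheet embedded directly in the root is visited; the presentation nested inside it requires a depth of 2.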

The document is encrypted.

The document is encrypted and most analysis and scrubbing requests cannot be accomplished. This is distinguished from ScrubOptions.WeakProtection in that it cannot be easily circumvented short of brute force or dictionary based password attacks. However, using the Microsoft Office encryption feature (Tools > Options > Security > Password to open) does not encrypt the entire document, potentially leaving document properties and embeddings into Word and Excel unencrypted. Both Office and PDF documents can be encrypted with a default password. Clean Content will test the default password and decrypt the document when used on PowerPoint and PDF documents.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is AnalyzeOption.Action.ANALYZE

Indicates the Excel workbook contains a relational data source and corresponding connection information to other data sources.

Indicates the Excel workbook contains a relational data source and corresponding connection information to other data sources. Office Excel 2013 introduced the Data Model extension to allow integrating data from multiple tables, effectively building a relational data source inside an Excel workbook. The data model leverages a binary stream that stores a tabular data model of all data that has been imported into the data model. It also includes the definition of each data source, including connection information required for external data sources (connection strings and potentially passwords), as well as relationships between tables, user-defined hierarchical relationships between columns, and calculated columns that are a function of existing columns. Scrubbing of this data is not supported due to the complexities of disconnecting dependencies from tables, queries, pivot tables. Detection is provided to allow the risk to be surfaced and reviewed.

Applies to:

  • Microsoft Excel 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Do not include the processinginfo element in XML output. For Testing Only!

Do not include the processinginfo element in XML output. This option is for testing only! Removal of the processing info element allows QA processes that produce XML output at different times and with different source documents to easily compare resulting XML.

Default value is false

Document that will contain exported data.

This option defines where exported data such as embeddings and fast save data should be placed. Within the element handler methods startEmbeddedObject and startFastSaveData this option will be set on the exportOptions field to describe where that particular data will be saved and to allow the developer to override that location.

The maximum number of bytes that may be provided to overwrite the exported document.

This option defines the maximum number of bytes, provided through the ExportReplacementDocument option, that may be provided to overwrite the exported document. If ExportReplacementDocument is larger than this then an exception will be thrown. Note that this value will not necessarily be the same as the size of the exported document due to compression and other factors. If this option is zero (0) then the replacement image may be of any size.

Default value is 0
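
The size rule can be sketched as a simple guard. This is a hypothetical illustration: the method name and the exception type are assumptions, since this reference does not say which exception Clean Content throws.

```java
class ReplacementSizeSketch {
    // Reject a replacement that exceeds the configured maximum.
    // A maximum of 0 means "no limit", mirroring the option description.
    // IllegalArgumentException is an assumed stand-in for the real exception.
    static void checkSize(byte[] replacement, long maxBytes) {
        if (maxBytes != 0 && replacement.length > maxBytes) {
            throw new IllegalArgumentException(
                "replacement is " + replacement.length
                + " bytes, limit is " + maxBytes);
        }
    }

    public static void main(String[] args) {
        checkSize(new byte[2048], 0);    // accepted: 0 disables the limit
        checkSize(new byte[1024], 1024); // accepted: exactly at the limit
        try {
            checkSize(new byte[1025], 1024);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```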

List of formats that may replace the exported document.

This option provides a list of file formats that may be provided through the ExportReplacementDocument option. Providing a file not in one of these formats will cause unexpected results.

The exported document should be replaced.

When this option is set to true the ExportReplacementDocument and ExportReplacementFormat options are used to replace the exported document. Commonly used to replace embeddings.

Default value is false

File to replace the exported document with.

This option defines the file which should overwrite the exported document. This file must be ExportReplacementSize or less and ExportReplacementFormat must describe its format.

File format of the ExportReplacementDocument.

This option defines the file format of the file which should overwrite the exported document. ExportReplacementDocument must be in this format.

Number of columns allowed between cells that are treated as a contiguous range when determining extreme ranges.

This option defines the maximum number of columns allowed between two cell ranges before they are treated as being two non-contiguous cell ranges. When an otherwise contiguous block of cells is separated by a greater number of columns, the cells may be treated as extreme cells during analysis.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is 10

Indicates the document contains one or more ranges of spreadsheet cells that are located an extreme distance from other cell ranges.

The Extreme Cells target indicates that ranges of spreadsheet cells within the document are located an extreme distance from other cell ranges. The definition of an extreme cell range can be controlled by setting two options: Extreme Cell Horizontal Gap Allowance and Extreme Cell Vertical Gap Allowance.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Number of rows allowed between cells that are treated as a contiguous range when determining extreme ranges.

This option defines the maximum number of rows allowed between two cell ranges before they are treated as being two non-contiguous cell ranges. When an otherwise contiguous block of cells is separated by a greater number of rows, the cells may be treated as extreme cells during analysis.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is 40
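
The gap allowances for columns and rows can be illustrated with a one-dimensional sketch that splits occupied row or column indexes into contiguous ranges. The class and method are hypothetical, and the exact boundary handling (a gap must be strictly greater than the allowance to split a range) is an assumption drawn from the wording above, not from the implementation.

```java
import java.util.ArrayList;
import java.util.List;

class ExtremeGapSketch {
    // Split sorted, occupied row or column indexes into contiguous ranges.
    // Two neighbours stay in the same range while the number of empty
    // rows/columns between them does not exceed the allowance.
    // Each result entry is {firstIndex, lastIndex}.
    static List<int[]> splitRanges(int[] sortedIndexes, int gapAllowance) {
        List<int[]> ranges = new ArrayList<>();
        if (sortedIndexes.length == 0) {
            return ranges;
        }
        int start = sortedIndexes[0];
        int prev = start;
        for (int i = 1; i < sortedIndexes.length; i++) {
            int cur = sortedIndexes[i];
            int emptyBetween = cur - prev - 1;
            if (emptyBetween > gapAllowance) {
                ranges.add(new int[] { start, prev }); // close the current range
                start = cur;                           // begin a new one
            }
            prev = cur;
        }
        ranges.add(new int[] { start, prev });
        return ranges;
    }

    public static void main(String[] args) {
        // Columns 0-2 hold data, column 50 holds a stray cell; allowance 10.
        for (int[] r : splitRanges(new int[] { 0, 1, 2, 50 }, 10)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

A range that ends up far from the main body of data, such as the single cell at column 50 here, is the kind of candidate the Extreme Cells target is described as reporting.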

Certain indenting, margin and other settings result in text that does not display or print.

The Extreme Indenting target indicates that indent, margin, gutter or other settings could result in text that is off the page or outside a table or column. Such text will not display or print. Note that the existence of the Extreme Indenting target does not guarantee that text is hidden; only that text may be hidden.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Word 97 thru 2003

Default value is AnalyzeOption.Action.ANALYZE

Indicates the document contains one or more objects that are positioned an extreme distance outside the standard viewing area.

The Extreme Objects target identifies embedded, linked, and graphic objects that have been positioned in such a way that a majority of the object may fall outside the reasonable viewing area when viewed or printed in the authoring application. This may include objects positioned outside the slide or speaker note frame in PowerPoint, and in an extreme cell range in Excel documents. Extreme objects are reported but modifications can only be made upon author review in the authoring application.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Text or other data that was 'deleted' but still exists in the file.

The fast save feature in Microsoft Word and PowerPoint is set using the 'Tools > Options > Save > Allow fast saves' command. When fast save is activated deleted text and data can remain in the file even though it is no longer visible or accessible from within the application. Adobe PDF documents may also include earlier revisions of nearly any type of content through the Incremental Update feature of the file format.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Detect and remove soft and hard hyphens found at the end of a line.

This option enables the detection and removal of soft and hard hyphens found at the end of a line during the extraction process. It is not uncommon for applications that generate PDF to use either a soft or hard hyphen to hyphenate a word when wrapping from one line to another. This feature is dependent on Clean Content's ability to infer line boundaries since they are not stored within a PDF document. Lines are inferred by monitoring position changes during text operations. It would be ideal to only remove soft hyphens during this process but unfortunately many applications use hard hyphens for hyphenation when generating the PDF document. Use of this feature can result in the removal of legitimate hard hyphens from the extracted output. This option defaults to 'off' for this reason. This feature primarily benefits applications when searching the text output without the use of intelligent hyphen monitoring.

Default value is false
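
The trade-off described for this option can be seen in a small sketch that joins already-inferred lines and drops a trailing soft hyphen (U+00AD) or hard hyphen. The class and method are hypothetical; line inference from PDF text positions is outside the scope of the sketch and the lines are simply passed in.

```java
import java.util.List;

class EndOfLineHyphenSketch {
    // Join inferred lines, removing a soft (U+00AD) or hard ('-') hyphen
    // that ends a line and merging the word with the start of the next line.
    static String join(List<String> lines) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            boolean last = i == lines.size() - 1;
            boolean hyphenated = !last
                && (line.endsWith("-") || line.endsWith("\u00AD"));
            if (hyphenated) {
                out.append(line, 0, line.length() - 1); // drop the hyphen, no space
            } else {
                out.append(line);
                if (!last) {
                    out.append(' ');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("docu-", "ment scrubbing"))); // prints "document scrubbing"
        System.out.println(join(List.of("a well-", "known issue")));  // prints "a wellknown issue"
    }
}
```

The second call shows why the option is off by default: a legitimate hard hyphen that happens to fall at a line break is removed along with the hyphenation artifacts.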

Detect and remove duplicate, overprinted text from extracted output.

This option enables the detection and removal of duplicate, overprinted text from the extracted output. It is not uncommon to see PDF documents with duplicate characters drawn very nearly on top of themselves for the purpose of supporting certain types of character attributes that may include bolded, embossed, shadowed, or 3D text characteristics. Unfortunately, the overprinting may occur at character, intra-word, word, or line boundaries. This can have the unfortunate side effect of breaking valid words into a stream of unintelligible characters which in turn has adverse consequences on text searching. This feature addresses this problem by monitoring the position of every drawn piece of text within a line for overprinting situations. Most common use cases are covered though there are valid cases that are not detected when spaces are a part of one text operation but not another, causing the match algorithm to fail. Additionally, this feature is disabled on any text that is drawn using a font that lacks valid character width metrics.

Default value is false

Generate the character highlight positions associated with the start of each word when extracting from PDF documents.

This option enables the extraction of character highlight positions at the start of each word when extracting from PDF documents. This information can be used to create an Adobe highlight file for the purpose of highlighting select text when viewing the original PDF document in Acrobat. The details of the Adobe Highlight file format can be found in the Adobe technical note titled HighlightFileFormat.pdf, available from Adobe. The highlight positions are provided by the PdfHL element in the extracted XML. These positions are character positions as defined in the Adobe technical note. Note that the Adobe character counting algorithm does not necessarily increment by 1 for each subsequent character. However, Acrobat highlights on full word boundaries even when a partial range is provided. For this reason it is reasonable to highlight select words by providing the position and a length equal to 1 or the number of characters to highlight.

Default value is false

Generate a fingerprint element for each embedded graphic in the document.

This option enables the generation of a fingerprint element with a type attribute of 'GraphicData'. This element is generated by Clean Content during analysis of embedded objects that are of type 'Graphic'. The value attribute provides the fingerprint as a 128 bit MD5 hash. This fingerprint can be leveraged to identify documents that contain a particular embedded image.

Default value is false
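
For reference, a 128 bit MD5 hash of a byte sequence can be computed with the standard java.security.MessageDigest class as sketched below. This reference does not describe which bytes of a graphic Clean Content hashes or how the value attribute is formatted, so hashing the raw data and printing lowercase hex are both assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GraphicFingerprintSketch {
    // Return the MD5 digest of the data as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            // BigInteger(1, ...) treats the digest as unsigned; %032x keeps leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(sample)); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

Two documents that embed byte-identical image data would yield the same hash under this scheme, which is what makes the fingerprint usable for matching.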

Generate a fingerprint element for each slide based on the text, images, colors, shape positions, and applied master.

This option enables the generation of a fingerprint element with a type attribute of 'SlideAppearance'. This element is generated by Clean Content during analysis of a presentation. The value attribute provides the fingerprint as a 128 bit MD5 hash. The SlideAppearance fingerprint is an extension of the SlideContent fingerprint and includes consideration for the slide background and the position and select formatting of slide content, including shapes. Numerous presentation features are excluded from the fingerprint calculation in order to improve the consistency of the fingerprint across different versions of PowerPoint. This fingerprint can be leveraged to identify slides that are substantially similar in both content and appearance.

Default value is false

Generate a fingerprint element based on the text and image content found for each slide.

This option enables the generation of a fingerprint element with a type attribute of 'SlideContent'. This element is generated by Clean Content during analysis of a presentation. The value attribute provides the fingerprint as a 128 bit MD5 hash. The fingerprint for SlideContent is generated based on the text and images found on the slide. This allows the fingerprint to be consistent regardless of modifications due to positions, colors, shapes, masters, and other slide attributes. The SlideAppearance fingerprint is an extension of the SlideContent fingerprint that includes consideration for the applicable slide master, slide background, and the position and select formatting of slide content, including shapes. Numerous presentation features are excluded from the fingerprint calculation in order to improve the consistency of the fingerprint across different versions of PowerPoint. This fingerprint can be leveraged to identify slides across a diverse document set that are substantially similar in content but may vary with respect to formatting.

Default value is false

GPS location information.

Metadata may have location information about the source of the document or the location of the authors or consumers

Applies to:

  • Extensible Metadata Platform

Default value is ScrubOption.Action.DEFAULT

Headers and footers.

Headers and footers in documents, spreadsheets and presentations. When this option is set to Scrub, the scrubbing behavior may be modified using the HeadersFootersSearch, HeadersFootersBehavior and HeadersFootersReplace options.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft Excel 2007 and above binary
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Excel 97 thru 2003

Default value is ScrubOption.Action.ANALYZE

Headers and footers behavior list.

This option is a list of behaviors to perform that maps one to one with the regular expressions in the HeadersFootersSearch list. See the HeadersFootersSearch option for more details. If the behavior is Replace, the corresponding item in the HeadersFootersReplace list will be used as the replacement text.

Default value is Scrub

Headers and footers replace list.

This option is a list of strings that maps one to one with the behaviors in the HeadersFootersBehavior list. A given item is ignored (and may be null or an empty string) unless the associated item in the HeadersFootersBehavior list is set to Replace.

Headers and footers search list.

This option is a list of regular expressions that will be used to test the text of each header or footer. When the first match is found the behavior defined by the corresponding item in the HeadersFootersBehavior list is executed against that header or footer. If no match is found the header or footer will be scrubbed in its entirety. This option is only valid if the HeadersFooters scrub target is set to Scrub. If this option is set, both the HeadersFootersBehavior and HeadersFootersReplace lists must be set and the lengths of all three lists must be the same.
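
The first-match rule across the three parallel lists can be sketched with java.util.regex as below. This is a hypothetical model rather than the Clean Content implementation: only the Scrub and Replace behaviors are named in this reference, so KEEP is an invented value for illustration, and it is assumed that a pattern may match anywhere in the text and that Replace substitutes the whole header or footer text.

```java
import java.util.List;
import java.util.regex.Pattern;

class HeadersFootersSketch {
    // SCRUB and REPLACE mirror behaviors named in this reference;
    // KEEP is a hypothetical value used only for this illustration.
    enum Behavior { SCRUB, REPLACE, KEEP }

    // Returns the text a header or footer would have after processing.
    // An empty string stands for "scrubbed in its entirety".
    static String apply(String text, List<String> search,
                        List<Behavior> behavior, List<String> replace) {
        if (search.size() != behavior.size() || search.size() != replace.size()) {
            throw new IllegalArgumentException("all three lists must have the same length");
        }
        for (int i = 0; i < search.size(); i++) {
            // Assumption: a pattern matches if it is found anywhere in the text.
            if (Pattern.compile(search.get(i)).matcher(text).find()) {
                switch (behavior.get(i)) {
                    case REPLACE:
                        return replace.get(i); // assumed to replace the whole text
                    case KEEP:
                        return text;
                    default:
                        return "";
                }
            }
        }
        return ""; // no pattern matched: scrub the header or footer entirely
    }

    public static void main(String[] args) {
        List<String> search = List.of("(?i)confidential", "^Page \\d+");
        List<Behavior> behavior = List.of(Behavior.REPLACE, Behavior.KEEP);
        List<String> replace = List.of("Public", "");
        System.out.println(apply("Company Confidential", search, behavior, replace));
        System.out.println(apply("Page 3", search, behavior, replace));
    }
}
```

Because only the first matching pattern is applied, the order of the search list determines which behavior wins when several patterns could match the same header or footer.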

Hidden spreadsheet columns, rows, or worksheets.

Spreadsheet rows, columns, or worksheets that have been hidden. Hidden cells may contain sensitive data that requires user review prior to release. Hidden cells can be identified during analysis and can be made visible by setting the Unhide Hidden Cells option. Hidden cells are not deleted or cleared when cleaned since they may be required to resolve references from visible cells.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Slides that have been hidden from presentation and printing.

The PowerPoint hidden slide feature (Slide Show > Hide Slide) allows individual slides to be hidden during the slide show and printing of the presentation. Hidden slides may contain information that is not intended for general release.

Applies to:

  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Text that has been hidden by the author.

Text that has been intentionally hidden (Format > Font... > Font > Hidden) by the user may contain sensitive information that should be reviewed or removed before distributing the document.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

A redundant storage of Excel workbooks created for backwards compatibility with Excel 95.

Microsoft substantially changed the Excel format between Excel 95 and Excel 97. In order to maintain backwards compatibility with Excel 95 it was possible to store both versions of the file inside the XLS document. This target detects and optionally scrubs the 'Book' stream that holds the Excel 95 version of the workbooks.

Applies to:

  • Microsoft Excel 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Include locator elements in output.

Include locator elements in output.

Default value is false

Found XML elements that are invalid against the schema.

Many applications that use XML formats, especially Microsoft's Office, do not strictly follow the XML format's schema when writing out documents. This target indicates that one or more invalid elements have been found and ignored.

Default value is AnalyzeOption.Action.ANALYZE

Ignore all action settings and just analyze.

When this option is true all scrub targets with actions of SCRUB will act as if they are set to ANALYZE. This allows for an analysis with no copying of the source document and no chance anything will be scrubbed.

Default value is false

Assemble the source PowerPoint file list into a single PowerPoint document, merging all slides.

When this option is true the AssembleFileList option defines the list of PowerPoint documents to be assembled into a single PowerPoint document. The source document defined by the SourceFile option must contain a PowerPoint document that will be used as the source for document wide defaults. At this time this only applies to assembling a set of PowerPoint documents into a single PowerPoint document. The resulting file will be placed in the embedding export directory by default.

Default value is false

Disassemble the source PowerPoint document into individual PowerPoint documents containing one slide each.

When this option is true the input document will be disassembled into a set of new documents. At this time this only applies to disassembling a PowerPoint document into multiple PowerPoint documents, each containing one slide. The resulting files will be placed in the embedding export directory by default.

Default value is false

Ignore all other settings and just identify the file format of the source document.

When this option is true the only action that will be taken is to identify the file format of the source document.

Default value is false

Links to files from other applications.

The Office linked object feature (Insert > Object...) allows linking to an external file that is managed and rendered by another application. These links can expose local and network path information.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

List of locator-based actions to perform on the document while scrubbing.

A List of locator-based actions to perform on the document while scrubbing.

An error occurred and was logged while processing the document.

An error occurred and was logged while processing the document. Errors include exceptions that end processing (WasException will also be true) and other conditions that don't cause exceptions but may lead to a major loss of functionality. See the log for details.

Default value is false

A warning occurred and was logged while processing the document.

A warning occurred and was logged while processing the document. Warnings include conditions that may lead to small losses of functionality. See the log for details.

Default value is false

Logger which should receive logging messages.

Macros and other executable code.

Microsoft Office includes support for Visual Basic and can be used to create everything from simple macros to data entry forms to full blown applications. Visual Basic can also be used to create macro viruses that travel with documents. Adobe PDF documents may contain code in the form of JavaScript.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft Excel 2007 and above binary
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Meeting minutes entered using the PowerPoint Meeting Minder feature.

Meeting minutes can be attached to PowerPoint documents with the PowerPoint Meeting Minder feature and are typically associated with an action item list. The action item list is included in the presentation as part of a slide or series of slides. The associated minutes are accessible only through the Meeting Minder user interface.

Applies to:

  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

A document property that provides a globally unique identifier (GUID) of the document and originating computer.

The Office GUID property is a document property created by versions of Microsoft Office prior to the release of Office 2000. This globally unique identifier (GUID) can be used to identify the computer from which the document originated.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

This document contains parts that represent some level of disclosure risk if not scrubbed or further analyzed.

This target identifies the existence of parts that may represent a disclosure risk if the offending part is not scrubbed from the document or further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each Alternate Content Choice part using an OfficeXMLPartRisk element that provides further information about the part.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Enable the process that canonicalizes Office XML files. Note that the ScrubOption OfficeXMLFeatures must be set in order to canonicalize the file.

Canonical XML is a normal form of XML, intended to allow relatively simple comparison of pairs of XML documents for equivalence; for this purpose, the Canonical XML transformation removes non-meaningful differences between the documents. Canonicalization involves UTF-8 encoding, attribute normalization, handling of special characters, replacement of entity references, and more. Note that the ScrubOption OfficeXMLFeatures must be set in order to canonicalize the file.

Default value is false

Enable the features that inspect and sanitize Office XML vulnerabilities.

Once this option is enabled, Clean Content processes Office 2007 and above file formats for XML comments, XML external entities, XML CDATA, and XML unknown namespaces. Clean Content reports the existence of these items only when this option is set, and the scrub options for these features also work only when it is set.

Default value is false

Enable the process that validates all Office parts found in Office Open XML formats.

The Office Open XML file formats, generated by Office 2007 and above, follow a specification that describes how a collection of related parts defines an Office document. Each part is stored as a unique file in the collection, and parts may reference other parts to define the structure of the document. Many of these parts are deeply inspected during the Clean Content analysis process; however, this option activates additional analysis, extraction, and scrubbing behavior that covers every part in the document in one way or another.

When this option is set to True, the following additional behaviors are active. The extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each questionable part using an OfficeXMLPartRisk element that provides further information about the part. There are 4 categories of parts that carry some level of disclosure risk: Rogue, Unexpected, Unanalyzed, and Alternate Content parts. Each of these is documented as a specific analysis target. Those analysis targets must be set to ANALYZE when this option is enabled in order to report that particular risk in the extracted output.

  • Rogue parts will automatically be scrubbed whether this option is enabled or disabled because rogue parts serve no known valid purpose in the document.
  • Unexpected parts will not be scrubbed since doing so might break the document structure.
  • Unanalyzed parts will only be scrubbed if they are removable due to a specific scrub target (e.g. Printer Settings).
  • The Choice portion of Alternate Content is always scrubbed whether this option is enabled or disabled. Alternate Content parts that are referenced within the Choice portion are removed unless they are required in another valid context, whether this option is enabled or disabled.

Default value is false

Rename namespace prefixes in all XML inside an MS Office file. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.

Namespace prefixes can contain sensitive information. It is therefore recommended to rename namespace prefixes to neutral prefixes. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.

Default value is false

This document contains parts that are not referenced or required by the document and that represent a significant unintentional disclosure risk if not scrubbed or further analyzed.

This target identifies the existence of parts that are not referenced or required by the document. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each rogue part using an OfficeXMLPartRisk element that provides further information about the part. Parts of this type are always removed when the OfficeXMLPartValidation option is enabled.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

This document contains parts that are understood but not analyzed by the Clean Content analysis process.

This target identifies the existence of parts that may represent a disclosure risk if the offending part is not scrubbed from the document or further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each unanalyzed part using an OfficeXMLPartRisk element that provides further information about the part.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

This document contains parts that are not processed by the Clean Content analysis process.

This target identifies the existence of parts that may represent a disclosure risk if the offending part is not further inspected by human or machine review. When this target is set to Analyze and the OfficeXMLPartValidation option is enabled, the extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each unexpected part using an OfficeXMLPartRisk element that provides further information about the part.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is AnalyzeOption.Action.ANALYZE

Document properties added to Office document email attachments by Microsoft Outlook.

Outlook properties are custom document properties that may be added by Microsoft Outlook to Office documents when they are sent as attachments. These properties include the author, email address, subject of the email, and review cycle identifiers associated with the attachment.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Controls how the extracted data is returned to the developer.

This option controls how the extracted data is returned to the developer.

Default value is NoOutput

Indicates the document contains one or more objects that have been overlapped by another object.

The Overlapped Objects target identifies embedded, linked, and graphic objects that have been covered by another object thus obscuring some portion of the underlying object. At least 50% of an object must be covered to be treated as overlapped. Overlapped objects are reported but modifications can only be made upon author review in the authoring application.

Applies to:

  • Microsoft Excel 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 2007 and above
  • Microsoft PowerPoint 97 thru 2003

Default value is AnalyzeOption.Action.ANALYZE
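
The 50% coverage rule for Overlapped Objects can be illustrated with a small geometric sketch. It assumes axis-aligned bounding boxes and a single covering object; how Clean Content actually measures coverage is not described here, and the class, the Box type, and the method names are hypothetical and not part of the API.

```java
/** Hypothetical illustration of the 50% rule; not part of the Clean Content API. */
final class OverlapSketch {

    /** Axis-aligned bounding box (assumed representation of an object). */
    static final class Box {
        final double x, y, width, height;

        Box(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns the fraction (0.0 to 1.0) of 'under' that is covered by 'over'. */
    static double coveredFraction(Box under, Box over) {
        double area = under.width * under.height;
        if (area <= 0) {
            return 0.0; // degenerate object: nothing to cover
        }
        // Width and height of the intersection; zero or negative means no overlap.
        double w = Math.min(under.x + under.width, over.x + over.width) - Math.max(under.x, over.x);
        double h = Math.min(under.y + under.height, over.y + over.height) - Math.max(under.y, over.y);
        if (w <= 0 || h <= 0) {
            return 0.0;
        }
        return (w * h) / area;
    }

    /** An object counts as overlapped when at least 50% of it is covered. */
    static boolean isOverlapped(Box under, Box over) {
        return coveredFraction(under, over) >= 0.5;
    }
}
```

For example, a 10 x 10 object whose top half is hidden by a 10 x 5 shape would be treated as overlapped, while one with only 40% of its area hidden would not.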

Some characters are hidden because they have been overlapped by a rectangular shape or image.

Text may be covered by graphics elements that are drawn after the text operations. This target detects specific use cases where that may occur including rectangles and thick lines that are a known source of poor PDF text redaction. Detection of overlapped text is limited to specific use cases due to the complexity of the transparent imaging model. However, the common cases associated with poor text redaction are covered.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

This option contains a list of passwords to be verified against password protected documents.

This option contains a list of passwords to be verified against password protected documents.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

PDF supports a set of interactive features called actions that range from jumping to a particular destination in the document to submitting the data of an interactive form to a server. Individual targets are defined for each specific type of action. This target covers the entire set of actions as a single target.

The PDF format supports a set of interactive features called actions. Example actions include jumping to a particular destination in a document, thread, or URI location, launching an external file, playing a sound or movie, importing or submitting form data, executing JavaScript code, and numerous other interactive features. Actions can be associated with outline items, annotations, form fields, pages, or the document as a whole and can be triggered based on specific user or document interactions like opening the document, viewing a page, or selecting an outline item. Each triggering event can execute one or more actions in sequence. Each type of action is given its own scrub target while this target is provided to cover all actions in a single target.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

Alternate versions of an image that may be used by readers.

Alternate images are additional versions of an image that may be used by readers, though there is no clear description of when or why.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Alternate Presentations can be used to view a PDF document in an alternative way more consistent with a presentation rendition.

Alternate Presentations allow a PDF document to be viewed in a slide-show-like manner. PDF 1.4 allowed a page to be viewed for a specified duration before moving into an automatic or user enabled page transition phase. PDF 1.5 allowed for a more extensive, JavaScript driven, alternate presentation rendering. This PDF feature is seldom used and has been deprecated by ISO 32000-1. This target addresses both forms.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

PDF supports a set of interactive features called annotations that allow numerous types of content to be associated with a page location or provide user interaction. This target covers the entire set of annotations as a single target.

The PDF format supports a set of interactive features called annotations. Example annotations include text, file attachments, watermarks, redaction, rich-media and numerous other interactive features. Each type of annotation has been categorized into a scrub target in order to provide finer control over detection and removal of the various types of annotations. This target is provided to cover all annotations in a single target.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

PostScript objects embedded inside PDF documents.

PostScript objects embedded inside PDF documents. These objects are no longer recommended to be included in PDF documents.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Digital signatures are used to authenticate the identity of the author and the contents of the document.

Digital signatures are used to authenticate the identity of the author and the contents of the document and may come in three forms. Digital signatures can be used for approval signatures, for modification detection and prevention, and to enable usage rights that are not available without the required signature.

Applies to:

  • Adobe Acrobat (PDF)

Default value is AnalyzeOption.Action.ANALYZE

Indicates that the document contains an embedded search index provided to make text searches faster within Adobe Acrobat.

Adobe Acrobat supports an option to embed a search index into a PDF document. The search index makes user searches faster, particularly in large documents. This index is a private data structure supported by Adobe and may retain content from previous versions of the document. This scrub target is a child of the more general PDF Private Application Data target in order to allow this target to be scrubbed while leaving other private application data if desired.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The GoTo3D View action controls the view of a 3D annotation.

The GoTo3D View action can be executed from a variety of triggering events and controls the view of a 3D annotation. PDF supports a rich collection of features to define and view three-dimensional objects, such as those used by CAD software. This action targets a 3D annotation and can change how the 3D artwork appears to the user by setting parameters such as lighting, rendering, and projection that control the virtual camera illustrating the 3D artwork.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The GoTo action causes the Viewer software to change the current view of the document to a specific location within the document.

The GoTo action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific location within the document.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The GoToE (Go to embedded file) action causes the Viewer software to change the current view to a specific location in another PDF file that is embedded in this or another PDF file.

The GoToE (Go to embedded file) action can be executed from a variety of triggering events and causes the Viewer software to change the current view to a specific location in another PDF file that is embedded in this or another PDF file.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The GoToR (Go to remote location) action causes the Viewer software to change the current view to a specific location in another PDF file.

The GoToR (Go to remote location) action can be executed from a variety of triggering events and causes the Viewer software to change the current view to a specific location in another PDF file.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Hide action causes the Viewer software to change the visibility of annotations and form fields.

The Hide action can be executed from a variety of triggering events and causes the Viewer software to change the visibility of annotations and form fields.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Import Data action imports Forms Data Format (FDF), XFDF, or XML into the interactive form fields of the PDF document.

The Import Data action imports Forms Data Format (FDF), XFDF, or XML into the interactive form fields of the PDF document and can be executed from a variety of triggering events.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The JavaScript action causes JavaScript code to be executed by the JavaScript interpreter supported by the PDF Viewer.

The JavaScript action can be executed from a variety of triggering events and causes JavaScript code to be executed by the JavaScript interpreter supported by the PDF Viewer. This is often used to dynamically control the view of a PDF document, particularly forms.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Launch action launches an application or opens or prints a document.

The Launch action can be executed from a variety of triggering events and causes the Viewer software to launch an application or open or print a document.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Information that specifies the existence of content that may result in unexpected rendering of a document.

The PDF file format supports including information that describes the existence of any content that may result in unexpected rendering of a document. This information is commonly included in documents that also include a document certification signature. It can be used by PDF applications to determine the trustworthiness of a document. The information primarily indicates the use of certain PDF features, such as JavaScript, launching, URIs, and multimedia objects, that may result in a document that will render differently in different environments.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The minimum pixel width and height required to process an image inside a PDF.

This option allows any image found inside a PDF document to be ignored during extraction unless both the x and y pixel dimensions of the image are greater than or equal to this value. This option is useful to prevent extracting small images commonly used to generate drawing artifacts like table borders, underlines, shading, and patterns.

Applies to:

  • Adobe Acrobat (PDF)

Default value is 96
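
The size test described for this option amounts to a simple predicate on both pixel dimensions. The following is a minimal sketch of that rule only; the class and method names are hypothetical and do not come from the Clean Content API.

```java
/** Hypothetical illustration of the minimum image size rule; not part of the Clean Content API. */
final class PdfImageFilterSketch {

    /** Documented default for the minimum pixel width and height. */
    static final int DEFAULT_MIN_IMAGE_SIZE = 96;

    /**
     * An image is processed only when BOTH dimensions are greater than
     * or equal to the configured minimum; otherwise it is ignored.
     */
    static boolean shouldExtract(int widthPx, int heightPx, int minSizePx) {
        return widthPx >= minSizePx && heightPx >= minSizePx;
    }
}
```

With the default of 96, a 400 x 2 pixel image used to draw a table border would be skipped, because only one of its dimensions meets the minimum.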

The Movie action causes the Viewer software to play a movie object that is stored as an external file.

The Movie action can be executed from a variety of triggering events and causes the Viewer software to play the associated movie object that is stored as an external file.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Named action causes the Viewer software to change the current view of the document to a specific named location in the current document.

The Named action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific named location in the current document. The supported named locations include NextPage, PrevPage, FirstPage, LastPage.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Indicates that the document contains private application data other than an embedded search index.

The PDF file format supports storing private data in PDF documents to allow extended functionality to be created by an application. This scrub target specifically addresses private application data other than the Embedded Search Index private application data. The Embedded Search Index data is addressed by a specific target in order to provide explicit control over that use case.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Private data stored in PDF documents by applications using the PDF Page-Piece dictionary construct.

The PDF file format supports storing private data in PDF documents to allow extended functionality to be created by an application. This data is stored in the Page-Piece dictionary construct described in the PDF Reference manual. For example, it is common for applications such as Adobe Illustrator and Adobe Photoshop to store additional data using this feature. The Embedded Search Index feature supported by Adobe Acrobat is also enabled using this approach. The PDF Private Application Data target provides a general target for detecting and removing any private application data found in PDF documents that leverage the PieceInfo entry to store a Page-Piece construct.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.ANALYZE

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Rendition action controls the playback of multimedia content.

The Rendition action can be executed from a variety of triggering events and controls the playback of multimedia content. The rendition action was introduced in PDF 1.5 to allow a far richer mechanism to control multimedia playback than supported by the earlier release Movie and Sound actions. Rendition actions can make use of extensive options to describe the location and sequence of multimedia content, the player to be used, allow for JavaScript execution to further control the playback, as well as many other parameters. Rendition actions are closely tied to a Screen annotation that specifies the region of a page where media clips are played.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Reset Form action resets a selected set of interactive form fields.

The Reset Form action resets a selected set of interactive form fields causing their current values to return to a default value. It can be executed from a variety of triggering events.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Rich Media action identifies a rich media annotation and specifies a command to be sent to that annotation handler. Rich media PDF constructs support playing a SWF file to provide enhanced rich media. The command defined in this action can either be an ActionScript or JavaScript function name.

The Rich Media action can be executed from a variety of triggering events and identifies a rich media annotation and specifies a command to be sent to that annotation handler.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Set OCG State action sets the state of one or more optional content groups.

The Set OCG State action can be executed from a variety of triggering events and sets the state of one or more optional content groups. Optional content refers to sections of content in a PDF document that can be selectively viewed or hidden. Optional content features are typically seen in interactive PDF documents like CAD drawings or maps.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Sound action causes the Viewer software to play a sound object.

The Sound action can be executed from a variety of triggering events and causes the Viewer software to play the associated sound object.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Submit Form action transmits the names and values of selected form fields to a specified URL.

The Submit Form action can be executed from a variety of triggering events and transmits the names and values of selected form fields to a specified URL (uniform resource locator).

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Thread action causes the Viewer software to change the current view of the document to a specific location in an article thread within the document.

The Thread action can be executed from a variety of triggering events and causes the Viewer software to change the current view of the document to a specific location in an article thread within the document.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Thumbnail images are small images that provide a representation of either a PDF page or an externally referenced file.

Thumbnail images are typically used to provide a representation of each page in a PDF document that allows viewers to quickly render an image of each page. They can also be associated with an external file reference. Thumbnails have been deprecated from use in PDF as of ISO 32000-1 and can safely be scrubbed from files.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The Transition action is used in a sequence of actions to define transition appearances during the sequence.

The Transition action is used in a sequence of actions to define transition appearances during the sequence. It can be executed from a variety of triggering events.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Any action that is not in the list of supported actions is treated as an Unknown action.

Clean Content supports scrub targets for all PDF actions defined through Version 1.7 and the supplement to ISO 32000. Any PDF action that is not in the list of supported actions is treated as an Unknown action. The most likely occurrence of an Unknown action is either due to a PDF file specification update supporting new actions or due to an attempt to create a custom action.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

The URI action causes the Viewer software to resolve and open a resource described by a Uniform Resource Identifier.

The URI action can be executed from a variety of triggering events and causes the Viewer software to resolve and open a resource described by a Uniform Resource Identifier.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Data stored in PDF documents used to import content from external Web pages.

The PDF file format supports creating information from web or local files using a method called Web Capture. Content can be retrieved from the referenced external files, either once or through additional updates. The original web capture information is maintained in the PDF file.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Notes associated with a slide presentation.

The PowerPoint notes feature allows notes to be associated with each slide. Notes may contain general content or internal commentary that should be reviewed or removed prior to distributing a presentation.

Applies to:

  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.DEFAULT

Printer information in the document.

Printer setup information is often stored within a Microsoft Word or Excel document. In the case of network printers, this information may include potentially sensitive network share information and less sensitive printer model names.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.DEFAULT

Printer information that includes network share names.

The printer information described in ScrubOptions.PrinterInformation contains network share information. This information can provide dangerous insight into an enterprise's internal network.

Default value is AnalyzeOption.Action.ANALYZE

Describes why the document could not be processed.

An enumeration of the possible reasons the document could not be processed.

Default value is Processed

Extract only properties from the document.

Extract only properties from the document while skipping the body text and structure.

Default value is false

Amount of time in milliseconds a request can execute before being timed out.

The amount of time in milliseconds a request can execute before being timed out. Timeouts are useful for the extremely rare cases where malformed documents cause infinite loops within the Clean Content code. While it is tempting to set this number low since most documents process in much less than 100 ms, very large or complex documents can take a significant amount of time to process, hence the 2 minute default for this option. A value of zero may be used to disable the timeout for the request, but this is not recommended.

Default value is 120000
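
The timeout semantics described for this option (a limit in milliseconds, with zero meaning no limit) can be sketched with the standard java.util.concurrent classes. This is only an illustration of the behavior; it does not show how Clean Content enforces the timeout internally, and RequestTimeoutSketch, RequestTimedOutException, and runWithTimeout are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of a request timeout; not part of the Clean Content API. */
final class RequestTimeoutSketch {

    /** Documented default: 120000 ms (2 minutes). */
    static final long DEFAULT_TIMEOUT_MS = 120_000L;

    /** Thrown by this sketch when the request exceeds its time budget. */
    static final class RequestTimedOutException extends RuntimeException {
        RequestTimedOutException(long timeoutMs) {
            super("Request exceeded " + timeoutMs + " ms");
        }
    }

    /** Runs the request on a worker thread; a timeout of 0 waits indefinitely. */
    static <T> T runWithTimeout(Callable<T> request, long timeoutMs) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeoutMs must be >= 0");
        }
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<T> pending = worker.submit(request);
        try {
            return timeoutMs == 0
                    ? pending.get()                                  // timeout disabled
                    : pending.get(timeoutMs, TimeUnit.MILLISECONDS); // bounded wait
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupt the runaway request
            throw new RequestTimedOutException(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Request failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller that sets the limit far below the default risks abandoning large but otherwise valid documents, which is the trade-off the description above warns about.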

Document that will contain the extracted data.

This option gives the developer a number of ways to provide the file that will receive the plain text or XML rendition of the extracted text and elements. This option is only valid if the OutputType option is set to OUTPUTTYPE_TOXML or OUTPUTTYPE_TOTEXT.

The XSLT document with which to process the result XML.

The XSLT document with which to process the report XML. This option is valid only when OutputType is set to TOXML.
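
To show what processing XML with an XSLT document involves, the following sketch applies a stylesheet to an XML string using the standard javax.xml.transform API. It is independent of Clean Content: the sample report and its element names are invented for illustration and do not reflect the actual report schema, and ReportXsltSketch is a hypothetical class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration of applying an XSLT to report XML; not part of the Clean Content API. */
final class ReportXsltSketch {

    /** Invented sample input; the real report schema is not shown here. */
    static final String SAMPLE_REPORT =
            "<report><target name='A'/><target name='B'/></report>";

    /** Stylesheet that emits the number of target elements as plain text. */
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='count(//target)'/></xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as a string. */
    static String transform(String reportXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(reportXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT processing failed", e);
        }
    }
}
```

Testing a stylesheet this way against a saved copy of the report XML may be a convenient way to develop it before supplying it through this option.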

Email routing information.

The email routing feature of Microsoft Office (File > Send To > Routing Recipient) stores the email addresses and user names of recipients in the document.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

Scenarios are an Excel feature that allows for multiple data models.

Microsoft Excel supports entering multiple data models within specific areas of a spreadsheet (Tools > Scenario...). Once a specific scenario is selected the remaining scenarios may expose data models that should not be exposed once the document is released to an outside party.

Applies to:

  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.ANALYZE

The scrubbed document.

This option gives the application a number of ways to provide the document to produce as a result of scrubbing the source document.

The new file format for the scrubbed document.

This result is set when the format of the scrubbed document differs from that of the source document. In many cases the extension of the scrubbed document must be changed in order for the document to be successfully opened by its application. This happens in Office 2007 when macros are removed from documents. For example, Microsoft Word 2007 documents with macros (.docm files) must be changed to .docx when macros are removed or Word will not open them. The new extension can be retrieved using the getExtension method on the file format returned by this option.
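
Once the new extension has been retrieved, the scrubbed file usually needs to be renamed. The helper below is a minimal sketch of that renaming step only. It assumes the extension arrives as a plain string, with or without a leading dot (the exact form returned by getExtension is not stated here), and ScrubbedFileNameSketch and withExtension are hypothetical names.

```java
/** Hypothetical renaming helper; not part of the Clean Content API. */
final class ScrubbedFileNameSketch {

    /**
     * Returns fileName with its extension replaced by newExtension.
     * Accepts the extension with or without a leading dot.
     */
    static String withExtension(String fileName, String newExtension) {
        String ext = newExtension.startsWith(".") ? newExtension.substring(1) : newExtension;
        int dot = fileName.lastIndexOf('.');
        // Position of the last path separator, so dots in directory names are ignored.
        int sep = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        // Only treat the dot as an extension separator if it follows the base name.
        String base = (dot > sep + 1) ? fileName.substring(0, dot) : fileName;
        return base + "." + ext;
    }
}
```

For the case described above, a scrubbed report.docm would be written out as report.docx.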

Sensitive paths or URIs to external content that is to be included in this file.

Microsoft Office and Acrobat PDF include a number of features that allow referencing an external document that is then pulled into the primary document while maintaining the original link. In Microsoft Office 2007 and above, the insert picture feature is an example that allows the inserted picture to optionally retain the link to the original file. Microsoft PowerPoint (through 2003) allows external links to audio and video files. Microsoft Word (through 2003) uses an include field to provide non-OLE based linking to external files (Insert > Field > IncludeText and Insert > Field > IncludePicture). Any of these examples may contain fully qualified local paths or network paths. A content link is considered sensitive if it begins with 'file:', begins with a drive letter followed by a colon, begins with two backslashes, or matches any of the regular expressions defined using the Sensitive Links Regular Expressions option. Note that OLE based linking is handled by the Linked Objects target.

Applies to:

  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

Hyperlinks containing either fully qualified local paths or network share names.

The Adobe PDF (link annotations) and the Office hyperlink feature (Insert > Hyperlink) allow the creation of links to various locations. Two of the possibilities, fully qualified local paths and network paths, can provide unwanted insight into an organization's internal structure. A hyperlink is considered sensitive if it begins with 'file:', begins with a drive letter followed by a colon, begins with two backslashes, or matches any of the regular expressions defined using the Sensitive Links Regular Expressions option.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft Excel 2007 and above binary
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

INCLUDETEXT and INCLUDEPICTURE fields containing either fully qualified local paths or network share names.

The Microsoft Word include field feature provides non-OLE based linking to external files (Insert > Field > IncludeText and Insert > Field > IncludePicture). These fields may contain fully qualified local paths or network paths.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

List of regular expressions against which hyperlinks and content links should be tested to determine their sensitivity.

This option allows additional regex-based tests to be run against hyperlinks and content links to determine their sensitivity. A match against any of the regular expressions will cause the link to be classified 'sensitive'. Hyperlinks classified this way will be reported or scrubbed depending on the value of the SensitiveHyperlinks target. Content links classified this way will be reported or scrubbed depending on the value of the SensitiveHyperlinks target. Any link that begins with a single alphabetic drive letter followed by a colon, or with the file: URI scheme, is automatically considered sensitive.
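The classification rule described for hyperlinks and content links can be illustrated with a minimal, standalone Python sketch. It does not use the Clean Content API. The function name `is_sensitive_link`, the case-insensitive match on 'file:', and the use of an unanchored search for the additional expressions are all assumptions made for illustration; the regular expression dialect of the product may also differ from Python's.

```python
import re

# Built-in rules taken from the descriptions above: the 'file:' scheme,
# a drive letter followed by a colon, and two leading backslashes.
# Matching 'file:' case-insensitively is an assumption.
_BUILT_IN_RULES = [
    re.compile(r"^file:", re.IGNORECASE),
    re.compile(r"^[A-Za-z]:"),
    re.compile(r"^\\\\"),
]


def is_sensitive_link(target, extra_patterns=()):
    """Return True if target matches a built-in rule or any extra pattern.

    extra_patterns stands in for the Sensitive Links Regular Expressions
    option (hypothetical parameter; unanchored search is an assumption).
    """
    if any(rule.match(target) for rule in _BUILT_IN_RULES):
        return True
    return any(re.search(pattern, target) for pattern in extra_patterns)
```

In this sketch a link such as `https://intranet.example.com/a` is not sensitive by default, but becomes sensitive once a pattern like `^https?://intranet\.` is supplied.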

Simulate PowerPoint Animations During Assembly.

This option applies to the assembly of PowerPoint 2007 and above (PPTX). When set, slides that originally contained animation are expanded into a series of slides that simulate the entrance and exit of animated elements by hiding and restoring slide elements.

Applies to:

  • Microsoft PowerPoint 2007 and above

Default value is false

Some character sizes are outside the normal range.

The sizes of some of the characters in the document are below the value defined by SizeObfuscatedTextMinimum or above the value defined by SizeObfuscatedTextMaximum.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft PowerPoint 2007 and above

Default value is ScrubOption.Action.ANALYZE

Maximum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.

Character sizes above this value (expressed in points) will be flagged by the SizeObfuscatedText target and will be reset to this value if SizeObfuscatedText is set to SCRUB.

Default value is 96

Minimum size a character may have when analyzing/scrubbing the SizeObfuscatedText target.

Character sizes below this value (expressed in points) will be flagged by the SizeObfuscatedText target and will be reset to this value if SizeObfuscatedText is set to SCRUB.

Default value is 4
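The interaction between SizeObfuscatedText, SizeObfuscatedTextMinimum and SizeObfuscatedTextMaximum can be sketched in a few lines of standalone Python. This is an illustration of the rule as described, not the product's implementation; the function names are hypothetical, and the defaults of 4 and 96 points are the documented default values.

```python
DEFAULT_MINIMUM = 4.0   # points, documented default of SizeObfuscatedTextMinimum
DEFAULT_MAXIMUM = 96.0  # points, documented default of SizeObfuscatedTextMaximum


def is_size_obfuscated(size, minimum=DEFAULT_MINIMUM, maximum=DEFAULT_MAXIMUM):
    """Flag a character size (in points) that falls outside the allowed range."""
    return size < minimum or size > maximum


def scrub_size(size, minimum=DEFAULT_MINIMUM, maximum=DEFAULT_MAXIMUM):
    """Reset an out-of-range size to the nearest bound; leave others unchanged."""
    return min(max(size, minimum), maximum)
```

Sizes exactly equal to a bound are treated as acceptable here, which follows from the wording "above this value" and "below this value".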

Tags applied to text that matches a defined pattern allowing specific actions to be executed based on the category of the smart tag.

Smart tags are a feature of Office that allows specific actions to be associated with text content that matches a pattern associated with each category of smart tags. For example, stock ticker symbols can be recognized and tagged in order to make related actions available to the user whenever a ticker symbol is encountered.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above binary
  • Microsoft PowerPoint 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

The document to process.

This option gives the developer a number of ways to provide the document to analyze, scrub or extract.

The file format of the source document.

This result provides the file format of the source document.

The page number used when modifying a document's starting page number.

When the ChangeStartingPageNumber option is true, this option sets the page number at which the document starts.

Applies to:

  • Microsoft Word 2007 and above

Default value is 1

Document properties categorized as statistics properties.

Statistics properties (File > Properties > Statistics) are document properties that include: Created, Modified, Accessed, Printed, Last saved by, Revision number, Total editing time, Pages, Paragraphs, Lines, Words, Characters, Bytes, Notes, Hidden Slides, Multimedia clips, and Presentation format. Additional application-maintained properties in this category include: Application name, Hyperlinks changed flag, Links up to date flag, and Scale flag. Some or all of these properties should be reviewed or removed prior to document distribution.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Word's Structured Document Tags.

Structured Document Tags are a feature of Word 2007 and above that allows user input through gadgets such as date pickers and picture pickers.

Applies to:

  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

Document properties categorized as summary properties.

Summary properties (File > Properties > Summary) are document properties that include: Title, Subject, Author, Manager, Company, Category, Keywords, Comment, Hyperlink Base, Template, and Preview Picture. Some or all of these properties should be reviewed or removed prior to document distribution.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

If a template other than Normal.dot is used, the document will contain a full path to the template file.

If a template other than Normal.dot is used, the document will contain a full path to the template file. This can expose local path or network share information.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above

Default value is ScrubOption.Action.DEFAULT

If set to 'true', requests in tight infinite loops will be stopped using the deprecated Thread.stop method.

When a malformed document pushes Clean Content into an infinite loop, a monitoring thread attempts to interrupt the request thread after the timeout period given by the RequestTimeout option. One of two things will then occur: 1) if the request is in a loop that can be interrupted, the request will be stopped and the SecureRequest execute method will return; 2) in the very rare case that the request is in a tight loop and this option is set to 'true', the monitoring thread will use the deprecated Thread.stop method to kill the thread. Anyone setting this option to 'true' must understand the implications of having the Java thread running the request destroyed. See the Java API documentation for java.lang.Thread for details.

Default value is false

Controls the encoding when extracted data is returned as text.

This option controls the encoding of extracted data when the OutputType option is set to ToText.

Default value is UTF16

Tracked changes in the document.

The change tracking feature of Microsoft Office tracks insertions, deletions and formatting changes made to the document. Such changes contain deleted text and author and date information that may be unintentionally left in the document upon distribution.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

Perform an XML transform on the result document.

If set to true, the contents of the XML result will be XSLT processed using the document specified in the ResultTransform option before being written. This option is valid only when OutputType is set to TOXML.

Default value is false

Unhide hidden spreadsheet cells.

Unhide hidden sheets, rows, and columns found in spreadsheets.

Default value is false

Uninitialized data segments found in the Docfile format leveraged by Office 2003 and below and many other formats.

The Microsoft Office binary file formats, among many other formats, leverage the Docfile file format (aka Structured Storage or Microsoft Compound File Binary File Format) to store a collection of data streams within a single file. This file allocation method allows data sectors to be allocated and freed as needed by the application (i.e. Word, Excel, and PowerPoint). This scrub target detects and optionally scrubs data sectors that are not currently in use but contain uninitialized (non-zero) data, including extra data sectors that may have been concatenated to the end of a valid file but are not intended to be part of the actual file.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Docfile

Default value is ScrubOption.Action.DEFAULT
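The idea behind this target can be illustrated with a simplified, standalone Python sketch. It assumes the file has already been split into fixed-size sectors and that the set of sector indices in use by streams is known; parsing the real Compound File allocation tables is not shown. The function names are hypothetical, and overwriting free sectors with zeros on scrub is an assumption rather than documented behavior.

```python
def find_dirty_free_sectors(sectors, in_use):
    """Return indices of sectors that are not in use but hold non-zero bytes.

    sectors: list of bytes objects, one per sector (simplified model).
    in_use: set of sector indices currently allocated to streams.
    """
    return [
        index
        for index, data in enumerate(sectors)
        if index not in in_use and any(data)
    ]


def scrub_free_sectors(sectors, in_use):
    """Return a copy of sectors with every unused sector zero-filled (assumed scrub)."""
    return [
        data if index in in_use else bytes(len(data))
        for index, data in enumerate(sectors)
    ]
```

Sectors appended after the end of a valid file would appear in this model as additional entries in `sectors` that are absent from `in_use`.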

Found XML elements in unknown namespaces.

Many applications that use XML formats, especially Microsoft Office, have situations where any element may appear or a particular namespace may be ignored. This target indicates that such an element is in a namespace that is not known and therefore cannot be validated.

Default value is AnalyzeOption.Action.ANALYZE

The names of users associated with the document.

A number of Office features cause user names to be saved in the document including the document properties Author and Last Saved By, document routing recipients, Word comment and tracked change authors, Excel scenario authors, file sharing participants, and the last user to edit a Microsoft Excel document or view a Microsoft PowerPoint document.

Applies to:

  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft PowerPoint 97 thru 2003
  • Microsoft Word 2007 and above
  • Microsoft Excel 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT

Enable the process that validates all embedded contents found in Office Open XML formats.

This feature is an add-on to OfficeXMLPartValidation that enables scrubbing of rogue content present inside Office Open XML documents. The Office Open XML file formats, generated by Office 2007 and above, follow a specification that describes how a collection of related parts defines an Office document. Each part is stored as a unique file in the collection, and parts may reference other parts to define the structure of the document. Many of these parts are deeply inspected during the Clean Content analysis process; however, this option activates additional analysis, extraction and scrubbing behavior that covers every part in the document in one way or another. When this option is set to true, the following additional behaviors are active. The extracted output will contain a Collection element of type OfficeXMLPartDisclosureRisks that includes each questionable part using an OfficeXMLPartRisk element that provides further information about the part. Such parts fall under the category of rogue parts present in the document. Rogue parts will automatically be scrubbed because they serve no known valid purpose in the document.

Default value is false

Version information in Word documents.

The versioning feature (File > Versions) in Microsoft Word allows multiple historical versions of a document to be saved within a single file. Versioning is useful during document creation but potentially sensitive once a document is released.

Applies to:

  • Microsoft Word 97 thru 2003

Default value is ScrubOption.Action.DEFAULT

An exception occurred while processing the document.

An exception occurred while processing the document. This is somewhat redundant since the developer will receive the exception itself, but it is included so the SecureResult can stand alone to completely describe the result of processing a document.

Default value is false

The source document was identified.

The source document was identified.

Default value is false

The source document was scrubbed, analyzed or extracted.

The source document was scrubbed, analyzed or extracted. Will be set to false if no component could be found to process the source document.

Default value is false

The source document's file format is supported.

The source document's file format is supported and processing was attempted.

Default value is false

Document took longer than the request's RequestTimeout value to process.

The document took longer than the request's RequestTimeout value to process or was interrupted.

Default value is false

Weak or easily breakable protections and passwords.

Weak protections are features of an application that appear to provide a strong level of protection against specific user actions on the document but in fact can be easily removed from the file without access to a password. A protection is only considered weak if it requires a password to remove the protection. Protections that don't require passwords are considered simple but not weak since they don't imply any additional password based strength.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft Word 97 thru 2003
  • Microsoft Excel 97 thru 2003
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

Bounded whitespace can be used to indent text. Note: ScrubOption OfficeXMLFeatures must be set to scrub bounded spaces.

Bounded whitespace can be used to indent text. Note: ScrubOption OfficeXMLFeatures must be set to scrub bounded spaces.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML CDATA refers to character data. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML CDATA.

CDATA sections are blocks of text that the parser does not parse as markup, even though they contain characters that would otherwise be recognized as markup. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML CDATA.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML comments are used to provide semantic information to the human reader. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML comments.

XML comments are used to provide semantic information to the human reader. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML comments.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML external entities are references to external files. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML external entities.

Clean Content reports whether external entity references exist in the document so that the user can decide to remove them. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML external entities.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML processing instructions can be used to pass information to applications. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML processing instructions.

XML processing instructions can be used to pass information to applications. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML processing instructions.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML namespace prefixes are used to avoid name conflicts in XML. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.

When prefixes are used in XML, a namespace for each prefix must be defined. XML namespace prefixes are used to avoid name conflicts in XML. Note: ScrubOption OfficeXMLFeatures must be set to rename namespace prefixes.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

An XML namespace in the document that is not part of the whitelisted namespace list. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML UnknownNamespace.

Clean Content (CC) stores a list of namespaces that have internal schema definitions. Many namespaces do not map to the whitelisted namespace list and thus have no schema definition within CC. These namespaces are flagged as unknown namespaces. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML UnknownNamespace.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT

XML namespaces are used to avoid name conflicts in XML. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML unused namespaces.

An XML document can define multiple namespaces that are never used. Note: ScrubOption OfficeXMLFeatures must be set to extract and scrub XML unused namespaces.

Applies to:

  • Microsoft Word 2007 and above
  • Microsoft PowerPoint 2007 and above
  • Microsoft Excel 2007 and above

Default value is ScrubOption.Action.DEFAULT
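What "unused namespace" means can be shown with a small, standalone Python sketch based on the standard library's xml.etree.ElementTree. It is an illustration only and is not how Clean Content is implemented. The function name `unused_namespaces` is hypothetical, and the sketch counts a namespace as used only when an element or attribute name belongs to it; prefixes that appear solely inside attribute values or text are not detected.

```python
import io
import xml.etree.ElementTree as ET


def unused_namespaces(xml_text):
    """Return the set of declared namespace URIs that no element or attribute uses."""
    declared = set()
    used = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            # payload is a (prefix, uri) tuple for each namespace declaration
            _prefix, uri = payload
            declared.add(uri)
        else:
            # payload is the element; qualified names look like '{uri}local'
            for name in [payload.tag, *payload.attrib]:
                if name.startswith("{"):
                    used.add(name[1:name.index("}")])
    return declared - used
```

For a document that declares the prefixes `a` and `b` but only uses `a`, the function returns the URI bound to `b`.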

XMP Metadata streams are leveraged to store metadata properties using the Extensible Metadata Platform standard.

Extensible Metadata Platform streams are used by a number of formats, including PDF, to associate metadata properties with an entire document or objects within a document. In PDF an XMP stream can be associated with the document and specific pages, drawing and image objects, and color profiles. Note that PDF often replicates a set of standard document properties into an XMP stream as well as its own internal property storage format. This type of metadata typically contains standard properties like Author and Title, but can be extended to include any type of metadata.

Applies to:

  • Adobe Acrobat (PDF)

Default value is ScrubOption.Action.DEFAULT


Property Documentation

Option [] CleanContent::SecureOptions::AllOptions [static, get]

A list of all available options.


The documentation for this class was generated from the following file:
Clean Content .NET API 8.5.6.01.211123 documentation generated on Tue Nov 23 02:28:42 2021 by Doxygen 1.6.3