Oracle® Outside In Content Access Release 8.3.5
This chapter discusses tagged content and other content topics.
The SCCCA_BEGINTAG and SCCCA_ENDTAG content types are used to tag or delimit other content for a particular purpose. This can be especially useful when searching for specific document property values like the author or title of a document. It can also be used to separate subdocument text like headers, footers, and footnotes from the main document text. Tagged text may be nested inside other tagged text, and tags may overlap each other.
Though most tag types are not particularly useful to developers, the Data Access technology provides all of the tag types rather than make a judgment as to usability. Each is briefly described below.
This section lists the applicable parameters and corresponding values.
dwType
SCCCA_BEGINTAG: Beginning of tagged content
SCCCA_ENDTAG: End of tagged content
dwSubType: Tag type - see "Tag Types"
dwData1: Additional ID - see "Document Property IDs" or "Mail Field IDs"
dwData2: Not used
dwData3: Reserved
dwData4: Reserved
pDataBuf: Not used
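Because tags may overlap as well as nest, a simple stack of open tags is not enough to tell which tags currently apply. The following is a minimal sketch of one alternative: a counter of open tags per tag type. The `Ex`-prefixed struct, the `EX_` constants, and their numeric values are hypothetical stand-ins chosen for illustration; the real definitions come from sccca.h.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real constants are in sccca.h. */
#define EX_BEGINTAG      1u
#define EX_ENDTAG        2u
#define EX_HEADER        0u
#define EX_FOOTER        1u
#define EX_HYPERLINK     2u
#define EX_TAGTYPE_COUNT 3u

/* Hypothetical stand-in for the content item returned by the technology. */
typedef struct {
    uint32_t dwType;
    uint32_t dwSubType;
} ExContent;

/* One open-tag counter per tag type, so overlapping tags are handled. */
typedef struct {
    unsigned open[EX_TAGTYPE_COUNT];
} ExTagState;

/* Update the counters for a begin or end tag; other content is ignored. */
void ex_tag_update(ExTagState *state, const ExContent *item)
{
    if (item->dwSubType >= EX_TAGTYPE_COUNT)
        return;
    if (item->dwType == EX_BEGINTAG)
        state->open[item->dwSubType]++;
    else if (item->dwType == EX_ENDTAG && state->open[item->dwSubType] > 0)
        state->open[item->dwSubType]--;
}

/* Nonzero when at least one tag of the given type is currently open. */
int ex_tag_is_open(const ExTagState *state, uint32_t tagType)
{
    return tagType < EX_TAGTYPE_COUNT && state->open[tagType] > 0;
}
```

A caller would pass every content item to `ex_tag_update` and consult `ex_tag_is_open` before deciding, for example, whether the current text belongs to a header.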
This section lists the applicable values and corresponding descriptions.
SCCCA_ALTFONTDATA: Reserved
SCCCA_ANNOTATIONREFERENCE: Tags content that references an annotation
SCCCA_BOOKMARK: Delimits content tagged as a bookmark
SCCCA_CAPTIONTEXT: Tags content that is used as a caption on objects such as tables, equations and figures
SCCCA_CHARACTER: Reserved
SCCCA_COMPILEDFIELD: Tags content resulting from an application compiling a field code such as a date. The lack of consistent support by applications for this field makes it unreliable as a search property.
SCCCA_CONDITIONALSTYLE: Reserved
SCCCA_COUNTERFORMAT: Reserved
SCCCA_CUSTOMDATAFORMAT: Reserved
SCCCA_DATEDEFINITION: Reserved
SCCCA_DIAGRAM: Reserved
SCCCA_DIAGRAM_*: Reserved
SCCCA_DOCUMENTPROPERTY: Tags document property content - see "Document Property IDs"
SCCCA_DOCUMENTPROPERTYNAME: Name of a user-defined document property (SCCCA_USERDEFINEDPROP)
SCCCA_EMAILFIELD: Tags fields associated with email formats - see "Mail Field IDs"
SCCCA_EMAILFIELDNAME: Tags the name of a non-standard email field.
SCCCA_EMAILTABLE: Table of email fields
SCCCA_ENDNOTEREFERENCE: Tags content that references an endnote
SCCCA_FONTANDGLYPHDATA: Tags content that references font or glyph data
SCCCA_FOOTER: Delimits content tagged as footer
SCCCA_FOOTNOTEREFERENCE: Tags content that references a footnote
SCCCA_FRAME: Tags content stored within a frame
SCCCA_FRAME_EX: Tags content that references extended frames
SCCCA_GENERATEDFIELD: Reserved
SCCCA_GENERATOR: Reserved
SCCCA_HEADER: Delimits content tagged as header
SCCCA_HYPERLINK: Delimits content tagged as a hypertext link
SCCCA_INDEX: Reserved
SCCCA_INDEXENTRY: Delimits content that should be placed in the index
SCCCA_INLINEDATAFORMAT: Reserved
SCCCA_LINKEDOBJECT: Tags content referencing a linked object
SCCCA_LISTENTRY: Reserved
SCCCA_MERGEENTRY: Reserved
SCCCA_NAMEDCELLRANGE: Reserved
SCCCA_REFERENCEDTEXT: Tags text for later reference
SCCCA_SLIDENOTES: Tags content stored in speaker/slide notes in a presentation document
SCCCA_SSHEADERFOOTER: Tags content that references headers or footers in spreadsheet files
SCCCA_STYLE: Delimits a style definition. Styles may contain text, but typically do not.
SCCCA_SUBDOCPROPERTY: Tags metadata associated with a subdocument, such as a comment.
SCCCA_SUBDOCTEXT: Delimits content stored in subdocuments like headers, footers, frames and notes.
SCCCA_TOA: Reserved
SCCCA_TOAENTRY: Reserved
SCCCA_TOC: Reserved
SCCCA_TOCENTRY: Reserved
SCCCA_TOF: Reserved
SCCCA_VECTORSAVETAG: Reserved
SCCCA_XMPDATA: Document properties parsed out of the XMP data
SCCCA_XREF: Reserved
When dwSubType is SCCCA_DOCUMENTPROPERTY, dwData1 will be one of the values listed in the header file sccca.h. The following section, Document Property IDs, lists many of the common document property types. Any content generated between the begin and end tag defines the value of the document property.
When dwSubType is SCCCA_EMAILFIELD, dwData1 will be one of the values in "Mail Field IDs", and any content generated between the begin and end tag defines the value of the email field.
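As an illustration of reading a property value from the content between a begin tag and an end tag, the sketch below collects the title of a document. It rests on several assumptions: the struct and the `EX_` constants are hypothetical stand-ins with placeholder values, the text is assumed to arrive in a single-byte character set, and any SCCCA_DOCUMENTPROPERTY end tag is assumed to close the title.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder values for illustration only; the real constants are in sccca.h. */
#define EX_BEGINTAG         1u
#define EX_ENDTAG           2u
#define EX_TEXT             3u
#define EX_DOCUMENTPROPERTY 10u
#define EX_TITLE            100u

/* Hypothetical stand-in for a content item; single-byte text is assumed. */
typedef struct {
    uint32_t dwType;
    uint32_t dwSubType;
    uint32_t dwData1;     /* property ID for tags, character count for text */
    const char *pDataBuf;
} ExContent;

typedef struct {
    int inTitle;          /* nonzero while inside the title tag */
    char value[128];      /* collected title, always NUL-terminated */
    size_t len;
} ExTitleCollector;

/* Feed every content item through this function in document order. */
void ex_collect_title(ExTitleCollector *c, const ExContent *item)
{
    int isProperty = item->dwSubType == EX_DOCUMENTPROPERTY;

    if (item->dwType == EX_BEGINTAG && isProperty && item->dwData1 == EX_TITLE) {
        c->inTitle = 1;
        c->len = 0;
        c->value[0] = '\0';
    } else if (item->dwType == EX_ENDTAG && isProperty) {
        c->inTitle = 0;
    } else if (item->dwType == EX_TEXT && c->inTitle) {
        /* Append as much text as fits, leaving room for the terminator. */
        size_t room = sizeof c->value - 1 - c->len;
        size_t n = item->dwData1 < room ? item->dwData1 : room;
        memcpy(c->value + c->len, item->pDataBuf, n);
        c->len += n;
        c->value[c->len] = '\0';
    }
}
```

The same pattern would apply to SCCCA_EMAILFIELD by checking for that tag type and a mail field ID instead.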
The following is a list of document property IDs.
SCCCA_ABSTRACT
SCCCA_ACCOUNT
SCCCA_ADDRESS
SCCCA_APPVERSION
SCCCA_ATTACHMENTS
SCCCA_AUTHORIZATION
SCCCA_BACKUPDATE
SCCCA_BASEFILELOCATION
SCCCA_BILLTO
SCCCA_BLINDCOPY
SCCCA_CARBONCOPY
SCCCA_CATEGORY
SCCCA_CHECKEDBY
SCCCA_CLIENT
SCCCA_COMPANY
SCCCA_COMPLETEDDATE
SCCCA_COUNTBYTES
SCCCA_COUNTCHARS
SCCCA_COUNTCHARSWITHSPACES
SCCCA_COUNTLINES
SCCCA_COUNTMMCLIPS
SCCCA_COUNTNOTES
SCCCA_COUNTPAGES
SCCCA_COUNTPARAS
SCCCA_COUNTSLIDES
SCCCA_COUNTSLIDESHIDDEN
SCCCA_COUNTWORDS
SCCCA_CREATIONDATE
SCCCA_DEPARTMENT
SCCCA_DESTINATION
SCCCA_DISPOSITION
SCCCA_DIVISION
SCCCA_DOCCOMMENT
SCCCA_DOCNUMBER
SCCCA_DOCTYPE
SCCCA_EDITMINUTES
SCCCA_EDITOR
SCCCA_FORWARDTO
SCCCA_GROUP
SCCCA_HEADINGPAIRS
SCCCA_KEYWORD
SCCCA_LANGUAGE
SCCCA_LASTPRINTDATE
SCCCA_LASTSAVEDATE
SCCCA_LASTSAVEDBY
SCCCA_LINKSDIRTY
SCCCA_MAILSTOP
SCCCA_MANAGER
SCCCA_MATTER
SCCCA_OFFICE
SCCCA_OPERATOR
SCCCA_OWNER
SCCCA_PRESENTATIONFORMAT
SCCCA_PRIMARYAUTHOR
SCCCA_PROJECT
SCCCA_PUBLISHER
SCCCA_PURPOSE
SCCCA_RECEIVEDFROM
SCCCA_RECORDEDBY
SCCCA_RECORDEDDATE
SCCCA_REFERENCE
SCCCA_REVISIONDATE
SCCCA_REVISIONNOTES
SCCCA_REVISIONNUMBER
SCCCA_SCALECROP
SCCCA_SECONDARYAUTHOR
SCCCA_SECTION
SCCCA_SECURITY
SCCCA_SOURCE
SCCCA_STATUS
SCCCA_SYSTEM_FILECREATED
SCCCA_SYSTEM_FILEMODIFIED
SCCCA_SYSTEM_FILESIZE
SCCCA_SUBJECT
SCCCA_TITLE
SCCCA_TITLEOFPARTS
SCCCA_TYPIST
SCCCA_USERDEFINEDPROP
SCCCA_VERSIONDATE
SCCCA_VERSIONNOTES
SCCCA_VERSIONNUMBER
Note:
Document Properties with IDs of SCCCA_USERDEFINEDPROP or above are user-defined properties.
The following values are properties of SCCCA_SUBDOCPROPERTY:
SCCCA_SUBDOC_AUTHOR
SCCCA_SUBDOC_CREATEDATE
SCCCA_SUBDOC_LASTSAVEDATE
SCCCA_SUBDOC_TITLE
SCCCA_SUBDOC_NOTES
SCCCA_SUBDOC_AUTHORSHORT
The following is a list of mail field IDs.
SCCCA_MAIL_ALTERNATE_RECIPIENT_ALLOWED
SCCCA_MAIL_ATTACHMENT
SCCCA_MAIL_ATTENDEES
SCCCA_MAIL_ATTR_HIDDEN
SCCCA_MAIL_ATTR_READONLY
SCCCA_MAIL_ATTR_SYSTEM
SCCCA_MAIL_AUTO_FORWARDED
SCCCA_MAIL_BCC
SCCCA_MAIL_CATEGORIES
SCCCA_MAIL_CC
SCCCA_MAIL_CCME
SCCCA_MAIL_CLIENT_SUBMIT_TIME
SCCCA_MAIL_COMPANY
SCCCA_MAIL_CONVERSATION_INDEX
SCCCA_MAIL_CONVERSATION_TOPIC
SCCCA_MAIL_CREATION_TIME
SCCCA_MAIL_CREATOR_ENTRYID
SCCCA_MAIL_CREATOR_NAME
SCCCA_MAIL_DEFERRED_DELIVERY_TIME
SCCCA_MAIL_DELETE_AFTER_SUBMIT
SCCCA_MAIL_EMAIL
SCCCA_MAIL_ENTRYID
SCCCA_MAIL_EXPIRES
SCCCA_MAIL_EXPIRY_TIME
SCCCA_MAIL_FLAGSTS
SCCCA_MAIL_FROM
SCCCA_MAIL_FULLNAME
SCCCA_MAIL_HOMEPHONE
SCCCA_MAIL_IMPORTANCE
SCCCA_MAIL_INET_MAIL_OVERRIDE_FORMAT
SCCCA_MAIL_INTERNET_ARTICLE_NUMBER
SCCCA_MAIL_INTERNET_CPID
SCCCA_MAIL_INTERNET_MESSAGE_ID
SCCCA_MAIL_JOBTITLE
SCCCA_MAIL_LASTMODIFIED
SCCCA_MAIL_LAST_MODIFIER_ENTRYID
SCCCA_MAIL_LAST_MODIFIER_NAME
SCCCA_MAIL_LATEST_DELIVERY_TIME
SCCCA_MAIL_LOCATION
SCCCA_MAIL_MESSAGE_CLASS
SCCCA_MAIL_MESSAGE_CODEPAGE
SCCCA_MAIL_MESSAGE_LOCALE_ID
SCCCA_MAIL_MESSAGE_SUBMISSION_ID
SCCCA_MAIL_MSGFLAG
SCCCA_MAIL_MSG_EDITOR_FORMAT
SCCCA_MAIL_NEWSGROUPS
SCCCA_MAIL_NORMALIZED_SUBJECT
SCCCA_MAIL_NT_SECURITY_DESCRIPTOR
SCCCA_MAIL_ORIGINATOR_DELIVERY_REPORT_REQUESTED
SCCCA_MAIL_PRIORITY
SCCCA_MAIL_PROFILE_CONNECT_FLAGS
SCCCA_MAIL_RCVD_BY_FLAGS
SCCCA_MAIL_RCVD_REPRESENTING_ADDRTYPE
SCCCA_MAIL_RCVD_REPRESENTING_EMAIL_ADDRESS
SCCCA_MAIL_RCVD_REPRESENTING_ENTRYID
SCCCA_MAIL_RCVD_REPRESENTING_FLAGS
SCCCA_MAIL_RCVD_REPRESENTING_NAME
SCCCA_MAIL_RCVD_REPRESENTING_SEARCH_KEY
SCCCA_MAIL_READ_RECEIPT_REQUESTED
SCCCA_MAIL_RECEIVED
SCCCA_MAIL_RECEIVED_BY_ADDRTYPE
SCCCA_MAIL_RECEIVED_BY_EMAIL_ADDRESS
SCCCA_MAIL_RECEIVED_BY_ENTRYID
SCCCA_MAIL_RECEIVED_BY_NAME
SCCCA_MAIL_RECEIVED_BY_SEARCH_KEY
SCCCA_MAIL_RECIPIENT_REASSIGNMENT_PROHIBITED
SCCCA_MAIL_REPLY_REQUESTED
SCCCA_MAIL_REPLY_TIME
SCCCA_MAIL_REPORT_TAG
SCCCA_MAIL_RESPONSE_REQUESTED
SCCCA_MAIL_RTFBODY
SCCCA_MAIL_RTF_IN_SYNC
SCCCA_MAIL_RTF_SYNC_BODY_COUNT
SCCCA_MAIL_RTF_SYNC_BODY_CRC
SCCCA_MAIL_RTF_SYNC_BODY_TAG
SCCCA_MAIL_RTF_SYNC_PREFIX_COUNT
SCCCA_MAIL_RTF_SYNC_TRAILING_COUNT
SCCCA_MAIL_SEARCH_KEY
SCCCA_MAIL_SENDER_ADDRTYPE
SCCCA_MAIL_SENDER_EMAIL_ADDRESS
SCCCA_MAIL_SENDER_ENTRYID
SCCCA_MAIL_SENDER_FLAGS
SCCCA_MAIL_SENDER_NAME
SCCCA_MAIL_SENDER_SEARCH_KEY
SCCCA_MAIL_SENSITIVITY
SCCCA_MAIL_SENT_REPRESENTING_ADDRTYPE
SCCCA_MAIL_SENT_REPRESENTING_EMAIL_ADDRESS
SCCCA_MAIL_SENT_REPRESENTING_ENTRYID
SCCCA_MAIL_SENT_REPRESENTING_FLAGS
SCCCA_MAIL_SENT_REPRESENTING_NAME
SCCCA_MAIL_SENT_REPRESENTING_SEARCH_KEY
SCCCA_MAIL_SIZE
SCCCA_MAIL_SUBJECT
SCCCA_MAIL_SUBMITTIME
SCCCA_MAIL_TO
SCCCA_MAIL_TRANSPORT_MESSAGE_HEADERS
SCCCA_MAIL_TRUST_SENDER
SCCCA_MAIL_WEBPAGE
SCCCA_MAIL_WORKPHONE
An SCCCA_COMMENTREFERENCE is placed in the actual location of the comment. The body of the comment may appear elsewhere; it is tagged with an SCCCA_BEGINTAG of type SCCCA_SUBDOCTEXT and has the same ID as the SCCCA_COMMENTREFERENCE.
dwType: SCCCA_COMMENTREFERENCE
dwSubType: None
dwData1: Type of the comment reference anchor. SCCCA_COMMENT_PARAGRAPH, SCCCA_COMMENT_CELL, SCCCA_COMMENT_SLIDE, or SCCCA_COMMENT_VECTORPAGE.
dwData2: ID of the associated subdocument
dwData3: Reserved
dwData4: Reserved
pDataBuf: Not used
Returns the file identification information for a document. This property is generated by the CAReadFirst function.
This section lists the applicable parameters and corresponding values.
dwType: SCCCA_FILEPROPERTY
dwSubType: SCCCA_FILEID
dwData1: One of the file identifier values (FI_*) defined in sccfi.h
dwData2: The input file's initial character set
dwData3: Reserved
dwData4: Reserved
pDataBuf: Not used
Identical to SCCCA_TEXT, except that the characters come not from the original document, but from some other non-character data (numbers in spreadsheets, dates, etc.). Because the text is not from the original document, the characters do not contribute toward character counts.
This section lists the applicable parameters and corresponding values.
dwType: SCCCA_GENERATED
dwSubType: Possible values include the following:
SCCCA_DOCUMENTTEXT: Regular document text is returned with this subtype.
SCCCA_SPECIALTEXT: Used to return text elements that are manufactured by the technology due to special formatting attributes.
SCCCA_REVISIONDELETE: Will be OR-ed with either SCCCA_DOCUMENTTEXT or SCCCA_SPECIALTEXT when text has been deleted from the final version of a document as a result of a revision.
SCCCA_URLTEXT: Text for the Link Location part of a URL.
SCCCA_XMPMETADATA: Text from embedded XMP metadata.
dwData1: Number of characters provided in pDataBuf
dwData2: Original character set of the text in pDataBuf
dwData3: Reserved
dwData4: Reserved
pDataBuf: Text buffer. Filled with one or more single- or double-byte characters.
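Since generated characters do not contribute toward character counts, a running character offset kept for search-hit positions should advance only on SCCCA_TEXT items. The sketch below shows that bookkeeping; the struct and the `EX_` constants are hypothetical stand-ins with placeholder values, not the definitions from sccca.h.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real constants are in sccca.h. */
#define EX_TEXT      3u
#define EX_GENERATED 4u

/* Hypothetical stand-in for a content item. */
typedef struct {
    uint32_t dwType;
    uint32_t dwData1;   /* number of characters in pDataBuf */
} ExContent;

/* Return the running character offset after processing one item.
   Only document text advances the offset; generated text does not. */
uint32_t ex_advance_offset(uint32_t offset, const ExContent *item)
{
    if (item->dwType == EX_TEXT)
        return offset + item->dwData1;
    return offset;
}
```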
This content type is provided to allow the developer to access the content of SubObjects, like embedded graphics or objects in an archive. The SubObject can then be opened by DAOpenDocument, filling the IOSPECSUBOBJECT or the IOSPECARCHIVEOBJECT parameter with one of the following values:
dwType: SCCCA_OBJECT
dwSubType: SCCCA_EMBEDDEDOBJECT (0) if the sub-object is an embedding; otherwise, the type of node if the object is from an archive. Possible values include the following:
SCCCA_EMBEDDEDOBJECT
SCCCA_ARCHIVEITEMCONTAINER
SCCCA_COMPRESSEDFILE
SCCCA_MESSAGE
SCCCA_CONTACT
SCCCA_CALENDARENTRY
SCCCA_NOTE
SCCCA_TASK
SCCCA_JOURNALENTRY
SCCCA_ATTACHMENT
dwData1: The internal SubObject identifier or a node identifier.
dwData2: Stream identifier for an alternate graphic.
dwData3: Stream identifier for an OLE object if one exists. Otherwise, it is CA_INVALIDITEM.
dwData4: Reserved
pDataBuf: Not used
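A small sketch of how a client might inspect these fields before opening a sub-object follows. The value 0 for the embedded-object subtype comes from the description above; the struct and the value used for CA_INVALIDITEM are placeholders assumed for illustration, and the real definitions come from the product headers.

```c
#include <stdint.h>

#define EX_EMBEDDEDOBJECT 0u           /* 0, as stated for SCCCA_EMBEDDEDOBJECT */
#define EX_INVALIDITEM    0xFFFFFFFFu  /* placeholder for CA_INVALIDITEM (assumed value) */

/* Hypothetical stand-in for an SCCCA_OBJECT content item. */
typedef struct {
    uint32_t dwSubType;
    uint32_t dwData1;   /* sub-object or node identifier */
    uint32_t dwData2;   /* stream identifier for an alternate graphic */
    uint32_t dwData3;   /* OLE stream identifier, or the invalid-item marker */
} ExObjectContent;

/* Nonzero when the item describes a node from an archive rather than an embedding. */
int ex_is_archive_node(const ExObjectContent *o)
{
    return o->dwSubType != EX_EMBEDDEDOBJECT;
}

/* Nonzero when an OLE object stream is available for this item. */
int ex_has_ole_stream(const ExObjectContent *o)
{
    return o->dwData3 != EX_INVALIDITEM;
}
```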
This content type contains only the sheet name (worksheet in a spreadsheet, slide in presentation, etc.). This content is not optional. It is always created if the information is present. Of course, the client can ignore this text when it is returned.
This section lists the applicable parameters and corresponding values.
dwType: SCCCA_SHEET
dwSubType: Reserved
dwData1: The length of the name in pDataBuf in characters.
dwData2: The original character set of the name in pDataBuf.
dwData3: Reserved
dwData4: Reserved
pDataBuf: Points to the sheet name in whatever output character set has been requested.
The SCCCA_STYLECHANGE content type is used to indicate changes in style information. This style information can be used to delimit particularly interesting content.
This section lists the applicable parameters and corresponding values.
dwType: SCCCA_STYLECHANGE
dwSubType: Possible values include the following:
SCCCA_PARASTYLE: pDataBuf indicates the name of the style.
SCCCA_HEIGHTANDSPACING: When dwSubType is SCCCA_HEIGHTANDSPACING, dwData1 can be SCCCA_HEIGHT (dwData2 represents the new character height), SCCCA_SPACING (dwData3 represents the new line spacing) or both of these values OR-ed together.
SCCCA_INDENTS: When dwSubType is SCCCA_INDENTS, dwData1 can be SCCCA_LEFTINDENT (dwData2 represents the left indent), SCCCA_RIGHTINDENT (dwData3 represents the right indent), SCCCA_FIRSTINDENT (dwData4 represents the first line indent), or any of these values OR-ed together.
SCCCA_OCE: This content type provides information about the original charsets of the characters that follow. dwData1 represents the charset as defined in vtchars.h.
dwData1: Depends on the value of dwSubType.
dwData2: Depends on the value of dwSubType.
dwData3: Depends on the value of dwSubType.
dwData4: Depends on the value of dwSubType.
pDataBuf: Text buffer. Filled with one or more single- or double-byte characters.
dwDataBufSize: Size of pDataBuf, in bytes.
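For the subtypes whose dwData1 is a set of OR-ed flags, only the data fields named by those flags carry meaning. The sketch below decodes an SCCCA_HEIGHTANDSPACING change under that reading. The struct, the `EX_` flag values, and the result type are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Placeholder flag values for illustration only; the real ones are in sccca.h. */
#define EX_HEIGHT  0x1u
#define EX_SPACING 0x2u

/* Hypothetical stand-in for an SCCCA_STYLECHANGE content item. */
typedef struct {
    uint32_t dwData1;   /* OR-ed flags saying which fields are valid */
    uint32_t dwData2;   /* new character height, when EX_HEIGHT is set */
    uint32_t dwData3;   /* new line spacing, when EX_SPACING is set */
} ExStyleContent;

typedef struct {
    int hasHeight;
    uint32_t height;
    int hasSpacing;
    uint32_t spacing;
} ExHeightSpacing;

/* Copy out only the fields that the flags in dwData1 mark as present. */
ExHeightSpacing ex_decode_height_spacing(const ExStyleContent *item)
{
    ExHeightSpacing out = {0, 0, 0, 0};

    if (item->dwData1 & EX_HEIGHT) {
        out.hasHeight = 1;
        out.height = item->dwData2;
    }
    if (item->dwData1 & EX_SPACING) {
        out.hasSpacing = 1;
        out.spacing = item->dwData3;
    }
    return out;
}
```

SCCCA_INDENTS could be decoded the same way, with a third flag guarding dwData4.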
This content type denotes document text, including special characters such as page breaks and tabs.
The technology guarantees that the text generated by the Content Access technology is identical to the text generated by the Outside In Viewer technology raw-text feature. This allows character counts generated at indexing time using Content Access to be directly mapped to viewer positions at viewing time for search-hit highlighting. However, Content Access has abilities beyond the raw-text feature of the Viewer, such as the ability to retrieve non-visible text such as document properties and hidden text, and the ability to retrieve text from embedded documents.
When the output character set is DBCS or Unicode, the character count will not be the same as the buffer byte count because these character sets may generate more than one byte per character. The byte ordering used for multi-byte character sets such as these will be system-dependent; on a computer using an Intel processor, the low byte will be first.
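To make the distinction between character count and byte count concrete, the sketch below assumes an output character set with a fixed two bytes per character (DBCS output, where the width can vary, would need different handling) and reads one character stored low byte first. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by charCount characters, assuming exactly two bytes
   per character in the requested output character set. */
size_t ex_byte_count_fixed16(uint32_t charCount)
{
    return (size_t)charCount * 2u;
}

/* Read one 16-bit character that is stored with the low byte first. */
uint16_t ex_read_char16_low_first(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}
```

Reading the bytes explicitly, rather than casting the buffer to a 16-bit pointer, keeps the result independent of the byte order of the machine running the code.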
It is important to note that generated numeric data fields, such as date, time, and spreadsheet numbers, are not included in the content returned by SCCCA_TEXT. For information on how such text can be returned by Content Access, see "SCCCA_GENERATED: Generated Information".
This section lists the applicable parameters and corresponding values.
dwType: SCCCA_TEXT
dwSubType: One of the following values:
SCCCA_DOCUMENTTEXT: Regular document text is returned with this subtype.
SCCCA_SPECIALTEXT: Used to return text elements that are manufactured by the technology due to special formatting attributes.
SCCCA_DOCUMENTTEXT or SCCCA_SPECIALTEXT can be optionally OR-ed with any of the following to specify the type of text to be returned:
SCCCA_ALLCAPS
SCCCA_BOLD
SCCCA_DUNDERLINE
SCCCA_HIDDEN
SCCCA_ITALIC
SCCCA_OUTLINE
SCCCA_REVISIONDELETE: Text that has been deleted from the final version of a document as a result of a revision.
SCCCA_REVISIONADD: Text that has been added to the final version of a document as a result of a revision.
SCCCA_SMALLCAPS
SCCCA_STRIKEOUT
SCCCA_UNDERLINE
SCCCA_UNKNOWNMAP: This flag is set when PDF files don't contain a ToUnicode map. This indicates that the mappings may or may not be correct.
dwData1: Number of characters provided in pDataBuf
dwData2: Original character set of the text in pDataBuf
dwData3: Reserved
dwData4: Reserved
pDataBuf: Text buffer. Filled with one or more single- or double-byte characters.
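Because the attribute values are OR-ed into dwSubType, a client tests for them with a bitwise AND rather than comparing for equality. The sketch below shows one possible indexing policy: skip text deleted by a revision and report hidden text separately. The `EX_` bit values are placeholders assumed for illustration, not the values from sccca.h.

```c
#include <stdint.h>

/* Placeholder bit values for illustration only; the real ones are in sccca.h. */
#define EX_DOCUMENTTEXT   0x0001u
#define EX_SPECIALTEXT    0x0002u
#define EX_HIDDEN         0x0100u
#define EX_REVISIONDELETE 0x0200u
#define EX_BOLD           0x0400u

/* Nonzero when the text should be indexed: here, anything that a
   revision has not deleted from the final version of the document. */
int ex_should_index(uint32_t dwSubType)
{
    return (dwSubType & EX_REVISIONDELETE) == 0;
}

/* Nonzero when the text carries the hidden attribute. */
int ex_is_hidden(uint32_t dwSubType)
{
    return (dwSubType & EX_HIDDEN) != 0;
}
```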
Special characters are returned in the text with the following values:
Email Delimiter: 0x09
End of Database Record: 0x0A
End of File: 0x0D
End of Paragraph: 0x0D
End of Table Cell: 0x0D
End of Table Row: 0x0D
Hard Hyphen: 0x2D
Hard Line Break: 0x0A
Hard Page Break: 0x0C
Hard Space: 0x20
Implied Space: 0x20
Section Separator: 0x0D
Syllable Hyphen: 0x2D
Tab: 0x09
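Several of the special characters above share a value, so a client cannot tell an end of paragraph from an end of table cell by the character alone. For plain-text indexing that distinction often does not matter. The sketch below, which assumes single-byte text and uses a hypothetical helper name, folds tabs into spaces and every break character into a newline.

```c
#include <stddef.h>

/* Normalize special characters in a single-byte text buffer in place:
   0x09 (tab, email delimiter) becomes a space; 0x0A, 0x0C, and 0x0D
   (line, page, and paragraph-style breaks) become a newline. */
void ex_normalize_breaks(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch ((unsigned char)buf[i]) {
        case 0x09:
            buf[i] = ' ';
            break;
        case 0x0A:
        case 0x0C:
        case 0x0D:
            buf[i] = '\n';
            break;
        default:
            break;
        }
    }
}
```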
This content type contains information to be used in the SOTREENODELOCATOR structure, which is used by DAOpenRandomTreeRecord and DASaveRandomTreeRecord.