5 Searching Documents

This chapter provides tips for searching documents.

This chapter includes the following sections:

  1. Searching

  2. Positions

  3. Annotations

  4. Raw Text

5.1 Searching

Text search can be initiated by the developer by calling the Search method (see Search). This method's parameters specify the search direction, case sensitivity, and starting location, and can optionally display a default search dialog for search-text entry. The method returns 0 if the text is found, -1 if an error occurred, and 1 if the end of the file (EOF) was reached before the text was found. If the search is successful, the located text is highlighted and scrolled into the viewing window. All properties related to text selection are updated, as is the caret position, which is placed at the beginning of the highlighted text.

SearchNext (see SearchNext) will continue (in either the same or a different direction) to look for the next occurrence of the text specified in the previous Search method. The return values and behavior are the same as for the Search method.

Private Sub Search_Ctrl_Click()
   Dim ret As Integer
   ' If the search text has changed, call the Search method, 
   ' otherwise call SearchNext
   If LastSearchText = SearchText_Ctrl.Text Then
      ret = oixctrl1.SearchNext(SCCVW_SEARCHFORWARD) ' search forward
   Else
      Rem ** Search for text from the SearchText edit box control
      '      with no dialog, case sensitivity 
      '      from the current caret position and search forward.
      '      Assign the result of oixctrl1.Search to ret here
      '      (see Search for the parameter list). **
   End If

   If ret = 0 Then
      Search_Ctrl.Caption = "Search Next"   ' change the 
                                            ' search button 
                                            ' to a search 
                                            ' next
      LastSearchText = SearchText_Ctrl.Text ' save off the 
                                            ' text used in 
                                            ' the search
   ElseIf ret = 1 Then ' EOF reached. Display message and 
                       ' start over
      MsgBox "End of File Reached - wrapping to beginning of document"
   End If
End Sub
Private Sub SearchText_Ctrl_Change()
  LastSearchText = ""            ' When the text in the 
                                 ' search text edit 
  Search_Ctrl.Caption = "Search" ' box changes, reset the 
                                 ' button caption to "Search"
End Sub
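
The return-value handling described above reduces to a three-way branch. The following is a minimal sketch in Python, used here only as a language-neutral model of that branch; it does not call the control. The names FOUND, EOF_REACHED, ERROR, and handle_search_result are hypothetical, and the error message text is an assumption.

```python
# Return values documented for Search and SearchNext.
FOUND = 0
EOF_REACHED = 1
ERROR = -1


def handle_search_result(ret, caption):
    """Return (new_caption, message) for a Search/SearchNext return value.

    new_caption is the text for the search button; message is text to
    show the user, or None when nothing needs to be shown.
    """
    if ret == FOUND:
        # Text was located: offer "Search Next" on the same button.
        return "Search Next", None
    if ret == EOF_REACHED:
        # End of file reached before a match; the caption is left unchanged.
        return caption, "End of File Reached - wrapping to beginning of document"
    # Any other value is treated as an error (message text is assumed).
    return caption, "Search failed"


print(handle_search_result(FOUND, "Search"))
```

Keeping the branch in one place makes it harder to forget the error case, which the button handler above does not report.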

5.2 Positions

The ActiveX control uses the concept of a position to specify locations within a file. Each position is a placeholder or bookmark into the currently viewed document. It is important to understand how positions are manipulated to be able to use some of the more powerful features of the Outside In control. For example, a number of the annotation related methods/properties work with positions to specify the location in the file for the annotation.

The position variable is actually a COM object that both Visual Basic and Visual C++ refer to as an OixPos, and it is easily manipulated in both environments. When working with an OixPos, observe the following rule of thumb: when passing an OixPos to a method, pass it as an object (or by value). That is, always create a new OixPos before sending it to a method. OixPos properties, however, hold a reference to an OixPos object.

For the currently viewed document there is always a caret position defined. The caret position is the location on the screen where the cursor is displayed and is stored relative to the beginning of the file. The caret position can only be manipulated by the user through the use of the mouse and/or keyboard and may appear anywhere within the displayed text of the document.

There are four methods available to set an OixPos object/variable:

  1. SetPositionToCurrent (see SetPositionToCurrent) sets the position object to the location of the caret position.

  2. If the user has selected an area of text, SetPositionToSelection (see SetPositionToSelection) sets two parameters to the starting and ending positions of the currently selected text.

  3. You can also set the position directly using the SetActualCount (see SetActualCount) method. This method sets a position relative to the number of characters from the beginning of the document. Because different document types have different notions of page layout, this method is mostly used with word-processing documents. For example, the number of characters into a spreadsheet and into a presentation can take on completely different meanings.

  4. The last method to set an OixPos object is FindPosition (see FindPosition). FindPosition takes a starting position and a technique for locating the resulting OixPos object. The found position can be any of the following:

    1. First position in the current page (starting position is ignored)

    2. Last position in the current page (starting position is ignored)

    3. The position just to the left of the starting position

    4. The position just to the right of the starting position

    5. Start of the current selection (starting position is ignored)

    6. End of the current selection (starting position is ignored)

    7. Location at the top/left of the viewing window (starting position is ignored)

    8. Location at the bottom/right of the viewing window (starting position is ignored)

    9. Beginning of the line which contains the starting position

    10. End of the line which contains the starting position

    11. Beginning of the previous line (relative to the starting position)

    12. End of the previous line (relative to the starting position)

    13. Beginning of the word to the left of the starting position

    14. Beginning of the next word to the right of the starting position

    15. Beginning of the section to the left of the starting position

    16. Beginning of the next section to the right of the starting position

Four methods can be used to manipulate or access the position objects.

CopyPosition and ComparePositions (see CopyPosition and ComparePositions) both take two OixPos objects as parameters. CopyPosition copies the location information from one to the other, and ComparePositions returns which one is closer to the beginning of the file. GetActualCount (see GetActualCount) is the counterpart to the SetActualCount (see SetActualCount) method and returns the number of characters from the beginning of the file specified by the OixPos object.

The DisplayPosition (see DisplayPosition) method will redraw the currently viewed document relative to the OixPos parameter. The following flags determine where the position will be located relative to the viewing window.

  1. Top of the view window

  2. Middle of the view window

  3. Bottom of the view window

The following example illustrates the use of these methods.

Private Sub Annotate_Ctrl_Click()
   Dim oixStart As New OixPos
   Dim oixEnd As New OixPos

   Rem ** Check to see if any text is selected. If so, set 
   Rem    oixStart and oixEnd to the selected text. If not, 
   Rem    set oixStart to the current caret position
   Rem    and oixEnd to the beginning of the next word. **

   If Selection_Ctrl.Caption <= 0 Then
      oixctrl1.SetPositionToCurrent oixStart
      oixctrl1.FindPosition oixStart, oixEnd, 14 ' set the end
                                                 ' position to
                                                 ' be at the 
                                                 ' beginning of
                                                 ' the next word
      oixctrl1.FindPosition oixEnd, oixStart, 13 ' set the
                                                 ' beginning 
                                                 ' position of
                                                 ' the current 
                                                 ' word
      oixctrl1.SelectionAnchor = oixStart
      oixctrl1.SelectionEnd = oixEnd
   Else
      oixctrl1.SetPositionToSelection oixStart, oixEnd
      ' Swap the start and end positions if oixEnd is closer to 
      ' the top of the document than oixStart. This assumes a 
      ' positive result means oixStart follows oixEnd (see 
      ' ComparePositions).
      If oixctrl1.ComparePositions(oixStart, oixEnd) > 0 Then
         Dim oixTemp As New OixPos
         oixctrl1.CopyPosition oixTemp, oixStart
         oixctrl1.CopyPosition oixStart, oixEnd
         oixctrl1.CopyPosition oixEnd, oixTemp
      End If
   End If

   AnnotateSelection oixStart, oixEnd
   oixctrl1.DisplayPosition oixStart, 1   ' reposition text near 
                                          ' the top of viewer

End Sub
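
The swap step in the example above can be modelled without the control by treating a position as a character count from the beginning of the file, which is what GetActualCount reports. The following is a minimal sketch in Python, used only as a language-neutral model. The Pos class and the three functions are hypothetical stand-ins, and the sign convention of compare_positions (negative, zero, positive) is an assumption, not the documented return value of ComparePositions.

```python
from dataclasses import dataclass


@dataclass
class Pos:
    """Hypothetical stand-in for an OixPos: characters from start of file."""
    count: int = 0


def compare_positions(a, b):
    """Return -1, 0, or 1 when a is before, equal to, or after b (assumed)."""
    return (a.count > b.count) - (a.count < b.count)


def copy_position(dest, src):
    """Copy the location held by src into dest, leaving src unchanged."""
    dest.count = src.count


def normalize_range(start, end):
    """Swap start and end in place so that start never follows end."""
    if compare_positions(start, end) > 0:
        temp = Pos()                 # a new object, as the rule of thumb advises
        copy_position(temp, start)
        copy_position(start, end)
        copy_position(end, temp)


start, end = Pos(40), Pos(12)        # selection dragged from right to left
normalize_range(start, end)
print(start.count, end.count)
```

Copying through a fresh temporary object, instead of assigning one variable to another, keeps the two positions independent, which matters because a plain assignment would leave both names referring to the same object.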

5.3 Annotations

Outside In provides a powerful way to bookmark or annotate selected areas of text and to locate previously defined annotations, bookmarks, or URLs. Annotations, document-defined bookmarks, and URLs are treated as individual annotation types. Document-defined bookmarks and URLs are automatically annotated based on the information present in the file being processed. Annotations are user-defined and must be created programmatically using the Annotation API. There are three types of annotations: hilited (text is highlighted), hidden (text is hidden), and picture (a picture is inserted into the document).

To hilite and/or annotate a block of text, two OixPos objects are needed: the starting and ending positions. These can be obtained by retrieving the user's current text selection, by using position methods to locate text, or by using the Search method (see Search). Given two position objects, the text can be hilited using the AddAnnotationHilite method or hidden using the AddAnnotationHideText method (see AddAnnotationHilite and AddAnnotationHideText).

Another way to add an annotation is to insert a picture as an annotation. This type of annotation will be inserted at the designated OixPos position when the AddAnnotationPicture method (see AddAnnotationPicture) is called with a picture object as a parameter. This picture will be displayed inline as the document is viewed.

Private Sub AnnotateSelection(lType As Integer, Optional ByRef _
   OixStart As OixPos, Optional ByRef oixEnd As OixPos)

   Rem ** Create an annotation associated with the selection 
   Rem    between oixStart and oixEnd based on the type requested 
   Rem    in the lType parameter. **
   Dim Text As String
   Select Case lType
   Case 0: ' Hilite
      Text = GetAnnotation(oixStart, annotateid)
      oixctrl1.AddAnnotationHilite annotateid, 1, _
         SCCVW_ANNOTATION_SCLICK, 0, 0, Text, oixStart, oixEnd
   Case 1: ' Hide Text
      oixctrl1.AddAnnotationHideText annotateid, 0, oixStart, oixEnd
   Case 2: ' Picture
      Dim P As New StdPicture
      Set P = LoadPicture( CommonDialog1.Filename )
      oixctrl1.AddAnnotationPicture annotateid, P, 2, 0, 

   End Select

annotateid = annotateid + 1
End Sub
Private Function GetAnnotation(ByRef Oix As OixPos, annotate As _
   Integer) As String
   Rem ** AnnotationText_Form is a form with a Title text field,    
   Rem    Annotation Text control and a message text control
   Rem    which is "popped-up" with a Show method to collect 
   Rem    annotation text. **

   AnnotationText_Form.Left = oixctrl1.Left + 0.25 * _
      oixctrl1.Width + Form1.Left
   AnnotationText_Form.Top = oixctrl1.Top + 0.4 * oixctrl1.Height _
      + Form1.Top
   AnnotationText_Form.Title.Text = "Annotation #: " + Str(annotate)
   AnnotationText_Form.AnnotationText_Ctrl.Text = ""
   AnnotationText_Form.Show 1, Form1
   GetAnnotation = AnnotationText_Form.AnnotationText_Ctrl.Text
End Function

When adding an annotation, one of the parameters to the AddAnnotationHilite method (see AddAnnotationHilite) describes the user-interaction that will trigger an AnnotationEvent event (see AnnotationEvent). These include: single-click, double-click, and cursor-over. When one of these actions occurs, the action type and annotation information is passed to the AnnotationEvent event handler. The developer can then use this information to display additional annotation information.

Another parameter used when adding an annotation is an annotation style. Annotation styles are defined programmatically using the HiliteStyle method (see HiliteStyle). Styles must be defined before annotations are added. Each style is uniquely identified by an ID and defines the way the viewer hilites the annotated text: foreground color, background color, and font attributes can all be defined. Once defined, the unique style identifier can be passed as an argument to the AddAnnotationHilite method (see AddAnnotationHilite). Additionally, once a style has been defined and associated with an ID, the ID cannot be reused or reassigned until the currently loaded file changes.

Private Sub Form_Load()
   Dim fg As OLE_COLOR
   Dim bg As OLE_COLOR
   Dim ret As Boolean

   fg = RGB(255, 255, 255)
   bg = RGB(128, 128, 128)

   oixctrl1.HiliteStyle 1, 3, fg, bg, 0
End Sub
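Private Sub oixctrl1_AnnotationEvent(ByVal lEvent As Long, _
                                     ByVal varData As Variant)
   Rem ** If the user single clicks on the annotation, popup the 
   Rem    annotation form **
   Dim lId As Long

   lId = oixctrl1.AnnotationId
   ' Assumes lEvent carries the same single-click value that was 
   ' passed to AddAnnotationHilite (see AnnotationEvent)
   If lEvent = SCCVW_ANNOTATION_SCLICK Then
      ShowAnnotation oixctrl1.AnnotationDataType, lId, varData
   End If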
End Sub
Private Sub ShowAnnotation(lType As Long, annotate As Long, data _
   As Variant)
   Rem ** Display annotation text - called from AnnotationEvent 
   Rem    and the AnnotationList_Ctrl_Click event **

   Select Case lType
      Case 0: ' User annotation
         AnnotationText_Form.Title.Text = "Annotation #: " + _
            Str(annotate) + "(" + Hex(annotate) + ")"
      Case 1: ' URL
         AnnotationText_Form.Title.Text = "URL: " + _
            Str(annotate) + "(" + Hex(annotate) + ")"
      Case 2: ' Bookmark
         AnnotationText_Form.Title.Text = "Bookmark: " + _
            Str(annotate) + "(" + Hex(annotate) + ")"
   End Select

   AnnotationText_Form.AnnotationPict.Visible = False
   AnnotationText_Form.AnnotationText_Ctrl.Visible = True
   AnnotationText_Form.AnnotationText_Ctrl.Text = data
   AnnotationText_Form.Left = oixctrl1.Left + 0.25 * _
      oixctrl1.Width + Form1.Left
   AnnotationText_Form.Top = oixctrl1.Top + 0.4 * oixctrl1.Height _
      + Form1.Top
   AnnotationText_Form.Show 0, Form1
End Sub
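
The rule stated earlier in this section, that a style ID cannot be reused or reassigned until the currently loaded file changes, is easy to violate when styles are defined in more than one place. The following is a minimal sketch in Python of application-side bookkeeping that enforces the rule before any call reaches the control. It is a language-neutral model only: StyleRegistry and its methods are hypothetical and are not part of the Outside In API.

```python
class StyleRegistry:
    """Tracks which style IDs are defined for the currently loaded file."""

    def __init__(self):
        self._styles = {}

    def define(self, style_id, foreground, background):
        """Record a style; refuse an ID that is already in use for this file."""
        if style_id in self._styles:
            raise ValueError(f"style ID {style_id} is already defined for this file")
        self._styles[style_id] = (foreground, background)

    def is_defined(self, style_id):
        """Return True if the ID may be passed when adding a hilite."""
        return style_id in self._styles

    def on_file_changed(self):
        """Forget all styles; IDs become available again for the new file."""
        self._styles.clear()


registry = StyleRegistry()
registry.define(1, 0xFFFFFF, 0x808080)   # white text on a gray background
print(registry.is_defined(1), registry.is_defined(2))
```

In an application, define would wrap the HiliteStyle call and on_file_changed would be called whenever a new document is loaded.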

Annotations can be manipulated using the ClearAnnotations, FindAnnotation, and GoToAnnotation methods (see ClearAnnotations, FindAnnotation, and GoToAnnotation). Each of these methods takes a parameter that allows the developer to select multiple annotations using an ID mask. An annotation matches if the bitwise AND of the annotation ID and the ID mask is equal to the ID mask. For example, if the ID mask is 225 (11100001), the annotation ID 227 (11100011) matches, but the annotation ID 226 (11100010) does not. Calling the ClearAnnotations method (see ClearAnnotations) removes all matching annotations.
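
The matching rule can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, used only as a language-neutral model of the rule as stated above; matches_mask is a hypothetical helper, not a method of the control.

```python
def matches_mask(annotation_id: int, id_mask: int) -> bool:
    """Return True if every bit set in id_mask is also set in annotation_id."""
    return (annotation_id & id_mask) == id_mask


# The example from the text: mask 225 is 11100001 in binary.
for annotation_id in (224, 225, 226, 227):
    print(annotation_id, format(annotation_id, "08b"), matches_mask(annotation_id, 225))
```

One consequence of the rule as stated is that a mask of 0 matches every annotation ID, because the AND of any ID with 0 is 0.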

Locating existing annotations is accomplished using the FindAnnotation and GoToAnnotation methods (see FindAnnotation and GoToAnnotation). Both methods locate user-defined annotations, document-defined bookmarks, and URLs. Both can start from the beginning or end of the document, or look for the next or previous annotation, and both match annotations using an ID mask as described in the preceding paragraph. The main difference is that FindAnnotation updates the read-only properties but does not scroll the view window to bring the annotation into view, whereas GoToAnnotation updates the view window to display the matched annotation but does not update the read-only property values. After GoToAnnotation, the properties can be updated manually by calling the GetAnnotationData method (see GetAnnotationData) with the annotation ID and type (annotation, bookmark, or URL).

One final method is provided to copy the OixPos position of the last found annotation to the caret position. If the AnnotationSetPos method (see AnnotationSetPos) is passed a TRUE value, the text associated with the annotation is selected; otherwise, the caret position is simply moved to the beginning of the annotation and the view is updated.

The Annotation methods and the Annotation event populate the following read-only properties. AnnotationId contains the last used annotation ID (see AnnotationId); AnnotationStartPos and AnnotationEndPos store the OixPos objects for the last used annotation (see AnnotationStartPos and AnnotationEndPos). Each annotation can have additional data associated with it. The data and its type for the last used annotation are stored in the AnnotationData and AnnotationDataType read-only properties (see AnnotationData). For document-defined bookmarks and URLs, the AnnotationData property should be interpreted as a BSTR object.

It should be noted that all annotations are volatile. Once a new file is loaded, the annotation information is cleared. Therefore, the developer should provide a mechanism to save and restore annotations each time a document is viewed.
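
One possible save-and-restore mechanism is to keep a plain record of each user-defined annotation and serialize the list when the document is closed. The following is a minimal sketch in Python, used only as a language-neutral model. The record layout (id, kind, start, end, text) and both function names are hypothetical; start and end are assumed to be character counts of the kind GetActualCount returns, so that positions can be rebuilt with SetActualCount before the annotations are added again.

```python
import json


def save_annotations(annotations):
    """Serialize a list of annotation records to a JSON string."""
    return json.dumps(annotations, indent=2)


def load_annotations(text):
    """Rebuild the list of annotation records from a JSON string."""
    return json.loads(text)


# Hypothetical records collected while the document was being viewed.
records = [
    {"id": 1, "kind": "hilite", "start": 120, "end": 134, "text": "Check this figure"},
    {"id": 2, "kind": "hide", "start": 300, "end": 352, "text": ""},
]

stored = save_annotations(records)
restored = load_annotations(stored)
print(len(restored), restored[0]["text"])
```

Because character counts take on different meanings across document types, a scheme like this is most dependable for word-processing documents.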

Private Sub ListAnnotations_Click()
   Rem ** Use the FindAnnotation method to enumerate through the 
   Rem    document and populate the AnnotationList_Ctrl list 
   Rem    box. **

   Dim currentOix As New OixPos
   Dim Text As String

   oixctrl1.SetActualCount currentOix, 0   ' initialize 
                                           ' currentOix to top
                                           ' of file

   While (oixctrl1.FindAnnotation(3, 0, 0, currentOix))
      oixctrl1.CopyPosition currentOix, 
      Select case oixctrl1.AnnotationDataType
         Case 0:  ' Annotation
                  Text = "AN:"
         Case 1:  ' URL
                  Text = "URL:"
         Case 2:  ' Bookmark
                  Text = "BM:"
      End Select

      AnnotationListBox_Ctrl.AddItem Text + 
         (AnnotationListBox_Ctrl.NewIndex) = 
   Wend

End Sub

Private Sub AnnotationListBox_Ctrl_Click()
   Rem ** On a single click of the annotation list box, scroll 
   Rem    the document to the selected annotation. **

   If (oixctrl1.GoToAnnotation(0, 1, _
      (AnnotationListBox_Ctrl.ListIndex)) = 0) Then
      MsgBox "Annotation not found"
   Else
      oixctrl1.AnnotationSetPos 0  ' set the caret position 
                                   ' to the annotation
      oixctrl1.SetFocus            ' reset the focus to the 
                                   ' viewer so arrow keys work
   End If

End Sub

Private Sub AnnotationListBox_Ctrl_DblClick()
   Rem ** On a double click of the annotation list box, display
   Rem    the data of the annotation in the annotation
   Rem    data form. **

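   oixctrl1.FindAnnotation 0, 2, _
      (AnnotationListBox_Ctrl.ListIndex), 0
   ' FindAnnotation updates the read-only annotation properties, 
   ' which are passed on to ShowAnnotation
   ShowAnnotation oixctrl1.AnnotationDataType, _
      oixctrl1.AnnotationId, oixctrl1.AnnotationData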

End Sub

Private Sub AnnotationListBox_Ctrl_KeyDown(KeyCode As Integer, _
   Shift As Integer)
   REM ** If the delete key is pressed while an annotation is 
   REM    selected in the annotation list box, delete it using 
   REM    ClearAnnotations **

   If KeyCode = vbKeyDelete Then
      oixctrl1.ClearAnnotations 3, 
   End If
End Sub

5.4 Raw Text

Often used in conjunction with the Annotation API, the Raw Text methods, events and properties allow the developer to programmatically access the data in the viewer as if it were all text. For example, a routine could be written to search through the raw text as the document was being loaded and add an annotation for each occurrence of a given character string. When the document is viewed, each matched character string would be hilited.

To enable raw text processing, the SystemRawText property (see SystemRawText) must be set to TRUE, and a RawTextEvent event handler must be written (see RawTextEvent). During the initial reading of the document, the RawTextEvent event is passed an ID that locates the raw text. The GetRawText method (see GetRawText) will populate RawTextOffset, RawTextString, and RawTextCharSet properties based on this ID (see RawTextOffset, RawTextString, and RawTextCharSet). Because the document is read in small increments, the developer should expect to see many calls to the RawTextEvent handler (each with a different raw text locator) before the entire document has been read.

As each page of the document is read, the raw text is accumulated into a raw text buffer within the control. The ID that is passed to the RawTextEvent event handler is actually an offset into this raw text buffer. These offsets may be stored in array fashion for later use. Passing the offset value into the GetRawText method copies the raw text information into the read-only properties.

Private Sub MainOIX_RawTextEvent(ByVal lTextOffset As Long)
  Rem ** RawText event handler will only get called when user
  Rem checks the appropriate check box on the form **
  Dim oixStart As New OixPos
  Dim oixEnd As New OixPos
  Dim pos As Long
  Dim searchStr As String

  searchStr = "the"
  pos = 1
  MainOIX.GetRawText (lTextOffset)
  Cs = MainOIX.RawTextCharSet
  pos = InStr(pos, MainOIX.RawTextString, searchStr, vbTextCompare)
  While (pos)
    ' pos is 1 based as returned from
    ' InStr but character count is 0 based
    MainOIX.SetActualCount oixStart, lTextOffset + (pos - 1)
    ' the end position is the start plus the search string length
    MainOIX.SetActualCount oixEnd, lTextOffset + (pos - 1) + _
      Len(searchStr)
    MainOIX.AddAnnotationHilite Annotateid, 1, _
      SCCVW_ANNOTATION_SCLICK, 0, 0, "Search Text", oixStart, _
      oixEnd
    pos = InStr(pos + 1, MainOIX.RawTextString, searchStr, _
      vbTextCompare)
    Annotateid = Annotateid + 1
  Wend
End Sub
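
The offset arithmetic in the handler above can be checked without the control. The following is a minimal sketch in Python, used only as a language-neutral model: given one chunk of raw text and the offset of that chunk in the raw text buffer, it returns the 0-based start and end character counts of every case-insensitive match. The function name find_hits is hypothetical, and the sketch assumes text for which lowercasing does not change the string length.

```python
def find_hits(chunk_text, chunk_offset, needle):
    """Return (start, end) character counts for each match in one chunk.

    start is the 0-based count of the first matched character and end is
    the count just past the last one, both measured from the beginning of
    the raw text buffer.
    """
    hits = []
    haystack = chunk_text.lower()
    target = needle.lower()
    pos = haystack.find(target)              # 0-based, unlike InStr
    while pos != -1:
        start = chunk_offset + pos
        hits.append((start, start + len(needle)))
        pos = haystack.find(target, pos + 1)
    return hits


print(find_hits("The cat and the hat", 100, "the"))
```

Like the handler it models, this sketch looks at one chunk at a time, so a match that straddles the boundary between two chunks is not found.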