This documentation is for an older version. If you're using the most current version, select the documentation for that version with the version switch in the upper right corner of the online documentation, or by downloading a newer PDF or EPUB file.

12.9.7 Adding a Collation for Full-Text Indexing

This section describes how to add a new collation for full-text searches. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in Section 10.4, “Adding a Collation to a Character Set”; it is assumed that you have read it and are familiar with the files involved.

To add a collation for full-text indexing, use this procedure:

  1. Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 62 if that ID is already taken on your system.

    <charset name="latin1">
    ...
    <collation name="latin1_fulltext_ci" id="62"/>
    </charset>
    
  2. Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied from latin1_swedish_ci:

    <collation name="latin1_fulltext_ci">
    <map>
    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
    20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
    30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
    40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
    60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
    50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
    80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
    90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
    A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
    B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
    41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
    44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF
    41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
    44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF
    </map>
    </collation>
    
  3. Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-' character) from 10 (punctuation) to 01 (small letter). In the following array, this is the element in the fourth row down, third value from the end.

    <ctype>
    <map>
    00
    20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    48 10 10 10 10 10 10 10 10 10 10 10 10 01 10 10
    84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10
    10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01
    01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10
    10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02
    02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20
    10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00
    00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01
    48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
    10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
    01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
    01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02
    02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
    02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02
    </map>
    </ctype>
    
  4. Restart the server.

  5. To employ the new collation, include it in the definition of columns that are to use it:

    mysql> DROP TABLE IF EXISTS t1;
    Query OK, 0 rows affected (0.13 sec)
    
    mysql> CREATE TABLE t1 (
        -> a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci,
        -> FULLTEXT INDEX(a)
        -> ) ENGINE=MyISAM;
    Query OK, 0 rows affected (0.47 sec)
    
  6. Test the collation to verify that hyphen is considered as a word character:

    mysql> INSERT INTO t1 VALUEs ('----'),('....'),('abcd');
    Query OK, 3 rows affected (0.22 sec)
    Records: 3  Duplicates: 0  Warnings: 0
    
    mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);
    +------+
    | a    |
    +------+
    | ---- |
    +------+
    1 row in set (0.00 sec)