MySQL 8.0 Reference Manual Including MySQL NDB Cluster 8.0
This section discusses the procedure for adding a character set to MySQL. The proper procedure depends on whether the character set is simple or complex:
If the character set does not need special string collating routines for sorting and does not need multibyte character support, it is simple.
If the character set needs either of those features, it is complex.
      For example, greek and swe7
      are simple character sets, whereas big5 and
      czech are complex character sets.
    
      To use the following instructions, you must have a MySQL source
      distribution. In the instructions,
      MYSET represents the name of the
      character set that you want to add.
    
          Add a <charset> element for
          MYSET to the
          sql/share/charsets/Index.xml file. Use
          the existing contents in the file as a guide to adding new
          contents. A partial listing for the latin1
          <charset> element follows:
        
<charset name="latin1">
  <family>Western</family>
  <description>cp1252 West European</description>
  ...
  <collation name="latin1_swedish_ci" id="8" order="Finnish, Swedish">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>
  <collation name="latin1_danish_ci" id="15" order="Danish"/>
  ...
  <collation name="latin1_bin" id="47" order="Binary">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
  ...
</charset>
          The <charset> element must list all
          the collations for the character set. These must include at
          least a binary collation and a default (primary) collation.
          The default collation is often named using a suffix of
          general_ci (general, case-insensitive). It
          is possible for the binary collation to be the default
          collation, but usually they are different. The default
          collation should have a primary flag. The
          binary collation should have a binary flag.
        
You must assign a unique ID number to each collation. The range of IDs from 1024 to 2047 is reserved for user-defined collations. To find the maximum of the currently used collation IDs, use this query:
SELECT MAX(ID) FROM INFORMATION_SCHEMA.COLLATIONS;
This step depends on whether you are adding a simple or complex character set. A simple character set requires only a configuration file, whereas a complex character set requires C source file that defines collation functions, multibyte functions, or both.
          For a simple character set, create a configuration file,
          MYSET.xmlsql/share/charsets directory. You
          can use a copy of latin1.xml as the basis
          for this file. The syntax for the file is very simple:
        
              Comments are written as ordinary XML comments
              (<!-- ).
            text
              -->
              Words within <map> array elements
              are separated by arbitrary amounts of whitespace.
            
              Each word within <map> array
              elements must be a number in hexadecimal format.
            
              The <map> array element for the
              <ctype> element has 257 words.
              The other <map> array elements
              after that have 256 words. See
              Section 12.13.1, “Character Definition Arrays”.
            
              For each collation listed in the
              <charset> element for the
              character set in Index.xml,
              MYSET.xml<collation>
              element that defines the character ordering.
            
For a complex character set, create a C source file that describes the character set properties and defines the support routines necessary to properly perform operations on the character set:
              Create the file
              ctype-
              in the MYSET.cstrings directory. Look at one
              of the existing ctype-*.c files (such
              as ctype-big5.c) to see what needs to
              be defined. The arrays in your file must have names like
              ctype_,
              MYSETto_lower_,
              and so on. These correspond to the arrays for a simple
              character set. See Section 12.13.1, “Character Definition Arrays”.
            MYSET
              For each <collation> element
              listed in the <charset> element
              for the character set in Index.xml,
              the
              ctype-
              file must provide an implementation of the collation.
            MYSET.c
If the character set requires string collating functions, see Section 12.13.2, “String Collating Support for Complex Character Sets”.
If the character set requires multibyte character support, see Section 12.13.3, “Multi-Byte Character Support for Complex Character Sets”.
          Modify the configuration information. Use the existing
          configuration information as a guide to adding information for
          MYSYS. The example here assumes
          that the character set has default and binary collations, but
          more lines are needed if MYSET has
          additional collations.
        
              Edit mysys/charset-def.c, and
              “register” the collations for the new
              character set.
            
Add these lines to the “declaration” section:
#ifdef HAVE_CHARSET_MYSETextern CHARSET_INFO my_charset_MYSET_general_ci; extern CHARSET_INFO my_charset_MYSET_bin; #endif
Add these lines to the “registration” section:
#ifdef HAVE_CHARSET_MYSETadd_compiled_collation(&my_charset_MYSET_general_ci); add_compiled_collation(&my_charset_MYSET_bin); #endif
              If the character set uses
              ctype-,
              edit MYSET.cstrings/CMakeLists.txt and add
              ctype-
              to the definition of the
              MYSET.cSTRINGS_SOURCES variable.
            
              Edit cmake/character_sets.cmake:
            
                  Add MYSET to the value of
                  with CHARSETS_AVAILABLE in
                  alphabetic order.
                
                  Add MYSET to the value of
                  CHARSETS_COMPLEX in alphabetic
                  order. This is needed even for simple character sets,
                  so that CMake can recognize
                  -DDEFAULT_CHARSET=.
                MYSET
Reconfigure, recompile, and test.