Asian Languages Require Multibyte Characters - Oracle® Solaris Studio 12.4: C User's Guide

Updated: March 2015

6.6.1 Asian Languages Require Multibyte Characters

The basic difficulty in an Asian-language computer environment is the huge number of ideograms needed for I/O. To work within the constraints of typical computer architectures, these ideograms are encoded as sequences of bytes. The associated operating systems, application programs, and terminals interpret each such byte sequence as an individual ideogram. Moreover, all of these encodings allow regular single-byte characters to be intermixed with the ideogram byte sequences. How difficult it is to recognize distinct ideograms depends on the encoding scheme used.
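
As an illustration of how the encoding scheme affects recognition, the following sketch assumes UTF-8, an encoding this guide does not name. In UTF-8 the first byte of every sequence announces the sequence length, so character boundaries are easy to find. The function name utf8_sequence_length is hypothetical and chosen only for this example.

```c
#include <stddef.h>

/*
 * Assumption: the text is encoded in UTF-8 (not stated in this guide).
 * Returns the total number of bytes in the sequence introduced by `lead`,
 * or 0 if `lead` cannot start a sequence (a continuation byte or a byte
 * that never appears in well-formed UTF-8).
 */
size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;                       /* regular single-byte character */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;                       /* two-byte sequence */
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;                       /* three-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;                       /* four-byte sequence */
    return 0;                           /* 0x80-0xBF, 0xC0, 0xC1, 0xF5-0xFF */
}
```

Other encodings, particularly those that use shift states, can require scanning from the start of the text to decide where one ideogram ends and the next begins.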

The term “multibyte character” is defined by ISO C to denote a byte sequence that encodes an ideogram, no matter what encoding scheme is employed. All multibyte characters are members of the “extended character set.” A regular single-byte character is just a special case of a multibyte character. The only requirement placed on the encoding is that no multibyte character can use a null character as part of its encoding.
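
The definition above can be sketched with the standard mbrlen() function, which reports how many bytes make up the next multibyte character under the current locale's encoding. The helper name count_multibyte_chars is hypothetical. This is a minimal sketch, not a definitive implementation: it relies on the null-byte rule to find the end of the string with strlen(), and its result depends on the LC_CTYPE locale in effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the multibyte characters in `s` under the current LC_CTYPE locale.
 * Returns (size_t)-1 if `s` contains an invalid or truncated sequence.
 */
size_t count_multibyte_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* start in the initial shift state */

    /* No multibyte character contains a null byte, so strlen() is safe. */
    size_t remaining = strlen(s);
    size_t count = 0;

    while (remaining > 0) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (len == 0)
            break;                      /* null character: end of string */
        s += len;                       /* len is 1 for a single-byte character */
        remaining -= len;
        count++;
    }
    return count;
}
```

In the default "C" locale, each byte of a plain ASCII string is its own character, so the count equals the byte length. After a call such as setlocale(LC_ALL, "") in an environment that selects a multibyte encoding, the count can be smaller than the byte length.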

ISO C specifies that program comments, string literals, character constants, and header names are all sequences of multibyte characters.
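
A small sketch of what this means for a string literal follows. It assumes UTF-8 and writes the ideogram's bytes as hexadecimal escapes so that the example does not depend on the encoding of the source file. The names sample_text and sample_byte_length are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * "a", then the three UTF-8 bytes of one ideogram (U+65E5), then "b".
 * The literal is split into adjacent pieces because a hexadecimal escape
 * absorbs every hex digit that follows it, and 'b' is a hex digit.
 */
const char sample_text[] = "a" "\xE6\x97\xA5" "b";

/* strlen() counts bytes, not multibyte characters. */
size_t sample_byte_length(void)
{
    return strlen(sample_text);
}
```

The literal holds three characters but occupies five bytes plus the terminating null character, so byte-oriented functions such as strlen() report sizes in bytes.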

Copyright © 1991, 2015, Oracle and/or its affiliates. All rights reserved.