Copyright © 2019 Oracle and/or its affiliates · All Rights Reserved · License
This document proposes changes to The Java(R) Language Specification, Java SE 13 Edition in support of Text Blocks, a preview feature of Java SE 13.
See JEP 355 for an overview of Text Blocks.
Last updated 2019-06-03
The pre-existing section 3.10.6, “Escape Sequences for Character and String Literals”, will become 3.10.7, “Escape Sequences”.
The pre-existing section 3.10.7, “The Null Literal”, will be renumbered to 3.10.8.
A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal may be represented directly in a text block.
The following productions from 3.3, 3.4, and 3.6 are shown here for convenience:
The opening delimiter is a sequence that starts with three double quote characters ("""), continues with zero or more space, tab, and form feed characters, and concludes with a line terminator.
The closing delimiter is a sequence of three double quote characters.
The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.
Unlike in a string literal, it is not a compile-time error for a line terminator to appear in the content of a text block.
A text block is always of type String (4.3.3).
Example 3.10.6-1. Text Blocks
When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. For example, compare these alternative representations of a snippet of HTML:
String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";
String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;
Here are some examples of text blocks:
String season = """
                winter""";    // the six characters w i n t e r
String period = """
                winter
                """;          // the seven characters w i n t e r LF
String greeting =
    """
    Hi, "Bob"
    """;        // the ten characters H i , SP " B o b " LF
String salutation =
    """
    Hi,
     "Bob"
    """;        // the eleven characters H i , LF SP " B o b " LF
String empty = """
               """;      // the empty string (zero length)
String quote = """
               "
               """;      // the two characters " LF
String backslash = """
                   \\
                   """;  // the two characters \ LF
The use of the escape sequences \" and \n is permitted in a text block, but not necessary or recommended. However, representing the sequence """ in a text block requires the escaping of at least one " character, to avoid mimicking the closing delimiter.
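As a minimal sketch of the second point (the class and field names are illustrative, not part of the specification), escaping only the first " of an embedded """ is enough to keep it from being read as the closing delimiter:

```java
class EmbeddedDelimiter {
    // Only the first double quote of the embedded sequence is escaped;
    // the two unescaped quotes that follow cannot form a closing delimiter.
    static final String DOC = """
        A text block opens with \""" and a line terminator.
        """;

    public static void main(String[] args) {
        System.out.print(DOC);
    }
}
```

The resulting string contains three consecutive double quote characters and no backslash.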
Example 3.10.6-2. Escape sequences in text blocks
The following snippet of text would be less readable if the " characters were escaped:
String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """;
If a text block is to denote another text block, then it is recommended to escape the first " of the embedded opening and closing delimiters:
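A minimal sketch of this recommendation follows. The class name, field name, and embedded source text are illustrative assumptions, not taken from the specification.

```java
class NestedTextBlock {
    // The outer text block denotes the source code of another text block.
    // The first double quote of each embedded delimiter is escaped.
    static final String CODE =
        """
        String text = \"""
            A text block inside a text block
        \""";
        """;

    public static void main(String[] args) {
        System.out.print(CODE);
    }
}
```

Printing CODE yields three lines of Java source in which both embedded delimiters appear as plain """ sequences.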
The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:
Line terminators are normalized to the ASCII LF character, as follows:
An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.
An ASCII CR character is translated to an ASCII LF character.
Incidental white space is removed, as if by execution of String::stripIndent on the characters in the content.
Escape sequences are interpreted, as in a string literal.
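The three transformations can be approximated at run time as a rough sketch. The `interpret` helper and its class are hypothetical. The last step uses `String::translateEscapes`, a JDK method that this text does not name, on the assumption that it matches escape interpretation closely enough for illustration. A Java compiler performs these steps on source text, not through these calls.

```java
class TextBlockTransform {
    // Hypothetical helper: applies the three steps, in order, to raw content.
    static String interpret(String content) {
        // 1. Normalize line terminators: CR LF first, then any remaining CR.
        String normalized = content.replace("\r\n", "\n").replace('\r', '\n');
        // 2. Remove incidental white space.
        String stripped = normalized.stripIndent();
        // 3. Interpret escape sequences last.
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content of the "salutation" example, with CR LF line terminators
        // and the double quotes written as escape sequences.
        String raw = "    Hi,\r\n     \\\"Bob\\\"\r\n    ";
        System.out.print(interpret(raw));
    }
}
```

For this input the result is the eleven characters H i , LF SP " B o b " LF, matching the salutation example earlier in this section.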
Example 3.10.6-3. Order of transformations on text block content
Interpreting escape sequences last allows developers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider this text block that mentions the escape sequence \r (CR):
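A minimal sketch of such a text block, assuming the HTML snippet of Example 3.10.6-1 with \r written at the end of each line (the class and field names are illustrative):

```java
class CrEscapes {
    // Each content line ends with the two-character escape sequence for CR,
    // followed by the line terminator of the source file.
    static final String HTML = """
        <html>\r
            <body>\r
                <p>Hello, world</p>\r
            </body>\r
        </html>\r
        """;

    public static void main(String[] args) {
        // Every line of the resulting string ends with CR followed by LF.
        System.out.println(HTML.startsWith("<html>\r\n"));
    }
}
```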
The \r escapes are not interpreted until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), and using | to visualize the left margin, the final result is:
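A sketch that renders this visualization for the HTML snippet of Example 3.10.6-1 with \r at the end of each line. The `visualize` helper is hypothetical and exists only for this illustration.

```java
class CrVisualization {
    static final String HTML = """
        <html>\r
            <body>\r
                <p>Hello, world</p>\r
            </body>\r
        </html>\r
        """;

    // Hypothetical helper: marks the left margin with | and spells out
    // CR and LF in Unicode escape notation.
    static String visualize(String s) {
        StringBuilder out = new StringBuilder("|");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                out.append("\\u000D");
            } else if (c == '\n') {
                out.append("\\u000A\n");
                if (i + 1 < s.length()) {
                    out.append('|');   // margin marker for the next line
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(visualize(HTML));
    }
}
```

Running the sketch prints:

```
|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
```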
When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the content of the text block) contains the character or sequence of characters.
A text block is a reference to an instance of class String that denotes the string represented by the text block.
A text block always refers to the same instance of class String. This is because the strings represented by text blocks - or, more generally, strings that are the values of constant expressions (15.28) - are “interned” so as to share unique instances (12.5).
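The effect of interning can be observed with a reference comparison, as in this small sketch (class and field names are illustrative):

```java
class InternDemo {
    // A text block and a string literal that represent the same string.
    static final String BLOCK = """
        winter""";
    static final String LITERAL = "winter";

    public static void main(String[] args) {
        // Both are constant expressions, so they share one interned instance.
        System.out.println(BLOCK == LITERAL);               // prints true
        // A string created at run time is a distinct instance until interned.
        String fresh = new String("winter");
        System.out.println(BLOCK == fresh);                 // prints false
        System.out.println(BLOCK == fresh.intern());        // prints true
    }
}
```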
Example 3.10.6-4. Text blocks evaluate to strings
Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (15.18.1), in method invocation on class String, and in annotations with String elements:
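A minimal sketch of these three uses follows. The `Precondition` annotation, the method names, and the class are hypothetical and declared only for this illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class TextBlockUses {
    // Hypothetical annotation with a String element.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Precondition {
        String value();
    }

    // String concatenation with a text block as the right-hand operand.
    static String concatenated() {
        return "ab" + """
            cde
            """;
    }

    // Method invocation directly on a text block.
    static String invoked() {
        return """
            abcde""".substring(2);
    }

    // A text block as the value of a String annotation element.
    @Precondition("""
        rate > 0 &&
        rate <= MAX_REFRESH_RATE
        """)
    static void setRefreshRate(int rate) { }

    public static void main(String[] args) {
        System.out.print(concatenated());   // prints abcde followed by a newline
        System.out.println(invoked());      // prints cde
    }
}
```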
3.1, final paragraph: add mention of text blocks.
4.3.3, third paragraph: add mention of text blocks.
12.5, second paragraph, list should start as follows:
“Loading of a class or interface that contains a string literal (§3.10.5) or a text block (§3.10.6) may create a new String object to represent the string literal or text block. (This will not occur if an instance of String denoting the same sequence of Unicode code points as the string literal or text block has previously been interned.)”
15.8.1, fifth bullet: add mention of text blocks.
15.28, first bullet: add mention of text blocks:
“Literals of primitive type (§3.10.1, §3.10.2, §3.10.3, §3.10.4), string literals (§3.10.5), and text blocks (§3.10.6).”
JVMS 4.7.16.1, const_value_index: rephrase from “denotes either a primitive constant value or a String literal as the value of …” to “denotes a constant of either a primitive type or the type String as the value of …”.
Some clarification of terminology around “escapes” is desirable:
3.10.5: Characters may be represented by escape sequences (§3.10.7) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF.
3.3: A compiler for the Java programming language (“Java compiler”) first recognizes Unicode escapes in its input, … and passing all other characters unchanged. This translation step results in a sequence of Unicode input characters. … One Unicode escape can represent characters in the range U+0000 to U+FFFF. Representing supplementary characters in the range U+010000 to U+10FFFF requires two consecutive Unicode escapes.
3.5: The input characters and line terminators that result from Unicode escape processing …
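The point about supplementary characters in the 3.3 wording can be illustrated with a short sketch. The choice of U+1F600 and the class and field names are assumptions made for this example.

```java
class SupplementaryEscapes {
    // U+1F600 lies outside the range U+0000 to U+FFFF, so it is written as
    // two consecutive Unicode escapes, one per UTF-16 surrogate code unit.
    static final String GRIN = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(GRIN.length());                            // prints 2
        System.out.println(GRIN.codePointCount(0, GRIN.length()));    // prints 1
        System.out.println(Integer.toHexString(GRIN.codePointAt(0))); // prints 1f600
    }
}
```

The string holds two UTF-16 code units but a single code point.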