public class HTMLLexer extends AbstractLexer implements HTMLTokens
HTMLLexer is an implementation of the Lexer interface for the HTML language. It can be used to retrieve a token stream for a regular HTML document as well as for a JSP document. To retrieve JSP-specific tags, enable JSP recognition by calling setRecognizeJSP(true). To enable recognition of script text within <script> and </script> tags, call setRecognizeScripts(true).
Note that enabling recognition of JSP tags does not enable recognition of embedded tags; that must be enabled separately with setRecognizeEmbeddedTags(true). With embedded-tag recognition enabled, the lexer properly handles an embedded tag found in an attribute value while scanning an HTML tag. It does not, however, check whether it is legal for the embedded tag to be present. That check is beyond the scope of this lexer and is the caller's responsibility.
This lexer does not assist in deciphering the contents of an HTML tag, nor does it help in identifying element bodies. All this lexer does is locate HTML and JSP tags within the document.
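The "locate tags, do not interpret them" behavior described above can be illustrated with a small self-contained sketch. It does not use HTMLLexer itself: the class TagLocatorSketch and its TEXT/TAG labels are hypothetical stand-ins, and the scanning rule (a tag runs from '<' to the next '>') is a simplifying assumption that ignores comments, scripts, JSP, and embedded tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the real HTMLLexer: splits a buffer into
// alternating TEXT and TAG spans without looking inside the tags.
final class TagLocatorSketch {

    // Returns one entry per span, formatted as "<KIND> <original text>".
    static List<String> locate(String buffer) {
        List<String> spans = new ArrayList<>();
        int pos = 0;
        while (pos < buffer.length()) {
            boolean tag = buffer.charAt(pos) == '<';
            int end;
            if (tag) {
                int close = buffer.indexOf('>', pos);
                // Assumption: an unterminated tag runs to the end of the buffer.
                end = (close < 0) ? buffer.length() : close + 1;
            } else {
                int open = buffer.indexOf('<', pos);
                end = (open < 0) ? buffer.length() : open;
            }
            spans.add((tag ? "TAG " : "TEXT ") + buffer.substring(pos, end));
            pos = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        for (String span : locate("<p>Hello</p>")) {
            System.out.println(span);
        }
    }
}
```

The sketch returns the tag text verbatim, mirroring the statement that deciphering tag contents and identifying element bodies is left to the caller.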
See Also: Lexer, HTMLTokens
Inherited nested classes: AbstractLexer.DefaultLexerToken
Modifier and Type | Field and Description |
---|---|
protected boolean | recognizeEmbeddedTags: Whether to recognize embedded tags. |
protected boolean | recognizeJSP: Whether to recognize JSP tags or not. |
Inherited fields:
currentPos, textBuffer
TK_HTML_COMMENT, TK_HTML_DOCUMENT_TYPE, TK_HTML_PROCESSING_INSTRUCTION, TK_HTML_SCRIPT, TK_HTML_STYLE, TK_HTML_TAG, TK_HTML_TEXT, TK_JSP_COMMENT, TK_JSP_DECLARATION, TK_JSP_DIRECTIVE, TK_JSP_EXPRESSION, TK_JSP_SCRIPLET, TK_PHP_ASPTAG, TK_PHP_TAG
TK_EOF, TK_NOT_FOUND
Constructor and Description |
---|
HTMLLexer(): Constructs a default HTMLLexer with a starting position of 0. |
Modifier and Type | Method and Description |
---|---|
void | backup(): Unlexes the last found token. |
protected boolean | isEmbeddedTagStart(int searchPosition): Utility routine to determine whether the given search position is the start of an embedded tag. |
int | lex(LexerToken lexedToken): Scans the text buffer at the current position and returns the token that was found. |
void | setCaretPosition(int caretPosition): Sets the caret position. A value other than -1 indicates that the lexer is being used for code insight. |
void | setPosition(int offset): Sets the current lex (read) position to the given offset in the buffer. |
void | setRecognizeEmbeddedTags(boolean recognizeEmbeddedTags): Sets whether the lexer should recognize embedded HTML or JSP expression tags within an attribute value. |
void | setRecognizeJSP(boolean recognizeJSP): Sets whether the HTMLLexer should recognize JSP tag symbols. |
void | setRecognizePHP(boolean recognizePHP): Deprecated. The HTMLLexer should not be used for parsing PHP files. |
void | setRecognizeScripts(boolean recognizeScripts): Sets whether the HTMLLexer should recognize script start and end tags and generate TK_HTML_SCRIPT tokens for script text. |
void | setRecognizeStyles(boolean recognizeStyles): Sets whether the HTMLLexer should recognize style start and end tags and generate TK_HTML_STYLE tokens for style text. |
void | setSkipComments(boolean skipComments): Sets whether the HTMLLexer should skip comments instead of generating tokens for them. |
protected void | skipEmbeddedTag(): Utility routine to skip over a found embedded tag. |
protected void | skipHTMLTag(): Utility routine which scans through the text buffer to find the end of an HTML tag. |
protected void | skipJSPEL(): Utility routine which scans through the text buffer to find the end of a JSP EL expression. |
protected void | skipJSPScriplet(): Utility routine which scans through the text buffer to find the end of a JSP scriptlet tag. |
protected void | skipPHPASPTag(): Utility routine which scans to locate the end of an ASP-styled PHP tag. |
protected void | skipPHPTag(): Utility routine which scans through the text buffer to locate the end of a PHP tag. |
static java.lang.String | tokenToString(int token): Utility routine to map the token to a string representation of the token (for debug printing). |
static java.lang.String | tokenToText(int token): Utility routine to map the token to the original text (if retrievable) of the token (for debug printing). |
Inherited methods: createLexerToken, getTextBuffer, setTextBuffer
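The interplay of isEmbeddedTagStart(), skipEmbeddedTag(), and skipHTMLTag() can be sketched as follows. This is an illustration only, not the library's implementation: the class EmbeddedTagSketch is hypothetical, its methods take the buffer and offset explicitly instead of using the lexer's internal state, and it assumes that an embedded tag is any JSP-style `<% ... %>` sequence.

```java
// Hypothetical sketch, NOT the real HTMLLexer: finds the end of an HTML tag
// while stepping over embedded tags so that their '>' does not end the outer tag.
final class EmbeddedTagSketch {

    // Assumption: an embedded tag starts with "<%".
    static boolean isEmbeddedTagStart(String buf, int searchPosition) {
        return buf.startsWith("<%", searchPosition);
    }

    // Expects pos at the start of an embedded tag. Returns the offset of the
    // character after its closing "%>", or buf.length() if it is unterminated.
    static int skipEmbeddedTag(String buf, int pos) {
        int close = buf.indexOf("%>", pos + 2);
        return (close < 0) ? buf.length() : close + 2;
    }

    // Expects pos at the '<' that opens the HTML tag. Returns the offset of the
    // character after the tag's closing '>', or buf.length() if unterminated.
    static int skipHtmlTag(String buf, int pos) {
        int i = pos + 1;
        while (i < buf.length()) {
            if (isEmbeddedTagStart(buf, i)) {
                i = skipEmbeddedTag(buf, i);
            } else if (buf.charAt(i) == '>') {
                return i + 1;
            } else {
                i++;
            }
        }
        return buf.length();
    }

    public static void main(String[] args) {
        String buf = "<a href=\"<%= url %>\">link</a>";
        System.out.println(buf.substring(0, skipHtmlTag(buf, 0)));
    }
}
```

A naive search for the first '>' would stop inside the attribute value. Checking for an embedded tag at each position is what lets the outer tag be returned whole. As with the lexer described here, the sketch does not check whether the embedded tag is legal at that location.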
protected boolean recognizeJSP
Whether to recognize JSP tags or not.

protected boolean recognizeEmbeddedTags
Whether to recognize embedded tags.
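
public HTMLLexer()
Constructs a default HTMLLexer with a starting position of 0. Clients must call setTextBuffer() to initialize the text buffer used for the Lexer. To start lexing from an offset other than 0, call setPosition().

public void setSkipComments(boolean skipComments)
Sets whether the HTMLLexer should skip comments instead of generating tokens for them.
Parameters:
skipComments - true to ignore comments in token generation

public void setRecognizeScripts(boolean recognizeScripts)
Sets whether the HTMLLexer should recognize script start and end tags and generate TK_HTML_SCRIPT tokens for script text.
Parameters:
recognizeScripts - whether to recognize scripts

public void setRecognizeStyles(boolean recognizeStyles)
Sets whether the HTMLLexer should recognize style start and end tags and generate TK_HTML_STYLE tokens for style text.
Parameters:
recognizeStyles - whether to recognize styles

@Deprecated
public void setRecognizePHP(boolean recognizePHP)
Deprecated. The HTMLLexer should not be used for parsing PHP files.

public void setCaretPosition(int caretPosition)
Sets the caret position. A value other than -1 indicates that the lexer is being used for code insight.

public int lex(LexerToken lexedToken)
Scans the text buffer at the current position and returns the token that was found. The token information is also stored in the lexedToken instance passed in to the call.
lex in interface Lexer
lex in class AbstractLexer
Parameters:
lexedToken - the instance passed in where token info is stored
Returns:
lexedToken.getToken() (for convenience)

public void backup()
Unlexes the last found token. The next call to lex() will return the last token and offset information found.
backup in interface Lexer
backup in class AbstractLexer

public void setPosition(int offset)
Sets the current lex (read) position to the given offset in the buffer.
setPosition in interface Lexer
setPosition in class AbstractLexer
Parameters:
offset - the offset for the next lex() operation

protected void skipPHPTag()
Utility routine which scans through the text buffer to locate the end of a PHP tag.

protected void skipPHPASPTag()
Utility routine which scans to locate the end of an ASP-styled PHP tag.

public static java.lang.String tokenToString(int token)
Utility routine to map the token to a string representation of the token (for debug printing).
Parameters:
token - the token to map

public static java.lang.String tokenToText(int token)
Utility routine to map the token to the original text (if retrievable) of the token (for debug printing).
Parameters:
token - the token to map

public void setRecognizeJSP(boolean recognizeJSP)
Sets whether the HTMLLexer should recognize JSP tag symbols.
Parameters:
recognizeJSP - true to recognize JSP tag symbol characters

public void setRecognizeEmbeddedTags(boolean recognizeEmbeddedTags)
Sets whether the lexer should recognize embedded HTML or JSP expression tags within an attribute value.
Parameters:
recognizeEmbeddedTags - whether to recognize embedded tags

protected void skipHTMLTag()
Utility routine which scans through the text buffer to find the end of an HTML tag.

protected boolean isEmbeddedTagStart(int searchPosition)
Utility routine to determine whether the given search position is the start of an embedded tag.
Parameters:
searchPosition - the offset in the buffer to check for the start

protected void skipEmbeddedTag()
Utility routine to skip over a found embedded tag. This assumes that the tag was detected with isEmbeddedTagStart(), and that the current position is still at the start of the embedded tag. This will place the position at the character after the end of the tag.

protected void skipJSPScriplet()
Utility routine which scans through the text buffer to find the end of a JSP scriptlet tag.

protected void skipJSPEL()
Utility routine which scans through the text buffer to find the end of a JSP EL expression.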
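
The contract between setPosition(), lex(), and backup() can be illustrated with a minimal stand-in. Everything in this sketch is an assumption made for illustration: the class BackupSketch, its token values, and its lastText() helper are hypothetical, and its lex() returns the token directly instead of filling in a LexerToken as the real method does.

```java
// Hypothetical sketch of the lex()/backup()/setPosition() contract.
// NOT the real HTMLLexer; names and token values are illustrative only.
final class BackupSketch {
    static final int EOF = 0, TEXT = 1, TAG = 2;

    private String buffer = "";
    private int currentPos; // offset where the next lex() starts
    private int lastStart;  // offset where the most recent token started

    void setTextBuffer(String text) {
        buffer = text;
        setPosition(0);
    }

    // Sets the offset at which the next lex() call starts scanning.
    void setPosition(int offset) {
        currentPos = offset;
        lastStart = offset;
    }

    // Scans one token starting at currentPos and advances past it.
    int lex() {
        lastStart = currentPos;
        if (currentPos >= buffer.length()) {
            return EOF;
        }
        boolean tag = buffer.charAt(currentPos) == '<';
        int stop = buffer.indexOf(tag ? '>' : '<', currentPos);
        if (stop < 0) {
            currentPos = buffer.length();
        } else {
            currentPos = tag ? stop + 1 : stop;
        }
        return tag ? TAG : TEXT;
    }

    // Rewinds so that the next lex() returns the same token again.
    void backup() {
        currentPos = lastStart;
    }

    // Text of the most recently lexed token (empty right after backup()).
    String lastText() {
        return buffer.substring(lastStart, currentPos);
    }

    public static void main(String[] args) {
        BackupSketch lexer = new BackupSketch();
        lexer.setTextBuffer("<b>hi</b>");
        lexer.lex();
        lexer.backup();
        lexer.lex();
        System.out.println(lexer.lastText());
    }
}
```

In this sketch backup() only remembers one token, so calling it twice in a row has no further effect. Whether the real lexer behaves the same way is not stated in this reference.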