This chapter provides information on the 5800 system query language.
For details of the metadata system and how it is configured, see Chapter 8, “Configuring Metadata and Virtual File System Views” in Sun StorageTek 5800 Storage System Administration Guide.
The following topics are discussed:
The 5800 system Java and C APIs both have a query function that passes a query string to the 5800 system. Queries in the 5800 system are presented to the name-value metadata cache.
The query format is similar to the where clause of an SQL query. The two main differences are that 5800 system queries do not contain embedded subqueries, and that the only “columns” that are available are the attributes defined in the 5800 system schema.
Many features of the underlying metadata database’s own query language can be used in queries. There is a recommended subset of queries, however, that is most likely to be portable from the 5800 system emulator to a live 5800 system cluster. That subset is described in the sections Supported Expression Types and Queries Not Supported in Version 1.1. These are the query expression types that should work identically on the 5800 system emulator and a live 5800 system cluster.
Long— 8-byte integer value.
Double— 8-byte IEEE 754 double-precision floating point value.
String— now allows all Unicode values from the Basic Multilingual Plane (BMP). The encoding used is UTF-8. the schema definition of each String attribute must specify a length. String(N) is used as the convention to refer to the type of a String attribute whose length is set to N.
char— similar to String, except that it is limited to 8-bit characters in the ISO-8859-1 (Latin-1) character set.
Date— corresponds to the JDBC SQL DATE type. Year/Month/Day.
Time— corresponds to the JDBC SQL TIME type. The Java java.sql.Time type only allows specification of whole seconds.
Timestamp— corresponds to the JDBC SQL TIMESTAMP type with precision 3 (absolute Year/Month/Day/Hour/Minute/Second/Millisecond).
Binary— string of binary bytes.
Objectid— similar to binary, with internal support for sub-fields. Reserved for use by the system.object_id field. Other fields that must store an OID should use the string or binary type for that field.
A query in the 5800 system query language is translated into an equivalent query for the underlying database that implements the query engine. The database used in a live 5800 system is Sun's High Availability Database (HADB). The database used by the 5800 system emulator is ApacheTM Derby. Since the SQL query language used by HADB and Derby differ in substantial ways, a subset of the query language is provided for portability between the cluster and the 5800 system emulator to enable application development in the emulator environment and subsequent deployment of the applications to a live 5800 system.
The following provides a summary of the translation of the 5800 system queries to SQL queries that are presented to the underlying database.
The metadata schema specifies the layout of fields into tables and columns. When the schema is committed, a particular set of actual tables and columns is created in the underlying database that matches the format of the table layout in the schema.
When translating a 5800 system query to SQL, each field name in the query is translated into a reference to the particular column and particular table that represents that field. Typed literal values are translated into a form that the extended metadata cache knows how to deal with. Specifically, most literal values are replaced with an equivalent dynamic parameter. Thus, the list of dynamic parameters that the underlying database uses combines both the dynamic parameters and also many of the literal values from the 5800 system query. Finally, an implicit INNER JOIN is introduced between all the tables containing the translated query fields. Everything else (usually database expression syntax) is left unchanged, allowing almost all the database engine's powerful query syntax to be used with 5800 system queries.
The presence of the INNER JOIN has important consequences when queries are evaluated. An object is only returned by a query when all of the fields referenced by the query itself and all of the fields referenced in the select list of the query all have non-null values. Queries with OR clauses, in particular, can produce non-intuitive results. As an extreme example, consider a query: "fieldA is not null OR fieldB is not null." This query will not select an object unless both fieldA and fieldB are non-null, because of this implicit inner join.
Any string in double quotes (for example, "filename") and any dotted string in Java Identifier format (for example, mp3.title) will automatically be treated as an attribute name. The double quotes can optionally be omitted even on a non-dotted name as long as the attribute does not match an SQL reserved word in any of the Sun StorageTek 5800 underlying metadata databases.
Attribute names must appear in the current 5800 system schema to be used in a query. This is because the proper type information about each attribute must be derived to build the query.
General Unicode characters outside of the ASCII range in queries are allowed in only two places to the 5800 system. Specifically, both attribute names and literal values may contain general Unicode characters. All text that is not either an attribute name nor a literal value is passed unchanged to the underlying query engine, and must consist of ASCII characters only. An attempt to pass non-ASCII characters in a query will result in an error.
This section details the kinds of literals that can occur in 5800 system queries.
5800 system queries allow dynamic parameters. A dynamic parameter is indicated by the presence of a question mark in the query string (for example, the query name=? AND address=?). The bindParameter call is used in Java to bind typed values for use in place of the question marks.
For the Java API, the syntax is:
import com.sun.honeycomb.client.PreparedStatement; import com.sun.honeycomb.client.QueryResultSet; Date date_value= new Date(); PreparedStatement stmt = new PreparedStatement( "system.test.type_date=?"); stmt.bindParameter(date_value,1); QueryResultSet qrs = query(stmt);
For further information, see query (with PreparedStatement), query (with PreparedStatement and selectKeys) and PreparedStatement.
String literals are surrounded by single quotes (for example, 'The Lighter Side’). You can embed single quote characters in a query by doubling them (for example, ’Susan’’s House’). Any UTF-8 string can be included in a string literal (except the null character, which is treated as a string terminator by the C API).
Only ASCII digits are recognized as numeric literals. For example, 45, -1, 3.14, 5.2E10. Digits from other parts of the Unicode code space will cause a parse error.
For each 5800 system data type, there is a syntax to include literals of that type in a query string. The syntax is {type_name ’stringliteral’}. For example, consider the query:
timestamp_field<{timestamp ’2006-10-26T12:00:00Z’}
In particular, this syntax can be used to query for a particular object ID:
system.object_id = {objectid ’0200011e61c159bdfa654e11db8a45cafecafecafe000000000200000000’}
For comparing against binary values, either of the following forms may be used:
binary_field = x’beeffeed’
binary_field = {binary ’beeffeed’}
For more information, see Canonical String Format.
Each type in a 5800 system has a canonical representation as a string value. The canonical string representation of each type is shown in Table 4–1.
Table 4–1 Canonical String Representation of Data Types
Data Type |
Canonical String Representation |
---|---|
STRING |
The string itself. |
CHAR |
The string itself. |
BINARY |
Hexadecimal dump of the value with two hex digits per byte. |
LONG |
Result of Long.toString. For example, 88991 or -7975432785. |
DOUBLE |
Result of Double.toString. For example, 1.45 or NaN or -Infinity or -1.56E200. |
DATE |
YYYY-mm-dd. For example, 2001-01-01. |
TIME |
HH:mm:sss. For example, 12:02:01. |
TIMESTAMP |
YYYY-mm-ddThh:mm:ss.fffZ (time relative to UTC). For example, 1969-12-31T23:59:59.999Z. |
OBJECTID |
60-digit hexadecimal dump of the objectid. |
This canonical string encoding is used in the following places:
When exposing the field as a directory component or a filename component in a virtual view
When converting a typed value to a string as the result of the getAsString operation on a NameValueRecord or a QueryResultSet operation
When parsing a literal value as described in Literals for 5800 System Data Types to create a typed query value from a string representation of that value.
The inverse of the canonical string encoding is used in the following places:
It is always allowed to store a string value into any metadata field, no matter what the type of the field is. The actual data stored is the result of applying the canonical string decode operation to the incoming string value.
On a virtual view lookup operation, the canonical string decode operation is used on the supplied filename to derive the actual metadata values to look up, given their string representations in the filename.
The decode operation is allowed to accept incoming string values that would never be a legal output for an encode operation. Some examples of this include:
decodeBinary of an odd number of hex digits. The convention is to left-justify the supplied digits in the binary value. For example, the string "b0a" corresponds to the binary literal [b0a0].
decodeDate of a non-standard date format.
A double value encoded with a non-canonical number of digits. For example, .00145E20 instead of 1.45E17.
If you take a value V and encode it into a string S, and then perform the canonical decode operation on S to get a new value V’. Does V always equal V’? The answer is yes in most cases, but not always.
What is actually guaranteed is the weaker statement that if encode(V) = S and if decode(S)=V’, then encode(V’) is also equal to S.
Use the 5800 system literal format for all the Date and Time operations, for example, {date '12-31-2007'}.
The JDBC standard escape sequences for date ({d ’YYYY-mm-dd’}) and time ({t ’hh:mm:ss’}) literals are available. However, usage of JDBC format date and time literals may produce inconsistent results. In particular, when JDBC format is used, the literal format is interpreted as being relative to the local time zone, and the time zone usually differs between a 5800 system cluster and client machine.
The following JDBC function escapes supported:
TIMESTAMPDIFF, TIMESTAMPADD, CURRENT_TIMESTAMP, CURRENT_TIME, CURRENT_DATE, HOUR, MINUTE, SECOND.
Some SQL reserved words (such as BETWEEN or LIKE) are allowed in queries and are expected to occur. An SQL reserved word cannot be used as an attribute name unless it is enclosed in double quotes (for example, "FIRST"). Some other SQL reserved words (such as SELECT or CREATE) are forbidden from occurring unquoted in queries. Any query containing these words unquoted will immediately return an error. These forbidden words can be used as attribute names by enclosing them in double quotes.
The following expression types are explicitly supported in this release of the 5800 system:
Make use of JDBC escape syntax wherever possible. The escape syntax makes the query syntax more portable without losing functionality.
Comparison operations: expr1 OP expr2, where OP is one of =, !=, <>, <=, >=, <, or >.
The comparison operations can compare any two expressions (for example, two attribute values or one attribute value and one literal value). The two values must be of compatible types. For example, a < comparison cannot be used between a numeric literal value and a string-valued attribute. Note that <> and != are synonyms.
The concatenation operator, expr1 || expr2.
The concatenation operator || is an SQL standard way of concatenating two expressions to produce a combined output string.
Use of parentheses to indicate precedence of evaluation.
Boolean operators AND, OR, and NOT.
expr1 [NOT] LIKE expr2.
Attempts to match a character expression to a character pattern, which is a character string that includes one or more wildcards.
% matches any number (zero or more) of characters in the corresponding position in expr1.
_ matches one character in the corresponding position in expr1.
Any other character matches only that character in the corresponding position in the character expression.
expr1 BETWEEN expr2 AND expr3
expr [NOT] IN (valueslist).
The 5800 system emulator supports (but the cluster database does not) a JDBC “escape” clause that allows you to treat either % or _ as constant characters. There is currently no way to accomplish this in a LIKE clause in a query on a live cluster.
The following JDBC function escapes have been tested and are supported:
{fn UCASE(string)} to convert a string to uppercase.
{fn LCASE(string)} to convert a string to lowercase.
{fn ABS(value)} to take the absolute value of a numeric expression.
{fn LENGTH(string)} to get the length of a string.
{fn SUBSTRING(string, start, length)} to get a character string formed by extracting length characters from string beginning at start.
{fn LOCATE(string1, string2[, start])} to locate the position in string2 of the first occurrence of string1, searching from the beginning of string2. If start is specified, the search begins from position start. 0 is returned if string2 does not contain string1.
{fn LTRIM(string)} to remove the leading blank spaces in a character string
{fn RTRIM(string)} to remove the trailing blank spaces of a character string
{fn CONCAT(string1, string2)} to get the concatanated character string by appending string2 to string1. If a string is NULL, the result is DBMS-dependent.
{fn TIMESTAMPDIFF(interval, timestamp1, timestamp2)}. An integer representing the number of interval by which timestamp2 is greater than timestamp1. interval may be one of the following: SQL_TSI_FRAC_SECOND, SQL_TSI_SECOND, SQL_TSI_MINUTE, SQL_TSI_HOUR, SQL_TSI_DAY, SQL_TSI_MONTH, SQL_TSI_QUARTER, or SQL_TSI_YEAR.
{fn TIMESTAMPADD(interval, count, timestamp)}. A timestamp calculated by adding count number of intervals to timestamp. interval may be one of the following: SQL_TSI_FRAC_SECOND, SQL_TSI_SECOND, SQL_TSI_MINUTE, SQL_TSI_HOUR, SQL_TSI_DAY, SQL_TSI_WEEK, SQL_TSI_MONTH, SQL_TSI_QUARTER, or SQL_TSI_YEAR.
field_name IS NOT NULL to return all objects that have a value for a specific field. In particular, the query "system.object_id IS NOT NULL" can be used to query all the objects in the system.
The form "field_name IS NULL" to identify all the objects that lack a value for a specific field is not supported and will not work properly in all cases.
{fn LCASE(mp3.artist)} LIKE ’%floyd%’ AND system.object_size > 2000000
(object_size < 200) OR "Collation" = ’en-US’
{fn TIMESTAMPDIFF(SQL_TSI_YEAR, system.test.type_timestamp, ’2007-04-02 01:50:50.999’)} < 3
{fn TIMESTAMPADD(SQL_TSI_YEAR, 2, system.test.type_timestamp )} > ’2007-04-03 01:50:50.999’
The following JDBC escapes are not supported:
CHAR(code)
INSERT(string1, start, length, string2)
LEFT(string, count)
REPEAT(string, count)
REPLACE(string1, string2, string3)
RIGHT(string, count)
SPACE(count)
ROUND(value, number)
TRUNCATE(value, number)
POWER(value, power)
ACOS(float)
ASIN(float)
ATAN(float)
ATAN2(float1, float2)
CEILING(number)
COS(float)
COT(float)
DEGREES(number)
EXP(float)
FLOOR(number)
LOG(float)
LOG10(float)
ASCII(string)
MOD(float1, float2)
PI()
POWER(number, power)
RADIANS(number)
RAND(integer)
SIGN(number)
SIN(float)
SQRT(float)
TAN(float)
Some SQL reserved words (such as BETWEEN or LIKE) are allowed in queries and are expected to occur. An SQL reserved word cannot be used as an attribute name unless it is enclosed in double quotes (for example, "FIRST").
The following reserved words are allowed:
ABS, ABSOLUTE, AFTER, AND, AS, ASCII, AT, BEFORE, BETWEEN, BINARY, BIT, BIT_LENGTH, BOOLEAN, BOTH, CASE, CAST, CHAR, CHARACTER, CHARACTER_LENGTH, CHAR_LENGTH, COALESCE, COLLATE, COLLATION, CONTAINS,COUNT, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, DATE, DAY, DEC,DECIMAL, DOUBLE, ELSE, ELSEIF, END, ESCAPE, EXISTS, EXIT, EXPAND,EXTRACT, FALSE, FIRST, FOR, FROM, HOUR, IF, IN, INT, INTEGER, INTERVAL,IS, LCASE, LEADING, LEFT, LIKE, LOWER, MATCH, MAX, MIN, MINUTE, MONTH,NCHAR, NO, NOT, NULL, NUMERIC, OCTET_LENGTH, OF, R, PAD, PI,POSITION, REAL, RIGHT, RTRIM, SECOND, SIZE, SMALLINT, SUBSTRING, THEN,TIME, TIMESTAMP, TIMESTAMPDIFF, TIMESTAMPADD, TIMEZONE_HOUR, TIMEZONE_MINUTE, TO, TRAILING, TRIM,TRUE, UCASE, UNKNOWN, UPPER, VARBINARY, VARCHAR, VARYING, WHEN,WHENEVER, YEAR, ZONE.
Even if an SQL term is on the list of allowed query terms, it is not guaranteed that the term can actually be used in a working query. The function of this list is to determine words that will not be treated as an identifier unless enclosed in double quotes.
Some SQL reserved words (such as SELECT or CREATE) are forbidden from occurring unquoted in queries. The 5800 system server will immediately raise an exception when one of the forbidden reserved words is used in a query. These forbidden words can be used as attribute names by enclosing them in double quotes.
The following reserved words are forbidden:
ACTION, ADD, ALL, ALLOCATE, ALTER, ANY, APPLICATION, ARE, AREA, ASC, ASSERTION, ATOMIC, AUTHORIZATION, AVG, BEGIN, BY, CALL, CASCADE, CASCADED, CATALOG, CHECK, CLOSE, COLUMN, COMMIT, COMPRESS, CONNECT, CONNECTION, CONSTRAINT, CONSTRAINTS, CONTINUE, CONVERT, CORRESPONDING, CREATE, CROSS, CURRENT, CURRENT_PATH, CURRENT_SCHEMA, CURRENT_SCHEMAID, CURRENT_USER, CURRENT_USERID, CURSOR, DATA, DEALLOCATE, DECLARE, DEFAULT, DEFERRABLE, DEFERRED, DELETE, DESC, DESCRIBE, DESCRIPTOR, DETERMINISTIC, DIAGNOSTICS, DIRECTORY, DISCONNECT, DISTINCT, DO, DOMAIN, DOUBLEATTRIBUTE, DROP, EACH, EXCEPT, EXCEPTION, EXEC, EXECUTE, EXTERNAL, FETCH, FLOAT, FOREIGN, FOUND, FULL, FUNCTION, GET, GLOBAL, GO, GOTO, GRANT, GROUP, HANDLER, HAVING, IDENTITY, IMMEDIATE, INDEX, INDEXED, INDICATOR, INITIALLY, INNER, INOUT, INPUT, INSENSITIVE, INSERT, INTERSECT, INTO, ISOLATION, JOIN, KEY, LANGUAGE, LAST, LEAVE, LEVEL, LOCAL, LONGATTRIBUTE, LOOP, MODIFIES, MODULE, NAMES, NATIONAL, NATURAL, NEXT, NULLIF, ON, ONLY, OPEN, OPTION, ORDER, OUT, OUTER, OUTPUT, OVERLAPS, OWNER, PARTIAL, PATH, PRECISION, PREPARE, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURE, PUBLIC, READ, READS, REFERENCES, RELATIVE, REPEAT, RESIGNAL, RESTRICT, RETURN, RETURNS, REVOKE, ROLLBACK, ROUTINE, ROW, ROWS, SCHEMA, SCROLL, SECTION, SELECT, SEQ, SEQUENCE, SESSION, SESSION_USER, SESSION_USERID, SET, SIGNAL, SOME, SPACE, SPECIFIC, SQL, SQLCODE, SQLERROR, SQLEXCEPTION, SQLSTATE, SQLWARNING, STATEMENT, STRINGATTRIBUTE, SUM, SYSACC, SYSHGH, SYSLNK, SYSNIX, SYSTBLDEF, SYSTBLDSC, SYSTBT, SYSTBTATT, SYSTBTDEF, SYSUSR, SYSTEM_USER, SYSVIW, SYSVIWCOL, TABLE, TABLETYPE, TEMPORARY, TRANSACTION, TRANSLATE, TRANSLATION, TRIGGER, UNDO, UNION, UNIQUE, UNTIL, UPDATE, USAGE, USER, USING, VALUE, VALUES, VIEW, WHERE, WHILE, WITH, WORK, WRITE, ALLSCHEMAS, ALLTABLES, ALLVIEWS, ALLVIEWTEXTS, ALLCOLUMNS, ALLINDEXES, ALLINDEXCOLS, ALLUSERS, ALLTBTS, TABLEPRIVILEGES, TBTPRIVILEGES, MYSCHEMAS, MYTABLES, MYTBTS, MYVIEWS, SCHEMAVIEWS, DUAL, SCHEMAPRIVILEGES, SCHEMATABLES, STATISTICS, USRTBL, STRINGTABLE, LONGTABLE, DOUBLETABLE.