Oracle NoSQL Database version 11gR2.2.0.26
public interface AvroCatalog
A catalog of Avro schemas and bindings for a store.
Manages schemas and provides AvroBindings for use with the Avro data format.
The bindings are used along with KVStore APIs for storing and retrieving
key-value pairs. The bindings are used to serialize Avro values before
writing them, and deserialize Avro values after reading them. An AvroCatalog
is obtained by calling KVStore.getAvroCatalog().
WARNING: We strongly recommend using an AvroBinding. NoSQL Database will
leverage Avro in the future to provide additional features and capabilities.
WARNING: To take advantage of the Avro data format, the bindings in this
class must be used. The Value byte array is constructed by the binding to
include an internal reference to the schema used for serialization. The
Value byte array may not be manipulated directly by the application.
AvroCatalog
class.
An Avro schema is created in JSON format, typically using a text editor and initially saved in a text file. Of course, to create an Avro schema the developer must understand the Avro schema syntax. For more information see Avro Schemas in the Getting Started Guide and the Avro schema specification.
Once created and saved in a text file, the schema is added to the store
using the ddl add-schema administrative command, using the text file as
input; see Adding Schema in the Getting Started Guide. Until a schema is
added, it may not be used in the client API to store values. The use of the
schema in the client API is described further below.
Note that the use of Avro schemas allows serialized values to be stored in a
very space-efficient binary format. Each value is stored without any
metadata other than a small internal schema identifier, between 1 and 4
bytes in size. One such reference is stored per key-value pair. In this
way, the serialized Avro data format is always associated with the schema
used to serialize it, with minimal overhead. This association is made
transparently to the application, and the internal schema identifier is
managed by the bindings supplied by the AvroCatalog class. The application
never sees or uses the internal identifier directly.
Two example schemas are shown below along with the administrative commands for adding them to the store. These schemas are used further below in other examples.
The schemas might be stored in a simple text file, schema1.txt:

{
  "type": "record",
  "name": "MemberInfo",
  "namespace": "avro",
  "fields": [
    {"name": "name", "type": {
      "type": "record",
      "name": "FullName",
      "fields": [
        {"name": "first", "type": "string", "default": ""},
        {"name": "last", "type": "string", "default": ""}
      ]
    }, "default": {}},
    {"name": "age", "type": "int", "default": 0}
  ]
}

The administrative command for adding the above schemas is:
> ddl add-schema -file schema1.txt
A schema is changed using the ddl add-schema administrative command with the
-evolve option; see Changing Schema in the Getting Started Guide. The
modified schema is saved in a text file, which is passed to this command as
input. For example, fields may be added, removed or renamed.
For example, if a middle name property is added in the future to the schema,
it might be stored in schema2.txt. Note that a new field must be given a
default value.

{
  "type": "record",
  "name": "MemberInfo",
  "namespace": "avro",
  "fields": [
    {"name": "name", "type": {
      "type": "record",
      "name": "FullName",
      "fields": [
        {"name": "first", "type": "string", "default": ""},
        {"name": "middle", "type": "string", "default": ""},
        {"name": "last", "type": "string", "default": ""}
      ]
    }, "default": {}},
    {"name": "age", "type": "int", "default": 0}
  ]
}

The administrative command for adding the new version of the schema is:
> ddl add-schema -file schema2.txt -evolve
When a schema is changed, multiple versions of the schema will exist and be
maintained by the store. The version of the schema used to serialize a
value, before writing it to the store, is called the writer schema. The
writer schema is specified by the application when creating a binding. It is
associated with the value when calling the binding's AvroBinding.toValue(T)
method to serialize the data. As mentioned above, the writer schema is
associated internally with every stored value.
The reader schema is used to deserialize a value after reading it from the
store. Like the writer schema, the reader schema is specified by the client
application when creating a binding. It is used to deserialize the data when
calling the binding's AvroBinding.toObject(oracle.kv.Value) method, after
reading a value from the store.
When the reader and writer schemas are different, schema evolution is applied during deserialization. Schema evolution is applied by transforming the data during deserialization, so that data stored according to the writer schema is transformed to conform to the reader schema. When the reader and writer schemas are the same, no data transformation is necessary. Also note that no data transformation takes place during serialization; i.e., data is always written according to the writer schema.
Reader and writer schemas can be different when a client is changed to use a new version of the schema, and then reads data that was written using the old version. Schema versions can also be different when two clients are operating concurrently using two different versions of a schema. In a distributed system such as NoSQL Database, it is normally not possible or desirable to upgrade all clients simultaneously, since this would require downtime. Therefore, for some period of time there will be a mix of clients operating concurrently using different versions of a schema. Fortunately, this situation is handled gracefully by virtue of schema evolution.
For example, imagine that a new field is added to a schema and there are two versions of the schema. The new field is only present in the new version of the schema. The new field must be assigned a default value in the new schema. There are three possible cases.
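The effect of an added field with a default value can be illustrated with a deliberately simplified model of reader and writer resolution. The sketch below is not Avro's implementation and uses no Avro or NoSQL Database classes: a record is a plain map, a "schema" is a map from field name to default value, and the names EvolutionSketch and resolve are hypothetical. It runs through the combinations of old and new writer and reader for the FullName record used on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of reader/writer schema resolution (not Avro's code). */
final class EvolutionSketch {

    /**
     * Transforms data written with some writer schema so that it conforms
     * to the reader schema: reader fields absent from the data take their
     * default value, and data fields unknown to the reader are dropped.
     *
     * @param written        field values as serialized by the writer
     * @param readerDefaults reader schema: field name -> default value
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            result.put(name, written.containsKey(name)
                    ? written.get(name)
                    : field.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Version 1 of FullName has first/last; version 2 adds middle.
        Map<String, Object> v1 = new LinkedHashMap<>();
        v1.put("first", "");
        v1.put("last", "");
        Map<String, Object> v2 = new LinkedHashMap<>();
        v2.put("first", "");
        v2.put("middle", "");
        v2.put("last", "");

        Map<String, Object> oldData = new LinkedHashMap<>();
        oldData.put("first", "Ann");
        oldData.put("last", "Lee");

        Map<String, Object> newData = new LinkedHashMap<>();
        newData.put("first", "Bob");
        newData.put("middle", "Q");
        newData.put("last", "Ray");

        System.out.println(resolve(oldData, v2)); // old writer, new reader
        System.out.println(resolve(newData, v1)); // new writer, old reader
        System.out.println(resolve(newData, v2)); // same schema on both sides
    }
}
```

In this model, data written with the old version gains the default for the new field when read with the new version, and data written with the new version loses the extra field when read with the old version. The model assumes every reader field has a default value.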
To support schema evolution, be sure never to change a schema's name or namespace. A schema is uniquely identified by its Avro full name, which is similar to a full Java class name and consists of a combination of the Avro schema namespace and the schema name.
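As a small illustration of the full name rule, the sketch below joins a namespace and a name the way the Avro specification describes: a name that already contains a dot is treated as a full name, and an empty or missing namespace leaves the name unqualified. The class and method names are hypothetical and no Avro classes are used; with the Avro library itself, Schema.getFullName() returns this value.

```java
/** Computes an Avro full name from a namespace and a name. */
final class FullNameSketch {

    static String fullName(String namespace, String name) {
        // A dotted name is already a full name; the namespace is ignored.
        if (name.indexOf('.') >= 0) {
            return name;
        }
        // No namespace: the name stands on its own.
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        // The example schema on this page: namespace "avro", name "MemberInfo".
        System.out.println(fullName("avro", "MemberInfo"));
    }
}
```

A change to either part yields a different full name, and therefore a different schema identity, which is why neither should be changed once values have been stored.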
AvroCatalog provides a variety of AvroBindings that serialize and
deserialize the Avro data format. A summary of each binding is below.
SpecificAvroBinding is recommended when the schema(s) of the object(s) in
the database are known when the application is being written. The names of
the fields, and how to access them, are known at build time. A POJO (Plain
Old Java Object) class for each schema is generated using the Avro compiler
tools. The POJO classes have property getters and setters that provide type
safety. This makes the SpecificAvroBinding the easiest of the bindings to
use.
GenericAvroBinding is recommended when the schema(s) of the object(s) in the
database are not known at build time. Rather than access the objects using
predefined getters and setters, a program using GenericAvroBinding passes in
the names of the fields to a generalized getter to retrieve data from an
Avro object. For example, a generalized NoSQL Database record browser would
require this capability.
JsonAvroBinding is recommended when interoperability with other components
or external systems that use JSON objects is needed. With the
JsonAvroBinding, the Jackson API is used to manipulate JSON data objects.
Note that certain Avro data types are not conveniently represented as JSON
values; see JsonAvroBinding for details.
RawAvroBinding is recommended when an "escape" from the built-in
serialization provided by the other bindings is needed. The RawAvroBinding
does not perform serialization, but instead allows specifying the Avro
binary data as a byte array. Serialization can be performed in any way
desired, or not at all in the case where Avro binary data is exchanged with
other components or external systems. Because it is low level and provides
complete flexibility, the RawAvroBinding provides the least safety and is
the most difficult of the bindings to use.
The detailed trade-offs for using each type of binding are described in
their javadoc: SpecificAvroBinding, GenericAvroBinding, JsonAvroBinding, and
RawAvroBinding.
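To give a feel for the Avro binary data that a RawAvroBinding exchanges as a byte array, the sketch below hand-encodes one MemberInfo record (the first example schema on this page) following the binary encoding in the Avro specification: an int is written as a zig-zag variable-length integer, a string as its UTF-8 byte length followed by the bytes, and a record as its fields in declaration order with no field names. The class and method names are hypothetical and no Avro or NoSQL Database classes are used. The bytes produced are only the Avro payload; how they are wrapped in a RawRecord or a Value is not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hand-rolled Avro binary encoding of one MemberInfo record. */
final class RawEncodingSketch {

    /** Writes an Avro int: zig-zag encoding, then 7 bits per byte. */
    static void writeInt(ByteArrayOutputStream out, int n) {
        int z = (n << 1) ^ (n >> 31);      // zig-zag: small magnitudes stay small
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);  // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write(z);                      // final byte, continuation bit clear
    }

    /**
     * Writes an Avro string: byte length, then UTF-8 bytes. The spec encodes
     * the length as a long; for int-sized lengths the bytes are identical.
     */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Encodes MemberInfo{name{first, last}, age} in field order. */
    static byte[] encodeMember(String first, String last, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, first);   // name.first
        writeString(out, last);    // name.last
        writeInt(out, age);        // age
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeMember("Ann", "Lee", 30);
        System.out.println(bytes.length + " bytes");
    }
}
```

Because field names and types live in the schema rather than in the data, the encoded record contains little more than the values themselves, which is the space efficiency the overview refers to.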
Each of the specific, generic and JSON bindings comes in a single schema
variant (getSpecificBinding, getGenericBinding and getJsonBinding) and a
multiple schema variant (getSpecificMultiBinding, getGenericMultiBinding and
getJsonMultiBinding).
A single schema binding provides type checking. Only values with the given schema (or class, in the case of a specific class binding) can be used with the binding. A single schema specific class binding provides compile-time type checking, while a single schema generic or JSON binding provides run-time type checking.
A single schema binding is safer than a multiple schema binding and often
preferable for that reason. However, a multiple schema binding may be more
useful when retrieving key-value pairs of different types. A KVStore method
may return values of different types if the application stores multiple
types for a single key, or if a method is called that returns multiple
key-value pairs such as multiGet, multiGetIterator, or storeIterator. There
are several ways of determining which type is returned in these cases.
SpecificRecord, GenericRecord or JsonRecord, and then the schema name or a
property of the object can be examined.
SpecificRecord, and then instanceof can be used to determine the concrete
class.
Note that both single and multiple schema bindings perform schema evolution when deserializing a value. The deserialized value will conform to the schema specified as an argument of the getXxxBinding or getXxxMultiBinding method.
A special use case for a generic or JSON multiple schema binding is when the
application treats values dynamically based on their schema, rather than
using a fixed set of schemas that is known in advance to the client
application. In this case the getCurrentSchemas method can be used to obtain
a map of the most current schemas, which can be passed to
getGenericMultiBinding or getJsonMultiBinding.
toObject method will transform the serialized data such that the returned
object conforms to the schema known to the application.
The application specifies its known, embedded schemas in different ways, depending on the type of binding used.
Schema objects from the schema text, the Schema.Parser class may be used by
the application. After creating it, the schema object is passed to the
getXxxBinding method. A schema object is also passed to the constructor of
GenericRecord, JsonRecord and RawRecord.
As described further above, all schemas used by an application must be
defined using the NoSQL Database administrative interface. If a schema
specified by the application via the client API has not been defined in the
store, an UndefinedSchemaException will be thrown by the getXxxBinding
method (if the schema is passed to this method), or by one of the methods of
the returned binding. Matching of the application specified schemas with
schemas in the store is performed using the Schema.equals(java.lang.Object)
method.
One exception to the above is that an application may choose to use the
current version of schemas in the store that are returned by
getCurrentSchemas; in this case the set of schemas used in the application
need not be fixed at build time. A second exception is when the application
chooses to use a raw binding and does not serialize or deserialize the data,
for example, when the serialized byte array is copied to or from another
component or system.
WARNING: The application should not create new Schema objects unnecessarily,
since schema creation is an expensive operation. The expected approach is to
create each distinct Schema only once, and reuse that object whenever it is
needed. Also note that all Schema objects created by the application and
passed to an API method in this package are cached. This cache is associated
with the AvroCatalog instance, which is associated with the KVStore
instance. The cached references to the Schema objects are not discarded
until the KVStore instance is closed and discarded. For example, a very
undesirable approach would be for the application to create a new Schema
object for each serialization or deserialization operation; in this case,
performance would suffer greatly and the cached schemas would eventually
fill the JVM heap.
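One way to follow this advice is to parse each schema text once and hand out the same object afterwards. The sketch below shows the pattern with a small generic cache and is an illustration only: SchemaHolder is a hypothetical name, and the parser function passed to the constructor stands in for whatever the application uses to build a Schema, for example a call to Schema.Parser. The type parameter S takes the place of the Avro Schema type so that the sketch compiles without the Avro library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Parses each distinct schema text at most once and reuses the result.
 * S stands in for the Avro Schema type so the sketch needs no Avro jar.
 */
final class SchemaHolder<S> {

    private final ConcurrentMap<String, S> parsed = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    SchemaHolder(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the cached object for this text, parsing on first use only. */
    S get(String schemaText) {
        return parsed.computeIfAbsent(schemaText, parser);
    }

    public static void main(String[] args) {
        // Stand-in parser: a real application would build a Schema here.
        SchemaHolder<Object> holder =
                new SchemaHolder<Object>(text -> new Object());
        String text = "{\"type\": \"string\"}";
        // Both calls return the same instance, so nothing is re-created.
        System.out.println(holder.get(text) == holder.get(text));
    }
}
```

When the set of schemas is fixed at build time, a static final field per schema achieves the same effect with less machinery.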
Method Summary

Return type | Method | Description |
---|---|---|
Map<String,Schema> | getCurrentSchemas() | Returns an immutable Map containing the most current version of all schemas from the KVStore client schema cache. |
GenericAvroBinding | getGenericBinding(Schema schema) | Returns a binding for representing a value as an Avro GenericRecord, for values that conform to a single given expected schema. |
GenericAvroBinding | getGenericMultiBinding(Map<String,Schema> schemas) | Returns a binding for representing a value as an Avro GenericRecord, for values that conform to multiple given expected schemas. |
JsonAvroBinding | getJsonBinding(Schema schema) | Returns a binding for representing a value as a JsonRecord, for values that conform to a single given expected schema. |
JsonAvroBinding | getJsonMultiBinding(Map<String,Schema> schemas) | Returns a binding for representing a value as a JsonRecord, for values that conform to multiple given expected schemas. |
RawAvroBinding | getRawBinding() | Returns a binding for representing a value as a RawRecord containing the raw Avro serialized byte array and its associated schema. |
<T extends SpecificRecord> SpecificAvroBinding<T> | getSpecificBinding(Class<T> cls) | Returns a binding for representing values as instances of a generated Avro specific class, for a single given class. |
SpecificAvroBinding<SpecificRecord> | getSpecificMultiBinding() | Returns a binding for representing values as instances of generated Avro specific classes, for any Avro specific class. |
void | refreshSchemaCache(Consistency consistency) | Refreshes the cache of stored schemas, adding any new schemas or new versions of schemas to the cache that have been stored via the administration interface since the cache was last refreshed. |
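Method Detail

<T extends SpecificRecord> SpecificAvroBinding<T> getSpecificBinding(Class<T> cls)
Returns a binding for representing values as instances of a generated Avro
specific class, for a single given class.
Parameters:
cls - an Avro specific class that was previously generated using the Avro
code generation tools.
Throws:
UndefinedSchemaException - if the schema associated with the given class
parameter has not been defined using the NoSQL Database administration
interface.
See Also:
SpecificAvroBinding

SpecificAvroBinding<SpecificRecord> getSpecificMultiBinding()
Returns a binding for representing values as instances of generated Avro
specific classes, for any Avro specific class.
See Also:
SpecificAvroBinding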

GenericAvroBinding getGenericBinding(Schema schema)
Returns a binding for representing a value as an Avro GenericRecord, for
values that conform to a single given expected schema.
Parameters:
schema - the Avro schema expected for all values and GenericRecords used
with this binding.
Throws:
UndefinedSchemaException - if the given schema has not been defined using
the NoSQL Database administration interface.

GenericAvroBinding getGenericMultiBinding(Map<String,Schema> schemas)
Returns a binding for representing a value as an Avro GenericRecord, for
values that conform to multiple given expected schemas.
Parameters:
schemas - the Avro schemas expected for all values and GenericRecords used
with this binding. The key in the map is the full name of the schema.
Throws:
UndefinedSchemaException - if any of the given schemas has not been defined
using the NoSQL Database administration interface.

JsonAvroBinding getJsonBinding(Schema schema)
Returns a binding for representing a value as a JsonRecord, for values that
conform to a single given expected schema.
Parameters:
schema - the Avro schema expected for all values and JsonRecords used with
this binding.
Throws:
UndefinedSchemaException - if the given schema has not been defined using
the NoSQL Database administration interface.

JsonAvroBinding getJsonMultiBinding(Map<String,Schema> schemas)
Returns a binding for representing a value as a JsonRecord, for values that
conform to multiple given expected schemas.
Parameters:
schemas - the Avro schemas expected for all values and JsonRecords used with
this binding. The key in the map is the full name of the schema.
Throws:
UndefinedSchemaException - if any of the given schemas has not been defined
using the NoSQL Database administration interface.

RawAvroBinding getRawBinding()
Returns a binding for representing a value as a RawRecord containing the raw
Avro serialized byte array and its associated schema.

Map<String,Schema> getCurrentSchemas()
Returns an immutable Map containing the most current version of all schemas
from the KVStore client schema cache. The Map key is the full name of the
schema.
A special use case for a generic or JSON multiple schema binding is when the
application treats values dynamically based on their schema, rather than
using a fixed set of known schemas. The getCurrentSchemas method can be used
to obtain a map of the most current schemas, which can be passed to
getGenericMultiBinding or getJsonMultiBinding. See GenericAvroBinding and
JsonAvroBinding for an example of this use case.

void refreshSchemaCache(Consistency consistency)
Refreshes the cache of stored schemas, adding any new schemas or new
versions of schemas to the cache that have been stored via the
administration interface since the cache was last refreshed.
Calling this method is normally not necessary, since the schema cache is
automatically refreshed whenever a schema is specified via any of the Avro
binding APIs, and that schema is not already present in the cache.
Calling this method periodically may be necessary when the KVStore handle is
long lived, the getCurrentSchemas() method is used to obtain current
schemas, and the application wishes to obtain schemas that were recently
added using the administration interface.
WARNING: Calling this method often from multiple threads may cause blocking
during the query for schema changes. Also note that calling this method
often could impact the performance of other operations, since it queries
key-value pairs in the store.
Parameters:
consistency - determines the consistency associated with the read used to
query for new schemas. If null, the default consistency is used.
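
For a long-lived handle, the periodic refresh described above could be driven by a single scheduler thread, which also avoids the multi-threaded contention the warning mentions. In the sketch below, the Runnable stands in for a call such as catalog.refreshSchemaCache(null); the class name, the constructor parameters and the interval are assumptions chosen for illustration, and no NoSQL Database classes are used.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs a refresh task at a fixed delay on one background thread. */
final class SchemaRefreshScheduler implements AutoCloseable {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "schema-refresh");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });

    /**
     * @param refresh stand-in for catalog.refreshSchemaCache(consistency)
     * @param delay   pause between the end of one run and the next start
     * @param unit    time unit of the delay
     */
    SchemaRefreshScheduler(Runnable refresh, long delay, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                refresh.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so it is
                // swallowed here; a real application would log it.
            }
        }, delay, delay, unit);
    }

    /** Stops the background thread; call when the store handle is closed. */
    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (SchemaRefreshScheduler scheduler = new SchemaRefreshScheduler(
                () -> System.out.println("refresh"),
                100, TimeUnit.MILLISECONDS)) {
            Thread.sleep(350); // let a few refreshes run, then close
        }
    }
}
```

A fixed delay, rather than a fixed rate, keeps refreshes from piling up if one query is slow. Since each refresh reads from the store, a production interval would be far longer than the one used in this demonstration.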