Oracle NoSQL Database
version 11gR2.2.0.26

oracle.kv.avro
Interface AvroCatalog


public interface AvroCatalog

A catalog of Avro schemas and bindings for a store.

Manages schemas and provides AvroBindings for use with the Avro data format. The bindings are used along with KVStore APIs for storing and retrieving key-value pairs: they serialize Avro values before writing them and deserialize Avro values after reading them. An AvroCatalog is obtained by calling KVStore.getAvroCatalog().

WARNING: We strongly recommend using an AvroBinding. NoSQL Database will leverage Avro in the future to provide additional features and capabilities.

WARNING: To take advantage of the Avro data format, the bindings in this class must be used. The Value byte array is constructed by the binding to include an internal reference to the schema used for serialization. The Value byte array may not be manipulated directly by the application.

Avro Schemas

When the Avro data format is used, each stored value must be associated with an Avro schema. The Avro schema describes the fields allowed in the value, along with their data types. An Avro schema is created by the application developer, added to the store using the NoSQL Database administration interface, and used in the client API via the AvroCatalog class.

An Avro schema is created in JSON format, typically using a text editor and initially saved in a text file. Of course, to create an Avro schema the developer must understand the Avro schema syntax. For more information see Avro Schemas in the Getting Started Guide and the Avro schema specification.

Once created and saved in a text file, the schema is added to the store using the ddl add-schema administrative command, using the text file as input; see Adding Schema in the Getting Started Guide. Until a schema is added, it may not be used in the client API to store values. The use of the schema in the client API is described further below.

Note that the use of Avro schemas allows serialized values to be stored in a very space-efficient binary format. Each value is stored without any metadata other than a small internal schema identifier, between 1 and 4 bytes in size. One such reference is stored per key-value pair. In this way, the serialized Avro data format is always associated with the schema used to serialize it, with minimal overhead. This association is made transparently to the application, and the internal schema identifier is managed by the bindings supplied by the AvroCatalog class. The application never sees or uses the internal identifier directly.

Two example schemas are shown below along with the administrative commands for adding them to the store. These schemas are used further below in other examples.

The schemas might be stored in a simple text file, schema1.txt:

  {
    "type": "record",
    "name": "MemberInfo",
    "namespace": "avro",
    "fields": [
      {"name": "name", "type": {
        "type": "record",
        "name": "FullName",
        "fields": [
          {"name": "first", "type": "string", "default": ""},
          {"name": "last", "type": "string", "default": ""}
        ]
      }, "default": {}},
      {"name": "age", "type": "int", "default": 0}
    ]
  }

The administrative command for adding the above schemas is:
  > ddl add-schema -file schema1.txt

Schema Evolution

A schema may be changed, even after data values are stored using that schema, using the ddl add-schema administrative command with the -evolve option; see Changing Schema in the Getting Started Guide. The modified schema is saved in a text file, which is passed to this command as input. For example, fields may be added, removed or renamed.

For example, if a middle name property is added in the future to the schema, it might be stored in schema2.txt. Note that a new field must be given a default value.

  {
    "type": "record",
    "name": "MemberInfo",
    "namespace": "avro",
    "fields": [
      {"name": "name", "type": {
        "type": "record",
        "name": "FullName",
        "fields": [
          {"name": "first", "type": "string", "default": ""},
          {"name": "middle", "type": "string", "default": ""},
          {"name": "last", "type": "string", "default": ""}
        ]
      }, "default": {}},
      {"name": "age", "type": "int", "default": 0}
    ]
  }

The administrative command for adding the new version of the schema is:
  > ddl add-schema -file schema2.txt -evolve

When a schema is changed, multiple versions of the schema will exist and be maintained by the store. The version of the schema used to serialize a value, before writing it to the store, is called the writer schema. The writer schema is specified by the application when creating a binding. It is associated with the value when calling the binding's AvroBinding.toValue(T) method to serialize the data. As mentioned above, the writer schema is associated internally with every stored value.

The reader schema is used to deserialize a value after reading it from the store. Like the writer schema, the reader schema is specified by the client application when creating a binding. It is used to deserialize the data when calling the binding's AvroBinding.toObject(oracle.kv.Value) method, after reading a value from the store.

When the reader and writer schemas are different, schema evolution is applied during deserialization: data stored according to the writer schema is transformed to conform to the reader schema. When the reader and writer schemas are the same, no data transformation is necessary. Also note that no data transformation takes place during serialization; i.e., data is always written according to the writer schema.

Reader and writer schemas can be different when a client is changed to use a new version of the schema, and then reads data that was written using the old version. Schema versions can also be different when two clients are operating concurrently using two different versions of a schema. In a distributed system such as NoSQL Database, it is normally not possible or desirable to upgrade all clients simultaneously, since this would require downtime. Therefore, for some period of time there will be a mix of clients operating concurrently using different versions of a schema. Fortunately, this situation is handled gracefully by virtue of schema evolution.

For example, imagine that a new field is added to a schema and there are two versions of the schema. The new field is only present in the new version of the schema. The new field must be assigned a default value in the new schema. There are three possible cases.

  1. The writer schema and reader schema are the same. Schema evolution is not necessary and no data transformation is applied.

  2. The writer schema is the old version and the reader schema is the new version. Because the writer schema is the old version, the new field is not present in the stored data. When a client uses the new version as a reader schema, the new field will appear to the client as having the default value.

  3. The writer schema is the new version and the reader schema is the old version. Because the writer schema is the new version, the new field is present in the stored data. When a client uses the old version as a reader schema, the new field will not appear at all to the client.

If instead a field were deleted from a schema, the same rules would apply but with the roles reversed. Renaming a field is also possible by adding a field alias to the schema; in this case the field is accessible by both the old and new name. For more information see Schema Evolution in the Getting Started Guide and the detailed rules for schema evolution in the Avro schema specification.
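The three cases above can be illustrated with a small sketch. This is a toy model and not the library's implementation: records and schemas are represented as plain maps, the class and method names are hypothetical, and only added or missing fields are handled (no aliases, type promotion or nested records).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of reader/writer schema resolution. In the real API this work
// is done inside AvroBinding.toObject and follows the full Avro rules.
final class EvolutionSketch {
    private EvolutionSketch() {}

    /**
     * Projects data written with one schema version onto the reader schema.
     * readerDefaults maps each reader field name to its default value.
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            // Field present in the stored data: keep the stored value.
            // Field missing (written with an older schema): use the default.
            result.put(name, written.containsKey(name)
                    ? written.get(name)
                    : field.getValue());
        }
        // Stored fields that the reader schema does not declare are dropped.
        return result;
    }
}
```

In this model, data written without a "middle" field and read with a schema that declares it yields the default "" (case 2), while data written with "middle" and read with the old schema simply loses that field in the returned view (case 3). The stored bytes are unchanged in both cases.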

To support schema evolution, be sure never to change a schema's name or namespace. A schema is uniquely identified by its Avro full name, which is similar to a full Java class name and consists of a combination of the Avro schema namespace and the schema name.
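As an illustration of how a full name is formed, the following sketch combines a namespace and a name according to the naming rules of the Avro specification. The helper class is hypothetical and is not part of this package or of the Avro library, which computes full names itself.

```java
// Hypothetical helper, for illustration only.
final class AvroNames {
    private AvroNames() {}

    /** Builds the Avro full name from a schema's namespace and name. */
    static String fullName(String namespace, String name) {
        // A name that already contains a dot is treated as a full name,
        // and any separately specified namespace is ignored.
        if (name.indexOf('.') >= 0) {
            return name;
        }
        // Without a namespace, the full name is just the name.
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }
}
```

For the MemberInfo schema shown earlier, the namespace "avro" and the name "MemberInfo" give the full name avro.MemberInfo. Changing either part would produce a different full name, and therefore a different schema as far as the store is concerned.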

Avro schema restrictions

The top-level schema of a value that is stored in a key-value pair must be of the Avro type record.

Choosing a Binding

The AvroCatalog provides a variety of AvroBindings that serialize and deserialize the Avro data format. A summary of each binding is below.

The detailed trade-offs for using each type of binding are described in their javadoc: SpecificAvroBinding, GenericAvroBinding, JsonAvroBinding, and RawAvroBinding.

Single schema and multiple schema bindings

Specific, generic and JSON bindings have a single schema variant (getSpecificBinding, getGenericBinding and getJsonBinding) and a multiple schema variant (getSpecificMultiBinding, getGenericMultiBinding and getJsonMultiBinding).

A single schema binding provides type checking. Only values with the given schema (or class, in the case of a specific class binding) can be used with the binding. A single schema specific class binding provides compile-time type checking, while a single schema generic or JSON binding provides run-time type checking.

A single schema binding is safer than a multiple schema binding and often preferable for that reason. However, a multiple schema binding may be more useful when retrieving key-value pairs of different types. A KVStore method may return values of different types if the application stores multiple types for a single key, or if a method is called that returns multiple key-value pairs such as multiGet, multiGetIterator, or storeIterator. There are several ways of determining which type is returned in these cases.

Note that both single and multiple schema bindings perform schema evolution when deserializing a value. The deserialized value will conform to the schema specified as an argument of the getXxxBinding or getXxxMultiBinding method.

A special use case for a generic or JSON multiple schema binding is when the application treats values dynamically based on their schema, rather than using a fixed set of schemas that is known in advance to the client application. In this case the getCurrentSchemas method can be used to obtain a map of the most current schemas, which can be passed to getGenericMultiBinding or getJsonMultiBinding.

Using Schemas with Bindings

A client application normally embeds a copy of the schemas it uses, rather than getting the current schemas from the store. The client's schemas are specified when a binding is created by one of the getXxxBinding methods. This supports schema evolution (as described above), in that the toObject method will transform the serialized data such that the returned object conforms to the schema known to the application.

The application specifies its known, embedded schemas in different ways, depending on the type of binding used.

As described further above, all schemas used by an application must be defined using the NoSQL Database administrative interface. If a schema specified by the application via the client API has not been defined in the store, an UndefinedSchemaException will be thrown by the getXxxBinding method (if the schema is passed to this method), or by one of the methods of the returned binding. Matching of the application specified schemas with schemas in the store is performed using the Schema.equals(java.lang.Object) method.

One exception to the above is that an application may choose to use the current version of schemas in the store that are returned by getCurrentSchemas; in this case the set of schemas used in the application need not be fixed at build time. A second exception is when the application chooses to use a raw binding and does not serialize or deserialize the data, for example, when the serialized byte array is copied to or from another component or system.

WARNING: The application should not create new Schema objects unnecessarily, since schema creation is an expensive operation. The expected approach is to create each distinct Schema only once, and reuse that object whenever it is needed. Also note that all Schema objects created by the application and passed to an API method in this package are cached. This cache is associated with the AvroCatalog instance, which is associated with the KVStore instance. The cached references to the Schema objects are not discarded until the KVStore instance is closed and discarded. For example, a very undesirable approach would be for the application to create a new Schema object for each serialization or deserialization operation; in this case, performance would suffer greatly and the cached schemas would eventually fill the JVM heap.
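One way to follow the advice above is to parse each schema text once and hand out the same object afterwards. The sketch below is an assumption about how an application might do this, not something provided by this package: the class name is hypothetical, and the type parameter S stands in for the Avro Schema class so that the example needs no Avro jar.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-side helper, not part of oracle.kv.avro.
// S stands in for org.apache.avro.Schema.
final class SchemaRegistry<S> {
    private final Map<String, S> parsed = new ConcurrentHashMap<>();
    private final Function<String, S> parser;

    SchemaRegistry(Function<String, S> parser) {
        this.parser = parser;
    }

    /** Returns the one shared schema object for the given schema text. */
    S get(String schemaJson) {
        // computeIfAbsent runs the parser at most once per distinct text,
        // so repeated calls reuse the object created by the first call.
        return parsed.computeIfAbsent(schemaJson, parser);
    }
}
```

With the Avro library on the classpath, the parser argument could be something like text -> new Schema.Parser().parse(text). An application with a small, fixed set of schemas can achieve the same effect more simply by holding each Schema in a static final field.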

Since:
2.0

Method Summary
 Map<String,Schema> getCurrentSchemas()
          Returns an immutable Map containing the most current version of all schemas from the KVStore client schema cache.
 GenericAvroBinding getGenericBinding(Schema schema)
          Returns a binding for representing a value as an Avro GenericRecord, for values that conform to a single given expected schema.
 GenericAvroBinding getGenericMultiBinding(Map<String,Schema> schemas)
          Returns a binding for representing a value as an Avro GenericRecord, for values that conform to multiple given expected schemas.
 JsonAvroBinding getJsonBinding(Schema schema)
          Returns a binding for representing a value as a JsonRecord, for values that conform to a single given expected schema.
 JsonAvroBinding getJsonMultiBinding(Map<String,Schema> schemas)
          Returns a binding for representing a value as a JsonRecord, for values that conform to multiple given expected schemas.
 RawAvroBinding getRawBinding()
          Returns a binding for representing a value as a RawRecord containing the raw Avro serialized byte array and its associated schema.
 <T extends SpecificRecord> SpecificAvroBinding<T> getSpecificBinding(Class<T> cls)
          Returns a binding for representing values as instances of a generated Avro specific class, for a single given class.
 SpecificAvroBinding<SpecificRecord> getSpecificMultiBinding()
          Returns a binding for representing values as instances of generated Avro specific classes, for any Avro specific class.
 void refreshSchemaCache(Consistency consistency)
          Refreshes the cache of stored schemas, adding any new schemas or new versions of schemas to the cache that have been stored via the administration interface since the cache was last refreshed.
 

Method Detail

getSpecificBinding

<T extends SpecificRecord> SpecificAvroBinding<T> getSpecificBinding(Class<T> cls)
Returns a binding for representing values as instances of a generated Avro specific class, for a single given class.

Parameters:
cls - an Avro specific class that was previously generated using the Avro code generation tools.
Returns:
the AvroBinding that can be used for serialization and deserialization.
Throws:
UndefinedSchemaException - if the schema associated with the given class parameter has not been defined using the NoSQL Database administration interface.
See Also:
SpecificAvroBinding

getSpecificMultiBinding

SpecificAvroBinding<SpecificRecord> getSpecificMultiBinding()
Returns a binding for representing values as instances of generated Avro specific classes, for any Avro specific class.

Returns:
the AvroBinding that can be used for serialization and deserialization.
See Also:
SpecificAvroBinding

getGenericBinding

GenericAvroBinding getGenericBinding(Schema schema)
Returns a binding for representing a value as an Avro GenericRecord, for values that conform to a single given expected schema.

Parameters:
schema - the Avro schema expected for all values and GenericRecords used with this binding.
Returns:
the AvroBinding that can be used for serialization and deserialization.
Throws:
UndefinedSchemaException - if the given schema has not been defined using the NoSQL Database administration interface.

getGenericMultiBinding

GenericAvroBinding getGenericMultiBinding(Map<String,Schema> schemas)
Returns a binding for representing a value as an Avro GenericRecord, for values that conform to multiple given expected schemas.

Parameters:
schemas - the Avro schemas expected for all values and GenericRecords used with this binding. The key in the map is the full name of the schema.
Returns:
the AvroBinding that can be used for serialization and deserialization.
Throws:
UndefinedSchemaException - if any of the given schemas has not been defined using the NoSQL Database administration interface.

getJsonBinding

JsonAvroBinding getJsonBinding(Schema schema)
Returns a binding for representing a value as a JsonRecord, for values that conform to a single given expected schema.

Parameters:
schema - the Avro schema expected for all values and JsonRecords used with this binding.
Returns:
the AvroBinding that can be used for serialization and deserialization.
Throws:
UndefinedSchemaException - if the given schema has not been defined using the NoSQL Database administration interface.

getJsonMultiBinding

JsonAvroBinding getJsonMultiBinding(Map<String,Schema> schemas)
Returns a binding for representing a value as a JsonRecord, for values that conform to multiple given expected schemas.

Parameters:
schemas - the Avro schemas expected for all values and JsonRecords used with this binding. The key in the map is the full name of the schema.
Returns:
the AvroBinding that can be used for serialization and deserialization.
Throws:
UndefinedSchemaException - if any of the given schemas has not been defined using the NoSQL Database administration interface.

getRawBinding

RawAvroBinding getRawBinding()
Returns a binding for representing a value as a RawRecord containing the raw Avro serialized byte array and its associated schema.

Returns:
the AvroBinding that can be used for packaging and unpackaging the serialized value.

getCurrentSchemas

Map<String,Schema> getCurrentSchemas()
Returns an immutable Map containing the most current version of all schemas from the KVStore client schema cache. The Map key is the full name of the schema.

A special use case for a generic or JSON multiple schema binding is when the application treats values dynamically based on their schema, rather than using a fixed set of known schemas. The getCurrentSchemas method can be used to obtain a map of the most current schemas, which can be passed to getGenericMultiBinding or getJsonMultiBinding. See GenericAvroBinding and JsonAvroBinding for an example of this use case.

Returns:
an immutable Map of full schema name to schema object.

refreshSchemaCache

void refreshSchemaCache(Consistency consistency)
Refreshes the cache of stored schemas, adding any new schemas or new versions of schemas to the cache that have been stored via the administration interface since the cache was last refreshed.

Calling this method is normally not necessary, since the schema cache is automatically refreshed whenever a schema is specified via any of the Avro binding APIs, and that schema is not already present in the cache.

Calling this method periodically may be necessary when the KVStore handle is long lived, the getCurrentSchemas() method is used to obtain current schemas, and the application wishes to obtain schemas that were recently added using the administration interface.

WARNING: Calling this method often from multiple threads may cause blocking during the query for schema changes. Also note that calling this method often could impact the performance of other operations, since it queries key-value pairs in the store.

Parameters:
consistency - determines the consistency associated with the read used to query for new schemas. If null, the default consistency is used.
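
An application that does need periodic refreshes can limit how often the store is queried. The sketch below is one possible approach and is not part of this package: the class name, the minimum interval and the injected clock are all assumptions made for illustration. The Runnable is expected to wrap the actual call, for example () -> catalog.refreshSchemaCache(null).

```java
import java.util.function.LongSupplier;

// Hypothetical wrapper, not part of oracle.kv.avro.
final class ThrottledRefresher {
    private final Runnable refresh;
    private final long minIntervalMillis;
    private final LongSupplier clock;
    private long lastRun;
    private boolean hasRun;

    ThrottledRefresher(Runnable refresh, long minIntervalMillis,
                       LongSupplier clock) {
        this.refresh = refresh;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /** Runs the refresh unless one already ran within the minimum interval. */
    synchronized boolean refreshIfDue() {
        long now = clock.getAsLong();
        if (hasRun && now - lastRun < minIntervalMillis) {
            return false; // too soon: skip the query against the store
        }
        refresh.run();
        lastRun = now;
        hasRun = true;
        return true;
    }
}
```

In production the clock would typically be System::currentTimeMillis; passing it in keeps the class easy to test. Because refreshIfDue is synchronized, concurrent callers wait for a refresh that is in progress and then return without issuing another one.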

Copyright (c) 2011, 2013 Oracle and/or its affiliates. All rights reserved.