CHAPTER 6
Configuring Metadata and Virtual File System Views
This chapter describes how to modify the default schema file to add metadata specific to your applications. This chapter also describes how to modify the default schema file to configure virtual file system views that allow users to browse data objects as though they were stored in a traditional hierarchical file structure.
This chapter contains the following sections:
Note - For instructions on accessing the CLI commands and GUI functions described in this chapter, see Using the Administrative Interfaces.
The metadata schema specifies the metadata attributes that can be stored with objects in the 5800 system. The system comes preconfigured with a default metadata schema, which you can modify to specify metadata appropriate to your applications.
The following sections describe the metadata schema file and its components.
You specify what metadata the data objects in your system include and how that metadata is structured using the schema file. You also configure virtual views using the schema file. A predefined schema file, which contains the minimum set of attributes, is included with the 5800 system. You modify that schema file to add the extended metadata and file system views applicable to your configuration.
The schema file for the 5800 system is a standard XML file with the general format shown in CODE EXAMPLE 6-1. See CODE EXAMPLE 6-3 for an example of a schema file.
The Document Type Definition (DTD), which defines the structure of the schema file, is shown in CODE EXAMPLE 6-2.
CODE EXAMPLE 6-3 shows an example of a schema file for a system storing MP3 music files.
Metadata is information that describes a data object. The 5800 system stores metadata about all data objects in a distributed database. Users can issue queries to search the database and find objects based on the metadata assigned to them. The 5800 system allows two types of metadata: system and extended.
The 5800 system automatically assigns system metadata to every data object when it is stored on the 5800 system. System metadata includes a unique identifier for each object, called the Object ID or OID. The application programming interface (API) included with the 5800 system can retrieve the object using this OID. System metadata also includes creation time, data length, and data hash.
Extended metadata goes beyond the system metadata to further describe each data object. For example, if the data stored on the 5800 system includes medical records, extended metadata attributes might include patient name, date of visit, doctor name, medical record number, and insurance company. Users can issue queries to retrieve data objects using these attributes. For example, a query could retrieve all records (data objects) for a given doctor and a particular insurance company.
The 5800 system supports metadata as sets of typed name-value pairs. TABLE 6-1 lists the supported metadata types.
You can group metadata into namespaces, or collections of metadata names, identified by a string. Namespaces are essentially directories of metadata names. Just as directories can include subdirectories, namespaces can include subnamespaces or namespaces within namespaces. You can have as many namespaces as you want in the 5800 system metadata schema. There is also no limit on the number of subnamespaces within a given namespace.
The full name of an attribute is the name of its namespace, followed by a dot, followed by the attribute name. For example, the attribute name yoyodyne.widget.oscillation.overthruster represents an attribute whose name is overthruster, which is grouped within the subnamespace oscillation, which is part of the subnamespace widget, which in turn is part of the namespace yoyodyne.
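The naming rule above can be sketched in a few lines of Python. This is an illustration only, not part of the 5800 system API; the function name split_qualified_name is hypothetical, and the sketch assumes every attribute lives in at least one namespace.

```python
def split_qualified_name(full_name: str) -> tuple[list[str], str]:
    """Split a dotted attribute name into (namespace path, attribute name)."""
    parts = full_name.split(".")
    # Assumption: a fully qualified name has at least one namespace
    # component and no empty components.
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"not a fully qualified attribute name: {full_name!r}")
    return parts[:-1], parts[-1]


namespaces, attribute = split_qualified_name(
    "yoyodyne.widget.oscillation.overthruster"
)
print(namespaces)  # ['yoyodyne', 'widget', 'oscillation']
print(attribute)   # overthruster
```

The namespace path is ordered from the broadest namespace to the narrowest, which matches the order in which the components appear in the full name.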
When defining a namespace in the metadata schema, you can define two optional properties:
If a namespace is writable, you can specify any field in the namespace when you store an object. If a namespace is nonwritable, it is read-only, and you cannot specify any of the fields. The system namespace, for example, is nonwritable (read-only). If a namespace is nonwritable, any subnamespaces you add will also be nonwritable.
By default, namespaces are extensible, which means that you can add attributes or subnamespaces to the namespace. You can change a namespace from extensible to nonextensible, but you cannot change a nonextensible namespace back to extensible.
The 5800 system reserves a namespace called system for metadata created by the 5800 system itself and a namespace called filesystem to specify how the file system layer presents files. For example, the system namespace includes the creation time for an object and the filesystem namespace includes the object’s user identifier (UID) and group identifier.
TABLE 6-2 lists the namespaces that the 5800 system reserves.
TABLE 6-3 lists the contents of the reserved system namespace.
TABLE 6-4 lists the contents of the reserved filesystem namespace.
Applications must always use the fully qualified name of the attribute when storing metadata or querying. The fully qualified name consists of all the enclosing namespace names, from the broadest to the narrowest, separated by dots, followed by the attribute name itself, as in namespace.subnamespace.fieldName.
You might want to use the name of your organization or company as the top level namespace, and something like project names as subnamespaces. For example, an organization named Yoyodyne, Inc. might set up their namespaces and subnamespaces as follows:
You partition the metadata schema into tables and specify each metadata field as a column within a particular table. Objects stored in the 5800 system become rows in one or more tables, depending on which metadata fields are associated with that data.
All fields in a query should come from the same table, since queries might fail if they include fields from different tables. The largest supported query string is 8080 bytes. The combined size of all query literals and parameters is also limited to 8080 bytes. If you must use queries that include fields from more than one table, make sure that no more than one query that references multiple tables is running at a time. Refer to the Sun StorageTek 5800 System Client API Reference Guide for complete information about query sizes and limitations.
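As a rough client-side guard, an application could check both 8080-byte limits before submitting a query. The sketch below is an assumption-laden illustration: it counts bytes in UTF-8 (the text does not say how the system counts them), the function name query_fits is hypothetical, and the query text is a placeholder rather than documented query syntax.

```python
MAX_QUERY_BYTES = 8080  # limit stated in the text above


def query_fits(query: str, literals: list[str]) -> bool:
    """Return True if the query string and its literals are within the limits."""
    # Assumption: sizes are measured as UTF-8 encoded bytes.
    query_bytes = len(query.encode("utf-8"))
    literal_bytes = sum(len(value.encode("utf-8")) for value in literals)
    return query_bytes <= MAX_QUERY_BYTES and literal_bytes <= MAX_QUERY_BYTES


# Placeholder query text, for illustration only.
print(query_fits("mp3.artist = ?", ["Rush"]))
```

The Client API Reference Guide remains the authority on how query sizes are actually measured.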
Suppose you specify columns in a table reference in the metadata schema as shown in the following example.
<table name="reference">
  <column name="mp3.artist"/>
  <column name="mp3.album"/>
  <column name="mp3.title"/>
  <column name="dates.year"/>
</table>
The reference table you create would have the logical layout shown in TABLE 6-5.
When an object is stored in the 5800 system that has any of the specified metadata attributes associated with it (mp3.artist, mp3.album, mp3.title, or dates.year), that object OID is listed as a row in the reference table and the values of the attributes are listed in the column that corresponds to that attribute. If no value is assigned to an attribute for that object, no value is listed in the corresponding column.
If the object has other metadata associated with it, that object will also be stored in whichever tables include that other metadata as columns.
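The relationship between objects, tables, and rows can be modeled with a small Python sketch. It is a conceptual illustration, not a description of how the 5800 system stores rows internally; the owners table, the doc.owner field, and the function name rows_for_object are hypothetical.

```python
TABLES = {
    "reference": ["mp3.artist", "mp3.album", "mp3.title", "dates.year"],
    "owners": ["doc.owner"],  # hypothetical second table, for illustration
}


def rows_for_object(oid: str, metadata: dict[str, object]) -> dict[str, dict]:
    """Return the row this object would occupy in each table it belongs to."""
    rows = {}
    for table, columns in TABLES.items():
        # An object gets a row in a table if it has any of that table's fields.
        if any(column in metadata for column in columns):
            row = {"oid": oid}
            # Columns with no value for this object stay empty (None).
            row.update({column: metadata.get(column) for column in columns})
            rows[table] = row
    return rows


rows = rows_for_object("oid-1", {"mp3.artist": "Rush", "mp3.title": "Subdivisions"})
print(rows)
```

In this run the object appears only in the reference table, with empty mp3.album and dates.year columns, because it carries no fields from the hypothetical owners table.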
You specify a length attribute for fields of type string, binary, and char. The length attribute is important because there are limits to the number of bytes that each table row and each index can store. See Planning Tables and Planning Indexes for more information.
Note - The 5800 system emulator supports the same field length as that supported by the 5800 system, within the specified limits.
Trying to store a string, binary, or char value that is longer than the specified field length will result in an error message.
You should store metadata attributes that occur together in queries in the same table, since queries that include fields from different tables may fail. Pay close attention to which metadata attributes occur together in your data, especially if those attributes are used in queries, and group those fields together into the same table.
Conversely, avoid putting fields into the same table that do not occur together in queries, since doing so wastes space and degrades query performance.
When planning tables, be aware that the maximum number of bytes allowed for any single row in the table is 8080.
You might want to specify as small a length value as possible for each field (column) in the table so that you can fit as many columns as possible in the table and any single row will not exceed the 8080-byte limit.
TABLE 6-6 lists the number of bytes that each element in a column consumes. The total amount of space consumed by all the columns in a table cannot exceed 8080 bytes.
Suppose the fields listed in TABLE 6-7 commonly occur together and will be used together in queries. (Three of these fields are in the namespace mp3 and one is in the namespace dates.)
Include each of these fields as columns in the same table, called, for example, reference. The maximum number of bytes allowed in any row in the table is 8080. When planning the reference table, calculate the total number of bytes used by all the columns combined, to make sure it is less than 8080, as follows:
78 (for system overhead) +
8 (2 per column for column overhead) +
512 (for mp3.artist) +
512 (for mp3.album) +
1024 (for mp3.title) +
8 (for dates.year)
Since 2142 bytes is less than 8080 bytes, the total combined size of all columns is acceptable.
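The same calculation can be scripted when you are comparing several candidate table layouts. The following Python sketch takes the per-column byte counts as input (see TABLE 6-6 for how to derive them) and uses the overhead figures from the example above; the function names are hypothetical.

```python
ROW_LIMIT_BYTES = 8080         # maximum bytes in any single table row
SYSTEM_OVERHEAD_BYTES = 78     # fixed overhead used in the example above
PER_COLUMN_OVERHEAD_BYTES = 2  # overhead added for each column


def table_bytes(column_bytes: dict[str, int]) -> int:
    """Total bytes consumed by a row with the given columns."""
    return (
        SYSTEM_OVERHEAD_BYTES
        + PER_COLUMN_OVERHEAD_BYTES * len(column_bytes)
        + sum(column_bytes.values())
    )


def fits_in_row(column_bytes: dict[str, int]) -> bool:
    """True if the columns stay within the 8080-byte row limit."""
    return table_bytes(column_bytes) <= ROW_LIMIT_BYTES


reference = {
    "mp3.artist": 512,
    "mp3.album": 512,
    "mp3.title": 1024,
    "dates.year": 8,
}
print(table_bytes(reference))  # 2142
print(fits_in_row(reference))  # True
```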
For best results, consider this information when planning tables:
The system creates indexes on metadata fields to allow those fields to be queried more efficiently. You use virtual file system views to specify the content of the indexes the system creates and to maximize query performance.
Note - You also configure virtual file system views that have nothing to do with indexes. See Virtual File System Views for information about virtual file system views.
For each virtual file system view you create, the system creates an index of up to 15 fields, as long as those fields all come from the same table.
For virtual file system views that you create in order to specify indexes that improve query performance, follow these guidelines:
This section includes two different examples of how you might go about planning for indexes.
Suppose you want to have a query on the fields listed in TABLE 6-9.
To maximize query performance, you include each of these fields as columns within the same table, called books. To maximize performance even further, you create a virtual file system view called, for example, bookview, that includes these fields and no others so that an index is created on these fields for querying.
Since all of the fields are from the same table, the system creates an index that includes all of these fields, as long as the total number of bytes required for the index does not exceed 1024. Calculate the number of bytes required for the index as follows:
78 (for system overhead) +
8 (2 per column for column overhead) +
100 (for book.author) +
100 (for book.series) +
100 (for book.title) +
8 (for dates.year)
Since 394 is less than 1024, the system indexes all of the fields, allowing them to be queried at maximum performance.
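The index check can be sketched the same way. This Python illustration applies the 1024-byte limit and the 15-field limit described above to the per-field byte counts from the example; it does not model the requirement that all fields come from the same table, and the function names are hypothetical.

```python
INDEX_LIMIT_BYTES = 1024       # maximum bytes allowed for one index
MAX_INDEX_FIELDS = 15          # maximum number of fields in one index
SYSTEM_OVERHEAD_BYTES = 78     # fixed overhead used in the example above
PER_COLUMN_OVERHEAD_BYTES = 2  # overhead added for each field


def index_bytes(field_bytes: dict[str, int]) -> int:
    """Total bytes required to index the given fields."""
    return (
        SYSTEM_OVERHEAD_BYTES
        + PER_COLUMN_OVERHEAD_BYTES * len(field_bytes)
        + sum(field_bytes.values())
    )


def can_index(field_bytes: dict[str, int]) -> bool:
    """True if the fields fit within both index limits."""
    return (
        len(field_bytes) <= MAX_INDEX_FIELDS
        and index_bytes(field_bytes) <= INDEX_LIMIT_BYTES
    )


bookview = {
    "book.author": 100,
    "book.series": 100,
    "book.title": 100,
    "dates.year": 8,
}
print(index_bytes(bookview))  # 394
print(can_index(bookview))    # True
```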
If you calculate that the fields in a query cannot be indexed because they require too much space, you might want to reduce the length specified for each field. Alternatively, you might want to define a virtual file system view with a smaller set of fields. An index of a subset of fields in the query might still help to speed up query performance.
Suppose your system is configured with the schema file shown in CODE EXAMPLE 6-4.
If you know that users are likely to do searches on the owner, date, and keywords fields, you could create an index called key_owner_index on those fields using the fsView tag, as shown in CODE EXAMPLE 6-5. (Since keyword is included as the filename property, it is automatically included as an attribute of the fsView and therefore included in the index.)
Users in this example might also commonly search on just the owner and keyword fields, and also sometimes on owner, keyword, and title. The system cannot process queries that do not exactly match an existing index as quickly as queries that do, but if the query fields are almost the same as the fields in the indexes, the performance might still be acceptable.
You should test the queries on your system to see if additional indexes are required to speed query performance.
By setting queryable = false, you can exclude that field from the metadata that is indexed and available for queries. You might want to exclude a field from indexes, for example, if you will only access that field through the retrieveMetadata example application, and never through queries.
To maximize the performance of queries, keep in mind these considerations when planning tables and indexes:
The 5800 system stores data as discrete objects that users retrieve through queries on object identifiers and/or metadata. The data is not stored in the hierarchical structure typical of file systems, which contain directories, subdirectories, and files.
However, you can set up a virtual view of the data, which presents the data objects in a hierarchical structure that mimics a file system. For example, for a 5800 system that stores MP3 files, you could set up a virtual view with a directory for the artist, a subdirectory for the album, and file names based on the title of the music files.
Users access the file system views of the data using a browser and the Web-based Distributed Authoring and Versioning (WebDAV) protocol.
You access virtual file system views of the data through the Web-based Distributed Authoring and Versioning (WebDAV) protocol, a set of extensions to the HTTP/1.1 protocol that enables you to read, add, and delete files on remote web servers.
WebDAV is not supported for multi-cell configurations.
Note - Virtual file system views are available to browse whenever the Status at a Glance panel in the GUI or the sysstat CLI command shows that the Query Engine status is HAFaultTolerant. See Monitoring the System or Obtaining System Status for more information.
To access the virtual file system views using WebDAV, type the following in the browser’s address bar:
where data-VIP is the data VIP address of the 5800 system. See Data IP Address for information about the data VIP address.
The following example shows a WebDAV screen that might be displayed in a user’s browser. It lists the virtual file system views that are defined on that system.
Clicking the links on this page enables users to browse objects as though they were arranged in a file system structure.
For example, suppose you have defined a virtual file system view byArtist that includes the subdirectories artist and album (in that order). You have indicated in the virtual file system view definition that the files should be named by track number (tracknum). Clicking byArtist in the browser would yield the list of artists, as follows:
Clicking Rush would list the album names, as follows:
Clicking Signals would list the album's track numbers, as follows:
Clicking the link for 1 would enable users to access the data object on the 5800 system associated with track 1 of the Rush album Signals.
Each file in the 5800 system virtual view appears as a file in the file system exported to WebDAV. The file attributes (stat data) are exported as WebDAV properties. TABLE 6-10 lists the WebDAV property names and corresponding system metadata attributes. These attributes are regular metadata values accessible through API queries.
As described in Metadata Attributes and WebDAV Properties, the 5800 system exports a number of file attributes as part of a virtual view. In addition to those attributes that are always exported, you can choose to have the remaining attributes in the filesystem namespace (filesystem.mimetype and filesystem.mtime) exported with the files.
If you choose this option, the WebDAV browser uses the filesystem.mimetype attribute as Content-type in the HTTP header. With Content-type supplied in the HTTP header, when a user clicks on a link to download the file, WebDAV opens the appropriate program. Without Content-type in the HTTP header, the WebDAV browser does not know the file’s type and simply prompts the user to save the file to disk.
If you are using the CLI to configure virtual views, select this option by setting fsattrs to true in the schema file, as shown in the Example Schema File.
If you are using the GUI to configure virtual views, select this option by selecting the Include Extended File System Fields checkbox on the Setup Virtual File Systems panel. See Configuring Virtual File System Views Using the GUI for more information on using the GUI to configure virtual views.
Note - Choosing this option to retrieve the additional file system attributes requires an additional query to the 5800 system and therefore might negatively affect system performance.
You can use the filesonlyatleaflevel attribute to control which objects are displayed as part of a virtual file system view.
If you keep the filesonlyatleaflevel attribute at its default of true, an object is displayed as part of the virtual file system view only if it has metadata values stored in the 5800 system for all fields specified in the attribute list for the virtual file system view and also in the filename description.
For example, suppose you have set up a virtual view called byArtist as follows:
<fsView name="byArtist" namespace="mp3" filename="${title}.${type}" fsattrs="true" filesonlyatleaflevel="true">
  <attribute name="artist"/>
  <attribute name="album"/>
</fsView>
In this case, only objects with metadata values for title, type, artist, and album will appear in the virtual file system view. For example, the three objects shown here are stored with metadata values for title, type, artist, and album, and therefore appear at the bottom (or “leaf”) level of the directory in the virtual file system view.
An object that has metadata values for title and artist, but not for type or album, simply would not appear in the view.
If you set the filesonlyatleaflevel attribute to false, any object that has metadata values for all fields specified in the filename description, as well as metadata values for a subset of the fields in the attribute list, appears in the virtual file system view, at the upper levels of the directory (not at the “leaf-level”).
For example, in the preceding example, if the filesonlyatleaflevel attribute were set to false, an object with metadata values for title, type, and artist, but not album, would appear in the virtual file system view as the song “Shattered” by the Rolling Stones appears here:
Note - All attributes in a virtual file system view for which you have specified filesonlyatleaflevel = false must be in the same table. See Tables and Columns for more information about tables.
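The effect of filesonlyatleaflevel can be approximated with a short Python model. This is a sketch under stated assumptions, not the system's implementation: it treats "a subset of the fields in the attribute list" as a leading run of the attributes in directory order, it uses short field names within the mp3 namespace, it assumes a file name pattern of ${title}.${type}, and the function name view_path and the type value mp3 are hypothetical.

```python
from string import Template
from typing import Optional


def view_path(
    metadata: dict[str, str],
    attributes: list[str],
    filename_pattern: str,
    files_only_at_leaf_level: bool = True,
) -> Optional[list[str]]:
    """Return the directory components plus file name, or None if hidden."""
    # The file name needs a value for every field in the filename pattern.
    try:
        file_name = Template(filename_pattern).substitute(metadata)
    except KeyError:
        return None
    # Assumption: directories are built from the leading attributes that
    # have values, stopping at the first attribute with no value.
    directories = []
    for attribute in attributes:
        if attribute not in metadata:
            break
        directories.append(metadata[attribute])
    # With the default setting, only fully described objects are shown.
    if files_only_at_leaf_level and len(directories) < len(attributes):
        return None
    return directories + [file_name]


shattered = {"title": "Shattered", "type": "mp3", "artist": "Rolling Stones"}
attrs = ["artist", "album"]
pattern = "${title}.${type}"
print(view_path(shattered, attrs, pattern, files_only_at_leaf_level=True))
print(view_path(shattered, attrs, pattern, files_only_at_leaf_level=False))
```

With the default setting the object is hidden because it has no album value; with the attribute set to false it is placed directly under the artist directory.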
The fsView section of the schema file determines the virtual file system views that users can browse using WebDAV. See The Metadata Schema for more information about virtual file system views.
Note the following for fsViews in the schema file:
Note - All attributes in the system namespace are read only. If you include a system attribute in an fsView entry, that entire entry is automatically read only.
TABLE 6-11 summarizes the purpose and meaning of the fields you must specify and plan for when configuring the metadata schema.
Note - Before configuring the metadata schema, make sure that the query database is online by issuing the sysstat command and checking that the “Query Engine Status” indicates HAFaultTolerant. See sysstat for more information about the sysstat command.
To Modify the Schema File Using the CLI
1. Create a schema overlay to extend an existing schema.
A schema overlay is an XML file that follows the specification shown in Schema File DTD. It contains only the new namespaces and fields that you want to add.
If you want, you can use mdconfig followed by the -t or --template option. This returns an XML template file that you can use as a starting point to create that overlay.
Once a version of the overlay is available, you can validate it through the CLI. The validation ensures that the XML syntax is correct and provides an overview of the operation that will be performed if the overlay is applied.
2. To perform a validation on the overlay.xml file, use the command mdconfig followed by the -p or --parse option.
For example, to validate the local overlay.xml file, type the following command from the system on the network where the overlay.xml file is stored:
$ cat overlay.xml | ssh admin@admin_IP mdconfig --parse
Once you are satisfied with the overlay, you must commit it so the 5800 system can execute it.
3. To commit the overlay.xml file, use the command mdconfig followed by the -a or --apply option.
For example, to continue the previous example, enter the following command from the system on the network where the overlay.xml file is stored:
$ cat overlay.xml | ssh admin@admin_IP mdconfig --apply
Note - The --apply option runs a validation before performing the commit operation. If the XML syntax is not correct, the system returns an error.
If the system is under heavy load, you might see the following error message when you issue the mdconfig --apply command:
Timed out waiting for the state machine.
This message indicates that the new schema definition file has been committed to the system, but not all of the tables have been created.
In this case, reduce the load on the system if possible, and use the command mdconfig --retry to finish the table creation:
$ ssh admin@admin_IP mdconfig --retry
When you issue the mdconfig --retry command, the system finishes creating any tables that were not completed during the mdconfig -a operation. Tables that had already been created are not affected. You might have to issue the mdconfig --retry command several times before all tables are created.
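Before sending an overlay to the system, you might catch basic XML mistakes locally. The Python sketch below checks only that the file is well-formed XML; it does not validate against the schema DTD and does not replace mdconfig --parse. The helper names and the address 192.0.2.10 are hypothetical, and the command list simply mirrors the ssh invocations shown in the steps above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def mdconfig_command(admin_ip: str, option: str) -> list[str]:
    """Build the ssh command line for one of the mdconfig options above."""
    if option not in ("--parse", "--apply", "--retry"):
        raise ValueError(f"unsupported option for this sketch: {option}")
    return ["ssh", f"admin@{admin_ip}", "mdconfig", option]


good = '<table name="reference"><column name="mp3.artist"/></table>'
bad = '<table name="reference"/> <column name="mp3.artist"/> </table>'
print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False

# To run the command, the overlay text could be piped on standard input:
#   subprocess.run(mdconfig_command("192.0.2.10", "--parse"),
#                  input=overlay_text, text=True)
print(mdconfig_command("192.0.2.10", "--parse"))
```

The second fragment fails the check because the table element is closed before its columns, a mistake that is easy to make when editing an overlay by hand.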
This section includes procedures for using the GUI to display the current metadata schema and make changes to the schema.
To Display the Current Metadata Schema
From the navigational panel, choose Configuration > Metadata Schema > View Schema.
The View Schema panel is displayed, listing the namespaces and tables that are configured in the schema.
To Display the Fields in a Namespace
1. From the navigational panel, choose Configuration > Metadata Schema > View Schema.
The View Schema panel is displayed, listing the namespaces and tables that are configured in the schema.
2. In the Namespaces section, select the namespace for which you want to display fields.
The fields are listed in the Fields for Selected Namespace section.
To Display the Fields in a Table
1. From the navigational panel, choose Configuration > Metadata Schema > View Schema.
The View Schema panel is displayed, listing the namespaces and tables that are configured in the schema.
2. In the Tables section, select the table for which you want to display fields.
The fields are listed in the Columns for Selected Table section.
To Change the Metadata Schema
Note - Before making changes to the metadata schema, make sure that the query database is online by checking that the Status At A Glance panel indicates that “Query Engine Status” is HAFaultTolerant.
1. From the navigation panel, choose Configuration > Metadata Schema.
The Set Up Schema panel is displayed.
3. Create namespaces as described in Creating Namespaces.
For information about namespaces, see Namespaces.
4. Create tables as described in Creating Tables.
For information about planning tables, see Planning Tables.
You cannot delete a namespace from the schema. Once a namespace is created, you can only add fields to the namespace, assuming the namespace is extensible. Therefore, review the following information before creating namespaces and namespace fields:
1. From the navigational panel, choose Configuration > Metadata Schema.
The Set Up Schema panel is displayed.
next to the Namespaces box.
The Add Namespace panel is displayed.
5. Choose the parent namespace from the Parent Namespace drop-down menu.
Note - Choosing root as the parent namespace, selecting the Is Extensible check box, and applying your changes causes this namespace to become a parent namespace.
6. Define whether the namespace will be writable and/or extensible by clearing or selecting the appropriate check boxes.
next to the Fields box.
Columns are displayed in the Fields box.
The Create Namespace panel is closed, and the newly created namespace and its fields are displayed on the Set Up Schema panel.
10. Create tables for the fields in the namespace as described in Creating Tables.
You cannot delete a table from the schema. Therefore, review the following information before creating tables:
1. From the navigation panel, go to Configuration > Metadata Schema.
The Set Up Schema panel is displayed.
3. Create namespaces as described in Creating Namespaces.
next to the Tables box.
The Create File System Table panel is displayed.
For information on planning tables, see Planning Tables.
6. Choose the namespace that contains the fields you want to include in the table.
The available fields from the namespace are displayed in the Available Fields box.
7. Select the fields that you want included in the table and click the Move Right button to move the fields to the Selected Fields box.
8. Repeat Steps 6 and 7 for all fields that you want to include in the table.
The Create Filesystem Table panel is closed and the newly created table is displayed on the Set Up Schema panel.
To Add Fields to an Existing Namespace
Note - You can add fields to existing namespaces only if they are extensible.
1. From the navigation panel, go to Configuration > Metadata Schema.
The Set Up Schema panel is displayed.
3. Make sure the Show New/Modified Namespaces Only check box is not selected, so that all existing namespaces are displayed on the panel.
4. Select the namespace to which you want to add fields.
The namespace fields are displayed in the Fields for Selected Namespace box.
next to the Fields for Selected Namespace box.
The Add Namespace Fields panel is displayed.
6. Specify the following for this field:
7. If you want to add another new field, click the Add button and repeat Steps 5 and 6.
The panel is closed and you are returned to the Set Up Schema Panel.
This section includes procedures for displaying the currently configured virtual file system views, creating new views, and browsing the views.
To Display the Current Virtual File System Views
From the navigational panel, choose Configuration > Virtual File Systems > View Virtual File Systems.
The View Virtual File Systems Views panel is displayed, listing the views that are defined in the system.
To Display the Fields in a View
1. From the navigational panel, choose Configuration > Virtual File Systems > View Virtual File Systems.
The View Virtual File Systems Views panel is displayed, listing the views that are defined in the system.
2. In the Views section, select the view for which you want to display fields.
The fields are listed in the Fields for Selected View section.
To Create New Virtual File System Views
1. From the navigation panel, go to Configuration > Virtual File Systems.
2. Click Set Up Virtual File Systems.
The Set Up Virtual File Systems panel is displayed.
4. If you do not want users who browse this view to be able to add or delete objects, select the Read-Only check box.
5. If you want users who browse this view to see only files for which there are attributes at every level in the hierarchy, select the Files Only at Leaf Level check box.
If you want users to see files at higher levels in the hierarchy if there are no attributes at the lower levels, do not select this checkbox. See Directory Structure in a Virtual File System View for more information.
6. If you want to include the filesystem.mimetype and filesystem.mtime attributes for each file as part of the virtual view, select the Include Extended File System Fields check box.
See Including Additional File Attributes in Virtual Views for more information.
7. In the Available Fields box, select the fields that you want in the view and click the Move Right button to move the fields to the Selected Fields box.
8. In the File Naming Convention For View section, choose a field from the Selected Fields drop-down menu and click Add To Pattern.
The fields you select are displayed in the Name Pattern field. This pattern specifies what the names of the objects will be in the virtual view.
For example, you might set up a virtual file system view named Songs, as shown in FIGURE 6-1. Users connecting to the 5800 system using WebDAV would see a virtual file system view that displayed song files in a hierarchy with album as the main folder, and artist and title as subfolders.
FIGURE 6-1 Virtual File System View Configuration Example
To Preview Virtual File System Views
1. From the navigation panel, choose Configuration > Virtual File Systems.
2. Click Browse Virtual File Systems.
The virtual file systems configured on your system are displayed, as a user accessing the system using WebDAV would see them.
Copyright © 2008, Sun Microsystems, Inc. All Rights Reserved.