1.11 Loading and Exporting RDF Data

You can load RDF data into an RDF graph in the database and export that data from the database into a staging table.

To load RDF data into an RDF graph, use one or more of the following options:

  • Bulk load or append data into the RDF graph from a staging table, with each row containing the three components -- subject, predicate, and object -- of an RDF triple and optionally a named graph. This is explained in Bulk Loading RDF Data Using a Staging Table.

    This is the fastest option for loading large amounts of data.

  • Load data into the application table using SQL INSERT statements that call the SDO_RDF_TRIPLE_S constructor, which causes the corresponding RDF triple, possibly including a graph name, to be inserted into the RDF data store, as explained in Loading RDF Data Using INSERT Statements.

    This option is convenient for loading small amounts of data.

  • Load data into the RDF graph with SPARQL Update statements executed through SEM_APIS.UPDATE_RDF_GRAPH, as explained in Support for SPARQL Update Operations on an RDF Graph.

    This option is convenient for loading small amounts of data, and can also be used to load larger amounts of data through LOAD statements.
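    The SPARQL Update option can be sketched as follows. This is a minimal, hedged example: the ARTICLES graph name, NET1 network name, and RDFUSER owner are hypothetical, and the parameter names are as documented for SEM_APIS.UPDATE_RDF_GRAPH in SEM_APIS Package Subprograms.

    BEGIN
      sem_apis.update_rdf_graph(
        rdf_graph_name   => 'ARTICLES',
        update_statement =>
          'PREFIX dc: <http://purl.org/dc/elements/1.1/>
           INSERT DATA { <http://nature.example.com/Article500>
                         dc:title "An example title" . }',
        network_owner    => 'RDFUSER',
        network_name     => 'NET1');
    END;
    /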

  • Load data into the RDF graph using the Apache Jena-based Java API, which is explained in RDF Graph Support for Apache Jena.

    This option provides several ways to load both small and large amounts of data, and it supports many different RDF serialization formats.

Note:

Unicode data in the staging table should be escaped as specified in W3C N-Triples format (http://www.w3.org/TR/rdf-testcases/#ntriples). You can use the SEM_APIS.ESCAPE_RDF_TERM function to escape Unicode values in the staging table. For example:

create table esc_stage_tab(rdf$stc_sub, rdf$stc_pred, rdf$stc_obj);

insert /*+ append nologging parallel */ into esc_stage_tab (rdf$stc_sub, rdf$stc_pred, rdf$stc_obj)
select sem_apis.escape_rdf_term(rdf$stc_sub,  options=>' UNI_ONLY=T '),
       sem_apis.escape_rdf_term(rdf$stc_pred, options=>' UNI_ONLY=T '),
       sem_apis.escape_rdf_term(rdf$stc_obj,  options=>' UNI_ONLY=T ')
from stage_tab;

To export RDF data, that is, to retrieve RDF data from Oracle Database in N-Triple or N-Quad format that can be stored in a staging table, use the SQL queries described in Exporting RDF Data.

Note:

Effective with Oracle Database Release 12.1, you can export and import an RDF network using the full database export and import features of the Oracle Data Pump utility, as explained in Exporting or Importing an RDF Network Using Oracle Data Pump.

1.11.1 Bulk Loading RDF Data Using a Staging Table

You can load RDF data (and optionally associated non-RDF data) in bulk using a staging table. Call the SEM_APIS.LOAD_INTO_STAGING_TABLE procedure (described in SEM_APIS Package Subprograms) to load the data, and you can have the data parsed during the load operation to check for syntax correctness. Then, you can call the SEM_APIS.BULK_LOAD_RDF_GRAPH procedure to load the data into the RDF store from the staging table. (If the data was not parsed during the load operation into the staging table, you must specify the PARSE keyword in the flags parameter when you call the SEM_APIS.BULK_LOAD_RDF_GRAPH procedure.)
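The two-step flow can be sketched as follows. This is a hedged example: the staging and source table names, the ARTICLES graph, the NET1 network, and the RDFUSER owner are all hypothetical, and the parameter names are as documented in SEM_APIS Package Subprograms.

-- Step 1: populate the staging table from a source (external) table
BEGIN
  sem_apis.load_into_staging_table(
    staging_table => 'STAGE_TABLE',
    source_table  => 'STAGE_TABLE_SOURCE',
    input_format  => 'N-TRIPLE');
END;
/

-- Step 2: bulk load from the staging table into the RDF graph.
-- PARSE is specified here because the data was not parsed in Step 1.
BEGIN
  sem_apis.bulk_load_rdf_graph(
    rdf_graph_name => 'ARTICLES',
    table_owner    => 'RDFUSER',
    table_name     => 'STAGE_TABLE',
    flags          => ' PARSE ',
    network_owner  => 'RDFUSER',
    network_name   => 'NET1');
END;
/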

The following example shows the format for the staging table, including all required columns and the required names for these columns, plus the optional RDF$STC_graph column, which must be included if any of the RDF triples to be loaded includes a graph name:

CREATE TABLE stage_table (
                     RDF$STC_sub varchar2(4000) not null,
                     RDF$STC_pred varchar2(4000) not null,
                     RDF$STC_obj varchar2(4000) not null,
                     RDF$STC_graph varchar2(4000)
);

If you also want to load non-RDF data, specify additional columns for the non-RDF data in the CREATE TABLE statement. The non-RDF column names must be different from the names of the required columns. The following example creates the staging table with two additional columns (SOURCE and ID) for non-RDF attributes.

CREATE TABLE stage_table_with_extra_cols (
                     source VARCHAR2(4000),
                     id NUMBER,
                     RDF$STC_sub varchar2(4000) not null,
                     RDF$STC_pred varchar2(4000) not null,
                     RDF$STC_obj varchar2(4000) not null,
                     RDF$STC_graph varchar2(4000)
);

Note:

For either form of the CREATE TABLE statement, you may want to add the COMPRESS clause to use table compression, which will reduce the disk space requirements and may improve bulk-load performance.
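For example, a compressed variant of the basic staging table can be created as follows (a sketch; basic table compression is shown, and other compression options may be appropriate depending on your workload):

CREATE TABLE stage_table (
                     RDF$STC_sub varchar2(4000) not null,
                     RDF$STC_pred varchar2(4000) not null,
                     RDF$STC_obj varchar2(4000) not null,
                     RDF$STC_graph varchar2(4000)
) COMPRESS;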

Both the invoker and the network owner user must have the following privileges: SELECT privilege on the staging table, and INSERT privilege on the application table.

See also the following:

1.11.1.1 Loading the Staging Table

You can load RDF data into the staging table, as a preparation for loading it into the RDF store, in several ways. Some of the common ways are the following:

1.11.1.1.1 Loading N-Triple Format Data into a Staging Table Using SQL*Loader

You can use the SQL*Loader utility to parse and load RDF data into a staging table. If you installed the demo files from the Oracle Database Examples media (see Oracle Database Examples Installation Guide), a sample control file is available at $ORACLE_HOME/md/demo/network/rdf_demos/bulkload.ctl. You can modify and use this file if the input data is in N-Triple format.

Objects longer than NETWORK_MAX_STRING_SIZE bytes cannot be loaded. If you use the sample SQL*Loader control file, triples (rows) containing such long values will be automatically rejected and stored in a SQL*Loader "bad" file. However, you can load these rejected rows by inserting them into the application table using SQL INSERT statements (see Loading RDF Data Using INSERT Statements).

1.11.1.1.2 Loading N-Quad Format Data into a Staging Table Using an External Table

You can use an Oracle external table to load N-Quad format data (extended triple having four components) into a staging table, as follows:

  1. Call the SEM_APIS.CREATE_SOURCE_EXTERNAL_TABLE procedure to create an external table, and then use the SQL statement ALTER TABLE to alter the external table to include the relevant input file name or names. You must have READ and WRITE privileges on the directory object associated with the folder containing the input file or files.
  2. Ensure the network owner has SELECT and INSERT privileges on the external table.
    If the network owner is invoking the routine to populate the staging table and then loading from the staging table, then ensure that the owner has SELECT privilege on the external table and both INSERT and SELECT privileges on the staging table.
  3. Call the SEM_APIS.LOAD_INTO_STAGING_TABLE procedure to populate the staging table.
  4. After the loading is finished, issue a COMMIT statement to complete the transaction.

Example 1-109 Using an External Table to Load a Staging Table

-- Create a source external table (note: table names are case sensitive)
BEGIN
  sem_apis.create_source_external_table(
    source_table    => 'stage_table_source'
   ,def_directory   => 'DATA_DIR'
   ,bad_file        => 'CLOBrows.bad'
   );
END;
/

-- Use ALTER TABLE to target the appropriate file(s)
alter table "stage_table_source" location ('demo_datafile.nt');
 
-- Load the staging table (note: table names are case sensitive)
BEGIN
  sem_apis.load_into_staging_table(
    staging_table => 'STAGE_TABLE'
   ,source_table  => 'stage_table_source'
   ,input_format  => 'N-QUAD');
END;
/

Rows where the objects and graph URIs (combined) are longer than NETWORK_MAX_STRING_SIZE bytes will be rejected and stored in a "bad" file. However, you can load these rejected rows by inserting them into the application table using SQL INSERT statements (see Loading RDF Data Using INSERT Statements).

Example 1-109 shows the use of an external table to load a staging table.

1.11.1.2 Recording Event Traces During Bulk Loading

If a table named RDF$ET_TAB exists in the invoker's schema and the network owner user has been granted the INSERT and UPDATE privileges on this table, event traces for some of the tasks performed during executions of the SEM_APIS.BULK_LOAD_RDF_GRAPH procedure will be added to the table. You may find the content of this table useful if you ever need to diagnose problems that occur during bulk load. The RDF$ET_TAB table must be created as follows:

CREATE TABLE RDF$ET_TAB (
  proc_sid VARCHAR2(128), 
  proc_sig VARCHAR2(200),
  event_name varchar2(200),
  start_time timestamp,
  end_time timestamp,
  start_comment varchar2(1000) DEFAULT NULL,
  end_comment varchar2(1000) DEFAULT NULL
);
-- Grant privileges on RDF$ET_TAB to network owner if network owner 
-- is not the owner of RDF$ET_TAB
GRANT SELECT, INSERT, UPDATE on RDF$ET_TAB to <network_owner>;
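After a bulk load, the recorded traces can be examined with an ordinary query. For example (a sketch using only the columns defined above):

-- List events in chronological order, with elapsed time per event
SELECT event_name,
       start_time,
       end_time,
       end_time - start_time AS elapsed
  FROM RDF$ET_TAB
 ORDER BY start_time;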

1.11.2 Loading RDF Data Using INSERT Statements

To load RDF data using INSERT statements, the data should be encoded using < > (angle brackets) for URIs, _: (underscore colon) for blank nodes, and " " (quotation marks) for literals. Spaces are not allowed in URIs or blank nodes. Use the SDO_RDF_TRIPLE_S constructor to insert the data, as described in Constructors for Inserting Triples. You must have INSERT privilege on the application table.

Note:

If URIs are not encoded with < > and literals with " ", statements will still be processed. However, such statements will take longer to load, because they must be further processed to determine their VALUE_TYPE values.

The following example assumes an RDF network named NET1 owned by RDFUSER. It includes statements with URIs, a blank node, a literal, a literal with a language tag, and a typed literal:

INSERT INTO nsu_data VALUES (SDO_RDF_TRIPLE_S('nsu','<http://nature.example.com/nsu/rss.rdf>',
  '<http://purl.org/rss/1.0/title>', '"Nature''s Science Update"', 'RDFUSER', 'NET1'));
INSERT INTO nsu_data VALUES (SDO_RDF_TRIPLE_S('nsu', '_:BNSEQN1001A',
  '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>', 
  '<http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq>', 'RDFUSER', 'NET1'));
INSERT INTO nsu_data VALUES (SDO_RDF_TRIPLE_S('nsu',
  '<http://nature.example.com/cgi-taf/dynapage.taf?file=/nature/journal/v428/n6978/index.html>',
  '<http://purl.org/dc/elements/1.1/language>', '"English"@en-GB', 'RDFUSER', 'NET1'));
INSERT INTO nature VALUES (SDO_RDF_TRIPLE_S('nsu', '<http://dx.doi.org/10.1038/428004b>',
  '<http://purl.org/dc/elements/1.1/date>', '"2004-03-04"^^xsd:date', 'RDFUSER', 'NET1'));

1.11.2.1 Loading Data into Named Graphs Using INSERT Statements

To load an RDF triple with a non-null graph name using an INSERT statement, you must append the graph name, enclosed within angle brackets (< >), after the RDF graph name and colon (:) separator character, as shown in the following example:

INSERT INTO articles_rdf_data VALUES ( 
  SDO_RDF_TRIPLE_S ('articles:<http://examples.com/ns#Graph1>',
    '<http://nature.example.com/Article101>',
    '<http://purl.org/dc/elements/1.1/creator>',
    '"John Smith"', 'RDFUSER', 'NET1'));

1.11.3 Exporting RDF Data

This section contains the following topics related to exporting RDF data, that is, retrieving RDF data from Oracle Database where the results are in N-Triple or N-Quad format that can be stored in a staging table.

1.11.3.1 Retrieving RDF Data from an Application Table

RDF data can be retrieved from an application table using the member functions of SDO_RDF_TRIPLE_S, as shown in Example 1-110 (where the output is reformatted for readability). The example assumes an RDF network named NET1 owned by a database user named RDFUSER.

Example 1-110 Retrieving RDF Data from an Application Table

--
-- Retrieves model-graph, subject, predicate, and object
--
SQL> SELECT a.triple.GET_MODEL('RDFUSER','NET1') AS model_graph, 
            a.triple.GET_SUBJECT('RDFUSER','NET1') AS sub,
            a.triple.GET_PROPERTY('RDFUSER','NET1') pred,
            a.triple.GET_OBJ_VALUE('RDFUSER','NET1') obj
     FROM RDFUSER.NET1#RDFT_ARTICLES a;

MODEL_GRAPH
--------------------------------------------------------------------------------
SUB
--------------------------------------------------------------------------------
PRED
--------------------------------------------------------------------------------
OBJ
--------------------------------------------------------------------------------
ARTICLES
<http://nature.example.com/Article1>
<http://purl.org/dc/elements/1.1/title>
"All about XYZ"


ARTICLES
<http://nature.example.com/Article1>
<http://purl.org/dc/elements/1.1/creator>
"Jane Smith"


ARTICLES
<http://nature.example.com/Article1>
<http://purl.org/dc/terms/references>
<http://nature.example.com/Article2>


ARTICLES
<http://nature.example.com/Article1>
<http://purl.org/dc/terms/references>
<http://nature.example.com/Article3>


ARTICLES
<http://nature.example.com/Article2>
<http://purl.org/dc/elements/1.1/title>
"A review of ABC"


ARTICLES
<http://nature.example.com/Article2>
<http://purl.org/dc/elements/1.1/creator>
"Joe Bloggs"


ARTICLES
<http://nature.example.com/Article2>
<http://purl.org/dc/terms/references>
<http://nature.example.com/Article3>


7 rows selected.

1.11.3.2 Retrieving RDF Data from an RDF Graph

RDF data can be retrieved from an RDF graph using the SEM_MATCH table function (described in Using the SEM_MATCH Table Function to Query RDF Data), as shown in Example 1-111. The example assumes an RDF network named NET1 owned by a database user named RDFUSER.

Example 1-111 Retrieving RDF Data from an RDF Graph

--
-- Retrieves graph, subject, predicate, and object
--
SQL> select to_char(g$rdfterm) graph, to_char(x$rdfterm) sub,
            to_char(p$rdfterm) pred, y$rdfterm obj
     from table(sem_match(
       'Select ?g ?x ?p ?y  WHERE { { GRAPH ?g {?x ?p ?y} } UNION { ?x ?p ?y }}',
       sem_models('articles'),null,null,null,null,
       ' STRICT_DEFAULT=T PLUS_RDFT=T ',null,null,'RDFUSER','NET1'));

GRAPH
------------------------------------------------------------
SUB
------------------------------------------------------------
PRED
------------------------------------------------------------
OBJ
---------------------------------------------------------------------------
<http://examples.com/ns#Graph1>
_:m99g3C687474703A2F2F6578616D706C65732E636F6D2F6E73234772617068313Egmb2
<http://purl.org/dc/elements/1.1/creator>
_:m99g3C687474703A2F2F6578616D706C65732E636F6D2F6E73234772617068313Egmb1
 
<http://examples.com/ns#Graph1>
<http://nature.example.com/Article102>
<http://purl.org/dc/elements/1.1/creator>
_:m99g3C687474703A2F2F6578616D706C65732E636F6D2F6E73234772617068313Egmb1
 
<http://examples.com/ns#Graph1>
<http://nature.example.com/Article101>
<http://purl.org/dc/elements/1.1/creator>
"John Smith"
 
<http://nature.example.com/Article1>
<http://purl.org/dc/elements/1.1/creator>
"Jane Smith"

1.11.3.3 Removing RDF Graph Information from Retrieved Blank Node Identifiers

Blank node identifiers retrieved during the retrieval of RDF data can be trimmed to remove embedded RDF graph information using the transformations shown in the code excerpt in Example 1-112. The two transformations apply to VARCHAR2 data (for example, the subject component) and CLOB data (for example, the object component), respectively.

Example 1-113 shows the results obtained after using these two transformations in Example 1-112 on the sub and obj columns, respectively, using the RDF data retrieval query described in Retrieving RDF Data from an RDF Graph.

Example 1-112 Transformations to Remove RDF Graph Information from Blank Node Identifiers

--
-- Transformation on column "sub VARCHAR2" 
-- holding blank node identifier values
--
Select (case substr(sub,1,2) when '_:' then '_:' || substr(sub,instr(sub,'m',1,2)+1) else sub end) from …
--
-- Transformation on column "obj CLOB" 
-- holding blank node identifier values
--
Select (case dbms_lob.substr(obj,2,1) when '_:' then to_clob('_:' || substr(obj,instr(obj,'m',1,2)+1)) else obj end) from …

Example 1-113 Results from Applying Transformations from Example 1-112

--
-- Results obtained by applying transformations on the sub and obj cols
-- 
SQL> select (case substr(sub,1,2) when '_:' then '_:' || substr(sub,instr(sub,'m',1,2)+1) else sub end) sub,
            pred,
            (case dbms_lob.substr(obj,2,1) when '_:' then to_clob('_:' || substr(obj,instr(obj,'m',1,2)+1)) else obj end) obj
     from (select to_char(g$rdfterm) graph, to_char(x$rdfterm) sub, to_char(p$rdfterm) pred, y$rdfterm obj
           from table(sem_match('Select ?g ?x ?p ?y  WHERE { { GRAPH ?g {?x ?p ?y} } UNION { ?x ?p ?y }}',sem_models('articles'),null,null,null,null,' STRICT_DEFAULT=T PLUS_RDFT=T ',null,null,'RDFUSER','NET1')));

 
SUB
------------------------------------------------------------
PRED
------------------------------------------------------------
OBJ
---------------------------------------------------------------------------
_:b2
<http://purl.org/dc/elements/1.1/creator>
_:b1
 
<http://nature.example.com/Article102>
<http://purl.org/dc/elements/1.1/creator>
_:b1

1.11.4 Exporting or Importing an RDF Network Using Oracle Data Pump

Effective with Oracle Database Release 12.1, you can export and import an RDF network using the full database export and import features of the Oracle Data Pump utility. The network is moved as part of the full database export or import, where the whole database is represented in an Oracle dump (.dmp) file.

The following usage notes apply to using Data Pump to export or import an RDF network:

  • The target database for an import must have the RDF Graph software installed, and there cannot be a pre-existing RDF network.

  • RDF networks using fine-grained access control (triple-level or resource-level OLS or VPD) cannot be exported or imported.

  • RDF document indexes for SEM_CONTAINS (MDSYS.SEMCONTEXT index type) and semantic indexes for SEM_RELATED (MDSYS.SEM_INDEXTYPE index type) must be dropped before an export and re-created after an import.

  • Only default privileges for RDF network objects (those that exist just after object creation) are preserved during export and import. For example, if user A creates an RDF graph M and grants SELECT on RDFM_M to user B, only user A's SELECT privilege on RDFM_M will be present after the import. User B will not have SELECT privilege on RDFM_M after the import. Instead, user B's SELECT privilege will have to be granted again.

  • The Data Pump command line option transform=oid:n must be used when exporting or importing RDF network data. For example, use a command in the following format:

    impdp system/<password-for-system> directory=dpump_dir dumpfile=rdf.dmp full=YES version=12 transform=oid:n
    

For Data Pump usage information and examples, see the relevant chapters in Part I of Oracle Database Utilities.

1.11.5 Moving, Restoring, and Appending an RDF Network

The SEM_APIS package includes utility procedures for transferring data into and out of an RDF network.

The contents of an RDF network can be moved to a staging schema. An RDF network in a staging schema can then be (1) exported with Oracle Data Pump or a similar tool, (2) appended to a different RDF network, or (3) restored back into the source RDF network. Move, restore, and append operations mostly use partition exchange to move data rather than SQL inserts to copy data. Consequently, these operations are very efficient.

The procedures to move, restore, and append RDF network data are:

  • SEM_APIS.MOVE_RDF_NETWORK_DATA
  • SEM_APIS.RESTORE_RDF_NETWORK_DATA
  • SEM_APIS.APPEND_RDF_NETWORK_DATA

Special Considerations When Performing Move, Restore, and Append Operations

Move, restore, and append operations are not supported for RDF networks that use any of the following features:

  • Domain indexes on the RDF_VALUE$ table (for example, spatial indexes)
  • Oracle Label Security for RDF
  • Semantic indexing for documents
  • Incremental inference

Domain indexes and inferred graphs that use incremental inference should be dropped before moving the RDF network and then recreated after any subsequent restore or append operations.

Some restrictions apply to the target network used for an append operation.

  • The set of RDF terms in the target network must be a subset of the set of RDF terms in the source network.
  • The set of model (referred to as RDF graph) IDs used in the source and target RDF networks must be disjoint.
  • The set of entailment (referred to as inferred graph) IDs used in the source and target RDF networks must be disjoint.
  • The set of rulebase IDs used in the source and target RDF networks must be disjoint, with the exception of built-in rulebases such as OWL2RL.

Example 1-114 Moving and Exporting a Schema Private RDF Network

This first example uses Data Pump Export to export relevant network data to multiple .dmp files, so that the data can be imported into an RDF network in another database (as shown in the second example).

This example performs the following major actions.

  1. Creates a directory for a Data Pump Export operation.
  2. Creates a database user (RDFEXPIMPU) that will hold the output of the export of the RDF network.
  3. Moves the RDF network data to the RDFEXPIMPU schema.
  4. Uses Data Pump to export the moved RDF network data.
  5. Uses Data Pump to export any user tables referenced in RDF views.
  6. Optionally, restores the RDF network data in the current network.

Note that the example assumes a schema-private network named NET1 owned by RDFUSER. It also assumes that the tables EMP, WORKED_FOR, and DEPT, owned by RDFUSER, are used in the RDF views in the network.

conn sys/<password_for_sys> as sysdba;

-- create directory for datapump export
create directory dpump_dir as '<path_to_directory>';
grant read,write on directory dpump_dir to public;

-- create user to hold exported RDF network
grant connect, resource, unlimited tablespace to rdfexpimpu identified by <password_for_rdfexpimpu>;

-- connect as a privileged user to move the network
conn system/<password_for_system>
-- move RDF network data to RDFEXPIMPU schema
exec sem_apis.move_rdf_network_data(dest_schema=>'RDFEXPIMPU', network_owner=>'RDFUSER', network_name=>'NET1');

-- export moved network data with datapump
-- export rdfexpimpu schema
host expdp rdfexpimpu/<password_for_rdfexpimpu> DIRECTORY=dpump_dir DUMPFILE=expuser.dmp version=12.2 logfile=export_move_sem_network_data.log

-- export any user tables referenced in RDF Views
host expdp rdfuser/<password_for_rdfuser> tables=EMP,WORKED_FOR,DEPT DIRECTORY=dpump_dir DUMPFILE=exp_rdfviewtabs.dmp version=12.2 logfile=export_move_rdfview_tabs.log

-- optionally restore the network data or drop the source RDF network
exec sem_apis.restore_rdf_network_data(from_schema=>'RDFEXPIMPU', network_owner=>'RDFUSER', network_name=>'NET1');

Example 1-115 Importing and Appending a Schema Private RDF Network

This second example uses Data Pump Import to import relevant network data (from the first example), creates necessary database users, creates a new MDSYS-owned RDF network, and "appends" the imported network data into the newly created network.

This example performs the following major actions.

  1. Creates a database user (RDFEXPIMPU), if it does not already exist in the database, to hold the imported RDF network data.
  2. Uses Data Pump to import any RDF view component tables and previously moved RDF network data.
  3. Creates a new RDF network into which the imported data is to be appended.
  4. Appends the imported data into the newly created RDF network.

conn sys/<password_for_sys>

-- create a user to hold the imported RDF network data
grant connect, resource, unlimited tablespace to rdfexpimpu identified by <password_for_rdfexpimpu>;

-- create the network owner
grant connect, resource, unlimited tablespace to rdfuser identified by <password_for_rdfuser>;

conn system/<password_for_system>

-- import any RDF view component tables
host impdp rdfuser/<password_for_rdfuser> tables=EMP,WORKED_FOR,DEPT DIRECTORY=dpump_dir DUMPFILE=exp_rdfviewtabs.dmp version=12.2 logfile=import_append_rdfview_tabs.log

-- import the previously moved RDF network
host impdp rdfexpimpu/<password_for_rdfexpimpu> DIRECTORY=dpump_dir DUMPFILE=expuser.dmp version=12.2 logfile=import_append_atabs.log

-- create a new RDF network in which to append the imported one
exec sem_apis.create_rdf_network('rdf_tablespace', network_owner=>'RDFUSER', network_name=>'NET1');

-- append the imported RDF network
exec sem_apis.append_rdf_network_data(from_schema=>'RDFEXPIMPU', network_owner=>'RDFUSER', network_name=>'NET1');

1.11.6 Purging Unused Values

Deletion of triples over time may lead to a subset of the values in the RDF_VALUE$ table becoming unused in any of the RDF triples or rules currently in the RDF network. If the count of such unused values becomes large and constitutes a significant portion of the RDF_VALUE$ table, you may want to purge the unused values using the SEM_APIS.PURGE_UNUSED_VALUES subprogram.
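A purge invocation can be sketched as follows (a minimal example assuming a schema-private network named NET1 owned by RDFUSER; see SEM_APIS Package Subprograms for the full parameter list):

BEGIN
  sem_apis.purge_unused_values(
    network_owner => 'RDFUSER',
    network_name  => 'NET1');
END;
/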

Event traces for tasks performed during the purge operation may be recorded into the RDF$ET_TAB table, if present in the invoker's schema, as described in Recording Event Traces During Bulk Loading.