6.3.5 Create a Persistent Database Table from a Python Data Set

Use the oml.create function to create a persistent table in your database schema from data in your Python session.

The oml.create function creates a table in the database schema and returns an oml.DataFrame object that is a proxy for the table. The proxy oml.DataFrame object has the same name as the table.

Note:

When creating a table in Oracle Machine Learning for Python, if you use lowercase or mixed case for the name of the table, then you must use the same lowercase or mixed case name in double quotation marks when using the table in a SQL query or function. If, instead, you use an all uppercase name when creating the table, then the table name is case-insensitive: you can use uppercase, lowercase, or mixed case when using the table without using double quotation marks. The same is true for naming columns in a table.

You can delete the persistent table in a database schema with the oml.drop function.

Caution:

Use the oml.drop function to delete a persistent database table. Use the del statement to remove an oml.DataFrame proxy object and its associated temporary table; del does not delete a persistent table.

The syntax of the oml.create function is the following:

oml.create(x, table, oranumber=True, dbtypes=None, append=False)

The x argument is a pandas.DataFrame or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on. The table argument is a string that specifies a name for the table.

The SQL data types of the columns are determined by the following:

  • OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.

    If the values in a column are all None, or if a column has inconsistent data types that are not None in the sampled rows, then a default column type cannot be determined and a ValueError is raised unless a SQL type for the column is specified by the dbtypes argument.

  • For numeric columns, the oranumber argument, which is a bool, determines the SQL data type. If True (the default), then the SQL data type is NUMBER. If False, then the data type is BINARY DOUBLE.

    If the data in x contains NaN values, then you should set oranumber to False.

  • For string columns, the default type is VARCHAR2(4000).

  • For bytes columns, the default type is BLOB.

With the dbtypes parameter, you can specify the SQL data types for the table columns. The values of dbtypes may be either a dict that maps str to str values or a list of str values. For a dict, the keys are the names of the columns. The dbtypes parameter is ignored if the append argument is True.

The append argument is a bool that specifies whether to append the x data to an existing table.

Example 6-13 Creating Database Tables from a Python Data Set

This example creates a cursor object for the database connection, creates a pandas.core.frame.DataFrame with columns of various data types, then creates a series of tables using different oml.create parameters and shows the SQL data types of the table columns.

import oml

# Create a cursor object for the current OML4Py database 
# connection to run queries and get information from the database.
cr = oml.cursor()

import pandas as pd

df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
                   'string' : [None, None, 'a', 'a', 'a', 'b'],
                   'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})

# Get the order of the columns
df.columns

# Create a table with the default parameters.
oml_df1 = oml.create(df, table = 'tbl1')

# Show the default SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
cr.fetchall()

# Create a table with oranumber set to False.
oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)

# Show the SQL data typea of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
cr.fetchall()

# Create a table with dbtypes specified as a dict mapping column names 
# to SQL data types.
oml_df3 = oml.create(df, table = 'tbl3', 
                     dbtypes = {'numeric': 'BINARY_DOUBLE',
                                'bytes':'RAW(1)'})

# Show the SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
cr.fetchall()

# Create a table with dbtypes specified as a list of SQL data types  
# matching the order of the columns.
oml_df4 = oml.create(df, table = 'tbl4', 
                     dbtypes = ['BINARY_DOUBLE','VARCHAR2','RAW(1)'])

# Show the SQL data type of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
cr.fetchall()

# Create a table from a list of tuples.
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'), 
       (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
oml_df5 = oml.create(lst, table = 'tbl5',
                     dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])

# Close the cursor
cr.close()

# Drop the tables.
oml.drop('tbl1')
oml.drop('tbl2')
oml.drop('tbl3')
oml.drop('tbl4')
oml.drop('tbl5')

Listing for This Example

>>> import oml
>>>
>>> # Create a cursor object for the current OML4Py database
... # connection to run queries and get information from the database.
... cr = oml.cursor()
>>> 
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
...                    'string' : [None, None, 'a', 'a', 'a', 'b'],
...                    'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>>
>>> # Get the order of the columns.
... df.columns
Index(['numeric', 'string', 'bytes'], dtype='object')
>>> 
>>> # Create a table with the default parameters.
... oml_df1 = oml.create(df, table = 'tbl1')
>>>
>>> # Show the default SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
>>> cr.fetchall()
[('NUMBER',), ('VARCHAR2',), ('BLOB',)]
>>> 
>>> # Create a table with oranumber set to False.
... oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
>>>
>>> # Show the SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('BLOB',)]
>>> 
>>> # Create a table with dbtypes specified as a dict mapping column names 
... # to SQL data types.
... oml_df3 = oml.create(df, table = 'tbl3', 
...                      dbtypes = {'numeric': 'BINARY_DOUBLE',
...                                 'bytes':'RAW(1)'})
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>> 
>>> # Create a table with dbtypes specified as a list of SQL data types 
... # matching the order of the columns.
... oml_df4 = oml.create(df, table = 'tbl4', 
...                      dbtypes = ['BINARY_DOUBLE','CHAR(1)', 'RAW(1)'])
>>>
>>> # Show the SQL data type of the columns
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>> 
>>> # Create a table from a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'), 
...        (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> oml_df5 = oml.create(lst, table ='tbl5', 
...                      dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>> 
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl5'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Close the cursor.
... cr.close()
>>>
>>> # Drop the tables
... oml.drop('tbl1')
>>> oml.drop('tbl2')
>>> oml.drop('tbl3')
>>> oml.drop('tbl4')
>>> oml.drop('tbl5')