7.1.1 About Preparing Data in the Database

OML4Py data type classes have methods that enable you to use Python to prepare database data for analysis.

You can perform data preparation operations on large quantities of data in the database. You can then continue operating on that data in the database, or pull a subset of the results to your local Python session where, for example, you can use third-party Python packages to perform other operations.
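The workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper name `prepare_and_pull` and its `n_rows` parameter are hypothetical, and `proxy` is assumed to be an `oml.DataFrame` obtained elsewhere after connecting to the database. The helper uses only the `dropna`, `drop_duplicates`, `head`, and `pull` methods.

```python
def prepare_and_pull(proxy, n_rows=100):
    """Clean data in the database, then bring a small sample to the client.

    Hypothetical helper for illustration. `proxy` is assumed to be an
    oml.DataFrame, that is, a proxy for a database table or view.
    """
    # These steps are evaluated in the database; each returns a new proxy.
    cleaned = proxy.dropna()             # remove rows containing missing values
    cleaned = cleaned.drop_duplicates()  # remove duplicated rows

    # Restrict the result in the database so that at most n_rows rows
    # are transferred to the local session.
    sample = cleaned.head(n_rows)

    # pull() creates a local Python object from the proxy.
    return cleaned, sample.pull()
```

The caller can keep the returned `cleaned` proxy for further in-database work and pass the pulled sample to a third-party Python package. With a live connection, a call might look like `cleaned, local = prepare_and_pull(oml.sync(table='MY_TABLE'))`, where `MY_TABLE` is a placeholder table name.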

The following table lists methods with which you can perform common data preparation tasks and indicates whether each OML4Py data type class supports the method.

Table 7-1 Methods Supported by Data Types

Method Description oml.Boolean oml.Bytes oml.Float oml.String oml.DataFrame
append Appends another oml data object of the same class to an oml object. Yes Yes Yes Yes Yes
ceil Computes the ceiling of each element in an oml.Float series data object. No No Yes No No
concat Combines an oml data object column-wise with one or more other data objects. Yes Yes Yes Yes Yes
count_pattern Counts the number of occurrences of a pattern in each string. No No No Yes No
create_view Creates an Oracle Database view for the data represented by the OML4Py data object. No No No No Yes
dot Calculates the inner product of the current oml.Float object with another oml.Float, or does matrix multiplication with an oml.DataFrame. No No Yes No No
drop Drops specified columns in an oml.DataFrame. No No No No Yes
drop_duplicates Removes duplicated elements from an oml series data object or duplicated rows from an oml.DataFrame. Yes Yes Yes Yes Yes
dropna Removes missing elements from an oml series data object, or rows containing missing values from an oml.DataFrame. Yes Yes Yes Yes Yes
exp Computes element-wise e to the power of values in an oml.Float series data object. No No Yes No No
find Finds, in each string, the lowest index at or after a start index at which a substring occurs. No No No Yes No
floor Computes the floor of each element in an oml.Float series data object. No No Yes No No
head Returns the first n elements of an oml series data object or the first n rows of an oml.DataFrame. Yes Yes Yes Yes Yes
KFold Splits the oml data object randomly into k consecutive folds. Yes Yes Yes Yes Yes
len Computes the length of each string in an oml.Bytes or oml.String series data object. No Yes No Yes No
log Calculates an element-wise logarithm, to the given base, of values in the oml.Float series data object. No No Yes No No
materialize Pushes the contents represented by an OML4Py proxy object (a view, a table, and so on) into a table in Oracle Database. No No No No Yes
merge Joins another oml.DataFrame to an oml.DataFrame. No No No No Yes
replace Replaces an existing value with another value. No No Yes Yes Yes
rename Renames columns of an oml.DataFrame. No No No No Yes
round Rounds oml.Float values to the specified decimal place. No No Yes No No
select_types Returns the subset of columns that are included or excluded based on their oml data type. No No No No Yes
split Splits an oml data object randomly into multiple sets. Yes Yes Yes Yes Yes
sqrt Computes the square root of each element in an oml.Float series data object. No No Yes No No
tail Returns the last n elements of an oml series data object or the last n rows of an oml.DataFrame. Yes Yes Yes Yes Yes
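
As an illustration of the numeric methods that the table lists for oml.Float only, the following sketch applies several of them to one series. The helper name `float_transforms` is hypothetical, and `x` is assumed to be an `oml.Float` series, for example a numeric column selected from an `oml.DataFrame`. Only the argument-free methods `ceil`, `floor`, `sqrt`, and `exp` are used. `round` and `log` are omitted because they take a decimal place and a base, respectively.

```python
def float_transforms(x):
    """Return several element-wise transforms of an oml.Float series.

    Hypothetical helper for illustration. `x` is assumed to be an
    oml.Float proxy object.
    """
    # Each method returns a new object; `x` itself is not modified.
    return {
        "ceil": x.ceil(),    # ceiling of each element
        "floor": x.floor(),  # floor of each element
        "sqrt": x.sqrt(),    # square root of each element
        "exp": x.exp(),      # e raised to the power of each element
    }
```

Because the results are themselves OML4Py objects, methods such as `head` can be applied to them to inspect a few values before deciding what to pull to the local session.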