3.2.1 Load Data
Access the data set from the SH Schema and explore the data to understand the attributes.
Access Data
You will be using the CUSTOMERS and SUPPLEMENTARY_DEMOGRAPHICS tables available in the SH schema.
See SH.CUSTOMERS for information about the CUSTOMERS table in SH Schema.
The following table displays information about the attributes from SUPPLEMENTARY_DEMOGRAPHICS:
| Attribute Name | Data Type | Information |
|---|---|---|
| CUST_ID | Numeric | The ID of the customer |
| EDUCATION | Character | Education level attained |
| OCCUPATION | Character | Occupation of the customer |
| HOUSEHOLD_SIZE | Character | Number of people living at residence |
| YRS_RESIDENCE | Numeric | Number of years customer lived at current residence |
| AFFINITY_CARD | Character | Indicates whether the customer holds an affinity card.
1 means Yes. 0 means No. |
| BULK_PACK_DISKETTES | Character | Product. Indicates whether the customer purchased the bulk pack diskettes.
1 means Yes. 0 means No. |
| FLAT_PANEL_MONITOR | Character | Product. Indicates whether the customer purchased flat panel monitor.
1 means Yes. 0 means No |
| HOME_THEATER_PACKAGE | Character | Product. Indicates whether the customer purchased home theatre package.
1 means Yes. 0 means No |
| BOOKKEEPING_APPLICATION | Character | Product. Indicates whether the customer purchased bookkeeping application.
1 means Yes. 0 means No |
| PRINTER_SUPPLIES | Character | Product. Indicates whether the customer purchased printer supplies.
1 means Yes. 0 means No |
| Y_BOX_GAMES | Character | Product. Indicates whether the customer purchased YBox Games.
1 means Yes. 0 means No |
| OS_DOC_SET_KANJI | Character | Product. Indicates whether the customer purchased the Kanji character set for the operating system documentation.
1 means Yes. 0 means No |
| COMMENTS | Character | Comments from customers |
To access database data from R using OML4R, you must first create a proxy object in R that represents a database table, view, or query. In this example, the proxy object is created using a query. Create proxy objects for SUPPLEMENTARY_DEMOGRAPHICS and CUSTOMERS and then merge them by inner join on a key column, in this case, CUST_ID. Assess the data to identify data types and data quality issues. Look for missing values, outlier numeric values, or inconsistently labeled categorical values.
- Run the following command in an R interpreter paragraph (using %r) in an OML notebook (or similar notebook environment) to import the Oracle Machine Learning for R libraries and suppress warnings regarding row ordering. Alternatively, this code can be run from the R command line or tools like RStudio.
library(ORE) options(ore.warn.order=FALSE) - Use the
ore.syncfunction to create theore.frameobject that is a proxy for theCUSTOMERStable in theSHschema database table.
- Use the
ore.syncfunction to create theore.frameobject that is a proxy for theSUPPLEMENTARY DEMOGRAPHICStable in theSHschema database table.ore.sync(query = c("SUPPLEMENTARY_DEMOGRAPHICS" = "select CUST_ID, HOUSEHOLD_SIZE, YRS_RESIDENCE, TO_CHAR(Y_BOX_GAMES) Y_BOX_GAMES from SH.SUPPLEMENTARY_DEMOGRAPHICS")) # The TO_CHAR function is used to have Y_BOX_GAMES treated as a categorical variable, not a numeric variable. z.show(head(SUPPLEMENTARY_DEMOGRAPHICS))
Parent topic: Clustering Use Case