1 What Is Machine Learning?
Orientation to machine learning technology.
Note:
Information about machine learning is widely available. No matter what your level of expertise, you can find helpful books and articles on machine learning.
- What Is Machine Learning?
  Machine learning is a technique that discovers previously unknown relationships in data.
- What Can Machine Learning Do and Not Do?
  Machine learning uncovers hidden data patterns but requires user knowledge to interpret and apply the insights effectively for business decisions.
- The Oracle Machine Learning Process
  The machine learning process is iterative, involving phases from defining business goals to deploying solutions and refining models based on new insights.
1.1 What Is Machine Learning?
Machine learning is a technique that discovers previously unknown relationships in data.
Machine learning and AI are often discussed together. An important distinction is that although all machine learning is AI, not all AI is machine learning. Machine learning automatically searches potentially large stores of data to discover patterns and trends that go beyond simple statistical analysis. It uses sophisticated algorithms that identify patterns in data and capture them in models. Those models can be used to make predictions and forecasts and to categorize data.
The key features of machine learning are:
- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Ability to analyze potentially large volumes of data
Machine learning can answer questions that cannot be addressed through traditional deductive query and reporting techniques.
- Automatic Discovery
  Machine learning models automatically discover patterns in data and can generalize these patterns to new data through a process known as scoring.
- Prediction
  Machine learning models predict outcomes based on data, providing probabilities and confidence levels for each prediction.
- Grouping
  Machine learning models identify natural groupings in data, segmenting populations based on shared characteristics.
- Actionable Information
  Machine learning derives actionable information from data, helping you to make informed decisions based on model predictions and segmentation.
- Machine Learning and Statistics
  Machine learning uses algorithms with fewer assumptions about data than traditional statistics, enabling automation and robust model creation with minimal user intervention.
- Oracle Machine Learning and Analytic Views
  Analytic views and Oracle Machine Learning complement each other by combining dimensional data organization with inductive inference for comprehensive data analysis.
- Oracle Machine Learning and Data Warehousing
  Data warehousing supports machine learning by facilitating data cleansing and preparation, ensuring the data is suitable for solving specific problems.
1.1.1 Automatic Discovery
Machine learning models automatically discover patterns in data and can generalize these patterns to new data through a process known as scoring.
Machine learning is performed by a model that uses an algorithm to act on a set of data. Machine learning models can be used to mine the data on which they are built, but most types of models are generalizable to new data. The process of applying a model to new data is known as scoring.
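The build/score split can be illustrated with a minimal sketch in plain Python (not the Oracle Machine Learning API). It assumes a one-variable least-squares model and a handful of hypothetical (years of education, income) rows: `build` learns the model from the build data, and `score` applies it to cases that were not part of that data. All names and values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """A fitted model of the form y = intercept + slope * x."""
    intercept: float
    slope: float

    def score(self, x: float) -> float:
        # Scoring: apply the learned relationship to a new case.
        return self.intercept + self.slope * x


def build(xs: list[float], ys: list[float]) -> LinearModel:
    """Build the model with ordinary least squares on the build data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return LinearModel(intercept=mean_y - slope * mean_x, slope=slope)


# Hypothetical build data: years of education -> annual income (thousands).
education = [10.0, 12.0, 14.0, 16.0, 18.0]
income = [30.0, 36.0, 42.0, 48.0, 54.0]

model = build(education, income)

# Score new cases that were not part of the build data.
new_cases = [11.0, 20.0]
scores = [model.score(x) for x in new_cases]
print(scores)
```

Here the build data lies exactly on a line, so the model recovers a slope of 3 and an intercept of 0 and scores the new cases as 33 and 60. Real data is noisy, and in practice the algorithm would be one that Oracle Machine Learning provides rather than this hand-written fit.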
1.1.2 Prediction
Machine learning models predict outcomes based on data, providing probabilities and confidence levels for each prediction.
Many forms of machine learning are predictive. For example, a model can predict income based on education and other demographic factors. Predictions have an associated probability (How likely is this prediction to be true?). Prediction probabilities are also known as confidence (How confident can I be of this prediction?).
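To show a prediction paired with a probability, the following sketch assumes a k-nearest-neighbors classifier (chosen only for illustration) over hypothetical education data. The reported probability is simply the share of the k nearest build cases that agree with the predicted class; actual Oracle Machine Learning algorithms compute probabilities in their own, algorithm-specific ways.

```python
from collections import Counter

# Hypothetical build cases: (years of education, income above regional average?)
training = [
    (10, "no"), (11, "no"), (12, "no"), (13, "yes"),
    (14, "no"), (16, "yes"), (17, "yes"), (18, "yes"),
]


def predict(education_years: float, k: int = 3) -> tuple[str, float]:
    """Return the predicted class and its probability (share of the k nearest votes)."""
    # The k build cases closest to the new case (ties keep list order).
    nearest = sorted(training, key=lambda case: abs(case[0] - education_years))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


print(predict(17))  # all three neighbors agree
print(predict(13))  # neighbors disagree, so the probability is lower
```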
Some forms of predictive machine learning generate rules, which are conditions that imply a given outcome. For example, a rule can specify that a person who has a bachelor's degree and lives in a certain neighborhood is likely to have an income greater than the regional average. Rules have an associated support (What percentage of the population satisfies the rule?).
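Support, and the related rule measure confidence, can be computed directly by counting records. This sketch assumes the association-rule convention, in which support is the share of all records that satisfy both the rule's condition and its outcome, and confidence is the share of records meeting the condition that also have the outcome. Some algorithms instead report support as the number of records that match the condition alone. The records and the `condition`/`outcome` functions are hypothetical.

```python
# Hypothetical population records.
people = [
    {"degree": "bachelor", "area": "north", "high_income": True},
    {"degree": "bachelor", "area": "north", "high_income": True},
    {"degree": "bachelor", "area": "north", "high_income": False},
    {"degree": "bachelor", "area": "south", "high_income": False},
    {"degree": "none", "area": "north", "high_income": False},
    {"degree": "none", "area": "south", "high_income": True},
    {"degree": "bachelor", "area": "north", "high_income": True},
    {"degree": "none", "area": "south", "high_income": False},
]


def condition(person):
    # Rule condition: bachelor's degree AND lives in the "north" neighborhood.
    return person["degree"] == "bachelor" and person["area"] == "north"


def outcome(person):
    # Rule outcome: income greater than the regional average.
    return person["high_income"]


matching = [p for p in people if condition(p)]
both = [p for p in matching if outcome(p)]

support = len(both) / len(people)       # share of the population satisfying the whole rule
confidence = len(both) / len(matching)  # share of condition matches with the predicted outcome
print(f"support={support:.3f} confidence={confidence:.2f}")
```

With these eight records, four match the condition and three of them also have the outcome, giving a support of 0.375 and a confidence of 0.75.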
1.1.3 Grouping
Machine learning models identify natural groupings in data, segmenting populations based on shared characteristics.
Other forms of machine learning identify natural groupings in the data. For example, a model might identify the segment of the population that has an income within a specified range, that has a good driving record, and that leases a new car on a yearly basis.
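One common way to find such groupings is k-means clustering, used here purely as an illustration. The sketch assumes hypothetical customers described by income and driving-violation counts, and it seeds the two centroids by hand so that the result is reproducible. It alternates between assigning each customer to the nearest centroid and moving each centroid to the mean of its group.

```python
import math

# Hypothetical customers: (annual income in thousands, traffic violations in 5 years).
customers = [(42, 3), (45, 4), (40, 5), (95, 0), (100, 1), (105, 0)]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    for _ in range(max_iter):
        # Assignment step: put each point in the group of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its group (keep it if the group is empty).
        new_centroids = [
            tuple(sum(values) / len(group) for values in zip(*group)) if group else centroids[i]
            for i, group in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids, and so assignments, stop changing
            break
        centroids = new_centroids
    return centroids, clusters


# Seed the two centroids with one customer each; features are not scaled in this sketch.
centroids, segments = kmeans(customers, [customers[0], customers[3]])
for centroid, members in zip(centroids, segments):
    print([round(c, 1) for c in centroid], members)
```

Because the features are not scaled, income dominates the distance; in practice you would normalize the attributes first, which is one of the transformations discussed under data preparation.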
1.1.4 Actionable Information
Machine learning derives actionable information from data, helping you to make informed decisions based on model predictions and segmentation.
Machine learning can derive actionable information from large volumes of data. For example, a town planner might use a model that predicts income based on demographics to develop a plan for low-income housing. A car leasing agency might use a model that identifies customer segments to design a promotion targeting high-value customers.
1.1.5 Machine Learning and Statistics
Machine learning uses algorithms with fewer assumptions about data than traditional statistics, enabling automation and robust model creation with minimal user intervention.
There is a great deal of overlap between machine learning and statistics. In fact most of the techniques used in machine learning can be placed in a statistical framework. However, machine learning techniques are not the same as traditional statistical techniques.
Statistical models usually make strong assumptions about the data and, based on those assumptions, they make strong statements about the results. However, if the assumptions are flawed, the validity of the model becomes questionable. By contrast, machine learning methods typically make weak assumptions about the data. As a result, machine learning generally cannot make such strong statements about the results. Yet machine learning can often produce very good results even when the data does not satisfy the assumptions that a statistical method would require.
Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population.
Machine learning requires less user interaction and less knowledge of the data. The user does not need to massage the data to guarantee that a method is valid for a given data set. As a result, Oracle Machine Learning techniques are easier to automate than traditional statistical techniques.
1.1.6 Oracle Machine Learning and Analytic Views
Analytic views and Oracle Machine Learning complement each other by combining dimensional data organization with inductive inference for comprehensive data analysis.
Analytic views organize data using a dimensional model. They provide a fast and efficient way to create analytic queries of data stored in existing database tables and views. Analytic views and Oracle Machine Learning are complementary activities.
An analytic view includes navigation, join, aggregation, and calculation rules, thus eliminating the need to include these rules in queries. However, analytic views do not have inductive inference capabilities. Inductive inference, the process of reaching a general conclusion from specific examples, is a characteristic of machine learning. Inductive inference is also known as computational learning.
Analytic views provide a multidimensional view of the data, including support for hierarchies and analytic view objects. From a business perspective, analytic views offer a way to present the data.
Oracle Machine Learning and analytic views can be used together in a number of ways. Analytic views can be used to analyze machine learning results at different levels of granularity. Machine learning can help you construct more interesting and useful analytic views. For example, the results of predictive machine learning can be added as custom measures to an analytic view, or they can be used to suggest important attributes. Such measures can provide information such as "likely to default" or "likely to buy" for each customer. Analytic views can then aggregate and summarize the probabilities.
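Aggregating per-customer probabilities can be sketched in plain Python as a stand-in for what an analytic view would do with a custom measure; this is not analytic view DDL. The sketch assumes a hypothetical `p_buy` score per customer and a two-level geography hierarchy (region, country). Averaging gives the typical probability at each level, and summing gives the expected number of buyers.

```python
from collections import defaultdict

# Hypothetical scored customers: a "likely to buy" probability plus hierarchy attributes.
scored_customers = [
    {"customer": 1, "region": "East", "country": "US", "p_buy": 0.80},
    {"customer": 2, "region": "East", "country": "US", "p_buy": 0.40},
    {"customer": 3, "region": "West", "country": "US", "p_buy": 0.90},
    {"customer": 4, "region": "West", "country": "US", "p_buy": 0.30},
    {"customer": 5, "region": "West", "country": "US", "p_buy": 0.60},
]


def summarize(rows, level):
    """Aggregate the probability measure at one level of the hierarchy."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[level]].append(row["p_buy"])
    return {
        key: {
            "customers": len(ps),
            "avg_p_buy": sum(ps) / len(ps),  # typical likelihood at this level
            "expected_buyers": sum(ps),      # sum of probabilities = expected count
        }
        for key, ps in groups.items()
    }


by_region = summarize(scored_customers, "region")
by_country = summarize(scored_customers, "country")
print(by_region)
print(by_country)
```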
1.1.7 Oracle Machine Learning and Data Warehousing
Data warehousing supports machine learning by facilitating data cleansing and preparation, ensuring the data is suitable for solving specific problems.
Data can be mined whether it is stored in flat files, spreadsheets, database tables, or some other storage format. The important criterion for the data is not its storage format but its applicability to the problem to be solved.
Proper data cleansing and preparation are very important for machine learning, and a data warehouse can facilitate these activities. However, a data warehouse is of no use if it does not contain the data you need to solve your problem.
1.2 What Can Machine Learning Do and Not Do?
Machine learning uncovers hidden data patterns but requires user knowledge to interpret and apply the insights effectively for business decisions.
Machine learning is a powerful tool that can help you find patterns and relationships within your data. But machine learning does not work by itself. It does not eliminate the need to know your business, to understand your data, or to understand analytical methods. Machine learning discovers hidden information in your data, but it cannot tell you the value of the information to your organization.
You might already be aware of important patterns as a result of working with your data over time. Machine learning can confirm or qualify such empirical observations in addition to finding new patterns that are not immediately discernible through simple observation.
It is important to remember that the predictive relationships discovered through machine learning are not causal relationships. For example, machine learning might determine that males with incomes between $50,000 and $65,000 who subscribe to certain magazines are likely to buy a given product. You can use this information to help you develop a marketing strategy. However, you must not assume that the population identified through machine learning buys the product because they belong to this population.
Machine learning yields probabilities, not exact answers. Keep in mind that rare events can happen; they just do not happen very often.
- Asking the Right Questions
  Formulating precise questions is essential for meaningful machine learning results, as the discovered patterns depend on how the problem is framed.
- Understanding Your Data
  Understanding your data is crucial for machine learning, as algorithms are sensitive to data characteristics like outliers, irrelevant columns, and data coding.
1.2.1 Asking the Right Questions
Formulating precise questions is essential for meaningful machine learning results, as the discovered patterns depend on how the problem is framed.
Machine learning does not automatically discover information without guidance. The patterns you find through machine learning are very different depending on how you formulate the problem.
To obtain meaningful results, you must learn how to ask the right questions. For example, rather than trying to learn how to "improve the response to a direct mail solicitation," you might try to find the characteristics of people who have responded to your solicitations in the past.
1.2.2 Understanding Your Data
Understanding your data is crucial for machine learning, as algorithms are sensitive to data characteristics like outliers, irrelevant columns, and data coding.
To ensure meaningful machine learning results, you must understand your data. Machine learning algorithms are often sensitive to specific characteristics of the data: outliers (data values that are very different from the typical values in your database), irrelevant columns, columns that vary together (such as age and date of birth), data coding, and data that you choose to include or exclude. Oracle Machine Learning can automatically perform much of the data preparation required by the algorithm. But some of the data preparation is typically specific to the domain or the machine learning problem. At any rate, you need to understand the data that was used to build the model to properly interpret the results when the model is applied.
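Two of the characteristics mentioned above, outliers and columns that vary together, can be surfaced with quick profiling checks before modeling. The sketch below assumes Tukey's interquartile-range fences for outliers and the Pearson correlation coefficient for redundant columns, two common heuristics that are not necessarily what Oracle's automatic data preparation uses. The column values are hypothetical.

```python
import statistics

# Hypothetical columns from a customer table.
age = [25, 32, 47, 51, 38, 29, 44]
birth_year = [1999, 1992, 1977, 1973, 1986, 1995, 1980]
income = [41_000, 52_000, 61_000, 58_000, 49_000, 45_000, 950_000]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


print("income outliers:", iqr_outliers(income))
print("corr(age, birth_year):", round(pearson(age, birth_year), 3))
```

Here the 950,000 income is flagged, and age and birth year have a correlation of -1 because one is a fixed offset from the other, so keeping both columns adds no information.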
1.3 The Oracle Machine Learning Process
The machine learning process is iterative, involving phases from defining business goals to deploying solutions and refining models based on new insights.
The following figure illustrates the phases, and the iterative nature, of a machine learning project. The process flow shows that a machine learning project does not stop when a particular solution is deployed. The results trigger new business questions, which in turn can be used to develop more focused models.
Figure 1-1 The Oracle Machine Learning Process
- Define Business Goals
  The first phase of the machine learning process is to define business objectives. This initial phase of a project focuses on understanding the project objectives and requirements.
- Understand Data
  The data understanding phase involves data collection and exploration, which includes loading the data and analyzing it for your business problem.
- Prepare Data
  The preparation phase involves finalizing the data and covers all the tasks involved in putting the data into a format that you can use to build the model.
- Develop Models
  In this phase, you select and apply various modeling techniques and tune the algorithm parameters, called hyperparameters, to desired values.
- Evaluate
  At this stage of the project, it is time to evaluate how well the model satisfies the originally stated business goal.
- Deploy
  Deployment is the use of machine learning within a target environment. In the deployment phase, you can derive data-driven insights and actionable information.
1.3.1 Define Business Goals
The first phase of the machine learning process is to define business objectives. This initial phase of a project focuses on understanding the project objectives and requirements.
Once you have specified the problem from a business perspective, you can formulate it as a machine learning problem and develop a preliminary implementation plan. Identify success criteria to determine whether the machine learning results meet the defined business goals. For example, your business problem might be: "How can I sell more of my product to customers?" You might translate this into a machine learning problem such as: "Which customers are most likely to purchase the product?" A model that predicts who is most likely to purchase the product is typically built on data that describes the customers who have purchased the product in the past.
To summarize, in this phase, you will:
- Specify objectives
- Determine machine learning goals
- Define success criteria
- Produce project plan
1.3.2 Understand Data
The data understanding phase involves data collection and exploration, which includes loading the data and analyzing it for your business problem. Assess data quality by asking questions such as the following:
- Is the data complete?
- Are there missing values in the data?
- What types of errors exist in the data and how can they be corrected?
To summarize, in this phase, you will:
- Access and collect data
- Explore data
- Assess data quality
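The data quality questions listed above can be answered with a simple per-column report. This sketch assumes hypothetical rows and hypothetical validity rules (an age range, a gender code list, non-negative income); in practice such rules come from your knowledge of the domain.

```python
# Hypothetical raw rows as they might arrive from a source system.
rows = [
    {"cust_id": 1, "age": 34, "gender": "F", "income": 52000},
    {"cust_id": 2, "age": None, "gender": "M", "income": 61000},
    {"cust_id": 3, "age": 212, "gender": "X", "income": None},
    {"cust_id": 4, "age": 45, "gender": "M", "income": 48000},
]

# Hypothetical validity rules; columns without a rule are only checked for missing values.
valid = {
    "age": lambda v: 0 <= v <= 120,
    "gender": lambda v: v in {"F", "M"},
    "income": lambda v: v >= 0,
}


def quality_report(rows):
    """Count missing and invalid values per column and report completeness."""
    report = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        missing = sum(v is None for v in values)
        check = valid.get(col)
        invalid = sum(1 for v in values if v is not None and check and not check(v))
        report[col] = {
            "missing": missing,
            "invalid": invalid,
            "complete_pct": 100 * (len(values) - missing) / len(values),
        }
    return report


for col, summary in quality_report(rows).items():
    print(col, summary)
```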
1.3.3 Prepare Data
The preparation phase involves finalizing the data and covers all the tasks involved in putting the data into a format that you can use to build the model.
Data preparation tasks are likely to be performed multiple times, iteratively, and not in any prescribed order. Tasks can include column (attribute) selection as well as selection of rows in a table. You may create views to join data or materialize data as required, especially if data is collected from various sources. To cleanse the data, look for invalid values, foreign key values that don't exist in other tables, and missing and outlier values. To refine the data, you can apply transformations such as aggregation, normalization, generalization, and attribute construction as needed to address the machine learning problem. For example, you can transform a DATE_OF_BIRTH column to AGE; you can insert the median income in cases where the INCOME column is null; you can filter out rows representing outliers in the data, or filter out columns that have too many missing or identical values.
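The example transformations can be sketched in plain Python as follows. The sketch assumes a fixed reference date for computing AGE, uses the median of the known INCOME values to fill nulls, and drops any column with more than half of its values missing; the FAX column and the 50% threshold are hypothetical.

```python
import statistics
from datetime import date

# Hypothetical rows; FAX is a mostly empty column.
rows = [
    {"DATE_OF_BIRTH": date(1980, 5, 17), "INCOME": 52000, "FAX": None},
    {"DATE_OF_BIRTH": date(1992, 11, 2), "INCOME": None, "FAX": None},
    {"DATE_OF_BIRTH": date(1975, 1, 30), "INCOME": 61000, "FAX": None},
    {"DATE_OF_BIRTH": date(2001, 8, 9), "INCOME": 38000, "FAX": "555-0100"},
]

AS_OF = date(2024, 1, 1)  # fixed reference date so the example is reproducible


def age_on(dob, as_of):
    """Whole years between dob and as_of (subtract 1 if the birthday hasn't occurred yet)."""
    return as_of.year - dob.year - ((as_of.month, as_of.day) < (dob.month, dob.day))


# 1. Transform DATE_OF_BIRTH into AGE.
for r in rows:
    r["AGE"] = age_on(r.pop("DATE_OF_BIRTH"), AS_OF)

# 2. Replace null INCOME with the median of the known values.
median_income = statistics.median(r["INCOME"] for r in rows if r["INCOME"] is not None)
for r in rows:
    if r["INCOME"] is None:
        r["INCOME"] = median_income

# 3. Drop columns whose share of missing values exceeds the threshold.
MAX_MISSING = 0.5
drop = [c for c in rows[0] if sum(r[c] is None for r in rows) / len(rows) > MAX_MISSING]
for r in rows:
    for c in drop:
        del r[c]

print(rows)
```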
Additionally, you can add new computed attributes in an effort to tease information closer to the surface of the data. This process is referred to as feature engineering. For example, rather than using the purchase amount, you can create a new attribute: "Number of Times Purchase Amount Exceeds $500 in a 12-Month Time Period." Whether a customer frequently makes large purchases may in turn be related to whether that customer responds to an offer.
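The engineered attribute in this example can be derived from a purchase history table. This sketch assumes hypothetical (customer, date, amount) records and approximates the 12-month window as the 365 days before a chosen as-of date.

```python
from datetime import date

# Hypothetical purchase history: (customer_id, purchase_date, amount in dollars).
purchases = [
    (1, date(2023, 2, 10), 620.0),
    (1, date(2023, 6, 3), 150.0),
    (1, date(2023, 11, 20), 900.0),
    (2, date(2023, 3, 15), 480.0),
    (2, date(2022, 7, 1), 1200.0),  # large, but outside the 12-month window
    (3, date(2023, 9, 9), 505.0),
]


def large_purchase_count(purchases, as_of, threshold=500.0, days=365):
    """Per customer: number of purchases above `threshold` in the `days` before `as_of`."""
    counts = {}
    for cust, when, amount in purchases:
        counts.setdefault(cust, 0)  # every customer gets a value, even if it is zero
        if 0 <= (as_of - when).days < days and amount > threshold:
            counts[cust] += 1
    return counts


feature = large_purchase_count(purchases, as_of=date(2024, 1, 1))
print(feature)
```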
Note:
Oracle Machine Learning supports Automatic Data Preparation (ADP), which greatly simplifies the process of data preparation.
To summarize, in this phase, you will:
- Clean, join, and select data
- Transform data
- Engineer new features
1.3.4 Develop Models
In this phase, you select and apply various modeling techniques and tune the algorithm parameters, called hyperparameters, to desired values.
If the algorithm requires specific data transformations, then you need to step back to the previous phase to apply them to the data. For example, some algorithms accept only numeric columns, so string categorical data must be "exploded" using one-hot encoding prior to modeling. In preliminary model building, it often makes sense to start with a sample of the data, since the full data set might contain millions or billions of rows. Getting a feel for how a given algorithm performs on a subset of data can help you identify data quality issues and algorithm setting issues sooner in the process, reducing time-to-initial-results and compute costs.
For supervised learning problems, data is typically split into train (build) and test data sets using an 80/20 or 60/40 split. After splitting the data, build the model with the desired model settings. Use the default settings or customize them by changing the model setting values. Settings can be specified through OML's PL/SQL, R, and Python APIs. Evaluate model quality through metrics appropriate for the technique. For example, use a confusion matrix, precision, and recall for classification models; RMSE for regression models; cluster similarity metrics for clustering models; and so on.
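Two of the preparatory steps described here, one-hot encoding a string column and splitting the rows into build and test sets, can be sketched in plain Python. The column names, the 80/20 fraction, and the fixed random seed are assumptions made so the example is reproducible; this is not OML's API.

```python
import random


def one_hot(rows, column):
    """Replace a string column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new)
    return encoded


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split them into train (build) and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical customer data with one categorical column.
customers = [
    {"id": i, "region": ["North", "South", "East"][i % 3], "bought": i % 2}
    for i in range(10)
]

encoded = one_hot(customers, "region")
train, test = train_test_split(encoded, train_fraction=0.8)
print(encoded[0])
print(len(train), len(test))
```

Deriving the category list from the whole table is harmless here because the codes are fixed, but in general an encoding should be learned from the build data so that no information from the test set leaks into the model.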
Automated Machine Learning (AutoML) features may also be employed to streamline the iterative modeling process, including algorithm selection, attribute (feature) selection, and model tuning and selection.
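AutoML automates a search over algorithms, attributes, and settings. The tuning part of that search can be sketched as a grid search: try each candidate value of one hyperparameter, score it on held-out data, and keep the best. The sketch assumes k of a k-nearest-neighbors classifier as the hyperparameter and a tiny hypothetical data set; it does not use or mirror OML's AutoML API.

```python
from collections import Counter

# Hypothetical 1-D build and validation data: (feature value, class label).
# The "b" at 3.5 is a deliberately noisy case inside the "a" region.
train = [(1, "a"), (2, "a"), (3, "a"), (3.5, "b"), (4, "a"),
         (6, "b"), (7, "b"), (8, "b"), (9, "b")]
valid = [(3.4, "a"), (3.6, "a"), (7.5, "b"), (2.5, "a")]


def knn_predict(train, x, k):
    """Majority label among the k build cases closest to x."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, cases, k):
    """Share of cases whose label the model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in cases) / len(cases)


def tune(train, valid, grid):
    """Score every candidate value of k on held-out data and keep the best one."""
    results = {k: accuracy(train, valid, k) for k in grid}
    best_k = max(grid, key=lambda k: results[k])  # the first best value wins ties
    return best_k, results


best_k, results = tune(train, valid, grid=[1, 3, 5])
print(best_k, results)
```

With the noisy case at 3.5, k = 1 memorizes the noise and gets half of the validation cases wrong, while k = 3 and k = 5 get all four right; the first of the tied values is kept.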
To summarize, in this phase, you will:
- Explore different algorithms
- Build, evaluate, and tune models
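The classification and regression metrics named for this phase can be computed from test-set results as follows. This is a plain-Python sketch with hypothetical labels and predictions; OML reports these metrics through its own APIs.

```python
import math


def confusion_matrix(actual, predicted, positive=1):
    """Counts of true/false positives and negatives for a binary classifier."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == positive and p == positive for a, p in pairs),
        "fp": sum(a != positive and p == positive for a, p in pairs),
        "fn": sum(a == positive and p != positive for a, p in pairs),
        "tn": sum(a != positive and p != positive for a, p in pairs),
    }


def precision_recall(cm):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def rmse(actual, predicted):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical test-set results for a classification model ...
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall(cm)

# ... and for a regression model (income in thousands).
income_true = [40.0, 55.0, 70.0]
income_pred = [43.0, 51.0, 70.0]
error = rmse(income_true, income_pred)

print(cm, precision, recall, round(error, 3))
```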
1.3.5 Evaluate
At this stage of the project, it is time to evaluate how well the model satisfies the originally stated business goal.
During this stage, you will determine how well the model meets your business objectives and success criteria. If the model is supposed to predict customers who are likely to purchase a product, then does it sufficiently differentiate between the two classes? Is there sufficient lift? Are the trade-offs shown in the confusion matrix acceptable? Can the model be improved by adding text data? Should transactional data such as purchases (market-basket data) be included? Should costs associated with false positives or false negatives be incorporated into the model?
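Two of these questions, whether there is sufficient lift and how error costs change the picture, can be explored numerically. The sketch assumes cumulative lift defined as the positive rate among the top-ranked fraction of cases divided by the overall positive rate, and a simple total cost in which each false positive and each false negative carries a fixed, hypothetical business cost. The probabilities and outcomes are made up for illustration.

```python
def cumulative_lift(scores, actual, fraction):
    """Positive rate among the top `fraction` of cases (by score) over the overall positive rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


def misclassification_cost(scores, actual, threshold, cost_fp, cost_fn):
    """Total cost of errors when cases scoring >= threshold are predicted positive."""
    fp = sum(1 for s, a in zip(scores, actual) if s >= threshold and a == 0)
    fn = sum(1 for s, a in zip(scores, actual) if s < threshold and a == 1)
    return fp * cost_fp + fn * cost_fn


# Hypothetical test set: model probabilities and actual purchases (1 = purchased).
probabilities = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
purchased = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

lift_top_20 = cumulative_lift(probabilities, purchased, 0.2)

# Hypothetical costs: a missed buyer (false negative) costs 5 times a wasted offer.
costs = {t: misclassification_cost(probabilities, purchased, t, cost_fp=1, cost_fn=5)
         for t in (0.5, 0.15)}
print(round(lift_top_20, 2), costs)
```

Targeting the top 20% of this test set finds buyers at 2.5 times the overall rate. With a false negative costing five times a false positive, lowering the decision threshold from 0.5 to 0.15 reduces the total cost from 7 to 4, which is the kind of trade-off this stage should weigh against the business goal.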
It is useful to perform a thorough review of the process and determine whether any important tasks or steps were overlooked. This step acts as a quality check on which you can base the next steps, such as deploying the project, initiating further iterations, or testing the project in a pre-production environment if constraints permit.
To summarize, in this phase, you will:
- Review business objectives
- Assess results against success criteria
- Determine next steps
1.3.6 Deploy
Deployment is the use of machine learning within a target environment. In the deployment phase, you can derive data-driven insights and actionable information.
Deployment can involve scoring (applying a model to new data), extracting model details (for example, the rules of a decision tree), or integrating machine learning models within applications, data warehouse infrastructure, or query and reporting tools.
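Extracting model details such as decision tree rules amounts to walking each root-to-leaf path of the tree. The sketch assumes a hypothetical tree stored as nested dictionaries, in which the left branch means `attribute <= threshold`; it is not how Oracle Machine Learning stores or exposes tree models.

```python
# Hypothetical fitted decision tree stored as nested dicts.
# Internal nodes split on an attribute; leaves carry a prediction.
tree = {
    "attribute": "education_years", "threshold": 15,
    "left": {"predict": "low_income"},
    "right": {
        "attribute": "age", "threshold": 30,
        "left": {"predict": "low_income"},
        "right": {"predict": "high_income"},
    },
}


def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and emit it as an IF ... THEN ... rule."""
    if "predict" in node:
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN {node['predict']}"]
    attr, t = node["attribute"], node["threshold"]
    # Left branch: attribute <= threshold; right branch: attribute > threshold.
    return (extract_rules(node["left"], conditions + (f"{attr} <= {t}",))
            + extract_rules(node["right"], conditions + (f"{attr} > {t}",)))


for rule in extract_rules(tree):
    print(rule)
```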
Because Oracle Machine Learning builds and applies machine learning models inside Oracle Database, the results are immediately available. Reporting tools and dashboards can easily display the results of machine learning. Additionally, Oracle Machine Learning supports dynamic, batch, and real-time scoring, including scoring a single case or record at a time. Data can be scored and the results returned within a single database transaction. For example, a sales representative can run a model that predicts the likelihood of fraud within the context of an online sales transaction.
To summarize, in this phase, you will:
- Plan enterprise deployment
- Integrate models with applications for business needs
- Monitor, refresh, retire, and archive models
- Report on model effectiveness