Machine Learning Security Considerations
It is important to understand the following security considerations when granting access to administrators and users.
CIC Advisor users don't have visibility into the following data:
- Data in source applications outside their access purview
- Training data in CIC Advisor
Furthermore, they don't have access to personal information (PI) data or the ML models, and they cannot change model code. At no point are the models exposed to organizations that could change access or inject malicious adjustments. Additionally, no PI is used in training or testing.
However, some cautions unique to security in machine learning are in order, as discussed below:
- The CIC Advisor administrator role is very powerful and therefore must be granted judiciously.
The CIC Advisor administrator role grants access to the CIC Advisor Administration application, which in turn gives administrators access to the ML Workbench page. On the ML Workbench page, administrators can view the models to be trained or retrained and determine which feature selections to enable or disable for each model. When a model is retrained with new data added to the training set, current predictions could change. Therefore, access to the administration application and the ML Workbench page should be limited and restricted.
- Administrators should be cautious of input poisoning.
Data used in training shapes future predictions, and malicious or bad data can lead to bad future predictions. CIC Advisor administrators should know which projects are opted into the system and which projects are used to train the models, since this determines prediction accuracy. Use security best practices, such as the Separation of Duty controls outlined in the Product/Service Feature Guide of Oracle CIC Advisor (Doc ID 114.2) on My Oracle Support, to ensure that those choosing the projects for CIC Advisor, which will also be used for training, opt in their target data appropriately; a minimal sketch of such an opt-in check follows this item.
Unintended or misleading source data can also affect outputs. CIC Advisor is delivered with multiple off-the-shelf Seed Models, which are trained with sample data. They are not ideal models to use, but they give your organization a good starting point for enabling the system and seeing a first round of predictions while you learn how to train with your own data.
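As a loose illustration of the opt-in principle above (this is not CIC Advisor code; the record structure, the `project_id` field, and the `opted_in_projects` set are hypothetical), a pre-training guard might exclude any record whose project has not been explicitly opted in:

```python
# Hypothetical pre-training guard: only records from explicitly opted-in
# projects are allowed into the training set. All names are illustrative.
from typing import Iterable


def filter_opted_in(training_records: Iterable[dict], opted_in_projects: set[str]) -> list[dict]:
    """Keep only records whose project was opted in by the person
    responsible for selecting projects (Separation of Duty)."""
    kept, rejected = [], []
    for record in training_records:
        if record.get("project_id") in opted_in_projects:
            kept.append(record)
        else:
            rejected.append(record)
    if rejected:
        # Surface anything that slipped in so an administrator can review it
        # before retraining, rather than silently training on it.
        print(f"Excluded {len(rejected)} record(s) from non-opted-in projects")
    return kept


# Example usage with made-up data:
records = [{"project_id": "P-100", "duration": 42},
           {"project_id": "P-999", "duration": 7}]
clean = filter_opted_in(records, opted_in_projects={"P-100"})
```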
- Irrelevant features can precipitate confounding and spurious correlations.
It is important to understand how certain features affect your predictions and how your data is reflected in the feature set. For example, if your organization does not track costs, you may want to make sure that no cost features are selected. To get a basic implementation of the models, you can choose SeedModel customerData; this model uses the Seed Model features with your data. Therefore, select only the features that are relevant to your data, as in the sketch after this item.
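The following sketch is illustrative only (the column names and the candidate feature list are assumptions, not CIC Advisor configuration); it shows the kind of screening an administrator might apply conceptually, dropping features that carry no signal for their data, such as a cost column when the organization records no costs:

```python
import pandas as pd


# Hypothetical feature screening: disable features that are empty or constant
# in your data, since they can only contribute noise or spurious correlations.
def relevant_features(df: pd.DataFrame, candidate_features: list[str]) -> list[str]:
    selected = []
    for name in candidate_features:
        column = df.get(name)
        if column is None or column.dropna().nunique() <= 1:
            # e.g. an organization without costs would see an all-empty or
            # all-zero cost column here and leave the feature disabled.
            continue
        selected.append(name)
    return selected


# Example with made-up data: "planned_cost" is all zeros, so it is dropped.
data = pd.DataFrame({"planned_cost": [0, 0, 0], "duration_days": [10, 25, 40]})
print(relevant_features(data, ["planned_cost", "duration_days"]))  # ['duration_days']
```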
- Data Privacy and Access Controls
The data used in training is protected within the models, and users have no access to it.
Users have access only to the dashboard unless they are administrators (CIC Advisor administrator); this is controlled by role-based permissions on the client side. Because a regular user does not have the administration role (CIC Advisor administrator), they cannot poison the models by introducing malicious scenarios into training.
Training and prediction are also controlled by administrators (CIC Advisor administrator), which enables controlled training and model execution.
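As a conceptual sketch of this kind of role gate (not the product's actual permission model; the role name and functions are hypothetical), a training request would be rejected unless the caller holds the administrator role:

```python
# Hypothetical role check: only callers holding the administrator role may
# trigger training or retraining. The role name is illustrative only.
ADMIN_ROLE = "CIC_ADVISOR_ADMIN"


def require_admin(user_roles: set[str]) -> None:
    if ADMIN_ROLE not in user_roles:
        raise PermissionError("Training and model execution require the administrator role")


def retrain_model(user_roles: set[str]) -> str:
    require_admin(user_roles)     # regular users are stopped here
    return "retraining started"   # placeholder for the actual training job


print(retrain_model({"CIC_ADVISOR_ADMIN"}))   # allowed
# retrain_model({"DASHBOARD_USER"})           # would raise PermissionError
```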
- Membership Inference Attack (MIA) / Model Robustness Attack (MRA)
These are inherent weaknesses in machine learning.
Machine learning is prone to new attack vectors such as the Membership Inference Attack (MIA), where the user of an ML model may be able to infer the training data. It is similarly prone to the Model Robustness Attack (MRA), where the user of an ML model may skew the inputs imperceptibly to cause large errors in prediction. For better security, CIC Advisor makes such attempts difficult by not exposing the model code or its hyperparameters. To further enhance privacy preservation, continuous efforts are made to have the models learn from the training data without memorizing it, and to enable defense mechanisms such as regularization, illustrated in the sketch after this item.
Additionally, the models are continuously hardened for robustness through multiple tests that verify accuracy does not deviate significantly from the baseline accuracy under various conditions.
They evolve through repeated training and testing on similar data but with different scenarios and data points, alongside ongoing customer usage.
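As a generic illustration of these two ideas (not CIC Advisor's implementation; the model, the synthetic data, and the drift threshold are all assumptions), the sketch below trains a regularized model so it generalizes rather than memorizes, then checks that accuracy on mildly perturbed inputs stays close to the baseline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data; real training data is never exposed to users.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# L2 regularization (the C parameter) discourages the model from memorizing
# individual training rows, one generic defense against membership inference.
model = LogisticRegression(C=0.5).fit(X_train, y_train)
baseline = model.score(X_test, y_test)

# Simple robustness probe: small input perturbations should not move accuracy
# far from the baseline; a large drop would flag a fragile model.
perturbed = X_test + rng.normal(scale=0.05, size=X_test.shape)
robust = model.score(perturbed, y_test)

assert baseline - robust < 0.05, "accuracy drifted too far from baseline"
print(f"baseline={baseline:.3f}, perturbed={robust:.3f}")
```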