About Data Labeling

Find out what Data Labeling is and how to use it.

Data labeling is the process of identifying properties (labels) of documents, text, and images (records), and annotating (labeling) them with those properties. The topic of a news article, the sentiment of a tweet, the caption of an image, important words spoken in an audio recording, and the genre of a video are all examples of data labels.

Many machine learning techniques require labeled data before they can be used to train a model to perform a task autonomously. Data labeling is thus an integral part of an Artificial Intelligence (AI) or Machine Learning (ML) project.

Data Labeling lets you create and browse datasets, view data records (documents, text, and images), and apply labels for use in building AI/ML models. Datasets can be exported as line-delimited JSON for use in machine learning model development. They are also accessible to, and interoperable with, other Data and AI services to support supervised training. For example, Oracle Cloud Infrastructure Language can be used to create specialized models, but only if labeled data is available to train them. Because Data Labeling lets you start labeling raw datasets with a minimal number of configuration steps, it also provides the data labeling experience for Oracle Cloud Infrastructure AI services.
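
As a rough illustration of working with a line-delimited JSON export, the following Python sketch parses one JSON object per line and tallies the labels it finds. The export schema is not described here, so the field names `record_name` and `labels`, and the sample lines themselves, are hypothetical placeholders rather than the service's actual format.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_labels(records: Iterable[dict], label_key: str = "labels") -> Counter:
    """Count how often each label occurs.

    `label_key` is a hypothetical field name; adjust it to match
    the fields found in a real export.
    """
    counts: Counter = Counter()
    for record in records:
        counts.update(record.get(label_key, []))
    return counts


# Hypothetical sample lines (NOT the actual export schema).
SAMPLE = [
    '{"record_name": "article-1.txt", "labels": ["sports"]}',
    '{"record_name": "article-2.txt", "labels": ["politics", "economy"]}',
    '{"record_name": "article-3.txt", "labels": []}',
    '',
    '{"record_name": "article-4.txt", "labels": ["sports"]}',
]

if __name__ == "__main__":
    for label, n in count_labels(read_jsonl(SAMPLE)).most_common():
        print(label, n)
```

To process a real export, the same functions could be given an open file handle in place of `SAMPLE`, because iterating over a text file yields one line at a time.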

For the supported file types and content types for documents, text, and images, see Supported File Formats.

Datasets are the core resource available in Data Labeling. They consist of data records and their associated labels. Each data record represents a document, a single image, or a piece of text. Labels are strings of text, which become annotations when associated with a data record. Annotations can carry other associated data; for object detection, for example, an annotation includes bounding box coordinates. Data records can exist without an annotation. Datasets can be exported as a JSON manifest for use as an input to machine learning model development.
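
The relationship between datasets, data records, labels, and annotations can be sketched as a small Python data model. This is only an illustrative model under stated assumptions, not the service's API or storage format: the class names, the `labels` list on the dataset, and the bounding box fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    """Hypothetical box coordinates used for object detection."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class Annotation:
    """A label associated with a record, plus optional extra data."""
    label: str
    bounding_box: Optional[BoundingBox] = None


@dataclass
class DataRecord:
    """A document, a single image, or a piece of text."""
    name: str
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Dataset:
    """Data records together with the labels that can be applied to them."""
    name: str
    labels: List[str]
    records: List[DataRecord] = field(default_factory=list)

    def annotate(self, record: DataRecord, label: str,
                 bounding_box: Optional[BoundingBox] = None) -> Annotation:
        """Turn a label into an annotation by attaching it to a record."""
        if label not in self.labels:
            raise ValueError(f"unknown label: {label!r}")
        annotation = Annotation(label, bounding_box)
        record.annotations.append(annotation)
        return annotation

    def unlabeled(self) -> List[DataRecord]:
        """Records can exist without any annotation."""
        return [r for r in self.records if not r.annotations]


if __name__ == "__main__":
    cat = DataRecord("cat.jpg")
    dog = DataRecord("dog.jpg")
    ds = Dataset("pets", labels=["cat", "dog"], records=[cat, dog])
    ds.annotate(cat, "cat", BoundingBox(0.1, 0.2, 0.6, 0.9))
    print([r.name for r in ds.unlabeled()])
```

In this model a label stays a plain string until `annotate` attaches it to a record, which mirrors the distinction drawn above between labels and annotations.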

Watch a video introduction to the service.

To use Data Labeling:
  1. Set up the service, which includes creating buckets in Object Storage and setting up your user policies.
  2. Create a dataset.
  3. Generate records in your dataset.
  4. Add labels to your documents, images, or pieces of text.
  5. Export the dataset to Object Storage for use elsewhere.