About exporting project data

You can export data from a project view and from a Results Table component. An export creates a file based on your data that you can either store in HDFS or download to your computer.

The exported data is based on the current refinement state of your project, so it does not include attributes that have been filtered out or hidden. Before you export, you can preview a small sample of the exported data and see a list of the attributes that will not be included.
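
The following minimal sketch models how an export could be assembled from the current project state. It is not Studio's implementation. It assumes that records are Python dictionaries keyed by attribute name, that the refinement state can be represented as a record filter, and that hidden or filtered-out attributes are given as a set of names. The function `prepare_export` and its parameters are hypothetical.

```python
from typing import Callable, Iterable

Record = dict[str, object]

def prepare_export(
    records: Iterable[Record],
    refinement: Callable[[Record], bool],  # hypothetical stand-in for the refinement state
    excluded: set[str],                    # hypothetical set of hidden or filtered-out attributes
    preview_size: int = 5,
) -> tuple[list[Record], list[Record], list[str]]:
    """Return (rows to export, preview sample, sorted names of excluded attributes)."""
    rows = [
        {key: value for key, value in rec.items() if key not in excluded}
        for rec in records
        if refinement(rec)
    ]
    return rows, rows[:preview_size], sorted(excluded)

records = [
    {"Region": "East", "Sales": 10, "Notes": "a"},
    {"Region": "West", "Sales": 7, "Notes": "b"},
]
rows, preview, skipped = prepare_export(records, lambda r: r["Region"] == "East", {"Notes"})
print(rows)     # [{'Region': 'East', 'Sales': 10}]
print(skipped)  # ['Notes']
```

The preview is simply the first few rows of the same result, so what you see before exporting matches what the file will contain.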

When you export project data, you must specify whether you want the file exported to HDFS or your computer. If you export to HDFS, you must specify a directory to save the file to. If you export to your computer, the file is automatically downloaded once it has been generated.

You must also select the format of the exported data, which can be one of the following:
  • Delimited
  • Avro
  • A Hive table. This option is only available when you export to HDFS.

Note:

The exported data uses attribute display names by default. However, if you are exporting to an Avro file, any attribute display names that contain invalid Avro characters will be replaced with their attribute keys.
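
As an illustration of the fallback described in the note, the sketch below checks a display name against the naming rule in the Apache Avro specification (a name starts with a letter or underscore and contains only letters, digits, and underscores) and uses the attribute key otherwise. The function name `avro_field_name` is hypothetical, and the sketch assumes the attribute key is itself a valid Avro name; how Studio performs this check internally is not documented here.

```python
import re

# Avro specification naming rule: start with [A-Za-z_], then only [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def avro_field_name(display_name: str, attribute_key: str) -> str:
    """Use the display name if it is a valid Avro name; otherwise fall back to the key."""
    if AVRO_NAME.fullmatch(display_name):
        return display_name
    return attribute_key  # assumed to be a valid Avro name already

print(avro_field_name("Unit_Price", "unit_price"))      # Unit_Price
print(avro_field_name("Unit Price ($)", "unit_price"))  # unit_price
```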

Transposing attributes

If the data set you're exporting contains multi-value attributes, you can have some or all of them transposed. Transposed attributes are replaced with sets of Boolean attributes representing their possible values.

Note:

You cannot transpose attributes when exporting to an Avro file.

For example, consider a multi-value attribute named Color that has the unique values "Red", "Blue", and "Green". If the Color attribute were transposed, it would be replaced in the exported data by the Boolean attributes Red, Blue, and Green. A record that originally had its Color attribute set to "Red, Blue" would now contain the following:
  • A Red attribute set to "True"
  • A Blue attribute set to "True"
  • A Green attribute set to "False"
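
The Color example can be expressed as a short sketch. It assumes a multi-value attribute is stored as a Python list on each record and uses the strings "True" and "False" to match the example above; the function `transpose` is hypothetical and does not handle a new column whose name collides with an existing attribute.

```python
def transpose(records, attribute, values):
    """Replace a multi-value attribute with one Boolean column per listed value."""
    out = []
    for rec in records:
        present = set(rec.get(attribute, []))
        # Copy every attribute except the one being transposed.
        new = {key: value for key, value in rec.items() if key != attribute}
        for value in values:
            new[value] = "True" if value in present else "False"
        out.append(new)
    return out

result = transpose([{"ID": 1, "Color": ["Red", "Blue"]}], "Color", ["Red", "Blue", "Green"])
print(result)  # [{'ID': 1, 'Red': 'True', 'Blue': 'True', 'Green': 'False'}]
```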

Because a multi-value attribute can have many possible values, you can limit the number of new columns added for each transposed attribute. Columns are added only for the top values, up to that limit.
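
A minimal sketch of applying such a limit, assuming that "top values" means the most frequent values across all records (this interpretation is an assumption, not confirmed by the text above). The function `top_values` is hypothetical; its result is the list of values that would each receive a Boolean column.

```python
from collections import Counter

def top_values(records, attribute, limit):
    """Return up to `limit` values of a multi-value attribute, most frequent first."""
    counts = Counter(value for rec in records for value in rec.get(attribute, []))
    return [value for value, _ in counts.most_common(limit)]

records = [
    {"Color": ["Red", "Blue"]},
    {"Color": ["Red"]},
    {"Color": ["Green", "Red", "Blue"]},
]
print(top_values(records, "Color", 2))  # ['Red', 'Blue']
```

With a limit of 2, Green (which appears in only one record) gets no column.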

Problems viewing non-ASCII attribute values

If the data set you export contains attributes with non-ASCII values, those values may appear corrupted when you open the exported CSV file. This affects, for example, Chinese, Japanese, and Korean values, as well as values in other non-ASCII language encodings.

To work around this issue:
  1. Export the data set from Studio.
  2. Instead of opening the CSV file, save the file locally.
  3. Start Microsoft Excel and import the CSV file with UTF-8 encoding. In Excel, select Data > Get External Data > From Text, select the file and click Import, and then select UTF-8 in the import wizard.

After the import, the attribute values display correctly.
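
The cause of the corruption can be demonstrated in a few lines. This sketch assumes the exported file is UTF-8 encoded (which the workaround implies) and uses Latin-1 as a stand-in for a legacy single-byte encoding; the encoding Excel actually assumes when opening a CSV file directly depends on the system locale.

```python
import csv
import io

# Bytes as they might appear in an exported, UTF-8 encoded CSV file.
data = "Name,City\n山田,東京\n".encode("utf-8")

# Reading the bytes with a single-byte encoding splits each CJK character
# into several unrelated characters, so the original text is lost.
print("東京" in data.decode("latin-1"))  # False

# Decoding explicitly as UTF-8, as in the Excel import step, restores the values.
rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
print(rows[1])  # ['山田', '東京']
```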