Refining your data quality before importing

You can explore your imported data to evaluate its quality and usability. You can then prepare your data to improve its quality before you import it into the application. For example, you can remove white space, filter your data, delete duplicate rows or cells, find and replace data, concatenate columns, and so on. You can refine the data from different sources to standardize the format for all the data. You can also use data wrangling as a data discovery tool and to bypass traditional extraction, transformation, and loading (ETL) methods.

On the Wrangle Your Data page, you work with sample data (a subset of your actual data) to build a script of functions that will clean and prepare your data, also known as data wrangling.

To build the script:

  1. Select data. This can be a data column or part of a cell.

  2. Select a function to apply to the selected data. For example, you can split the selection into multiple columns or change the data type.

  3. The function and data are added to the script.

  4. Repeat the steps above until you have refined and improved your data.

  5. You can view all the changes that you made to your data, by scrolling through the script. To undo a function and all the functions applied after it, click the function in the History Script area. Similarly, you can redo functions by clicking a function in the History Script area. All undone functions applied before the selected function are re-applied.

  6. Save the completed script.

The script is saved with the dataset and therefore in the dashboard. The completed script is applied to your data when you publish the cube. You can also export the script, to save the functions in order. You can then import the saved script to apply to another set of data. Steps for both are included in the procedure.

You can:

If you republish your data, the data wrangling steps are applied to the new data. For example, if you changed a text column to all uppercase, the same text column will be displayed as uppercase in the republished data. For background information on republish and refreshing your data, see Republishing imported data.

Prerequisites

To refine your data before importing it:

  1. On the Preview page, click Wrangle. The Wrangle Your Data page opens, displaying a sample of your dataset.

  2. You can import a saved script of functions to apply to your data. To import a script, follow the steps below:

  1. Click Import in the History Script area. The Import History Script dialog box opens.

  2. Do one of the following:

  3. To import a saved script, click Browse.

  4. Copy a script into the interface.

  1. Click Apply. The functions are applied to your data sample and you are returned to the Wrangle Your Data page.

  2. If you do not want to make any additional changes to the data, click OK to apply the script, as displayed in the History Script area, to your actual data. You are returned to the Preview page.

  1. Select a data column or part of a cell to apply a function to. A list of suggested functions is displayed, based on the selected data.

  2. Select a function to apply to the selected data, from either the suggested functions or the Select Function drop-down list.

The sample data is updated to preview the effect of the function.

  1. To work with a smaller subset of data, filter the data using the following steps:

  1. Select the data to work with.

  2. Select one of the following from the Select Function drop-down list:

  3. To filter text by selecting values from the data, select Text Facet. A list of the text values in the selected data is displayed. To filter the selection, click the text value to display in the selection. You can instead display all values except the selected value, by clicking Exclude. This option can be helpful when you do not know the exact values in your data.

  4. To filter text by typing values, select Text Filter. Type the text to filter by.

  5. To filter numeric data, select Numeric Facet. A graph is displayed of the numbers in the selected data. Move the start point and end point of the graph to filter the range of values to display in the selection.

  6. To filter data consisting of dates, select Timeline Facet. A graph is displayed of the dates in the selected data. Move the start point and end point of the graph to filter the range of dates to display in the selection.

Rows are not deleted when you use a facet function; they are only hidden from the data preview.

  1. You can apply functions to the filtered data, as described above.

  2. To undo a specific filter, click the arrow icon in the title of the filter, and select Reset.

  3. To undo all filters, click the arrow icon of any filter title, and select Reset ALL Facets.

  4. To delete a specific filter, click the arrow icon in the title of the filter, and select Delete.

  5. To delete all filters, click the arrow icon of any filter title, and select Delete ALL Facets.

  1. To group similar data into a cluster and edit the entire cluster at once, complete the following steps:

  1. Select the data to work with.

  2. From the Select Function drop-down list, select Cluster and Edit. The Cluster and Edit dialog box opens. The selected data has been grouped into clusters of related data.

  3. To change the algorithm that determines the clusters, select a different algorithm from the Algorithm drop-down list.

  4. To replace all the cluster’s values with a new value, complete the following steps:

  1. Select the check box of the cluster.

  2. In the New Cell Value field, type the replacement value.

  1. Repeat the step above for each cluster to update.

  2. Do one of the following:

  3. To replace the values of the selected clusters and recluster the data, click Merge and Recluster. Repeat the steps above as needed.

  4. To replace the values of the selected clusters and continue refining data, click Merge and Close. You are returned to the Refine Your Data page.

  1. To edit data directly in a cell, complete the following steps:

  1. Hover your cursor over the cell to edit and click the Edit icon

  2. Make your changes in the dialog box that opens.

  3. You can choose to change the selected cell or the entire column. For example, if you select a cell that contains Books and change it to Book, you can choose to change only that cell. If you apply the change to all the cells, all the Books cells in the column are changed to Book. Select one of the following:

  4. To make the changes to the selected cell only, click Apply.

  5. To make the changes to all cells in the column of the selected cell, click Apply to All.

  1. To undo a function, click the function in the History Script area. All functions applied after the selected function are also undone.

  2. To redo a function, click the function in the History Script area. All undone functions applied before the selected function are also re-applied.

  3. To undo all the actions in the script, click Reset to Initial Data in the History Script area. Your data is returned to its original state. The history script is not deleted until you select another function, which causes a new script to begin.

  4. You can export the script, to save the functions in order. You can then import the saved script to apply to another set of data. To export the script, complete the following steps:

  1. Click Export in the History Script area. The Extract History Script dialog box opens, displaying the full script.

  2. Click Save to save the script into a file, to save the functions for later use.

  3. Click Close to return to the Wrangle Your Data page.

  1. When you have completed refining and cleaning your data, click OK to apply the script, as displayed in the History Script area, to your actual data. You are returned to the Preview page.

Related topics

Previewing your data and specifying data import options

About importing data

Best practices: Importing data

 

 

_____________________________