Extract Content from Excel

Design Studio has three steps for extracting content from a spreadsheet:

  • The Extract Cell step is used to extract text content from the found range.
  • The Extract Sheet name step is used to extract the sheet name of the sheet of the found range.
  • The Extract As HTML step is used to extract the found range of a spreadsheet as an HTML page containing a table with the cells of the range into a variable.

For the Extract Cell and Extract As HTML steps you can specify what to extract from the cells. This is controlled by the value of the Extract This option. The choice here is the same as the View Modes for the Spreadsheet View. The possible options are described in this topic.

Formatted Values

The extracted values are what you see in Excel and the values of dates and numbers are extracted formatted, which means that numbers may have fewer decimals than the actual values of the cells.

Plain Value

The extracted values are the actual values that Excel would show if the values of the cells were not formatted. For example, numbers would not have rounding of decimals.

Formulas

If a cell contains a formula, it is extracted or otherwise, it is the same value as for the Plain Values option is extracted.

If you create the steps by right-clicking the Spreadsheet View, the value of Extract This is set to the value of the selected View Mode. If you set the View Mode to Formulas and then right-click in the page view and select Extract > Extract Text from the context menu (into a text variable), the Extract This option of the Extract Cell action step is set to Formulas.

You may need to reformat (or normalize) the extracted content, and the Extract Cell action allows you to do this by configuring a list of data converters.

To do so, in the Spreadsheet view, right-click to create a step. Select the desired extract step and specify necessary parameters.

Shared formulas in Excel files

The built-in Excel driver does not support documents with shared formulas. A shared formula is a cell with a formula that is automatically copied to other cells. Any operation that changes the structure of an Excel document containing shared formulas, such as adding or removing rows, may lead to errors in this document.

This limitation is only observed in Excel documents created outside Design Studio. An Excel file created with a robot cannot contain shared formulas.

Workaround: Ensure that your Excel document does not contain any shared formulas. When copying a formula cell to many cells, do not copy it to multiple cells at once; instead, copy the formula to one cell at a time.

Alternatively, you can use the convertSharedFormulas.snippet file included in the Snippets folder of your Kofax RPA installation to convert any shared formulas in your Excel document. The snippet performs these steps:

  1. Takes an Excel document containing shared cells.
  2. Loops over all formula cells.
  3. For each of these cells, extracts the formula and set it on the cell again.