Store Data in Databases

This section explains how Kofax RPA database storage works.

Object Keys

The tables you create for a type in a database have a column for each of the attributes in your type, plus an additional 7 household fields, named: ObjectKey, RobotName, ExecutionId, FirstExtracted, LastExtracted, ExtractedInLastRun, and LastUpdated. The most important field is ObjectKey, as it is the primary key for the table.

Note The reason for the name "ObjectKey" is found in the terminology previously used in Kofax RPA. Previously, types and variables were called "objects." To adhere to the new terminology, "ObjectKey" should be called "ValueKey." Renaming it would cause quite a lot of backward compatibility problems, though, and therefore it has been allowed to keep its old name.

The ObjectKey for a type is what uniquely identifies values extracted from variables of that type when stored in a database. You have to figure out what uniquely identifies values of the type. If you are building a car repository, the VIN number may be enough to provide unique identification of each car. If you are collecting baseball results, you may need the year, team names, ballpark, and date to uniquely identify each match.

As you build the type you can select how the ObjectKey is going to be calculated. This is done by checking the "part of database key" option when creating a new attribute. For our car example, the VIN number would be the only attribute marked as part of the database key, for the baseball match example, the attributes year, team names, ballpark, and date would all be marked as part of database key.

The robot developer may also specify the key directly on the Store in Database action, to override the default algorithm defined on the type.

The attributes that are not part of database key are sometimes referred to as non-key fields. For example, the car might have a price attribute, but even if the price changed we would still consider it the same car.

Store in Database

Kofax RPA provides three actions for managing values in a database: Store in Database, Find in Database, Calculate Key, Delete from Database, Query Database, and Execute SQL. The Find and Delete actions are simple, but Store in Database does more than just store the value.

Store in Database may insert a new value into the table, or update an existing value that was previously stored. Here is a list of exactly what happens.

  1. When storing the value of some variable, the ObjectKey is calculated based on the variable's values of the attributes which in the variable's type are marked Part of Database Key. If the robot developer specifies a key on the action, this key is used instead.

  2. Using the calculated key, a check is made to see if the value already exists in the database.
  3. If the value does not exist, a new row is inserted into the database (under this ObjectKey).
  4. If the value already exists, it is updated, and all the non-key attributes are written to the table (under this ObjectKey).
Household fields

Whenever a value is inserted all the 7 household fields are updated. On update, only some fields change. The following table provides an overview.

Field

Description

Changed on

ObjectKey

The primary key for this value

Insert

RobotName

The name of the robot that stored this value.

Insert and Update

ExecutionId

The execution id for the robot execution that stored this value.

Insert and Update

FirstExtracted

The first time the value was stored.

Insert

LastExtracted

The last time the value was stored.

Insert and Update

LastUpdated

The date when the value was last updated.

Update

ExtractedInLastRun

If the value was extracted in the latest run (uses 'y' and 'n').

Insert and Update

After each robot execution (in which the robot used Store in Database), all values previously collected by this robot, but not stored during this run, will have ExtractedInLastRun set to "n" and LastUpdated set to "now", indicating that the value was not found on the website during the latest run.

Note If a value was found in the previous run, but no non-key fields have changed, then LastUpdated is not updated. However, if the value was not found in the previous run, but in a run prior to that, LastUpdated is updated even if the non-key fields have not changed. This means that the value was deleted from the site and then reappeared later.
Harvest Tables

The tables created by Kofax RPA are often referred to as harvest tables, as the robots are harvesting data into them.

To find out what information was available on a website the last time the robot was run, you can use the following SQL command:

SELECT * FROM table WHERE ExtractedInLastRun = 'y'

If you are running queries against a table at the same time as a robot is storing data into the table, the result is comprised of data from the previous run, mixed with whatever data the executing robot has stored so far. We recommend that you copy the data out of the harvest tables and into a different set of production tables, so you can run queries against a stable data set.

There are many solutions where robots are used to store data in a database, but most of them fall under one of the three scenarios listed in the following table.

Scenario

Description

Repository matching website (small data sets)

The idea is to have a repository that matches the items on a website 1-to-1.

The easiest way to accomplish this is have a truncated production table (deleting all rows) every time the robot is done executing, and then copy every record where ExtractedInLastRun='y' from the harvest table into this table. This works well for small data sets.

Repository matching website (large data sets)

Same as above, but the data set is too large to copy all data after every robot execution. Instead we want to update the production table after each robot execution, based on the changes that occur.

This is where the LastUpdated field comes in handy. All values that have been updated have a LastUpdated field value larger than the start time of the robot. You can get the start time from the database logging tables, or you can have the robot store it somewhere.

To detect deleted values, use the following command:

SELECT * FROM table WHERE LastUpdated > 'StartTime' AND ExtractedInLastRun = 'n'

To detect new values:

SELECT * FROM table WHERE LastUpdated > 'StartTime' AND ExtractedInLastRun = 'y' AND FirstExtracted > 'StartTime'

To detect updated values

SELECT * FROM table WHERE LastUpdated > 'StartTime' AND ExtractedInLastRun = 'y' AND FirstExtracted < 'StartTime'

Then update your Production table accordingly.

Historic data

The default setup allows you to see when a value was first extracted and when it was last updated, but you cannot see which run of the robot the value was found in.

In this case, you should copy all the data from your harvest table into another table after the robot run, but in your new table the ObjectKey should not be a primary key. Instead, create an extra column called RUN_ID and use it together with the ObjectKey to create a compound primary key. If you do not need a RUN_ID you could simply create an auto-incremented column and use that as the primary key of your secondary table. Truncate the harvest table before each run.

You do not have to copy all the household fields to your production table; only the ObjectKey is required for you to update your production tables

Concurrency Considerations

If you have multiple robots storing values of the same type to the same database, be aware of the following considerations.

  • Every time a value is stored, the RobotName column is updated. If you have two robots storing the same value (as identified by ObjectKey), only the last one will show after the robots are done executing.
  • If two robots store the same value at exactly the same time, you get an error. They both find that the value is not in the table and try to insert it, but only one of them will succeed. In most cases, the error can be ignored because it is the same value.
  • If you run the same robot twice at the same time and the robot stores data in a database, you break the way the ExtractedInLastRun column is used. When the first robot is done executing, it updates the ExtractedInLastRun to "n" for all values it has not stored. This includes all values stored by the second robot so far. Later when the second robot finishes, it sets ExtractedInLastRun to "n" for all values stored by the first robot, completely negating the first run.
Value Relations

The storage system does not provide an automated way of managing relations between values. If you have a value of type Person and one of type Address, and you want to link them, you have to maintain this link.

The easiest way to create a link is to have the ObjectKey of the Person value be a foreign key in the Address value that should be linked to this person.

If the ObjectKey is calculated automatically from the type, you can use the Calculate ObjectKey action to generate the key and assign it to each of the address values before you store them.

You should be careful when building robots with connections between stored values. If an error occurs when you store the Person value, make sure that no Address values are stored.

ObjetKey Caveats

If you are using MySQL or Oracle, review these important ObjectKeys rules.

  • On Oracle empty string is stored as null.
  • MySQL does not have millisecond precision in timestamps.

These two cases result in a potential loss of data when the data is stored in the database. The ObjectKey is calculated inside the robot based on the data in the given variable. If you later load the value from the database and try to recalculate the ObjectKey, the ObjectKey is different if data loss occurred in any of the attributes marked as part of database key.