Tag Finders

A Tag Finder is used to find a tag on an HTML/XML page. Tag Finders are used in steps, where they define how to find the tag(s) to which the step should be applied. The list of Tag Finders of the current step is located in the "Finders" tab in the Step View. Steps that work on spreadsheet content use Range Finders rather than Tag Finders.

Understanding Tag Paths

To use Tag Finders, you need to understand the concept of a tag path. A tag path is a compact text representation of where some tag is located on a page. Consider this tag path:

html.body.div.a

This tag path refers to an <a>-tag inside a <div>-tag inside a <body>-tag inside an <html>-tag.

A tag path can match more than one tag on the same page. For example, the above tag path will match all of the <a>-tags on this page, except the third one:

<html>
  <body>
    <div>
      <a href="url...">Link 1</a>
      <a href="url...">Link 2</a>
    </div>
    <p>
      <a href="url...">Link 3</a>
    </p>
    <div>
      <a href="url...">Link 4</a>
      <a href="url...">Link 5</a>
      <a href="url...">Link 6</a>
    </div>
  </body>
</html>

You can use indexes to refer to specific tags among tags of the same type at that level. Consider this tag path:

html.body.div[1].a[0]

This tag path refers to the first <a>-tag in the second <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 4" <a>-tag. Note that indexes start from 0. If no index is specified for a given tag on a tag path, the path matches any tag of that type at that level, as we saw in the first tag path above. If the index is negative, the matching tags are counted backwards, i.e. starting with the last matching tag which corresponds to index -1. Consider this tag path:

html.body.div[-1].a[-2]

This tag path refers to the second-to-last <a>-tag in the last <div>-tag in a <body>-tag inside an <html>-tag. So, on the page above, this tag path would only match the "Link 5" <a>-tag.
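
The indexing rules above can be illustrated with a small Python sketch. It is not the product's implementation: the function names parse_step and find_by_tag_path are hypothetical, the page is parsed with Python's standard xml.etree.ElementTree, and the sketch assumes that an index counts tags of the same type within each parent tag.

```python
import xml.etree.ElementTree as ET

PAGE = """<html>
  <body>
    <div>
      <a href="url...">Link 1</a>
      <a href="url...">Link 2</a>
    </div>
    <p>
      <a href="url...">Link 3</a>
    </p>
    <div>
      <a href="url...">Link 4</a>
      <a href="url...">Link 5</a>
      <a href="url...">Link 6</a>
    </div>
  </body>
</html>"""


def parse_step(step):
    """Split 'div[1]' into ('div', 1); a step without an index gives (name, None)."""
    if step.endswith("]"):
        name, _, index = step[:-1].partition("[")
        return name, int(index)
    return step, None


def find_by_tag_path(root, tag_path):
    """Return all elements matched by a dotted tag path (no '*' or '|' support)."""
    steps = [parse_step(step) for step in tag_path.split(".")]
    first_name, _ = steps[0]
    current = [root] if root.tag == first_name else []
    for name, index in steps[1:]:
        matched = []
        for parent in current:
            # Assumption: indexes count tags of the same type inside one parent.
            same_type = [child for child in parent if child.tag == name]
            if index is None:
                matched.extend(same_type)      # no index: every tag of that type
            elif -len(same_type) <= index < len(same_type):
                matched.append(same_type[index])  # negative counts from the end
        current = matched
    return current


root = ET.fromstring(PAGE)
for path in ("html.body.div.a", "html.body.div[1].a[0]", "html.body.div[-1].a[-2]"):
    print(path, [a.text for a in find_by_tag_path(root, path)])
```

Python's list indexing happens to follow the same convention as tag paths (0 is the first item, -1 is the last), which keeps the sketch short.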

You can use an asterisk ('*') to mean any number of tags of any type, and a vertical bar ('|') to separate alternative tag types at one level. Consider this tag path:

html.*.p|div|td.a

This tag path refers to an <a>-tag inside a <p>-, <div>-, or <td>-tag located anywhere inside an <html>-tag.
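
One way to picture the asterisk and the alternatives is to translate the tag path into a regular expression that is tested against each tag's chain of ancestors. The Python sketch below does this under some assumptions: the names tag_path_to_regex and find_by_tag_path are hypothetical, indexes are not handled, and '*' is taken to match zero or more intermediate tags.

```python
import re
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div><a>In div</a></div>
<p><a>In p</a></p>
<table><tr><td><a>In td</a></td></tr></table>
<span><a>In span</a></span>
</body></html>"""


def tag_path_to_regex(tag_path):
    """Translate a tag path into a regex over 'html/body/div/a/' style strings."""
    parts = []
    for step in tag_path.split("."):
        if step == "*":
            # Assumption: '*' stands for zero or more tags of any type.
            parts.append(r"(?:[^/]+/)*")
        else:
            alternatives = "|".join(re.escape(name) for name in step.split("|"))
            parts.append(f"(?:{alternatives})/")
    return re.compile("".join(parts))


def find_by_tag_path(root, tag_path):
    """Return every element whose ancestor chain matches the tag path."""
    pattern = tag_path_to_regex(tag_path)
    matches = []

    def walk(element, ancestors):
        path = ancestors + element.tag + "/"
        if pattern.fullmatch(path):
            matches.append(element)
        for child in element:
            walk(child, path)

    walk(root, "")
    return matches


root = ET.fromstring(PAGE)
print([a.text for a in find_by_tag_path(root, "html.*.p|div|td.a")])
```

The <a>-tag inside the <span>-tag is not reported, because <span> is not among the alternatives listed before the final step.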

In a tag path, text on a page is referred to just as any other tag, using the keyword "text". Although text is not technically a tag, it is treated and viewed as such in a tag path. For example, consider this HTML:

<html>
  <body>
    <a href="url...">Link 1</a>
    <a href="url...">Link 2</a>
  </body>
</html>

The tag path "html.body.a[1].text" would refer to the text "Link 2".
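
The "text" keyword can be sketched in Python as a final step that returns the text found directly inside the matched tags. This is only an illustration: find_texts is a hypothetical name, the sketch handles only paths that end in "text", and it looks only at the text that comes before any child tag.

```python
import xml.etree.ElementTree as ET

PAGE = """<html>
  <body>
    <a href="url...">Link 1</a>
    <a href="url...">Link 2</a>
  </body>
</html>"""


def find_texts(root, tag_path):
    """Resolve a tag path ending in 'text' to a list of text strings."""
    *tag_steps, last = tag_path.split(".")
    if last != "text" or not tag_steps:
        raise ValueError("this sketch only handles paths ending in 'text'")
    current = [root] if root.tag == tag_steps[0] else []
    for step in tag_steps[1:]:
        # 'a[1]' -> name 'a', index '1'; a plain 'a' gives an empty index.
        name, _, index = step.rstrip("]").partition("[")
        matched = []
        for parent in current:
            same_type = [child for child in parent if child.tag == name]
            if index == "":
                matched.extend(same_type)
            elif -len(same_type) <= int(index) < len(same_type):
                matched.append(same_type[int(index)])
        current = matched
    # Treat the text inside each matched tag as if it were a tag of its own.
    return [el.text.strip() for el in current if el.text and el.text.strip()]


root = ET.fromstring(PAGE)
print(find_texts(root, "html.body.a[1].text"))
```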

Tag Finder Properties

A Tag Finder can be configured using the following properties.

Find Where

In this property, you can specify where to find the tag relative to a named tag. The default value is "Anywhere in Page", meaning that named tags are not used to find the tag.

Tag Path

In this property, you can specify the tag path as described in the previous section. The tag path can be specified in several ways using the Value Selector.

Attribute Name

In this property, you can specify that the tag must have a specific attribute, for example "align".

Attribute Value

In this property, you can specify that the tag must have an attribute with a specific value. If the Attribute Name property is set, the attribute value is bound to that specific attribute name.

Attribute values are case-sensitive. The value can be matched in the following ways:

  • "Equals Text" specifies that the attribute value must match a specified text. Note that the text must match the entire attribute value.
  • "Contains Text" specifies that the attribute value must contain a specified text.
  • "Starts With Text" specifies that the attribute value must start with a specified text.
  • "Ends With Text" specifies that the attribute value must end with a specified text.
  • "Matches Pattern" specifies that the attribute value must match a specified pattern. Note that the pattern must match the entire attribute value.
  • "Does Not Equal Text" specifies that the attribute value must not be equal to a specified text.
  • "Does Not Contain Text" specifies that the attribute value must not contain a specified text.
  • "Does Not Start With Text" specifies that the attribute value must not start with a specified text.
  • "Does Not End With Text" specifies that the attribute value must not end with a specified text.
  • "Does Not Match Pattern" specifies that the attribute value must not match a specified pattern.
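
The ten options above can be summarized in a short Python sketch. The function name attribute_value_matches is hypothetical, and the sketch assumes that patterns behave like Python regular expressions, which may differ from the pattern syntax actually used by Tag Finders.

```python
import re

# Positive options, keyed by the names used in the list above.
POSITIVE = {
    "Equals Text": lambda value, text: value == text,
    "Contains Text": lambda value, text: text in value,
    "Starts With Text": lambda value, text: value.startswith(text),
    "Ends With Text": lambda value, text: value.endswith(text),
    # The pattern must match the entire attribute value, hence fullmatch.
    "Matches Pattern": lambda value, text: re.fullmatch(text, value) is not None,
}

# Each negative option is the logical opposite of one positive option.
NEGATED = {
    "Does Not Equal Text": "Equals Text",
    "Does Not Contain Text": "Contains Text",
    "Does Not Start With Text": "Starts With Text",
    "Does Not End With Text": "Ends With Text",
    "Does Not Match Pattern": "Matches Pattern",
}


def attribute_value_matches(value, option, text):
    """Apply one match option to an attribute value (case-sensitive)."""
    if option in NEGATED:
        return not POSITIVE[NEGATED[option]](value, text)
    return POSITIVE[option](value, text)


print(attribute_value_matches("center", "Equals Text", "center"))
print(attribute_value_matches("Center", "Equals Text", "center"))
```

Because the comparison is case-sensitive, the second call reports no match for "Center" against "center".
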

Tag Pattern

In this property, you can specify a pattern that the tag must match (including all tags inside it), for example ".*.*Stock Quotes.*.*". Use this property with caution, as it can have considerable impact on the performance of your robot. This is because the "Tag Pattern" may be applied many times throughout a page just to find the one tag that it matches. One way to reduce this cost is to choose "Text Only" for the "Match Against" property.

Match Against

In this property, you can specify that the "Tag Pattern" should match only the text or the entire HTML of the tag. The default is to match only the text because this is normally much faster.
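
The difference between matching against the text and matching against the entire HTML can be sketched in Python as follows. The function name tag_matches_pattern and its text_only parameter are hypothetical. The sketch also assumes that the pattern must match the whole tag content and that '.' may match line breaks; neither assumption is confirmed by this page.

```python
import re
import xml.etree.ElementTree as ET


def tag_matches_pattern(element, pattern, text_only=True):
    """Test a pattern against a tag's text, or against its full markup."""
    if text_only:
        # Only the text inside the tag and its descendants, without markup.
        subject = "".join(element.itertext())
    else:
        # The serialized tag, including attributes and nested tags.
        # Note: ET.tostring also includes any text that follows the tag.
        subject = ET.tostring(element, encoding="unicode")
    return re.fullmatch(pattern, subject, flags=re.DOTALL) is not None


table = ET.fromstring(
    '<table class="quotes"><tr><td>Stock Quotes</td></tr></table>'
)
print(tag_matches_pattern(table, ".*Stock Quotes.*"))
print(tag_matches_pattern(table, '.*class="quotes".*'))
print(tag_matches_pattern(table, '.*class="quotes".*', text_only=False))
```

The text-only subject is usually much shorter than the serialized markup, which is consistent with the statement above that matching only the text is normally faster.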

Tag Depth

This property determines which tag to use if matching tags are contained inside each other. The default value is "Any Depth" which accepts all matching tags. If you select "Outermost Tag", only the outermost tags are accepted, and similarly, if you select "Innermost Tag", only the innermost tags are accepted.
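
The three Tag Depth settings can be pictured as a filter applied to the list of matching tags. The Python sketch below is an illustration only; filter_by_depth is a hypothetical name, and the list of matches is produced here with a plain search for <div>-tags.

```python
import xml.etree.ElementTree as ET

PAGE = """<body>
  <div id="outer"><div id="middle"><div id="inner"></div></div></div>
  <div id="alone"></div>
</body>"""


def filter_by_depth(root, matches, tag_depth="Any Depth"):
    """Keep all, only the outermost, or only the innermost matching tags."""
    # ElementTree has no parent links, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matched = set(matches)

    def has_matching_ancestor(element):
        node = parent_of.get(element)
        while node is not None:
            if node in matched:
                return True
            node = parent_of.get(node)
        return False

    def has_matching_descendant(element):
        return any(d in matched for d in element.iter() if d is not element)

    if tag_depth == "Outermost Tag":
        return [el for el in matches if not has_matching_ancestor(el)]
    if tag_depth == "Innermost Tag":
        return [el for el in matches if not has_matching_descendant(el)]
    return list(matches)  # "Any Depth" accepts every matching tag


root = ET.fromstring(PAGE)
divs = list(root.iter("div"))
for depth in ("Any Depth", "Outermost Tag", "Innermost Tag"):
    print(depth, [d.get("id") for d in filter_by_depth(root, divs, depth)])
```

A tag that neither contains nor is contained in another matching tag, such as the "alone" <div> here, counts as both outermost and innermost.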

Tag Number

This property determines which tag to use if more than one tag matches the tag path and the other criteria. You specify the number of the tag to use, either counting forwards from the first tag or counting backwards from the last tag that matches.

Example

As an example, if you set the Tag Path property to "table", the Attribute Name property to "align", the Attribute Value property to "Equals Text" with the text "center", and the Tag Pattern property to ".*Business News.*", then the Tag Finder would locate the first <table>-tag that is center aligned and that contains the text "Business News".
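
The example can be approximated in Python to show how the criteria combine. This is a sketch under assumptions, not the Tag Finder itself: find_table is a hypothetical name, the tag path "table" is taken to mean any <table>-tag on the page, the pattern is matched against the text only, and the tag_number parameter follows the Tag Number property by counting from 0 forwards or from -1 backwards.

```python
import re
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<table align="left"><tr><td>Business News</td></tr></table>
<table align="center"><tr><td>Sports News</td></tr></table>
<table align="center"><tr><td>Today's Business News headlines</td></tr></table>
<table align="center"><tr><td>More Business News</td></tr></table>
</body></html>"""


def find_table(root, tag_number=0):
    """Return the center-aligned table containing 'Business News', or None."""
    candidates = [
        table
        for table in root.iter("table")            # Tag Path: table
        if table.get("align") == "center"          # Attribute Name and Value
        and re.fullmatch(                          # Tag Pattern, text only
            r".*Business News.*", "".join(table.itertext()), flags=re.DOTALL
        )
    ]
    # Tag Number: pick one of the remaining tags, counting from either end.
    if -len(candidates) <= tag_number < len(candidates):
        return candidates[tag_number]
    return None


root = ET.fromstring(PAGE)
first = find_table(root)
last = find_table(root, tag_number=-1)
print("".join(first.itertext()))
print("".join(last.itertext()))
```

The first table on the page is skipped because it is left aligned, and the second because its text does not contain "Business News".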