Crawling an Entire Site

  1. Add a step with the Load Page action that loads the main page.
  2. Add a new step and choose the Crawl Pages action.
  3. On the Rules tab, add a Crawling Rule that applies to all pages in the site, e.g. by specifying the domain that the pages belong to or by defining a pattern that their URLs must match. For these pages, set the rule to "Crawl Entire Page" and "Output the Page".
  4. On the Rules tab, set the "For all Other Pages" property to "Do Not Crawl".
  5. After the step with the Crawl Pages action, add steps to handle each page that it outputs, e.g. by extracting information into returned variables.
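
For readers who think in code, the logic that these steps configure can be sketched roughly as follows. This is only an illustration in Python, not how the Crawl Pages action is implemented: the names crawl_site, matches_rule, fetch_links and the in-memory SITE table are hypothetical, the rule is assumed to match on the host name, and the breadth-first visiting order is an assumption.

```python
from collections import deque
from urllib.parse import urlparse


def matches_rule(url, domain):
    """Stand-in for the Crawling Rule: the page belongs to the given domain."""
    return urlparse(url).hostname == domain


def crawl_site(start_url, domain, fetch_links):
    """Yield the start page and every page reachable from it that matches the rule.

    fetch_links is a hypothetical callable that returns the URLs linked from a page.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        # "Output the Page": hand the page to the steps that follow the crawl.
        yield url
        for link in fetch_links(url):
            # "For all Other Pages" = "Do Not Crawl": skip links that fail the rule.
            if link not in seen and matches_rule(link, domain):
                seen.add(link)
                queue.append(link)


# Hypothetical in-memory site standing in for real page loads.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://other.org/x"],
    "http://example.com/a": ["http://example.com/", "http://example.com/b"],
    "http://example.com/b": [],
}


def fetch_links(url):
    return SITE.get(url, [])


# Step 1 loads the main page; each yielded page is then handled as in step 5.
pages = list(crawl_site("http://example.com/", "example.com", fetch_links))
```

In this sketch the link to other.org is never followed, which mirrors setting "For all Other Pages" to "Do Not Crawl", and each page is output exactly once even when several pages link to it.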