SiteOrbiter Help

Rules and Actions
SiteOrbiter lets you specify conditions that, when encountered during a scan, trigger automated actions. A condition you specify is called a rule, and what SiteOrbiter does when it encounters that condition is called an action. You do not have to specify an action for every rule: by default, SiteOrbiter creates a log entry whenever a rule's condition is matched (a "hit"). Actions let you extract content from a web page via XPath, add tags to pages, record the URL in a file, or do nothing. Recording URLs in a file is useful for integration: a separate listening program that is tailing that file can act on each new URL outside the SiteOrbiter sandbox. This matters because SiteOrbiter runs in a security sandbox and cannot call another program directly.
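As a sketch of the listening-program side, the Python below polls a URL file and hands each newly appended line to a callback. It assumes the action writes one URL per line; the file path and the functions read_new_urls and watch are hypothetical and are not part of SiteOrbiter.

```python
import os
import time
from typing import Callable, List, Tuple


def read_new_urls(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` after byte `offset`, and the new offset.

    A trailing line without a newline is left for the next call, so a URL
    that is still being written is never reported half-finished.
    """
    if not os.path.exists(path):
        return [], offset
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced; start over from the top
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset
    complete = data[: last_newline + 1].decode("utf-8")
    urls = [line.strip() for line in complete.splitlines() if line.strip()]
    return urls, offset + last_newline + 1


def watch(path: str, handle: Callable[[str], None], poll_seconds: float = 1.0) -> None:
    """Poll `path` forever, calling `handle(url)` for each newly appended URL."""
    # Start at the current end of the file so entries from earlier scans are skipped.
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    while True:
        urls, offset = read_new_urls(path, offset)
        for url in urls:
            handle(url)
        time.sleep(poll_seconds)
```

For example, watch("/path/to/urls.txt", print) prints each URL as the rule records it (the path is a placeholder for the file you chose in the rule's action). Polling is used here instead of a platform-specific file-change API to keep the sketch portable.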

To create a new rule, you must first define a Site and complete an initial scan of it. Then select the Rules tab in the main SiteOrbiter window, where you are offered two choices: Add XQuery rule or Add HTTP rule. An XQuery rule takes an XQuery expression, such as the XML path /html/head/title; if that expression returns a result for the specified URL, the rule's action fires. An HTTP rule compares the value of an HTTP header with a value you specify. If that value is not text, you must specify its type, such as Date or Number; typed values can then be compared using operators such as greater than and less than. HTTP rules support all the standard HTTP headers, plus pseudo-headers such as Age-Days and Age-Seconds that give you more control over processing.
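The following Python sketch models how an XQuery rule with a simple absolute path such as /html/head/title might decide whether to fire, and how the matched text could double as extracted content. It is not SiteOrbiter's implementation: it assumes the page is well-formed XHTML without a default namespace, supports only plain child steps (no // or other axes), relies on the limited XPath subset in the standard library's xml.etree.ElementTree, and the function name run_path_rule is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def run_path_rule(document: str, path: str) -> List[str]:
    """Evaluate a simple absolute path such as /html/head/title.

    Returns the text of every matching element. An empty list means the
    rule did not fire; a non-empty list means it did, and the strings are
    what an extract-content action could record.
    """
    root = ET.fromstring(document)
    steps = [step for step in path.split("/") if step]
    # The first step must name the document's root element.
    if not steps or steps[0] != root.tag:
        return []
    if len(steps) == 1:
        matches = [root]
    else:
        # ElementTree resolves the remaining child steps relative to the root.
        matches = root.findall("/".join(steps[1:]))
    return ["".join(element.itertext()).strip() for element in matches]


page = "<html><head><title>Home</title></head><body><p>Hi</p></body></html>"
print(run_path_rule(page, "/html/head/title"))  # ['Home']
print(run_path_rule(page, "/html/head/meta"))   # []
```

Real-world pages are often not well-formed XML, so a production evaluator would need a forgiving HTML parser; the sketch only shows the "any result means a hit" logic.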

Note that rules must be named in order to be saved.

Rules, once saved, will run at the next scan. They do not run independently of the scan operation.

You do not need a complete URL for rule matching; a partial match on the beginning of the URL is enough. For example, http:// matches every HTTP-based URL, and a URL that ends at a folder matches every page in that folder. To make a rule fire at most once per scan, select the "Fire maximum one time per scan" checkbox in the rule.
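A minimal Python sketch of this matching behavior, assuming a rule's URL is compared as a plain string prefix of each scanned page's URL and that the fire-once setting is tracked separately for each scan. The Rule class, its fields, and the scan function are hypothetical and only illustrate the semantics; the condition is a stand-in for a real XQuery or HTTP test.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    url_prefix: str                   # e.g. "http://" matches every HTTP URL
    condition: Callable[[str], bool]  # stand-in for an XQuery or HTTP test
    fire_once_per_scan: bool = False


def scan(urls: List[str], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (rule name, url) pairs for every hit in one scan."""
    hits = []
    fired = set()  # names of rules that have already fired during this scan
    for url in urls:
        for rule in rules:
            if rule.fire_once_per_scan and rule.name in fired:
                continue
            if url.startswith(rule.url_prefix) and rule.condition(url):
                hits.append((rule.name, url))
                fired.add(rule.name)
    return hits


pages = [
    "http://example.com/",
    "http://example.com/docs/a.html",
    "http://example.com/docs/b.html",
]
rules = [
    Rule("all-http", "http://", lambda url: True),
    Rule("docs-once", "http://example.com/docs/", lambda url: True, fire_once_per_scan=True),
]
print(scan(pages, rules))
```

Here "all-http" fires for all three pages, while "docs-once" fires only for a.html: it never matches the site root, and after its first hit it is skipped for the rest of that scan. Because `fired` is created inside scan, the rule can fire again on the next scan.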

If you have a row selected in Explore, that row's URL is used as the initial URL text of any new rule.

If you choose the Date type in an HTTP rule, enter the date in the structure shown; you can change the values as needed. For standard date headers, greater than means more recent (closer to the present) and less than means older. For example, take an HTTP rule with the following condition:

Last-Modified greater than Jul 28, 2008 12:08:47 PM

This rule will fire on all pages that are newer than the specified date and time. If you select less than instead, as in:

Last-Modified less than Jul 28, 2008 12:08:47 PM

then the rule will fire on all pages last modified before 12:08:47 PM on July 28, 2008.
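For readers who want to reproduce these comparisons outside SiteOrbiter, for example in a program that post-processes a URL file, here is a Python sketch. Servers send Last-Modified in the HTTP-date format (for example Mon, 28 Jul 2008 12:08:47 GMT), which the standard library's email.utils.parsedate_to_datetime can parse. The function name is hypothetical, and treating the threshold as UTC is an assumption: SiteOrbiter's rule editor may interpret the date you enter in local time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def last_modified_matches(header_value: str, operator: str, threshold: datetime) -> bool:
    """Compare a Last-Modified header value against a threshold date.

    "greater than" means more recent than the threshold; "less than" means older.
    """
    modified = parsedate_to_datetime(header_value)  # timezone-aware for "GMT" dates
    if operator == "greater than":
        return modified > threshold
    if operator == "less than":
        return modified < threshold
    raise ValueError(f"unsupported operator: {operator}")


threshold = datetime(2008, 7, 28, 12, 8, 47, tzinfo=timezone.utc)
print(last_modified_matches("Tue, 05 Aug 2008 09:00:00 GMT", "greater than", threshold))  # True
print(last_modified_matches("Tue, 05 Aug 2008 09:00:00 GMT", "less than", threshold))     # False
```

The threshold carries a timezone so that it can be compared with the timezone-aware datetime returned for a GMT header; comparing an aware datetime with a naive one raises TypeError in Python.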

Note that the Age-Seconds and Age-Days pseudo-headers work the other way around: greater than identifies pages that are older than the specified value, and less than identifies pages that are younger.
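A small Python sketch of why the comparison flips for the age pseudo-headers: an age grows as a page gets older, so greater than selects older pages. It assumes Age-Seconds is the time elapsed between a page's modification date and the moment of the scan, and that Age-Days is that value in whole days; this help page does not state SiteOrbiter's exact definitions, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def age_seconds(last_modified: datetime, scan_time: datetime) -> int:
    """Seconds elapsed between the page's modification time and the scan (assumed definition)."""
    return int((scan_time - last_modified).total_seconds())


def age_days(last_modified: datetime, scan_time: datetime) -> int:
    """Whole days elapsed, rounded down (assumed definition)."""
    return age_seconds(last_modified, scan_time) // 86400


scan_time = datetime(2008, 8, 1, tzinfo=timezone.utc)
old_page = datetime(2008, 6, 1, tzinfo=timezone.utc)
new_page = datetime(2008, 7, 30, tzinfo=timezone.utc)

# "Age-Days greater than 30" picks the OLDER page, the reverse of Last-Modified.
print(age_days(old_page, scan_time) > 30)  # True
print(age_days(new_page, scan_time) > 30)  # False
```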

You can review the results and actions of a specific scan on the Rule Results tab. For each hit, the table lists the rule that fired, the URL it fired on, the timestamp of the run, the rule type, the result (value) that triggered the condition, and any actions that were taken.
