This panel lets you control which URLs are crawled (that is, have their content scanned for further URLs) or saved to disk by specifying restrictions (filters) on each URL's properties, such as its content.
For example, suppose you want to crawl only files that contain the phrase "Blue Crab". Choose "Crawl a resource only if the..." from the popup button, and enter "Blue Crab" into the "Content contains" text field.
It is important to note that filters are not applied to the starting URL.
All specified filters must be TRUE for a file to be processed.
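The two rules above (the starting URL bypasses the filters, and every filter must pass) can be sketched in a few lines of Python. This is only an illustration of the logic, not Blue Crab's implementation; the `should_process` function, the dictionary keys, and the example filters are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical representation of a fetched resource: a dict of its properties.
Resource = Dict[str, str]
Filter = Callable[[Resource], bool]


def should_process(resource: Resource, filters: List[Filter], starting_url: str) -> bool:
    """Return True if the resource should be crawled or saved."""
    # Filters are never applied to the starting URL.
    if resource["url"] == starting_url:
        return True
    # Every filter must be TRUE; a single failing filter rejects the resource.
    return all(f(resource) for f in filters)


# Example filters (assumed names, for illustration only).
def content_contains_blue_crab(resource: Resource) -> bool:
    return "Blue Crab" in resource["content"]


def is_html(resource: Resource) -> bool:
    return resource["content_type"].startswith("text/html")


filters = [content_contains_blue_crab, is_html]
start = "http://example.com/"

page = {
    "url": "http://example.com/crabs.html",
    "content": "All about the Blue Crab",
    "content_type": "text/html",
}
print(should_process(page, filters, start))
```

With an empty filter list, `all()` returns True, so every resource is processed.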
An interesting aspect of this filter panel is the ability to run an AppleScript each time a file is crawled or saved.
This lets you extend the processing of URLs to other applications. For example, you could use scripting to import selected data into a database during the crawl.
Scripts have access to the following variables. To use a variable's value in a script, include its embedding symbol, which is listed after the variable name. Note that each embedding symbol is prefixed by a double underscore.
- URL: __theURL
The text of the URL itself.
- Host: __host
The host portion of the URL.
- Path: __path
The path portion of the URL.
- Search args: __searchArgs
The search args portion of the URL.
- Path args: __pathArgs
The path args portion of the URL.
- Header: __header
The header received from the server.
- Date: __Date
The "Date" header field received from the server.
- Last Modified: __Last_Modified
The "Last-Modified" header field received from the server.
- Content Type: __Content_Type
The "Content-Type" header field received from the server.
- Content Length: __Content_Length
The "Content-Length" header field received from the server.
- Location: __Location
The "Location" header field received from the server.
- Parent URL: __parentURL
The URL the current URL was extracted from.
- Name: __name
The name of the URL itself.
- Suffix: __suffix
The suffix of the URL itself.
- Response: __theResponse
The response portion of the header received from the server.
- Data: __theData
The complete data received from the server. Blue Crab will properly modify the data so that it is treated as an AppleScript string.
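To make the idea of embedding symbols concrete, here is a rough Python sketch of how a symbol such as __host might be replaced by its value before a script runs. It is not Blue Crab's actual mechanism. The function names are hypothetical, the mapping of "search args" to the query string and "path args" to the `;params` part of the URL is an assumption, and so is the escaping rule (backslashes and double quotes) used to keep a value valid inside an AppleScript string literal.

```python
import re
from typing import Dict
from urllib.parse import urlparse


def applescript_escape(text: str) -> str:
    """Escape backslashes and double quotes for use inside an AppleScript string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def url_variables(url: str) -> Dict[str, str]:
    """Derive a few of the URL-based variables (assumed mapping, see lead-in)."""
    parts = urlparse(url)
    return {
        "__theURL": url,
        "__host": parts.netloc,
        "__path": parts.path,
        "__searchArgs": parts.query,   # assumption: the part after '?'
        "__pathArgs": parts.params,    # assumption: the part after ';'
    }


def embed(script: str, values: Dict[str, str]) -> str:
    """Replace every embedding symbol in the script with its escaped value."""
    if not values:
        return script
    # Try longer symbols first so one symbol is never matched as a prefix of another.
    symbols = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in symbols))
    return pattern.sub(lambda m: applescript_escape(values[m.group(0)]), script)


variables = url_variables("http://example.com/a/b.html;v=1?q=crab")
template = 'display dialog "__host" & " " & "__path"'
print(embed(template, variables))
```

In this sketch the script template supplies its own quotation marks around each symbol; whether Blue Crab expects that is not stated above, so test with a simple script first.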