Document Properties: Links

The Links properties page allows you to specify how to handle hyperlinks.

Maximum Link Depth

You tell iSiloX how far to follow hyperlinks by specifying a value for the Maximum link depth. In a new installation of iSiloX, this value defaults to one. The root source files are considered to be at a depth of zero, files to which they link are at a depth of one, files to which those files link are at a depth of two, and so on.
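The depth numbering can be illustrated with a breadth-first walk. This sketch is not iSiloX code. It assumes a hypothetical in-memory link graph (`links`, mapping each page to the pages it links to), and it assumes that a page reachable along several paths is assigned the smallest depth at which it is reached.

```python
from collections import deque


def collect_within_depth(roots, links, max_depth=1):
    """Return {page: depth} for every page reachable within max_depth.

    roots: root source files, which are at depth 0
    links: hypothetical link graph, page -> list of linked pages
    max_depth: the Maximum link depth setting (default of one)
    """
    depth = {root: 0 for root in roots}
    queue = deque(depth)  # breadth-first, so shallower pages are visited first
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # following links from here would exceed the limit
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


site = {
    "index.htm": ["a.htm", "b.htm"],
    "a.htm": ["c.htm"],
    "c.htm": ["d.htm"],
}
print(collect_within_depth(["index.htm"], site))
# {'index.htm': 0, 'a.htm': 1, 'b.htm': 1}
```

Raising `max_depth` to 2 adds `c.htm` at depth two; `d.htm` only appears once the limit reaches three.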

Recommendation

If you are creating a document based on a Web site, it is recommended that you leave the Maximum link depth value at one, because each additional level of depth beyond one will likely cause an exponential increase in the size of the document. For example, if the converted document is one megabyte in size at a link depth of one, it might be ten megabytes at a link depth of two and 100 megabytes at a link depth of three.
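The arithmetic behind this example can be written as a simple geometric model. It assumes, purely for illustration, that each extra level of depth multiplies the document size by a constant growth factor (ten in the example above); the growth of real sites varies widely.

```python
def estimated_size_mb(size_at_depth_one_mb, growth_factor, depth):
    """Rough size estimate, assuming each extra level multiplies the size by growth_factor."""
    if depth < 1:
        raise ValueError("this model starts at a link depth of one")
    return size_at_depth_one_mb * growth_factor ** (depth - 1)


for depth in (1, 2, 3):
    print(f"depth {depth}: about {estimated_size_mb(1, 10, depth)} MB")
# depth 1: about 1 MB
# depth 2: about 10 MB
# depth 3: about 100 MB
```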

Off-site Links

An off-site link is defined as a link to a target in a different domain. iSiloX treats all file paths as belonging to the same domain. For URLs, iSiloX treats the domain as the protocol (e.g., http://) and the hostname. To tell iSiloX not to follow links to targets in different domains, uncheck the Follow off-site links checkbox. This is useful for limiting the amount of irrelevant content brought into the document.

iSiloX performs the off-site link check anew for each root source file. This means that you can have root source files in different domains. For example, you can have two root source files, one with the URL <http://www.iSilo.com> and another with the URL <http://www.palm.com>. Assuming that you have unchecked the option to follow off-site links, when iSiloX converts the content at <http://www.iSilo.com>, it only follows links from there with target URLs that begin with <http://www.iSilo.com>. When iSiloX converts the content at <http://www.palm.com>, it only follows links from there with target URLs that begin with <http://www.palm.com>. If the content at <http://www.palm.com> has a link to <http://www.iSilo.com/whatsnew.htm>, iSiloX does not follow that link.
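A minimal sketch of the per-root off-site check, using Python's `urllib.parse.urlsplit`. It follows the rule stated above: the domain of a URL is its protocol plus hostname, and all file paths share one domain. Treating anything without a hostname as a file path is an assumption of this sketch, `followable_links` is a hypothetical helper rather than part of iSiloX, and `http://www.palm.com/products.htm` is a made-up URL.

```python
from urllib.parse import urlsplit


def domain_of(location):
    """Return (protocol, hostname) for a URL, or None for a file path.

    Every file path maps to None, so all file paths share one domain.
    urlsplit lowercases the scheme and hostname, so www.iSilo.com and
    www.isilo.com compare equal.
    """
    parts = urlsplit(location)
    if parts.scheme and parts.hostname:
        return (parts.scheme, parts.hostname)
    return None


def followable_links(root, targets, follow_off_site=False):
    """Links from content under `root` that may be followed.

    The check is made against each root separately, so roots in
    different domains each keep their own domain.
    """
    if follow_off_site:
        return list(targets)
    root_domain = domain_of(root)
    return [t for t in targets if domain_of(t) == root_domain]


print(followable_links("http://www.palm.com",
                       ["http://www.palm.com/products.htm",
                        "http://www.iSilo.com/whatsnew.htm"]))
# ['http://www.palm.com/products.htm']
```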

Maximum off-site link depth

When you enable the option to follow off-site links, you can also specify how far to follow hyperlinks that go off-site. A value of zero is equivalent to unchecking the option to follow off-site links. The depth is relative to the source file containing the off-site link, rather than relative to the root source files.

If you uncheck the Follow off-site links option, then the Maximum off-site link depth setting has no effect.

Note that the value for the Maximum link depth setting still limits the total link depth. So if the maximum link depth is set to two and the maximum off-site link depth is set to one, and there is an off-site link from a source file at depth two, that link is not followed, although it is at a depth of one relative to the source file with that off-site link.
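The interaction between the two depth limits can be expressed as one decision function. This is a hypothetical sketch, not iSiloX code. It assumes that a page's off-site depth is 0 when the page is in the root's domain and is otherwise counted from the on-site page holding the off-site link; treating a link from an off-site page back into the root's domain as on-site is also an assumption.

```python
def should_follow(source_depth, source_off_site_depth, target_is_off_site,
                  max_depth, follow_off_site, max_off_site_depth):
    """Decide whether a link from a source file is followed.

    source_depth: depth of the source file relative to the root source files
    source_off_site_depth: 0 for a source in the root's domain, otherwise its
        depth relative to the page that holds the off-site link
    """
    # The Maximum link depth always limits the total depth.
    if source_depth + 1 > max_depth:
        return False
    if not target_is_off_site:
        return True
    # With Follow off-site links unchecked, the off-site depth has no effect.
    if not follow_off_site:
        return False
    # A Maximum off-site link depth of 0 behaves like unchecking the option.
    return source_off_site_depth + 1 <= max_off_site_depth


# Example from the text: off-site link on a page at depth 2,
# Maximum link depth 2, Maximum off-site link depth 1.
print(should_follow(2, 0, True, max_depth=2, follow_off_site=True,
                    max_off_site_depth=1))  # False
```

The link in the text's example is rejected by the first check: its target would sit at a total depth of three, even though it is only one level deep relative to its source.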

The Maximum off-site link depth option is useful when you specify a Maximum link depth greater than one in order to include more content from a given site, but still want to include linked off-site articles without also following the links in those articles to the full link depth.

Following Only Sub-Folder Links

In many cases, websites are structured hierarchically in folders and sub-folders, and the URLs of the site's pages are likely organized the same way, with slashes separating the folder levels. For example, the iSiloX.com website has all its support pages within a folder named "support". Within the support folder, there are sub-folders for different categories of support, such as a sub-folder named "manual" where the manuals are located. However, pages in such sub-folders may also have links to pages outside of the folder. To limit followed links to sub-folders of the root source pages, check the Follow only links that are sub-folders of the root source paths checkbox. iSiloX then follows only links that match one of the root source URLs up to its last slash.

As an example, to get all the support pages from the iSiloX.com website, you might specify http://www.iSiloX.com/support/index.htm as the root source URL and check Follow only links that are sub-folders of the root source paths. The page http://www.iSiloX.com/support/index.htm has a link to the site's home page, http://www.iSiloX.com. Because you checked the Follow only links that are sub-folders of the root source paths option, that link is not followed. However, a link such as http://www.iSiloX.com/support/faq.htm, which goes to the frequently asked questions page, is followed.
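The sub-folder test can be sketched as a prefix comparison: cut each root source URL after its last slash and accept a link only if it begins with one of those prefixes. Whether iSiloX compares case-sensitively is not stated here, so this sketch assumes a plain, case-sensitive string comparison; the function names and the `manual/index.htm` URL in the test are hypothetical.

```python
def folder_prefix(root_url):
    """The root URL up to and including its last slash."""
    return root_url[: root_url.rfind("/") + 1]


def in_root_sub_folder(link, root_urls):
    """True if `link` lies in the folder (or a sub-folder) of any root."""
    return any(link.startswith(folder_prefix(root)) for root in root_urls)


roots = ["http://www.iSiloX.com/support/index.htm"]
print(folder_prefix(roots[0]))  # http://www.iSiloX.com/support/
print(in_root_sub_folder("http://www.iSiloX.com/support/faq.htm", roots))  # True
print(in_root_sub_folder("http://www.iSiloX.com", roots))  # False
```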

Unresolved Link Detail

Because you can limit iSiloX to following links up to a given maximum depth and can tell it not to follow off-site links, in most cases you end up with a document that has hyperlinks to content that was not brought into the document. These hyperlinks are referred to as unresolved links. You can choose whether to include the target URLs of these unresolved links in the document by checking or unchecking the Include unresolved link detail checkbox.

Including unresolved link detail

If you choose to include the unresolved link detail, iSiloX creates a document with an additional page at the end that lists the URLs of all unresolved links. iSiloX sets the target of each unresolved link in the document to jump to its corresponding target URL on this last page. This is useful for later reference and for finding broken hyperlinks.

Not including unresolved link detail

If you choose not to include the unresolved link detail, the unresolved hyperlinks essentially have no target. When you view the document in a reader and try to follow such a hyperlink, the reader tells you that the hyperlink was unresolved but gives no indication of the target URL.
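The difference between the two settings can be sketched as follows. The data model is hypothetical (a list of link targets and a set of converted pages), and listing the unresolved URLs once each, in first-seen order, is an assumption; the actual layout of the last page in an iSiloX document may differ.

```python
def resolve_links(link_targets, converted_pages, include_detail):
    """Return (jumps, detail_page).

    jumps maps each link target to where the link jumps:
      ("page", url)        : the target was converted into the document
      ("detail-entry", i)  : entry i on the unresolved-link page
      None                 : unresolved, with no target (detail not included)
    detail_page lists the unresolved URLs, or is empty if not included.
    """
    # Unresolved targets, without duplicates, in the order first seen.
    unresolved = list(dict.fromkeys(
        t for t in link_targets if t not in converted_pages))
    entry_of = {url: i for i, url in enumerate(unresolved)}
    jumps = {}
    for target in link_targets:
        if target in converted_pages:
            jumps[target] = ("page", target)
        elif include_detail:
            jumps[target] = ("detail-entry", entry_of[target])
        else:
            jumps[target] = None
    return jumps, (unresolved if include_detail else [])
```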

Common sources of unresolved links

The most common sources of unresolved links are the following:
  • Links that are at a depth greater than that specified in the Maximum link depth setting.
  • Links that are outdated and thus are broken because the target has moved.
  • Links whose targets are specified incorrectly.
URL Filters

Click URL Filters to access the URL Filters dialog, where you can specify patterns for excluding images and for not following links, based on the image URL or the link target URL. URL filters are useful for excluding unwanted images and content and for reducing document sizes. A filter is specified using either a wildcard or a regular expression pattern-matching string.

Exclusion filters

If the URL of an image matches one of the exclusion patterns, the image is not included in the document. If the target URL of a link matches one of the exclusion patterns, the link is not followed, and hence the target content is not included in the document. Exceptions to exclusions can be specified using inclusion filters.

Adding an exclusion filter

Click Add Exclusion Filter to access the dialog for specifying a new exclusion filter. In the URL Filter dialog, select a pattern type of either Wildcard or Regular Expression, then enter the pattern to use in the Pattern field. Check Case-sensitive to perform a case-sensitive match. By default, matching is case-insensitive, with the lowercase letters 'a' through 'z' matching the uppercase letters 'A' through 'Z'.
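A sketch of how the pattern type and the Case-sensitive option might translate into a compiled Python regular expression. Several details are assumptions rather than documented iSiloX behavior: that a wildcard `*` matches zero or more characters and `?` exactly one (as the Example section of this page describes), that a pattern may match anywhere in the URL, and that iSiloX's regular expression syntax is close enough to Python's `re` for simple patterns. The pattern-type strings are hypothetical names for the two dialog choices.

```python
import re


def compile_url_filter(pattern, pattern_type, case_sensitive=False):
    """Compile a URL filter; pattern_type is "wildcard" or "regex"."""
    if pattern_type == "wildcard":
        # '*' -> any run of characters, '?' -> any single character,
        # every other character is matched literally.
        regex = "".join(
            ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
            for ch in pattern)
    elif pattern_type == "regex":
        regex = pattern
    else:
        raise ValueError(f"unknown pattern type: {pattern_type!r}")
    # Default is case-insensitive with ASCII-only folding, so only a-z and
    # A-Z are folded (re.ASCII also makes \d, \w, etc. ASCII-only).
    flags = 0 if case_sensitive else re.IGNORECASE | re.ASCII
    return re.compile(regex, flags)


def url_matches(compiled, url):
    """Assumes a filter matches if the pattern occurs anywhere in the URL."""
    return compiled.search(url) is not None
```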

Deleting an exclusion filter

Select one or more exclusion filters, then click Delete Selected Exclusion Filters to delete them. You will be asked for confirmation before the filters are deleted.

Modifying an exclusion filter

Double-click an exclusion filter to modify it.

Inclusion filters

An inclusion filter serves as an exception to the exclusion filters. If a given URL matches an exclusion filter, the inclusion filters are applied to the URL, and if the URL matches an inclusion filter, it is not excluded. Click Add Inclusion Filter to access the dialog for specifying a new inclusion filter. To delete one or more inclusion filters, select them, then click Delete Selected Inclusion Filters. To modify an inclusion filter, double-click it.
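The combined effect of exclusion and inclusion filters comes down to one rule: a URL is excluded only if it matches at least one exclusion filter and no inclusion filter. The sketch below expresses that rule over already-compiled Python regular expressions; the filters and URLs are hypothetical.

```python
import re


def is_excluded(url, exclusion_filters, inclusion_filters):
    """True if the image is skipped or the link is not followed."""
    if not any(f.search(url) for f in exclusion_filters):
        return False  # no exclusion filter matches, so nothing to except
    # Inclusion filters act only as exceptions to a matching exclusion.
    return not any(f.search(url) for f in inclusion_filters)


exclusions = [re.compile(r"/ads/", re.IGNORECASE)]
inclusions = [re.compile(r"/ads/logo\.gif$")]
print(is_excluded("http://www.example.com/ads/banner.gif", exclusions, inclusions))  # True
print(is_excluded("http://www.example.com/ads/logo.gif", exclusions, inclusions))    # False
print(is_excluded("http://www.example.com/news.htm", exclusions, inclusions))        # False
```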

Example

This example specifies two exclusion filters and one inclusion filter. The first exclusion filter specifies a regular expression pattern that is case-insensitive. The pattern matches the text "table" followed by a single digit character from '0' through '9' and then by the text ".jpg". So the pattern matches a URL such as "http://www.acme.org/Table8.jpg", but it does not match a URL in which "table" is followed by a non-digit character, or by more than one digit, before ".jpg".

The second exclusion filter is a wildcard pattern and is also case-insensitive. The pattern matches the text "figures" followed by zero or more of any mix of characters, followed by the text "plant", followed by any single character, and finally followed by the text "blue". The pattern therefore does not match a URL that lacks "figures" before "plant", or in which "plant" and "blue" are adjacent or separated by more than one character.

The first exclusion filter would exclude the URL "http://www.acme.org/Table8.jpg". However, because the inclusion filter notes it as an exception, the URL would actually not be excluded. Note that this inclusion pattern specifies a case-sensitive match, and so "http://www.acme.org/table8.jpg" would not be noted as an exception.
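The example can be checked with Python's `re` module. The first exclusion filter's regular expression is written out from its description, and the wildcard filter is rewritten as an equivalent regular expression (`*` as `.*`, `?` as `.`). Because the inclusion filter's exact pattern is not given, `Table8\.jpg` is a hypothetical stand-in that behaves as described. Matching anywhere in the URL is assumed, and the test URLs other than the two acme.org `.jpg` URLs from the text are made up.

```python
import re

CI = re.IGNORECASE | re.ASCII  # case-insensitive, ASCII letters only

exclusions = [
    re.compile(r"table[0-9]\.jpg", CI),      # first filter (regular expression)
    re.compile(r"figures.*plant.blue", CI),  # second filter: figures*plant?blue
]
inclusions = [
    re.compile(r"Table8\.jpg"),              # hypothetical, case-sensitive
]


def is_excluded(url):
    return (any(p.search(url) for p in exclusions)
            and not any(p.search(url) for p in inclusions))


for url in ["http://www.acme.org/Table8.jpg",               # kept (exception)
            "http://www.acme.org/table8.jpg",               # excluded
            "http://www.acme.org/table12.jpg",              # kept
            "http://www.acme.org/Figures/plant2blue.gif",   # excluded
            "http://www.acme.org/figures/plantblue.gif"]:   # kept
    print(url, "excluded" if is_excluded(url) else "kept")
```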

External Documents

Click External Documents to access the External Documents dialog to specify which links are to external documents. A document may have links to zero or more external documents.

In the External Documents dialog, the External document list shows the document name and link prefix of each external document specification for the document. Generally you will have one external document specification for each external document to which the document will link.

An external document specification consists of four pieces of information, as shown in the External Document Specification box of the dialog.

Lookup method tradeoffs

Each lookup method has its own advantages and disadvantages.

In terms of document storage space, the Name method requires the most space, both in the linking document and in the targeted external document, unless the target names are very few and short. The ID and Offset methods require approximately the same amount of storage space as each other in the linking document. In the targeted external document, the Offset method requires no additional storage space, while the ID method requires an amount that is generally less than the Name method requires.

In terms of the speed of the lookup when a jump to an external document occurs, the difference perceived by the user is probably negligible. However, the Name method requires the most processing, the ID method the next most, and the Offset method the least.

The other important tradeoff among the methods concerns synchronization between a document and the external documents to which it links. For the purposes of this discussion, suppose that a document named DocSource has links to an external document named DocTarget, and that DocTarget is updated independently of DocSource. The content and targets in DocTarget change periodically, so content and targets may be added and removed. Assume, though, that the targets in DocTarget to which DocSource links are always there, although their specific locations within the content of DocTarget may change.

Given this scenario, if the lookup method is Name, the links from DocSource to DocTarget will always work, even though DocTarget may undergo many changes while DocSource stays the same.

If the lookup method is ID, this may not be the case. The ID assigned to each target within DocTarget depends to some extent on all the other external targets within DocTarget. If a target is added to or removed from DocTarget, the IDs of the other targets may change, and as a result the target IDs stored in DocSource for the targets in DocTarget may become invalid. However, if only the content of DocTarget changes, the target IDs remain valid.

If the lookup method is Offset, then neither the content nor the targets in DocTarget may change if the links from DocSource to DocTarget are to remain valid.

The Name lookup method, though requiring the most storage space, is the best method to use for documents that can change independently of one another. The Offset lookup method requires the least storage space and is a good method to use for documents that will change together. The ID lookup method generally requires only a modest amount of storage space compared to the Name method and is a good method to use when only the content of an external document, for example through minor corrections, is expected to change.
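The synchronization tradeoff can be simulated with a toy model. Everything in it is hypothetical: DocTarget is reduced to a mapping from target names to character offsets, and IDs are assumed to be positions in the sorted list of target names, which is just one way an ID could depend on the other targets. The actual iSiloX ID scheme is not described here.

```python
def target_ids(targets):
    # Hypothetical ID scheme: an ID is the target's position among the
    # sorted target names, so adding or removing a target can shift IDs.
    return {name: i for i, name in enumerate(sorted(targets))}


def make_link(targets, name, method):
    """What DocSource stores for a link to `name` in DocTarget."""
    if method == "name":
        return name
    if method == "id":
        return target_ids(targets)[name]
    if method == "offset":
        return targets[name]
    raise ValueError(method)


def follow(targets, stored, method):
    """The target a stored link lands on in a (possibly newer) DocTarget."""
    if method == "name":
        return stored if stored in targets else None
    if method == "id":
        by_id = {i: name for name, i in target_ids(targets).items()}
        return by_id.get(stored)
    if method == "offset":
        by_offset = {off: name for name, off in targets.items()}
        return by_offset.get(stored)  # None: no target starts there any more
    raise ValueError(method)


v1 = {"intro": 0, "setup": 120, "usage": 300}                   # original DocTarget
v2 = {"intro": 0, "setup": 150, "usage": 340}                   # content edited only
v3 = {"appendix": 500, "intro": 0, "setup": 150, "usage": 340}  # target added

for method in ("name", "id", "offset"):
    link = make_link(v1, "setup", method)
    print(method, follow(v2, link, method), follow(v3, link, method))
# name setup setup
# id setup intro
# offset None None
```

In this model, Name survives both changes, ID survives only the content-only change, and Offset survives neither, which mirrors the recommendations above.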

Adding a new external document specification

To add a new external document specification, fill in the fields in the External Document Specification box and then click Add.

Modifying an existing external document specification

In the External document list, select the specification to modify. The fields in the External Document Specification box change to show the values for the selected specification. Make the modifications, then click Modify.

Deleting one or more external document specifications

To delete individual specifications, select them individually, then click Delete. To delete all specifications, click Delete All.

Changing the order of the specifications

The order of the specifications may be important for your document set. To change the position of a specification, select it and then use the Move Up and Move Down buttons to move the specification up and down, respectively, in the list. Specifications are applied in order from top to bottom. The first specification that matches a given link is the one used.
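A sketch of the first-match rule. It assumes that a specification matches a link when the link begins with the specification's link prefix; the document names and prefixes below are hypothetical. The example shows why order matters: a more specific prefix placed after a more general one is never used.

```python
from typing import List, NamedTuple, Optional


class ExternalDocSpec(NamedTuple):
    document_name: str
    link_prefix: str


def spec_for_link(link: str, specs: List[ExternalDocSpec]) -> Optional[ExternalDocSpec]:
    """Return the first specification, top to bottom, whose prefix matches."""
    for spec in specs:
        if link.startswith(spec.link_prefix):
            return spec
    return None


specs = [
    ExternalDocSpec("Manual", "http://www.example.com/manual/"),
    ExternalDocSpec("Site", "http://www.example.com/"),
]
link = "http://www.example.com/manual/chapter1.htm"
print(spec_for_link(link, specs).document_name)                  # Manual
print(spec_for_link(link, list(reversed(specs))).document_name)  # Site
```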