Manual Reference Pages  -  CHECKLINK (1)

NAME

checklink - check the validity of links in an HTML or XHTML document

SYNOPSIS

checklink [ options ] uri ...

DESCRIPTION

This manual page briefly documents the checklink command, a.k.a. the W3C Link Checker.

checklink is a program that reads an HTML or XHTML document, extracts a list of its anchors and links, and checks that no anchor is defined twice and that all the links are dereferenceable, including their fragments. It warns about HTTP redirects, including directory redirects, and can recursively check part of a web site.
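
The two anchor checks described above (duplicate anchors, fragments that point nowhere) can be sketched with Python's standard html.parser module. This is a minimal illustration under stated assumptions, not checklink's Perl implementation: it looks at a single document only, treats an id on any element and a name on <a> as anchors, and checks only same-document links of the form #fragment (no network access, no percent-decoding). AnchorCollector and check_document are hypothetical names.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect anchor names and link targets from one HTML document."""

    def __init__(self):
        super().__init__()
        self.anchors = []  # every anchor name seen, in document order
        self.links = []    # every href seen on <a>, <area> and <link>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        anchor_id = attrs.get("id")
        if anchor_id:
            self.anchors.append(anchor_id)  # any element may define an id
        name = attrs.get("name")
        if tag == "a" and name and name != anchor_id:
            self.anchors.append(name)  # <a name="..."> defines an anchor too
        if tag in ("a", "area", "link") and attrs.get("href"):
            self.links.append(attrs["href"])


def check_document(html):
    """Return (anchors defined twice, same-document fragments with no target)."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    seen, duplicates = set(), []
    for name in parser.anchors:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    # A bare "#" means the top of the document, so it is never reported.
    broken = [href for href in parser.links
              if len(href) > 1 and href.startswith("#") and href[1:] not in seen]
    return duplicates, broken


dups, broken = check_document('<p id="top"></p><a name="top"></a>'
                              '<a href="#top">up</a><a href="#missing">?</a>')
print(dups, broken)  # ['top'] ['#missing']
```

An element carrying the same value in both id and name (<a id="x" name="x">) is counted once, since both attributes name the same anchor.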

The program can be used either as a command line tool or as a CGI script.

OPTIONS

This program follows the usual GNU command line syntax, with long options starting with two dashes (‘--’). A summary of options is included below.
-?, -h, --help  Show summary of options.
-V, --version  Output version information.
-s, --summary  Show result summary only.
-b, --broken  Show only the broken links, not the redirects.
-e, --directory  Hide directory redirects, e.g. <http://www.w3.org/TR> -> <http://www.w3.org/TR/>.
-r, --recursive  Check the documents linked from the first one.
-D, --depth n  Check the documents linked from the first one to depth n (implies --recursive).
-l, --location uri  Scope of the documents checked (implies --recursive). Can be specified multiple times in order to specify multiple recursion bases. If the URI of a candidate document is downwards relative to any of the bases, it is considered to be within the scope. If not specified, the default is the base URI of the initial document; for example, for <http://www.w3.org/TR/html4/Overview.html> it would be <http://www.w3.org/TR/html4/>.
-X, --exclude regexp  Do not check links whose full, canonical URIs match regexp. Note that this option limits recursion the same way as --exclude-docs with the same regular expression would.
--exclude-docs regexp  In recursive mode, do not check links in documents whose full, canonical URIs match regexp. This option may be specified multiple times.
--suppress-redirect URI->URI  Do not report a redirect from the first to the second URI. The -> is literal text. This option may be specified multiple times. Whitespace may be used instead of -> to separate the URIs.
--suppress-redirect-prefix URI->URI  Do not report a redirect from a child of the first URI to the same child of the second URI. The -> is literal text. This option may be specified multiple times. Whitespace may be used instead of -> to separate the URIs.
--suppress-temp-redirects  Do not report warnings about temporary redirects.
--suppress-broken CODE:URI  Do not report a broken link with the given CODE. CODE is the HTTP response status code, or -1 for robots exclusion. The : is literal text. This option may be specified multiple times. Whitespace may be used instead of : to separate the CODE and the URI.
--suppress-fragment URI  Do not report the given broken fragment URI. A fragment URI contains #. This option may be specified multiple times.
-L, --languages accept-language  The Accept-Language HTTP header to send. In command line mode, this header is not sent by default. The special value auto causes a value to be detected from the LANG environment variable and sent if found. In CGI mode, the default is to send the value received from the client as is.
-c, --cookies cookie-file  Use cookies, loading and saving them in cookie-file. The special value tmp causes non-persistent use of cookies, i.e. they are used but only stored in memory for the duration of this link checker run.
-R, --no-referer  Do not send the Referer HTTP header.
-q, --quiet  No output if no errors are found. Implies --summary.
-v, --verbose  Verbose mode.
-i, --indicator  Show progress while parsing, as a percentage of lines processed. No indicator is shown for documents containing no linefeeds.
-u, --user username  Specify a username for authentication.
-p, --password password  Specify a password for authentication.
--hide-same-realm  Hide 401 responses that are in the same realm as the document checked.
-S, --sleep secs  Sleep the specified number of seconds between requests to each server. Defaults to 1 second, which is also the minimum allowed.
-t, --timeout secs  Timeout for requests, in seconds. The default is 30.
-C, --connection-cache number  Maximum number of cached connections. Using this option overrides the Connection_Cache_Size configuration file parameter; see its documentation below for the default value and more information.
-d, --domain domain  Perl regular expression describing the domain to which the authentication information (if present) will be sent. The default value can be specified in the configuration file. See the Trusted entry in the configuration file description below for more information.
--masquerade "real-prefix surrogate-prefix"  Perform a simple string substitution: in any URI that begins with the string real-prefix, that prefix is replaced by surrogate-prefix before the URI is dereferenced. This is useful for making a local directory masquerade as a remote one. For example:



  --masquerade "http://example.com/x/y/z/ file:///my/local/dir/"



If the document being checked contains a link to http://example.com/x/y/z/foo.html, then the local file system will be checked for file:///my/local/dir/foo.html.

--masquerade takes a single argument consisting of two URIs separated by whitespace. The quote marks are not part of the argument; enclosing the value in quotes is simply the usual way to pass an argument that contains whitespace.

-H, --html  HTML output.
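
The scope and rewriting rules behind --location, --masquerade and the --suppress-redirect options reduce to plain string operations, sketched here in Python. This is an illustration under stated assumptions, not checklink's code: default_base, in_scope, masquerade and parse_uri_pair are hypothetical names, URIs are assumed to be canonical already, and "downwards relative" is approximated by a string-prefix test.

```python
from urllib.parse import urljoin


def default_base(initial_uri):
    """Default --location: the directory part of the initial document's URI."""
    return urljoin(initial_uri, ".")


def in_scope(candidate_uri, bases):
    """True if candidate_uri lies at or below any recursion base.

    Assumption: a plain string-prefix test on already-canonical URIs.
    """
    return any(candidate_uri.startswith(base) for base in bases)


def masquerade(uri, spec):
    """Apply a --masquerade "real-prefix surrogate-prefix" rewrite to uri."""
    real_prefix, surrogate_prefix = spec.split()  # exactly two URIs expected
    if uri.startswith(real_prefix):
        return surrogate_prefix + uri[len(real_prefix):]
    return uri


def parse_uri_pair(arg):
    """Split a --suppress-redirect(-prefix) argument on '->' or on whitespace."""
    if "->" in arg:
        first, second = arg.split("->", 1)
    else:
        first, second = arg.split(None, 1)
    return first.strip(), second.strip()


base = default_base("http://www.w3.org/TR/html4/Overview.html")
print(base)  # http://www.w3.org/TR/html4/
print(in_scope("http://www.w3.org/TR/html4/struct/links.html", [base]))  # True
print(in_scope("http://www.w3.org/TR/", [base]))  # False
print(masquerade("http://example.com/x/y/z/foo.html",
                 "http://example.com/x/y/z/ file:///my/local/dir/"))
# file:///my/local/dir/foo.html
```

Under a pure prefix test, a base written without a trailing slash (such as http://www.w3.org/TR/html4) would also match http://www.w3.org/TR/html4-old/, which is a good reason to give --location bases that end in /.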

FILES

/etc/w3c/checklink.conf The main configuration file. You can use the W3C_CHECKLINK_CFG environment variable to override the default location.
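
The lookup order for the configuration file is small enough to state as code. In this Python sketch, config_path is a hypothetical helper, and the environment is passed in as a mapping only so the behaviour is easy to try out; the default path is the one given above.

```python
import os

DEFAULT_CFG = "/etc/w3c/checklink.conf"


def config_path(environ=os.environ):
    """Configuration file to read: W3C_CHECKLINK_CFG if set, else the default."""
    return environ.get("W3C_CHECKLINK_CFG", DEFAULT_CFG)


print(config_path({}))  # /etc/w3c/checklink.conf
print(config_path({"W3C_CHECKLINK_CFG": "/home/me/checklink.conf"}))
# /home/me/checklink.conf
```

Because the override is an ordinary environment variable, it can be set for a single run from the shell (VAR=value checklink ...) without touching the system-wide file.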

Trusted specifies a regular expression for matching trusted domains (i.e. domains where HTTP basic authentication, if any, will be sent). The regular expression is matched case-insensitively against host names. When Trusted is unset, the default behavior is to send the authentication information only to the host which requests it; usually you don’t want to change this. For example, the following configures only the w3.org domain as trusted:



    Trusted = \.w3\.org$
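
How the example Trusted value behaves can be checked with a quick Python sketch. It assumes that an unanchored, case-insensitive search with Python's re module behaves like the Perl match checklink performs, which holds for simple patterns like this one; is_trusted is a hypothetical name.

```python
import re

TRUSTED = r"\.w3\.org$"  # the example Trusted value shown above


def is_trusted(host, trusted=TRUSTED):
    """Match a host name against Trusted, case-insensitively."""
    return re.search(trusted, host, re.IGNORECASE) is not None


for host in ("validator.w3.org", "VALIDATOR.W3.ORG", "w3.org", "w3.org.example.net"):
    print(host, is_trusted(host))
# validator.w3.org True
# VALIDATOR.W3.ORG True
# w3.org False
# w3.org.example.net False
```

Note that the bare host w3.org does not match, because the pattern requires a dot before w3; a pattern such as (^|\.)w3\.org$ covers both the domain itself and its subdomains, while the trailing $ keeps look-alike hosts such as w3.org.example.net out.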



Allow_Private_IPs is a boolean flag indicating whether checking links on non-public IP addresses is allowed. The default is true in command line mode and false when run as a CGI script. For example, to disallow checking non-public IP addresses, regardless of the mode, use:



   Allow_Private_IPs = 0
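
The Allow_Private_IPs policy and its mode-dependent default can be sketched in Python as follows. The classification of "non-public" here is an assumption: the sketch uses the standard ipaddress module's is_global property as a stand-in, whereas checklink has its own classification (its SEE ALSO section lists Net::IP), which may differ at the edges. default_allow_private and ip_allowed are hypothetical names, and the address is assumed to be resolved already.

```python
import ipaddress


def default_allow_private(cgi_mode):
    """Default Allow_Private_IPs: true on the command line, false as a CGI script."""
    return not cgi_mode


def ip_allowed(address, allow_private):
    """Apply the Allow_Private_IPs policy to an already-resolved IP address."""
    ip = ipaddress.ip_address(address)
    # Stand-in definition of "non-public": anything not globally routable.
    return allow_private or ip.is_global


allow = default_allow_private(cgi_mode=True)  # False when run as a CGI script
print(ip_allowed("192.168.1.10", allow))       # False
print(ip_allowed("8.8.8.8", allow))            # True
```

Setting Allow_Private_IPs = 0 corresponds to passing allow_private=False in both modes.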



Forbidden_Protocols is a comma-separated list of additional protocols (URI schemes) that the link checker is not allowed to use. The javascript and mailto schemes are always forbidden, and so is the file scheme when running as a CGI script.



   Forbidden_Protocols = javascript,mailto
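
The rules above combine into a single set check, sketched here in Python. scheme_allowed is a hypothetical name, and the sketch assumes links have already been resolved to absolute URIs (a relative link has an empty scheme and would pass).

```python
from urllib.parse import urlsplit

ALWAYS_FORBIDDEN = {"javascript", "mailto"}


def scheme_allowed(uri, forbidden_protocols="", cgi_mode=False):
    """Check a link's URI scheme against the Forbidden_Protocols rules."""
    forbidden = set(ALWAYS_FORBIDDEN)
    # Forbidden_Protocols adds to the built-in list; it cannot remove from it.
    forbidden.update(s.strip().lower()
                     for s in forbidden_protocols.split(",") if s.strip())
    if cgi_mode:
        forbidden.add("file")  # local files are off limits to the CGI version
    return urlsplit(uri).scheme.lower() not in forbidden


print(scheme_allowed("mailto:www-validator@w3.org"))           # False
print(scheme_allowed("file:///tmp/page.html"))                 # True
print(scheme_allowed("file:///tmp/page.html", cgi_mode=True))  # False
print(scheme_allowed("ftp://ftp.example.org/", "ftp,gopher"))  # False
```

Since javascript and mailto are forbidden regardless, the example value above changes nothing by itself; the setting only has an effect when it lists further schemes such as ftp.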



Markup_Validator_URI and CSS_Validator_URI are URI templates for the respective validators. Each %s is replaced with the full, URI-encoded URI of the document being checked, and the resulting links are shown in the link checker results view of the online (CGI) version. The defaults are:



   Markup_Validator_URI =
     http://validator.w3.org/check?uri=%s
   CSS_Validator_URI =
     http://jigsaw.w3.org/css-validator/validator?uri=%s
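
The %s substitution can be illustrated with Python's urllib.parse.quote. The exact set of characters checklink escapes is an assumption here: the sketch encodes everything except unreserved characters (safe=""), so that the document URI, including its own ? and &, survives as a single query parameter. validator_link is a hypothetical name.

```python
from urllib.parse import quote

MARKUP_VALIDATOR_URI = "http://validator.w3.org/check?uri=%s"


def validator_link(template, document_uri):
    """Fill the %s placeholder with the URI-encoded document URI."""
    # str.replace rather than the % operator, so other % signs are left alone.
    return template.replace("%s", quote(document_uri, safe=""))


print(validator_link(MARKUP_VALIDATOR_URI, "http://example.org/a?b=1&c=2"))
# http://validator.w3.org/check?uri=http%3A%2F%2Fexample.org%2Fa%3Fb%3D1%26c%3D2
```

Without the encoding, the &c=2 of the checked document would be read as a separate parameter of the validator request.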



Doc_URI is the URI used for linking to the documentation, and to the CSS and JavaScript files, in the dynamically generated content of the link checker. The default is:



   Doc_URI = http://validator.w3.org/docs/checklink.html



Connection_Cache_Size is an integer denoting the maximum number of connections the link checker will keep open at any given time. The default is:



   Connection_Cache_Size = 2
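
The idea of a bounded connection cache can be sketched in Python. This is a conceptual illustration only: checklink delegates connection handling to libwww-perl, and the least-recently-used eviction, the (scheme, host, port) key and the ConnectionCache and FakeConnection names are all assumptions made for the example.

```python
from collections import OrderedDict


class ConnectionCache:
    """Keep at most `capacity` open connections, evicting the least recently used."""

    def __init__(self, capacity=2):  # 2 mirrors the Connection_Cache_Size default
        self.capacity = capacity
        self._conns = OrderedDict()  # key -> open connection, oldest first

    def get(self, key, connect):
        """Return the cached connection for key, opening one with connect(key) if needed."""
        if key in self._conns:
            self._conns.move_to_end(key)  # mark as most recently used
            return self._conns[key]
        conn = connect(key)
        self._conns[key] = conn
        if len(self._conns) > self.capacity:
            _, oldest = self._conns.popitem(last=False)
            oldest.close()  # over capacity: close the least recently used one
        return conn


class FakeConnection:
    """Hypothetical stand-in for a network connection."""

    def __init__(self, key):
        self.key, self.closed = key, False

    def close(self):
        self.closed = True


cache = ConnectionCache()
a = cache.get(("http", "a.example", 80), FakeConnection)
cache.get(("http", "b.example", 80), FakeConnection)
cache.get(("http", "c.example", 80), FakeConnection)
print(a.closed)  # True: a.example was the least recently used
```

A larger cache (raised here or with -C) saves reconnecting when a recursive check alternates between several servers, at the cost of more simultaneously open sockets.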



ENVIRONMENT

checklink uses the libwww-perl library, whose behaviour is affected by a number of environment variables. See SEE ALSO for some pointers.
W3C_CHECKLINK_CFG  If set, overrides the path to the configuration file.

SEE ALSO

The documentation for this program is available on the web at <http://validator.w3.org/docs/checklink.html>.

LWP, Net::FTP, Net::NNTP, Net::IP, perlre.

AUTHOR

This program was originally written by Hugo Haas <hugo@w3.org>, based on Renaud Bruyeron’s checklink.pl. It has been enhanced by Ville Skyttae and many other volunteers since. Use the <www-validator@w3.org> mailing list for feedback, and see <http://validator.w3.org/docs/checklink.html#csb> for more information.

This manual page was originally written by Frédéric Schuetz <schutz@mathgen.ch> for the Debian GNU/Linux system (but may be used by others).

COPYRIGHT

This program is licensed under the W3C Software License, <http://www.w3.org/Consortium/Legal/copyright-software>.


perl v5.20.3 CHECKLINK (1) 2011-03-27
