hxextract - extract selected elements from an HTML or XML file
hxextract [ -x ] [ -s text ] [ -e text ] [ -b base ] [ -c configfile ]
hxextract outputs all elements that have a given element name, a given class, or both.
Input must be well-formed, since no HTML heuristics are applied.
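The selection rule can be illustrated with a short Python sketch built on the standard xml.etree.ElementTree module. This is not hxextract's implementation. It assumes that element names compare case-insensitively, that a class matches when it is one of the whitespace-separated tokens of the class attribute, and that a matching element is emitted whole without searching inside it for further matches. Because ElementTree is a strict XML parser, input that is not well-formed raises ParseError instead of being repaired, which mirrors the requirement above.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional


def matches(elem: ET.Element, name: Optional[str], cls: Optional[str]) -> bool:
    """True if elem has the given name (assumed case-insensitive) and/or class token."""
    if name is not None and elem.tag.lower() != name.lower():
        return False
    if cls is not None and cls not in elem.get("class", "").split():
        return False
    return True


def extract(root: ET.Element, name: Optional[str], cls: Optional[str]) -> Iterator[ET.Element]:
    """Yield the outermost matching elements in document order."""
    if matches(root, name, cls):
        yield root
        return  # assumption: a match is emitted whole, so its subtree is not searched
    for child in root:
        yield from extract(child, name, cls)


doc = """<html><body>
<h2 class="example">First</h2>
<p class="example note">Para</p>
<h2>Second</h2>
</body></html>"""

root = ET.fromstring(doc)  # raises ET.ParseError if the input is not well-formed
for e in extract(root, "H2", "example"):
    # tostring() also serializes the element's tail text, hence strip()
    print(ET.tostring(e, encoding="unicode").strip())
# prints: <h2 class="example">First</h2>
```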
The following options are supported:
- -x: Use XML format conventions.
- -s text: Insert text at the start of the output.
- -e text: Insert text at the end of the output.
- -b base: Use base as the URL base.
- -c configfile: Read @chapter lines from configfile (lines must be of the form "@chapter filename") and extract elements from each of those files.
- -h, -?: Print command usage.
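The @chapter format read by -c configfile can be sketched in Python as follows. The sketch only shows how such lines might be recognized. It assumes that the keyword and the filename are separated by whitespace and that every other line (such as the hypothetical @title line in the sample) is ignored; this page does not state either point.

```python
import io
from typing import Iterable, List


def chapter_files(lines: Iterable[str]) -> List[str]:
    """Collect the filename from every '@chapter filename' line."""
    files = []
    for line in lines:
        # Split off the first word; the remainder keeps any trailing newline
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "@chapter":
            files.append(parts[1].strip())
    return files


# Hypothetical config file contents
config = io.StringIO("@chapter intro.html\n@title Not a chapter line\n@chapter usage.html\n")
print(chapter_files(config))
# prints: ['intro.html', 'usage.html']
```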
The following operands are supported:
- The name of an element to extract (e.g., "H2"), the name of a class preceded by "." (e.g., ".example"), or a combination of both (e.g., "H2.example").
- A file name or a URL. To read from standard input, use "-".
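One way to read the element/class operand is to split it at the first ".": the part before names the element and the part after names the class. The Python sketch below assumes that splitting rule; hxextract's own parsing is not described here and may differ, for example for XML element names that themselves contain a dot.

```python
from typing import Optional, Tuple


def parse_selector(arg: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'H2', '.example' or 'H2.example' into (element, class)."""
    name, dot, cls = arg.partition(".")  # split at the first "." only
    return (name or None, cls if dot else None)


for arg in ("H2", ".example", "H2.example"):
    print(arg, "->", parse_selector(arg))
```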
To use a proxy to retrieve remote files, set the appropriate proxy environment variables.
Remote files (specified with a URL) are currently only supported for HTTP.
Password-protected files or files that depend on HTTP "cookies" are
not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)