|filename||Read the HTML input from filename (optional argument). If this argument is not provided, the HTML input is read from standard input.|
|-o filename||Output XHTML file. The file is overwritten if it exists. If not provided, the output is written to standard output.|
|-e||Instructs the program to propagate input chunks to the output even if it is unable to adapt them to the output XHTML doctype. Using this option, the XHTML output may be non-valid. Not using this option, some input data could be removed from the output in some [rare] cases.|
Doctype of the output XHTML file. If not specified,
the program selects automatically either
XHTML 1.0 Transitional or XHTML 1.0 Frameset
depending on the input. Current available
o transitional XHTML 1.0 Transitional
o frameset XHTML 1.0 Frameset
o strict XHTML 1.0 Strict
o 1.1 XHTML 1.1
o basic-1.0 XHTML Basic 1.0
o basic-1.1 XHTML Basic 1.1
o mp XHTML Mobile Profile
o print-1.0 XHTML Print 1.0
|--ics input_charset||Character set of the input document. This option overrides the default input character set detection mechanism.|
|--ocs output_charset||Character set for the output XHTML document. If this option is not present, the character set of the input is used as default.|
|--lcs||Dump the list of available character set aliases and exit html2xhtml. No conversion is performed when this option is present.|
|-l line_length||Number of characters per line. The default value is 80. It must be greater or equal to 40, otherwise the parameter is ignored.|
|-b tab_length||Tab length in number of characters. It must be a number between 0 and 16, otherwise the parameter is ignored. Use 0 to avoid indentation in the output.|
|--preserve-space-comments||Use this option to preserve white spaces, tabs and ends of lines in HTML comments. The default, if not provided, is to rearrange spacing.|
|--no-protect-cdata||Enclose CDATA sections in "script" and "style" following the XHTML 1.0 specification (using "<!CDATA[[" and "]]>"). It might be incompatible with some browsers. The default in this version is to enclose CDATA sections using "//<!CDATA[[" and "//]]>", because major browsers handle it properly.|
|--compact-block-elements||No white spaces or line breaks are written between the start tag of a block element and the start tag of its first enclosed inline element (or character data) and between the end tag of its last enclosed inline element (or character data) and the end tag of the block element. By default, if this option is not set, a new line character and indentation is written between them.|
|--compact-empty-elm-tags||Do not write a whitespace before the slash for empty element tags (i.e. write "<br/>" instead of the default "<br />"). Note that although both notations are correct in XML, the XHTML 1.0 standard recommends the latter to improve compatibility with old browsers.|
|--empty-elm-tags-always||By default, empty element tags are written only for elements declared as empty in the DTD. This option makes any element not having content to be written with the empty element tag, even if it is not declared as empty in the DTD. This option may cause problems when the XHTML document is opened by browsers in HTML (tag soup) mode.|
|--dos-eol||Write the output XHTML file with DOS--style (CRLF) end of line, instead of the default UNIX--style end of line. Both end of line styles are allowed by the XML recommendation.|
|--generate-snippet||Treat the input as an HTML fragment instead of a full document. The output will also be a snippet and will not contain either the XML and doctype declarations or the html, head and body elements.|
|--help||Show a brief help message and exit.|
Show the version number and exit.
Since version 1.1.2, html2xhtml is able to parse and write HTML and XHTML documents using the most popular character sets / encodings. It is also able to read the input document using a given character set and generate an output that uses a different character set. The iconv implementation in the GNU C library is used with that purpose.
Any IANA-registered character set that is supported by the iconv library may be used. When naming a character set, any IANA--approved alias for it may be used. The full list of aliases recognised by html2xhtml can be obtained with the --lcs command-line option.
If the character set of the input document is not specified, html2xhtml tries to guess it automatically. If the character set of the output document is not specified, html2xhtml writes the output using the same character set as the input document.
By default, the UNIX-style one-byte end of line is used. It can be changed to DOS-style CRLF end of line with the --dos-eol command-line option.
However, when the program is compiled in the MinGW environment and the output is sent to standard output, the output is automatically converted by the environment to CRLF by default. Do not use the --dos-eol command-line option in that situation. When the output is sent to a file with the -o command-line option, the output is as expected (UNIX-style by default), and the --dos-eol option may be used.
Program developer up to current version: Jesus Arias Fisteus <email@example.com>
The first working version of this program has been developed as a Master Thesis at the University of Vigo (Spain) [http://www.uvigo.es], advised by:
Rebeca Diaz Redondo Ana Fernandez Vilas
Copyright 2000-2001 by Jesus Arias Fisteus, Rebeca Diaz Redondo, Ana Fernandez Vilas. Copyright 2002-2009 by Jesus Arias Fisteus