html2latex - convert HTML markup to LaTeX markup
html2latex [ opt ] [ file ... ]
For each file argument, html2latex converts the text from HTML markup to
LaTeX markup. If no files are specified, a usage message is given. Input is
taken from standard input for files named ‘-’. Output is written to a
similarly named file with a ‘.tex’ extension (html2latex recognises ‘.html’
extensions).
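The naming rule above can be sketched in shell. This is only an illustration, not code from html2latex: the helper name tex_name is hypothetical, and it assumes that a recognised ‘.html’ extension is replaced by ‘.tex’ while any other name simply has ‘.tex’ appended.

```shell
# Hypothetical helper: guess the output file name for a given input name.
# Assumption: a trailing ".html" is replaced by ".tex"; other names get
# ".tex" appended.
tex_name() {
    case $1 in
        *.html) printf '%s\n' "${1%.html}.tex" ;;
        *)      printf '%s\n' "$1.tex" ;;
    esac
}

tex_name file.html     # prints file.tex
tex_name html-intro    # prints html-intro.tex
```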
Options modify the action of html2latex. The options are:
-n          Number sections.
-p          Place page breaks after the title page (if present) and the
            table of contents (if present).
-c          Generate a table of contents.
-s          Create no files; LaTeX is written to stdout.
-t Title    Generate a title page, with the title Title.
-a Author   Generate a title page, with the author Author.
-h Header   Place the text Header after ‘\begin{document}’.
-f Footer   Place the text Footer before ‘\end{document}’.
-o Options  Specify the options to ‘\documentstyle’.
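As a rough picture of where the -o, -h and -f arguments are described as landing, the following shell sketch prints a document skeleton. The helper name skeleton is hypothetical, and this is not the actual output of html2latex: title page, table of contents and the converted body are left out.

```shell
# Hypothetical helper: show where the -o, -h and -f arguments end up,
# according to the option descriptions. Not html2latex's real output.
skeleton() {
    opts=$1 header=$2 footer=$3
    printf '%s\n' "\\documentstyle${opts}"      # from -o Options
    printf '%s\n' '\begin{document}'
    printf '%s\n' "$header"                     # from -h Header
    printf '%s\n' '% ... converted body ...'
    printf '%s\n' "$footer"                     # from -f Footer
    printf '%s\n' '\end{document}'
}

skeleton '[bookman]{article}' 'Draft copy' 'End of draft'
```

The first argument has the same form as the argument to -o, for example '[bookman]{article}'.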
An example of use is

html2latex -n - < file.html | less

This converts file.html to LaTeX and pages through the output. The sections
(corresponding to heading tags in the HTML source) will be numbered.
Another example is
html2latex -t 'Introduction to HTML' -a gnat \
-p -c -o '[bookman]{article}' html-intro
This takes input from the file html-intro, writing to html-intro.tex, and
adds a title page (with title Introduction to HTML and author gnat) and
table of contents with page breaks after both. The sections of the document
are not numbered. The LaTeX source includes the line
‘\documentstyle[bookman]{article}’.
See also latex(1).
Currently the only HTML tags supported are: TITLE, H1, H2, H3, H4, H5, H6,
UL, OL, DL, DT, DD, LI, B, I, U, EM, STRONG, CODE, SAMP, KBD, VAR, DFN,
CITE, LISTING. The only recognised SGML escapes are ‘&amp;’, ‘&lt;’ and
‘&gt;’.
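A small test input that stays within the supported tags and escapes might look like the following. The file name sample.html and its contents are made up for illustration, and the converter is only invoked if it happens to be installed.

```shell
# Write a sample document that uses only tags and escapes listed as supported.
cat > sample.html <<'EOF'
<TITLE>Sample</TITLE>
<H1>Overview</H1>
Text with <B>bold</B>, <EM>emphasis</EM> and <CODE>code</CODE>.
<UL>
<LI>1 &lt; 2 &amp; 3 &gt; 2
</UL>
EOF

# Run the converter only if it is available; -s sends the LaTeX to stdout.
if command -v html2latex >/dev/null 2>&1; then
    html2latex -s sample.html
fi
```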
ADDRESS tags are handled badly. The COMPACT attribute to a DL tag is not
recognised. MENU and DIR styles are not handled well. TITLE text is ignored.
Currently PRE tags are not handled at all.
The entire file is read into memory. For long HTML documents on machines with
little memory, this may cause problems.
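One way to spot risky inputs before converting is to check file sizes first. The sketch below is an assumption-laden illustration: the helper name too_big is hypothetical, and the 1000000-byte limit is an arbitrary threshold, not a figure from html2latex.

```shell
# Hypothetical guard: succeed when file $1 is larger than $2 bytes.
too_big() {
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt "$2" ]
}

limit=1000000   # arbitrary illustrative threshold, in bytes
for f in *.html; do
    [ -f "$f" ] || continue
    if too_big "$f" "$limit"; then
        echo "warning: $f is large; html2latex reads it all into memory" >&2
    fi
done
```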
Nathan Torkington adapted the HTML parser from NCSA's Xmosaic package
(file://ncsa.uiuc.edu/Web/xmosaic) and wrote the conversion code. The HTML
parser code is subject to the NCSA restrictions. The conversion code is
subject to the VUW restrictions. Enquiries should be sent via e-mail to
‘Nathan.Torkington@vuw.ac.nz’.