HTML_FMT(1) User Contributed Perl Documentation HTML_FMT(1)

"html_fmt" - Reformat HTML, indented according to structure

    html_fmt [uri|file]

    html_fmt http://perl.org

Given the URI or the name of a file, writes it to "STDOUT" reformatted and indented according to the HTML structure. Missing start and end tags are supplied and comments added to indicate this. Text inside "<pre>" elements is not altered.

html_fmt tries to parse everything that is actually out there on the Web. In fact, html_fmt will assume any file fed to it was intended as HTML, and will produce its best guess of the author's intent.

html_fmt supplies missing start and end tags. html_fmt's parser is extremely liberal in what it accepts. When its liberalization of the standards is not sufficient to make a document into valid HTML, html_fmt will pick characters to treat as noise or "cruft". The parser ignores cruft in determining the structure of the document.

When html_fmt adds a missing start tag, it precedes the new start tag with a comment. When html_fmt adds a missing end tag, it follows the new end tag with a comment. When html_fmt classifies characters as "cruft", it adds a comment to that effect before the "cruft".
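The placement rules above can be sketched in a few lines of Python. This is an illustration only, not html_fmt's actual code (which is Perl, built on Marpa::HTML). The function names are hypothetical, and the comment wording is copied from the sample output shown on this page.

```python
# Comment texts as they appear in the sample output on this page.
START_NOTE = "<!-- Following start tag is replacement for a missing one -->"
END_NOTE = "<!-- Preceding end tag is replacement for a missing one -->"
CRUFT_NOTE = "<!-- Next line is cruft -->"


def supplied_start_tag(name: str) -> list[str]:
    """A supplied start tag is preceded by its comment."""
    return [START_NOTE, f"<{name}>"]


def supplied_end_tag(name: str) -> list[str]:
    """A supplied end tag is followed by its comment."""
    return [f"</{name}>", END_NOTE]


def cruft(text: str) -> list[str]:
    """Cruft is kept in the output, with a comment placed before it."""
    return [CRUFT_NOTE, text]


if __name__ == "__main__":
    lines = supplied_start_tag("html") + cruft('<head attr="I am cruft">') + supplied_end_tag("html")
    print("\n".join(lines))
```

The sketch ignores indentation and the special handling of "pre" elements; it only shows on which side of the tag or cruft each comment lands.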

"pre" elements receive special treatment. The contents of "pre" elements are not reformatted. When missing tags or cruft occur inside a "pre" element, the comments to that effect are placed before the "<pre>" start tag.

The argument to html_fmt can be either a URI or a file name. If it starts with alphanumerics followed by a colon, it is treated as a URI. Otherwise it is treated as a file name.
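As a rough illustration of the argument rule above, here is a minimal Python sketch. It is not taken from the html_fmt source; the function name is hypothetical, and the reading of "alphanumerics" as one or more ASCII letters or digits is an assumption.

```python
import re

# Assumption: "alphanumerics followed by a colon" means one or more
# ASCII letters or digits at the very start of the argument, then ":".
URI_PREFIX = re.compile(r"[A-Za-z0-9]+:")


def looks_like_uri(argument: str) -> bool:
    """Return True if the argument would be treated as a URI under this rule."""
    return URI_PREFIX.match(argument) is not None


if __name__ == "__main__":
    for arg in ("http://perl.org", "index.html", "./notes:old.html", "C:\\site\\index.html"):
        kind = "URI" if looks_like_uri(arg) else "file name"
        print(f"{arg} -> {kind}")
```

Under this reading, a file name that itself begins with alphanumerics and a colon (for example a drive-letter path such as C:\site\index.html) would be classified as a URI, while a name prefixed with "./" would not.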

Given this input:

    <title>Test page<tr>x<head attr="I am cruft"><p>Final graf

html_fmt returns

    <!-- Following start tag is replacement for a missing one -->
    <html>
      <!-- Following start tag is replacement for a missing one -->
      <head>
        <title>
          Test page
        </title>
        <!-- Preceding end tag is replacement for a missing one -->
      </head>
      <!-- Preceding end tag is replacement for a missing one -->
      <!-- Following start tag is replacement for a missing one -->
      <body>
        <!-- Following start tag is replacement for a missing one -->
        <table>
          <!-- Following start tag is replacement for a missing one -->
          <tbody>
            <tr>
              <!-- Following start tag is replacement for a missing one -->
              <td>
                x
                <!-- Next line is cruft -->
                <head attr="I am cruft">
                <p>
                  Final graf
                </p>
                <!-- Preceding end tag is replacement for a missing one -->
              </td>
              <!-- Preceding end tag is replacement for a missing one -->
            </tr>
            <!-- Preceding end tag is replacement for a missing one -->
          </tbody>
          <!-- Preceding end tag is replacement for a missing one -->
        </table>
        <!-- Preceding end tag is replacement for a missing one -->
      </body>
      <!-- Preceding end tag is replacement for a missing one -->
    </html>
    <!-- Preceding end tag is replacement for a missing one -->

This program is a demo of a demo. Its purpose is to show how easy it is to write applications which look at the structure of web pages using Marpa::HTML. And the purpose of Marpa::HTML is to demonstrate the power of its parse engine, Marpa. Marpa::HTML was written in a few days, and its logic is a straightforward, natural expression of the structure of HTML.

The starting template for this code was HTML::TokeParser, by Gisle Aas. See also the acknowledgments for Marpa as a whole.

Copyright 2007-2010 Jeffrey Kegler, all rights reserved. Marpa is free software under the Perl license. For details see the LICENSE file in the Marpa distribution.
2022-04-13 perl v5.32.1
