Text::Highlight(3) User Contributed Perl Documentation Text::Highlight(3)

Text::Highlight - Syntax highlighting framework

   use Text::Highlight 'preload';
   my $code = q{print "Hello, world!\n";};   # plain-text Perl source to highlight
   my $th = Text::Highlight->new(wrapper => "<pre>%s</pre>\n");
   print $th->highlight('Perl', $code);

Text::Highlight is a flexible and extensible tool for syntax highlighting of program source code. Both the markup used and the languages supported are fully customizable. It can produce highlighted code for embedding in HTML, terminal escapes for an ANSI-capable display, or markup for posting on an online forum. Bundled support covers C/C++, CSS, HTML, Java, Perl, PHP and SQL.

In order to install and use this package you will need Perl version 5.005 or better.

Installation as usual:

   % perl Makefile.PL
   % make
   % make test
   % su
     Password: *******
   % make install

No third-party modules are required.

The following modules are optional:

  • HTML::SyntaxHighlighter and HTML::Parser (for better highlighting of HTML code)
  • Term::ANSIColor (if you want terminal escapes)

[Todo]

Text::Highlight provides the object-oriented interface described in this section. Optionally, "new" takes the same parameters as the "configure" method described below.

"my $th = Text::Highlight->new( %args )"

"$th->configure( %args )"

Sets how highlighted code is output. Any combination of the following properties can be passed in a single call. If an option is invalid (for example, a wrapper containing no %s), a warning is "cluck"ed to STDERR and the option is otherwise silently ignored.
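As a rough illustration of the kind of check described above, the following standalone Perl sketch validates a wrapper format and reports a bad one with Carp's cluck. The sub name valid_wrapper and its exact counting rule are assumptions made for this example; the module's own validation code is not shown in this page.

```perl
use strict;
use warnings;
use Carp qw(cluck);

# Hypothetical validator (not the module's code): a wrapper is accepted only
# if it contains exactly one %s conversion. Each regex match consumes a '%'
# together with the character after it, so a literal '%%' is consumed as a
# pair and is never miscounted as part of a '%s'.
sub valid_wrapper {
    my ($fmt) = @_;
    my $count = grep { $_ eq 's' } $fmt =~ /%(.)/gs;
    return 1 if $count == 1;
    cluck "wrapper '$fmt' does not contain exactly one %s; ignored";
    return 0;
}
```

With the module itself, a call such as $th->configure(wrapper => '<pre></pre>') is the case the paragraph above describes: reported on STDERR and then ignored.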

"wrapper => '<pre>%s</pre>'"

An sprintf-style format string through which the complete highlighted code is passed once highlighting is finished. It must contain exactly one %s, which receives the highlighted code, plus any other formatting you want around it. If you want no wrapper, just the highlighted code, set this to a simple '%s'. Because this is an sprintf format, any other literal % character in it must be written as %%. Refer to "sprintf" in perlfunc.
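The effect of a wrapper, including a literal percent sign written as %%, can be reproduced with plain sprintf. This sketch does not load Text::Highlight; the highlighted string and the div/style wrapper are made-up examples.

```perl
use strict;
use warnings;

# Already highlighted code (a made-up example string).
my $highlighted = '<span class="key1">print</span> "hi";';

# The width needs a literal percent sign, so it is written as %% in the
# format; the single %s receives the highlighted code.
my $wrapper = qq{<div style="width: 100%%"><pre>%s</pre></div>\n};

my $page = sprintf $wrapper, $highlighted;
print $page;
```

Assuming the module applies the wrapper with sprintf as described, passing the same string as wrapper => qq{...} to "new" or "configure" would frame the output the same way.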

"markup => '<span class="%s">%s</span>'"

Another sprintf format string, this one for marking up the individual semantic pieces of the highlighted code; in short, it is what makes a comment turn green. The format contains two %s conversions: the first receives the markup identifier from the "colors" hash for the type of snippet being marked up, and the second receives the snippet itself. In the final output, a comment might look like "<span class="comment"># my comment</span>".
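A minimal sketch of how the two %s slots of the default markup format get filled, using plain sprintf and a colors mapping shaped like the default one. The helper mark_up and the choice of key1 for print are hypothetical; which keys a real syntax module assigns to which words depends on that module.

```perl
use strict;
use warnings;

my $markup = '<span class="%s">%s</span>';     # the default markup format
my %colors = (comment => 'comment', string => 'string', key1 => 'key1');

# Hypothetical helper: mark up one snippet of a given semantic kind. The
# first %s gets the identifier looked up in %colors, the second the snippet.
sub mark_up {
    my ($kind, $snippet) = @_;
    return sprintf $markup, $colors{$kind}, $snippet;
}

my $line = mark_up(key1 => 'print') . ' ' . mark_up(string => '"hi"') . '; '
         . mark_up(comment => '# greet');
print "$line\n";
```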

The limitation is that the type identifier must come before the code itself. That is how most markup works, but a format that needs the opposite order is not supported for the time being. Future versions may allow a coderef to be set instead to get around this.
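Perl's sprintf itself supports explicit format parameter indexes such as %2$s (see "sprintf" in perlfunc), which let a format use the second argument before the first. Whether Text::Highlight passes "markup" to sprintf in a way that would honor them is not stated in this page, so the sketch below shows only plain sprintf behavior and should not be read as a supported feature of the module.

```perl
use strict;
use warnings;

# Plain Perl sprintf: '%2$s' selects the second argument and '%1$s' the
# first, so the snippet can be printed before its type identifier.
my $fmt = '%2$s <!-- %1$s -->';
my $out = sprintf $fmt, 'comment', '# a comment';
print "$out\n";
```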

"colors => \%hash"

The default colors hash is:

  { comment => 'comment',
    string  => 'string',
    number  => 'number',
    key1    => 'key1',
    key2    => 'key2',
    key3    => 'key3',
    key4    => 'key4',
    key5    => 'key5',
    key6    => 'key6',
    key7    => 'key7',
    key8    => 'key8',
  };

This hash maps each semantic token name to its markup identifier. The parser breaks code into semantic chunks labeled by these keys, and the value stored at a chunk's key is what gets passed through the "markup" format above. Values can be raw color values, ANSI terminal escapes, or, by default, CSS class names.
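For a terminal instead of HTML, the colors values can be raw ANSI escapes. This sketch builds such a mapping with the core Term::ANSIColor module and applies it through a markup format that resets the color after each snippet. The specific colors and the reset-based markup are assumptions for illustration, not the values behind the module's own "ansi" preset.

```perl
use strict;
use warnings;
use Term::ANSIColor qw(color);

# Hypothetical terminal configuration: each colors value is a raw escape
# sequence, and the markup format prints that sequence, then the snippet,
# then a reset.
my %colors = (
    comment => color('green'),
    string  => color('yellow'),
    number  => color('magenta'),
    key1    => color('bold blue'),
);
my $markup = '%s%s' . color('reset');

my $out = sprintf $markup, $colors{comment}, '# a comment';
print "$out\n";
```

Passed to the module, these would correspond to colors => \%colors and markup => $markup in a call to "configure".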

"escape => \&escape_sub | 'default' | undef"

Every piece of displayed code is passed through an escape function suited to the output medium, which is called like "$escaped_string = escapeHTML("unescaped string")". If this option is set to a code reference, that function is called for every piece of code. It is called very often, so if performance matters, keep the function lightweight.

The default function does a minimal HTML escape: only the three characters &, < and > are escaped. If you want a more thorough HTML escape, HTML::Entities' "encode_entities()" and CGI's "escapeHTML()" have the same calling convention and can be used instead. If you change the escape routine and later want the default back, set it to the literal string 'default'.

A third option, no escaping at all, is selected by passing "undef".
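The three documented settings of "escape" can be pictured with the following standalone sketch. simple_html_escape mirrors the minimal default described above (&, < and > only); attr_safe_escape and resolve_escape are hypothetical names invented for illustration, and resolve_escape is only one plausible way to interpret the setting, not the module's code.

```perl
use strict;
use warnings;

# Minimal HTML escape like the documented default: only &, < and > change.
# & is replaced first so the entities added afterwards are not escaped twice.
sub simple_html_escape {
    my $code = shift;
    $code =~ s/&/&amp;/g;
    $code =~ s/</&lt;/g;
    $code =~ s/>/&gt;/g;
    return $code;
}

# Hypothetical stricter variant: also escapes double and single quotes,
# which matters if snippets may end up inside HTML attribute values.
sub attr_safe_escape {
    my $code = simple_html_escape(shift);
    $code =~ s/"/&quot;/g;
    $code =~ s/'/&#39;/g;
    return $code;
}

# Hypothetical interpretation of the three settings: undef, 'default',
# or a code reference supplied by the caller.
sub resolve_escape {
    my ($setting) = @_;
    return sub { $_[0] } unless defined $setting;             # no escaping
    return \&simple_html_escape if !ref $setting && $setting eq 'default';
    return $setting if ref $setting eq 'CODE';                # caller's own
    die "unsupported escape setting: $setting\n";
}
```

With the module, the custom function would be passed as escape => \&attr_safe_escape, while escape => 'default' and escape => undef select the other two behaviors described above.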

"vb => 1", "tgml => 1", "ansi => 1"

When true, each of these sets the markup format, wrapper, escape, and colors to the values stored for the named target: "vb" for posting on vBulletin forums, "tgml" for posting on Tek-Tips, and "ansi" for display in a terminal that accepts ANSI color escapes.

If more than one of these is present in a single call to "configure", it is indeterminate which one takes effect. Also, a wrapper, markup, colors, or escape passed in the same call as vb, tgml, or ansi is not overwritten by the stored values. Hence, "$th->configure(wrapper => '[tt]%s[/tt]', tgml => 1)" uses the stored TGML settings for markup, colors, and escape, but the custom wrapper passed in instead of the stored TGML wrapper.
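The precedence rule above amounts to a hash merge in which explicitly passed options override the stored preset. The sketch below models it with a placeholder preset table; %PRESETS, merge_preset and all preset values are hypothetical, since the settings Text::Highlight actually stores are not listed in this page.

```perl
use strict;
use warnings;

# Placeholder preset table. The values Text::Highlight really stores for
# vb, tgml and ansi are not documented here; these strings are made up.
my %PRESETS = (
    tgml => {
        wrapper => 'TGML-WRAPPER(%s)',
        markup  => 'TGML-MARKUP(%s,%s)',
        colors  => { comment => 'TGML-COMMENT' },
        escape  => undef,
    },
);

# Hypothetical merge: explicitly passed options win over the preset.
sub merge_preset {
    my (%args) = @_;
    # Strip every preset flag from the options, keeping those set to true.
    my @chosen = grep { delete $args{$_} } sort keys %PRESETS;
    # With several flags the module's choice is indeterminate; this sketch
    # simply takes the first one in sorted order.
    my %settings = @chosen ? %{ $PRESETS{ $chosen[0] } } : ();
    return { %settings, %args };
}

my $s = merge_preset(wrapper => '[tt]%s[/tt]', tgml => 1);
```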

"$code = $th->highlight($type, $code, $options)"

"$code = $th->highlight(type => $type, code => $code, options => $options)"

The "highlight" method does all the work. Given at least the "type" and the original "code", it marks up the code and returns the highlighted result as a string. It accepts either the named parameters listed below or just their values as a flat list in the order listed. Because that order is subject to change, the named (hash) syntax is the safer choice.
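To show how the two calling styles can coexist, here is a hypothetical helper, highlight_args, that accepts either form and returns (type, code, options). The rule for telling them apart (an even number of arguments whose would-be keys are all type, code, or options) is an assumption for this sketch, not documented behavior of the module.

```perl
use strict;
use warnings;

my %KEYS = (type => 1, code => 1, options => 1);

# Hypothetical normaliser for highlight($type, $code, $options) and
# highlight(type => $type, code => $code, options => $options).
sub highlight_args {
    my @args  = @_;
    my @names = @args[ grep { $_ % 2 == 0 } 0 .. $#args ];   # would-be keys
    my $named = @args && @args % 2 == 0
        && !grep { !defined $_ || !$KEYS{$_} } @names;
    if ($named) {
        my %named = @args;
        return @named{qw(type code options)};
    }
    return @args[0 .. 2];                                   # positional
}
```

The named form works in any order, while the positional form depends on the order staying fixed, which is why the hash syntax is recommended above.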

"type => $type"

The "type" is the name of the kind of code being highlighted. It can be either a type loaded with "get_syntax" or the name of a sub-module, i.e. "Text::Highlight::$type", that has a syntax or highlight method.

"code => $code"

"code" is the plain-text code to be highlighted, with no markup or escaping applied.

"options => $options"

"options" is optional and rarely needed. Some parsing modules accept extra configuration, so the form of "options" varies widely: it may be a string, a number, or a hashref of several options. The only standard value is the string 'simple', in which case the syntax module's "highlight" method is not called, and Text::Highlight's own parsing method is used with the syntax module's "syntax" hash instead.

"$code = $th->output"

Returns the highlighted code from the last time the "highlight" method was called.
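The effect of the 'simple' value of "options" on which code does the parsing can be sketched as follows. Text::Highlight::Toy is a toy sub-module defined inline so the example runs on its own, local_parse stands in for the module's internal parser, and the arguments passed to the sub-module's highlight method are a guess; falling back to the syntax hash when a sub-module has no highlight method is likewise an assumption.

```perl
use strict;
use warnings;

# Toy sub-module, defined inline only so this sketch runs on its own.
package Text::Highlight::Toy;
sub syntax    { return { name => 'Toy', lineComment => ['#'] } }
sub highlight { my ($class, $code) = @_; return "toy-module:$code" }

package main;

# Hypothetical stand-in for Text::Highlight's own parsing method.
sub local_parse {
    my ($syntax, $code) = @_;
    return "local($syntax->{name}):$code";
}

# Call the sub-module's highlight method, except when options is the string
# 'simple' (or the module lacks a highlight method): then use the local
# parser with the module's syntax hash.
sub dispatch {
    my ($type, $code, $options) = @_;
    my $module = "Text::Highlight::$type";
    my $simple = defined $options && !ref $options && $options eq 'simple';
    return $module->highlight($code, $options)
        if !$simple && $module->can('highlight');
    return local_parse($module->syntax, $code);
}
```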

"$th->get_syntax($type, $grammar, $format, $force)"

"$th->get_syntax(type => $type, grammar => $grammar, format => $format, force => $force)"

In addition to the existing Text::Highlight:: sub-modules, you can define new types at runtime from text-editor syntax files. EditPlus and UltraEdit (both very good text/code editors) are currently supported. Many users publish these files on the web, so they should not be difficult to find. This method can also load an already parsed language syntax hash if, for whatever reason, you don't want to turn it into a module.

This method returns a hashref to the parsed syntax on success, or undef plus a clucked error message on failure. You can use the returned value as a simple truth test, or make your own static sub-module out of it to save reparsing time if you use the same additional types often. See <a doc that doesn't yet exist> for details on creating a sub-module. The object keeps a copy of the new type, which can be used with the "highlight" method for the rest of the object's life.

"type => $type"

The "type" is the same name that is later passed to "highlight", so the two must match for the loaded syntax to be used. Also, if the type has the same name as an existing sub-module (visible in @INC as Text::Highlight::$type), the syntax loaded via "get_syntax" takes precedence.

"grammar => $filename | \%syntax"

"grammar" can be one of two things: the name of a file containing the syntax, or a hashref to an already parsed language syntax. A file must contain only a single language syntax definition; although some editors allow several languages to be defined in one file, a file loaded here may contain only one. A hashref is assumed to be valid, and no further checking is done.

"format => 'editplus' | 'ultraedit'"

"format" is a string naming the format of the syntax definition in the file. It is ignored if "grammar" is a hashref, but required if it is a filename. Currently, it must be one of the strings 'editplus' or 'ultraedit'.

Before the file is parsed, the syntax for a language is set to the following default hash, so any option not set in the syntax file keeps the default shown here. If "format" is not a valid string, this default hash is set and returned instead of an error being thrown; parsing then runs without error, but nothing in the code is highlighted.

  { name => 'Unknown-type',
    escape => '\\',
    case => 1,
    continueQuote => 0,
    blockCommentOn => [],
    lineComment => [],
    quot => [],
  };
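Filling in missing options from the default hash can be sketched as a copy of the defaults overlaid with whatever the syntax file set. with_defaults is a hypothetical helper; the keys come from the default hash above, while the SQL values in the example are invented.

```perl
use strict;
use warnings;

# The default syntax hash documented above.
my %DEFAULT_SYNTAX = (
    name           => 'Unknown-type',
    escape         => '\\',
    case           => 1,
    continueQuote  => 0,
    blockCommentOn => [],
    lineComment    => [],
    quot           => [],
);

# Hypothetical helper: copy the defaults, then let every option actually
# found in the syntax file override them.
sub with_defaults {
    my (%parsed) = @_;
    my %syntax;
    for my $key (keys %DEFAULT_SYNTAX) {
        my $value = $DEFAULT_SYNTAX{$key};
        # Copy array refs so separately loaded syntaxes never share one array.
        $syntax{$key} = ref $value eq 'ARRAY' ? [@$value] : $value;
    }
    @syntax{ keys %parsed } = values %parsed;
    return \%syntax;
}

my $sql = with_defaults(name => 'SQL', case => 0, lineComment => ['--']);
```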

"force => 1"

If "force" is set to a true value, the specified grammar is always reparsed, reset, and reloaded. By default, if a grammar is loaded for a "type" that has already been loaded, the existing copy is used and no reparsing is done. This works as a very simple caching mechanism, so you don't have to worry about unnecessary processing unless you want to.
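The caching behavior described above, including the effect of "force", can be modeled with a hash keyed by type. The names %loaded, parse_file and get_syntax_cached are hypothetical, and parse_file only counts how often it runs instead of reading a real syntax file.

```perl
use strict;
use warnings;

my %loaded;            # type name => parsed syntax hashref
my $parse_count = 0;

# Hypothetical stand-in for parsing an editor syntax file.
sub parse_file {
    my ($file) = @_;
    $parse_count++;
    return { name => $file };
}

# Reuse a type that is already loaded unless force is true, in which case
# the grammar is parsed and stored again.
sub get_syntax_cached {
    my ($type, $file, $force) = @_;
    if ($force || !exists $loaded{$type}) {
        $loaded{$type} = parse_file($file);
    }
    return $loaded{$type};
}
```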

Until I come up with some better examples, here are the defaults the module uses.

  $DEF_FORMAT   = '<span class="%s">%s</span>';
  $DEF_ESCAPE   = \&_simple_html_escape;
  $DEF_WRAPPER  = '<pre>%s</pre>';
  $DEF_COLORS   = { comment => 'comment',
                    string  => 'string',
                    number  => 'number',
                    key1    => 'key1',
                    key2    => 'key2',
                    key3    => 'key3',
                    key4    => 'key4',
                    key5    => 'key5',
                    key6    => 'key6',
                    key7    => 'key7',
                    key8    => 'key8',
  };
                                
  # sub has the same calling convention as CGI.pm's escapeHTML()
  # and HTML::Entities' encode_entities()
  sub _simple_html_escape
  {
      my $code = shift;

      # escape the only three characters that "really" matter for displaying html;
      # & goes first so the entities added below are not escaped a second time
      $code =~ s/&/&amp;/g;
      $code =~ s/</&lt;/g;
      $code =~ s/>/&gt;/g;

      return $code;
  }

[Todo]

[Todo]

  • Finish documentation (especially a "how do I make a custom highlighting module" kind of thing)
  • Let "wrapper" and "format" take coderefs instead of just sprintf format strings
  • Add support for "get_syntax" to take a file handle
  • Add support for a force case option for case-insensitive languages (upper, lower, or match stored)
  • Write T::H:: wrappers for the modules in the Syntax:: namespace
  • Test, test, test ;-)

Andrew Flerchinger <icrf [at] wdinc.org>

Enrico Sorcinelli <enrico [at] sorcinelli.it> (main contributors)

Please submit bugs to the CPAN RT system at <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Highlight> or by email to bug-text-highlight@rt.cpan.org.

Patches are welcome and we'll update the module if any problems are found.

Version 0.04

HTML::SyntaxHighlighter, perl(1)

Copyright (C) 2001-2005. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
2005-05-29 perl v5.32.1
