Manual Reference Pages  -  XML::FAST (3)

NAME

XML::Fast - Simple and very fast XML to hash conversion

SYNOPSIS



  use XML::Fast;
 
  my $hash = xml2hash $xml;
  my $hash2 = xml2hash $xml, attr => '.', text => '~';



DESCRIPTION

This module implements a simple, state-machine-based XML parser written in C.

It can parse, and recover from, some kinds of broken XML. If you need an XML validator, use XML::LibXML.

RATIONALE

Another similar module is XML::Bare. I used it for some time, but it has several failures:
o If your XML has a node named 'value', you get a segfault
o If your XML has a node with a TextNode, then a CDATANode, then another TextNode, you get a broken value
o It doesn’t support charsets
o It doesn’t support any kind of entities.
So, after a number of attempts to fix XML::Bare, I decided to write a parser from scratch.

It is about 40% faster than XML::Bare and about 125% faster than XML::LibXML.

I got these results using the following test on a 35 KB XML document:



    use Benchmark qw(cmpthese timethese);

    cmpthese timethese -10, {
        libxml  => sub { XML::LibXML->new->parse_string($doc) },
        xmlfast => sub { XML::Fast::xml2hash($doc) },
        xmlbare => sub { XML::Bare->new(text => $doc)->parse },
    };

              Rate  libxml xmlbare xmlfast
    libxml  1107/s      --    -38%    -56%
    xmlbare 1782/s     61%      --    -28%
    xmlfast 2490/s    125%     40%      --



Of course, the results could be different for different XML files. With non-UTF encodings and with many entities it could be slower. This test was run on a sample RSS feed in UTF-8 mode with a small number of XML entities.
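
The percentages in a comparison table like the one above can be reproduced from the rates alone. The sketch below assumes each cell is the row's rate relative to the column's rate, rounded to a whole percent. The helper name relative_speed is illustrative and is not part of Benchmark or XML::Fast.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the table above.
my %rate = (libxml => 1107, xmlbare => 1782, xmlfast => 2490);

# Hypothetical helper: how much faster (positive) or slower (negative)
# $row is than $col, as a whole-number percentage.
sub relative_speed {
    my ($row, $col) = @_;
    return sprintf '%.0f', ($rate{$row} / $rate{$col} - 1) * 100;
}

printf "xmlfast vs libxml:  %s%%\n", relative_speed('xmlfast', 'libxml');
printf "xmlfast vs xmlbare: %s%%\n", relative_speed('xmlfast', 'xmlbare');
printf "libxml vs xmlfast:  %s%%\n", relative_speed('libxml',  'xmlfast');
```

Read this way, "125% faster" means XML::Fast handled roughly 2.25 times as many documents per second as XML::LibXML in this particular test.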

Here are some features and principles:
o It uses a minimal number of memory allocations.
o All XML is parsed in one scan.
o All values are copied from the source XML only once (to the destination keys/values).
o If some types of nodes (for example, comments) are ignored, no memory is allocated or copied for them.

EXPORT

xml2hash $xml, [ %options ]

OPTIONS

order [ = 0 ] Not implemented yet. Strictly keep the output order. When enabled, structures become more complex, but the original XML can be completely reconstructed.
attr [ = '-' ] Attribute prefix



    <node attr="test" />  =>  { node => { -attr => "test" } }
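
Because attributes and child nodes share one hash, code that consumes the result often needs to tell them apart by the prefix. The following is a minimal sketch that works on the structure shown above; split_attrs is a hypothetical helper, not something XML::Fast provides, and it assumes the prefix is a non-empty string that no child node name starts with.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Fast): split one element hash
# into attributes and everything else, based on the attribute prefix.
sub split_attrs {
    my ($element, $prefix) = @_;
    $prefix = '-' unless defined $prefix;    # the documented default
    my (%attrs, %rest);
    for my $key (keys %$element) {
        if (index($key, $prefix) == 0) {
            # Strip the prefix so callers see the bare attribute name.
            $attrs{ substr $key, length $prefix } = $element->{$key};
        }
        else {
            $rest{$key} = $element->{$key};
        }
    }
    return (\%attrs, \%rest);
}

# Structure documented above for <node attr="test" />
my $hash = { node => { '-attr' => 'test' } };

my ($attrs, $rest) = split_attrs($hash->{node});
print "attr = $attrs->{attr}\n";
```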



text [ = '#text' ] Key name for storing text

When undef, text nodes will be ignored



    <node>text<sub /></node>  =>  { node => { sub => '', '#text' => "text" } }
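
A node that holds only text is returned as a plain string (compare the cdata example on this page), while a node with both text and subnodes is a hash with the text key. A minimal sketch of a reader that copes with both shapes follows; node_text is a hypothetical helper, not part of XML::Fast, and it assumes the default text key unless told otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Fast): return the text of a parsed
# node, whether it is a plain string or a hash with a text key.
sub node_text {
    my ($node, $text_key) = @_;
    $text_key = '#text' unless defined $text_key;    # the documented default
    return undef unless defined $node;
    return $node unless ref $node;                    # text-only node
    return $node->{$text_key} if ref($node) eq 'HASH';
    return undef;                                     # arrays are left to the caller
}

# Literal structures in the shapes documented on this page.
my $mixed = { node => { sub => '', '#text' => 'text' } };
my $plain = { node => 'text' };

print node_text($mixed->{node}), "\n";
print node_text($plain->{node}), "\n";
```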



join [ = '' ] Join separator for text nodes that are split by subnodes

Ignored when order is in effect



    # default:
    xml2hash( '<item>Test1<sub />Test2</item>' )
    : { item => { sub => '', '#text' => 'Test1Test2' } };

    xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
    : { item => { sub => '', '#text' => 'Test1+Test2' } };



trim [ = 1 ] Trim leading and trailing whitespace from text nodes
cdata [ = undef ] When defined, CDATA sections will be stored under this key



    # cdata = undef
    <node><![CDATA[ test ]]></node>  =>  { node => 'test' }

    # cdata = '#'
    <node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }



comm [ = undef ] When defined, comments will be stored under this key

When undef, comments will be ignored



    # comm = undef
    <node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

    # comm = '/'
    <node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }



array => 1 Force all nodes to be kept as arrays.



    # no array
    <node><sub/></node>  =>  { node => { sub => '' } }

    # array = 1
    <node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }



array => [ 'node', 'names' ] Force nodes with the listed names to be stored as arrays



    # no array
    <node><sub/></node>  =>  { node => { sub => '' } }

    # array => ['sub']
    <node><sub/></node>  =>  { node => { sub => [ '' ] } }
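
When the array option is not set for a node name, calling code has to handle both a bare value and an array reference. A minimal sketch of one way to normalize on the consumer side follows; as_list is a hypothetical helper, not part of XML::Fast, and the literal hashes mirror the structures shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Fast): always return a list,
# whether the parsed value is missing, a single item, or an array reference.
sub as_list {
    my ($value) = @_;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

my $single  = { node => { sub => '' } };        # no array option
my $forced  = { node => { sub => [ '' ] } };    # array => ['sub']
my $missing = { node => {} };                   # no <sub/> at all

for my $doc ($single, $forced, $missing) {
    my @subs = as_list($doc->{node}{sub});
    printf "%d sub node(s)\n", scalar @subs;
}
```

Passing array => [ ... ] for every name that can repeat avoids the need for such a helper, at the cost of an extra array level in the result.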



SEE ALSO

o XML::Bare

Another fast parser, but it has problems

o XML::LibXML

The most powerful XML parser for Perl, as long as you don’t need to parse gigabytes of XML ;)

o XML::Hash::LX

An XML parser that uses XML::LibXML for parsing and then constructs a hash structure identical to the one generated by this module. (At least, it should ;)). But of course it is much slower than XML::Fast.

TODO

o Ordered mode (as implemented in XML::Hash::LX)
o Create hash2xml, identical to the one in XML::Hash::LX
o Partial content event-based parsing (I need this for reading XML streams)
Patches, proposals and bug reports are welcome ;)

AUTHOR

Mons Anderson, <mons@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010 Mons Anderson

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.20.3 XML::FAST (3) 2010-12-17