XML::DoubleEncodedEntities(3) User Contributed Perl Documentation XML::DoubleEncodedEntities(3)

XML::DoubleEncodedEntities - unbreak XML with doubly-encoded entities

Occasionally, XML files escape into the wild with their entities encoded twice, so instead of this:

    <chocolate>Green &amp; Blacks</chocolate>

you get:

    &lt;chocolate&gt;Green &amp;amp; Blacks&lt;/chocolate&gt;

A real-world example of this problem can be seen in this failing test for a module which queries an online XML datasource:

    http://www.nntp.perl.org/group/perl.cpan.testers/2007/02/msg414642.html

(search for the text 'Arcturus' in that page).

This module tries to fix that.

    use XML::DoubleEncodedEntities;
    
    # $xmlfile must already hold the (possibly doubly-encoded) XML text
    $xmlfile = XML::DoubleEncodedEntities::decode($xmlfile);

The decode function is not exported by default, but can be exported if you wish. It takes one scalar parameter and returns a corresponding scalar, decoded if necessary.

The parameter is assumed to be a string. If its first non-whitespace characters are "&lt;", or if it contains the sequence "&amp;amp;", the string is assumed to be a doubly-encoded XML document, in which case the following entities, if present, are decoded: &amp; &lt; &gt; &quot; &apos;
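The rule above can be sketched in a few lines of plain Perl. This is only an illustration of the documented behaviour, not the module's actual source: the name sketch_decode and the %char_for table are hypothetical, and the real implementation may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented heuristic.
# It is NOT the module's real code.
sub sketch_decode {
    my ($text) = @_;

    # Treat the input as doubly-encoded only if it starts with "&lt;"
    # (after optional whitespace) or contains "&amp;amp;" anywhere.
    return $text unless $text =~ /\A\s*&lt;/ || $text =~ /&amp;amp;/;

    # The five predefined XML entities and the characters they stand for.
    my %char_for = (
        amp => '&', lt => '<', gt => '>', quot => '"', apos => "'",
    );

    # A single pass: text produced by one replacement is never decoded
    # again, so "&amp;amp;" becomes "&amp;" and not "&".
    $text =~ s/&(amp|lt|gt|quot|apos);/$char_for{$1}/g;
    return $text;
}

my $broken = '&lt;chocolate&gt;Green &amp;amp; Blacks&lt;/chocolate&gt;';
my $fixed  = sketch_decode($broken);

# Correctly encoded input fails both checks and is returned untouched.
my $clean  = sketch_decode('<chocolate>Green &amp; Blacks</chocolate>');
```

Doing the substitution in one pass is what strips exactly one layer of encoding: repeating it until nothing changes would also decode the legitimate "&amp;" that remains.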

No other entities are decoded. After all, if the input document has been *doubly* encoded, then something like "æ", which should be the entity "&aelig;", will be represented by the character sequence "&amp;aelig;". Once the "&amp;" has been corrected by this module, you'll be able to decode the resulting "&aelig;" in the normal way.
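As a rough illustration of that two-step process, the sketch below first removes the extra layer of encoding and then resolves the remaining named entity. Both steps are stand-ins: the first imitates the documented behaviour with a plain substitution rather than calling the module, and the one-entry %named table is a hypothetical substitute for whatever XML parser or entity decoder you would normally use.

```perl
use strict;
use warnings;

my $doubly = '&lt;name&gt;&amp;aelig;ther&lt;/name&gt;';

# Step 1: undo the extra layer, touching only the five XML entities
# (a stand-in for what this module is documented to do).
my %xml = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");
(my $once = $doubly) =~ s/&(amp|lt|gt|quot|apos);/$xml{$1}/g;
# $once now holds ordinary, singly-encoded markup containing "&aelig;".

# Step 2: decode what is left "in the normal way". The one-entry table
# below is hypothetical; unknown entities are left as they are.
my %named = (aelig => "\x{e6}");    # U+00E6, LATIN SMALL LETTER AE
(my $text = $once) =~ s/&(\w+);/exists $named{$1} ? $named{$1} : "&$1;"/ge;
```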

I welcome feedback about my code, including constructive criticism. Bug reports should be made using <http://rt.cpan.org/> or by email, and should include the smallest possible chunk of code, along with any necessary data, which demonstrates the bug. Ideally, this will be in the form of a file which I can drop into the module's test suite, and which will work in perl 5.004.

If you are feeling particularly generous you can encourage me in my open source endeavours by buying me something from my wishlist: <http://www.cantrell.org.uk/david/wishlist/>

Encode::DoubleEncodedUTF8, which does the same job for broken UTF-8.

Test::DoubleEncodedEntities, which is HTMLish.

David Cantrell <david@cantrell.org.uk>

Copyright 2007 David Cantrell

This module is free-as-in-speech software, and may be used, distributed, and modified under the same terms as Perl itself.

This module is also free-as-in-mason software.
2015-03-22 perl v5.32.1
