fix_latin options <input_file >output_file

Options:

  --use-xs <value>   'auto' | 'always' | 'never'
  --version          list version number
  --help             detailed help message
Multi-byte UTF8 characters will be passed through unchanged (although over-long UTF8 byte sequences will be converted to the shortest normal form). Single byte characters will be converted as follows:
  0x00 - 0x7F   ASCII - passed through unchanged
  0x80 - 0x9F   Converted to UTF8 using CP1252 mappings
  0xA0 - 0xFF   Converted to UTF8 using Latin-1 mappings
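The byte-range rules above can be illustrated with a short Python sketch. It is not the script's own code: the function name `fix_latin_sketch` is hypothetical, normalisation of over-long UTF8 sequences is left out (such bytes are treated here as single bytes), and substituting U+FFFD for the few bytes that CP1252 leaves undefined is an assumption of this sketch.

```python
def fix_latin_sketch(data: bytes) -> str:
    """Decode mixed ASCII / UTF-8 / CP1252 / Latin-1 bytes to text.

    Hypothetical illustration of the byte-range rules only.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # 0x00 - 0x7F: ASCII, passed through unchanged.
            out.append(chr(b))
            i += 1
            continue

        # Lead byte of a possible multi-byte UTF-8 sequence?
        if 0xC2 <= b <= 0xDF:
            length = 2
        elif 0xE0 <= b <= 0xEF:
            length = 3
        elif 0xF0 <= b <= 0xF4:
            length = 4
        else:
            length = 0

        if length:
            chunk = data[i:i + length]
            try:
                # Well-formed UTF-8 is passed through as-is.
                out.append(chunk.decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass  # Not valid UTF-8: treat b as a single byte below.

        if b < 0xA0:
            # 0x80 - 0x9F: CP1252 mappings.
            try:
                out.append(bytes([b]).decode("cp1252"))
            except UnicodeDecodeError:
                # Assumption: undefined CP1252 bytes become U+FFFD.
                out.append("\ufffd")
        else:
            # 0xA0 - 0xFF: Latin-1 maps each byte to the same code point.
            out.append(chr(b))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    mixed = b"caf\xe9 and caf\xc3\xa9 cost 5\x80"
    print(fix_latin_sketch(mixed).encode("utf-8"))
```

In this sketch the Latin-1 byte 0xE9 and the UTF8 pair 0xC3 0xA9 both decode to the same character, and the CP1252 byte 0x80 decodes to the euro sign, so mixed input comes out as consistent text that can then be encoded as UTF8.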
If you have a SQL format dump file that you would normally restore by piping into 'psql', you can simply filter the dump file through this script:
fix_latin < dump_file | psql -d database
If you have a compressed dump file that you would normally restore using 'pg_restore', you can omit the '-d' option on pg_restore and pipe the resulting SQL through this script and into psql:
pg_restore -O dump_file | fix_latin | psql -d database
To take a look at non-ASCII lines in the dump file:
perl -ne '/^COPY (\S+)/ and $t = $1; print "$t:$_" if /[^\x00-\x7F]/' dump_file
In particular you should read the 'LIMITATIONS' section to understand the circumstances under which data corruption might occur.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.