NAME

Geo::BUFR - Perl extension for handling of WMO BUFR files.

SYNOPSIS

  # A simple program to print decoded contents of a BUFR file. Note
  # that a more sophisticated program (bufrread.pl) is included in the
  # package

  use Geo::BUFR;

  Geo::BUFR->set_tableformat('BUFRDC'); # ECCODES is also possible
  Geo::BUFR->set_tablepath('path to BUFR tables');

  my $bufr = Geo::BUFR->new();

  $bufr->fopen('name of BUFR file');

  while (not $bufr->eof()) {
      my ($data, $descriptors) = $bufr->next_observation();
      print $bufr->dumpsections($data, $descriptors) if $data;
  }

  $bufr->fclose();

DESCRIPTION

BUFR = Binary Universal Form for the Representation of meteorological data. BUFR is approved by WMO (World Meteorological Organization) as the standard universal exchange format for meteorological observations, gradually replacing a lot of older alphanumeric data formats.

This module provides methods for decoding and encoding BUFR messages, and for displaying information in BUFR B and D tables and in BUFR flag and code tables.

Installing this module also installs some programs: "bufrread.pl", "bufrresolve.pl", "bufrextract.pl", "bufrencode.pl", "bufr_reencode.pl" and "bufralter.pl". See <https://wiki.met.no/bufr.pm/start> for examples of use. For the majority of potential users of Geo::BUFR I would expect these programs to be all that you will need Geo::BUFR for.

Note that being Perl, this module cannot compete in speed with for example the (free) ECMWF BUFRDC Fortran library. Still, some effort has been invested in making the module reasonably fast, in that the core routines for encoding and decoding bitstreams are implemented in C.

METHODS

The "get_" methods will return undef if the requested information is not available. The "set_" methods as well as "fopen", "fclose", "copy_from" and "rewind" will always return 1, or croak if failing.

Create a new object:

  $bufr = Geo::BUFR->new();
  $bufr = Geo::BUFR->new($BUFRmessages);

The second form of "new" is useful if you want to provide the BUFR messages to decode directly as an input buffer (string). Note that merely calling "new($BUFRmessages)" will not decode anything in the BUFR messages; for that you need to call "next_observation()" from the newly created object. You also have the option of providing the BUFR messages in a file, using the no argument form of "new()" and then calling "fopen".
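As a minimal sketch of the second form of "new", a whole file can be read into a string and handed to the constructor. The helper name slurp_binary and the command line handling (file name and table path as arguments) are illustrative assumptions, not part of Geo::BUFR. The module is loaded with require only when arguments are given, so the helper can be tried on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): read a file as raw bytes.
sub slurp_binary {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    binmode($fh);                 # BUFR is binary, so avoid any translation
    local $/;                     # slurp mode: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

if (@ARGV) {
    my ($file, $tablepath) = @ARGV;
    require Geo::BUFR;
    Geo::BUFR->set_tableformat('BUFRDC');
    Geo::BUFR->set_tablepath($tablepath);

    # Nothing is decoded by new() itself, only by next_observation()
    my $bufr = Geo::BUFR->new(slurp_binary($file));
    while (not $bufr->eof()) {
        my ($data, $descriptors) = $bufr->next_observation();
        print $bufr->dumpsections($data, $descriptors) if $data;
    }
}
```

The same loop as in "SYNOPSIS" is used; only the source of the messages differs.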

Associate the object with a file for reading of BUFR messages:

  $bufr->fopen($filename);

Close the associated file that was opened by fopen:

  $bufr->fclose();

Check for end-of-file (or end of the input buffer provided as argument to "new"):

  $bufr->eof();

Returns true if end-of-file (or end of input buffer) is reached, false if not.

Ensure that next call to "next_observation" will decode first subset in first BUFR message:

  $bufr->rewind();

Copy from an existing object:

  $bufr1->copy_from($bufr2,$what);

If $what is 'all' or not provided, will copy everything in $bufr2 into $bufr1, i.e. making a clone. If $what is 'metadata', only the metadata in section 0, 1 and 3 will be copied (and all of section 2 if present).

Load B and D tables:

  $bufr->load_BDtables($table);

$table is optional, and should for BUFRDC be (base)name of a file containing a BUFR table B or D, using the ECMWF BUFRDC naming convention, i.e. [BD]'table_version'.TXT. For ECCODES, use last part of path, e.g. on UNIX-like systems '0/wmo/18' for master tables and '0/local/8/78/236' for local tables, or both if that is needed, e.g. '0/wmo/18,0/local/8/78/236'. If no argument is provided, "load_BDtables()" will use BUFR section 1 information in the $bufr object to decide which tables to load (which for ECCODES might be up to 4 table files, both local and master tables). Previously loaded tables are kept in memory, and "load_BDtables" will return immediately if the tables already have been loaded. Will die (croak) if tables cannot be found, but (in the no argument version) not if these are local tables (Local table version number > 0) and the corresponding master tables exist (Local table version number = 0), which then will be loaded instead. Returns table version for the tables loaded (see "get_table_version").

Load C table:

  $bufr->load_Ctable($table,$default_table);

Both $table and $default_table are optional. This will load the flag and code tables (if not already loaded), which in ECMWF BUFRDC are put in tables C'table_version'.TXT (not to be confused with WMO BUFR table C, which contains the operator descriptors). $default_table will be used if $table is not found. For $table and $default_table in ECCODES, use (just like for "load_BDtables") last part of path, e.g. on UNIX-like systems '0/wmo/18' for master tables and '0/local/8/78/236' for local tables, or both if that is needed, e.g. '0/wmo/18,0/local/8/78/236'. Will for ECCODES then load all tables in the codetables subdirectory. If no arguments are provided, "load_Ctable()" will use BUFR section 1 information in the $bufr object to decide which table(s) to load. Will die (croak) if table cannot be found, but not if this is a local table and the corresponding master table exists, which then will be loaded instead. Returns table version for the table loaded.

Get next observation (next subset in current BUFR message or first subset in next message):

  ($data, $descriptors) = $bufr->next_observation();

where $descriptors is a reference to the array of fully expanded descriptors for this subset, $data is a reference to the corresponding values. This method is meant to be used to iterate through all BUFR messages in the file or input buffer (see "new") associated with the $bufr object, see example program in "SYNOPSIS". Whenever a new BUFR message is reached, section 0-3 will also be decoded, the contents of which is then available through the access methods listed below. This is the main BUFR decoding routine in Geo::BUFR, and will call "load_BDtables()" internally (unless decoding of section 4 has been turned off by use of "set_nodata" or "set_filter_cb"), but not "load_Ctable". Consult "DECODING/ENCODING" if you want more precise info about what is returned in $data and $descriptors.

"next_observation" will return the empty list (so both $data and $descriptors will be undef) in the following cases: if there are no more BUFR messages in file/input buffer (so next call to "eof()" will return true), if no decoding of section 4 was requested in "set_nodata", if filtering was turned on in "set_filter_cb" and the BUFR message met the filter criteria in the user defined callback function, or if the BUFR message contained 0 subsets. If you need to distinguish the first case from the rest, one way would be to check "get_current_subset_number()" which will return 0 only in this first case.

If an error is met during decoding, it is possible to trap the error in an eval and then continue calling "next_observation" (as demonstrated in source code of "bufrread.pl"). Care has been taken that BUFR messages with incorrectly stated BUFR length should not cause later proper BUFR messages to be skipped. But the possibility of an erroneous last BUFR message in file led to abandonment of the convenient feature, retained until Geo::BUFR version 1.25, of "eof" always returning true if there were no more BUFR messages in file/input buffer. Instead you should expect last call to "next_observation" to return false (empty list).
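The error trapping described above might be sketched as follows. The helper name decode_all and its callback arguments are hypothetical; the helper only relies on the "eof" and "next_observation" methods, and is not how "bufrread.pl" itself is written.

```perl
use strict;
use warnings;

# Hypothetical helper: iterate over all subsets, trapping decoding errors.
# $on_subset is called with ($data, $descriptors) for each decoded subset,
# $on_error (optional) with the error message for each failed call.
# Returns the number of subsets passed to $on_subset.
sub decode_all {
    my ($bufr, $on_subset, $on_error) = @_;
    my $count = 0;
    while (not $bufr->eof()) {
        my @obs;
        my $ok = eval { @obs = $bufr->next_observation(); 1 };
        if (!$ok) {
            my $err = $@;
            $on_error->($err) if $on_error;   # report, then go on
            next;
        }
        my ($data, $descriptors) = @obs;
        next if !$data;      # empty list: filtered, no data or no more messages
        $on_subset->($data, $descriptors);
        $count++;
    }
    return $count;
}

# Usage (not run here), with $bufr opened as in SYNOPSIS:
#   decode_all($bufr,
#              sub { print $bufr->dumpsections(@_) },
#              sub { warn "Decoding failed: $_[0]" });
```

Note that the loop ends on "eof", while an empty return from "next_observation" is simply skipped, in line with the advice above.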

Filter BUFR messages:

  $bufr->set_filter_cb(\&callback,@args);

Here user is responsible for defining the callback subroutine. This subroutine will then be called in "next_observation" (with arguments @args if provided) right after section 3 is decoded, and, if returning true, will cause "next_observation" to return immediately, without even trying to decode section 4 (the data section). Here is a simple example of such a callback (without arguments), filtering on AHL and Data category (table A) of the BUFR message.

  sub callback {
      my $obj = shift;
      return 1 if $obj->get_data_category != 0;
      my $ahl = $obj->get_current_ahl() || '';
      return ($ahl =~ /^IS.... (ENMI|TEST)/);
  }
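A callback taking arguments could look like the following sketch. It assumes that the callback receives the Geo::BUFR object first, followed by the @args given to "set_filter_cb"; the order is an assumption based on the example above, and the name skip_unless_category is made up.

```perl
use strict;
use warnings;

# Return true (meaning: skip this message) unless its data category is
# one of the categories given as extra arguments.
# Assumption: called as callback($bufr_object, @args).
sub skip_unless_category {
    my ($obj, @wanted) = @_;
    my $category = $obj->get_data_category();
    return 1 if !defined $category;          # nothing to match against
    return (grep { $_ == $category } @wanted) ? 0 : 1;
}

# Usage (not run here): keep only data category 0 and 2
#   $bufr->set_filter_cb(\&skip_unless_category, 0, 2);
```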

Check result of filtering:

  $bufr->is_filtered();

Will return true (1) if "next_observation" returned immediately as described for "set_filter_cb" above. But calling "is_filtered" should rarely be needed, as in most cases the simple check 'next if !$data' after calling "next_observation" would be the natural way to proceed.

Print the contents of a subset in BUFR message:

  print $bufr->dumpsections($data,$descriptors,$options);

$options is optional. If this is first subset in message, will start by printing message number and, if this is first message in a GTS bulletin, AHL (Abbreviated Header Line), as well as contents of sections 0, 1 and 3. For section 4, will also print subset number. $options should be an anonymous hash with possible keys 'width' and 'bitmap', e.g. { width => 20, bitmap => 0 }. 'bitmap' controls which of "dumpsection4" and "dumpsection4_with_bitmaps" will be called internally by "dumpsections". Default value for 'bitmap' is 1, causing "dumpsection4_with_bitmaps" to be called. 'width' controls the value of $width used by the "dumpsection4..." methods, default is 15. If you intend to provide the output from "dumpsections" as input to "reencode_message", be sure to set 'bitmap' to 0, and 'width' not smaller than the largest data width in bytes among the descriptors with unit CCITTIA5 occurring in the message.

Normally "dumpsections" is called after "next_observation", with same arguments $data,$descriptors as returned from this call. From the examples given at <https://wiki.met.no/bufr.pm/start#bufrreadpl> you can get an impression of what the output might look like. If "dumpsections" does not give you exactly what you want, you might prefer to instead call the individual dumpsection methods below.

Print the contents of sections 0-3 in BUFR message:

  print $bufr->dumpsection0();
  print $bufr->dumpsection1();
  print $bufr->dumpsection2($sec2_code_ref);
  print $bufr->dumpsection3();

"dumpsection2" returns an empty string if there is no optional section in the message. The argument should be a reference to a subroutine which takes the optional section as (a string) argument and returns the text you want displayed after the 'Length of section:' line. For general BUFR messages probably the best you can do is displaying a hex dump, in which case

  sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}

might be a suitable choice for $sec2_code_ref. For most applications there should be no real need to call "dumpsection2".

Print the data of a subset (descriptor, value, name and unit):

  print $bufr->dumpsection4($data,$descriptors,$width);
  print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);

$width fixes the number of characters used for displaying the data values, and is optional (defaults to 15). $data and $descriptors are references to arrays of data values and BUFR descriptors respectively, likely to have been fetched from "next_observation". Code and flag values will be resolved if a C table has been loaded, i.e. if "load_Ctable" has been called earlier on. "dumpsection4_with_bitmaps" will display the bit-mapped values side by side with the corresponding data values. If there is no bit-map in the BUFR message, "dumpsection4_with_bitmaps" will provide same output as "dumpsection4". See "DECODING/ENCODING" for some more information about what is printed, and <https://wiki.met.no/bufr.pm/start#bufrreadpl> for real life examples of output.

Set verbose level:

  Geo::BUFR->set_verbose($level); # 0 <= $level <= 6
  $bufr->set_verbose($level);

Some info about what is going on in Geo::BUFR will be printed to STDOUT if $level > 0. With $level set to 1, all that is printed is the B, C and D tables used (with full path). Each line of verbose output starts with 'BUFR.pm: ', except for the level 6 specific output. Setting verbose level > 1 might be helpful when debugging, or for example if you want to extract as much information as possible from an incorrectly formatted BUFR message.

No decoding of section 4 (data section):

  Geo::BUFR->set_nodata($n);
 - $n=1 (or not provided): Skip decoding of section 4 (might speed up
   processing considerably if only metadata in section 1-3 is sought for)
 - $n=0: Decode section 4 (default in Geo::BUFR)
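A metadata-only scan of a file might then be sketched as below. The helper name summarize_messages is hypothetical, and the sketch assumes that with "set_nodata(1)" each call to "next_observation" advances to the next message. The check on "get_current_subset_number" follows the description of "next_observation" above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): one summary line per
# BUFR message, built from metadata only. Meant to be used after
# Geo::BUFR->set_nodata(1).
sub summarize_messages {
    my ($bufr) = @_;
    my @lines;
    while (not $bufr->eof()) {
        $bufr->next_observation();
        # 0 here means that no further BUFR message was found
        last if !$bufr->get_current_subset_number();
        my $category = $bufr->get_data_category();
        push @lines, sprintf("message %d: data category %s, %d subset(s)",
            $bufr->get_current_message_number(),
            defined $category ? $category : 'unknown',
            $bufr->get_number_of_subsets());
    }
    return @lines;
}

if (@ARGV) {
    require Geo::BUFR;
    Geo::BUFR->set_nodata(1);          # skip section 4 entirely
    my $bufr = Geo::BUFR->new();
    $bufr->fopen($ARGV[0]);
    print "$_\n" for summarize_messages($bufr);
    $bufr->fclose();
}
```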

No decoding of quality information:

  Geo::BUFR->set_noqc($n);
 - $n=1 (or not provided): Don't decode quality information (more
   specifically: skip all descriptors after 222000)
 - $n=0: Decode quality information (default in Geo::BUFR)

Enable/disable strict checking of BUFR format for recoverable errors (like using BUFR compression for one subset message etc):

  Geo::BUFR->set_strict_checking($n);
 - $n=0: disable checking (default in Geo::BUFR)
 - $n=1: warn (carp) if error but continue decoding
 - $n=2: die (croak) if error

Confer "STRICT CHECKING" for details of what is being checked if strict checking is enabled.

Show all BUFR table C operators (data description operators, F=2) as well as all replication descriptors (F=1) when calling dumpsection4:

  Geo::BUFR->set_show_all_operators($n);
 - $n=1 (or not provided): Show replication descriptors and all operators
 - $n=0: Show no replication descriptors and only the really informative
         data description operators (default in Geo::BUFR)

set_show_all_operators(1) cannot be combined with "dumpsections" with bitmap option set (which is the default).

Set or get tableformat:

  Geo::BUFR->set_tableformat($tableformat);
  $tableformat = Geo::BUFR->get_tableformat();

Set or get tablepath:

  Geo::BUFR->set_tablepath($tablepath);
  $tablepath = Geo::BUFR->get_tablepath();

Get table version:

  $table_version = $bufr->get_table_version($table);

$table is optional. Return table version from $table if provided, or else from section 1 information in the currently processed BUFR message. For BUFRDC, this is a stripped down version of table name. If for example $table = 'B0000000000088013001.TXT', will return '0000000000088013001'. For ECCODES, this is the last part of the path to the table location (e.g. '0/wmo/29'), and a stringified list of two such paths (master and local) if local tables are used (e.g. '0/wmo/29,0/local/8/78/236'). Returns undef if impossible to determine table version.

Get number of subsets:

  $nsubsets = $bufr->get_number_of_subsets();

Get current subset number:

  $subset_no = $bufr->get_current_subset_number();

If decoding of section 4 has been skipped (due to use of "set_nodata" or "set_filter_cb"), will return number of subsets. For a BUFR message with 0 subsets, will actually return 1 (a bit weird perhaps, but then this is a really weird kind of BUFR message to handle).

Get current message number:

  $message_no = $bufr->get_current_message_number();

Get current BUFR message:

  $binary_msg = $bufr->get_bufr_message();

This returns the original raw (binary, not the decoded) BUFR message. An empty string will be returned if no BUFR message is found, or if the currently processed BUFR message is erroneous (even if section 4 is not decoded, there will at least be a check for finding '7777' at expected end of BUFR message, as calculated from length of BUFR message decoded from section 0).

Get Abbreviated Header Line (AHL) before current message:

  $ahl = $bufr->get_current_ahl();

If there is no AHL immediately preceding current message, default is for "get_current_ahl" to return undef. Sometimes that might not be what you want, e.g. when processing a file with GTS bulletins with possibly more than one BUFR message in each bulletin, and especially so if filtering on AHL using "set_filter_cb".

  Geo::BUFR->reuse_current_ahl($n);
 - $n=1 (or not provided): Will cause "get_current_ahl" to return last
   AHL extracted and not undef if currently processed BUFR message has
   no (immediately preceding) AHL
 - $n=0: Reset "get_current_ahl" to default behaviour as described
   above

Check if AHL has been reused:

  $bufr->ahl_is_reused();

Will return true (1) if the AHL returned by "get_current_ahl" is a reused one, i.e. the AHL is not immediately preceding the current BUFR message.

Check length of BUFR message (as stated in section 0):

  $bufr->bad_bufrlength();

Will return true (1) if no '7777' is found at the end of BUFR message (as calculated from the stated length of BUFR message in section 0), which usually means that the BUFR message is badly corrupted (e.g. truncated). But note that there should be no need to call "bad_bufrlength" if section 4 is decoded, as in this case you should expect "next_observation" to die with a more precise error message describing the kind of corruption found. If no decoding of section 4 is done (because "set_nodata" or "set_filter_cb" were called), however, "next_observation" is likely not to throw an error, and you can use "bad_bufrlength" to decide what to do next (see source code of "bufrextract.pl" for example of use).
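A rough sketch of such a use, collecting the raw bytes of all messages that pass the length check, is given below. The helper name extract_good_messages is hypothetical and the code is not taken from "bufrextract.pl"; it is meant to be used after Geo::BUFR->set_nodata(1), and assumes that each call to "next_observation" then advances one message.

```perl
use strict;
use warnings;

# Hypothetical helper: return the concatenated raw BUFR messages with a
# plausible stated length, and the number of messages skipped.
sub extract_good_messages {
    my ($bufr) = @_;
    my ($raw, $skipped) = ('', 0);
    while (not $bufr->eof()) {
        # An error is unlikely when section 4 is not decoded, but trap it
        my $ok = eval { $bufr->next_observation(); 1 };
        if (!$ok) { $skipped++; next; }
        last if !$bufr->get_current_subset_number();   # no more messages
        if ($bufr->bad_bufrlength()) { $skipped++; next; }
        $raw .= $bufr->get_bufr_message();
    }
    return ($raw, $skipped);
}

# Usage (not run here), with $bufr opened as in SYNOPSIS:
#   Geo::BUFR->set_nodata(1);
#   my ($raw, $skipped) = extract_good_messages($bufr);
```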

Accessor methods for section 0-3:

  $bufr->set_<variable>($variable);
  $variable = $bufr->get_<variable>();

where <variable> is one of

  bufr_length (get only)
  bufr_edition
  master_table
  subcentre
  centre
  update_sequence_number
  optional_section (0 or 1)
  data_category
  int_data_subcategory
  loc_data_subcategory
  data_subcategory
  master_table_version
  local_table_version
  year_of_century
  year
  month
  day
  hour
  minute
  second
  local_use
  number_of_subsets
  observed_data (0 or 1)
  compressed_data (0 or 1)
  descriptors_unexpanded

set_year_of_century(0) will set year of century to 100. "get_year_of_century" will for BUFR edition 4 calculate year of century from year in section 1.
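The convention that year of century 0 is stored as 100 can be illustrated with a small calculation. This is a sketch of the arithmetic implied by the text above, not code taken from Geo::BUFR, and the function name is made up.

```perl
use strict;
use warnings;

# Year of century from a four digit year: years divisible by 100
# give 100 rather than 0.
sub year_of_century {
    my ($year) = @_;
    return ($year % 100) || 100;
}

printf "%d -> %d\n", $_, year_of_century($_) for 1999, 2000, 2024;
```

This prints 99, 100 and 24 for the three years.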

Encode a new BUFR message:

  $new_message = $bufr->encode_message($data_refs,$desc_refs);

where $desc_refs->[$i] is a reference to the array of fully expanded descriptors for subset number $i ($i=1 for first subset), $data_refs->[$i] is a reference to the corresponding values, using undef for missing values. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method. See "DECODING/ENCODING" for meaning of 'fully expanded descriptors'.
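Since subset numbering starts at 1, index 0 of both arrays is left unused. The following sketch builds the two references from a more readable list of (descriptor, value) pairs. The helper name build_refs and the data values are made up for illustration; the descriptors must of course match those in section 3 of the message to be encoded.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of subsets, each given as a list of
# [descriptor, value] pairs, into the two array references expected by
# encode_message. Index 0 is left undefined.
sub build_refs {
    my (@subsets) = @_;
    my (@data_refs, @desc_refs);
    my $i = 0;
    for my $subset (@subsets) {
        $i++;
        $desc_refs[$i] = [ map { $_->[0] } @$subset ];
        $data_refs[$i] = [ map { $_->[1] } @$subset ];
    }
    return (\@data_refs, \@desc_refs);
}

# Two subsets with WMO block and station number; undef means missing
my ($data_refs, $desc_refs) = build_refs(
    [ ['001001', 1], ['001002', 492]   ],
    [ ['001001', 1], ['001002', undef] ],
);

# With the metadata set in $bufr, one would then call
#   my $new_message = $bufr->encode_message($data_refs, $desc_refs);
```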

Encode a (single subset) NIL message:

  $new_message = $bufr->encode_nil_message($stationid_ref,$delayed_repl_ref);

$delayed_repl_ref is optional. In section 4 all values will be set to missing except delayed replication factors and the (descriptor, value) pairs in the hashref $stationid_ref. $delayed_repl_ref (if provided) should be a reference to an array of data values for all descriptors 031001 and 031002 occurring in the message (these values must all be nonzero), e.g. [3,1,2] if there are 3 such descriptors which should have values 3, 1 and 2, in that succession. If $delayed_repl_ref is omitted, all delayed replication factors will be set to 1. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method (although number of subsets and BUFR compression will automatically be set to 1 and 0 respectively, whatever value they had before).

Reencode BUFR message(s):

  $new_messages = $bufr->reencode_message($decoded_messages,$width);

$width is optional. Takes a text $decoded_messages as argument and returns a (binary) string of BUFR messages which, when printed to file and then processed by "bufrread.pl" with no output modifying options set (except possibly "--width"), would give output equal to $decoded_messages. If "bufrread.pl" is to be called with "--width $width", this $width must be provided to "reencode_message" also.

Join subsets from several messages:

 ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr_1,$subset_ref_1,
     ... $bufr_n,$subset_ref_n);

where each $subset_ref_i is optional. Will return the data and descriptors needed by "encode_message" to encode a multi subset message, extracting the subsets from the first message of each $bufr_i object. All subsets in (first message of) $bufr_i will be used, unless next argument is an array reference $subset_ref_i, in which case only the subset numbers listed will be included, in the order specified. On return $nsub will contain the total number of subsets thus extracted. After a call to "join_subsets", the metadata (of the first message) in each object will be available through the "get_"-methods, while a call to "next_observation" will start extracting the first subset in the first message. Here is an example of use, fetching first subset from bufr object 1, all subsets from bufr object 2, and subsets 4 and 2 from bufr object 3, then building up a new multi subset BUFR message (which will succeed only if the bufr objects all have the same descriptors in section 3):

  my ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr1,
      [1],$bufr2,$bufr3,[4,2]);
  my $new_bufr = Geo::BUFR->new();
  # Get metadata from one of the objects, then reset those metadata
  # which might not be correct for the new message
  $new_bufr->copy_from($bufr1,'metadata');
  $new_bufr->set_number_of_subsets($nsub);
  $new_bufr->set_update_sequence_number(0);
  $new_bufr->set_compressed_data(0);
  my $new_message = $new_bufr->encode_message($data_refs,$desc_refs);

Extract BUFR table B information for an element descriptor:

  ($name,$unit,$scale,$refval,$width) = $bufr->element_descriptor($desc);

Will fetch name, unit, scale, reference value and data width in bits for element descriptor $desc in the last table B loaded in the $bufr object. Returns false if the descriptor is not found.

Extract BUFR table D information for a sequence descriptor:

  @descriptors = $bufr->sequence_descriptor($desc);
  $string = $bufr->sequence_descriptor($desc);

Will return the descriptors in a direct (nonrecursive) lookup for the sequence descriptor $desc in the last table D loaded in the $bufr object. In scalar context the descriptors will be returned as a space separated string. Returns false if the descriptor is not found.

Resolve BUFR table descriptors (for printing):

  print $bufr->resolve_descriptor($how,@descriptors);

where $how is one of 'fully', 'partially', 'simply' and 'noexpand'. Returns a text string suitable for printing information about the BUFR table descriptors given. The relevant B/D table must have been loaded before calling "resolve_descriptor".

 - $how = 'fully': Expand all D descriptors fully into B descriptors,
   with name, unit, scale, reference value and width (each on a
   numbered line, except for replication operators which are not
   numbered)
 - $how = 'partially': Like 'fully', but expand D descriptors only once
   and ignore replication
 - $how = 'noexpand': Like 'partially', but do not expand D descriptors
   at all
 - $how = 'simply': Like 'partially', but list the descriptors on one
   single line with no extra information provided

Resolve flag table value (for printing):

  print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
                                 $default_B_table,$num_leading_spaces);

Last 2 arguments are optional. $default_B_table will be used if $B_table is not found, $num_leading_spaces defaults to 0. Examples:

  print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT') # BUFRDC
  print $bufr->resolve_flagvalue(4,8006,'0/wmo/13')       # ECCODES, master table
  print $bufr->resolve_flagvalue(4,8193,'0/local/1/98/0') # ECCODES, local table

Print the contents of BUFR code (or flag) table:

  print $bufr->dump_codetable($code_table,$table,$default_table);

where in BUFRDC $table is (base)name of the C...TXT file containing the code tables, optionally followed by a default table which will be used if $table is not found.

"resolve_flagvalue" and "dump_codetable" will return empty string if flag value or code table is not found.

Manipulate binary data (these are implemented in C for speed and primarily intended as module internal subroutines):

  $value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits);

Extracts $num_bits bits from $bitstream, starting at bit $bitpos. The extracted bits are interpreted as a nonnegative integer. Returns undef if all bits extracted are 1 bits.
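To illustrate the behaviour just described, here is a pure Perl sketch of the same extraction. It is not the C implementation used by the module, and it assumes that bit 0 is the most significant bit of the first byte and that $num_bits is between 1 and 32; the name bits_to_dec is made up.

```perl
use strict;
use warnings;

# Extract $num_bits bits starting at bit $bitpos and interpret them as a
# nonnegative integer. Returns undef if all extracted bits are 1 bits,
# which is how a missing value is represented.
sub bits_to_dec {
    my ($bitstream, $bitpos, $num_bits) = @_;
    my $bits = unpack('B*', $bitstream);      # e.g. "\x12" -> '00010010'
    die "Not enough bits in bitstream\n"
        if $bitpos + $num_bits > length($bits);
    my $field = substr($bits, $bitpos, $num_bits);
    return undef if $field !~ /0/;            # all 1 bits: missing value
    return oct("0b$field");
}

# Bits 4-11 of 0001 0010 0011 0100 are 0010 0011
print bits_to_dec("\x12\x34", 4, 8), "\n";    # prints 35
```

With this convention the biggest value that fits in 9 bits is 510, since 511 (all 1 bits) means missing.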

  $ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);

Extracts $num_bytes bytes from bitstream, starting at $bitpos, and interprets the extracted bytes as an ascii string. Returns undef if the extracted bytes are all 1 bits.

  Geo::BUFR->dec2bitstream($value,$bitstream,$bitpos,$bitlen);

Encodes nonnegative integer value $value in $bitlen bits in $bitstream, starting at bit $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $value. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->ascii2bitstream($ascii,$bitstream,$bitpos,$width);

Encodes ASCII string $ascii in $width bytes in $bitstream, starting at $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $ascii. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->null2bitstream($bitstream,$bitpos,$num_bits);

Sets $num_bits bits in bitstream starting at bit $bitpos to 0 bits. Last byte affected will be padded with 1 bits. $bitstream must be at least $bitpos + $num_bits bits long. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

DECODING/ENCODING

The term 'fully expanded descriptors' used in the description of "encode_message" (and "next_observation") in "METHODS" might need some clarification. The short version is that the list of descriptors should be exactly those which will be written out by running "dumpsection4" (or "bufrread.pl" without any modifying options set) on the encoded message. If you don't have a similar BUFR message at hand to use as an example when wanting to encode a new message, you might need a more specific prescription. Which is that for every data value which occurs in the section 4 bitstream, you should include the corresponding BUFR descriptor, using the artificial 999999 for associated fields following the 204Y operator, and including the data operator descriptors 22[2345]000 and 23[2567]000 with data value set to the empty string, if these occur among the descriptors in section 3 (rather: in the expansion of these, use "bufrresolve.pl" to check!). Element descriptors defining new reference values (following the 203Y operator) will have F=0 (first digit in descriptor) replaced with F=9 in "next_observation", while in "encode_message" both F=0 and F=9 will be accepted for new reference values. When encoding delayed repetition you should repeat the set of data (and descriptors) to be repeated the number of times indicated by 031011 or 031012 (if given the feedback that this is considered cumbersome, an option for including the set of data/descriptors just once might be added later, both for encoding and decoding).

Some words about the procedure used for decoding and encoding data in section 4 might shed some light on this choice of design.

When decoding section 4 for a subset, first of all the BUFR descriptors provided in section 3 are expanded as far as possible without looking at the actual bitstream, i.e. by eliminating nondelayed replication descriptors (F=1) and by using BUFR table D to expand sequence descriptors (F=3). Then, for each of the thus expanded descriptors, the data value is fetched from the bitstream according to the prescriptions in BUFR table B, applying the data operator descriptors (F=2) from BUFR table C as they are encountered, and reexpanding the remaining descriptors every time a delayed replication factor is fetched from bitstream. The resulting set of data values is returned in an array @data, with the corresponding B (and sometimes also some C) BUFR table descriptors in an array @descriptors. "next_observation" returns references to these two arrays. For convenience, some of the data operator descriptors without a corresponding data value (like 222000) are included in the @descriptors because they are considered to provide valuable information to the user, with corresponding value in @data set to the empty string. These descriptors without a value are written by the dumpsection4 methods on unnumbered lines, thereby distinguishing them from descriptors corresponding to 'real' data values in section 4, which are numbered consecutively.
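The first step, expanding the section 3 descriptors without looking at the bitstream, can be sketched in pure Perl as below. This is a simplified illustration and not the module's own code: the table D excerpt is a toy one, operators (F=2) are just passed through, and the sketch simply stops at the first delayed replication, leaving the rest unexpanded.

```perl
use strict;
use warnings;

# Toy excerpt of table D, for illustration only:
# 301001 = WMO block number (001001) + WMO station number (001002)
my %TABLE_D = ('301001' => ['001001', '001002']);

sub expand_descriptors {
    my ($table_d, @todo) = @_;
    my @expanded;
    while (@todo) {
        my $desc = shift @todo;
        my ($f, $x, $y) = $desc =~ /^(\d)(\d\d)(\d\d\d)$/
            or die "Malformed descriptor '$desc'\n";
        if ($f == 3) {
            # Sequence descriptor: replace by its table D members
            my $seq = $table_d->{$desc}
                or die "Sequence descriptor $desc not in table D\n";
            unshift @todo, @$seq;
        }
        elsif ($f == 1 && $y > 0) {
            # Nondelayed replication: repeat next $x descriptors $y times
            die "Too few descriptors after $desc\n" if @todo < $x;
            my @group = splice(@todo, 0, $x);
            unshift @todo, (@group) x $y;
        }
        elsif ($f == 1) {
            # Delayed replication: the count is in the bitstream
            push @expanded, $desc, @todo;
            last;
        }
        else {
            push @expanded, $desc;
        }
    }
    return @expanded;
}

print join(' ', expand_descriptors(\%TABLE_D,
    '301001', '102002', '012101', '010004')), "\n";
# prints: 001001 001002 012101 010004 012101 010004
```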

Encoding a subset is done in a very similar way, by expanding the descriptors in section 3 as described above, but instead fetching the data values from the @data array that the user supplies (actually @{$data_refs->[$i]} where $i is subset number), and then finally encoding this value to bitstream.

The input parameter $desc_refs to "encode_message" is in fact not strictly necessary to be able to encode a new BUFR message. But there is a good reason for requiring it. During encoding the descriptors from expanding section 3 will consecutively be compared with the descriptors in the user supplied $desc_refs, and if these at some point differ, encoding will be aborted with an error message stating the first descriptor which deviated from the expected one. Requiring $desc_refs as input thus greatly reduces the risk of encoding an erroneous section 4, and also provides the user with highly valuable debugging information if encoding fails.
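The kind of comparison described here can be tried out beforehand with a few lines of Perl. The helper below is a hypothetical sketch, not the check performed inside "encode_message"; in particular it does not treat F=9 as equivalent to F=0 for new reference values.

```perl
use strict;
use warnings;

# Return the (0-based) index of the first position where the two
# descriptor lists differ, or an empty list if they are identical.
sub first_mismatch {
    my ($expected, $supplied) = @_;
    my $n = @$expected > @$supplied ? @$expected : @$supplied;
    for my $i (0 .. $n - 1) {
        my ($e, $s) = ($expected->[$i], $supplied->[$i]);
        return $i if !defined $e || !defined $s || $e ne $s;
    }
    return;
}

my $pos = first_mismatch(['001001', '001002', '012101'],
                         ['001001', '001002', '012001']);
print "First deviation at position $pos\n" if defined $pos;
```

Since 0 is a valid position, the result should be tested with defined.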

When decoding character data (unit CCITTIA5), any null characters found are silently (unless $Strict_checking is set) removed, as well as leading and trailing white space.

BUFR TABLE FILES

The BUFR table files should follow the format and naming conventions used by one of these two ECMWF software packages: either BUFRDC (download from https://confluence.ecmwf.int/display/BUFR/Releases), or ecCodes (download from https://confluence.ecmwf.int/display/ECC/Releases).

The utility programs in Geo::BUFR will look for table files by default in the standard installation directories, which in Unix-like systems will be /usr/local/lib/bufrtables for BUFRDC and /usr/local/share/eccodes/definitions/bufr/tables for ecCodes. You can change that behaviour by either providing the environment variable BUFR_TABLES, or setting path explicitly by using the "--tablepath" option. Note that while BUFR_TABLES is a well known concept in BUFRDC software, the closest you get in ecCodes is probably ECCODES_DEFINITION_PATH (see e.g. https://confluence.ecmwf.int/display/ECC/BUFR%3A+Local+configuration), for which BUFR_TABLES should (or could) be set to ECCODES_DEFINITION_PATH/bufr/tables (again in Unix-like systems).

STRICT CHECKING

The package global $Strict_checking defaults to

  0: Ignore recoverable errors in BUFR format met during decoding or encoding

but can be changed to

  1: Issue warning (carp) but continue decoding/encoding

  2: Croak (die) instead of carp

by calling "set_strict_checking". The following is checked for when $Strict_checking is set to 1 or 2:

Total length of BUFR message as stated in section 0 bigger than actual length
Excessive bytes in section 4 (section longer than computed from section 3)
Compression set in section 3 for one subset message (BUFR reg. 94.6.3.2)
Bits 3-8 in octet 7 in section 3 not set to zero
Local reference value for compressed character data not having all bits set to zero (94.6.3.2.i)
Illegal flag values (rightmost bit set for non-missing values) (Note (9) to Table B in FM 94 BUFR)
Character data not being CCITTIA5 (Note (9) in FM 94 BUFR first page)
Null characters in CCITTIA5 data (Note (4) to Table B in FM 94 BUFR)
Missing CCITTIA5 value encoded as spaces
Invalid date and/or time in section 1
Cancellation operators (20[1-4]00, 203255 etc) when there is nothing to cancel
0 subsets in message. This may not break any formal rules, but is likely to cause problems in further data processing (and Geo::BUFR will not allow you to encode or reencode such a message anyway).
Leaving out descriptors to be repeated when corresponding delayed replication/repetition factor in section 4 is 0 and this is last data item. E.g. ending 'Data descriptors unexpanded' in section 3 with '106000 031001' when data value for 031001 is 0. This (mal)practice, however, defies the very point of replication operations (BUFR reg. 94.5.4). Presumably the purpose is to save some space in the BUFR message, but then why not leave out also '106000 031001' and the (0) data value for 031001?
Value encoded using BUFR compression which would be too big to encode without compression. For example, for a data descriptor with data width 9 bits a value of 510 ought to be the biggest value possible to encode, but in a multisubset message using BUFR compression it is possible to encode almost arbitrarily large values in single subsets as long as the average over all subsets is contained within 9 bits. This is not breaking any formal rules, but almost certainly not desirable.

Plus some few more checks not considered interesting enough to be mentioned here.

BUGS OR MISSING FEATURES

Some BUFR table C operators are not implemented or are untested, mainly because I do not have access to BUFR messages containing such operators. If you happen to come across a BUFR message which the current module fails to decode properly, I would therefore highly appreciate it if you could mail me this.

AUTHOR

Pål Sannes <pal.sannes@met.no>

CREDITS

I am very grateful to Alvin Brattli, who (while employed as a researcher at the Norwegian Meteorological Institute) wrote the first version of this module, with the sole purpose of being able to decode some very specific BUFR satellite data, but still provided the main framework upon which this module is built.

COPYRIGHT

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.