marc2ris - converts MARC bibliographic data to the RIS format
marc2ris [-e log-destination] [-h]
[-l log-level] [-L log-file] [-m]
[-o outfile] [-O outfile]
[-t input_type] [-u t|f] file
marc2ris attempts to extract the information useful to RefDB from
MARC datasets. MARC (Machine-Readable Cataloging) is a standard
that originated in the 1960s and is widely used by libraries and bibliographic
agencies. Most libraries that offer Z39.50 access can provide records in
at least one MARC format (as with most other "standards", there are
several to choose from). Currently the following MARC dialects are
supported:
- MARC21
- This is an attempt to consolidate existing MARC variants (mainly USMARC
and CANMARC) and will most likely be the format supported by all libraries
in the near future. The format is described on the [1]Library of
Congress MARC pages.
- UNIMARC
- This is the European counterpart, likewise an attempt at standardization.
The specification can be found [2]here.
- UKMARC
- This format is fairly close to the USMARC variant and is mainly used by
libraries in the United Kingdom and in Ireland. Libraries supporting this
format may switch to MARC21 in the future. Unfortunately there is no
online description of this format, but this [3]PDF document
describes the main differences between USMARC and UKMARC.
By default the script reads MARC21 data from stdin and sends RIS
data to stdout.
- -e log-destination
- log-destination can have the values 0, 1, or 2, or the equivalent strings
stderr, syslog, or file, respectively. This value
specifies where the log information goes to. 0 (zero) means the messages
are sent to stderr. They are immediately available on the screen but they
may interfere with command output. 1 will send the output to the syslog
facility. Keep in mind that syslog must be configured to accept log
messages from user programs, see the syslog(8) man page for further
information. Unix-like systems usually save these messages in
/var/log/user.log. 2 will send the messages to a custom log file
which can be specified with the -L option.
- -h
- Displays help and usage screen, then exits.
- -l log-level
- Specify the priority up to which events are logged. This is either a
number between 0 and 7 or one of the strings emerg, alert,
crit, err, warning, notice, info,
debug, respectively (see also Log level definitions). -1
disables logging completely. A low log level like 0 means that only the
most critical messages are logged. A higher log level means that less
critical events are logged as well. 7 will include debug messages. The
latter can be verbose and abundant, so you want to avoid this log level
unless you need to track down problems.
- -L log-file
- Specify the full path to a log file that will receive the log messages.
Typically this would be /var/log/refdba.
- -m
- Switch on additional MARC output. The output data will be the RIS output
interspersed with the source MARC data used to generate the output. This
is useful to fix conversion errors manually.
- -o file
- Send output to file. If file exists, its contents will be
overwritten.
- -O file
- Send output to file. If file exists, the output will be
appended.
- -t input_type
- Specify the MARC input type. The default is MARC21. Other available
types are UNIMARC and UKMARC.
- -u t|f
- Request Unicode output if set to "t" (this is the default).
marc2ris attempts to convert the input data into Unicode (unless the
dataset explicitly states that it already uses Unicode). If the conversion
does not seem to work, set this to "f" as some MARC variants do
not state the character encoding explicitly.
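The numeric and symbolic values accepted by -e and -l, as described above, can be summarized in a short Python sketch. It only illustrates the documented mapping and is not taken from marc2ris itself; the names LOG_LEVELS, LOG_DESTINATIONS, parse_log_level, and is_logged are hypothetical.

```python
# Illustrative sketch only; these names are hypothetical and not part of marc2ris.

# Symbolic log levels in the documented order, so the list index is the number.
LOG_LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Log destinations accepted by -e, in the documented order 0, 1, 2.
LOG_DESTINATIONS = ["stderr", "syslog", "file"]


def parse_log_level(value):
    """Return the numeric level for a number (-1..7) or a level name."""
    text = str(value).strip()
    if text in LOG_LEVELS:
        return LOG_LEVELS.index(text)
    number = int(text)  # raises ValueError for anything that is not a number
    if not -1 <= number <= 7:
        raise ValueError(f"log level out of range: {value!r}")
    return number


def is_logged(message_level, configured_level):
    """True if a message of message_level passes the configured threshold."""
    threshold = parse_log_level(configured_level)
    if threshold == -1:  # -1 disables logging completely
        return False
    return parse_log_level(message_level) <= threshold
```

With a configured level of 6 (info), a message at level err would pass the threshold, while a debug message would not.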
marc2ris evaluates the file marc2risrc to initialize
itself.
Table 1. marc2risrc
Variable | Default | Comment
outfile | (none) | The default output file name.
outappend | t | Determines whether output is appended (t) to an existing file or overwrites (f) an existing file.
unmapped | t | If set to t, unknown tags in the input data will be output following a <unmapped> tag; the resulting data can be inspected and then be sent through sed to strip off these additional lines. If set to f, unknown tags will be gracefully ignored.
logfile | /var/log/med2ris.log | The full path of a custom log file. This is used only if logdest is set appropriately.
logdest | 1 | The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel | 6 | The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
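To illustrate how the defaults in Table 1 combine with user settings, the following Python sketch overlays values read from a configuration file onto those defaults. It assumes one whitespace-separated "variable value" pair per line and "#" comment lines; this file syntax is an assumption that is not stated on this page, and load_settings is a hypothetical name.

```python
# Illustrative sketch only. The file syntax ("variable value" per line,
# "#" for comments) is an assumption, and load_settings is a hypothetical name.

# Defaults as listed in Table 1 (the empty string stands for "(none)").
DEFAULTS = {
    "outfile": "",
    "outappend": "t",
    "unmapped": "t",
    "logfile": "/var/log/med2ris.log",
    "logdest": "1",
    "loglevel": "6",
}


def load_settings(lines, defaults=DEFAULTS):
    """Overlay 'variable value' lines onto a copy of the defaults."""
    settings = dict(defaults)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        # Ignoring unknown or valueless variables is a choice of this sketch.
        if len(parts) == 2 and parts[0] in settings:
            settings[parts[0]] = parts[1].strip()
    return settings


example = load_settings(["# log to a custom file", "logdest 2", "loglevel 7"])
print(example["logdest"], example["loglevel"], example["outappend"])  # prints: 2 7 t
```

Variables that are not set in the file keep their default from the table, as outappend does in the example.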
The purpose of the MARC format is entirely different from the
purpose of the RIS format, so you shouldn't be too surprised that the import
of MARC data is somewhat rough around the edges. The filter appears to handle
quite a lot of datasets well, but the following shortcomings are known
(and more are likely to be discovered by the interested reader):
- Some fields, like 846, are currently ignored completely. This, of course,
is bound to change.
- Author names specified in the natural order, i.e. something like First
Middle Last, are not normalized due to the problems with multiple middle
or last names. Author names in the inverse order, i.e. something like
Last, First Middle, are normalized correctly in most cases. Handling of
non-European names is a matter of trial and error.
- Character set handling is somewhat limited. Only the unaltered input
character encoding or UTF-8 are available for the output data.
That said, there is still some hope. The -m command line
option switches on additional MARC output. That is, the generated output
will contain interspersed lines that show the contents of the original MARC
fields used to generate the following RIS line or lines. For example, the
following output snippet shows how marc2ris generated the author
lines from the MARC input:
<marc>empty author field (100)
<marc>:Author(Ind1): 1
<marc>:Author($a): Ershov, A. P.
<marc>:Author($b):
<marc>:Author($c):
<marc>:Author(Ind1): 1
<marc>:Author($a): Knuth, Donald Ervin,
<marc>:Author($b):
<marc>:Author($c):
AU - Ershov,A.P.
AU - Knuth,Donald Ervin
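As a rough illustration of the normalization visible in the two AU lines above, the following Python sketch converts an inverted name from subfield $a into the compact form shown. It is not the algorithm used by marc2ris, only a guess that reproduces these two examples; normalize_inverted_name is a hypothetical name. Names in natural order are returned unchanged, which mirrors the limitation described earlier.

```python
import re


def normalize_inverted_name(raw):
    """Turn 'Last, First Middle' into 'Last,First Middle' (hypothetical helper)."""
    name = raw.strip().rstrip(",").strip()  # drop a trailing MARC comma
    if "," not in name:
        return name  # natural order: left untouched in this sketch
    last, first = name.split(",", 1)
    first = first.strip()
    # Close the gap between initials: "A. P." becomes "A.P."
    first = re.sub(r"\.\s+", ".", first)
    return f"{last.strip()},{first}"


for field in ("Ershov, A. P.", "Knuth, Donald Ervin,"):
    print(normalize_inverted_name(field))
```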
If you feel marc2ris does not translate your data appropriately,
the easiest way might be to use the -m switch and redirect the output
into a file. Then you can analyze the situation and fix the RIS lines as you
see fit. Finally you can strip the MARC lines off with a command like:
~$ grep -v "<marc>" < withmarc.ris > womarc.ris
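Where grep is not at hand, the same filtering can be sketched in Python. The function name strip_marker_lines is hypothetical; like grep -v, it drops every line that contains the marker anywhere.

```python
def strip_marker_lines(lines, marker="<marc>"):
    """Return only the lines that do not contain marker (like grep -v)."""
    return [line for line in lines if marker not in line]


sample = [
    "<marc>:Author(Ind1): 1",
    "<marc>:Author($a): Knuth, Donald Ervin,",
    "AU - Knuth,Donald Ervin",
]
print("\n".join(strip_marker_lines(sample)))
```

Passing marker="<unmapped>" would filter the lines produced by the unmapped setting in the same way, assuming the tag appears on each such line.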
- /usr/local/etc/refdb/marc2risrc
- The global configuration file of marc2ris.
- $HOME/.marc2risrc
- The user configuration file of marc2ris.
RefDB (7), bib2ris (1), db2ris (1),
en2ris (1), med2ris (1).
RefDB manual (local copy)
<prefix>/share/doc/refdb-<version>/refdb-manual/index.html
RefDB manual (web)
<[4]http://refdb.sourceforge.net/manual/index.html>
RefDB on the web
<[5]http://refdb.sourceforge.net/>
marc2ris was written by Markus Hoenicka
<markus@mhoenicka.de>.
- 1. Library of Congress MARC pages: http://www.loc.gov/marc/
- 2. here: http://www.ifla.org/VI/3/p1996-1/sec-uni.htm
- 3. PDF document: www.bl.uk/services/bibliographic/marcchange.pdf
- 4. http://refdb.sourceforge.net/manual/index.html
- 5. http://refdb.sourceforge.net/