Manual Reference Pages - UTF (3)
runetochar, chartorune, runelen, fullrune, utflen, utfrune, utfrrune, utfutf - Unicode Text Format functionality
int runetochar(char *cp, Rune *rp);
int chartorune(Rune *rp, char *cp);
int runelen(long r);
int fullrune(char *cp, int n);
int utflen(char *s);
int utfbytes(char *s);
char *utfrune(char *cp, long r);
char *utfrrune(char *cp, long r);
char *utfutf(char *big, char *little);
int utf_snprintf(char *buf, size_t size, char *format, ...);
int utfcmp(char *s1, char *s2);
int utfncmp(char *s1, char *s2, int rc);
char *utfcpy(char *dst, char *src);
char *utfncpy(char *dst, char *src, int nbytes);
char *utfcat(char *src, char *append);
char *utfncat(char *src, char *append, int nbytes);
The UTF routines are used to pack the Unicode text encoding into
a standard character stream.
To do that effectively, the ASCII characters form the lowest 128 characters
of UTF-8. These characters are interchangeable between the two character sets.
A Rune is a Unicode character, defined in the header file utf.h.
runetochar translates a single Rune to a UTF sequence
and returns the number of bytes produced. chartorune
is the inverse of this function, returning the number of
bytes consumed.
runelen returns the number of bytes in the encoding
of a Rune.
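The encode, decode and length operations described above can be sketched as follows. This is a minimal illustration of standard UTF-8 and not the library's own source: the names sk_rune, sk_runelen, sk_runetochar and sk_chartorune are hypothetical stand-ins, sequences of up to four bytes are assumed, and returning U+FFFD while consuming one byte on malformed input is an assumption borrowed from common practice.

```c
#include <stddef.h>

/* Hypothetical stand-in for the Rune type that utf.h defines. */
typedef unsigned long sk_rune;

/* Number of bytes the UTF-8 encoding of r occupies (1 to 4). */
int sk_runelen(sk_rune r)
{
    if (r < 0x80) return 1;
    if (r < 0x800) return 2;
    if (r < 0x10000) return 3;
    return 4;
}

/* Encode r at cp and return the number of bytes produced.
 * The caller must supply room for at least 4 bytes. */
int sk_runetochar(char *cp, sk_rune r)
{
    unsigned char *p = (unsigned char *)cp;
    int n = sk_runelen(r);

    switch (n) {
    case 1:
        p[0] = (unsigned char)r;
        break;
    case 2:
        p[0] = (unsigned char)(0xC0 | (r >> 6));
        p[1] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    case 3:
        p[0] = (unsigned char)(0xE0 | (r >> 12));
        p[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    default:
        p[0] = (unsigned char)(0xF0 | ((r >> 18) & 0x07));
        p[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (r & 0x3F));
        break;
    }
    return n;
}

/* Decode one rune from cp into *rp and return the number of bytes consumed.
 * Assumption: malformed input yields U+FFFD and consumes a single byte. */
int sk_chartorune(sk_rune *rp, const char *cp)
{
    const unsigned char *p = (const unsigned char *)cp;
    sk_rune r;
    int n, i;

    if (p[0] < 0x80) { *rp = p[0]; return 1; }
    if ((p[0] & 0xE0) == 0xC0) { n = 2; r = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; r = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; r = p[0] & 0x07; }
    else { *rp = 0xFFFD; return 1; }

    for (i = 1; i < n; i++) {
        /* Every byte after the first must look like 10xxxxxx. */
        if ((p[i] & 0xC0) != 0x80) { *rp = 0xFFFD; return 1; }
        r = (r << 6) | (p[i] & 0x3F);
    }
    *rp = r;
    return n;
}
```

With these definitions, encoding U+20AC yields the three bytes 0xE2 0x82 0xAC, and decoding those bytes gives back U+20AC with three bytes consumed.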
fullrune checks that the first n bytes of the
UTF string cp contain a complete UTF encoding.
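A check of this kind is mostly useful when input arrives in chunks and a multi-byte sequence may be split across two reads. The sketch below shows one way such a test could work; sk_fullrune is a hypothetical name, standard UTF-8 lead bytes are assumed, and treating an invalid lead byte as "complete" (so that the decoder can report the error) is an assumption.

```c
/* Return 1 if the first n bytes at cp hold a complete encoding of one rune,
 * 0 if more bytes are needed. Only the lead byte is inspected. */
int sk_fullrune(const char *cp, int n)
{
    unsigned char c;
    int need;

    if (n <= 0)
        return 0;
    c = (unsigned char)cp[0];
    if (c < 0x80) need = 1;                    /* ASCII */
    else if ((c & 0xE0) == 0xC0) need = 2;     /* 110xxxxx */
    else if ((c & 0xF0) == 0xE0) need = 3;     /* 1110xxxx */
    else if ((c & 0xF8) == 0xF0) need = 4;     /* 11110xxx */
    else need = 1;                             /* invalid lead byte: nothing to wait for */
    return n >= need;
}
```

A reader loop would typically keep appending bytes to a small buffer until a check like this succeeds, and only then decode.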
utflen returns the number of runes in a UTF string.
utfbytes returns the number of bytes in a UTF string.
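The difference between the two counts only shows up once a string contains non-ASCII characters. The following sketch illustrates it for well-formed input; sk_utflen and sk_utfbytes are hypothetical names, and the assumption that the byte count excludes the terminating null (as strlen does) is not confirmed by this page.

```c
#include <stddef.h>
#include <string.h>

/* Count runes by counting every byte that is not a continuation byte
 * (10xxxxxx). Assumes the input is well-formed UTF-8. */
size_t sk_utflen(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Count bytes up to, but not including, the terminating null (assumption). */
size_t sk_utfbytes(const char *s)
{
    return strlen(s);
}
```

For the string "na\xC3\xAFve" (the word naive with a diaeresis on the i), the rune count is 5 and the byte count is 6.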
utfrune returns a pointer to the first occurrence of
a rune in a UTF string.
utfrrune returns a pointer to the last occurrence.
utfutf searches for the first occurrence of a UTF string
in another UTF string.
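Because lead bytes and continuation bytes occupy disjoint ranges in UTF-8, a well-formed search string can only match at a rune boundary of a well-formed text. A substring search can therefore be sketched with a plain byte search. This is an illustration of that property, not necessarily how the library does it; sk_utfutf and sk_rune_index are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* First occurrence of little in big, or NULL. Both strings are assumed to be
 * well-formed UTF-8, so a byte-wise match always starts on a rune boundary. */
const char *sk_utfutf(const char *big, const char *little)
{
    return strstr(big, little);
}

/* Convert a pointer p inside s into a rune index by counting the
 * non-continuation bytes that precede it. */
size_t sk_rune_index(const char *s, const char *p)
{
    size_t n = 0;

    for (; s < p; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

The returned pointer is a byte position; the second helper shows how to turn it into a position counted in runes when that is what the caller needs.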
utf_snprintf is a particularly dumb implementation of snprintf
for UTF strings - it only interprets %%, %s and %d sequences in the
format string, and does no field width calculation on those.
utfcmp compares two strings lexicographically, Rune by Rune,
and returns a value greater than 0, equal to zero, or less than zero
depending on whether the first UTF string is greater than, the
same as, or less than the second string.
utfncmp does the same comparison as utfcmp, but compares
at most rc Runes.
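A Rune-by-Rune comparison with a bound counted in Runes, rather than bytes, might look like the sketch below. The names sk_next and sk_utfncmp are hypothetical, the decoder assumes well-formed input, and the exact return values (-1, 0, 1) are an assumption; the page only promises the sign.

```c
/* Decode one rune from well-formed UTF-8 and advance *pp past it. */
unsigned long sk_next(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    unsigned long r = *p++;
    int extra = 0;

    if (r >= 0xF0) { r &= 0x07; extra = 3; }
    else if (r >= 0xE0) { r &= 0x0F; extra = 2; }
    else if (r >= 0xC0) { r &= 0x1F; extra = 1; }
    /* Fold in continuation bytes; stop early if one is missing. */
    while (extra-- > 0 && (*p & 0xC0) == 0x80)
        r = (r << 6) | (unsigned long)(*p++ & 0x3F);
    *pp = p;
    return r;
}

/* Compare at most rc runes of s1 and s2 by code point value. */
int sk_utfncmp(const char *s1, const char *s2, int rc)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    while (rc-- > 0) {
        unsigned long ra = sk_next(&a);
        unsigned long rb = sk_next(&b);

        if (ra != rb)
            return ra < rb ? -1 : 1;
        if (ra == 0)        /* both strings ended together */
            break;
    }
    return 0;
}
```

Counting the bound in Runes means that a limit of 2 covers the first two characters of each string even when those characters occupy a different number of bytes.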
utfcpy copies from src to dst, Rune by Rune.
No bounds checking is done on the number of Runes copied,
or on their individual sizes.
The dst argument is returned.
utfncpy copies at most nbytes bytes from source to destination,
terminating when a null Rune is found in the source. If the number of
bytes copied is less than nbytes, then the destination string is
padded with null (0) bytes. If it is equal to or greater than nbytes,
no null byte is added.
The dst argument is returned.
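The padding rule above mirrors strncpy, with the same consequence: the destination is null-terminated only when the source is shorter than nbytes. The sketch below illustrates that rule under one extra assumption that this page does not state, namely that a Rune which does not fit completely is left out rather than split. The names sk_seqlen and sk_utfncpy are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the sequence starting at s, never running past a
 * missing continuation byte or the terminating null. */
int sk_seqlen(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    int want = c < 0x80 ? 1
             : (c & 0xE0) == 0xC0 ? 2
             : (c & 0xF0) == 0xE0 ? 3
             : (c & 0xF8) == 0xF0 ? 4 : 1;
    int n = 1;

    while (n < want && ((unsigned char)s[n] & 0xC0) == 0x80)
        n++;
    return n;
}

/* Copy whole runes from src while they fit in nbytes, then pad any
 * remaining space with null bytes. Returns dst. */
char *sk_utfncpy(char *dst, const char *src, int nbytes)
{
    int used = 0;

    while (src[used] != '\0') {
        int len = sk_seqlen(src + used);

        if (used + len > nbytes)    /* next rune would not fit (assumption: drop it) */
            break;
        used += len;
    }
    if (used > 0)
        memcpy(dst, src, (size_t)used);
    if (used < nbytes)
        memset(dst + used, 0, (size_t)(nbytes - used));
    return dst;
}
```

When the source fills the buffer exactly, no terminator is written, so a caller that needs a C string should reserve one extra byte and set it to zero.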
utfcat appends the UTF string append onto the UTF string src.
utfncat appends the UTF string append onto the UTF string src,
bearing in mind that the buffer src is only nbytes long.
This implementation of UTF, nominally UTF-8, can encode a null Unicode
character using a one-byte or a two-byte encoding.
Typically, Plan 9 uses a one-byte encoding, whilst Java uses a two-byte
encoding.
The Plan 9 style of encoding makes backwards compatibility much easier, and loses
nothing. All the Java functionality is there; there are no embedded
null bytes in a multi-byte sequence, because of the way the second and third bytes
are encoded; and ordinary C strings are recognised as well, which is not the case in Java.
By default, the one-byte encoding of the null character is used.
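The two encodings of the null character can be made concrete with a short sketch. The one-byte form is the single byte 0x00; the two-byte form used by Java's modified UTF-8 is 0xC0 0x80. The function names and the two_byte flag below are hypothetical and do not describe how this library selects between the forms.

```c
/* Encode U+0000 at out and return the number of bytes written.
 * two_byte != 0 selects the Java-style form 0xC0 0x80. */
int sk_encode_nul(unsigned char *out, int two_byte)
{
    if (two_byte) {
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    out[0] = 0x00;
    return 1;
}

/* Return the number of bytes consumed if p starts with either encoding
 * of U+0000, or 0 if it starts with something else. */
int sk_decode_nul(const unsigned char *p)
{
    if (p[0] == 0x00)
        return 1;
    if (p[0] == 0xC0 && p[1] == 0x80)
        return 2;
    return 0;
}
```

The practical difference is that the one-byte form ends a string as far as strlen and the other C string routines are concerned, while the two-byte form lets a null character sit inside a string without ending it.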
UTF-8 is defined
in X/Open Company Ltd., "File System Safe UCS Transformation Format (FSS_UTF)",
X/Open Preliminary Specification, Document Number: P316, which also appears
in ISO/IEC 10646, Annex P.
Undoubtedly, the bugs are many, and legion.
Written by Alistair Crooks (firstname.lastname@example.org, or email@example.com),
from a draft document written by Rob Pike and Ken Thompson, detailing
the implementation of UTF in the Plan 9 operating system.