man(1) chkascii man page man(1)

chkascii - a small C program that checks whether a file is purely ASCII plain-text or else contains any non-ASCII characters.

ASCII is a 7-bit encoding: when a character is stored in an 8-bit byte, the left-most (most significant) bit is always 0. This gives the character value range 0-127, many of which are control characters (for instance 0, NUL) not found in regular text files.

Characters outside the US-ASCII character set must be written to a plain text file using some other encoding. So if a plain text file has the German character 'ö' (o-Umlaut), the character is encoded at the time of writing and decoded at the time of reading. Text-encoding schemes are of two varieties: some, such as ISO-8859-1, extend the traditional ASCII set by using an 8-bit representation; others, such as the Unicode encodings, use more than one byte for non-ASCII characters (1 to 4 bytes per character in UTF-8, 2 or 4 in UTF-16, always 4 in UTF-32). As an example, 'ö' (o-Umlaut) uses 1 byte in ISO-8859-1 (decimal 246) but 2 bytes in UTF-8 (the first byte holding decimal 195; the second byte holding decimal 182).
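The byte values quoted above can be reproduced with a few lines of Python. This is only an illustration of the two encodings and has nothing to do with how chkascii itself is implemented; the variable names are made up for the example.

```python
# Encode the same character under two schemes and list the byte values.
ch = "\u00f6"  # 'ö' (o-Umlaut)

latin1_bytes = list(ch.encode("iso-8859-1"))
utf8_bytes = list(ch.encode("utf-8"))

print(latin1_bytes)  # [246]
print(utf8_bytes)    # [195, 182]

# Every one of these bytes is above 127, so none is a 7-bit ASCII value.
print(all(b > 127 for b in latin1_bytes + utf8_bytes))  # True
```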

chkascii treats valid ASCII values in a text file as the austere set:

{ 32-126 / 9 (horizontal tab) / 10 (line feed a.k.a. Unix newline) }

Anything else is considered junk, although the user can pass in additional ASCII values to treat as good.

chkascii additionally checks whether the file is EOL-terminated. If the file is neither empty nor terminated with ASCII 10, chkascii treats the condition as an error too.
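The two checks described above (the good set and EOL termination) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's actual C source: the names GOOD and scan, and the (offset, value) form of the result, are hypothetical.

```python
# Built-in good set from the text: 32-126, 9 (horizontal tab), 10 (line feed).
GOOD = set(range(32, 127)) | {9, 10}

def scan(data: bytes, extra_good=()):
    """Return (junk, eol_ok) for the given file contents.

    junk   -- list of (offset, value) pairs for bytes outside the good set
    eol_ok -- True if the data is empty or ends with ASCII 10
    """
    good = GOOD | set(extra_good)
    junk = [(offset, b) for offset, b in enumerate(data) if b not in good]
    eol_ok = len(data) == 0 or data[-1] == 10
    return junk, eol_ok

print(scan(b"hello\n"))   # ([], True)
print(scan(b""))          # ([], True)
print(scan(b"a\xf6b"))    # ([(1, 246)], False)
```

An empty input counts as clean here because the text treats only a non-empty file without a trailing ASCII 10 as an error.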

The command below checks <file> against the built-in scheme, stopping at the first bad ASCII character:

chkascii <file>

The command above uses the default maximum-junk (maxjunk) value of 1. A maxjunk value of 0 removes the upper limit, so that all junk in the file is reported:

chkascii --maxjunk=0 <file>
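The effect of the maxjunk limit can be sketched as below. The function name report_junk is hypothetical, and the behaviour for values greater than 1 (stop after that many junk bytes) is an assumption inferred from the description of 1 and 0, not something the text states outright.

```python
# Built-in good set from the text: 32-126, 9 (tab), 10 (line feed).
GOOD = set(range(32, 127)) | {9, 10}

def report_junk(data: bytes, maxjunk: int = 1):
    """Collect junk bytes, stopping once maxjunk have been seen.

    maxjunk == 0 means no limit (scan to the end of the data).
    """
    reported = []
    for offset, b in enumerate(data):
        if b not in GOOD:
            reported.append((offset, b))
            # Assumption: a non-zero limit ends the scan when it is reached.
            if maxjunk != 0 and len(reported) >= maxjunk:
                break
    return reported

sample = b"\x00a\x01b\x02\n"
print(report_junk(sample))             # [(0, 0)]
print(report_junk(sample, maxjunk=0))  # [(0, 0), (2, 1), (4, 2)]
```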

--summary keeps chkascii going until EOF, at which point a summary is printed.
(Implicitly uses '--maxjunk=0')

--quiet asks chkascii to suppress normal output.
(Implicitly uses '--maxjunk=1')

These options are mutually exclusive: --maxjunk | --summary | --quiet

--accept forces an ASCII value to be accepted as good. The command below asks ASCII values 13 (CR) and 11 (VT) to be treated as good:

chkascii --accept=13,11 <file>
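A comma-separated value list like the one above could be parsed as in the following sketch. The function parse_accept is hypothetical, and rejecting values outside 0-127 is an assumption; the text does not say how chkascii validates the list.

```python
def parse_accept(spec: str):
    """Turn a string such as '13,11' into a set of extra good ASCII values."""
    values = set()
    for field in spec.split(","):
        value = int(field)  # raises ValueError for non-numeric fields
        # Assumption: only 7-bit ASCII values make sense here.
        if not 0 <= value <= 127:
            raise ValueError(f"not an ASCII value: {value}")
        values.add(value)
    return values

print(sorted(parse_accept("13,11")))  # [11, 13]
```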

-1 is returned on operational failure. Because the operating system typically keeps only the low 8 bits of an exit status (treating it as an unsigned char), the shell sees this value as 255.
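The mapping from -1 to 255 is plain 8-bit truncation, which can be checked with a one-line calculation. The helper name shell_status is made up for this illustration.

```python
def shell_status(return_value: int) -> int:
    """Keep only the low 8 bits, as happens to a process exit status."""
    return return_value & 0xFF

print(shell_status(-1))  # 255
print(shell_status(0))   # 0
```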

0 is returned if the file is empty, or if it contains only good ASCII values and is EOL-terminated.

All other cases are treated as errors. The return value is then a single byte, initially all zero bits, in which the right-most and left-most bits are set as explained below.

If the file has junk, the rightmost bit of the byte (equivalent to 1) is turned on.

If the file is not EOL-terminated, the left-most bit of the byte (equivalent to 128) is turned on.
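A calling script can take the status apart with two bit tests. The sketch below assumes the status has already been obtained from the shell (for example via $?); the function name decode_status and the returned tuple are hypothetical. Note that 255 must be tested first, since it has both the 1 and the 128 bit set but signals operational failure.

```python
def decode_status(status: int):
    """Return (failed, has_junk, missing_eol) for a chkascii exit status."""
    if status == 255:  # -1 as seen by the shell: operational failure
        return (True, False, False)
    has_junk = bool(status & 1)       # right-most bit
    missing_eol = bool(status & 128)  # left-most bit
    return (False, has_junk, missing_eol)

print(decode_status(0))    # (False, False, False)
print(decode_status(1))    # (False, True, False)
print(decode_status(128))  # (False, False, True)
print(decode_status(129))  # (False, True, True)
```

A file that has junk and also lacks the final newline therefore yields 129 (1 + 128).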

`man 7 ascii` and xascii(1) are the standard tools for printing the table of ASCII values and symbols.

vim(1) has the Normal command ga which displays the ASCII value of the character under the cursor.

od(1) can be used to print the (decimal) ASCII values in a file:

od -t u1 <file>

The Ctrl+Q subcommand of hexedit(1) can be used to insert (hexadecimal) ASCII half-chars (0 through f).

Besides many editors, unix2dos(1) and flip(1) can convert text files between Unix (LF) and DOS (CRLF) line-endings.

chkascii does not read standard input.

If you pass in multiple files, the first path is taken as the argument.
(Other paths are silently ignored)

Each optional argument can be specified at most once.

Manish Jain (bourne.identity@hotmail.com)
15 November, 2020 2.4
