Kasoft Typesetting

The Rules of Typesetting

Appendix A—ASCII Gamut

This appendix suggests which characters should and should not be used, in a variety of situations, once a decision has been made to limit a document to the ASCII character set. It also gives valid uses for these characters in ISO-8859 and Unicode documents.

The time of ASCII has long since passed. Its use should only continue in legacy documents and perhaps in the internal data representations of software where changing the character set would carry a compatibility cost. If at all possible, a means of inserting Unicode characters should be provided, perhaps through escape sequences or control codes. For example, the KaData Module Format from Kasoft Software allows Unicode characters in what are ISO-8859-1 documents through the use of \uXXXX escapes.
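
As a rough illustration of the escape-sequence approach, the following Python sketch expands \uXXXX escapes in bytes that are otherwise ISO-8859-1. It is not the KaData Module Format parser, whose exact rules are not described here; the function name decode_escaped_latin1 is hypothetical, and the sketch assumes each escape is a backslash, the letter "u" and exactly four hexadecimal digits. It makes no attempt to handle surrogate pairs or escaped backslashes.

```python
import re

# Assumed escape shape: a backslash, "u", and exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_escaped_latin1(raw: bytes) -> str:
    """Decode ISO-8859-1 bytes, then expand \\uXXXX escapes to characters."""
    text = raw.decode("iso-8859-1")
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# 0xE9 is e-acute in ISO-8859-1; U+20AC (euro sign) lies outside that set.
sample = b"caf\xe9 costs \\u20ac5"
print(decode_escaped_latin1(sample))
```

Text without any escapes passes through unchanged, so existing ISO-8859-1 documents remain valid input.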

Control codes

Control codes rarely have well-defined meanings across operating systems. Thankfully, they are ignored by most computer language parsers. LF can be used for line endings, and FF is sometimes used for page breaks (e.g. in the IETF RFCs.)

Do not use other control characters. This includes TAB, as it is interpreted differently by nearly all text editors.
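
The advice above can be checked mechanically. The following Python sketch reports every control code in a text other than LF and FF; the function name find_disallowed_controls and the choice to report character offsets are assumptions made for illustration.

```python
# LF and FF are the only control codes the advice above permits.
ALLOWED_CONTROLS = {"\n", "\f"}


def find_disallowed_controls(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for each control code other than LF and FF."""
    hits = []
    for offset, ch in enumerate(text):
        # The ASCII control codes are U+0000 to U+001F, plus U+007F (DEL).
        is_control = ord(ch) < 0x20 or ord(ch) == 0x7F
        if is_control and ch not in ALLOWED_CONTROLS:
            hits.append((offset, f"U+{ord(ch):04X}"))
    return hits


print(find_disallowed_controls("a\tb\r\nc\f"))
# [(1, 'U+0009'), (3, 'U+000D')]
```

Here the TAB and the CR are reported, while the LF and the FF pass.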

None of these characters have Unicode names—their traditional ASCII names are given instead.

U+0000 NUL Do nothing. Also known as "null."
U+0001 SOH Unprintable control code.
Also known as "start-of-heading."
U+0002 STX Unprintable control code.
Also known as "start-of-text."
U+0003 ETX Unprintable control code.
Also known as "end-of-text."
U+0004 EOT Unprintable control code.
Also known as "end-of-transmission."
U+0005 ENQ Unprintable control code.
Also known as "enquiry."
U+0006 ACK Enable display device on RISC OS.
Also known as "acknowledge."
U+0007 BEL Ring terminal bell. Also known as "bell."
U+0008 BS Move cursor left. Also known as "backspace."
U+0009 TAB Move cursor right. For horizontal tabulation.
U+000A LF End of line. Move cursor down. Also known as "line feed."
U+000B VT Move cursor up. For vertical tabulation.
U+000C FF Start a new page or clear the screen.
Also known as "form feed."
U+000D CR Move cursor to beginning of line on RISC OS.
Also known as "carriage return."
U+000E SO Unprintable control code.
Also known as "shift-out."
U+000F SI Unprintable control code.
Also known as "shift-in."
U+0010 DLE Unprintable control code.
Also known as "data-link-escape."
U+0011 DC1 Unprintable control code.
Also known as "device-control-one."
U+0012 DC2 Unprintable control code.
Also known as "device-control-two."
U+0013 DC3 Unprintable control code.
Also known as "device-control-three."
U+0014 DC4 Unprintable control code.
Also known as "device-control-four."
U+0015 NAK Disable display device on RISC OS.
Also known as "negative-acknowledge."
U+0016 SYN Unprintable control code.
Also known as "synchronous-idle."
U+0017 ETB Unprintable control code.
Also known as "end-of-transmission-block."
U+0018 CAN Unprintable control code.
Also known as "cancel."
U+0019 EM Unprintable control code.
Also known as "end-of-medium."
U+001A SUB Unprintable control code.
Also known as "substitute."
U+001B ESC Unprintable control code.
Also known as "escape."
U+001C FS Unprintable control code.
Also known as "file separator."
U+001D GS Unprintable control code.
Also known as "group separator."
U+001E RS Unprintable control code.
Also known as "record separator."
U+001F US Unprintable control code.
Also known as "unit separator."
U+007F DEL Unprintable control code.
Also known as "delete."
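
The remark before this table, that the control codes have no Unicode names, can be observed with Python's standard unicodedata module. This is only a small sketch; the helper has_unicode_name is a hypothetical name introduced here.

```python
import unicodedata

# The 33 ASCII control codes: U+0000 to U+001F, plus U+007F.
ASCII_CONTROLS = [chr(cp) for cp in range(0x20)] + [chr(0x7F)]


def has_unicode_name(ch: str) -> bool:
    """True if the Unicode character database assigns this character a name."""
    return unicodedata.name(ch, None) is not None


# Every control code is in general category "Cc" and carries no name.
for ch in ASCII_CONTROLS:
    assert unicodedata.category(ch) == "Cc"
    assert not has_unicode_name(ch)

# Printable characters, by contrast, are named.
print(unicodedata.name("!"))  # EXCLAMATION MARK
```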

Printable characters

U+0020
 
SPACE Word separator.
U+0021
!
EXCLAMATION MARK Sentence separator. Application prefix in RISC OS.
Boolean "not" operator in Java and C.
Also known as "pling" and "bang."
U+0022
"
QUOTATION MARK Quote. Demarcates character string in:
RISC OS Supervisor, POSIX shell, DR DOS, BASIC, Java, C, etc.
Also known as "double-quote" and "inverted commas."
U+0023
#
NUMBER SIGN Number prefix. Abbreviation for pound (weight.)
Filename wildcard (any single character) in RISC OS.
Comment prefix in many scripts, including POSIX shell.
Also known as "hash."
U+0024
$
DOLLAR SIGN Currency prefix. Disc root directory in RISC OS.
String variable suffix in BASIC.
Command to move to end of line in vi editor.
U+0025
%
PERCENT SIGN Percentage suffix. Library directory in RISC OS.
Integer variable suffix in BASIC.
Modulus operator in Java and C.
Wildcard in SQL and Tutorial D.
Escape character in URLs.
U+0026
&
AMPERSAND Abbreviation for "and." User directory in RISC OS.
Hexadecimal number prefix in: RISC OS *Eval, BBC BASIC, etc.
Part of entity reference in SGML and applications.
Also known as "and sign."
U+0027
'
APOSTROPHE Word punctuation denoting ownership or missing part.
Abbreviation for foot (distance) and minute (time.)
Surrogate for single quote (both opening and closing.) Surrogate for prime.
Demarcates character string in: SQL, Tutorial D, Pascal,
Modula-2, Oberon etc. Demarcates single character value in:
Java, C etc.
U+0028
(
LEFT PARENTHESIS Bracket. Also known as "opening parenthesis."
U+0029
)
RIGHT PARENTHESIS Bracket. Also known as "closing parenthesis."
U+002A
*
ASTERISK Footnote marker. Arithmetic operator.
RISC OS Supervisor command prompt.
RISC OS command prefix. Filename wildcard in:
RISC OS, POSIX shell, CP/M, DR DOS, etc.
Regular expression "zero or more."
Multiplication operator in:
RISC OS *Eval command, BASIC, Java, C, etc.
Also known as "star."
U+002B
+
PLUS SIGN Number prefix or arithmetic operator.
Surrogate for dagger (footnote marker.)
Operator used to assign permissions to the user, group or others in POSIX chmod command.
U+002C
,
COMMA Internal punctuation.
U+002D
-
HYPHEN-MINUS Number prefix, arithmetic operator or internal punctuation.
Surrogate for hyphen and dashes. Surrogate for bullet.
Command switch for RISC OS Supervisor and POSIX.
Operator used to revoke permissions from the user, group or others in POSIX chmod command.
U+002E
.
FULL STOP Sentence separator. Part of an ellipsis.
Surrogate for decimal point.
Directory separator on RISC OS. Extension separator on:
CP/M, DR DOS, POSIX, etc.
Regular expression "any character."
Also known as "period."
U+002F
/
SOLIDUS List or alternative word separator. Arithmetic operator.
Directory separator in POSIX. Extension separator in RISC OS.
Division operator in:
RISC OS *Eval command, BASIC, Java, C, etc.
(Mandatory) separator between user and public permissions in RISC OS directory displays and (optional) *Access command.
Command to search in vi editor.
Also known as "slash."
U+0030
0
DIGIT ZERO Arabic digit 0. Signifies no value for numbers of all radices.
U+0031
1
DIGIT ONE Arabic digit 1. Highest digit in radix 2 (binary.)
U+0032
2
DIGIT TWO Arabic digit 2. Highest digit in radix 3 (ternary.)
U+0033
3
DIGIT THREE Arabic digit 3. Highest digit in radix 4 (quaternary.)
U+0034
4
DIGIT FOUR Arabic digit 4.
U+0035
5
DIGIT FIVE Arabic digit 5.
U+0036
6
DIGIT SIX Arabic digit 6.
U+0037
7
DIGIT SEVEN Arabic digit 7. Highest digit in radix 8 (octal.)
U+0038
8
DIGIT EIGHT Arabic digit 8.
U+0039
9
DIGIT NINE Arabic digit 9. Highest digit in radix 10 (decimal.)
U+003A
:
COLON Sentence separator or internal punctuation. Path separator in POSIX.
Used to enter commands on the vi editor's command line. (See entries for "w" and "q.")
U+003B
;
SEMICOLON List separator. Path separator in RISC OS and DR DOS.
Statement separator in:
Java, C, Pascal, Modula-2, Oberon, etc.
Part of entity reference in SGML and applications.
U+003C
<
LESS-THAN SIGN Part of an algebraic relation. Surrogate for left angle bracket.
Standard input redirection operator in:
RISC OS, POSIX shell, Windows NT.
Demarcates mark-up in SGML and applications.
U+003D
=
EQUALS SIGN Part of an algebraic equation. Assignment and equality test in BASIC.
Assignment operator only in Java and C.
Escape character in MIME messages.
U+003E
>
GREATER-THAN SIGN Part of an algebraic relation. Surrogate for right angle bracket.
BBC BASIC command prompt.
Standard output and error redirection operator in:
RISC OS, POSIX shell, Windows NT.
Demarcates mark-up in SGML and applications.
U+003F
?
QUESTION MARK Sentence separator. Regular expression "zero or one."
Filename wildcard in POSIX shell.
Also a wildcard in CP/M and DR DOS, but with a slightly different meaning.
U+0040
@
COMMERCIAL AT Abbreviation for "at." Current directory in RISC OS.
Used in SMTP e-mail addresses.
U+0041
A
LATIN CAPITAL LETTER A Roman letter A. SI unit of electric current: ampere.
U+0042
B
LATIN CAPITAL LETTER B Roman letter B. One byte.
U+0043
C
LATIN CAPITAL LETTER C Roman letter C. The set of complex numbers. Roman numeral 100.
Avoid usage as surrogate for copyright.
Command to replace ("change") rest of line in vi editor.
U+0044
D
LATIN CAPITAL LETTER D Roman letter D. Roman numeral 500.
Suffix for double precision FP value in Java.
Abbreviation for "directory" in RISC OS directory displays.
U+0045
E
LATIN CAPITAL LETTER E Roman letter E. Common variable representing the energy of a system.
Mantissa/exponent separator in many programming languages.
SI unit prefix for exa (US quintillion.)
U+0046
F
LATIN CAPITAL LETTER F Roman letter F. Faraday constant. Highest digit (value 15) in radix 16 (hexadecimal.)
Common variable representing force applied to a body.
U+0047
G
LATIN CAPITAL LETTER G Roman letter G. Gravitational constant.
SI unit prefix for giga (US billion.)
U+0048
H
LATIN CAPITAL LETTER H Roman letter H.
U+0049
I
LATIN CAPITAL LETTER I Roman letter I. Roman numeral 1.
U+004A
J
LATIN CAPITAL LETTER J Roman letter J. Avoid usage for the set of integers: use Z instead.
Command to join lines in vi editor.
U+004B
K
LATIN CAPITAL LETTER K Roman letter K. SI unit of temperature: kelvin.
U+004C
L
LATIN CAPITAL LETTER L Roman letter L. Roman numeral 50. Avoid abbreviation for metric litre.
Suffix for long integer value in C and Java.
Abbreviation for "locked" in RISC OS directory displays.
U+004D
M
LATIN CAPITAL LETTER M Roman letter M. Roman numeral 1000. SI unit prefix for mega (million.)
U+004E
N
LATIN CAPITAL LETTER N Roman letter N. The set of natural (or cardinal) numbers.
SI unit of force: newton.
Command to move to previous match after search command in vi editor.
U+004F
O
LATIN CAPITAL LETTER O Roman letter O. The "big-oh" function.
Avoid usage as surrogate for circle: use "o" instead.
Command to insert line (above) in vi editor.
U+0050
P
LATIN CAPITAL LETTER P Roman letter P. Commonly used notation for variable or function representing probability (of an event.)
SI unit prefix for peta (US quadrillion.)
U+0051
Q
LATIN CAPITAL LETTER Q Roman letter Q. The set of rational numbers.
U+0052
R
LATIN CAPITAL LETTER R Roman letter R. The set of real numbers. Molar gas constant.
Abbreviation for (owner) "read" access in RISC OS directory displays.
U+0053
S
LATIN CAPITAL LETTER S Roman letter S.
U+0054
T
LATIN CAPITAL LETTER T Roman letter T. SI unit prefix for tera (US trillion.)
U+0055
U
LATIN CAPITAL LETTER U Roman letter U. Command to undo recent changes to current line in vi editor.
U+0056
V
LATIN CAPITAL LETTER V Roman letter V. Roman numeral 5.
U+0057
W
LATIN CAPITAL LETTER W Roman letter W. Abbreviation for (owner) "write" access in RISC OS directory displays.
U+0058
X
LATIN CAPITAL LETTER X Roman letter X. Roman numeral 10. Delete previous character command in vi editor.
The command which suppresses errors in RISC OS, named for the prefix "X" used on non-error returning software interrupt calls.
U+0059
Y
LATIN CAPITAL LETTER Y Roman letter Y. SI unit prefix for yotta (US septillion.)
U+005A
Z
LATIN CAPITAL LETTER Z Roman letter Z. The set of integers.
SI unit prefix for zetta (US sextillion.)
U+005B
[
LEFT SQUARE BRACKET Bracket. Used in Backus Naur Form "optional."
Demarcates indirect addressing in ARM assembly.
Begin assembly part in BBC BASIC.
Part of array index in Java, C, Pascal, Modula-2, etc.
Also known as "left" or "opening," "box" or "square bracket."
U+005C
\
REVERSE SOLIDUS Directory separator in DR DOS etc.
Also known as "backslash."
U+005D
]
RIGHT SQUARE BRACKET Bracket. Used in Backus Naur Form "optional."
Demarcates indirect addressing in ARM assembly.
End assembly part in BBC BASIC.
Part of array index in Java, C, Pascal, Modula-2, etc.
Also known as "right" or "closing," "box" or "square bracket."
U+005E
^
CIRCUMFLEX ACCENT Ornament. Do not use as horizontal rule.
Sometimes used as a proofreader's insertion point.
Exponentiation operator in BBC BASIC.
Bitwise "exclusive-or" operator in Java and C.
Pointer dereference in Pascal, Modula-2, Oberon, etc.
Sometimes rendered as upward arrow.
Also known as "hat" or "caret."
U+005F
_
LOW LINE Ornament. Do not use as horizontal rule.
Word separator in variable names for:
BASIC, Java, C, Pascal, Modula-2, Oberon, etc.
Wildcard in SQL and Tutorial D.
Also known as "underscore" or "underline."
U+0060
`
GRAVE ACCENT Ornament. Do not use as quotation mark.
Part of nested command in POSIX shell.
Also known as "backtick."
U+0061
a
LATIN SMALL LETTER A Lower case Roman letter a.
Common variable representing acceleration of a body.
SI unit prefix for atto (US quintillionth.)
Abbreviation for "all" (i.e. user, group and others) in POSIX chmod command.
Append text command in vi editor.
U+0062
b
LATIN SMALL LETTER B Lower case Roman letter b. One bit.
Command to move back one word in vi editor.
U+0063
c
LATIN SMALL LETTER C Lower case Roman letter c. Speed of light in a vacuum.
Avoid SI unit prefix centi (hundredth.)
Surrogate for copyright.
Command to replace (line or word) in vi editor. (In this sense, the letter "c" is an abbreviation for "change.")
U+0064
d
LATIN SMALL LETTER D Lower case Roman letter d. Notation for "a small change" in calculus.
Avoid SI unit prefix deci (tenth.) Avoid SI unit prefix deka (ten.)
Abbreviation for "directory" in POSIX directory displays.
Delete (line or word) command in vi editor.
U+0065
e
LATIN SMALL LETTER E Lower case Roman letter e. Base of the natural logarithm. Symbol for electron.
U+0066
f
LATIN SMALL LETTER F Lower case Roman letter f. Mathematical notation representing a function.
SI unit prefix for femto (US quadrillionth.)
U+0067
g
LATIN SMALL LETTER G Lower case Roman letter g. Agreed constant of acceleration due to gravity.
SI unit of mass: gram.
U+0068
h
LATIN SMALL LETTER H Lower case Roman letter h. Planck's constant.
Avoid SI unit prefix hecto (hundred.)
Move cursor left command in vi and other editors. (The letter "h" is on the home row for qwerty keyboards, which is why vi uses this for cursor motion.)
U+0069
i
LATIN SMALL LETTER I Lower case Roman letter i. The imaginary unit. Common variable ranging over the integers.
Insert text command in vi editor.
U+006A
j
LATIN SMALL LETTER J Lower case Roman letter j.
Move cursor down in vi editor. (The letter "j" is on the home row for qwerty keyboards, which is why vi uses this for cursor motion.)
U+006B
k
LATIN SMALL LETTER K Lower case Roman letter k. Boltzmann constant.
SI unit prefix for kilo (thousand.)
Move cursor up in vi editor. (The letter "k" is on the home row for qwerty keyboards, which is why vi uses this for cursor motion.)
U+006C
l
LATIN SMALL LETTER L Lower case Roman letter l. Avoid metric litre.
Abbreviation for "locked" in RISC OS *Access command.
Move cursor right in vi and other editors. (The letter "l" is on the home row for qwerty keyboards, which is why vi uses this for cursor motion.)
U+006D
m
LATIN SMALL LETTER M Lower case Roman letter m. Common variable representing the mass of a body.
SI unit prefix for milli (thousandth.) SI unit of distance: metre.
U+006E
n
LATIN SMALL LETTER N Lower case Roman letter n. Common variable ranging over the natural numbers.
SI unit prefix for nano (US billionth.)
Command to move to next match after search command in vi editor.
U+006F
o
LATIN SMALL LETTER O Lower case Roman letter o. The "little-oh" function. Surrogate for circle.
Abbreviation for "others" in POSIX chmod command.
Command to insert line (below) in vi editor.
U+0070
p
LATIN SMALL LETTER P Lower case Roman letter p. Common variable representing the momentum of a body.
SI unit prefix for pico (US trillionth.)
U+0071
q
LATIN SMALL LETTER Q Lower case Roman letter q. Quit command in many POSIX programs; in the vi editor, this command may be typed on the command line (see the entry for ":"), although it is only effective if followed by an exclamation mark or if the file has already been saved (via the "w" command.)
U+0072
r
LATIN SMALL LETTER R Lower case Roman letter r. Abbreviation for "read" in POSIX directory displays, POSIX chmod command and RISC OS *Access command.
U+0073
s
LATIN SMALL LETTER S Lower case Roman letter s. SI unit of time: second.
U+0074
t
LATIN SMALL LETTER T Lower case Roman letter t. Common variable representing elapsed time.
Avoid abbreviation for metric tonne.
U+0075
u
LATIN SMALL LETTER U Lower case Roman letter u. Avoid usage as a surrogate for SI unit prefix micro (millionth.)
Abbreviation for "user" in POSIX chmod command. Undo command in vi editor.
U+0076
v
LATIN SMALL LETTER V Lower case Roman letter v. Common variable representing velocity of a body.
U+0077
w
LATIN SMALL LETTER W Lower case Roman letter w. Abbreviation for "write" in POSIX directory displays, POSIX chmod command and RISC OS *Access command.
Abbreviation for "word" (command to move forward by one word, for example) in vi editor; also used to save ("write") the current file from the vi command line.
U+0078
x
LATIN SMALL LETTER X Lower case Roman letter x. The unknown quantity in equations. Variable used to measure along the horizontal axis.
Abbreviation for "execute" in POSIX directory displays and POSIX chmod command. Delete current character command in vi editor.
U+0079
y
LATIN SMALL LETTER Y Lower case Roman letter y. Variable used to measure along the vertical axis.
SI unit prefix for yocto (US septillionth.)
U+007A
z
LATIN SMALL LETTER Z Lower case Roman letter z. Variable used to measure along the depth axis.
SI unit prefix for zepto (US sextillionth.)
U+007B
{
LEFT CURLY BRACKET Ornate bracket. Redirection demarcation in RISC OS.
Used in Backus Naur Form extension "zero or more."
Marks beginning of statement block, class etc. in Java.
Also known as "opening curly bracket" or "brace."
U+007C
|
VERTICAL LINE Ornate separator. Comment prefix in RISC OS Supervisor.
Backus Naur Form "choice." POSIX shell "pipe."
Also known as "vertical bar."
U+007D
}
RIGHT CURLY BRACKET Ornate bracket. Redirection demarcation in RISC OS.
Used in Backus Naur Form extension "zero or more."
Marks end of statement block, class etc. in Java.
Also known as "closing curly bracket" or "brace."
U+007E
~
TILDE Ornament. Do not use as horizontal rule. User directory in POSIX.
Bitwise "not" operator in Java and C.
Also known as "wavy line."
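
Several entries above describe a character acting as an escape or delimiter: "%" in URLs, "=" in MIME messages, and "&" with ";" in entity references. The following Python sketch shows each role using only standard library modules. HTML stands in for "SGML and applications" here, and the sample strings are invented for illustration.

```python
import html
import quopri
from urllib.parse import quote, unquote

# "%" as the escape character in URLs (percent-encoding).
url_escaped = quote("50% off")
print(url_escaped)               # 50%25%20off
print(unquote(url_escaped))      # 50% off

# "=" as the escape character in MIME quoted-printable text;
# a literal "=" must itself be escaped.
mime_escaped = quopri.encodestring(b"a=b")
print(mime_escaped)
print(quopri.decodestring(mime_escaped))

# "&" and ";" delimiting an entity reference in HTML, an SGML application.
entity_escaped = html.escape("1 < 2 & 3")
print(entity_escaped)            # 1 &lt; 2 &amp; 3
print(html.unescape(entity_escaped))
```

In each case the escape character must be escaped when it appears literally, which is why "%" becomes "%25" and "&" becomes "&amp;amp;" in the output.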

See also: Important Unicode Characters, SI Units and SI Unit Prefixes.



Author and editor: Kade "Archer" Hansson; e-mail: archer@kaserver5.org

Last updated: Tuesday 10th May 2005