Kasoft Typesetting

The Rules of Typesetting

These rules are overridden by the demands of computer languages which support legacy character sets. Also, these rules are written with English in mind, and so do not override typesetting conventions commonly adopted when setting other natural languages.

General spacing rule

R1. One space is always enough.

If you are using two spaces, it is because you are used to the old standards associated with typewriters. More spaces are suitable in a computer language listing in order to achieve consistent indentation or tabulation. A notable violation in the widely published and respected literature is the IETF RFCs—such a violation is consistent with these documents having been intended to be set in a monospaced font.

Typesetting engines should remove all double spaces, except in computer language listings. HTML user agents will normally perform such distillation, with the exception of those using a monospaced font, such as Lynx. These will use the old "typewritten letter" convention of using two spaces between sentences.
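As a minimal sketch of the distillation described above, an engine might collapse runs of spaces with a regular expression. The function name collapse_spaces is an assumption made for illustration only. The sketch is intended for body text and should not be run over computer language listings.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("One space.  Two spaces.   Three."))
# prints: One space. Two spaces. Three.
```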

Rules for using hyphens and dashes

R2. An em-dash is required to break a sentence or bracket a phrase.

If you are forced to use a character set which does not have em-dash, then a hyphen and a space can be used as a surrogate. Do not use two hyphens unless you are writing a computer language comment intended to be used in a subsequent typeset form. An apparent violation in the widely published and respected literature is the IETF RFCs—however these documents are using ASCII as their character set, and this does not have an em-dash.

A typesetting engine should turn all hyphen-space pairs into an em-dash in any document which is using the surrogate. Any double hyphens in a computer language listing could be turned into an em-dash, but the engine must be engineered so as not to disturb any code where syntactic meaning is given to such a combination (e.g. in C and Java.) Also, there is no point using em-dash in a computer language listing if, as per guideline 5, that listing is set in a monospaced font. In this case, double-hyphen is the preferred setting choice.

Any automated replacement of hyphens with em-dashes as described above should also beware of the use of hyphen-space pairs at the start of lines: this is more often an indication of a bulleted item in a list, and should be set as such. The final check on such replacements should be performed by a human.
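The replacement and its line-start exception can be sketched as follows. This is an illustration under stated assumptions rather than a definitive engine: the surrogate is assumed to appear as a hyphen followed by a space, optionally preceded by a space, and the names SURROGATE and replace_surrogates are hypothetical. Lines that begin with a hyphen-space pair are left untouched as probable list items.

```python
import re

EM_DASH = "\u2014"

# Assumed surrogate form: an optional space, a hyphen, then a space.
SURROGATE = re.compile(r" ?- ")


def replace_surrogates(text: str) -> str:
    """Turn mid-line hyphen-space surrogates into em-dashes.

    Lines starting with "- " are treated as bulleted items and
    returned unchanged, so that a human can set them as a list.
    """
    result = []
    for line in text.split("\n"):
        if line.lstrip().startswith("- "):
            result.append(line)
        else:
            result.append(SURROGATE.sub(EM_DASH, line))
    return "\n".join(result)
```

Compound words such as "well-known" are not affected, because their hyphen is not followed by a space. A suspended hyphen (as in "pre- and post-processing") would be converted wrongly, which is one reason the final check belongs to a human.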

Remember, javadoc comments support the insertion of an em-dash because they are written in HTML. This choice is better, but the instructions and entities implied by HTML should be interpreted before setting a listing containing them. Another alternative is to remove the documentation comments before setting and include suitable narrative text around the listing instead.

R3. An en-dash is required to specify ranges.

Hyphen must only be used if the character set has no en-dash, or if a monospaced font is used. An apparent violation in the widely published and respected literature is the IETF RFCs—however:

This translation between HYPHEN or HYPHEN MINUS and EN DASH can sometimes be automated by an engine which recognises hyphens between numbers. But the final check should be performed by a human, as there may be rare exceptions. In particular, careful consideration should be given to replacing a Unicode HYPHEN with an EN DASH or anything else, as it is likely the author or her writing tool was using it deliberately.
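A minimal sketch of such an engine follows, assuming that a range is any HYPHEN MINUS sitting directly between two digits. The function name to_en_dash is hypothetical. The sketch deliberately leaves a Unicode HYPHEN (U+2010) alone, on the grounds given above that it was probably chosen deliberately.

```python
import re

EN_DASH = "\u2013"

# A HYPHEN MINUS with a digit immediately on either side.
RANGE_HYPHEN = re.compile(r"(?<=\d)-(?=\d)")


def to_en_dash(text: str) -> str:
    """Replace HYPHEN MINUS between digits with EN DASH."""
    return RANGE_HYPHEN.sub(EN_DASH, text)
```

Every digit-hyphen-digit sequence is converted, whether or not it is really a range, so the output still needs the human check described above.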

R4. MINUS SIGN must be used for negative numbers and in mathematical expressions.

HYPHEN MINUS must only be used if the character set has no minus sign. An apparent violation in the widely published and respected literature is the IETF RFCs—however these are not documents written with the luxury of Unicode's MINUS SIGN.

This translation between HYPHEN MINUS and MINUS SIGN can sometimes be automated by an engine which recognises space-hyphen pairs before numbers. However, the final check should be performed by a human, as there may be rare exceptions.
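The space-hyphen recognition can be sketched in the same way. The function name to_minus_sign is an assumption for illustration, and the pattern only covers the case named above: a space, then a HYPHEN MINUS, then a digit.

```python
import re

MINUS_SIGN = "\u2212"

# A space, then a HYPHEN MINUS, with a digit immediately after it.
NEGATIVE_NUMBER = re.compile(r" -(?=\d)")


def to_minus_sign(text: str) -> str:
    """Replace a space-hyphen pair before a digit with space plus MINUS SIGN."""
    return NEGATIVE_NUMBER.sub(" " + MINUS_SIGN, text)
```

A hyphen between two digits with no preceding space is left alone by this sketch, since rule 3 treats it as a range.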

R5. HYPHEN MINUS is only appropriate as a word breaker or to combine words to form a new word (e.g. "word-cum-phrase.")

(Corollary to rules 2, 3, and 4.)

An en-dash may be used when word phrases are nested to form larger word phrases, in order to indicate the higher level nesting.

Unicode HYPHEN may be substituted for HYPHEN MINUS by a typesetting engine without exception, but such an engine should have also ensured compliance with rules 2, 3 and 4.

Rules for setting numbers

R6. Numbers—phone numbers or mathematical numbers—must not contain hyphens or commas.

R7. Numbers must be broken at three character intervals (starting count at the, possibly implied, decimal point) using spaces.
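As an illustration of rule 7, the sketch below groups the digits of a number written as a string. It counts leftwards from the decimal point for the integer part and rightwards for the fractional part. The function name group_digits is hypothetical, and the input is assumed to contain only digits, an optional leading sign and at most one decimal point.

```python
def group_digits(number: str) -> str:
    """Break a number into groups of three digits, counted from the decimal point."""
    integer, point, fraction = number.partition(".")

    # Set any leading sign aside so it is not counted as a digit.
    sign = ""
    if integer[:1] in ("-", "+", "\u2212"):
        sign, integer = integer[0], integer[1:]

    # Integer part: group from the right, i.e. starting at the point.
    reversed_digits = integer[::-1]
    groups = [reversed_digits[i:i + 3] for i in range(0, len(reversed_digits), 3)]
    integer = " ".join(groups)[::-1]

    # Fractional part: group from the left, again starting at the point.
    fraction = " ".join(fraction[i:i + 3] for i in range(0, len(fraction), 3))

    return sign + integer + point + fraction


print(group_digits("1234567.89123"))
# prints: 1 234 567.891 23
```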

Spacing rule for punctuation

R8. No spaces can appear between or before punctuation marks other than quotes or brackets, though never between quotes or brackets of the same sex.

The correct way to nest adjacent quotation marks is therefore visually equivalent to a triple-quote. An opening triple is read as a single followed by a double and a closing triple is read as a double followed by a single. It is highly irregular to see (and therefore discouraged to cause to be seen) a single quotation where double and single quotes are used to both open and close it.

Ellipsis leading is the responsibility of the rendering engine—do not attempt to enforce any rule in the characters you write. Indeed, the typical rendering employed will be to display ellipses with ordinary period leading, though this often depends on the font being used when the Unicode character HORIZONTAL ELLIPSIS is employed.

It is highly irregular (and therefore discouraged) to use consecutive bracketed phrases or sentences, or consecutive quotations, within a single paragraph without some semantic salt.

Rules for bracketing and quotation

R9. All punctuation (except colon) which is attached to the end of a bracketed or quoted phrase must appear inside the closing bracket or quote.

In order to avoid ambiguity, where the text is literal, a different font or style should be used for the intended literal characters so the punctuation is not interpreted as belonging to the quoted text. If this is not possible, it is acceptable to drop the punctuation entirely. In plain ASCII text, reversal of the punctuation is also permitted.

Colons are preferred outside the bracket where the following phrase or list is set to the right. When the phrase or list is set below, set the introducing colon inside like all other marks.

A search and replace of affected text is a quick way to enforce this convention.

R10. Quotes cannot be nested more than two levels deep, and single quotes always surround double quotes and never the other way around.

This does not exclude double quotes being used where only one level of nesting is ever used. However, the use of single quotes in this case is encouraged.

Rules excluding repeated punctuation

R11. Ellipsis and nested quotes and brackets are the only time where valid punctuation is formed from multiple instances of the same mark (with possible exception made for repeated en-dashes used to indicate missing letters in words or completely missing words.)

It is never correct to use many exclamation or query marks in a row, or a number of consecutive periods other than one or three.
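A checker for this restriction might be sketched as below. The function name find_repeated_punctuation is an assumption. The sketch reports runs of two or more identical exclamation or query marks, and runs of periods whose length is neither one nor three. It only reports: correction is left to the author.

```python
import re

# Two or more identical exclamation marks, or two or more query marks.
REPEATED_MARKS = re.compile(r"!{2,}|\?{2,}")

# Any run of periods; the length is checked separately.
PERIOD_RUN = re.compile(r"\.+")


def find_repeated_punctuation(text: str) -> list[str]:
    """Return the offending runs: marks first, then periods, each in text order."""
    problems = [m.group() for m in REPEATED_MARKS.finditer(text)]
    problems += [
        m.group()
        for m in PERIOD_RUN.finditer(text)
        if len(m.group()) not in (1, 3)
    ]
    return problems


print(find_repeated_punctuation("What?? No... really!!! Wait.. fine."))
# prints: ['??', '!!!', '..']
```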

R12. Ellipsis only needs to be followed by a space to complete a sentence.

Where letters are missing from a word, an ellipsis is appropriate to indicate the missing letters. Consider using an apostrophe or the full and complete word where possible, however. Two en-dashes can also be used to represent missing letters from a word. Three en-dashes can represent a completely missing word. No more en-dashes in a row can be tolerated.

It is not considered correct to follow an ellipsis by an extra period space to complete a sentence. In this way an ellipsis is equivalent in power to an exclamation or query mark when followed by a space.

Rule for using prime marks

R13. Use primes for minutes and seconds (and feet and inches.)

Do not use apostrophe or double quote unless the character set has no prime.

But consider using decimal notation for angles, colon-separated notation for elapsed times, and SI units for all other measurements instead.

Rules for setting trademarks and product names

R14. Trademarks and product names (including computer applications) must be set with initial capitals in place as recommended by the owner or producer.

If the owner or producer uses a leading lower case letter, it is preferable to make this title case unless it leads to ambiguity or confusion. It is required to use a leading title case letter where the trademark or product name begins a sentence, or where the name may be confused with a natural language word.

The use of lower case letters only in UNIX utility names is optional if the author's position is not known or using title casing would cause no ambiguity. It is important to be consistent within documents and sets of documents, however. Use the same font for computer listing literals if you feel compelled for reasons of clarity or technical accuracy (say in user documentation) to include the lower case form in some places, but not in others (where you would use the ordinary body text font.) e.g. It is technically inaccurate for a user to invoke the Lame MPEG audio encoder by issuing the command Lame, because UNIX uses a case-sensitive filing system. Therefore it is better to say "then run lame against the appropriate input file" rather than "then run Lame against..."

R15. Unusual characters (such as punctuation) in trademarks or product names must be retained in place as recommended by the owner or producer.

If a character has no Unicode symbol in the chosen body text font, or is a stylized representation chosen for rendition for the mark, then a surrogate may be used or the character dropped entirely provided this would cause no confusion or ambiguity.

The use of the exclamation mark to begin RISC OS application names is optional if the author's position is not known or its omission would cause no ambiguity. It is important to be consistent within documents and sets of documents, however. Use the same font for computer listing literals if you feel compelled for reasons of clarity (say in user documentation) to include the exclamation mark in some places, but not in others (where you would use the ordinary body text font.)

R16. Trademarks which have become common natural language words themselves must only be title cased when the reference is to the particular product to which that name is given (unless the word is title cased for reasons other than rule 14.)

For example, if you are referring to Frisbee's frisbee, use "Frisbee", but if you are referring to flying plates in general, use "frisbee."

Words which are historically natural language words, but have trademark meanings, are also covered by this rule.

R17. Do not use the trademark or registered trademark symbol outside graphical representations.

(Corollary to rules 14, 15 and 16, which allow trademarks to be adequately distinguished without these cumbersome and unsightly aberrations.)

R18. Copyright symbol precedes either author or year, and is always followed by a space.

Copyright symbols should only be used at the beginning or end of a document, in headers, title pages, footers and bibliographies, and never in the normal flow of text. Consider reducing the size of the copyright symbol to 80% of its normal size if it appears to be too large in your chosen font—unfortunately this is often the case.

Rules for setting units of measurement

R19. Units must always be separated from the value by a space, with possible exception made if only the SI unit prefix is used (see rule 20) as an abbreviation.

R20. The unit part in a prefixed SI unit abbreviation may be omitted if it is obvious from the context.

This is the only time the SI unit prefix k may be capitalized.

R21. Where the SI unit prefixes are used for measuring binary data, they represent the nearest power of two to the power of ten represented by the unit.

Any other usage should be noted.
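Rule 21 can be made concrete with a short calculation. The sketch below finds, for each prefix, the power of two nearest to the power of ten that the prefix normally represents. The names SI_EXPONENTS, nearest_power_of_two and binary_multiplier are hypothetical, and only four prefixes are listed for illustration.

```python
# Decimal exponents of some SI unit prefixes (10 ** exponent).
SI_EXPONENTS = {"k": 3, "M": 6, "G": 9, "T": 12}


def nearest_power_of_two(value: int) -> int:
    """Return the power of two closest to a positive integer."""
    lower = 1 << (value.bit_length() - 1)  # largest power of two <= value
    upper = lower << 1                     # smallest power of two > value
    return lower if value - lower <= upper - value else upper


def binary_multiplier(prefix: str) -> int:
    """Return the multiplier a prefix represents when measuring binary data."""
    return nearest_power_of_two(10 ** SI_EXPONENTS[prefix])


print(binary_multiplier("k"))
# prints: 1024
```

Under this reading, 64 kB is 64 × 1024 = 65 536 bytes rather than 64 000 bytes.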

R22. A capital B represents a byte.

R23. A lower case b represents a bit.

R24. The letter p should not be used to represent "per."

Set the superscript index (e.g. -1 or -2) to the right of the divisor units instead.

Appendix A—The ASCII Gamut

Appendix B—Important Unicode Characters

Appendix C—SI Units

Appendix D—SI Unit Prefixes


See also: Typesetting Recommendations and Guidelines for In-house Material.

Also available: All rules, recommendations and guidelines are available as plain text.




Author and editor: Kade "Archer" Hansson; e-mail: archer@kaserver5.org

Last updated: Sunday 8th May 2005