- Paragraphs and section headings
- All kinds of lists
- Fonts, accents, and symbols
- Block quotation
Before you can start typing a document, you have to give it a kind of skeleton or template into which you can put the text. There are two main parts to an HTML file which give it this basic structure, preceded by a single line declaring the document type as HTML.
This structure is shown in the diagram above.
The skeleton looks like this when viewed in a plain text editor:
<!doctype html public "-//IETF//DTD HTML//EN">
<html>
<head>
Header information goes here
</head>
<body>
Text of the document goes here
</body>
</html>
You can see the document type declaration at the top. All the rest of the file is enclosed in the <html>...</html> tags. Within this, the document is divided into the head and body. The <head>...</head> tags enclose the header, which identifies the file title and any relationships the document has with the world outside. The <body>...</body> tags surround all the rest of your text. Notice that the head and body are separate, non-overlapping sections, entirely contained within the <html> element.
The document type declaration identifies the type of file with a special kind of tag called a markup declaration, which has an exclamation mark after the opening angle bracket:
<!doctype html public "-//IETF//DTD HTML//EN">
You should always use this line as the first line of every HTML file, exactly as given here (until the version changes and you add extra markup). If you're editing files with a plain text editor or wordprocessor, you may want to keep this line in a separate template file which you can copy in each time you create a new file.
This line is used by HTML editors and other software to locate a copy of the correct DTD so that they can understand what elements are usable in your file. Fully-compliant editors handle this automatically, and may not even display the declaration, although they will insert it when the file is saved to disk.
If you're using a copy of the HTML DTD in the same directory as the files you are editing, you can use the form
<!doctype html system "html.dtd">
Exercise 8.1: Creating a new file
For these exercises, you need to have some information you want to put into the Web. One good starting point would be to create your own personal page, with information about yourself - a kind of extended business card. You might even want to make it into a Web version of your résumé or curriculum vitae. How you phrase it and what information you put in it is entirely up to you: the objective is to become familiar with using HTML.
Use your editor to open a new file. Insert the document type declaration as the first line (unless you're using something like HoTMetaL or Author/Editor, which use a built-in or precompiled version automatically, and don't display it). Then add the <html> and </html> tags. Save the file with a name of your choosing, but ending with .html (or .htm if you're on a PC).
An HTML file should be self-documenting: that is, it should contain some information about itself, so that you can identify the file without having to read through it all. You do this with the header, in which you can specify the title of the file and a variety of other information about it.
A header with a title must occur in every file. It's equivalent to a running head in a printed document. Here's an example of a header with a title:
<head> <title>How to make $1,000,000</title> </head>
There are other optional elements which can be included for additional information which we'll come to later. The structure of a header is shown at the top of Figure 7.1.
The <title> element is a kind of label for recording the function of the file: it is not a part of the document text. Most browsers show the file title at the top of the screen, separate from the text, either off to the right-hand side or in a separate panel labelled `Title' or something similar, so it should be short enough not to overflow the display: a few words is usually enough. To display a heading at the start of your text you use a different element which we'll see in the next section.
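As a minimal sketch, the declaration, header, and title described so far fit together as shown below. The title is the one from the example above; the comments are only annotations, and the body is left as a placeholder.

```html
<!doctype html public "-//IETF//DTD HTML//EN">
<html>
<head>
<!-- The title labels the file; it is not part of the document text -->
<title>How to make $1,000,000</title>
</head>
<body>
<!-- Text of the document goes here -->
</body>
</html>
```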
Exercise 8.2: Adding a file title
Insert a <head> element between the <html> and </html> tags, and put a <title> element inside it containing a description of what this file will be. Keep the title under a line long so that it won't overflow the display box used by some browsers to show it. Display the file in your browser.
All of the file after the header is enclosed in the <body>...</body> tags. This is where all your text, illustrations, forms, tables, and hypertext references go. The body of most documents consists of a mixture of elements: some are simple one-line items like section headings, subheadings, and illustrations; others are blocks of text like paragraphs and lists, but a lot depends on the nature of the material and how you want to present it. The structure of the text body and the elements you can use in it is shown in Figure 7.1.
Exercise 8.3: The text body
Insert the <body> element between the </head> and </html> tags. As the overall structure up to now is common to all files, you might want to create a macro to insert this kind of skeleton for you (if your editor handles macros: they're a kind of miniature program of prerecorded keystrokes which you can get the editor to play back with a single key).
Inside the body of a document, the most common element is the paragraph. HTML defines a paragraph with the <p> element, for example:
<p>If you are ordering for shipment abroad, please add $30 to cover air freight and insurance charges.</p>
HTML does not recognise blank lines or indentation as the sign of a new paragraph in the way that wordprocessors or DTP systems do, but uses the <p> element to enclose each paragraph. Browsers pay no attention to multiple spaces, tabs, or linebreaks, but treat them all as a single space (except in one special circumstance which we'll come on to later) because it is the markup which defines where elements like paragraphs begin and end.
The paragraph above could equally well have been written as
<p> If you are ordering for shipment abroad, please add $30 to cover air freight and insurance charges. </p>
The effect in a browser display would have been just the same in both cases:
If you are ordering for shipment abroad, please add $30 to cover air freight and insurance charges.
You can take advantage of this relaxed attitude to spacing if you are using a plain text editor rather than an HTML-sensitive one, because it allows you to include as much extra spacing as you want to make things easier to edit on the screen, without having to worry about whether it will format properly in the user's display. Paragraphs can contain plain text with the markup elements shown in Figure 8.1.
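As a rough illustration of this freedom, the same paragraph can be spread over several indented lines in the source. The particular line breaks and indentation below are arbitrary choices made for readability in an editor; a browser should display the result just as in the examples above.

```html
<p>
    If you are ordering for shipment abroad,
    please add $30 to cover
        air freight and insurance charges.
</p>
```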
Figure 8.1: Element contents of a paragraph
See the explanation and description of CADE diagrams.
Exercise 8.4: Adding text in paragraphs
Use the <p>...</p> tags to insert a few paragraphs of text in the body of your document. Don't worry about their order or placement at the moment: you can always move them around later.
Sectioning is used to divide a document into some form of logical groups, and each section or subsection usually has its own heading. HTML allows you up to six levels of section heading, using the <h1> to <h6> elements.
Figure 8.2: Levels of section heading in Netscape
The top-level heading is used to represent the major divisions of your text, and the first one in the file normally contains some kind of title which applies to the whole document. The text is enclosed in <h1>...</h1> tags, and this conventionally makes it display in large bold type in a graphical browser, although some allow the user to select the exact font, size, and style to suit their own taste. In a character browser it is positioned, outlined, or highlighted in some other way.
Further levels of section headings are done with <h2>, subsections with <h3>, subsubsections with <h4> and so on with <h5> and <h6>.
Graphical browsers display different highlights, sizes, colors, or positions of type for the different levels of headings, conventionally getting smaller or less bold as the depth of sectioning gets greater (Figure 8.2). In some browsers the user can change how the headings display. Mosaic's section headings for <h4> to <h6> actually use smaller type than that used for the normal text of paragraphs. Here's an example of a top-level heading:
<h1>Jill Doe's own page</h1>
If the file title said `Autobiography', this might display in your browser like this
Autobiography
Jill Doe's own page
In character browsers, with a smaller range of typographic variation available, other visual techniques are used for headings, such as centering or capitalising (Lynx), or surrounding or underlining with asterisks, dashes, or dots (Emacs w3-mode).
The top-level heading <h1> is usually the first element after the <body> start-tag, so that it displays at the top of the screen when the document is retrieved. Headings always come between paragraphs, not within them: putting a heading element inside a paragraph is not meaningful for the reader (an HTML-compliant editor won't let you do this anyway). All that a browser will do is split the paragraph in two and display the heading between the two halves.
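A minimal sketch of the intended arrangement, with the heading placed before and between paragraphs rather than inside one. The heading is the one used in the example above; the paragraph text is invented for illustration.

```html
<body>
<h1>Jill Doe's own page</h1>
<!-- Hypothetical paragraph text: the heading sits outside the paragraphs -->
<p>Welcome to my page.</p>
<p>This is where I keep my autobiography.</p>
</body>
```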
Second-level headings (and all the rest) are done in the same way, for example:
<h2>Chapter 1 - Born to rule</h2>
This produces this kind of output when displayed:
Autobiography
Jill Doe's own page
Chapter 1 - Born to rule
The elements you can use inside a heading are exactly the same as those you can use inside a paragraph: these are shown in Figure 8.1.
Section headings in HTML represent section levels, not section numbers, so <h3> means `heading level 3', not `section number 3'. Although there are six levels of heading provided for, each level can occur as many times as necessary. There is no automated section-numbering in HTML 2.0, but if you need section headers numbered, you can insert the numbers along with the text of the heading.
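If you do want numbered sections, one approach is simply to type the numbers as part of the heading text. In this sketch the numbering scheme and the heading wording are invented for illustration; because the numbers are ordinary text, you have to renumber by hand if you move sections around.

```html
<h1>Jill Doe's own page</h1>
<!-- Hypothetical headings: the numbers are ordinary text typed by the author -->
<h2>1. Early years</h2>
<h3>1.1 School</h3>
<h3>1.2 First job</h3>
<h2>2. Recent work</h2>
```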
Exercise 8.5: Inserting headings
Insert an <h1> element immediately after the start-tag for the body, before your first paragraph, and type in it the text that you want displayed as the top heading. Insert an <h2> element between your first and second paragraphs and type a second-level heading in there.
Style guide
Although it is easy to put tags around your headings and paragraphs, and to write a simple document with a header and a body, there are some basic practices which experience has shown to make good sense. This is not to say that you have to do things this way - there are always plenty of reasons why not - but the following guidelines appear to meet the approval of readers of all kinds of material, not just Web hypertext.
- Keep files to a reasonable size. Users on slow connections do not appreciate having to wait many minutes to receive a file when they may only want to refer to a few sentences.
  Unfortunately, it's impossible to give an optimum length for a file, because even long files can load fast when you're in the same building as the server, and even short files can be slow if you're on the other side of the world.
  As a rule of thumb, when working with corporate or campus information systems, I try not to create files over 10 screens long (based on the default 22 lines of 80 characters, used in a standard terminal window), simply to avoid the reader getting lost in the verbal jungle. Most are much shorter, a very few are a little longer (whole articles, for example). I have also seen recommendations that no HTML file should be longer than one screenful, but this is perhaps unduly restrictive.
- If you are creating a long document, split it into separate files on section boundaries and make the first page a table of contents. We'll see in Chapter 9 how to reference one file from another.
- Make the file title meaningful and try to keep it under a line long so that it doesn't overrun the window in which it gets displayed, or get in the way in browsers which use the top line of a 25-line screen for it.
- Try to keep headings under one line long as well: top-level (<h1>) headings in a graphical browser can take up a lot of space when a large bold font is used.
- Keep paragraphs reasonably short, preferably less than a screenful each. Although what constitutes a `screenful' varies enormously (especially as many users read Web documents in a resizable window), try the following guide: a standard 25-line, 80-character screen means 2,000 characters when full, or about 250 words (at an average of 6 characters per word plus spacing and margins), but such solid slabs of text tire the eyes: to keep the reader's interest, try about 16 lines as a maximum, roughly 160 words.
- Write clearly, without overlong sentences or complex grammatical constructions, unless you are writing for a specialist audience who are accustomed to a particular style or content.
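Everyone makes lists: they're a handy way of marshalling your thoughts and a common way of providing instructions or a record of events. Lists fall into three main groups:
- Ordered lists, where the items are numbered in sequence
- Unordered lists, where the items are marked with bullets or asterisks
- Definition lists, where each item is a term followed by one or more paragraphs of definition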
There are elements to define all of these in HTML, and they can be nested within each other. Lists are block-oriented elements, so they occur between paragraphs, not within them.
There is in fact a fourth type, labelled lists, where the items are numbered or lettered for reference purposes, but where the order is not important. HTML 2.0 does not provide for this explicitly, as the subparagraphs of a definition list can be used, with the labelling provided by hypertext.
This is an example of an ordered list:
1. Undo the two screws holding the coverplate.
2. Gently lift the coverplate from the back.
3. Latch the coverplate open using the hook inside on the left.
4. Remove the soundproof wadding from over the blower motor housing.
The numbering is automatic in HTML, so if you delete or insert items, they get renumbered automatically by the browser when the file is next displayed. An ordered list is a sequence of <li> (list item) elements completely enclosed in the <ol> (ordered list) element. An ordered list is written like this:
<ol>
<li>Undo the two screws holding the coverplate.</li>
<li>Gently lift the coverplate from the back.</li>
<li>Latch the coverplate open using the hook inside on the left.</li>
<li>Remove the soundproof wadding from over the blower motor housing.</li>
</ol>
There's a diagram of what elements can go inside a list item in Figure 8.3.
Figure 8.3: Element contents of a list item
This applies to all lists which use <li> (ordered, unordered, menu, and directory lists).
Exercise 8.6: Ordered lists
Add an ordered list between two of your paragraphs, or between a paragraph and a section heading. Insert a few items in the list, then save the file and display it using your browser.
Edit the file to change the position of some of the list items, then save the file and reload it in the browser: check that the items get renumbered automatically.
An unordered list has bullets or asterisks instead of numbers, for example:
- 2lb (4 cups) sugar
- 8oz (8 squares) dark chocolate
- 1/2pt (8 fl.oz) evaporated milk
- 2oz (1 stick) butter
The same <li> tags are used for the list items, but in this case, the tag used to surround the whole list is <ul> (unordered list):
<ul>
<li>2lb (4 cups) sugar</li>
<li>8oz (8 squares) dark chocolate</li>
<li>1/2pt (8 fl.oz) evaporated milk</li>
<li>2oz (1 stick) butter</li>
</ul>
This makes it easy to change the kind of list later: just change the <ul> tags for <ol> ones (the fudge tastes pretty good, too...).
Both ordered and unordered lists can have an attribute of compact which browsers can interpret as meaning the list items will be short, so the space between items can be reduced. Like other attributes, the word compact goes inside the angle brackets after the tag name, separated by a space:
<ul compact>
<li>sugar</li>
<li>chocolate</li>
<li>milk</li>
<li>butter</li>
</ul>
In a character browser, because of the fixed line-height, there is less scope for such compression, although Emacs w3-mode does show the difference by using a different bullet and spacing. The elements which can make up the content of a list item in an unordered list are the same as for a list item in ordered, menu, or directory lists (Figure 8.3).
Exercise 8.7: Unordered lists
Add an unordered list in a similar way to how you did the ordered one in the last exercise (or just change the <ol> tagging of that list to <ul>). Save the file and reload it in the browser and see the difference in the way the two kinds of list are displayed.
Definition lists are designed to handle cases where each list item is made up of the term being defined or explained as well as one or more paragraphs of definition. This is common where the items are categories rather than just simple entries.
In this case, the whole list is a <dl> (definition list) element. Typically, each term is given as a <dt> element (definition term), followed by one or more <dd> (definition discussion) elements, but they can be used on their own or in other combinations.
The idea is that within the <dl> element, <dt> and <dd> elements normally occur in pairs. In fact you can have several <dt> elements together (where multiple terms are being defined as synonyms, perhaps) as well as several <dd> elements defining them. You can see from the diagram in Figure 8.4 that the most significant structural difference is that the <dt> element cannot contain another structure like a list or form, because it functions only as a title, while the <dd> element can contain any element in the normal flow of text: the content is the same as that for the list item (Figure 8.3). Here's an example of such a list:
Managerial grades
    Senior and junior managers are expected to dress in conventional blue business suits with white shirts and suitable ties during office hours, particularly while meeting clients.
    Any departure from this norm will be taken very seriously as it reflects on the standing of Widget Computers Inc.
Salaried staff below managerial level
    Employees must dress neatly but need not wear suits. Unusual or extravagant clothing is inappropriate to the position of these employees within the company.
    When meeting clients, conventional blue business suits with white shirts and suitable ties must be worn.
Programmers and HTML hacks
    Back-office staff can wear anything they want, provided it falls within the bounds of decency.
    In any case, we depend on these people so much to keep our systems running that we'll make almost any exception they insist on.
which can be obtained with:
<dl>
<dt>Managerial grades</dt>
<dd>Senior and junior managers are expected to dress in conventional blue business suits with white shirts and suitable ties during office hours, particularly while meeting clients.</dd>
<dd>Any departure from this norm will be taken very seriously as it reflects on the standing of Widget Computers Inc.</dd>
<dt>Salaried staff below managerial level</dt>
<dd>Employees must dress neatly but need not wear suits. Unusual or extravagant clothing is inappropriate to the position of these employees within the company.</dd>
<dd>When meeting clients, conventional blue business suits with white shirts and suitable ties <em>must</em> be worn.</dd>
<dt>Programmers and HTML hacks</dt>
<dd>Back-office staff can wear anything they want, provided it falls within the bounds of decency.</dd>
<dd>In any case, we depend on these people so much to keep our systems running that we'll make almost any exception they insist on.</dd>
</dl>
Figure 8.4: Content of the definition list element
The <dd> element can contain any text, as well as other elements including separate paragraphs and lists (but not headings); the <dt> element is intended to contain only text and highlighting elements. The <dl> element can have a compact attribute like ordered and unordered lists, to imply that it should be formatted more tightly.
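As a small sketch of the other combinations mentioned above, the list below gives two terms together as synonyms sharing a single definition, and uses the compact attribute. The terms and their wording are invented for illustration, and whether compact changes the display at all depends on the browser.

```html
<dl compact>
<!-- Two hypothetical terms defined together as synonyms -->
<dt>Email</dt>
<dt>Electronic mail</dt>
<dd>Messages sent between users over a network.</dd>
<dt>Browser</dt>
<dd>A program for displaying Web documents.</dd>
</dl>
```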
Exercise 8.8: Definition lists
Add a definition list to your file. If it's your résumé that you're typing, this might be a good way to list your employment history, with the name of the company or the job title as the <dt> element and the <dd> element for describing the job. Check the display in your browser.
HTML also defines two other list elements, <menu> and <dir>, for handling menus and directory listings. In both cases the individual items in their content are enclosed in <li> tags, as for ordered and unordered lists, and both can take the compact attribute implying closer formatting.
A menu list is intended for items which each fit on one line, so browser authors can implement alternative formatting. Because a menu implies choices which the user can select, it is suited to lists of short selections for hypertext links (see Chapter 9 for how to do these):
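Search menu
Pick one of the following:
- Try the last search again with a new keyword
- Change the database you want to search
- Return to the main menu
which is produced by the following code: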
<p>Pick one of the following:</p>
<menu>
<li>Try the last search again with a new keyword</li>
<li>Change the database you want to search</li>
<li>Return to the main menu</li>
</menu>
When we come to look at making fill-in forms, this can be used as a convenient way to represent selections from a list of options. The elements which can make up the content of a list item in a menu list are the same as for a list item in ordered, unordered, or directory lists (see Figure 8.3).
The <dir> element is intended for items containing fewer than 20 characters each. The HTML specification suggests that the items in a directory list may be arranged in columns, although this does not seem to have been implemented in any known browser. This means that a much more horizontally-compressed display should be possible, so that
<dir>
<li>Paris
<li>London
<li>Berlin
<li>Madrid
<li>Dublin
<li>Moscow
<li>New York
<li>Tokyo
</dir>
could result in a display such as:
Destinations
Paris     Madrid    New York
London    Dublin    Tokyo
Berlin    Moscow
The elements which can make up the content of a list item in a directory list are the same as for a list item in ordered, unordered, or menu lists (see Figure 8.3). Notice that in this example, I've omitted the end-tags from the list items, as lists are among those cases where the only element permitted inside them is the list item, so there is no ambiguity. The same would apply to ordered and unordered lists and to menu lists. In definition lists the end-tags </dt> and </dd> can be omitted in the same way.
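A brief sketch of the same shortcut applied to a definition list, with the </dt> and </dd> end-tags left out. The content is abbreviated from the dress-code example earlier; software that follows the DTD should treat it the same as a fully tagged version.

```html
<dl>
<!-- End-tags for dt and dd omitted: the next start-tag implies the end of the previous item -->
<dt>Managerial grades
<dd>Conventional blue business suits during office hours.
<dt>Salaried staff below managerial level
<dd>Neat dress, but suits are not required.
</dl>
```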
Exercise 8.9: Menus and directories
Experiment with menu and directory lists and see how they display in your browser. A résumé is probably not a likely place to find them, but if you're creating files as part of a larger system which gathers input from the user or provides choices of what document to visit next, they may make a more logical choice than ordered or unordered lists.
You can have any type of list within any other type, but if it involves nested ordered lists, the subsidiary numbering or lettering scheme is a function of the browser: under HTML 2.0 you do not have any means of specifying how a browser numbers or letters the items. Some browser authors have indicated that they may include configuration options to handle how you prefer to see such nested lists displayed. Here's a part of a nested list, showing a definition list containing an ordered list containing a menu list:
Changing the settings
Configuration
    Swarf Brothers' numerically-controlled lathes are shipped with factory settings (see the manual for details). To change these settings, follow this sequence:
    1. Open the front panel by turning the knurled knobs on the sides toward you.
    2. Move the blue control switch to Standby.
    3. Adjust the settings as shown in the manual. The settings which you can change from this position are:
        - Maximum speed;
        - Maximum distance of bit travel;
        - Shear level for the automatic cut-off;
        - Default communications parameters for the serial interface.
    4. Move the blue control switch to Run and close the front panel.
The code to produce such a nest looks like this:
<dl>
<dt>Configuration</dt>
<dd>Swarf Brothers' numerically-controlled lathes are shipped with factory settings (see the manual for details). To change these settings, follow this sequence:
<ol>
<li>Open the front panel by turning the knurled knobs on the sides toward you.</li>
<li>Move the blue control switch to <b>Standby</b>.</li>
<li>Adjust the settings as shown in the manual. The settings which you can change from this position are:
<menu>
<li>Maximum speed;</li>
<li>Maximum distance of bit travel;</li>
<li>Shear level for the automatic cut-off;</li>
<li>Default communications parameters for the serial interface.</li>
</menu>
</li>
<li>Move the blue control switch to <b>Run</b> and close the front panel.</li>
</ol>
</dd>
</dl>
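Nested ordered lists follow the same pattern. In this sketch (the steps are invented for illustration, not a tested recipe) an inner ordered list sits inside a list item of the outer one; how the inner items are numbered or lettered is left to the browser.

```html
<ol>
<li>Prepare the tin.</li>
<li>Make the mixture:
<ol>
<!-- The numbering or lettering style of this inner list is chosen by the browser -->
<li>Melt the butter and chocolate together.</li>
<li>Stir in the sugar and evaporated milk.</li>
</ol>
</li>
<li>Pour into the tin and leave to set.</li>
</ol>
```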
Exercise 8.10: Nested lists
Try your hand at a nested list. If you used a definition list for your employment history, and you had several positions with one employer, you could use an ordered list in the <dd> element to show the jobs in order.
So far we have looked at tags which describe structure or content. Browsers use this information to decide how to lay out and display a file, so you (as author) do not need to specify how this is done, because you cannot know or tell what hardware and software the user has got. There are many occasions, though, when you need to be able to influence some part of the user's display in order to emphasize or distinguish words or phrases.
There are also occasions when you need to mark certain words or phrases as having some special quality, for example citations, fragments of computer or other code, or sample input or output. These may not necessarily need or receive special typographic treatment, but once marked correctly, they are useful for retrieval by indexing routines, textbase programs, or formatting systems.
The whole business of exerting further control over the user's display is one which is receiving close attention, and is discussed in more detail in Chapter 11.
Some of these elements may only be obvious to the user if they have a graphical browser which can use different typestyles. Plain-text terminals cannot do this, so character browsers may simulate the changes by using highlighting, underscores or asterisks to make the text stand out.
If you need to make some words more obvious in the text, it might be because you want to emphasize them, or because they have some other special meaning. Either way it normally means changing the kind of type used to display them, bold and italics being the two most common variations. HTML provides several methods for doing this: emphasis or italics, strong emphasis or bold type and underlining. There are also several other ways to define special meanings for words and phrases which are explained in Marks of Identification.
Emphasis is marked with the <em> element:
Remember, <em>send no money now!</em>
which usually produces italics in a graphical browser:
Remember, send no money now!
although some browsers let you change the exact font used. In character browsers, italic or slanted type is not normally available, so some other form of highlighting may be used, such as reverse video or high-intensity letters, or the conventions established by email, where the words are surrounded by underscores:
Remember, _send no money now!_
When you want something italicized for reasons other than emphasis, there is an <i> element. The choice between them allows you to distinguish between real emphasis and simple font change:
Italic type is slanted <i>like this</i>.
The difference between <em> and <i> may or may not be apparent on a graphical browser screen, depending on what fonts they have installed, but if you are using your text for other purposes (perhaps for a database, or paper publishing), it is a useful way of differentiating usage:
Italic type is slanted like this.
Strong emphasis is marked with the <strong> element:
Unsuccessful candidates are <strong>not</strong> eligible for a refund of expenses.
which typically displays in bold type in a graphical browser:
Unsuccessful candidates are not eligible for a refund of expenses.
(although as with <em>, users may be able to change the font they see). In character browsers, following email convention, the words may appear highlighted in some way or surrounded by asterisks if the characteristics of the terminal being used do not allow for highlighting:
Unsuccessful candidates are *not* eligible for a refund of expenses.
Along similar lines to the distinction between <em> and <i>, there is a <b> tag for occasions when you want bold type for purposes other than strong emphasis:
In bibliographies, the volume number of a serial publication must be given in <b>bold type</b>.
so that you can keep the usage clear:
In bibliographies, the volume number of a serial publication must be given in bold type.
The <u> element was proposed in an earlier version of HTML 2.0, and as a result is still implemented by many browsers. If you want to use underlining, therefore, you should be aware that it may not always work, and that it is not valid in the HTML 2.0 DTD.
In these examples, terminal input by the user is shown <u>underlined</u>.
If you are using a validating (SGML-compliant) editor, it may not let you insert the <u> element if it is not in the version of the DTD which is in use.
Where you have one element which affects font control embedded inside another, it is normal in wordprocessors and DTP systems for the type changes to be `commutative', that is, they inherit their style from the next outer element, so
This is <b>bold type <i>with italic and <u>underlining</u></i></b>.
would be expected to produce
This is bold type with italic and underlining
Not all browsers support this. If you embed elements in this way, take great care to insert the end-tags in the right (reverse) order if you are typing them manually in your editor, as element start- and end-tags are not supposed to overlap. The following would be an invalid use:
This is <b>bold type <i>with italic <u>and underlining</b></i></u>.
Exercise 8.11: Emphasis
Use the emphasis and strong emphasis elements (or bold and italics) and underlining to make any changes to your file where the reader's attention needs to be brought to particular words or phrases. Try nesting the elements to see what your browser supports.
Because of the extensive use of the Web for computer-based documentation, there are several ways of showing fixed-width (typewriter) type. However, as with the tags for emphasis (italics) and strong emphasis (bold), users on plain-text terminals only get typewriter-style letters anyway, so they won't be able to see any difference.
There are two main elements for explicit typewriter type, one for inline usage (words in the middle of a paragraph) and one for display usage (blocks of text).
For inline usage there is the <tt> element:
To find all files ending with <tt>.html</tt> in your disk area
which would display as
To find all files ending with .html in your disk area
For display usage there is the <pre> element. The content of this element can have many lines, and this is the one case I mentioned earlier where the linebreaks, multiple spaces and blank lines will be honored exactly as you type them:
<pre>
\parindent=<i>1em</i>
\parskip=<i>\baselineskip</i>
\parfillskip=<i>\parindent plus1fil</i>
</pre>
Font changes like italics and bold for emphasis will still also be honored within the preformatted block, and in a graphic display these would show up as slanted or bold typewriter type: in character browsers emphasis and italics may be shown otherwise:
\parindent=1em
\parskip=\baselineskip
\parfillskip=\parindent plus1fil
The elements which can appear in preformatted text blocks are the same as those for regular paragraphs, as shown in Figure 8.1. In the absence of elements to do tables in HTML 2.0, you can use this element to display columnar text in its preformatted form in fixed-width type (p.208).
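As a rough sketch of the columnar use mentioned above, the block below lines up a small price list using spaces alone. The items and prices are invented for illustration; the alignment relies on the browser using fixed-width type for the <pre> element.

```html
<pre>
Item                Qty    Price
Coverplate screw      2    $0.40
Soundproof wadding    1    $3.25
Blower motor          1   $48.00
</pre>
```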
For computer commands or programming code, the <code> tag can be used:
You should never type <code>del *.*</code> unless you really mean it.
This will display as typewriter type in a graphical browser:
You should never type del *.* unless you really mean it.
This lets you identify commands or programming code separately from other uses of typewriter type.
Characters which the user has to type at the keyboard can be marked with the <kbd> element. Like the other elements which reflect computer material, this allows the author to keep specific usage separate from ordinary typewriter type:
When the menu appears again, press <kbd>X</kbd> to stop the program.
If you are using a font-configurable browser, it might be better for it to appear in another typeface altogether:
When the menu appears again, press X to stop the program.
The <samp> element allows the separate specification of an inline sequence of literal characters such as samples of output from computer systems:
To display the file, type <code>more</code> followed by the filename. This pauses with the word <samp>--More--</samp> and the percentage through the file at the end of each screenful.
and it could also be used for sample forms of words which are not part of the text itself in non-computing situations, such as incidental translations:
...as with the word `telescope' (Greek, <samp>see-afar</samp>)...
No specific font or typestyle is implied by this element, although italics would seem to be a reasonable choice for browser implementors:
To display the file, type more followed by the filename. This pauses with the word --More-- and the percentage through the file at the end of each screenful.
...as with the word `telescope' (Greek, see-afar)...
Names of variables sometimes need to be identified as such. The <var> element is for this kind of indication, where it is important to keep this usage separate from other computing elements:
To increase paragraph indentation, set the value of <var>\parindent</var> to a bigger dimension.
The content of this element would be expected to appear in typewriter type.
To increase paragraph indentation, set the value of \parindent to a bigger dimension.
Exercise 8.12: Fixed-width type
Include your email address in your document, using the <tt> element.
Use the <pre> element to make a short block of text where the linebreaks and spacing are significant.
Because HTML uses the less-than and greater-than signs to identify markup, you need to avoid confusion if you want to include the actual less-than and greater-than characters themselves. To prevent them being misinterpreted as pieces of markup, you need to use the symbolic names &lt; for the less-than sign and &gt; for the greater-than sign. These names are called character entities, and there are many more of them to describe other symbols, especially accented characters (Table 8.2). The mnemonic name is preceded by an ampersand and followed by a semicolon, so the ampersand character itself has to be encoded this way: if you want to display one you need to type &amp;. If you were documenting a program which said:
...using the condition if ((amount-paid<tax-paid) & (tax-due>amount-owing)); we can check...
this would have to be typed as
...using the condition <code>if ((amount-paid&lt;tax-paid) &amp; (tax-due&gt;amount-owing));</code> we can check...
If you simply typed it as it appears in a program, a browser might attempt to interpret <tax-paid) & (tax-due> as a markup tag, because it would appear to be enclosed in angle brackets, and it might then also get confused over the apparent character entity & (tax-due>amount-owing)); overlapping the middle of it, because it begins with an ampersand and ends with a semicolon. If you need to display the actual character entity name of the less-than symbol itself (&lt;), you would need to type &amp;lt;.
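If you generate HTML from a program rather than typing it by hand, the replacement of these three characters can be automated. The following is a minimal sketch in Python, assuming the standard library's html.escape function; the helper name escape_markup and the sample string are invented for this illustration.

```python
import html


def escape_markup(text: str) -> str:
    """Replace &, < and > with the entities &amp;, &lt; and &gt;.

    quote=False leaves quotation marks alone, so only the three
    characters discussed above are changed.
    """
    return html.escape(text, quote=False)


# The program fragment used in the example above.
condition = "if ((amount-paid<tax-paid) & (tax-due>amount-owing));"
print(escape_markup(condition))

# Escaping an entity name a second time gives the form you would type
# in order to display the name itself.
print(escape_markup("&lt;"))
```

The ampersand has to be replaced before the angle brackets, otherwise the ampersands of the newly inserted entities would themselves be escaped; html.escape takes care of this ordering.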
One other character occasionally needs to be treated this way: the double-quote mark (") when you need to use it in a place where the text is already in double quotes, such as within an attribute value. This is done with &quot; (but it is not in HTML 2.0 and not all browsers support it yet). These typewriter-style unidirectional double quotes are not used in normal text, where the open-quote and close-quote signs are used, `like this'.
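As a hedged illustration of the attribute-value case, the sketch below uses Python's html.escape with its quote option switched on. The function name attribute_value, the file name music.html and the title text are all hypothetical.

```python
import html


def attribute_value(text: str) -> str:
    """Escape text for use inside a double-quoted attribute value.

    With quote=True, html.escape turns " into &quot; (and ' into
    &#x27;) in addition to handling &, < and >.
    """
    return html.escape(text, quote=True)


title = 'The "Concerto Grosso" form'  # hypothetical attribute text
print('<a href="music.html" title="' + attribute_value(title) + '">')
```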
Exercise 8.13: Character entities
Use the less-than, greater-than and ampersand character entities to type into your file the paragraph above beginning `If you simply typed it...', then display the result in your browser.
When you want to indicate that there is something special about certain words or phrases, you may want to keep this information separate from the ordinary emphasis or typewriter elements. The visual effect for the user may be the same, or there may be none, but it makes HTML documents more useful for authors if categories of usage for special terms have their own elements, so that they can be indexed or referenced, or used in glossaries or databases.
Citations of books and other works are identified with the <cite> element:
See <cite>SGML: A User's Guide to Structured Information</cite>, by Liora Alschuler (<cite>ISBN 1-85032-197-3</cite>).
which may typically display in italics:
See SGML: A User's Guide to Structured Information, by Liora Alschuler (ISBN 1-85032-197-3).
The place where a term is first introduced or defined is its defining instance. The <dfn> element was introduced in an earlier version of HTML 2.0 to identify such defining instances, but was later removed and is now reappearing.
This contrast in harmony and tonality between soloists and orchestra, which became popular towards the end of the 17th century, is known as the <dfn>Concerto Grosso</dfn>.
The text does not even necessarily change typeface or appearance in the browser display, although those browsers which allow font configuration may be able to do so: the objective is to allow the author to keep track of the important occurrences of words. In the above example, if you needed to italicize the definition, an <i> element could surround the whole <dfn>. One obvious application is for a browser to build a list of these as it parses a document, and to display them at the end as a glossary or index: if taken in conjunction with the use of the <link> element from the header, which allows the specification of a link to an external glossary or index, an annotational tool of considerable power could be implemented.
Text which has been deleted sometimes needs to stay visible but marked as struck out. The <strike> element in earlier versions of HTML provided this, and could be rendered as letters with a slash or line through them. It is unclear if any browser ever implemented this.
The premises may <strike>not</strike> at any time be occupied by members.
It is being supplemented (as <s>) in HTML3 with <del>, and a new element <ins> is being proposed to handle insertions (to replace material being deleted). Details of HTML3 are in Appendix B.
Exercise 8.14: Identification elements
Use the elements discussed in this section to identify some examples in your text, and see how many of them take effect in your browser. Type in the following text:
<blockquote>
<p>Phonemes are the sounds or strictly the distinctive sounds of language - cat consists of the phonemes /k/, /æ/ and /t/. (Palmer F, Grammar, Penguin 1971, p107)</p>
</blockquote>
<p>When using the phoneme analysis program, pressing P will break the value of current_word into its phonemes.</p>
Mark the first occurrence of `phonemes' as a defining instance; mark `cat' in italics; the three phonemes in slashes as samples; and mark the whole citation in brackets (and within that, the title `Grammar' in bold). Mark the words `phoneme analysis' as a deletion and make the word `PhonAl' after them an insertion (use typewriter type); then make the P a keyboard character and the term current_word a computer variable. Ideally, the display should look something like this:
Phonemes are the sounds or strictly the distinctive sounds of language - cat consists of the phonemes /k/, /æ/ and /t/. (Palmer F, Grammar, Penguin 1971, p107)
When using the phoneme analysis PhonAl program, pressing P will break the value of current_word into its phonemes.
This kind of markup is very useful for typesetting and indexing routines.
There are two tags which are useful in controlling the formatting and layout of your file: the forced line-break and the horizontal rule. These are empty elements: they are just like start-tags on their own without matching end-tags. There are also character entities which represent a non-breaking space and a soft hyphen.
A forced line-break is made with the <br> tag:
On the next line, by itself, type the command<br>
<code>search italic in typo-l since 13 feb</code>
<br>and press the <kbd>Enter</kbd> key.
which could be used to get
On the next line, by itself, type the command
search italic in typo-l since 13 feb
and press the Enter key.
See also the example of a poem.
A horizontal rule is drawn with the <hr> tag:
...and telephone the department when you have finished.</p>
<hr>
<h2>Authorization for expenses</h2>
<p>Staff of Grade 3 and above are permitted to claim...
which makes a greater distinction between sections, or around some important text:
...and telephone the department when you have finished.
Authorization for expenses
Staff of Grade 3 and above are permitted to claim...
A non-breaking space is made by typing the &nbsp; character entity between the two words, leaving no other space:
It has been done at the specific request of the deceased, Dr.&nbsp;A.B.&nbsp;See, whose express wish...
This avoids formatting routines in browsers making a break at that point if the words happen to fall where the end of a line would normally come, while keeping a printed space in the output:
It has been done at the specific request of the deceased, Dr. A.B. See, whose express wish...
This entity has been proposed for early support in HTML but is not implemented in all browsers.
A soft hyphen, marking a place where a long word may be broken at the end of a line, is made with the &shy; character entity:
pneu&shy;mono&shy;ul&shy;tra&shy;mi&shy;cro&shy;scop&shy;
ic&shy;sil&shy;i&shy;co&shy;vol&shy;ca&shy;no&shy;co&shy;
nio&shy;sis
(pneumonoultramicroscopicsilicovolcanoconiosis: the display above has been deliberately broken to fit in the width of these pages - you would actually type it all on one line.) This entity is again a proposal for HTML and is not supported in all browsers.
Exercise 8.14: Rules and spacing
Insert horizontal rules above the section headings in your file, and use the forced linebreak tag to control the ends of lines in a paragraph. Separate any initials from surnames with the non-breaking space, and insert soft hyphens in any long words. Reload the file to see what your browser supports.
Figure 8.5: Netscape formatting in action
The additional elements proposed by Netscape for varying type size and for centering text are not part of HTML 2.0, although centering is implemented in HTML3 using an attribute. While there is unquestionably a demand for typographic controls like this, HTML elements are not always the place for them: the style sheet proposals of Arena and HTML3 will be much more effective. The benefit of the Netscape proposals was that they brought the discussion of visual appearance to the fore for consideration.
However, the same strictures apply to formatting in the Web as to handwriting, typewriting, wordprocessing, DTP, and other areas of graphic design: competent hands can do wonders with even limited resources, but the most powerful formatting system in the world can also be used to produce visual nonsense. Many Web information providers are now beginning to concentrate on content as well as appearance, because they know that it will be just as important to be able to retrieve information according to its sense or context (which can only be done if the text is marked up logically) as it is to be able to retrieve it in an attractive visual form.
Accents and symbols which may or may not be on your own keyboard need a special representation so that they will work on all kinds of computer. They have to be typed in a different form to do this, because hardware and software manufacturers have their own (sometimes idiosyncratic) ideas about how to handle accents. The Web is used internationally and HTML 2.0 provides direct support for accented letters in the Latin alphabet.
The standard way to represent letters with accents is to use character entities like those used for the less-than and greater-than symbols, non-breaking space and soft-hyphen (see Special characters). The entities are mnemonics for the names of the accents or symbols enclosed between the two characters `&' and `;' (ampersand and semicolon) as before. For example, to get an e with an acute accent, you type &eacute; as in the word `R&eacute;sum&eacute;':
Résumé
HTML-compliant editors let you pick these from a menu, but to speed typing in other editors, you can usually use macros to redefine the accenting keys or menus so that they insert the right characters. Using these character entities is the only way to make sure your accents and symbols work in other browsers running on different computers to the ones you use. Don't be tempted to use your computer's own idea of accented letters or other special characters, because they may display completely differently on other users' computers.
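Where text already contains accented letters typed directly, a short script can substitute the entity names instead of a macro. This is only a sketch, assuming Python and its standard html.entities.codepoint2name table; that table covers the HTML 4 entity names, which is a larger set than the Latin-1 names supported by HTML 2.0, and the function name to_named_entities is made up for this example.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace each non-ASCII character that has an entity name
    with the form &name; and leave everything else unchanged."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code > 127 and code in codepoint2name:
            pieces.append("&" + codepoint2name[code] + ";")
        else:
            pieces.append(ch)
    return "".join(pieces)


print(to_named_entities("R\u00e9sum\u00e9"))  # the word Résumé
```

Plain ASCII characters, including the markup characters themselves, pass through untouched, so this step is separate from the escaping of less-than, greater-than and ampersand.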
HTML 2.0 supports the International Standards Organisation's Latin-1 characters (see the full list in Table 8.1). There is much discussion over how or whether a future version of HTML should support extended character sets like Unicode or ISO 10646, which are capable of handling the increased number of characters in more complex writing systems such as Kanji (there's a mailing list to discuss this: see Appendix C). A separate but related issue is how such extended character sets should be encoded.
Table 8.1: International Standards Organisation Latin-1 character entities
Copyright of the International Organization for Standardization 1986. Permission to copy in any form is granted for use with conforming SGML systems and applications as defined in ISO 8879, provided this notice is included in all copies.
Name      Code  Char  Description                  Name      Code  Char  Description
&Agrave;  #192  À     capital A, grave accent      &agrave;  #224  à     small a, grave accent
&Aacute;  #193  Á     capital A, acute accent      &aacute;  #225  á     small a, acute accent
&Acirc;   #194  Â     capital A, circumflex        &acirc;   #226  â     small a, circumflex
&Atilde;  #195  Ã     capital A, tilde             &atilde;  #227  ã     small a, tilde
&Auml;    #196  Ä     capital A, dieresis/umlaut   &auml;    #228  ä     small a, dieresis/umlaut
&Aring;   #197  Å     capital A, ring              &aring;   #229  å     small a, ring
&AElig;   #198  Æ     capital AE ligature          &aelig;   #230  æ     small ae ligature
&Ccedil;  #199  Ç     capital C, cedilla           &ccedil;  #231  ç     small c, cedilla
&Egrave;  #200  È     capital E, grave accent      &egrave;  #232  è     small e, grave accent
&Eacute;  #201  É     capital E, acute accent      &eacute;  #233  é     small e, acute accent
&Ecirc;   #202  Ê     capital E, circumflex        &ecirc;   #234  ê     small e, circumflex
&Euml;    #203  Ë     capital E, dieresis/umlaut   &euml;    #235  ë     small e, dieresis/umlaut
&Igrave;  #204  Ì     capital I, grave accent      &igrave;  #236  ì     small i, grave accent
&Iacute;  #205  Í     capital I, acute accent      &iacute;  #237  í     small i, acute accent
&Icirc;   #206  Î     capital I, circumflex        &icirc;   #238  î     small i, circumflex
&Iuml;    #207  Ï     capital I, dieresis/umlaut   &iuml;    #239  ï     small i, dieresis/umlaut
&ETH;     #208  Ð     capital Eth, Icelandic       &eth;     #240  ð     small eth, Icelandic
&Ntilde;  #209  Ñ     capital N, tilde             &ntilde;  #241  ñ     small n, tilde
&Ograve;  #210  Ò     capital O, grave accent      &ograve;  #242  ò     small o, grave accent
&Oacute;  #211  Ó     capital O, acute accent      &oacute;  #243  ó     small o, acute accent
&Ocirc;   #212  Ô     capital O, circumflex        &ocirc;   #244  ô     small o, circumflex
&Otilde;  #213  Õ     capital O, tilde             &otilde;  #245  õ     small o, tilde
&Ouml;    #214  Ö     capital O, dieresis/umlaut   &ouml;    #246  ö     small o, dieresis/umlaut
&Oslash;  #216  Ø     capital O, slash             &oslash;  #248  ø     small o, slash
&Ugrave;  #217  Ù     capital U, grave accent      &ugrave;  #249  ù     small u, grave accent
&Uacute;  #218  Ú     capital U, acute accent      &uacute;  #250  ú     small u, acute accent
&Ucirc;   #219  Û     capital U, circumflex        &ucirc;   #251  û     small u, circumflex
&Uuml;    #220  Ü     capital U, dieresis/umlaut   &uuml;    #252  ü     small u, dieresis/umlaut
&Yacute;  #221  Ý     capital Y, acute accent      &yacute;  #253  ý     small y, acute accent
&THORN;   #222  Þ     capital THORN, Icelandic     &thorn;   #254  þ     small thorn, Icelandic
&szlig;   #223  ß     small sharp s, German sz     &yuml;    #255  ÿ     small y, dieresis/umlaut
As you have seen, there are some additional character entities for other symbols because using the regular characters for them in their raw state would conflict with HTML's own markup characters.
From Table 8.2 you can see that as well as the name of the character entity, each character has a numeric code. This is expressed in the same way as a character entity (starting with an ampersand and ending with a semicolon), but with a hash mark or number sign before the digits. These codes are intended for use instead of the name for those character entities which HTML does not yet support directly, such as &pound;, because this allows individual browsers which do support them to display the right character.
There are many others defined in other ISO character sets, and a subset of the graphical characters is included in HTML 2.0 for use by reference to the character code number rather than the entity name (Table 8.2). This means that for a copyright sign © you need to type &#169; and for the pounds sterling sign £ you need to type &#163;.
The complete set of all the character entities is in the entity files
defining them, which can be downloaded from several sites on the
Internet such as ftp://sgml1.ex.ac.uk/pub/SGML/.
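The numeric form can be worked out mechanically, because the number is simply the character's code. The sketch below, in Python, is an illustration under that assumption for Latin-1 characters; the function name to_numeric_references is hypothetical, and html.unescape from the standard library is used only to show the reverse direction.

```python
import html


def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a numeric character
    reference of the form &#NNN; using its decimal code."""
    return "".join(
        ch if ord(ch) < 128 else "&#" + str(ord(ch)) + ";"
        for ch in text
    )


# Copyright sign (code 169) and pound sign (code 163).
print(to_numeric_references("\u00a9 1995, \u00a310"))

# Decoding a reference gives back the character itself.
print(html.unescape("&#169;") == "\u00a9")
```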
Table 8.2: Numerical character references
This is derived from the ISO 8859-1 8-bit single-byte coded graphic character set
Name      Code  Char  Description
&nbsp;    #160        non-breaking space
&iexcl;   #161  ¡     inverted exclamation mark
&cent;    #162  ¢     cent sign
&pound;   #163  £     pound sign
&curren;  #164  ¤     general currency sign
&yen;     #165  ¥     yen sign
&brvbar;  #166  ¦     broken (vertical) bar
&sect;    #167  §     section sign
&uml;     #168  ¨     umlaut/dieresis
&copy;    #169  ©     copyright sign
&laquo;   #171  «     angle quotation mark, left
&reg;     #174  ®     registered sign
&macr;    #175  ¯     macron
&deg;     #176  °     degree sign
&plusmn;  #177  ±     plus-or-minus sign
&sup2;    #178  ²     superscript two
&sup3;    #179  ³     superscript three
&acute;   #180  ´     acute accent
&micro;   #181  µ     micro sign
&para;    #182  ¶     pilcrow (paragraph sign)
&middot;  #183  ·     middle dot
&cedil;   #184  ¸     cedilla
&sup1;    #185  ¹     superscript one
&raquo;   #187  »     angle quotation mark, right
&frac14;  #188  ¼     fraction one-quarter
&frac12;  #189  ½     fraction one-half
&frac34;  #190  ¾     fraction three-quarters
&iquest;  #191  ¿     inverted question mark
&times;   #215  ×     multiply sign
&divide;  #247  ÷     division sign
The problems of using the Web for non-Latin-alphabet and multilingual documents are currently being discussed by the IETF Working Group. Many users around the world have adapted the software for local use, and have created pages in a huge number of languages. François Yergeau of Alis Technologies (Montréal) maintains a list of some of these which I reproduce here (Table 8.3).
Table 8.3: Examples of non-Latin-alphabet and multilingual Web pages
Language  URL                                                    Encoding
Russian   http://www.free.net/Docs/cyrillic/alphabet_iso.txt     ISO-8859-5
          http://www.free.net/Docs/cyrillic/alphabet_koi.txt     KOI-8
          http://www.free.net/Docs/cyrillic/alphabet_alt.txt     CP866, Cyrillic MSDOS
          http://www.free.net/Docs/cyrillic/alphabet_win.txt     CP1251?, Cyrillic MS-Windows
          http://www.elvis.msk.su/koi8/home.html                 KOI-8
          http://www.ntt.jp/Mosaic-l10n/russian.html             ISO-8859-5
Polish    http://www.uci.agh.edu.pl/                             ISO-8859-2?
Greek     http://www.ntt.jp/Mosaic-l10n/greek.html               ISO-8859-7
Hebrew    http://www1.huji.ac.il/www_teva/environment.html       ISO-8859-8, dir. unknown
          http://www.ntt.jp/Mosaic-l10n/hebrew-visual.html       ISO-8859-8, Visual dir.
          http://www.ntt.jp/Mosaic-l10n/hebrew.html              ISO-8859-8, Implicit dir.
Chinese   http://www.ntt.jp/Mosaic-l10n/chinese.html             GB 2312; GB encoding
          http://www.ntt.jp/Mosaic-l10n/chinese-hz.html          GB2312; HZ encoding
          http://www.ntt.jp/Mosaic-l10n/chinese-big5.html        Big5
Japanese  http://www.etl.go.jp/Organization/welcome.html         ISO-2022-JP
          http://www.ntt.jp/Mosaic-l10n/japanese.html            JIS X 208
Korean    http://h20.kotel.co.kr/                                KSC 5601
          http://www.ntt.jp/Mosaic-l10n/korean.html              KSC 5601
Persian   http://gpg.com/MERC/news/gpg.isi                       ISIRI 3342
German    http://www.ntt.jp/Mosaic-l10n/german.html              ISO-8859-1
Multi     http://www.ntt.jp/japan/note-on-JP/multi-example.html  ISO-2022
You can put comments in your file which you can see when editing it, but which won't get displayed to others who view the file using a Web browser. A comment looks like a tag but has no name: instead there's an exclamation mark and double-dash after the opening angle bracket and another double-dash before the closing one. The comment text inside the tag can go over more than one line:
<!--Here's a comment which I can see when I edit the file but which is invisible to anyone reading the file via a browser.-->
But be careful: most browsers can also display the raw HTML of any file they retrieve, so comments can be seen by users who use the View source function!
Several browsers implement comments wrongly, recognising <! on its own without the double dash. Although this avoids their authors having to code recognition of declarations like <!doctype html..., it also leads to some of them displaying pages wrongly, starting the display some way down the file instead of at the top.
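To check what a file's comments would reveal to anyone using View source, the comments can be listed with a small script. This is a sketch only, assuming Python's standard html.parser module; the class name CommentCollector and the function find_comments are invented for this illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between the opening <!-- and the closing -->
        self.comments.append(data)


def find_comments(source: str) -> list:
    collector = CommentCollector()
    collector.feed(source)
    collector.close()
    return collector.comments


page = "<p>Hello</p><!-- Edited with Emacs psgml-mode --><p>Bye</p>"
print(find_comments(page))
```

A comment that runs over several lines comes back as a single string with the line breaks preserved.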
Exercise 8.15: Accents and comments
Add a comment line to your file saying what editor you used to create it, e.g.
<!-- Edited with Emacs psgml-mode -->
Type the following sentence as a separate paragraph:
My résumé says I made Münster cheese for the Mañana Cheese Co, but I was very naïve in those days.
Add a copyright statement to your file using the copyright symbol.
Block text is text that is set off from the surrounding or adjacent paragraph by being indented or separated by some space. It is frequently used for quotations of more than a line or so from other documents, and for addresses, warnings, and notes of various kinds.
We've seen indented text used in the items in a list, but block text elements can be used independently.
There are two elements in HTML
2.0 to let you indicate this: <address>
and
<blockquote>
. Both of them go between
paragraphs, not inside them, as they represent matter which is to be
displayed as distinct blocks. An <address>
element can occur within a <blockquote>
element, but not the other way around.
Although its intention is clear from its name, the
<address>
element is not restricted to addresses.
It typically causes a font change to italics, and may or may not be
indented (in fact, Arena displays it right-aligned). As with most
block-oriented elements, formatting is left to the browser, so if you
need to split an address up into separate lines, use the <br>
element. The code
...If you would like to apply for this position, please forward your r&eacute;sum&eacute; to:</p>
<address>Personnel Office<br>
ACME Widget Enterprises<br>
Nowheresville, KY</address>
<p>Applicants will be notified by postal mail of receipt of their application...
would display something like
If you would like to apply for this position, please forward your résumé to:
Personnel Office
ACME Widget Enterprises
Nowheresville, KY
Applicants will be notified by postal mail of receipt of their application.
It is common practice to include an address block at the bottom of major files you write, so that your authorship is evident and readers can see who to contact.
As with the address element, the name of the <blockquote> element suggests its use for quotations, but it can easily be employed for other block material such as warnings or exercises.
Browsers usually indent this element, but there is no typeface change
suggested. Because block quotation is an environment in its own right,
it can contain other block-oriented elements like paragraphs and lists,
so the text inside
<blockquote>
...</blockquote>
must be at least within its own <p>
element.
...as Peter Collinson says:</p>
<blockquote><p>It's good practice to sign each page and place a date when the page was last changed.</p>
</blockquote>
<p>This lets the user identify...
...as Peter Collinson says:
It's good practice to sign each page and place a date when the page was last changed.
This lets the user identify...
A special case of block text is poems, because they need
fixed linebreaks, and are frequently divided into stanzas. One way of
doing this is of course to use regular paragraphs and force the
linebreaks with the <br>
tag:
<p><i>Zerbrochen ist das Steuer und es kracht<br>
Das Schiff an allen Seiten. Berstend rei&szlig;t<br>
Der Boden unter meinen F&uuml;ssen auf!<br>
Ich fasse Dich mit beiden Armen an!<br>
So klammert sich der Schiffer endlich noch<br>
Am Felsen fest, an dem er scheitern sollte!<br>
</i><br>
<b>Johann Wolfgang von Goethe</b>, <i>Torquato Tasso</i>.</p>
If the poem or
extract is very long, you can avoid having to insert the linebreaks
manually by using the <pre>
element, but the
result appears in typewriter type, which is less satisfactory.
However, poetry often consists of lines much narrower than the
width of
the page or screen, so a more balanced visual effect can be achieved by
making use of the fact that the <blockquote>
element is usually indented. The formatting within the block quotation
is the same as within a paragraph, but if you have a heading or other
text preceding the poem, the result is better-spaced:
Storm imagery in drama
The metaphor of the storm-tossed soul on the seas of life is extended by the irony of the sailor saving his life by clinging to the very rocks that wrecked his ship:
Zerbrochen ist das Steuer und es kracht
Das Schiff an allen Seiten. Berstend reißt
Der Boden unter meinen Füssen auf!
Ich fasse Dich mit beiden Armen an!
So klammert sich der Schiffer endlich noch
Am Felsen fest, an dem er scheitern sollte!
Johann Wolfgang von Goethe, Torquato Tasso.
Here the attribution has been moved out of the block quotation into another paragraph.
Exercise 8.16: Blocks of text
Use the <address> element to add your name and address at the top or bottom of your file, using <br> to break the lines.
Complete the following limerick and display it using the <blockquote> element:
An HTML hacker called Tom...