Welcome to Day 2! Today you'll learn about HTML, the language WWW hypertext documents are written in, and specifically about the following things:
<TITLE>, <H1>...<H6>, and <P>
HTML stands for HyperText Markup Language. HTML is based on SGML (the Standard Generalized Markup Language), which is used to describe the general structure of various kinds of documents. It is not a page description language like PostScript, nor is it a language that can be easily generated from your favorite page layout program. The focus of HTML is the content of the document, not its appearance. This section explains a little bit more about that.
Figure 3.1. Document elements.
If you've worked with word processing programs that use style sheets (such as Microsoft Word) or paragraph catalogs (such as FrameMaker), then you've done something similar; each section of text conforms to one of a set of styles that are pre-defined before you start working.
The elements of a document are labeled through the use of HTML tags. It is the tags that describe the document; anything that is not a tag is part of the document itself.
With a few minor exceptions, HTML does not describe the appearance or layout of a document. The designers of HTML did this on purpose. Why? Because if you separate the structure of a document from its appearance, you can quickly and easily change the appearance of that document without a lot of tinkering. You can format the document in different ways for different audiences or for different purposes (printed or online documents, quick reference cards, help systems). Also, you or the readers of your document can reformat any text on the fly to different styles as desired. All that is needed is a formatting tool that can interpret the tags.
Web browsers, in addition to providing the networking functions to retrieve documents over the Net, are also HTML formatters. When you load an HTML document into a browser such as Mosaic or Lynx, that browser reads, or parses, the HTML information and formats the text and images on the screen. If you use different browsers, you may notice that the same document appears differently in each one: the headings may be centered in one browser and shown in a larger font in another.
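To make the split between structure and appearance concrete, here is a toy sketch in Python. It is not part of HTML and is nothing you need in order to write Web pages. It assumes Python 3's standard html.parser module, and the two "style sheets," SCREEN and CARD, are made-up examples. The same tagged text comes out formatted two different ways, which is roughly what happens when two browsers format one document.

```python
from html.parser import HTMLParser

# Two hypothetical "style sheets": each maps an element name to a
# function that formats the text inside that element.
SCREEN = {"h1": str.upper, "p": lambda text: text}
CARD = {"h1": lambda text: "== " + text + " ==", "p": lambda text: "  " + text}


class TinyFormatter(HTMLParser):
    """Formats <H1> and <P> text using whichever style sheet it is given."""

    def __init__(self, styles):
        super().__init__()
        self.styles = styles
        self.current = None  # styled element we are currently inside, if any
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercase tag names ("h1", "p").
        if tag in self.styles:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of white space
        if self.current and text:
            self.lines.append(self.styles[self.current](text))


def format_document(source, styles):
    formatter = TinyFormatter(styles)
    formatter.feed(source)
    formatter.close()
    return "\n".join(formatter.lines)


DOC = "<H1>Engine Tune-Up</H1><P>Check the oil first.</P>"
print(format_document(DOC, SCREEN))
print(format_document(DOC, CARD))
```

The document text never changes; only the style table handed to the formatter does.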
This does put a wrinkle in how you write and design your Web documents, however, and it may often frustrate you. The number one rule of designing documents for the Web, as I've mentioned before and will mention throughout this book, is this:
Do NOT design your documents based on what they look like in one browser. Focus instead on providing clear, well-structured content that is easy to read and understand.
In the current state of HTML, your choices for the elements in your document (the tags) are also very limited. There are very few kinds of elements to choose from: headings, paragraphs, and a few kinds of lists are essentially it. You can include images, but you can't align a column of text next to an image. You can't indent text, or center it, or format it into tables.
You also cannot make up your own elements (tags); if you could, how would browsers know how to interpret them?
So the answer is, this is what you're stuck with.
Most browsers available now support what is called HTML Level One--the first version of the HTML specification (consider it to be something like the 1.0 release of a software program). HTML Level One is the base standard for Web documents; a browser must support most, if not all, HTML Level One tags. This book focuses primarily on HTML Level One.
Two other levels of HTML have been proposed: HTML Level Two is similar to HTML Level One, but has additional features to support interactive forms. (You'll learn more about forms in Chapter 13, "Forms and Image Maps.") By the time you read this, most browsers should be able to handle HTML Level Two; many of the more popular browsers support it now.
HTML Level Three, often called HTML+, is proposed as the next major release of the language. HTML+ includes elements for such things as
<TheTagName> affected text </TheTagName>
The tag name itself (here, TheTagName) is enclosed in brackets (<>).
HTML tags generally have a beginning and an ending tag, surrounding the text that they affect. The beginning tag "turns on" a feature (such as headings, bold, and so on), and the ending tag turns it off. Closing tags generally have the tag name preceded by a slash (/).
Not all HTML tags have a beginning and an end. Some tags are only one-sided, and still other tags are "containers" that hold extra information and text inside the brackets. You'll learn about these tags as the book progresses.
All HTML tags are case-insensitive; that is, you can specify them in uppercase or lowercase, or in any mixture. So <HTML> is the same as <html>, which is the same as <HtMl>. I like to put my tags in all caps (<HTML>) so I can pick them out from the text better. That's how I'll show them in the examples in this book.
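If you're curious what a parser actually sees, here is a minimal Python sketch, assuming Python 3's standard html.parser module (a modern library, not the code inside Mosaic or Lynx). It lists the start tags, end tags, and text in a fragment in the order they appear. The parser reports tag names in lowercase, so <HTML>, <html>, and <HtMl> all come out as the same tag.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start tags, end tags, and text in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))  # tag name arrives lowercased

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tag_events(source):
    logger = TagLogger()
    logger.feed(source)
    logger.close()
    return logger.events


# The beginning tag "turns on" the heading; the ending tag turns it off.
print(tag_events("<H1>Hello</H1>"))

# Case doesn't matter: all three spellings name the same tag.
for spelling in ("<HTML></HTML>", "<html></html>", "<HtMl></hTmL>"):
    print(spelling, tag_events(spelling))
```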
Most Web browsers have a way of viewing the HTML source of the Web page you're currently looking at. You may have a menu item or a button for View Source or View HTML. In Lynx, the \ (backslash) command toggles between source view and formatted view.
Some browsers do not have the capability to directly view the source of a Web document, but do allow you to save the current page as a file to your local disk. Under a dialog box for saving the file, there may be a menu of formats; for example, Text, PostScript, or HTML. You can save the current page as HTML and then open that file in a text editor or word processor to see the HTML source.
Try going to a typical home page, then viewing the source for that page. For example, Figure 3.2 shows what the normal NCSA Mosaic home page (URL http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html) looks like.
Figure 3.2. Mosaic home pages.
The HTML source of that page should look something like Figure 3.3.
Figure 3.3. Some HTML source.
Try viewing the source of your own favorite Web pages. You should start seeing some similarities in the way pages are organized, and get a feel for the kinds of tags that HTML uses. You can learn a lot about HTML by comparing the text on the screen with the source for that text.
To write an HTML document, all you really need is an editor that can write text (ASCII) files. You can use a plain old text editor (for example, TeachText on the Mac or vi on UNIX), or you can use a full-featured word processor, as long as it can save the files as text only, with no control codes or funny characters.
Open up that text editor, and type the following code. You don't have to understand what any of this means at this point; you'll learn about it later in this chapter. This is just a simple example to get you started:
<HTML>
<HEAD>
<TITLE>My Sample HTML Document</TITLE>
</HEAD>
<BODY>
<H1>This is an HTML Document</H1>
</BODY>
</HTML>
After you create your HTML file, save it to disk, and remember to save it as a text-only file if you're using a word processor. One other thing to note when you save your file: many HTML browsers use the file's extension to determine whether it is an HTML file or a plain text file. So when you name the file, give it an extension of .html (.htm on DOS systems); for example, myexample.html or homepage.htm.
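If you'd like to script this step, here is a small Python sketch, assuming Python 3's standard pathlib, tempfile, and mimetypes modules. It saves the sample document as plain ASCII text under the name myexample.html, then asks mimetypes what type it would guess from the extension alone, the same kind of extension-based guess many browsers make. The temporary directory is only there so the demonstration cleans up after itself.

```python
import mimetypes
import tempfile
from pathlib import Path

SAMPLE = """<HTML>
<HEAD>
<TITLE>My Sample HTML Document</TITLE>
</HEAD>
<BODY>
<H1>This is an HTML Document</H1>
</BODY>
</HTML>
"""


def save_sample(directory, name="myexample.html"):
    """Write SAMPLE as a text-only file; any non-ASCII character raises an error."""
    path = Path(directory) / name
    path.write_text(SAMPLE, encoding="ascii")
    return path


with tempfile.TemporaryDirectory() as folder:
    saved = save_sample(folder)
    print(saved.name, "saved,", len(saved.read_text(encoding="ascii")), "characters")

# Only the extension differs, and it alone decides the guessed type.
for name in ("myexample.html", "homepage.htm", "myexample.txt"):
    print(name, "->", mimetypes.guess_type(name)[0])
```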
Now, start up a Web browser such as Mosaic. You don't have to be connected to the network, since you're not going to be opening documents at any other site (although your browser may require you to be on a network; since this varies from browser to browser, give it a try and see what happens). Look in your browser for a menu item or button for Open Local. (In Lynx, simply use the command lynx myfile.html from a command line.) The Open Local command (or its equivalent) tells the browser to read in an HTML file from a local disk, parse it, and display it, as if it were a page already out on the Web. Using your browser and the Open Local command, you can write and test your HTML files on your computer in the privacy of your own home.
Try opening up the little file you just created in your browser. You should see something like the picture shown in Figure 3.4.
Figure 3.4. The sample HTML file.
If you don't see something like what's in the picture, go back into your text editor and make the necessary changes. You don't have to quit your browser; just fix the file and save it again under the same name.
Then, in your browser, choose Reload or its equivalent. (In Lynx, it's Control+R.) The browser will read the new version of your file, and voila, you can edit and preview and edit and preview until you get it right.
HTML browsers ignore extra white space (spaces, tabs, and returns) in your HTML files.
NOTE: There's one exception to this rule: a tag called <PRE>. You'll learn about this tag tomorrow in Chapter 5, "More HTML."
The advantage of having all white space (spaces, tabs, returns) ignored is that you can put your tags wherever you want to. The following examples all produce the same output. (Try it!)
<H1>If music be the food of love, play on.</H1>

<H1>
If music be the food of love, play on.
</H1>

<H1>
        If music be the food of love, play on.      </H1>

<H1>      If
     music be the
          food of love, play on. </H1>
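Here's a rough Python model of that rule, assuming Python 3's standard html.parser module: it keeps only the text between tags and collapses every run of spaces, tabs, and returns into a single space. Real browsers do more than this (and treat <PRE> differently), so take it as a sketch rather than a specification.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Gathers the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def displayed_text(source):
    collector = TextCollector()
    collector.feed(source)
    collector.close()
    # split() with no argument breaks on any run of white space;
    # joining with " " leaves exactly one space between words.
    return " ".join("".join(collector.chunks).split())


VARIANTS = [
    "<H1>If music be the food of love, play on.</H1>",
    "<H1>\nIf music be the food of love, play on.\n</H1>",
    "<H1>\n\tIf music be the food of love, play on.    </H1>",
    "<H1>   If\n   music be the\n      food of love, play on. </H1>",
]

for source in VARIANTS:
    print(repr(displayed_text(source)))
```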
There are programs that can help you write HTML. These programs tend to fall into two categories: editors in which you write HTML directly, and converters, which convert the output of some other word processing program into HTML.
I discuss some of the available HTML-based editors in Chapter 14, "HTML Assistants: Editors and Converters." For now, if you have an HTML editor, feel free to use it for the examples in this book. If all you have is a text editor, no problem; it just means you'll have to do a little more typing.
What about WYSIWYG editors? The problem is that there's really no such thing as WYSIWYG when you're dealing with HTML, since WYG varies wildly based on the browser that someone is using to read your document. So you could spend hours in a so-called WYSIWYG HTML editor (say, one that makes your documents look just like Mosaic), only to discover that when the output of that editor is read on some other browser, it looks truly awful.
The best way to deal with HTML is not to get too hung up on its appearance. Write clear HTML code and make sure your writing is clear and well-organized, and the appearance will take care of itself.
In many cases, converters can be extremely useful, particularly for putting existing documents on the Web as fast as possible.
However, converters are in no way an ideal environment for HTML development. The converter programs that do exist (most of them shareware or public domain, with little support) are fairly limited, not so much by their own features as by the limitations of HTML itself. No amount of fancy converting is going to make HTML do things that it can't yet do. If a particular capability doesn't exist in HTML, there's nothing the converter can do to solve that.
The other problem with converters is that even though you can do most of your writing and development in a converter with a simple set of formats and low expectations, you will eventually have to go "under the hood" and edit the HTML text yourself. Most converters do not convert images. No converter that I have seen will automate links to documents out on the Web, although a few handle links to related local documents.
In other words, even if you've already decided that you want to do the bulk of your Web work using a converter, you'll need to know HTML anyhow. So press onward; there's not that much to learn.
Although a "correct" HTML document will always contain the document structure tags (<HTML>, <HEAD>, and <BODY>), most browsers will be able to read a document that doesn't contain them. However, these tags might become required elements in the future, or tools may come along that require them. If you get in the habit of including the document structure tags now, you won't have to worry about updating all your files later on.
The first document structure tag in every HTML document is the <HTML> tag, which indicates that the content of this file is in the HTML language. All the text and HTML commands in your HTML document should go within the beginning and ending HTML tags, like this:
<HTML>
...your document...
</HTML>
The <HEAD> tag specifies that the lines within the beginning and ending points of the tag are the prologue to the rest of the file. There are generally only a few tags that go into the <HEAD> portion of the document (most notably, the document title, described below). You should never put any of the text of your document into the header. Here's a typical example of how you would properly use the <HEAD> tag (you'll learn about <TITLE> later on):
<HTML>
<HEAD>
<TITLE>This is the Title.</TITLE>
</HEAD>
....
</HTML>
The rest of your HTML document, all of its text and other content, goes inside the <BODY> tag. Combined with the <HTML> and <HEAD> tags, it looks like this:
<HTML>
<HEAD>
<TITLE>This is the Title. It will be explained later on</TITLE>
</HEAD>
<BODY>
....
</BODY>
</HTML>
The title of an HTML document is indicated by the <TITLE> HTML tag. <TITLE> tags always go inside the document header (the <HEAD> tags) and describe the contents of the page, like this:
<HTML>
<HEAD>
<TITLE>The Lion, The Witch, and the Wardrobe</TITLE>
</HEAD>
<BODY>
...
</BODY>
</HTML>
You can have only one title in the document, and that title can contain only plain text; that is, there shouldn't be any other tags inside the title.
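The structure rules so far (<HEAD> and <BODY> inside <HTML>, a <TITLE> inside <HEAD>, and only one title) are mechanical enough for a program to check. Here is a minimal sketch in Python, assuming Python 3's standard html.parser module; check_structure is a hypothetical helper written for this illustration, not a real validator, and it ignores everything else about the document.

```python
from html.parser import HTMLParser

# Each structure tag and the tag it must appear inside (None = top level).
REQUIRED_PARENT = {"html": None, "head": "html", "body": "html", "title": "head"}


class StructureChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.enclosing = {}  # tag -> tags that were open when it first appeared
        self.title_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in REQUIRED_PARENT:
            self.enclosing.setdefault(tag, tuple(self.open_tags))
        if tag == "title":
            self.title_count += 1
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close everything back to the matching start tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def check_structure(source):
    checker = StructureChecker()
    checker.feed(source)
    checker.close()
    problems = []
    for tag, parent in REQUIRED_PARENT.items():
        if tag not in checker.enclosing:
            problems.append(f"missing <{tag.upper()}>")
        elif parent and parent not in checker.enclosing[tag]:
            problems.append(f"<{tag.upper()}> is not inside <{parent.upper()}>")
    if checker.title_count > 1:
        problems.append("more than one <TITLE>")
    return problems


GOOD = """<HTML> <HEAD> <TITLE>The Lion, The Witch, and the Wardrobe</TITLE> </HEAD>
<BODY> ... </BODY> </HTML>"""
BAD = "<HTML><BODY><TITLE>Oops</TITLE></BODY></HTML>"

print(check_structure(GOOD))
print(check_structure(BAD))
```

An empty list means the sketch found nothing to complain about; it says nothing about whether the rest of the page is sensible.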
When you pick a title, try to pick one that is both short and descriptive of the content on the page. Your title should also make sense out of context. If someone browsing on the Web followed a random link and ended up on this page, or if they found your title in a friend's browser history list, would they have any idea what this page is about? You may not intend the page to be used independently of the documents you specifically linked to it, but because anyone can link to any page at any time, be prepared for that consequence and pick a helpful title.
Also, because many browsers put the title in the title bar of the window, you may have a limited number of words available. (Although the text within the <TITLE> tag can be of any length, it may be cut off by the browser when it's displayed.) Here are some other examples of good titles:
<TITLE>Poisonous Plants of North America</TITLE>
<TITLE>Image Editing: A Tutorial</TITLE>
<TITLE>Upcoming Cemetery Tours, Summer 1995</TITLE>
<TITLE>Installing The Software: Opening the CD Case</TITLE>
<TITLE>Laura Lemay's Awesome Home Page</TITLE>
And some not-so-good titles:
<TITLE>Part Two</TITLE>
<TITLE>An Example</TITLE>
<TITLE>Nigel Franklin Hobbes</TITLE>
<TITLE>Minutes of the Second Meeting of the Fourth Conference of the Committee for the Preservation of English Roses, Day Four, After Lunch</TITLE>
<TITLE>Poisonous Plants of North America</TITLE>
Output: See Figures 3.5 and 3.6.
Figure 3.5. The output in Mosaic.
Figure 3.6. The output in Lynx.
<H1>Installing Your Safetee Lock</H1>
The numbers indicate heading levels (H1 through H6). The headings themselves, when they're displayed, are not numbered; instead, they're displayed in some way that makes them stand out from regular text: in bigger or bolder text, centered, underlined, or in all caps.
Think of the headings as though they were items in an outline; if the text you're writing about has a structure, use the headings to indicate that structure, as in the following example. (Note that I've indented the headings in this example to show the hierarchy better. They don't have to be indented in your document; in fact, the browser ignores the indenting.)
<H1>Engine Tune-Up</H1>
  <H2>Change The Oil</H2>
  <H2>Adjust the Valves</H2>
  <H2>Change the Spark Plugs</H2>
    <H3>Remove the Old Plugs</H3>
    <H3>Prepare the New Plugs</H3>
      <H4>Remove the Guards</H4>
      <H4>Check the Gap</H4>
      <H4>Apply Anti-Seize Lubricant</H4>
      <H4>Install the Plugs</H4>
  <H2>Adjust the Timing</H2>
Note that unlike titles, headings can be any length you want them to be, including lines and lines of text (although because headings are emphasized, lines and lines of emphasized text may be tiring for your reader).
It's a common practice to use a first-level heading at the top of your document either to duplicate the title (which is usually displayed elsewhere) or to provide a shorter or less contextual form of the title. For example, if you had a page that showed several examples of folding bedsheets, as part of a long document on how to fold bedsheets, the title might look something like this:
<TITLE>How to Fold Sheets: Some Examples</TITLE>
The top-most heading, however, might just say:
<H1>Examples</H1>
Don't use headings to do boldface, or to make certain parts of your document stand out more. Although it may look cool in your browser, you don't know what it'll look like when other people use their own browsers to read your document (and, in fact, it may look really stupid).
Also, it's a good idea to use headings hierarchically; that is, to start your document with a first-level heading and to use the headings in order. Don't skip levels. If you follow a first-level head with a fourth-level head, for example, readers will probably wonder what happened to the second and third level headings in between. Even though you may prefer the look of certain headings in certain places in your browser, they may look entirely different and be confusing to someone else using another browser.
<H1>Engine Tune-Up</H1>
<H2>Change The Oil</H2>
<H2>Change the Spark Plugs</H2>
<H3>Prepare the New Plugs</H3>
<H4>Remove the Guards</H4>
<H4>Check the Gap</H4>
Output: See Figures 3.7 and 3.8.
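The "don't skip levels" advice can also be checked mechanically. Here's a minimal Python sketch, assuming Python 3's standard html.parser module; skipped_levels is a hypothetical helper that reports each place where a heading is more than one level deeper than the heading before it. Going back up (say, from <H4> to <H2>) is fine, and a document that doesn't start with <H1> counts as a skip.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Collects the level (1 to 6) of every heading tag, in order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Names arrive lowercased: "h1" ... "h6". Tags such as "hr" and
        # "head" fail the digit test and are skipped.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(source):
    parser = HeadingLevels()
    parser.feed(source)
    parser.close()
    skips = []
    previous = 0  # an imaginary level 0 before the first heading
    for level in parser.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


OUTLINE = """<H1>Engine Tune-Up</H1> <H2>Change The Oil</H2>
<H2>Change the Spark Plugs</H2> <H3>Prepare the New Plugs</H3>
<H4>Remove the Guards</H4> <H4>Check the Gap</H4>"""

print(skipped_levels(OUTLINE))
print(skipped_levels("<H1>Tune-Up</H1><H4>Check the Gap</H4>"))
```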
Figure 3.7. The output in Mosaic.
Figure 3.8. The output in Lynx.
Unfortunately, paragraphs in HTML are slippery things. The definition of a paragraph has changed across the three versions of HTML. The only thing they agree on is that you indicate a plain text paragraph using the <P> tag.
The first version of HTML specified the <P> tag as a one-sided tag. There was no corresponding </P>, and the <P> tag was used to indicate the end of a paragraph, not the beginning. So paragraphs in the first version of HTML looked like this:
The blue sweater was reluctant to be worn, and wrestled with her as she attempted to put it on. The collar was too small, and would not fit over her head, and the arm holes moved seemingly randomly away from her searching hands.<P>
Exasperated, she took off the sweater and flung it on the floor. Then she vindictively stomped on it in revenge for its recalcitrant behavior.<P>
Most browsers that were written at the time of HTML 1 assume that paragraphs will be formatted this way. When they come across a <P> tag, they start a new line and add some extra vertical space between the line they just ended and the one they just began, as shown in Figure 3.9.
Figure 3.9. How paragraphs are formatted.
In the HTML Level 2 specification and the proposed Level 3 (HTML+) tags, the paragraph tag has been revised. In these versions of HTML, the paragraph tags are two-sided (<P>...</P>), but <P> indicates the beginning of the paragraph. Also, the closing tag (</P>) is optional, presumably to be backwards-compatible with the original version of HTML. So the sweater story would look like this in the newer versions of HTML:
<P>The blue sweater was reluctant to be worn, and wrestled with her as she attempted to put it on. The collar was too small, and would not fit over her head, and the arm holes moved seemingly randomly away from her searching hands.</P>
<P>Exasperated, she took off the sweater and flung it on the floor. Then she vindictively stomped on it in revenge for its recalcitrant behavior.</P>
The good news is that if you want to use the new version of the paragraph tag (as I do in all the examples throughout this book), most, if not all, browsers will accept it without complaint. (I haven't found any that have a problem with it.)
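Why can browsers accept both styles? One way to see it: if you treat every <P> or </P> as a boundary and throw away the empty pieces, shortened versions of the old-style and new-style sweater story break into the same paragraphs. The Python sketch below does just that, assuming Python 3's standard html.parser module. It illustrates the idea only; it is not how any particular browser works, and it doesn't model the extra vertical space a browser may add around <P>.

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Treats every <P> and </P> as a paragraph boundary."""

    def __init__(self):
        super().__init__()
        self.pieces = [""]

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.pieces.append("")  # start a new piece

    def handle_endtag(self, tag):
        if tag == "p":
            self.pieces.append("")

    def handle_data(self, data):
        self.pieces[-1] += data


def paragraphs(source):
    splitter = ParagraphSplitter()
    splitter.feed(source)
    splitter.close()
    cleaned = (" ".join(piece.split()) for piece in splitter.pieces)
    return [piece for piece in cleaned if piece]  # drop empty pieces


# HTML 1 style: <P> ends a paragraph.
HTML1 = "The collar was too small.<P> Exasperated, she took it off.<P>"
# HTML 2 style: <P> ... </P> wraps a paragraph.
HTML2 = "<P>The collar was too small.</P> <P>Exasperated, she took it off.</P>"

print(paragraphs(HTML1))
print(paragraphs(HTML2))
```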
However, note that because many browsers expect <P> to indicate the end of a paragraph, if you use it at the beginning you may end up with extra space between the first paragraph and the element before it, as shown in Figure 3.10.
Figure 3.10. Extra space before paragraphs.
If this bothers you overly much, you can do one of the following:
- Use <P> as a paragraph separator, rather than to indicate the beginning or ending of a paragraph.
- Omit the first <P> in each set of paragraphs.

Don't use <P> tags to pad extra space around other tags to spread out the text on the page. Once again, the cardinal reminder: Design for content, not for appearance. Someone with a text-based browser is not going to care much about the extra space you so carefully put in.
<P>The sweater lay quietly on the floor, seething from its ill treatment. It wasn't its fault that it didn't fit right. It hadn't wanted to be purchased by this ill-mannered woman.</P>
Output: Figures 3.11 and 3.12 show the output.
Figure 3.11. The output in Mosaic.
Figure 3.12. The output in Lynx.
<!-- This is a comment -->
Each line should be individually commented, and it's usually a good idea not to include other HTML tags within comments. (Although this practice isn't strictly illegal, many browsers may get confused when they encounter HTML tags within comments and display them anyway.)
Here are some examples:
<!-- Rewrite this section with less humor -->
<!-- Neil helped with this section -->
<!-- Go Tigers! -->
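To see why comments never reach the screen, here's a short Python sketch assuming Python 3's standard html.parser module. The parser hands comment text to a separate handler (handle_comment), so it never lands in the text that would be displayed. The sample source mixes two of the comments above with a made-up paragraph.

```python
from html.parser import HTMLParser


class CommentSeparator(HTMLParser):
    """Keeps displayable text and comments in separate lists."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.comments = []

    def handle_data(self, data):
        self.text.append(data)

    def handle_comment(self, data):
        # data is whatever sits between <!-- and -->
        self.comments.append(data.strip())


def split_comments(source):
    parser = CommentSeparator()
    parser.feed(source)
    parser.close()
    return "".join(parser.text), parser.comments


SOURCE = """<!-- Neil helped with this section -->
<P>We make cheese.</P>
<!-- Go Tigers! -->"""

text, comments = split_comments(SOURCE)
print(repr(text.strip()))
print(comments)
```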
This last exercise in this chapter shows you how to create an HTML file that uses the tags you've learned about in this chapter, so you can get a feel for what they look like when they're displayed on-screen and for the sorts of typical mistakes you're going to make. (Everyone makes them, and that's why it's often useful to use an HTML editor that does the typing for you. The editor doesn't forget the closing tags, leave off the slash, or misspell the tag itself.)
So, create a simple example in that text editor of yours. It doesn't have to say much of anything; in fact, all it needs to include are the structure tags, a title, a couple of headings, and a paragraph or two. Here's an example:
<HTML>
<HEAD>
<TITLE>Company Profile, Camembert Incorporated</TITLE>
</HEAD>
<BODY>
<H1>Camembert Incorporated</H1>
"Many's the long night I dreamed of cheese -- toasted, mostly."
-- Robert Louis Stevenson
<H2>What We Do</H2>
We make cheese. Lots of cheese; more than eight tons of cheese a year.
Your Brie, your Gouda, your Havarti, we make it all.
<H2>Why We Do It</H2>
<P>We are paid an awful lot of money by people who like cheese.
So we make more.</P>
</BODY>
</HTML>
Save it to an .html file, and open it in your browser and see how it came out.
If you have access to another browser on your platform, or on another platform, I highly recommend that you try opening the same HTML file there so you can see the differences in appearance between browsers. Sometimes the differences can surprise you; lines that looked fine in one browser will look strange in another browser.
For example, the cheese factory example looks like Figure 3.13 in NCSA Mosaic (the Macintosh version) and like Figure 3.14 in Lynx.
Figure 3.13. The cheese factory in Mosaic.
And here is how it looks in the character-based Lynx browser:
Figure 3.14. The cheese factory in Lynx.
See what I mean?
In this chapter, you've learned what HTML is and how to write and preview simple HTML files. You've also learned about the HTML tags shown in Table 3.1.
Table 3.1. HTML tags from Chapter 3.
Tag                       Use
________________________________________________________________
<HTML> ... </HTML>        The entire HTML document
<HEAD> ... </HEAD>        The head, or prologue, of the HTML document
<BODY> ... </BODY>        All the other content in the HTML document
<TITLE> ... </TITLE>      The title of the document
<H1> ... </H1>            First-level heading
<H2> ... </H2>            Second-level heading
<H3> ... </H3>            Third-level heading
<H4> ... </H4>            Fourth-level heading
<H5> ... </H5>            Fifth-level heading
<H6> ... </H6>            Sixth-level heading
<P> ... </P>              Paragraph
<!-- ... -->              Comment
At the time, the goal was simply to put hypertext information up on the Net so that it could be easily downloaded and formatted on the fly in a simple, device-independent way. Given those goals, HTML was an ideal language: simple, small, fast to download, and easy to parse. Since then, new features like images and forms and other media have been added. The limitations of HTML didn't become readily apparent until these new browsers and capabilities came along, and more and more people wanted to publish other kinds of information. And it happened so fast!
HTML Level Three should solve many of these limitations. But there's a long way to go yet before HTML allows full control over formatting and layout, simply because of the speed with which it needs to be downloaded over the Net and formatted. If each Web page you viewed took half an hour to load, would you want to read it?
Can I do any formatting of text in HTML?
You can do some formatting to strings of characters; for example, making a word or two bold. You'll learn about this tomorrow, in Chapter 5.
I've noticed that most Web pages don't use the document structure tags (<HTML>, <HEAD>, <BODY>). Do I really need to include them if pages work just fine without them?
You don't need to, no. Most browsers will handle plain HTML without the document structure tags. But including the tags will allow your documents to be read by more general SGML tools, and to take advantage of features of future browsers. And, it's the "correct" thing to do if you want your documents to conform to true HTML format.
I've seen comments in some HTML files that look like this:
<!-- this is a comment >
Is that legal?
That's the old form of comments that was used in very early forms of HTML. Although many browsers may still accept it, you should use the new form (and comment each line individually) in your documents.
Copyright (c) 1994 Laura Lemay and Sams Publishing. For more information about this Web book, contact lemay@lne.com.
© 1995, Macmillan Publishing USA, a Simon and Schuster Company.