HTML file format

HTML files are really nice: they are plain text, but with that text (well, and a few other things) you get what you are looking at now.  So, let’s take a look at what an HTML file consists of:

  • Content encoding
  • Doctype declaration
  • HTML block
  • HEAD block
  • BODY block

So, let’s dive into that a bit, and go through this step by step.

Content Encoding

HTML files are usually encoded in UTF-8, a Unicode encoding.  For the ASCII range it is identical to ASCII: every ASCII character is coded as the same single byte in UTF-8 (though ASCII, having only 7 bits of significant data, was at times transferred differently).  UTF-8 also allows for “extra characters”, because the characters that fit in that first byte are only part of the set.  If you need to encode anything beyond the 7 bits of ASCII, you will need more bytes per character: two to four in UTF-8.

Declaring ASCII as the encoding is relatively rare these days.  ASCII encoding was used in about 17% of websites in 2012, while UTF-8 was used in about 63% of websites.  Other encodings were used as well.

You probably don’t need to worry about this at all, unless you are using characters outside of the ASCII set and not escaping them (so they can be represented as ASCII).  I have seen some websites where the ‘ displays as 3 characters that don’t look correct.  The problem is that somewhere along the way the UTF-8 bytes got read as a single-byte encoding (Windows-1252 is a common culprit), and the result is that rather than a “left single quote” you get a meaningless series of characters, one for each of the three bytes.
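To make “escaping” a little more concrete, here is a small sketch of two ways to write curly single quotes using nothing but ASCII characters.  The sentence itself is just made-up sample text, and the fragment is meant to sit inside the body of a page:

```html
<!-- Named character references for the left and right single quotes (U+2018, U+2019) -->
<p>&lsquo;Hello&rsquo; written with named references.</p>

<!-- The same two characters written as decimal numeric references -->
<p>&#8216;Hello&#8217; written with numeric references.</p>
```

Both paragraphs should display the same curly quotes, and because the file only contains ASCII bytes, a mix-up between UTF-8 and a single-byte encoding can’t mangle them.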

So, save your file in text format, and be careful if you are transferring the file from one place to another, that the encoding isn’t getting broken.  Then you need to look at what you put into the file itself.

Doctype Declaration

The first part (first line) of an HTML file should be a doctype declaration.  There are a number of options, and in the past the “correct” one was difficult to remember, as it was something like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

which, as you can see, is a lot of specific bits to remember.  The current version of the same thing is:

<!doctype html>

Which is a lot easier to remember.

While most browsers don’t care much at all about this, in order to have things properly formatted it is “supposed” to be there.  W3Schools says that it is “required” in their information about the doctype declaration, and the W3C publishes a list of doctype declarations they consider valid (though I believe that there are other ones which can be used as well).  So, if you open with a proper doctype declaration, that’s a “bonus” in my book, but people who are even more pedantic say it’s “required”.  And by the standard (the W3C sets the standard) it is required.

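You’re unlikely to have anything break outright if you forget it, though without a doctype browsers render the page in “quirks mode”, which can change how some layouts and styles behave.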

HTML Block

This is something which has not changed since I started working with web development.  The doctype is not part of HTML; it is just part of the file.  It does look a lot like an HTML tag, but it’s not.  It comes before any HTML.  The first tag that you are required to use is the html tag, an opening tag which needs to be closed later on.  (Again, if it’s missing, probably no browser you’re likely to encounter will have any problems with the file, as long as it can determine in other ways that it is an HTML file.)

In its most basic form (and I usually don’t modify this myself until I start to look at finalizing stuff about a page, and mostly for accessibility reasons) it is:

<html>

Which is really simple. There are a number of attributes you can add, such as:

<html lang="en">

in order to define your entire document as being in English (you can use the lang attribute on any tag, as it is a “global” attribute). This is helpful for screen readers and other assistive technologies. It also can help (though usually not by much) search engines to better understand your website.

Once you have that defined (it can only occur once), it needs a closing tag. It also has two required blocks inside it, the head and body blocks (again, only one of each). The closing tag is:

</html>

As with all closing tags, it consists of only the / and the tag name. Not all tags require a closing tag, though.

The HTML closing tag should be the last thing in your document, as nothing is supposed to come after it. If there is content following the closing HTML tag, browsers treat it as an error and try to recover; in practice they tend to handle the stray content as if it had been part of the body rather than ignoring it, so don’t count on it staying hidden. I can’t guarantee that all of them behave the same way.

Now let’s look at the first of the two required blocks within the html block.

HEAD Block

Most of what you need to know about the HEAD block has already been mentioned regarding the HTML block.  The two have different attributes (global ones can be used on any HTML tag), but the basics are the same.

The HEAD block defines “header information”.  Most of this isn’t displayed (well, it’s not displayed on the web page itself), but is used to describe the entire document.  The HEAD block starts with a:

<head>

tag. This is the minimum you will need, but as with the html tag, you can include attributes. I’m not going to go into that. If you have included a lang attribute in the html tag, you don’t need to repeat it; it should only be used again where you are “changing” the language (if I use a French word, I might tag it as such, though it is rarely needed unless you are aiming for the highest level of accessibility standards).
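As a rough illustration of “changing” the language part-way through, here is how a French phrase might be tagged inside an otherwise English page.  The sentence and the choice of a span tag are just my example; the fragment belongs in the body of a page whose html tag already says lang="en":

```html
<!-- The surrounding page is English; only the phrase inside the span is marked as French -->
<p>She thanked us with a cheerful <span lang="fr">merci beaucoup</span> and left.</p>
```

A screen reader that pays attention to lang can then switch pronunciation for just those two words.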

Likewise you will need to close your head tag. It’s the same as closing the html tag (though it’s a different tag):

</head>

Just that simple.
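For a sense of what typically sits between those two tags, here is a minimal sketch of a filled-in HEAD block.  The meta and title tags are not covered in this post, so treat them as illustrative additions, and the title text is made up:

```html
<head>
<!-- Declares the encoding; this assumes the file really was saved as UTF-8 -->
<meta charset="utf-8">
<!-- The title shows up in the browser tab, not on the page itself -->
<title>My first page</title>
</head>
```

Neither line puts anything on the page itself, which fits the idea that the HEAD block describes the document rather than displaying it.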

Now on to your main content in the BODY block.

BODY Block

The body block is where everything that you display goes. There are things which can go into the body block (and certain things may only belong in the body block) which aren’t usually displayed, but everything in the “web page” which gets displayed is in the BODY block (though there are cases where some of that display is at least partially defined elsewhere).

So, really, not much new to say here:

<body>

and likewise there are attributes that can be applied to it. This is where all your content will go.

Likewise you will want to put the closing tag:

</body>

after your content to close your BODY block. As this all falls inside the HTML block, the closing html tag is what comes right after that.
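Putting the pieces together, a complete file might look something like the sketch below.  The title tag and the paragraph are placeholders of mine (the post doesn’t cover them); the rest is just the doctype, HTML, HEAD and BODY blocks in the order described above:

```html
<!doctype html>
<html lang="en">
<head>
<!-- Placeholder title; not displayed on the page itself -->
<title>Example page</title>
</head>
<body>
<!-- Everything that gets displayed goes here -->
<p>Hello, world.</p>
</body>
</html>
```

Save that as a plain text file with an .html extension, open it in a browser, and you should see the one line of text.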
