FreeBSD Documentation Project Primer for New Contributors

The FreeBSD Documentation Project

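FreeBSD Chinese Project

  Thank you for taking part in the FreeBSD Documentation Project. Every contribution, however small, is valuable.

  This primer covers everything you need to know before you start contributing to the FreeBSD Documentation Project, from the tools and software you will use (both mandatory and recommended) to the philosophy behind the Documentation Project.

  This document is a work in progress and is not yet complete. Sections that are known to be incomplete are marked with a * next to their name.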

Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code (SGML DocBook) must retain the above copyright notice, this list of conditions and the following disclaimer as the first lines of this file unmodified.

  2. Redistributions in compiled form (transformed to other DTDs, converted to PDF, PostScript, RTF and other formats) must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Important: THIS DOCUMENTATION IS PROVIDED BY THE FREEBSD DOCUMENTATION PROJECT "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FREEBSD DOCUMENTATION PROJECT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


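Table of Contents
Preface
Shell Prompts
Typographic Conventions
Notes, Tips, Important Information, Warnings, and Examples
Acknowledgments
Chapter 1 Overview
1.1 The FreeBSD Documentation Set
1.2 Before you start
1.3 Quick start
Chapter 2 Tools
2.1 Mandatory tools
2.1.1 Software
2.1.2 DTDs and Entities
2.1.3 Stylesheets
2.2 Optional tools
2.2.1 Software
Chapter 3 SGML Primer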
3.1 Overview
3.2 Elements, tags, and attributes
3.2.1 For you to do...
3.3 The DOCTYPE declaration
3.3.1 Formal Public Identifiers (FPIs)
3.3.2 Alternatives to FPIs
3.4 Escaping back to SGML
3.5 Comments
3.5.1 For you to do...
3.6 Entities
3.6.1 General Entities
3.6.2 Parameter entities
3.6.3 For you to do...
3.7 Using entities to include files
3.7.1 Using general entities to include files
3.7.2 Using parameter entities to include files
3.7.3 For you to do...
3.8 Marked sections
3.8.1 Marked section keywords
3.8.2 For you to do...
3.9 Conclusion
Chapter 4 SGML Markup
4.1 HTML
4.1.1 Formal Public Identifier (FPI)
4.1.2 Sectional elements
4.1.3 Block elements
4.1.4 In-line elements
4.1.5 Links
4.2 DocBook
4.2.1 FreeBSD extensions
4.2.2 Formal Public Identifier (FPI)
4.2.3 Document structure
4.2.4 Block elements
4.2.5 In-line elements
4.2.6 Images
4.2.7 Links
Chapter 5 * Stylesheets
5.1 * DSSSL
5.2 CSS
5.2.1 The DocBook documents
Chapter 6 Structuring documents under doc/
6.1 The top level, doc/
6.2 The lang.encoding/ directories
6.3 Document specific information
6.3.1 The Handbook
Chapter 7 The Documentation Build Process
7.1 The FreeBSD Documentation Build Toolset
7.2 Understanding Makefiles in the Documentation tree
7.2.1 Subdirectory Makefiles
7.2.2 Documentation Makefiles
7.3 FreeBSD Documentation Project make includes
7.3.1 doc.project.mk
7.3.2 doc.subdir.mk
Chapter 8 The Website
8.1 Preparation
8.1.1 Simple method: Using csup
8.1.2 Advanced method: Maintaining a local CVS doc/www repository
8.2 Build the web pages from scratch
8.3 Install the web pages into your web server
8.4 Environment variables
Chapter 9 Translations
Chapter 10 Writing style
10.1 Style guide
10.1.1 Letter case
10.1.2 Acronyms
10.1.3 Indentation
10.1.4 Tag style
10.1.5 White space changes
10.1.6 Nonbreaking space
10.2 Word list
Chapter 11 Using sgml-mode with Emacs
Chapter 12 See Also
12.1 The FreeBSD Documentation Project
12.2 SGML
12.3 HTML
12.4 DocBook
12.5 The Linux Documentation Project
Appendix A. Examples
A.1 DocBook <book>
A.2 DocBook <article>
A.3 Producing formatted output
A.3.1 Using Jade
List of Examples
Example 1. A sample example
Example 3-1. Using an element (start and end tags)
Example 3-2. Using an element (start tag only)
Example 3-3. Elements within elements; <em>
Example 3-4. Using an element with an attribute
Example 3-5. Single quotes around attributes
Example 3-6. .profile, for sh(1) and bash(1) users
Example 3-7. .cshrc, for csh(1) and tcsh(1) users
Example 3-8. SGML generic comment
Example 3-9. Erroneous SGML comments
Example 3-10. Defining general entities
Example 3-11. Defining parameter entities
Example 3-12. Using general entities to include files
Example 3-13. Using parameter entities to include files
Example 3-14. Structure of a marked section
Example 3-15. Using a CDATA marked section
Example 3-16. Using INCLUDE and IGNORE in marked sections
Example 3-17. Using a parameter entity to control a marked section
Example 4-1. Normal HTML document structure
Example 4-2. <h1>, <h2>, etc.
Example 4-3. Bad ordering of <hn> elements
Example 4-4. <p>
Example 4-5. <blockquote>
Example 4-6. <ul> and <ol>
Example 4-7. Definition lists with <dl>
Example 4-8. <pre>
Example 4-9. Simple use of <table>
Example 4-10. Using rowspan
Example 4-11. Using colspan
Example 4-12. Using rowspan and colspan together
Example 4-13. <em> and <strong>
Example 4-14. <b> and <i>
Example 4-15. <tt>
Example 4-16. <big>, <small>, and <font>
Example 4-17. Using <a href="...">
Example 4-18. Using <a name="...">
Example 4-19. Linking to a named part of another document
Example 4-20. Linking to a named part of the same document
Example 4-21. Boilerplate <book> with <bookinfo>
Example 4-22. Boilerplate <article> with <articleinfo>
Example 4-23. A simple chapter
Example 4-24. Empty chapters
Example 4-25. Sections in chapters
Example 4-26. <para>
Example 4-27. <blockquote>
Example 4-28. <warning>
Example 4-29. <itemizedlist>, <orderedlist>, and <procedure>
Example 4-30. <programlisting>
Example 4-31. <co> and <calloutlist>
Example 4-32. <informaltable>
Example 4-33. Tables where frame="none"
Example 4-34. <screen>, <prompt>, and <userinput>
Example 4-35. <emphasis>
Example 4-36. Quotations
Example 4-37. Keys, mouse buttons, and combinations
Example 4-38. Applications, commands, and options.
Example 4-39. <filename>
Example 4-40. <filename> tag with package role
Example 4-41. <devicename>
Example 4-42. <hostid> and roles
Example 4-43. <username>
Example 4-44. <maketarget> and <makevar>
Example 4-45. <literal>
Example 4-46. <replaceable>
Example 4-47. <errorname>
Example 4-48. id on chapters and sections
Example 4-49. <anchor>
Example 4-50. Using <xref>
Example 4-51. Using <link>
Example 4-52. <ulink>
Example A-1. DocBook <book>
Example A-2. DocBook <article>
Example A-3. Converting DocBook to HTML (one large file)
Example A-4. Converting DocBook to HTML (several small files)
Example A-5. Converting DocBook to Postscript
Example A-6. Converting DocBook to PDF

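Preface

Shell Prompts

  The following table shows the prompt for a normal user account and for root. The examples throughout this document use the prompt to indicate which user you should run each example as.

User Prompt
Normal user %
root #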

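Typographic Conventions

  The following table describes the typographic conventions used in this book.

Meaning Examples
The names of commands. Use ls -l to list all files.
The names of files. Edit your .login file.
On-screen computer output.
You have mail.
What you type, together with the on-screen output that follows:
% su
Password:
Manual page references. Use su(1) to change user names.
User and group names. Only root can do this.
Emphasis. You must do this.
Command line variables; replace them with the real file name, device, or similar value for your machine. To delete a file, type rm filename
Environment variables. $HOME is your home directory.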

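Notes, Tips, Important Information, Warnings, and Examples

  Notes, tips, important information, warnings, and examples appear within the text as follows.

Note: Notes are represented like this, and contain information that you should take note of, as it may affect what you do.

Tip: Tips are represented like this, and contain information that you might find useful, or that shows an easier way to do something.

Important: Important information is represented like this. Typically it describes additional parameters you need to supply when carrying out a step.

Warning: Warnings are represented like this, and describe possible damage if you do not follow the instructions. This damage may be physical, to your hardware or to you, or it may be non-physical, such as the inadvertent deletion of important files.

Example 1. A sample example

Examples are represented like this, and typically contain examples you should walk through, or show you what the results of a particular action should be.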


Acknowledgments

  Thanks to Sue Blake, Patrick Durusau, Jon Hamilton, Peter Flynn, and Christopher Maden, who took the time to read early drafts of this document and offered many valuable comments and criticisms.


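Chapter 1  Overview

  Welcome to the FreeBSD Documentation Project. Good quality documentation is very important to the success of FreeBSD, and the FreeBSD Documentation Project (FDP) is how much of that documentation is written and kept up to date. Your contributions are very valuable.

  The main purpose of this document is to explain clearly how the FDP is organized, how to write and submit documentation to the FDP, and how to use the available tools effectively while writing.

  Everyone is welcome to join the FDP. There is no minimum monthly contribution required for membership. All you need to do is subscribe to the FreeBSD documentation project mailing list.

  By the time you finish reading this document, you should be comfortable with each of these topics.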


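1.1 The FreeBSD Documentation Set

  The FDP is responsible for four categories of FreeBSD documentation.

Manual pages

The English language system manual pages are not written by the FDP, as they are part of the base system. However, the FDP can (and has) re-worded parts of existing manual pages to make them clearer, or to correct inaccuracies.

The translation teams are responsible for translating the system manual pages into different languages. These translations are kept within the FDP.

FAQ

The FAQ collects questions about FreeBSD that are asked, or are likely to be asked, on the various forums and newsgroups, together with their answers. The question-and-answer format does not allow for long, detailed answers.

Handbook

The Handbook aims to be the comprehensive on-line resource and reference for FreeBSD users.

Web site

This is the main FreeBSD presence on the World Wide Web, visible at http://www.FreeBSD.org/ and at many mirror sites. The web site is many people's first exposure to FreeBSD.

  All four documentation sets are made available through the FreeBSD CVS tree. This means that the logs of changes to these files are visible to anyone, and anyone can use a program such as CVSup or CTM to keep their own local copy of the documentation.

  In addition, many people have written tutorials or maintain other web sites relating to FreeBSD. Some of these are stored in the CVS repository as well (where the author has agreed to this). In other cases the author has decided to keep the documentation separate from the main FreeBSD repository. The FDP endeavors to provide links to as much of this documentation as possible.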


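1.2 Before you start

  This document assumes that you already know:

  • How to maintain an up-to-date local copy of the FreeBSD documentation, either by maintaining a local copy of the FreeBSD CVS repository (using CVS and either CVSup or CTM) or by using CVSup to download just a checked-out copy.

  • How to download and install new software using either the FreeBSD Ports system or pkg_add(1).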


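1.3 Quick start

  If you just want to get going, and feel confident you can pick things up as you go along, follow these instructions.

  1. Install the textproc/docproj meta-port.

    # cd /usr/ports/textproc/docproj
    # make JADETEX=no install

  2. Get a local copy of the FreeBSD doc tree. Either use CVSup in checkout mode to do this, or get a full copy of the CVS repository locally.

    If you keep a copy of the CVS repository locally then, as a minimum, you will need to check out the doc/share and doc/en_US.ISO8859-1/share directories.

    % cvs checkout doc/share
    % cvs checkout doc/en_US.ISO8859-1/share

    If you have plenty of disk space then you could check out everything:

    % cvs checkout doc

  3. If you are preparing a change to an existing book or article, check it out of the repository as necessary. If you are planning to contribute a new book or article then use an existing one as a guide.

    For example, if you want to contribute a new article about setting up a VPN between FreeBSD and Windows 2000 you might do the following.

    1. Check out the articles directory.

      % cvs checkout doc/en_US.ISO8859-1/articles

    2. Copy an existing article to use as a template. In this case, you have decided that your new article belongs in a directory called vpn-w2k.

      % cd doc/en_US.ISO8859-1/articles
      % cp -R committers-guide vpn-w2k

    If you wanted to edit an existing document, such as the FAQ, which is in doc/en_US.ISO8859-1/books/faq, you would check it out of the repository like this:

    % cvs checkout doc/en_US.ISO8859-1/books/faq

  4. Edit the .sgml files using your editor of choice.

  5. Test your markup using the lint target. This quickly finds errors in the document structure and in its hyperlinks without performing the time-consuming build of the whole document.

    % make lint

    When you are ready to actually build the document, you may specify a single format or a list of formats in the FORMATS variable. Currently, html, html-split, txt, ps, pdf, and rtf are supported. The most up to date list of supported formats is at the top of the doc/share/mk/doc.docbook.mk file. Make sure to put quotes around the list of formats when you build more than one format with a single command.

    For example, to convert the document to html only, you would use:

    % make FORMATS=html

    If you want both html and txt, you could run make(1) twice:

    % make FORMATS=html
    % make FORMATS=txt

    Alternatively, you can do it in one command:

    % make FORMATS="html txt"

  6. Submit your changes using send-pr(1).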
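
  The quoting rule from step 5 can be sketched in Python. This is only an illustration: make_command is a hypothetical helper, not part of the FDP tools, and the list of formats is copied from the text above rather than read from doc.docbook.mk. It shows that several formats must travel to make(1) as one FORMATS argument.

```python
import shlex

# Formats named in the text above; doc/share/mk/doc.docbook.mk holds
# the authoritative list.
KNOWN_FORMATS = ("html", "html-split", "txt", "ps", "pdf", "rtf")


def make_command(formats):
    """Return the argument list for building the given output formats.

    make_command is a hypothetical helper used only for illustration.
    """
    unknown = [f for f in formats if f not in KNOWN_FORMATS]
    if unknown:
        raise ValueError("unsupported format(s): " + ", ".join(unknown))
    # All formats go into ONE argument, which is what the quotes in
    # make FORMATS="html txt" achieve on the command line.
    return ["make", "FORMATS=" + " ".join(formats)]


if __name__ == "__main__":
    argv = make_command(["html", "txt"])
    # shlex.join quotes the argument that contains a space.
    print(shlex.join(argv))
```

  Passing the list to subprocess.run without a shell would avoid the quoting question entirely, since each list item is already a separate argument.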


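Chapter 2  Tools

  The FDP uses a number of different software tools to help manage the FreeBSD documentation, convert it to different output formats, and so on. You will need to use these tools yourself if you are to work with the FreeBSD documentation.

  All these tools are available as FreeBSD Ports and Packages, which greatly simplifies the work you have to do to install them.

  You will need to install these tools before you work through any of the examples in later chapters. The actual usage of these tools is covered in those later chapters.

Use textproc/docproj if possible: You can save yourself a lot of time if you install the textproc/docproj port. This is a meta-port which does not contain any software itself. Instead, it depends on various other ports being installed correctly. Installing this port should automatically download and install all of the packages listed in this chapter that you need.

One of the packages that you might need is the JadeTeX macro set. In turn, this macro set requires TeX to be installed. TeX is a large package, and you only need it if you want to produce PostScript or PDF output.

To save yourself time and disk space you should specify whether or not you want JadeTeX (and therefore TeX) installed when you install this port. Run either

# make JADETEX=yes install

or

# make JADETEX=no install

as necessary. Alternatively, you may install textproc/docproj-jadetex or textproc/docproj-nojadetex. These slave ports define the JADETEX variable for you, and therefore install the same suite of applications on your machine. Note that you can produce only HTML or ASCII text output if you do not install JadeTeX. PostScript or PDF output requires TeX.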


2.1 Mandatory tools

2.1.1 Software

  These programs are required before you can usefully work with the FreeBSD documentation, and they will allow you to convert the documentation to HTML, plain text, and RTF formats. They are all included in textproc/docproj.

Jade (textproc/jade)

A DSSSL implementation. Used for converting marked up documents to other formats, including HTML and TeX.

Tidy (www/tidy)

An HTML “pretty printer”, used to reformat some of the automatically generated HTML so that it is easier to follow.

Links (www/links)

A text-mode WWW browser that can also convert HTML files to plain text.

peps (graphics/peps)

Some of the documentation includes images, some of which are stored as EPS files. These must be converted to PNG before most web browsers will display them.


2.1.2 DTDs and Entities

  These are the DTDs and entity sets used by the FDP. They need to be installed before you can work with any of the documentation.

HTML DTD (textproc/html)

HTML is the markup language of choice for the World Wide Web, and is used throughout the FreeBSD web site.

DocBook DTD (textproc/docbook)

DocBook is designed for marking up technical documentation. All the FreeBSD documentation is written in DocBook.

ISO 8879 entities (textproc/iso8879)

19 of the ISO 8879:1986 character entity sets used by many DTDs. Includes named mathematical symbols, additional characters in the Latin character set (accents, diacriticals, and so on), and Greek symbols.


2.1.3 Stylesheets

  The stylesheets are used when converting and formatting the documentation for display on screen, printing, and so on.

Modular DocBook Stylesheets (textproc/dsssl-docbook-modular)

The Modular DocBook Stylesheets are used when converting documentation marked up in DocBook to other formats, such as HTML or RTF.
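
  After installing the mandatory tools, a quick check that the programs are on your PATH can save a confusing build failure later. The following is a minimal sketch in Python. The program names in the table are assumptions (this chapter names the ports, not the executables they install), so adjust them if a lookup fails unexpectedly.

```python
import shutil

# Port name -> program name. The program names are assumptions made
# for this sketch; the chapter above only names the ports.
REQUIRED_PROGRAMS = {
    "textproc/jade": "jade",
    "www/tidy": "tidy",
    "www/links": "links",
}


def missing_programs(programs=REQUIRED_PROGRAMS):
    """Return the ports whose program is not found on PATH."""
    return sorted(port for port, prog in programs.items()
                  if shutil.which(prog) is None)


if __name__ == "__main__":
    for port in missing_programs():
        print("not found on PATH:", port)
```

  The DTDs, entity sets, and stylesheets are data files rather than programs, so this check does not cover them.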


2.2 Optional tools

  You do not need to have any of the following installed. However, you may find it easier to work with the documentation if you do, and they may give you more flexibility in the output formats that can be generated.


2.2.1 Software

JadeTeX and teTeX (print/jadetex and print/teTeX)

Jade and teTeX are used to convert DocBook documents to DVI, Postscript, and PDF formats. The JadeTeX macros are needed in order to do this.

If you do not intend to convert your documentation to one of these formats (i.e., HTML, plain text, and RTF are sufficient) then you do not need to install JadeTeX and teTeX. This can be a significant space and time saver, as teTeX is over 30MB in size.

Important: If you decide to install JadeTeX and teTeX then you will need to configure teTeX after JadeTeX has been installed. print/jadetex/pkg-message contains detailed instructions explaining what you need to do.

Emacs or XEmacs (editors/emacs or editors/xemacs)

Both these editors include a special mode for editing documents marked up according to an SGML DTD. This mode includes commands to reduce the amount of typing you need, and help reduce the possibility of errors.

You do not need to use them; any text editor can be used to edit marked up documents. You may find they make you more efficient.

  If anyone has recommendations for other software that is useful when manipulating SGML documents, please let the Documentation Engineering Team know, so they can be added to this list.


Chapter 3  SGML Primer

  The majority of FDP documentation is written in applications of SGML. This chapter explains exactly what that means, how to read and understand the source to the documentation, and the sort of SGML tricks you will see used in the documentation.

  Portions of this section were inspired by Mark Galassi's Get Going With DocBook.


3.1 Overview

  Way back when, electronic text was simple to deal with. Admittedly, you had to know which character set your document was written in (ASCII, EBCDIC, or one of a number of others) but that was about it. Text was text, and what you saw really was what you got. No frills, no formatting, no intelligence.

  Inevitably, this was not enough. Once you have text in a machine-usable format, you expect machines to be able to use it and manipulate it intelligently. You would like to indicate that certain phrases should be emphasized, or added to a glossary, or be hyperlinks. You might want filenames to be shown in a “typewriter” style font for viewing on screen, but as “italics” when printed, or any of a myriad of other options for presentation.

  It was once hoped that Artificial Intelligence (AI) would make this easy. Your computer would read in the document and automatically identify key phrases, filenames, text that the reader should type in, examples, and more. Unfortunately, real life has not happened quite like that, and our computers require some assistance before they can meaningfully process our text.

  More precisely, they need help identifying what is what. You or I can look at

To remove /tmp/foo use rm(1).

% rm /tmp/foo

and easily see which parts are filenames, which are commands to be typed in, which parts are references to manual pages, and so on. But the computer processing the document cannot. For this we need markup.

  “Markup” is commonly used to describe “adding value” or “increasing cost”. The term takes on both these meanings when applied to text. Markup is additional text included in the document, distinguished from the document's content in some way, so that programs that process the document can read the markup and use it when making decisions about the document. Editors can hide the markup from the user, so the user is not distracted by it.

  The extra information stored in the markup adds value to the document. Adding the markup to the document must typically be done by a person; after all, if computers could recognize the text sufficiently well to add the markup then there would be no need to add it in the first place. This increases the cost (i.e., the effort required) to create the document.

  The previous example is actually represented in this document like this:

<para>To remove <filename>/tmp/foo</filename> use &man.rm.1;.</para>

<screen>&prompt.user; <userinput>rm /tmp/foo</userinput></screen>

  As you can see, the markup is clearly separate from the content.
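
  Because the markup is distinct from the content, a program can pull the two apart. The sketch below does this for the example line in Python. The regular expression is a deliberate simplification that only recognizes tags without attributes, and it leaves entity references such as &man.rm.1; untouched; it is not how an SGML parser works.

```python
import re

SOURCE = ("<para>To remove <filename>/tmp/foo</filename> "
          "use &man.rm.1;.</para>")

# A start or end tag: '<', an optional '/', an element name, then '>'.
# Simplified on purpose: tags that carry attributes are not matched.
TAG = re.compile(r"</?([A-Za-z][\w.-]*)>")


def split_markup(text):
    """Return (element names seen in tags, text with the tags removed)."""
    names = TAG.findall(text)
    content = TAG.sub("", text)
    return names, content


if __name__ == "__main__":
    names, content = split_markup(SOURCE)
    print(names)
    print(content)
```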

  Obviously, if you are going to use markup you need to define what your markup means, and how it should be interpreted. You will need a markup language that you can follow when marking up your documents.

  Of course, one markup language might not be enough. A markup language for technical documentation has very different requirements than a markup language that was to be used for cookery recipes. This, in turn, would be very different from a markup language used to describe poetry. What you really need is a first language that you use to write these other markup languages. A meta markup language.

  This is exactly what the Standard Generalized Markup Language (SGML) is. Many markup languages have been written in SGML, including the two most used by the FDP, HTML and DocBook.

  Each language definition is more properly called a Document Type Definition (DTD). The DTD specifies the name of the elements that can be used, what order they appear in (and whether some markup can be used inside other markup) and related information. A DTD is sometimes referred to as an application of SGML.

  A DTD is a complete specification of all the elements that are allowed to appear, the order in which they should appear, which elements are mandatory, which are optional, and so forth. This makes it possible to write an SGML parser which reads in both the DTD and a document which claims to conform to the DTD. The parser can then confirm whether or not all the elements required by the DTD are in the document in the right order, and whether there are any errors in the markup. This is normally referred to as “validating the document”.

Note: This processing simply confirms that the choice of elements, their ordering, and so on, conforms to that listed in the DTD. It does not check that you have used appropriate markup for the content. If you tried to mark up all the filenames in your document as function names, the parser would not flag this as an error (assuming, of course, that your DTD defines elements for filenames and functions, and that they are allowed to appear in the same place).
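
  The kind of check a validating parser performs can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the RULES table is an invented stand-in for a DTD, and the input is parsed with Python's XML parser rather than an SGML parser. The point is only that the check is structural. A filename marked up as a function passes, while a missing required element does not.

```python
import xml.etree.ElementTree as ET

# A toy stand-in for a DTD: for each element, the children it may
# contain and the children it must contain. Invented for illustration
# and far simpler than a real DTD (no ordering, no attributes).
RULES = {
    "book": {"allowed": {"title", "chapter"}, "required": {"title"}},
    "chapter": {"allowed": {"title", "para"}, "required": {"title"}},
    "title": {"allowed": set(), "required": set()},
    "para": {"allowed": {"filename", "function"}, "required": set()},
    "filename": {"allowed": set(), "required": set()},
    "function": {"allowed": set(), "required": set()},
}


def validate(element, errors=None):
    """Collect structural errors at and below element, depth first."""
    if errors is None:
        errors = []
    rule = RULES.get(element.tag)
    if rule is None:
        errors.append("unknown element: " + element.tag)
        return errors
    children = [child.tag for child in element]
    for tag in children:
        if tag not in rule["allowed"]:
            errors.append(tag + " not allowed in " + element.tag)
    for tag in sorted(rule["required"] - set(children)):
        errors.append(element.tag + " is missing " + tag)
    for child in element:
        if child.tag in RULES:
            validate(child, errors)
    return errors


if __name__ == "__main__":
    # A filename wrongly marked up as a function: structurally valid.
    good = ET.fromstring(
        "<book><title>T</title><chapter><title>C</title>"
        "<para>Edit <function>/etc/rc.conf</function>.</para>"
        "</chapter></book>")
    bad = ET.fromstring(
        "<book><chapter><para>No titles.</para></chapter></book>")
    print(validate(good))
    print(validate(bad))
```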

  It is likely that most of your contributions to the Documentation Project will consist of content marked up in either HTML or DocBook, rather than alterations to the DTDs. For this reason this book will not touch on how to write a DTD.


3.2 Elements, tags, and attributes

  All the DTDs written in SGML share certain characteristics. This is hardly surprising, as the philosophy behind SGML will inevitably show through. One of the most obvious manifestations of this philosophy is that of content and elements.

  Your documentation (whether it is a single web page, or a lengthy book) is considered to consist of content. This content is then divided (and further subdivided) into elements. The purpose of adding markup is to name and identify the boundaries of these elements for further processing.

  For example, consider a typical book. At the very top level, the book is itself an element. This “book” element obviously contains chapters, which can be considered to be elements in their own right. Each chapter will contain more elements, such as paragraphs, quotations, and footnotes. Each paragraph might contain further elements, identifying content that was direct speech, or the name of a character in the story.

  You might like to think of this as “chunking” content. At the very top level you have one chunk, the book. Look a little deeper, and you have more chunks, the individual chapters. These are chunked further into paragraphs, footnotes, character names, and so on.

  Notice how you can make this differentiation between different elements of the content without resorting to any SGML terms. It really is surprisingly straightforward. You could do this with a highlighter pen and a printout of the book, using different colors to indicate different chunks of content.

  Of course, we do not have an electronic highlighter pen, so we need some other way of indicating which element each piece of content belongs to. In languages written in SGML (HTML, DocBook, et al) this is done by means of tags.

  A tag is used to identify where a particular element starts, and where the element ends. The tag is not part of the element itself. Because each DTD was normally written to mark up specific types of information, each one will recognize different elements, and will therefore have different names for the tags.

  For an element called element-name the start tag will normally look like <element-name>. The corresponding closing tag for this element is </element-name>.

Example 3-1. Using an element (start and end tags)

HTML has an element for indicating that the content enclosed by the element is a paragraph, called p. This element has both start and end tags.

<p>This is a paragraph.  It starts with the start tag for
  the 'p' element, and it will end with the end tag for the 'p'
  element.</p>

<p>This is another paragraph.  But this one is much shorter.</p>

  Not all elements require an end tag. Some elements have no content. For example, in HTML you can indicate that you want a horizontal line to appear in the document. Obviously, this line has no content, so just the start tag is required for this element.

Example 3-2. Using an element (start tag only)

HTML has an element for indicating a horizontal rule, called hr. This element does not wrap content, so only has a start tag.

<p>This is a paragraph.</p>

<hr>

<p>This is another paragraph.  A horizontal rule separates this
  from the previous paragraph.</p>

  If it is not obvious by now, elements can contain other elements. In the book example earlier, the book element contained all the chapter elements, which in turn contained all the paragraph elements, and so on.

Example 3-3. Elements within elements; <em>

<p>This is a simple <em>paragraph</em> where some
  of the <em>words</em> have been <em>emphasized</em>.</p>

  The DTD will specify the rules detailing which elements can contain other elements, and exactly what they can contain.

Important: People often confuse the terms tags and elements, and use the terms as if they were interchangeable. They are not.

An element is a conceptual part of your document. An element has a defined start and end. The tags mark where the element starts and ends.

When this document (or anyone else knowledgeable about SGML) refers to “the <p> tag” they mean the literal text consisting of the three characters <, p, and >. But the phrase “the <p> element” refers to the whole element.

This distinction is very subtle. But keep it in mind.

  Elements can have attributes. An attribute has a name and a value, and is used for adding extra information to the element. This might be information that indicates how the content should be rendered, or might be something that uniquely identifies that occurrence of the element, or it might be something else.

  An element's attributes are written inside the start tag for that element, and take the form attribute-name="attribute-value".

  In sufficiently recent versions of HTML, the <p> element has an attribute called align, which suggests an alignment (justification) for the paragraph to the program displaying the HTML.

  The align attribute can take one of four defined values, left, center, right and justify. If the attribute is not specified then the default is left.

Example 3-4. Using an element with an attribute

<p align="left">The inclusion of the align attribute
  on this paragraph was superfluous, since the default is left.</p>

<p align="center">This may appear in the center.</p>

  Some attributes will only take specific values, such as left or justify. Others will allow you to enter anything you want. If you need to include quotes (") within an attribute then use single quotes around the attribute value.

Example 3-5. Single quotes around attributes

<p align='right'>I am on the right!</p>

  Sometimes you do not need to use quotes around attribute values at all. However, the rules for doing this are subtle, and it is far simpler just to always quote your attribute values.
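
  To see start tags, end tags, attributes, and content as separate things, it can help to watch a parser report them one at a time. The sketch below uses the HTMLParser class from Python's standard library. That class is a lenient HTML parser, not a validating SGML parser, and the EventLogger and log_events names are invented for this illustration.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record start tags (with attributes), end tags, and content."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.events.append(("content", data))


def log_events(markup):
    """Parse markup and return the recorded events in document order."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in log_events('<p align="center">This may appear in the center.</p>'):
        print(event)
    # An element with only a start tag produces no "end" event.
    print(log_events("<hr>"))
```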

  The information on attributes, elements, and tags is stored in SGML catalogs. The various Documentation Project tools use these catalog files to validate your work. The tools in textproc/docproj include a variety of SGML catalog files. The FreeBSD Documentation Project includes its own set of catalog files. Your tools need to know about both sorts of catalog files.
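
  Both sorts of catalog files end up in a single colon-separated search path. The following Python sketch builds that path and reports catalog files that are missing. The paths are taken from the shell examples in section 3.2.1 and assume an English documentation tree under /usr/doc; the function names are invented for this illustration.

```python
import os

SGML_ROOT = "/usr/local/share/sgml"

# Listed in the order that the shell examples in section 3.2.1 produce.
# Assumes the documentation tree is checked out under /usr/doc.
CATALOGS = [
    "/usr/doc/en_US.ISO8859-1/share/sgml/catalog",
    "/usr/doc/share/sgml/catalog",
    SGML_ROOT + "/iso8879/catalog",
    SGML_ROOT + "/html/catalog",
    SGML_ROOT + "/docbook/4.1/catalog",
    SGML_ROOT + "/jade/catalog",
]


def catalog_path(catalogs=CATALOGS):
    """Join catalog files into a colon-separated search path."""
    return ":".join(catalogs)


def missing_catalogs(catalogs=CATALOGS):
    """Return the catalog files that do not exist on this machine."""
    return [path for path in catalogs if not os.path.isfile(path)]


if __name__ == "__main__":
    print("SGML_CATALOG_FILES=" + catalog_path())
    for path in missing_catalogs():
        print("missing:", path)
```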


3.2.1 For you to do...

  In order to run the examples in this document you will need to install some software on your system and ensure that an environment variable is set correctly.

  1. Download and install textproc/docproj from the FreeBSD ports system. This is a meta-port that should download and install all of the programs and supporting files that are used by the Documentation Project.

  2. Add lines to your shell startup files to set SGML_CATALOG_FILES. (If you are not working on the English version of the documentation, you will want to substitute the correct directory for your language.)

    Example 3-6. .profile, for sh(1) and bash(1) users

    SGML_ROOT=/usr/local/share/sgml        
    SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
    SGML_CATALOG_FILES=${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=/usr/doc/share/sgml/catalog:$SGML_CATALOG_FILES
    SGML_CATALOG_FILES=/usr/doc/en_US.ISO8859-1/share/sgml/catalog:$SGML_CATALOG_FILES
    export SGML_CATALOG_FILES
    

    Example 3-7. .cshrc, for csh(1) and tcsh(1) users

    setenv SGML_ROOT /usr/local/share/sgml
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES /usr/doc/share/sgml/catalog:$SGML_CATALOG_FILES
    setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/sgml/catalog:$SGML_CATALOG_FILES
    

    Then either log out, and log back in again, or run those commands from the command line to set the variable values.

  1. Create example.sgml, and enter the following text:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    
    <html>
      <head>         
        <title>An example HTML file</title>
      </head>
    
      <body>        
        <p>This is a paragraph containing some text.</p>
    
        <p>This paragraph contains some more text.</p>
    
        <p align="right">This paragraph might be right-justified.</p>
      </body>       
    </html>
    
  2. Try to validate this file using an SGML parser.

    Part of textproc/docproj is the nsgmls validating parser. Normally, nsgmls reads in a document marked up according to an SGML DTD and returns a copy of the document's Element Structure Information Set (ESIS, but that is not important right now).

    However, when nsgmls is given the -s parameter, nsgmls will suppress its normal output, and just print error messages. This makes it a useful way to check to see if your document is valid or not.

    Use nsgmls to check that your document is valid:

    % nsgmls -s example.sgml
    

    As you will see, nsgmls returns without displaying any output. This means that your document validated successfully.

  3. See what happens when required elements are omitted. Try removing the <title> and </title> tags, and re-run the validation.

    % nsgmls -s example.sgml
    nsgmls:example.sgml:5:4:E: character data is not allowed here
    nsgmls:example.sgml:6:8:E: end tag for "HEAD" which is not finished
    

    The error output from nsgmls is organized into colon-separated groups, or columns.

    Column Meaning
    1 The name of the program generating the error. This will always be nsgmls.
    2 The name of the file that contains the error.
    3 Line number where the error appears.
    4 Column number where the error appears.
    5 A one letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors[a], and X is for cross-references. As you can see, these messages are errors.
    6 The text of the error message.
    Notes:
    a. It is not always the fifth column either. nsgmls -sv displays nsgmls:I: SP version "1.3" (depending on the installed version). As you can see, this is an informational message.

    Simply omitting the <title> tags has generated 2 different errors.

    The first error indicates that content (in this case, characters, rather than the start tag for an element) has occurred where the SGML parser was expecting something else. In this case, the parser was expecting to see one of the start tags for elements that are valid inside <head> (such as <title>).

    The second error is because <head> elements must contain a <title> element. Because it does not, nsgmls considers that the element has not been properly finished. However, the closing tag indicates that the element has been closed before it has been finished.

  4. Put the title element back in.
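
  The colon-separated layout described in the table above is easy to take apart in a script. The following Python sketch is one way to do it, under two assumptions: filenames contain no colons, and any line that lacks numeric line and column fields (such as the version banner mentioned in the table note) is simply skipped. Message and parse_message are names invented for this illustration.

```python
from typing import NamedTuple, Optional


class Message(NamedTuple):
    program: str
    filename: str
    line: int
    column: int
    code: str
    text: str


def parse_message(raw: str) -> Optional[Message]:
    """Split one nsgmls message into its six colon-separated columns.

    Returns None for lines without file, line, and column fields.
    Assumes the filename itself contains no colon.
    """
    parts = raw.split(":", 5)
    if len(parts) != 6 or not (parts[2].isdigit() and parts[3].isdigit()):
        return None
    program, filename, line, column, code, text = parts
    return Message(program, filename, int(line), int(column),
                   code, text.strip())


if __name__ == "__main__":
    sample = "nsgmls:example.sgml:5:4:E: character data is not allowed here"
    print(parse_message(sample))
```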


3.3 The DOCTYPE declaration

  The beginning of each document that you write must specify the name of the DTD that the document conforms to. This is so that SGML parsers can determine the DTD and ensure that the document does conform to it.

  This information is generally expressed on one line, in the DOCTYPE declaration.

  A typical declaration for a document written to conform with version 4.0 of the HTML DTD looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">

  That line contains a number of different components.

<!

Is the indicator that indicates that this is an SGML declaration. This line is declaring the document type.

DOCTYPE

Shows that this is an SGML declaration for the document type.

html

Names the first element that will appear in the document.

PUBLIC "-//W3C//DTD HTML 4.0//EN"

Lists the Formal Public Identifier (FPI) for the DTD that this document conforms to. Your SGML parser will use this to find the correct DTD when processing this document.

PUBLIC is not a part of the FPI, but indicates to the SGML processor how to find the DTD referenced in the FPI. Other ways of telling the SGML parser how to find the DTD are shown later.

>

Returns to the document.


3.3.1 Formal Public Identifiers (FPIs)

Note: You do not need to know this, but it is useful background, and might help you debug problems when your SGML processor cannot locate the DTD you are using.

  FPIs must follow a specific syntax. This syntax is as follows:

"Owner//Keyword Description//Language"

Owner

This indicates the owner of the FPI.

If this string starts with “ISO” then this is an ISO owned FPI. For example, the FPI "ISO 8879:1986//ENTITIES Greek Symbols//EN" lists ISO 8879:1986 as being the owner for the set of entities for Greek symbols. ISO 8879:1986 is the ISO number for the SGML standard.

Otherwise, this string will either look like -//Owner or +//Owner (notice the only difference is the leading + or -).

If the string starts with -, the owner is unregistered; if it starts with +, the owner is registered.

ISO 9070:1991 defines how registered names are generated; it might be derived from the number of an ISO publication, an ISBN code, or an organization code assigned according to ISO 6523. In addition, a registration authority could be created in order to assign registered names. The ISO council delegated this to the American National Standards Institute (ANSI).

Because the FreeBSD Project has not been registered, the owner string is -//FreeBSD. As you can see, the W3C is not a registered owner either.

Keyword

There are several keywords that indicate the type of information in the file. Some of the most common keywords are DTD, ELEMENTS, ENTITIES, and TEXT. DTD is used only for DTD files, ELEMENTS is usually used for DTD fragments that contain only entity or element declarations, and TEXT is used for SGML content (text and tags).

Description

Any description you want to supply for the contents of this file. This may include version numbers or any short text that is meaningful to you and unique for the SGML system.

Language

This is an ISO two-character code that identifies the native language for the file. EN is used for English.
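
The FPI layout described above can be illustrated with a short Python sketch. It is only an illustration of the syntax, not how an SGML parser works internally; the function name parse_fpi and the dictionary keys are made up for this example.

```python
def parse_fpi(fpi):
    """Split a Formal Public Identifier into its components.

    Hypothetical helper for illustration only; it handles just the
    "Owner//Keyword Description//Language" layout described above.
    """
    parts = fpi.split("//")
    if parts[0] in ("-", "+"):
        # "-//Owner" is unregistered, "+//Owner" is registered.
        registration = "registered" if parts[0] == "+" else "unregistered"
        owner, text, language = parts[1], parts[2], parts[3]
    else:
        # ISO-owned FPIs have no leading "+" or "-".
        registration = "ISO"
        owner, text, language = parts[0], parts[1], parts[2]
    # The keyword is the first word; the rest is the description.
    keyword, _, description = text.partition(" ")
    return {
        "registration": registration,
        "owner": owner,
        "keyword": keyword,
        "description": description,
        "language": language,
    }


html = parse_fpi("-//W3C//DTD HTML 4.0//EN")
print(html["owner"], html["keyword"], html["description"], html["language"])
```

Running the same function on "ISO 8879:1986//ENTITIES Greek Symbols//EN" reports ISO 8879:1986 as the owner and ENTITIES as the keyword.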


3.3.1.1 catalog files

  If you use the syntax above and process this document using an SGML processor, the processor will need to have some way of turning the FPI into the name of the file on your computer that contains the DTD.

  In order to do this it can use a catalog file. A catalog file (typically called catalog) contains lines that map FPIs to filenames. For example, if the catalog file contained the line:

PUBLIC "-//W3C//DTD HTML 4.0//EN"             "4.0/strict.dtd"

  The SGML processor would know to look up the DTD from strict.dtd in the 4.0 subdirectory of whichever directory held the catalog file that contained that line.

  Look at the contents of /usr/local/share/sgml/html/catalog. This is the catalog file for the HTML DTDs that will have been installed as part of the textproc/docproj port.
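
  The lookup that a catalog makes possible can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it understands only PUBLIC lines with both strings quoted, and the function name resolve_fpi is invented for the example. Real catalog files support more entry types than this.

```python
import posixpath
import re

# Matches lines of the form: PUBLIC "fpi" "filename"
PUBLIC_LINE = re.compile(r'^\s*PUBLIC\s+"([^"]*)"\s+"([^"]*)"', re.MULTILINE)


def resolve_fpi(catalog_path, catalog_text, fpi):
    """Return the file that the catalog maps `fpi` to, or None.

    Hypothetical helper: `catalog_text` is the content of the catalog
    file found at `catalog_path`.
    """
    catalog_dir = posixpath.dirname(catalog_path)
    for public_id, filename in PUBLIC_LINE.findall(catalog_text):
        if public_id == fpi:
            # Relative names are taken relative to the catalog's directory.
            return posixpath.join(catalog_dir, filename)
    return None


catalog_text = 'PUBLIC "-//W3C//DTD HTML 4.0//EN"             "4.0/strict.dtd"\n'
print(resolve_fpi("/usr/local/share/sgml/html/catalog", catalog_text,
                  "-//W3C//DTD HTML 4.0//EN"))
```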


3.3.1.2 SGML_CATALOG_FILES

  In order to locate a catalog file, your SGML processor will need to know where to look. Many of them feature command line parameters for specifying the path to one or more catalogs.

  In addition, you can set SGML_CATALOG_FILES to point to the files. This environment variable should consist of a colon-separated list of catalog files (including their full path).

  Typically, you will want to include the following files:

  • /usr/local/share/sgml/docbook/4.1/catalog

  • /usr/local/share/sgml/html/catalog

  • /usr/local/share/sgml/iso8879/catalog

  • /usr/local/share/sgml/jade/catalog

  You should already have done this.
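
  If you start your SGML tools from a script rather than from a configured shell, the variable can be built the same way. The following Python sketch assumes the four catalog paths listed above exist on your system; the helper name catalog_env is made up for the example.

```python
import os

catalogs = [
    "/usr/local/share/sgml/docbook/4.1/catalog",
    "/usr/local/share/sgml/html/catalog",
    "/usr/local/share/sgml/iso8879/catalog",
    "/usr/local/share/sgml/jade/catalog",
]


def catalog_env(paths):
    """Join catalog paths into the colon-separated form described above."""
    return ":".join(paths)


# Set the variable for this process and for any child process it starts.
os.environ["SGML_CATALOG_FILES"] = catalog_env(catalogs)
print(os.environ["SGML_CATALOG_FILES"])
```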


3.3.2 Alternatives to FPIs

  Instead of using an FPI to indicate the DTD that the document conforms to (and therefore, which file on the system contains the DTD) you can explicitly specify the name of the file.

  The syntax for this is slightly different:

<!DOCTYPE html SYSTEM "/path/to/file.dtd">

  The SYSTEM keyword indicates that the SGML processor should locate the DTD in a system specific fashion. This typically (but not always) means the DTD will be provided as a filename.

  Using FPIs is preferred for reasons of portability. You do not want to have to ship a copy of the DTD around with your document, and if you used the SYSTEM identifier then everyone would need to keep their DTDs in the same place.


3.4 Escaping back to SGML

  Earlier in this primer I said that SGML is only used when writing a DTD. This is not strictly true. There is certain SGML syntax that you will want to be able to use within your documents. For example, comments can be included in your document, and will be ignored by the parser. Comments are entered using SGML syntax. Other uses for SGML syntax in your document will be shown later too.

  Obviously, you need some way of indicating to the SGML processor that the following content is not elements within the document, but is SGML that the parser should act upon.

  These sections are marked by <! ... > in your document. Everything between these delimiters is SGML syntax as you might find within a DTD.

  As you may just have realized, the DOCTYPE declaration is an example of SGML syntax that you need to include in your document...


3.5 Comments

  Comments are an SGML construction, and are normally only valid inside a DTD. However, as Section 3.4 shows, it is possible to use SGML syntax within your document.

  The delimiter for SGML comments is the string “--”. The first occurrence of this string opens a comment, and the second closes it.

Example 3-8. SGML generic comment

<!-- test comment -->
<!-- This is inside the comment -->

<!-- This is another comment    -->

<!-- This is one way
     of doing multiline comments -->

<!-- This is another way of   --
  -- doing multiline comments -->

  If you have used HTML before you may have been shown different rules for comments. In particular, you may think that the string <!-- opens a comment, and it is only closed by -->.

  This is not the case. A lot of web browsers have broken HTML parsers, and will accept that as valid. However, the SGML parsers used by the Documentation Project are much stricter, and will reject documents that make that error.

Example 3-9. Erroneous SGML comments

<!-- This is in the comment --

     THIS IS OUTSIDE THE COMMENT!

  -- back inside the comment -->

The SGML parser will treat this as though it were actually:

<!THIS IS OUTSIDE THE COMMENT>

This is not valid SGML, and may give confusing error messages.

<!--------------- This is a very bad idea --------------->

As the example suggests, do not write comments like that.

<!--===================================================-->

That is a (slightly) better approach, but it is still potentially confusing to people new to SGML.
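
The rule that every “--” toggles the comment state can be modelled in a few lines of Python. This is a deliberately simplified sketch, not the algorithm nsgmls uses; the function name text_outside_comments is invented for the example, and it assumes the declaration contains no “>” inside a comment.

```python
def text_outside_comments(declaration):
    """Return whatever lies outside the "--" pairs of a <! ... > declaration.

    Simplified model for illustration: every "--" switches between
    "inside a comment" and "outside a comment".
    """
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("not a <! ... > declaration")
    pieces = declaration[2:-1].split("--")
    if len(pieces) % 2 == 0:
        raise ValueError("a comment was opened but never closed")
    # Even-numbered pieces are outside comments, odd-numbered ones inside.
    return "".join(pieces[0::2]).strip()


good = "<!-- This is another way of   --\n  -- doing multiline comments -->"
bad = ("<!-- This is in the comment --\n\n"
       "     THIS IS OUTSIDE THE COMMENT!\n\n"
       "  -- back inside the comment -->")

print(repr(text_outside_comments(good)))  # nothing is left outside
print(repr(text_outside_comments(bad)))   # stray text the parser will reject
```

Anything other than white space left outside the comments is what the parser complains about.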


3.5.1 For you to do...

  1. Add some comments to example.sgml, and check that the file still validates using nsgmls.

  2. Add some invalid comments to example.sgml, and see the error messages that nsgmls gives when it encounters an invalid comment.


3.6 Entities

  Entities are a mechanism for assigning names to chunks of content. As an SGML parser processes your document, any entities it finds are replaced by the content of the entity.

  This is a good way to have re-usable, easily changeable chunks of content in your SGML documents. It is also the only way to include one marked up file inside another using SGML.

  There are two types of entities, which are used in two different situations: general entities and parameter entities.


3.6.1 General Entities

  You cannot use general entities in an SGML context (although you define them in one). They can only be used in your document. Contrast this with parameter entities.

  Each general entity has a name. When you want to reference a general entity (and therefore include whatever text it represents in your document), you write &entity-name;. For example, suppose you had an entity called current.version which expanded to the current version number of your product. You could write:

<para>The current version of our product is
  &current.version;.</para>

  When the version number changes you can simply change the definition of the value of the general entity and reprocess your document.

  You can also use general entities to enter characters that you could not otherwise include in an SGML document. For example, < and & cannot normally appear in an SGML document. When the SGML parser sees the < symbol it assumes that a tag (either a start tag or an end tag) is about to appear, and when it sees the & symbol it assumes the next text will be the name of an entity.

  Fortunately, you can use the two general entities &lt; and &amp; whenever you need to include one or other of these.

  A general entity can only be defined within an SGML context. Typically, this is done immediately after the DOCTYPE declaration.

Example 3-10. Defining general entities

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY current.version    "3.0-RELEASE">
<!ENTITY last.version       "2.2.7-RELEASE">
]>

Notice how the DOCTYPE declaration has been extended by adding a square bracket at the end of the first line. The two entities are then defined over the next two lines, before the square bracket is closed, and then the DOCTYPE declaration is closed.

The square brackets are necessary to indicate that we are extending the DTD indicated by the DOCTYPE declaration.
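
To make the substitution concrete, here is a minimal Python sketch of what expanding general entity references amounts to. It is an illustration only: the function name expand_general_entities is made up, and a real SGML parser does considerably more than a textual search and replace.

```python
import re


def expand_general_entities(text, entities):
    """Replace each &name; reference with its value from `entities`."""
    def lookup(match):
        name = match.group(1)
        # Unknown references are left untouched instead of guessed at.
        return entities.get(name, match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", lookup, text)


entities = {"current.version": "3.0-RELEASE", "last.version": "2.2.7-RELEASE"}
source = "<para>The current version of our product is &current.version;.</para>"
print(expand_general_entities(source, entities))
```

Changing the value stored under current.version and running the function again corresponds to editing the entity definition and reprocessing the document.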


3.6.2 Parameter entities

  Like general entities, parameter entities are used to assign names to reusable chunks of text. However, whereas general entities can only be used within your document, parameter entities can only be used within an SGML context.

  Parameter entities are defined in a similar way to general entities. However, instead of using &entity-name; to refer to them, use %entity-name;[1]. The definition also includes the % between the ENTITY keyword and the name of the entity.

Example 3-11. Defining parameter entities

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % param.some "some">
<!ENTITY % param.text "text">
<!ENTITY % param.new  "%param.some; more %param.text;">

<!-- %param.new now contains "some more text" -->
]>

  This may not seem particularly useful. It will be.


3.6.3 For you to do...

  1. Add a general entity to example.sgml.

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
    <!ENTITY version "1.1">
    ]>    
    
    <html>
      <head>         
        <title>An example HTML file</title>
      </head>
    
      <!-- You might well have some comments in here as well -->
          
      <body>        
        <p>This is a paragraph containing some text.</p>
    
        <p>This paragraph contains some more text.</p>
    
        <p align="right">This paragraph might be right-justified.</p>
    
        <p>The current version of this document is: &version;</p>     
      </body>       
    </html>
    
  2. Validate the document using nsgmls.

  3. Load example.sgml into your web browser (you may need to copy it to example.html before your browser recognizes it as an HTML document).

    Unless your browser is very advanced, you will not see the entity reference &version; replaced with the version number. Most web browsers have very simplistic parsers which do not handle proper SGML[2].

  4. The solution is to normalize your document using an SGML normalizer. The normalizer reads in valid SGML and outputs equally valid SGML which has been transformed in some way. One of the ways in which the normalizer transforms the SGML is to expand all the entity references in the document, replacing the entities with the text that they represent.

    You can use sgmlnorm to do this.

    % sgmlnorm example.sgml > example.html
    

    You should find a normalized (i.e., entity references expanded) copy of your document in example.html, ready to load into your web browser.

  5. If you look at the output from sgmlnorm you will see that it does not include a DOCTYPE declaration at the start. To include this you need to use the -d option:

    % sgmlnorm -d example.sgml > example.html
    

3.7 Using entities to include files

  Entities (both general and parameter) are particularly useful when used to include one file inside another.


3.7.1 Using general entities to include files

  Suppose you have some content for an SGML book organized into files, one file per chapter, called chapter1.sgml, chapter2.sgml, and so forth, with a book.sgml file that will contain these chapters.

  In order to use the contents of these files as the values for your entities, you declare them with the SYSTEM keyword. This directs the SGML parser to use the contents of the named file as the value of the entity.

Example 3-12. Using general entities to include files

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">
<!-- And so forth -->
]>

<html>
  <!-- Use the entities to load in the chapters -->

  &chapter.1;
  &chapter.2;
  &chapter.3;
</html>

Warning: When using general entities to include other files within a document, the files being included (chapter1.sgml, chapter2.sgml, and so on) must not start with a DOCTYPE declaration. This is a syntax error.
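
The effect of a SYSTEM general entity can be sketched in Python as reading the named file and substituting its content for the reference. This is a rough model under stated assumptions: the function name include_system_entities is invented, file names are taken relative to a directory you pass in, and only references after the closing “]>” of the DOCTYPE declaration are expanded.

```python
import os
import re
import tempfile


def include_system_entities(document, base_dir):
    """Expand &name; references declared as <!ENTITY name SYSTEM "file">."""
    declared = dict(re.findall(
        r'<!ENTITY\s+([\w.]+)\s+SYSTEM\s+"([^"]+)">', document))

    def load(match):
        name = match.group(1)
        if name not in declared:
            return match.group(0)  # not a file entity: leave as it is
        with open(os.path.join(base_dir, declared[name])) as handle:
            return handle.read().strip()

    # Keep the DOCTYPE part unchanged; expand references in the body only.
    head, closer, body = document.partition("]>")
    return head + closer + re.sub(r"&([\w.]+);", load, body)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "chapter1.sgml"), "w") as handle:
        handle.write("<p>Chapter one.</p>\n")
    book = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [\n'
            '<!ENTITY chapter.1 SYSTEM "chapter1.sgml">\n'
            ']>\n'
            '<html>\n  &chapter.1;\n</html>\n')
    print(include_system_entities(book, tmp))
```

Because the file content is pasted in place of the reference, an included file that began with its own DOCTYPE declaration would end up in the middle of the document.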


3.7.2 Using parameter entities to include files

  Recall that parameter entities can only be used inside an SGML context. Why then would you want to include a file within an SGML context?

  You can use this to ensure that you can reuse your general entities.

  Suppose that you had many chapters in your document, and you reused these chapters in two different books, each book organizing the chapters in a different fashion.

  You could list the entities at the top of each book, but this quickly becomes cumbersome to manage.

  Instead, place the general entity definitions inside one file, and use a parameter entity to include that file within your document.

Example 3-13. Using parameter entities to include files

First, place your entity definitions in a separate file, called chapters.ent. This file contains the following:

<!ENTITY chapter.1 SYSTEM "chapter1.sgml">
<!ENTITY chapter.2 SYSTEM "chapter2.sgml">
<!ENTITY chapter.3 SYSTEM "chapter3.sgml">

Now create a parameter entity to refer to the contents of the file. Then use the parameter entity to load the file into the document, which will then make all the general entities available for use. Then use the general entities as before:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!-- Define a parameter entity to load in the chapter general entities -->
<!ENTITY % chapters SYSTEM "chapters.ent">

<!-- Now use the parameter entity to load in this file -->
%chapters;
]>

<html>
  &chapter.1;
  &chapter.2;
  &chapter.3;
</html>

3.7.3 For you to do...

3.7.3.1 Use general entities to include files

  1. Create three files, para1.sgml, para2.sgml, and para3.sgml.

    Put content similar to the following in each file:

    <p>This is the first paragraph.</p>
    
  2. Edit example.sgml so that it looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
    <!ENTITY version "1.1">
    <!ENTITY para1 SYSTEM "para1.sgml">
    <!ENTITY para2 SYSTEM "para2.sgml">
    <!ENTITY para3 SYSTEM "para3.sgml">
    ]>
    
    <html>
      <head>
        <title>An example HTML file</title>
      </head>
    
      <body>
        <p>The current version of this document is: &version;</p>
    
        &para1;
        &para2;
        &para3;
      </body>
    </html>
    
  3. Produce example.html by normalizing example.sgml.

    % sgmlnorm -d example.sgml > example.html
    
  4. Load example.html into your web browser, and confirm that the paraN.sgml files (para1.sgml, para2.sgml, and para3.sgml) have been included in example.html.


3.7.3.2 Use parameter entities to include files

Note: You must have taken the previous steps first.

  1. Edit example.sgml so that it looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
    <!ENTITY % entities SYSTEM "entities.sgml"> %entities;
    ]>
    
    <html>
      <head>
        <title>An example HTML file</title>
      </head>
    
      <body>
        <p>The current version of this document is: &version;</p>
    
        &para1;
        &para2;
        &para3;
      </body>
    </html>
    
  2. Create a new file, entities.sgml, with this content:

    <!ENTITY version "1.1">
    <!ENTITY para1 SYSTEM "para1.sgml">
    <!ENTITY para2 SYSTEM "para2.sgml">
    <!ENTITY para3 SYSTEM "para3.sgml">
    
  3. Produce example.html by normalizing example.sgml.

    % sgmlnorm -d example.sgml > example.html
    
  4. Load example.html into your web browser, and confirm that the paraN.sgml files (para1.sgml, para2.sgml, and para3.sgml) have been included in example.html.


3.8 Marked sections

  SGML provides a mechanism to indicate that particular pieces of the document should be processed in a special way. These are termed “marked sections”.

Example 3-14. Structure of a marked section

<![ KEYWORD [
  Contents of marked section
]]>

  As you would expect, being an SGML construct, a marked section starts with <!.

  The first square bracket begins to delimit the marked section.

  KEYWORD describes how this marked section should be processed by the parser.

  The second square bracket indicates that the content of the marked section starts here.

  The marked section is finished by closing the two square brackets, and then returning to the document context from the SGML context with >.


3.8.1 Marked section keywords

3.8.1.1 CDATA, RCDATA

  These keywords denote the marked section's content model, and allow you to change it from the default.

  When an SGML parser is processing a document it keeps track of what is called the “content model”.

  Briefly, the content model describes what sort of content the parser is expecting to see, and what it will do with it when it finds it.

  The two content models you will probably find most useful are CDATA and RCDATA.

  CDATA is for “Character Data”. If the parser is in this content model then it is expecting to see characters, and characters only. In this model the < and & symbols lose their special status, and will be treated as ordinary characters.

  RCDATA is for “Entity references and character data”. If the parser is in this content model then it is expecting to see characters and entities. < loses its special status, but & will still be treated as starting a general entity reference.

  This is particularly useful if you are including some verbatim text that contains lots of < and & characters. While you could go through the text ensuring that every < is converted to a &lt; and every & is converted to a &amp;, it can be easier to mark the section as only containing CDATA. When the SGML parser encounters this it will treat the < and & symbols embedded in the content as ordinary characters.

Note: When you use CDATA or RCDATA in examples of text marked up in SGML, keep in mind that the content of CDATA is not validated. You have to check the included SGML text using other means. You could, for example, write the example in another document, validate the example code, and then paste it to your CDATA content.

Example 3-15. Using a CDATA marked section

<para>Here is an example of how you would include some text
  that contained many <literal>&lt;</literal>
  and <literal>&amp;</literal> symbols.  The sample
  text is a fragment of HTML.  The surrounding markup (&lt;para&gt; and
  &lt;programlisting&gt;) is from DocBook.</para>

<programlisting>
  <![ CDATA [  
    <p>This is a sample that shows you some of the elements within
      HTML.  Since the angle brackets are used so many times, it is
      simpler to say the whole example is a CDATA marked section
      than to use the entity names for the left and right angle
      brackets throughout.</p>

    <ul>
      <li>This is a listitem</li>
      <li>This is a second listitem</li>
      <li>This is a third listitem</li>
    </ul>

    <p>This is the end of the example.</p>
  ]]>
</programlisting>

If you look at the source for this document you will see this technique used throughout.


3.8.1.2 INCLUDE and IGNORE

  If the keyword is INCLUDE then the contents of the marked section will be processed. If the keyword is IGNORE then the marked section is ignored and will not be processed. It will not appear in the output.

Example 3-16. Using INCLUDE and IGNORE in marked sections

<![ INCLUDE [
  This text will be processed and included.
]]>

<![ IGNORE [
  This text will not be processed or included.
]]>

  By itself, this is not too useful. If you wanted to remove text from your document you could cut it out, or wrap it in comments.

  It becomes more useful when you realize you can use parameter entities to control this. Remember that parameter entities can only be used in SGML contexts, and the keyword of a marked section is an SGML context.

  For example, suppose that you produced a hard-copy version of some documentation and an electronic version. In the electronic version you wanted to include some extra content that was not to appear in the hard-copy.

  Create a parameter entity, and set its value to INCLUDE. Write your document, using marked sections to delimit content that should only appear in the electronic version. In these marked sections use the parameter entity in place of the keyword.

  When you want to produce the hard-copy version of the document, change the parameter entity's value to IGNORE and reprocess the document.

Example 3-17. Using a parameter entity to control a marked section

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">         
]>

...

<![ %electronic.copy [
  This content should only appear in the electronic
  version of the document.
]]>

When producing the hard-copy version, change the entity's definition to:

<!ENTITY % electronic.copy "IGNORE">

On reprocessing the document, the marked sections that use %electronic.copy as their keyword will be ignored.
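
The way a parameter entity switches marked sections on and off can be modelled with a short Python sketch. It is a simplified illustration: nested marked sections are not handled, the function name apply_marked_sections is invented, and the parameter entity values are supplied as a dictionary rather than read from a DOCTYPE declaration.

```python
import re

# <![ KEYWORD [ content ]]>, where KEYWORD may be a %name; reference.
MARKED_SECTION = re.compile(r"<!\[\s*([^\s\[\]]+)\s*\[(.*?)\]\]>", re.DOTALL)


def apply_marked_sections(text, parameter_entities):
    """Keep INCLUDE sections and drop IGNORE sections (no nesting)."""
    def process(match):
        keyword, content = match.group(1), match.group(2)
        if keyword.startswith("%"):
            # Resolve the parameter entity to its INCLUDE or IGNORE value.
            keyword = parameter_entities[keyword.strip("%;")]
        if keyword == "INCLUDE":
            return content
        if keyword == "IGNORE":
            return ""
        return match.group(0)  # CDATA, RCDATA and others are left alone here
    return MARKED_SECTION.sub(process, text)


doc = "Always shown. <![ %electronic.copy [Electronic only.]]>"
print(apply_marked_sections(doc, {"electronic.copy": "INCLUDE"}))
print(apply_marked_sections(doc, {"electronic.copy": "IGNORE"}))
```

Only the dictionary value changes between the two calls, which mirrors changing a single entity definition and reprocessing the document.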


3.8.2 For you to do...

  1. Create a new file, section.sgml, that contains the following:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
    <!ENTITY % text.output "INCLUDE">
    ]>
    
    <html>
      <head>
        <title>An example using marked sections</title>
      </head>
    
      <body>     
        <p>This paragraph <![ CDATA [contains many <
          characters (< < < < <) so it is easier
          to wrap it in a CDATA marked section ]]></p>
    
        <![ IGNORE [
        <p>This paragraph will definitely not be included in the
          output.</p>
        ]]>
    
        <![ %text.output [
        <p>This paragraph might appear in the output, or it
          might not.</p>
    
        <p>Its appearance is controlled by the %text.output
          parameter entity.</p>      
        ]]>
      </body>
    </html>
    
  2. Normalize this file using sgmlnorm(1) and examine the output. Notice which paragraphs have appeared, which have disappeared, and what has happened to the content of the CDATA marked section.

  3. Change the definition of the text.output entity from INCLUDE to IGNORE. Re-normalize the file, and examine the output to see what has changed.


3.9 Conclusion

  That is the conclusion of this SGML primer. For reasons of space and complexity several things have not been covered in depth (or at all). However, the previous sections cover enough SGML for you to be able to follow the organization of the FDP documentation.


Chapter 4  SGML Markup

  This chapter describes the two markup languages you will encounter when you contribute to the FreeBSD documentation project. Each section describes the markup language, and details the markup that you are likely to want to use, or that is already in use.

  These markup languages contain a large number of elements, and it can be confusing sometimes to know which element to use for a particular situation. This section goes through the elements you are most likely to need, and gives examples of how you would use them.

  This is not an exhaustive list of elements, since that would just reiterate the documentation for each language. The aim of this section is to list those elements more likely to be useful to you. If you have a question about how best to mark up a particular piece of content, please post it to the FreeBSD documentation project mailing list.

Inline vs. block: In the remainder of this document, when describing elements, inline means that the element can occur within a block element, and does not cause a line break. A block element, by comparison, will cause a line break (and other processing) when it is encountered.


4.1 HTML

  HTML, the HyperText Markup Language, is the markup language of choice on the World Wide Web. More information can be found at <URL:http://www.w3.org/>.

  HTML is used to mark up pages on the FreeBSD web site. It should not (generally) be used to mark up other documentation, since DocBook offers a far richer set of elements to choose from. Consequently, you will normally only encounter HTML pages if you are writing for the web site.

  HTML has gone through a number of versions: 1, 2, 3.0, 3.2, and the latest, 4.0 (available in both strict and loose variants).

  The HTML DTDs are available from the ports collection in the textproc/html port. They are automatically installed as part of the textproc/docproj port.


4.1.1 Formal Public Identifier (FPI)

  There are a number of HTML FPIs, depending upon the version (also known as the level) of HTML that you want to declare your document to be compliant with.

  The majority of HTML documents on the FreeBSD web site comply with the loose version of HTML 4.0.

PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"

4.1.2 Sectional elements

  An HTML document is normally split into two sections. The first section, called the head, contains meta-information about the document, such as its title, the name of the author, the parent document, and so on. The second section, the body, contains the content that will be displayed to the user.

  These sections are indicated with <head> and <body> elements respectively. These elements are contained within the top-level <html> element.

Example 4-1. Normal HTML document structure

<html>
  <head>
    <title>The document's title</title>
  </head>

  <body>

    ...

  </body>
</html>

4.1.3 Block elements

4.1.3.1 Headings

  HTML allows you to denote headings in your document, at up to six different levels.

  The largest and most prominent heading is <h1>, then <h2>, continuing down to <h6>.

  The element's content is the text of the heading.

Example 4-2. <h1>, <h2>, etc.

Use:

<h1>First section</h1>

<!-- Document introduction goes here -->

<h2>This is the heading for the first section</h2>

<!-- Content for the first section goes here -->

<h3>This is the heading for the first sub-section</h3>

<!-- Content for the first sub-section goes here -->

<h2>This is the heading for the second section</h2>

<!-- Content for the second section goes here -->

  Generally, an HTML page should have one first level heading (<h1>). This can contain many second level headings (<h2>), which can in turn contain many third level headings. Each <hn> element should be preceded, somewhere earlier in the page, by a heading exactly one level above it (for example, an <h3> should only appear after an <h2>). Avoid leaving gaps in the numbering.

Example 4-3. Bad ordering of <hn> elements

Do not use:

<h1>First section</h1>

<!-- Document introduction -->

<h3>Sub-section</h3>

<!-- This is bad, <h2> has been left out -->
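
The ordering rule can be checked mechanically. The following Python sketch uses the standard library's html.parser module to report headings that skip a level; the class name HeadingChecker and the function name skipped_headings are made up for this example, and the check is far looser than validating against the DTD with nsgmls.

```python
from html.parser import HTMLParser


class HeadingChecker(HTMLParser):
    """Collect places where a heading skips a level (e.g. h1 then h3)."""

    def __init__(self):
        super().__init__()
        self.previous = 0   # level of the most recent heading, 0 = none yet
        self.problems = []  # (previous level, offending level) pairs

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            level = int(tag[1])
            if level > self.previous + 1:
                self.problems.append((self.previous, level))
            self.previous = level


def skipped_headings(html_text):
    checker = HeadingChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


print(skipped_headings("<h1>First section</h1><h3>Sub-section</h3>"))
```

Each reported pair gives the level of the preceding heading and the level of the heading that jumped too far.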

4.1.3.2 Paragraphs

  HTML supports a single paragraph element, <p>.

Example 4-4. <p>

Use:

<p>This is a paragraph.  It can contain just about any
  other element.</p>

4.1.3.3 Block quotations

  A block quotation is an extended quotation from another document that should not appear within the current paragraph.

Example 4-5. <blockquote>

Use:

<p>A small excerpt from the US Constitution:</p>

<blockquote>We the People of the United States, in Order to form
  a more perfect Union, establish Justice, insure domestic
  Tranquility, provide for the common defence, promote the general
  Welfare, and secure the Blessings of Liberty to ourselves and our
  Posterity, do ordain and establish this Constitution for the
  United States of America.</blockquote>

4.1.3.4 Lists

  You can present the user with three types of lists: ordered, unordered, and definition.

  Typically, each entry in an ordered list will be numbered, while each entry in an unordered list will be preceded by a bullet point. Definition lists are composed of two sections for each entry. The first section is the term being defined, and the second section is the definition of the term.

  Ordered lists are indicated by the <ol> element, unordered lists by the <ul> element, and definition lists by the <dl> element.

  Ordered and unordered lists contain listitems, indicated by the <li> element. A listitem can contain textual content, or it may be further wrapped in one or more <p> elements.

  Definition lists contain definition terms (<dt>) and definition descriptions (<dd>). A definition term can only contain inline elements. A definition description can contain other block elements.

Example 4-6. <ul> and <ol>

Use:

<p>An unordered list.  Listitems will probably be
  preceded by bullets.</p>

<ul>
  <li>First item</li>

  <li>Second item</li>

  <li>Third item</li>
</ul>

<p>An ordered list, with list items consisting of multiple
  paragraphs.  Each item (note: not each paragraph) will be
  numbered.</p>

<ol>
  <li><p>This is the first item.  It only has one paragraph.</p></li>

  <li><p>This is the first paragraph of the second item.</p>

    <p>This is the second paragraph of the second item.</p></li>

  <li><p>This is the first and only paragraph of the third
    item.</p></li>
</ol>

Example 4-7. Definition lists with <dl>

Use:

<dl>
  <dt>Term 1</dt>

  <dd><p>Paragraph 1 of definition 1.</p>

    <p>Paragraph 2 of definition 1.</p></dd>

  <dt>Term 2</dt>

  <dd><p>Paragraph 1 of definition 2.</p></dd>

  <dt>Term 3</dt>

  <dd><p>Paragraph 1 of definition 3.</p></dd>
</dl>

4.1.3.5 Pre-formatted text

  You can indicate that text should be shown to the user exactly as it is in the file. Typically, this means that the text is shown in a fixed font, multiple spaces are not merged into one, and line breaks in the text are significant.

  In order to do this, wrap the content in the <pre> element.

Example 4-8. <pre>

You could use <pre> to mark up an email message:

<pre>  From: nik@FreeBSD.org
  To: freebsd-doc@FreeBSD.org
  Subject: New documentation available

  There is a new copy of my primer for contributors to the FreeBSD
  Documentation Project available at

    &lt;URL:http://people.FreeBSD.org/~nik/primer/index.html&gt;

  Comments appreciated.

  N</pre>

Keep in mind that < and & are still recognized as special characters in pre-formatted text. This is why the example shown had to use &lt; instead of <. For consistency, &gt; was used in place of >, too. Watch out for the special characters that may appear in text copied from a plain-text source, e.g., an email message or program code.


4.1.3.6 Tables

Note: Most text-mode browsers (such as Lynx) do not render tables particularly effectively. If you are relying on the tabular display of your content, you should consider using alternative markup to prevent confusion.

  Mark up tabular information using the <table> element. A table consists of one or more table rows (<tr>), each containing one or more cells of table data (<td>). Each cell can contain other block elements, such as paragraphs or lists. It can also contain another table (this nesting can repeat indefinitely). If the cell only contains one paragraph then you do not need to include the <p> element.

Example 4-9. Simple use of <table>

Use:

<p>This is a simple 2x2 table.</p>

<table>
  <tr>
    <td>Top left cell</td>

    <td>Top right cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>

  A cell can span multiple rows and columns. To indicate this, add the rowspan and/or colspan attributes, with values indicating the number of rows or columns that should be spanned.

例 4-10. Using rowspan

Use:

<p>One tall thin cell on the left, two short cells next to
  it on the right.</p>

<table>
  <tr>
    <td rowspan="2">Long and thin</td>

    <td>Top cell</td>
  </tr>

  <tr>
    <td>Bottom cell</td>
  </tr>
</table>

Example 4-11. Using colspan

Use:

<p>One long cell on top, two short cells below it.</p>

<table>
  <tr>
    <td colspan="2">Top cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>

Example 4-12. Using rowspan and colspan together

Use:

<p>On a 3x3 grid, the top left block is a 2x2 set of
  cells merged into one.  The other cells are normal.</p>

<table>
  <tr>
    <td colspan="2" rowspan="2">Top left large cell</td>

    <td>Top right cell</td>
  </tr>

  <tr>
    <!-- Because the large cell on the left merges into
         this row, the first <td> will occur on its
         right -->
        
    <td>Middle right cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom middle cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>

4.1.4 In-line elements

4.1.4.1 Emphasizing information

  You have two levels of emphasis available in HTML, <em> and <strong>. <em> is for a normal level of emphasis and <strong> indicates stronger emphasis.

  Typically, <em> is rendered in italic and <strong> is rendered in bold. This is not always the case, however, and you should not rely on it.

Example 4-13. <em> and <strong>

Use:

<p><em>This</em> has been emphasized, while
  <strong>this</strong> has been strongly emphasized.</p>

4.1.4.2 Bold and italics

  Because HTML includes presentational markup, you can also indicate that particular content should be rendered in bold or italic. The elements are <b> and <i> respectively.

Example 4-14. <b> and <i>

<p><b>This</b> is in bold, while <i>this</i> is
  in italics.</p>

4.1.4.3 Indicating fixed pitch text

  If you have content that should be rendered in a fixed pitch (typewriter) typeface, use <tt> (for “teletype”).

Example 4-15. <tt>

Use:

<p>This document was originally written by
  Nik Clayton, who can be reached by email as
  <tt>nik@FreeBSD.org</tt>.</p>

4.1.4.4 Content size

  You can indicate that content should be shown in a larger or smaller font. There are three ways of doing this.

  1. Use <big> and <small> around the content you wish to change size. These tags can be nested, so <big><big>This is much bigger</big></big> is possible.

  2. Use <font> with the size attribute set to +1 or -1 respectively. This has the same effect as using <big> or <small>. However, the use of this approach is deprecated.

  3. Use <font> with the size attribute set to a number between 1 and 7. The default font size is 3. This approach is deprecated.

Example 4-16. <big>, <small>, and <font>

The following fragments all do the same thing, assuming the default font size of 3.

<p>This text is <small>slightly smaller</small>.  But
  this text is <big>slightly bigger</big>.</p>

<p>This text is <font size="-1">slightly smaller</font>.  But
  this text is <font size="+1">slightly bigger</font>.</p>

<p>This text is <font size="2">slightly smaller</font>.  But
  this text is <font size="4">slightly bigger</font>.</p>

4.1.5 Links

Note: Links are also in-line elements.


4.1.5.1 Linking to other documents on the WWW

  In order to include a link to another document on the WWW you must know the URL of the document you want to link to.

  The link is indicated with <a>, and the href attribute contains the URL of the target document. The content of the element becomes the link, and is normally indicated to the user in some way (underlining, change of color, different mouse cursor when over the link, and so on).

Example 4-17. Using <a href="...">

Use:

<p>More information is available at the
  <a href="http://www.FreeBSD.org/">FreeBSD web site</a>.</p>

  These links will take the user to the top of the chosen document.


4.1.5.2 Linking to other parts of documents

  Linking to a point within another document (or within the same document) requires that the document author include anchors that you can link to.

  Anchors are indicated with <a> and the name attribute instead of href.

Example 4-18. Using <a name="...">

Use:

<p><a name="para1">This</a> paragraph can be referenced
  in other links with the name <tt>para1</tt>.</p>

  To link to a named part of a document, write a normal link to that document, but include the name of the anchor after a # symbol.

Example 4-19. Linking to a named part of another document

Assume that the para1 example resides in a document called foo.html.

<p>More information can be found in the
  <a href="foo.html#para1">first paragraph</a> of
  <tt>foo.html</tt>.</p>

  If you are linking to a named anchor within the same document then you can omit the document's URL, and just include the name of the anchor (with the preceding #).

Example 4-20. Linking to a named part of the same document

Assume that the para1 example resides in this document:

<p>More information can be found in the
  <a href="#para1">first paragraph</a> of this
  document.</p>
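
  Anchors and same-document links are often combined to build a short list of contents at the top of a page. The following is only a sketch: the anchor names install and config, and the headings they sit in, are hypothetical.

<ul>
  <li><a href="#install">Installing</a></li>

  <li><a href="#config">Configuring</a></li>
</ul>

<h2><a name="install">Installing</a></h2>

<p>...</p>

<h2><a name="config">Configuring</a></h2>

<p>...</p>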

4.2 DocBook

  DocBook was originally developed by HaL Computer Systems and O'Reilly & Associates to be a DTD for writing technical documentation [3]. Since 1998 it has been maintained by the DocBook Technical Committee. Because of this focus on technical documentation, and unlike LinuxDoc and HTML, DocBook is very heavily oriented towards markup that describes what something is, rather than how it should be presented.

formal vs. informal: Some elements may exist in two forms, formal and informal. Typically, the formal version of the element will consist of a title followed by the informal version of the element. The informal version will not have a title.

  The DocBook DTD is available from the ports collection in the textproc/docbook port. It is automatically installed as part of the textproc/docproj port.


4.2.1 FreeBSD extensions

  The FreeBSD Documentation Project has extended the DocBook DTD by adding some new elements. These elements serve to make some of the markup more precise.

  Where a FreeBSD-specific element is listed below, it is clearly marked.

  Throughout the rest of this document, the term “DocBook” is used to mean the FreeBSD extended DocBook DTD.

Note: Nothing about these extensions is specific to FreeBSD; they were simply felt to be useful enhancements for this particular project. Should anyone from any of the other *nix camps (NetBSD, OpenBSD, Linux, ...) be interested in collaborating on a standard DocBook extension set, please get in touch with the Documentation Engineering Team.

  The FreeBSD extensions are not (currently) in the ports collection. They are stored in the FreeBSD CVS tree, as doc/share/sgml/freebsd.dtd.


4.2.2 Formal Public Identifier (FPI)

  In compliance with the DocBook guidelines for writing FPIs for DocBook customizations, the FPI for the FreeBSD extended DocBook DTD is:

PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN"
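
  A document would normally refer to this FPI in its document type declaration. The line below is a minimal sketch for a book, assuming that no internal subset (entity declarations and the like) is needed; an article would name article instead of book.

<!DOCTYPE book PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN">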

4.2.3 Document structure

  DocBook allows you to structure your documentation in several ways. In the FreeBSD Documentation Project we are using two primary types of DocBook document: the book and the article.

  A book is organized into <chapter>s; chapters are mandatory. There may be <part>s between the book and the chapters to provide another layer of organization. The Handbook is arranged in this way.

  A chapter may (or may not) contain one or more sections. These are indicated with the <sect1> element. If a section contains another section then use the <sect2> element, and so on, up to <sect5>.

  Chapters and sections contain the remainder of the content.

  An article is simpler than a book, and does not use chapters. Instead, the content of an article is organized into one or more sections, using the same <sect1> (and <sect2> and so on) elements that are used in books.
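
  The two layouts described above can be sketched as the following skeletons. All titles and paragraph text are placeholders, the <part> layer is optional, and the meta-information that a real document would carry is omitted for brevity.

<book>
  <title>Book title</title>

  <part>
    <title>Part title</title>

    <chapter>
      <title>Chapter title</title>

      <sect1>
        <title>Section title</title>

        <para>Section content.</para>

        <sect2>
          <title>Subsection title</title>

          <para>Subsection content.</para>
        </sect2>
      </sect1>
    </chapter>
  </part>
</book>

<article>
  <title>Article title</title>

  <sect1>
    <title>Section title</title>

    <para>Section content.</para>
  </sect1>
</article>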

  Obviously, you should consider the nature of the documentation you are writing in order to decide whether it is best marked up as a book or an article. Articles are well suited to information that does not need to be broken down into several chapters, and that is, relatively speaking, quite short, at up to 20-25 pages of content. Books are best suited to information that can be broken up into several chapters, possibly with appendices and similar content as well.

  The FreeBSD tutorials are all marked up as articles, while this document, the FreeBSD FAQ, and the FreeBSD Handbook are all marked up as books.


4.2.3.1 Starting a book

  The content of the book is contained within the <book> element. As well as containing structural markup, this element can contain elements that include additional information about the book. This is either meta-information, used for reference purposes, or additional content used to produce a title page.

  This additional information should be contained within <bookinfo>.

Example 4-21. Boilerplate <book> with <bookinfo>

<book>
  <bookinfo>
    <title>Your title here</title>
    
    <author>
      <firstname>Your first name</firstname>
      <surname>Your surname</surname>
      <affiliation>
        <address><email>Your email address</email></address>
      </affiliation>
    </author>

    <copyright>
      <year>1998</year>
      <holder role="mailto:your email address">Your name</holder>
    </copyright>

    <releaseinfo>$FreeBSD$</releaseinfo>

    <abstract>
      <para>Include an abstract of the book's contents here.</para>
    </abstract>
  </bookinfo>

  ...

</book>

4.2.3.2 Starting an article

  The content of the article is contained within the <article> element. As well as containing structural markup, this element can contain elements that include additional information about the article. This is either meta-information, used for reference purposes, or additional content used to produce a title page.

  This additional information should be contained within <articlein