HTML Dialect Rule Reference

Author: Henrik Mikael Kristensen
Date: 11-Aug-2010
Copyright: 2010 - HMK Design
Version: 0.0.8

Introduction

The following rule sets describe how the dialect parser works. The entire parser is built from the rule blocks below. Actions in the code (the parenthesized parts) are left out for clarity. We start with the data types and work our way up from the low-level rules toward the main parser.

Data Types

block-types

This rule matches all block types except path!.

block-types: [block! | hash! | list!]

value-types

This rule defines the datatypes that describe values directly, such as numbers, strings and URLs. Tags are purposely disallowed.

value-types: [
  money! | binary! | number! | date! | time! | tuple!
  | url! | email! | file! | any-string! | char! | pair!
]
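A minimal sketch, assuming REBOL 2 (the hash! and list! types in block-types exist only there): it copies value-types and runs parse over a few one-value sample blocks. The samples are made up for illustration; a bare word such as hello is not a value type and should be rejected.

```rebol
REBOL [Title: "value-types sketch"]

; Copy of the value-types rule from this reference.
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]

; Each hypothetical sample is a one-element block, so parse only
; succeeds when that single value matches one of the datatypes.
foreach sample [[42] [$10.50] [1-Jan-2010] [http://www.rebol.com] [%index.html] [10x20] [hello]] [
    print [mold sample "->" parse sample value-types]
]
```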

cell-types

This rule defines the datatypes allowed as input parameters for most tags. They should be the same types that html-gen allows.

cell-types: [
  ['do block-types]
  | get-word!
  | [
    value-types | block-types | datatype! | word!
    | lit-word! | path! | lit-path!
    | refinement! | logic!
  ]
]
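The three branches of cell-types can be tried in isolation with a short REBOL 2 sketch. The inputs title and now/date are arbitrary samples; parse only checks their types and never evaluates them.

```rebol
REBOL [Title: "cell-types sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]

; First branch: the word do followed by a block.
print parse [do [now/date]] cell-types

; Second branch: a get-word (presumably resolved by the dialect).
print parse [:title] cell-types

; Third branch: a plain literal value.
print parse ["Hello"] cell-types

; Two values are more than one cell, so this should fail.
print parse ["Hello" "World"] cell-types
```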

href-types

These rules define URL inputs for use in page-rules.

href-types: [
  ['do block-types] | get-word! | [word! | url! | string! | path! | refinement!]
]

event-types

This rule describes the events allowed when submitting a form. It is used in form-rules.

event-types: [string!]

doc-types

This rule is used within word-rules.

doc-types: [
  html-2.0-dtd
  | html-3.2-dtd
  | html-4.01-strict
  | html-4.01-transitional
  | html-4.01-frameset
  | xhtml-1.0-strict
  | xhtml-1.0-transitional
  | xhtml-1.0-frameset
  | xhtml-1.0-dtd
  | xhtml-basic-1.0-dtd
  | xhtml-basic-1.1-dtd
  | mathml-1.01-dtd
  | xhtml-mathml-svg-dtd
  | svg-1.0-dtd
  | svg-1.1-full-dtd
  | svg-1.1-basic-dtd
  | svg-1.1-tiny-dtd
]

Rules

Small Rules

These base rules are used as building blocks in the larger rules below.

set-val: [get-word!]
set-class: [some refinement!]
set-opt-class: [set-class | ()]
set-id: [issue!]
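A quick REBOL 2 check of the small rules. Judging only by the rule names, refinements such as /menu appear to supply CSS classes and issues such as #main an element id; that reading is an assumption, and all sample values are hypothetical.

```rebol
REBOL [Title: "small rules sketch"]

set-val: [get-word!]
set-class: [some refinement!]
set-opt-class: [set-class | ()]   ; the empty paren always succeeds
set-id: [issue!]

print parse [/menu /active] set-class   ; one or more refinements
print parse [#main] set-id              ; a single issue
print parse [] set-opt-class            ; classes may be absent
print parse [:content] set-val          ; a get-word
```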

verbatim-rules

Values matching these rules are output exactly as they are input (verbatim). They are the first rules tried in the HTML dialect, since their input is not processed at all.

verbatim-rules: [
  value-types | tag! | lit-word! | path! | lit-path!
  | refinement! | datatype! | logic!
]
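The following REBOL 2 sketch shows which values pass straight through verbatim-rules. Wrapping the rule in any, so that a whole block can be checked at once, is only for this demonstration; the samples are hypothetical.

```rebol
REBOL [Title: "verbatim-rules sketch"]

value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
verbatim-rules: [
    value-types | tag! | lit-word! | path! | lit-path!
    | refinement! | datatype! | logic!
]

; A tag, a number and a lit-word are all verbatim values.
print parse [<hr> 42 'title] [any verbatim-rules]

; A plain word such as strong is not verbatim; it is left for later rules.
print parse [strong "Hi"] [any verbatim-rules]
```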

eval-rules

These rules parse the block given to the TAG command.

eval-rules: [any ['do block-types | any-type!]]

base-rules

These rules produce various common tags and links and serve as the base for higher-level rules.

base-rules: [
  '=== word! opt ['opts block! cell-types] cell-types
  | 'tag into eval-rules
  | 'end-tag
  | block-types
  | 'do block-types
  | ['newline | 'crlf]
]

link-rules

These rules are used with the AT command to produce links from various input formats.

link-rules: [
  'at [
    2 cell-types
      any [
        'vars [block! | object! | get-word!]
        | 'words block!
      ]
  ]
]
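A REBOL 2 sketch of inputs that satisfy link-rules. The URL, file name and vars block are hypothetical samples, and the sketch makes no claim about which of the two cells is the link target and which is the link text.

```rebol
REBOL [Title: "link-rules sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]
link-rules: [
    'at [
        2 cell-types          ; exactly two cells
        any [
            'vars [block! | object! | get-word!]
            | 'words block!
        ]
    ]
]

print parse [at http://www.rebol.com "REBOL"] link-rules
print parse [at %search.html "Search" vars [q "parse"]] link-rules

; Only one cell after at, so this should fail.
print parse [at http://www.rebol.com] link-rules
```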

image-rules

These are the rules for producing image references.

image-rules: [
  'image cell-types
]

table-rules

These are the main rules for building an HTML table. They use several sub-rules, which are described below.

table-rules: [
  'table
    opt 'debug
    [0 2 [set-class | set-id]]
    any [
      [
        'format
          any [row-format-rule block-types]
          any [
            'rows [
              set-val | into table-format-rules
              | table-format-rules
            ]
          ]
      ] | [
        'rows [
          get-word! | into table-row-rules
          | table-row-rules | into table-block-rules
        ]
      ]
    ]
]
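To see how the sub-rules fit together, this REBOL 2 sketch assembles table-rules with all of its sub-rules as listed in this reference and checks three hypothetical tables. The word :files stands for some block of row data; parse only checks that it is a get-word.

```rebol
REBOL [Title: "table-rules sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]
set-val: [get-word!]
set-class: [some refinement!]
set-id: [issue!]

row-format-rule: [
    opt ['first | 'even-last | 'odd-last | 'last | 'odd | 'even | 'any]
]
table-row-rules: [
    some [
        'row any [
            ['cell | 'header]
            any [
                'colspan integer! | 'align word!
                | 'width integer! opt 'percent
                | set-class | set-id
            ]
            [none! | cell-types]
        ]
    ]
]
table-cell-rule: cell-types
table-row-rule: cell-types
table-format-rules: [
    any [object! | into [any table-cell-rule] | table-row-rule]
]
table-block-rules: [
    any [object! | into [any table-cell-rule] | table-row-rule]
]
table-rules: [
    'table
    opt 'debug
    [0 2 [set-class | set-id]]
    any [
        [
            'format
            any [row-format-rule block-types]
            any ['rows [set-val | into table-format-rules | table-format-rules]]
        ] | [
            'rows [
                get-word! | into table-row-rules
                | table-row-rules | into table-block-rules
            ]
        ]
    ]
]

; Rows taken from a word.
print parse [table /data rows :files] table-rules

; Explicit rows and cells, matched by table-row-rules via into.
print parse [table #listing rows [row header "Name" row cell width 50 percent "index.html"]] table-rules

; A format block for the first row, followed by the data rows.
print parse [table format first [header "Name"] rows :files] table-rules
```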

row-format-rule

This rule determines which type of row (first, last, odd, even and so on) a particular format block applies to.

row-format-rule: [
  opt [
    'first | 'even-last | 'odd-last
    | 'last | 'odd | 'even | 'any
  ]
]

table-row-rules

These rules define how a single row in a table can be shaped.

table-row-rules: [
  some [
    'row any [
      ['cell | 'header]
      any [
        'colspan integer! | 'align word!
        | 'width integer! opt 'percent
        | set-class | set-id
      ]
      [none! | cell-types]
    ]
  ]
]

table-cell-rule

This rule generates a table cell when the input data has multiple columns per row or consists of objects.

table-cell-rule: cell-types

table-row-rule

This rule generates a table cell when the input data has only one column per row and does not consist of objects.

table-row-rule: cell-types

table-format-rules

These rules are used after the format command and are identical in structure to table-block-rules. However, when blocks of blocks or plain blocks are used as input, the formatting is ignored.

table-format-rules: [
  any [object! | into [any table-cell-rule] | table-row-rule]
]

table-block-rules

These rules are used after the rows command when format has not been used first. Objects are then simply output cell by cell. Each data row is parsed the same way as a formatting row.

table-block-rules: [
  any [object! | into [any table-cell-rule] | table-row-rule]
]

tag-rule

These rules are used wherever a normal HTML tag is wanted. The rule is recursive, so tags can be nested.

tag-rule: [
  [
    'html | 'head | 'title | 'body | 'p | 'strong
    | 'em | 'b | 'i | 'u | 'tt | 'big | 'small
    | 'strike | 'del | 'pre | 'ul | 'il | 'li
    | 'sup | 'sub | 'samp | 'code | 'blockquote | 'q
    | 'kbd | 'var | 'cite | 'tr | 'th | 'td
    | 'table | 'a | 'div | 'span | 'dl | 'dt | 'dd
    | 'h1 | 'h2 | 'h3 | 'h4 | 'h5 | 'h6
  ]
  0 2 [set-class | set-id]
  opt ['id cell-types]
  [tag-rule | get-word! | cell-types | ()]
]
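Because tag-rule refers to itself, tags can be nested simply by writing them in sequence. This REBOL 2 sketch checks a few hypothetical inputs against a copy of the rule; blink is used as an example of a tag name that is not in the list.

```rebol
REBOL [Title: "tag-rule sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]
set-class: [some refinement!]
set-id: [issue!]

tag-rule: [
    [
        'html | 'head | 'title | 'body | 'p | 'strong
        | 'em | 'b | 'i | 'u | 'tt | 'big | 'small
        | 'strike | 'del | 'pre | 'ul | 'il | 'li
        | 'sup | 'sub | 'samp | 'code | 'blockquote | 'q
        | 'kbd | 'var | 'cite | 'tr | 'th | 'td
        | 'table | 'a | 'div | 'span | 'dl | 'dt | 'dd
        | 'h1 | 'h2 | 'h3 | 'h4 | 'h5 | 'h6
    ]
    0 2 [set-class | set-id]
    opt ['id cell-types]
    [tag-rule | get-word! | cell-types | ()]   ; recursion happens here
]

print parse [div /box p strong "Hello"] tag-rule   ; three nested tags
print parse [h1 #title "Welcome"] tag-rule
print parse [blink "Hi"] tag-rule                  ; not a listed tag
```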

tag-rules

This rule looks redundant here, but in the source code it collects the tags from recursive runs of tag-rule and generates the required HTML code.

tag-rules: tag-rule

loop-rules

These rules produce loops, and allow traversing data blocks either wholly or partially.

loop-rules: [
  'loop integer! block-types opt ['alternate block-types]
  | 'traverse [block-types | 'do block-types | get-word!]
    opt ['using [lit-word! | word! | block-types | get-word!]]
    block-types
    opt ['alternate block-types]
]
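A REBOL 2 sketch of input shapes accepted by loop-rules. The names items and item are hypothetical, and the idea that using names the per-item variable is an assumption drawn from the keyword; the rule itself only checks types.

```rebol
REBOL [Title: "loop-rules sketch"]

block-types: [block! | hash! | list!]
loop-rules: [
    'loop integer! block-types opt ['alternate block-types]
    | 'traverse [block-types | 'do block-types | get-word!]
    opt ['using [lit-word! | word! | block-types | get-word!]]
    block-types
    opt ['alternate block-types]
]

; A fixed number of repetitions.
print parse [loop 3 [p "Hello"]] loop-rules

; Traversing data held in a word, with an alternate body.
print parse [traverse :items using 'item [li :item] alternate [li /odd :item]] loop-rules

; Traversing data produced by a do block.
print parse [traverse do [copy items] [p :item]] loop-rules
```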

text-format-rules

These rules allow special formatting parsers to be applied to text. They are meant to be extended later and are of little practical use for now.

text-format-rules: [
  'format [word! function! | 'type word! cell-types | cell-types]
]

form-rules

These rules produce form tags and are considered higher-level rules. They also manage the form content, either from words or from a specific form object.

form-rules: [
  'form cell-types
  opt [get-word! | ['vars | object!]]
  opt ['onsubmit event-types]
  cell-types
  | 'textarea word!
  | ['text | 'hidden | 'password] word!
  | 'checkbox word!
  | 'radio word! cell-types
  | 'select word! ['values | 'key-values | ()] cell-types
  | 'button word! string!
  | ['submit | 'reset | 'button] string!
]
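A REBOL 2 sketch of inputs matched by form-rules. The script name, field names and event string are hypothetical. Note that the form content block is matched here only as a single cell; how the dialect then processes its contents is not shown by the rule itself.

```rebol
REBOL [Title: "form-rules sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]
event-types: [string!]
form-rules: [
    'form cell-types
    opt [get-word! | ['vars | object!]]
    opt ['onsubmit event-types]
    cell-types
    | 'textarea word!
    | ['text | 'hidden | 'password] word!
    | 'checkbox word!
    | 'radio word! cell-types
    | 'select word! ['values | 'key-values | ()] cell-types
    | 'button word! string!
    | ['submit | 'reset | 'button] string!
]

print parse [form %send.cgi onsubmit "return check();" [text name submit "Send"]] form-rules
print parse [select color values ["red" "green"]] form-rules
print parse [submit "Send"] form-rules

; A field name must be a word, not a string.
print parse [text "name"] form-rules
```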

page-rules

These rules produce the outer skeleton of the web page by providing the HEAD and BODY sections.

page-rules: [
  'page cell-types any [
    ['redirect | 'refresh] href-types integer!
    | 'favicon href-types
    | 'charset [string! | word!]
    | 'description string!
    | 'robots into [
      some [
        'noindex | 'index | 'nofollow | 'follow
        | 'noarchive | 'nosnippet | 'noodp | 'noydir
      ]
    ]
    | 'css href-types
    | 'rss href-types string!
    | 'atom href-types string!
    | 'script href-types
    | 'style string!
    | 'meta ['name | 'http-equiv] 2 cell-types
  ] block-types
]
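A REBOL 2 sketch of a page header matched by page-rules. All values are hypothetical. Since href-types as listed does not include file!, the stylesheet is given as a string here.

```rebol
REBOL [Title: "page-rules sketch"]

block-types: [block! | hash! | list!]
value-types: [
    money! | binary! | number! | date! | time! | tuple!
    | url! | email! | file! | any-string! | char! | pair!
]
cell-types: [
    ['do block-types]
    | get-word!
    | [
        value-types | block-types | datatype! | word!
        | lit-word! | path! | lit-path!
        | refinement! | logic!
    ]
]
href-types: [
    ['do block-types] | get-word! | [word! | url! | string! | path! | refinement!]
]
page-rules: [
    'page cell-types any [
        ['redirect | 'refresh] href-types integer!
        | 'favicon href-types
        | 'charset [string! | word!]
        | 'description string!
        | 'robots into [
            some [
                'noindex | 'index | 'nofollow | 'follow
                | 'noarchive | 'nosnippet | 'noodp | 'noydir
            ]
        ]
        | 'css href-types
        | 'rss href-types string!
        | 'atom href-types string!
        | 'script href-types
        | 'style string!
        | 'meta ['name | 'http-equiv] 2 cell-types
    ] block-types   ; the body block must come last
]

print parse [
    page "My Site"
    charset "utf-8"
    css "style.css"
    robots [noindex nofollow]
    meta http-equiv "Content-Language" "en"
    [h1 "Welcome"]
] page-rules

; Without a body block the page is incomplete.
print parse [page "My Site"] page-rules
```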

error-rules

These rules (currently only one) handle and print errors generated by the parser during HTML creation. They will be extended later to become more useful.

error-rules: ['errors]

word-rules

The word rules handle lists of words that are either dynamic to the parser, i.e. built during parsing, or built into the HTML dialect, such as the word list in doc-types.

word-rules: [
  doc-types
  | set-word! [word! | value-types | block-types]
  | get-word!
  | word!
]

all-rules

This rule collects all of the rules above. It is used directly by the parser and shows the order in which the rules are tried.

all-rules: [
  any [
    verbatim-rules | base-rules | link-rules | image-rules
    | table-rules | tag-rules | loop-rules | text-format-rules
    | form-rules | page-rules | error-rules | word-rules
  ]
]