HTML Dialect Rule Reference
Author: Henrik Mikael Kristensen
Date: 11-Aug-2010
Copyright: 2010 - HMK Design
Version: 0.0.8
The following rule sets describe how the dialect parser works. The entire parser is built from the rule blocks below. Actions in the code (the parenthesized blocks) are left out for clarity. We start with the types, then work our way up from the low-level rules to the main parser.
This describes all block types except for path!
block-types: [block! | hash! | list!]
This rule defines the datatypes that describe values directly, such as numbers, strings and urls. Tags are purposely disallowed.
value-types: [ money! | binary! | number! | date! | time! | tuple! | url! | email! | file! | any-string! | char! | pair! ]
This rule defines the datatypes allowed as input parameters for most tags. They should be the same types as those allowed in html-gen.
cell-types: [ ['do block-types] | get-word! | [ value-types | block-types | datatype! | word! | lit-word! | path! | lit-path! | refinement! | logic! ] ]
These rules define URL inputs for use in page-rules.
href-types: [ ['do block-types] | get-word! | [word! | url! | string! | path! | refinement!] ]
These rules describe events allowed when submitting a form for use in form-rules.
This rule is used as a word-rule.
doc-types: [ html-2.0-dtd | html-3.2-dtd | html-4.01-strict | html-4.01-transitional | html-4.01-frameset | xhtml-1.0-strict | xhtml-1.0-transitional | xhtml-1.0-frameset | xhtml-1.0-dtd | xhtml-basic-1.0-dtd | xhtml-basic-1.1-dtd | mathml-1.01-dtd | xhtml-mathml-svg-dtd | svg-1.0-dtd | svg-1.1-full-dtd | svg-1.1-basic-dtd | svg-1.1-tiny-dtd ]
Some base rules that are used as elements in larger rules below.
set-val: [get-word!]
set-class: [some refinement!]
set-opt-class: [set-class | ()]
set-id: [issue!]
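The following minimal Python model shows how set-class and set-id combine under the `0 2 [set-class | set-id]` pattern that table-rules and tag-rule use. It is a sketch and is not part of the dialect. `Refinement` and `Issue` are hypothetical stand-ins for REBOL's refinement! and issue!, and `match_class_and_id` is an invented name. Joining several refinements into one space-separated `class` attribute is an assumption about the generated HTML.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Refinement:  # hypothetical stand-in for a REBOL refinement!, e.g. /note
    name: str

@dataclass(frozen=True)
class Issue:       # hypothetical stand-in for a REBOL issue!, e.g. #intro
    name: str

def match_class_and_id(tokens, pos=0):
    """Model of 0 2 [set-class | set-id]; returns (attributes, new position)."""
    attrs = {}
    for _ in range(2):  # at most two matches, zero is also allowed
        if pos < len(tokens) and isinstance(tokens[pos], Refinement):
            classes = []
            # set-class: [some refinement!] consumes every consecutive refinement
            while pos < len(tokens) and isinstance(tokens[pos], Refinement):
                classes.append(tokens[pos].name)
                pos += 1
            attrs["class"] = " ".join(classes)  # assumed HTML mapping
        elif pos < len(tokens) and isinstance(tokens[pos], Issue):
            attrs["id"] = tokens[pos].name      # set-id: [issue!]
            pos += 1
        else:
            break
    return attrs, pos
```

For the input `/note /wide #intro "Hello"`, the model returns `{"class": "note wide", "id": "intro"}` and stops at position 3, which leaves the string for whichever rule follows.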
These rules are output directly as they are input (verbatim). They are the first rules in the HTML dialect, as the input is not processed at all.
verbatim-rules: [ value-types | tag! | lit-word! | path! | lit-path! | refinement! | datatype! | logic! ]
These rules are used for the TAG command.
eval-rules: [any ['do block-types | any-type!]]
These rules produce various types of common tags and links and are used as base for higher levels of rules.
base-rules: [ '=== word! opt ['opts block! cell-types] cell-types | 'tag into eval-rules | 'end-tag | block-types | 'do block-types | ['newline | 'crlf] ]
These are the rules used with the at command to produce links using various input formats.
link-rules: [ 'at [ 2 cell-types any [ 'vars [block! | object! | get-word!] | 'words block! ] ] ]
These are the rules for producing image references.
image-rules: [ 'image cell-types ]
These are the main rules for building an HTML table. It uses several sub-rules which are described below.
table-rules: [ 'table opt 'debug [0 2 [set-class | set-id]] any [ [ 'format any [row-format-rule block-types] any [ 'rows [ set-val | into table-format-rules | table-format-rules ] ] ] | [ 'rows [ get-word! | into table-row-rules | table-row-rules | into table-block-rules ] ] ] ]
This rule is used to determine the type of row used for a particular format block.
row-format-rule: [ opt [ 'first | 'even-last | 'odd-last | 'last | 'odd | 'even | 'any ] ]
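This section does not say how a row chooses among several format blocks. The Python sketch below shows one plausible reading, and every part of it is an assumption. `first` and `last` match by position. `odd` and `even` match by 1-based row number. `odd-last` and `even-last` match only a last row with that parity. `any` always matches. The keywords are checked in the order row-format-rule lists them. `applies` and `pick_format` are hypothetical names.

```python
# Hypothetical model: the keyword meanings below are assumptions, not taken from the dialect.
FORMAT_ORDER = ["first", "even-last", "odd-last", "last", "odd", "even", "any"]

def applies(keyword, index, total):
    """True if a format tagged with keyword fits row `index` (1-based) of `total` rows."""
    is_last = index == total
    is_odd = index % 2 == 1
    rules = {
        "first": index == 1,
        "even-last": is_last and not is_odd,
        "odd-last": is_last and is_odd,
        "last": is_last,
        "odd": is_odd,
        "even": not is_odd,
        "any": True,
    }
    return rules[keyword]

def pick_format(formats, index, total):
    """formats maps keyword -> format block; an untagged block is assumed to be stored as 'any'."""
    for keyword in FORMAT_ORDER:
        if keyword in formats and applies(keyword, index, total):
            return formats[keyword]
    return None  # no format applies to this row
```

With formats tagged `first`, `odd` and `even`, rows 1 to 4 of a four-row table would receive the first, even, odd and even blocks in turn.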
These rules define how a single row in a table can be shaped.
table-row-rules: [ some [ 'row any [ ['cell | 'header] any [ 'colspan integer! | 'align word! | 'width integer! opt 'percent | set-class | set-id ] [none! | cell-types] ] ] ]
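A small Python model of table-row-rules may make the grammar easier to follow. It is a hand-written matcher and not the REBOL PARSE implementation. `Word`, `Refinement` and `Issue` are hypothetical stand-ins for word!, refinement! and issue!. Any single value is accepted as cell content, which simplifies cell-types. The returned row structure is invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:        # hypothetical stand-in for word!, e.g. row, cell, align
    name: str

@dataclass(frozen=True)
class Refinement:  # hypothetical stand-in for refinement!
    name: str

@dataclass(frozen=True)
class Issue:       # hypothetical stand-in for issue!
    name: str

def is_word(tok, name):
    return isinstance(tok, Word) and tok.name == name

def match_cell_options(toks, pos):
    """any [ 'colspan integer! | 'align word! | 'width integer! opt 'percent | set-class | set-id ]"""
    opts = {}
    while pos < len(toks):
        tok = toks[pos]
        nxt = toks[pos + 1] if pos + 1 < len(toks) else None
        if is_word(tok, "colspan") and type(nxt) is int:
            opts["colspan"] = nxt
            pos += 2
        elif is_word(tok, "align") and isinstance(nxt, Word):
            opts["align"] = nxt.name
            pos += 2
        elif is_word(tok, "width") and type(nxt) is int:
            pos += 2
            if pos < len(toks) and is_word(toks[pos], "percent"):  # opt 'percent
                opts["width"] = f"{nxt}%"
                pos += 1
            else:
                opts["width"] = str(nxt)
        elif isinstance(tok, Refinement):  # set-class: [some refinement!]
            classes = []
            while pos < len(toks) and isinstance(toks[pos], Refinement):
                classes.append(toks[pos].name)
                pos += 1
            opts["class"] = " ".join(classes)
        elif isinstance(tok, Issue):       # set-id: [issue!]
            opts["id"] = tok.name
            pos += 1
        else:
            break  # 'any' stops at the first token that is not an option
    return opts, pos

def match_table_rows(toks):
    """some [ 'row any [ ['cell | 'header] <options> [none! | cell-types] ] ]

    Returns a list of rows, each a list of (kind, options, value), or None on mismatch.
    """
    pos, rows = 0, []
    while pos < len(toks) and is_word(toks[pos], "row"):
        pos += 1
        cells = []
        while pos < len(toks) and (is_word(toks[pos], "cell") or is_word(toks[pos], "header")):
            kind = toks[pos].name
            opts, pos = match_cell_options(toks, pos + 1)
            if pos >= len(toks):
                return None  # every cell needs content, even if it is none
            cells.append((kind, opts, toks[pos]))
            pos += 1
        rows.append(cells)
    if not rows or pos != len(toks):
        return None  # 'some' needs one row, and 'into' must consume the whole block
    return rows
```

Because the option loop mirrors `any`, a keyword such as `colspan` that is not followed by an integer ends the options, and the keyword itself is then taken as the cell's content.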
This table rule is used to generate a table cell, where there are multiple columns per row in the input data, or the input data consists of objects.
This table rule is used to generate a table cell, where there is only one column per row in the input data and the input data does not consist of objects.
These rules are used after the format command and are structurally identical to table-block-rules. However, when blocks of blocks or plain blocks are used as input, the formatting is ignored.
table-format-rules: [ any [object! | into [any table-cell-rule] | table-row-rule] ]
These rules are used after the rows command without using format first. This means objects are just output cell by cell. The data row is parsed the same way as the formatting row.
table-block-rules: [ any [object! | into [any table-cell-rule] | table-row-rule] ]
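The following Python sketch gives a rough picture of rows without format. Each object, modelled as a dict, or plain block, modelled as a list, becomes one table row, and its values are output cell by cell. The function name `rows_to_html` and the exact markup are assumptions, and the real parser output may differ.

```python
from html import escape

def rows_to_html(rows):
    """Hypothetical model of 'rows with no 'format: one <tr> per object or block,
    one <td> per value, in the order the values appear."""
    out = []
    for row in rows:
        values = row.values() if isinstance(row, dict) else row  # object fields or block items
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in values)
        out.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(out) + "</table>"
```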
This rule is used where a plain HTML tag is wanted, and it can be used recursively.
tag-rule: [ [ 'html | 'head | 'title | 'body | 'p | 'strong | 'em | 'b | 'i | 'u | 'tt | 'big | 'small | 'strike | 'del | 'pre | 'ul | 'ol | 'li | 'sup | 'sub | 'samp | 'code | 'blockquote | 'q | 'kbd | 'var | 'cite | 'tr | 'th | 'td | 'table | 'a | 'div | 'span | 'dl | 'dt | 'dd | 'h1 | 'h2 | 'h3 | 'h4 | 'h5 | 'h6 ] 0 2 [set-class | set-id] opt ['id cell-types] [tag-rule | get-word! | cell-types | ()] ]
It looks redundant here, but in the source code, this rule collects all tags properly from recursive runs of tag-rule and generates the required HTML code.
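The recursion in tag-rule lets tags nest. After the tag word and any optional class, id and `'id` value, the content is tried first as another tag-rule and then as a value. The Python sketch below models that as an illustration and is not the dialect's code. `Word`, `Refinement` and `Issue` are hypothetical stand-ins for REBOL datatypes. `TAGS` is a small subset of the tag words above. The get-word! alternative is not modelled, and the emitted markup is assumed.

```python
from dataclasses import dataclass
from html import escape

@dataclass(frozen=True)
class Word:        # hypothetical stand-in for word!
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Refinement:  # hypothetical stand-in for refinement!
    name: str

@dataclass(frozen=True)
class Issue:       # hypothetical stand-in for issue!
    name: str

TAGS = {"div", "p", "span", "strong", "em", "h1", "li"}  # subset of tag-rule's words

def render_tag(toks, pos=0):
    """Render one tag-rule match starting at toks[pos]; returns (html, new position)."""
    name = toks[pos].name
    pos += 1
    attrs = {}
    for _ in range(2):  # 0 2 [set-class | set-id]
        if pos < len(toks) and isinstance(toks[pos], Refinement):
            classes = []
            while pos < len(toks) and isinstance(toks[pos], Refinement):
                classes.append(toks[pos].name)
                pos += 1
            attrs["class"] = " ".join(classes)
        elif pos < len(toks) and isinstance(toks[pos], Issue):
            attrs["id"] = toks[pos].name
            pos += 1
        else:
            break
    if pos + 1 < len(toks) and toks[pos] == Word("id"):  # opt ['id cell-types]
        attrs["id"] = str(toks[pos + 1])
        pos += 2
    # [tag-rule | ... | cell-types | ()]: a nested tag is tried before a plain value
    if pos < len(toks) and isinstance(toks[pos], Word) and toks[pos].name in TAGS:
        inner, pos = render_tag(toks, pos)
    elif pos < len(toks):
        inner = escape(str(toks[pos]))
        pos += 1
    else:
        inner = ""
    attr_text = "".join(f' {key}="{escape(value)}"' for key, value in attrs.items())
    return f"<{name}{attr_text}>{inner}</{name}>", pos
```

For the input `div /box p strong "Hi & bye"`, the model produces `<div class="box"><p><strong>Hi &amp; bye</strong></p></div>`. Each tag word opens a level that closes once its single content item has been rendered.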
These rules produce loops, and allow traversing data blocks either wholly or partially.
loop-rules: [ 'loop integer! block-types opt ['alternate block-types] | 'traverse [block-types | 'do block-types | get-word!] opt ['using [lit-word! | word! | block-types | get-word!]] block-types opt ['alternate block-types] ]
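The source does not describe what `alternate` does. This sketch assumes a common reading that the source does not confirm: the alternate block replaces the main block on every second iteration, for example to stripe table rows. The Python code below models `'loop` and `'traverse` under that assumption. The `'using` option is omitted, and `run_loop` and `traverse` are hypothetical names.

```python
def run_loop(count, body, alternate=None):
    """Hypothetical model of 'loop integer! block opt ['alternate block].

    body and alternate are callables that receive the 1-based iteration number.
    Assumption: when alternate is given, it replaces body on every even iteration.
    """
    return [
        (alternate if alternate is not None and i % 2 == 0 else body)(i)
        for i in range(1, count + 1)
    ]

def traverse(data, body, alternate=None):
    """Hypothetical model of 'traverse: body (or, by the same assumption, alternate)
    receives each item of data in turn. The 'using option is not modelled."""
    return [
        (alternate if alternate is not None and i % 2 == 0 else body)(item)
        for i, item in enumerate(data, start=1)
    ]
```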
These rules allow special formatting parsers for text. They are meant to be extended later and are of limited use for now.
text-format-rules: [ 'format [word! function! | 'type word! cell-types | cell-types] ]
These rules produce form tags and are considered a higher level of rules. They also manage the form content, either from words or a specific form object.
form-rules: [ 'form cell-types opt [get-word! | ['vars | object!]] opt ['onsubmit event-types] cell-types | 'textarea word! | ['text | 'hidden | 'password] word! | 'checkbox word! | 'radio word! cell-types | 'select word! ['values | 'key-values | ()] cell-types | 'button word! string! | ['submit | 'reset | 'button] string! ]
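Form-rules let field content come from words or from a form object. The Python sketch below makes two assumptions for illustration only. First, the form object behaves like a mapping from field names to values. Second, text, hidden and password fields and textareas render as the usual HTML controls. `render_field` is a hypothetical name and is not part of the dialect.

```python
from html import escape

def render_field(kind, name, values):
    """Hypothetical model of 'textarea word! and ['text | 'hidden | 'password] word!.

    values stands in for the form object (or the words) supplying field content;
    a name with no value yields an empty field.
    """
    value = escape(str(values.get(name, "")))  # escape user data before output
    if kind == "textarea":
        return f'<textarea name="{name}">{value}</textarea>'
    if kind in ("text", "hidden", "password"):
        return f'<input type="{kind}" name="{name}" value="{value}">'
    raise ValueError(f"unsupported field kind: {kind}")
```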
These rules produce the outer skeleton of the webpage by providing the HEAD and BODY section.
page-rules: [ 'page cell-types any [ ['redirect | 'refresh] href-types integer! | 'favicon href-types | 'charset [string! | word!] | 'description string! | 'robots into [ some [ 'noindex | 'index | 'nofollow | 'follow | 'noarchive | 'nosnippet | 'noodp | 'noydir ] ] | 'css href-types | 'rss href-types string! | 'atom href-types string! | 'script href-types | 'style string! | 'meta ['name | 'http-equiv] 2 cell-types ] block-types ]
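The `'robots` option takes a block containing one or more of the listed keywords (`into [some [...]]`). In standard HTML, these directives are written into a robots meta tag. The Python sketch below assumes the dialect emits that conventional form and shows the validation the rule implies. `robots_meta` is a hypothetical name.

```python
from html import escape

# Keywords accepted inside the 'robots block of page-rules.
ROBOTS_WORDS = {"noindex", "index", "nofollow", "follow", "noarchive", "nosnippet", "noodp", "noydir"}

def robots_meta(words):
    """Hypothetical model of 'robots into [some [...]]: validate, then emit one meta tag."""
    if not words:
        raise ValueError("robots needs at least one keyword")  # 'some' needs one or more
    unknown = [w for w in words if w not in ROBOTS_WORDS]
    if unknown:
        # 'into' must match the whole block, so any unknown word fails the rule
        raise ValueError(f"unknown robots keyword(s): {unknown}")
    return f'<meta name="robots" content="{escape(", ".join(words))}">'
```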
These rules (actually only one rule for now) are used for handling and printing errors generated by the parser during HTML creation. They will be extended later to become more useful.
The word rules match words from two kinds of lists: lists that are dynamic to the parser, i.e. built during parsing, and lists that are built into the HTML dialect, such as the word list in doc-types.
word-rules: [ doc-types | set-word! [word! | value-types | block-types] | get-word! | word! ]
This rule collects all of the rules above. It is used directly by the parser, and the order of its alternatives is the order in which they are evaluated.
all-rules: [ any [ verbatim-rules | base-rules | link-rules | image-rules | table-rules | tag-rules | loop-rules | text-format-rules | form-rules | page-rules | error-rules | word-rules ] ]
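all-rules is an `any` over a list of alternatives, so at each position PARSE tries the alternatives in order and takes the first one that matches. For this reason the order of the alternatives matters. The toy Python model below is not the parser's code and uses three stand-in rules to show this behaviour. `Word`, `first_of`, `parse_all` and the stand-in rule functions are hypothetical.

```python
class Word(str):
    """Hypothetical stand-in for a REBOL word!; plain str values stand in for string!."""

def verbatim(tokens, pos):   # stand-in for verbatim-rules: strings pass through
    return pos + 1 if type(tokens[pos]) is str else None

def newline(tokens, pos):    # stand-in for the 'newline keyword in base-rules
    tok = tokens[pos]
    return pos + 1 if isinstance(tok, Word) and tok == "newline" else None

def any_word(tokens, pos):   # stand-in for the final word! alternative of word-rules
    return pos + 1 if isinstance(tokens[pos], Word) else None

def first_of(named_rules):
    """Ordered choice, like [a | b | c]: the first alternative that matches wins."""
    def match(tokens, pos):
        for name, rule in named_rules:
            end = rule(tokens, pos)
            if end is not None:
                return name, end
        return None
    return match

def parse_all(tokens, choice):
    """Model of any [...]: apply the choice until it fails; report who consumed what."""
    pos, trace = 0, []
    while pos < len(tokens):
        result = choice(tokens, pos)
        if result is None or result[1] == pos:
            break  # no alternative matched, so 'any' stops here
        name, end = result
        trace.append((name, tokens[pos:end]))
        pos = end
    return trace, pos == len(tokens)  # PARSE succeeds only at the end of input

all_rules = first_of([("verbatim-rules", verbatim), ("base-rules", newline), ("word-rules", any_word)])
```

The generic word! alternative would match any word. If word-rules were listed before base-rules, it would swallow the `newline` keyword before base-rules could see it. This is presumably why word-rules comes last in all-rules.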