RFC2396, Mgrammar, 0.2

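I am in the process of translating the Uniform Resource Identifier (URI) grammar, alias RFC 2396, a key component of the Web architecture, into Mgrammar. The URI syntax is expressed in Augmented Backus-Naur Form (ABNF) notation [RFC2234]. Note that the rule names used below (pct_encoded, IP_literal, path_abempty and so on) follow RFC 3986, which obsoletes RFC 2396.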

The translation is quite straightforward. There still appear to be some errors in the draft, and the projections are missing. Once the grammar is 100% correct, I will add the projections so that the resulting Mgraph value is well-defined and easy to use.

Any feedback and fixes are appreciated. Intellipad, the tool I’m using to write this Mg specification, is still quite unstable, slow and buggy. That is natural for a CTP (Community Technology Preview), but it also means that writing this grammar should get easier as newer CTPs are released.

So, here we go:

module IETF
{
    language URI
    {
        /* syntax rules */
        
        syntax Main
         = scheme ":" hier_part ( "?" query )? ( "#" fragment )?
         //=> URI { Scheme { s }, Hierarchy { h }, Query { q }, Fragment { f } }
         ;
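        syntax hier_part
         // in ABNF, concatenation binds tighter than alternation,
         // so "//" authority belongs to the first alternative only
         = "//" authority path_abempty
         | path_absolute
         | path_rootless
         // | path_empty
         ;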
        syntax URI_reference
         = Main            // the ABNF rule "URI" is named Main in this grammar
         | relative_ref
         ;
        syntax absolute_URI
         = scheme ":" hier_part ( "?" query )?
         ;
        syntax relative_ref
         = relative_part ( "?" query )? ( "#" fragment )?
         ;
        syntax relative_part
         // as in hier_part, "//" authority belongs to the first alternative only
         = "//" authority path_abempty
         | path_absolute
         | path_noscheme
         // | path_empty
         ;
        syntax scheme
         = alpha ( alpha | digit | "+" | "-" | "." )*
         ;
        syntax authority
         = ( userinfo "@" )? host ( ":" port )?
         ;
        syntax userinfo
         = ( unreserved | pct_encoded | sub_delims | ":" )*
         ;
        syntax host
         = IP_literal | IPv4address | reg_name
         ;
        syntax port
         = digit*
         ;
        syntax query
         = ( pchar | "/" | "?" )*
         ;
        syntax fragment
         = ( pchar | "/" | "?" )*
         ;
        syntax IP_literal
         = "[" ( IPv6address | IPvFuture ) "]"
         ;
        syntax IPvFuture
         = "v" hexdig+ "." ( unreserved | sub_delims | ":" )+
         ;
        syntax reg_name
         = ( unreserved | pct_encoded | sub_delims )*
         ;
        syntax path
         = path_abempty    // begins with "/" or is empty
         | path_absolute   // begins with "/" but not "//"
         | path_noscheme   // begins with a non-colon segment
         | path_rootless   // begins with a segment
        // | path_empty      // zero characters
         ;
        syntax path_abempty
         = ( "/" segment )*
         ;
        syntax path_absolute
         // ABNF: "/" [ segment-nz *( "/" segment ) ], so the tail is optional
         = "/" ( segment_nz ( "/" segment )* )?
         ;
        syntax path_noscheme
         = segment_nz_nc ( "/" segment )*
         ;
        syntax path_rootless
         = segment_nz ( "/" segment )*
         ;
        syntax segment
         = pchar*
         ;
        
        /* token rules */
        
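        token IPv6address
         // ABNF "*n( h16 ":" )" means at most n repetitions and "[ ... ]" is optional,
         // hence the #0..n ranges and the trailing "?" on each prefix group
         =                               ( h16 ":" )#6 ls32
         |                          "::" ( h16 ":" )#5 ls32
         | (                  h16 )? "::" ( h16 ":" )#4 ls32
         | ( ( h16 ":" )#0..1 h16 )? "::" ( h16 ":" )#3 ls32
         | ( ( h16 ":" )#0..2 h16 )? "::" ( h16 ":" )#2 ls32
         | ( ( h16 ":" )#0..3 h16 )? "::"   h16 ":"     ls32
         | ( ( h16 ":" )#0..4 h16 )? "::"               ls32
         | ( ( h16 ":" )#0..5 h16 )? "::"               h16
         | ( ( h16 ":" )#0..6 h16 )? "::"
         ;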
        token h16
         = hexdig#1..4
         ;
        token ls32
         = ( h16 ":" h16 )
         | IPv4address
         ;
        token IPv4address
         = dec_octet "." dec_octet "." dec_octet "." dec_octet
         ;
        token dec_octet
         = "0".."9"                  // 0-9
         | "1".."9" "0".."9"         // 10-99
         | "1" "0".."9" "0".."9"     // 100-199
         | "2" "0".."4" "0".."9"     // 200-249
         | "2" "5" "0".."5"          // 250-255
         ;
        // token path_empty
        // = pchar#0
        // ;
        token segment_nz
         = pchar+
         ;
        token segment_nz_nc
         = ( unreserved | pct_encoded | sub_delims | "@" )+
            // non-zero-length segment without any colon ":"
            // translates to this? => segment_nz - ":"
         ;
        token pchar
         = unreserved | pct_encoded | sub_delims | ":" | "@"
         ;
        token pct_encoded
         = "%" hexdig hexdig
         ;
        token unreserved
         = alpha
         | digit
         | "-"
         | "."
         | "_"
         | "~"
         ;
        token reserved
         = gen_delims
         | sub_delims
         ;
        token gen_delims
         = ":"
         | "/"
         | "?"
         | "#"
         | "["
         | "]"
         | "@"
         ;
        token sub_delims
         = "!"
         | "$"
         | "&"
         | "'"
         | "("
         | ")"
         | "*"
         | "+"
         | ","
         | ";"
         | "="
         ;
        token digit
         = "0".."9"
         ;
        token alpha
         = lower
         | upper
         ;
        token lower
         = "a".."z"
         ;
        token upper
         = "A".."Z"
         ;
        token hexdig
         = digit
         | "a".."f"
         | "A".."F"
         ;
    }
}
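
One way to check a token rule such as dec_octet without going through Intellipad is to mirror it as a regular expression and run it over every candidate value. The following is a minimal sketch in Python, assuming the dec-octet alternatives as given in RFC 3986; the names DEC_OCTET, is_dec_octet and mismatches are made up for this illustration and are not part of the grammar.

```python
import re

# Hypothetical helper names. Each alternative mirrors one branch
# of the ABNF dec-octet rule.
DEC_OCTET = re.compile(
    r"(?:[0-9]"        # 0-9
    r"|[1-9][0-9]"     # 10-99
    r"|1[0-9][0-9]"    # 100-199
    r"|2[0-4][0-9]"    # 200-249
    r"|25[0-5])"       # 250-255
)


def is_dec_octet(text):
    """Return True if text is exactly one decimal octet without leading zeros."""
    return DEC_OCTET.fullmatch(text) is not None


def mismatches():
    """Return the numbers in 0..999 where the pattern disagrees with the 0-255 range."""
    return [n for n in range(1000) if is_dec_octet(str(n)) != (n <= 255)]
```

An empty list from mismatches() means every branch covers its intended range. Dropping a digit from one of the character classes makes the gap show up immediately as a list of missing values.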

0.1: 2009.02.18 - First cleanup. Still not usable, but now the grammar compiles.
0.2: 2009.02.19 - Read up on ABNF and corrected a couple of mistakes. Still not quite kosher in Intellipad.

To-do
- 100% correctness
- Projections
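
As a rough preview of the shape the projections could give the output, here is a minimal Python sketch that splits a URI into the four fields named in the commented-out projection on Main (Scheme, Hierarchy, Query, Fragment). It uses the component-splitting regular expression from RFC 3986, Appendix B, which only separates the parts and does not validate them against the grammar. The function name project and the dictionary layout are assumptions for illustration only.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits a URI reference
# into components but does not check them against the full grammar.
URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def project(uri):
    """Split uri into the fields of the sketched URI { ... } projection."""
    m = URI_PARTS.match(uri)  # every group is optional, so this always matches
    return {
        "Scheme": m.group(2),
        # Hierarchy is kept as raw text here: "//authority" (if any) plus path.
        "Hierarchy": (m.group(3) or "") + m.group(5),
        "Query": m.group(7),
        "Fragment": m.group(9),
    }
```

For example, project("http://www.ics.uci.edu/pub/ietf/uri/?a=b#Related") yields Scheme "http", Hierarchy "//www.ics.uci.edu/pub/ietf/uri/", Query "a=b" and Fragment "Related". Absent components come back as None, which is roughly where an optional element would be missing in the Mgraph.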

Enjoy!

About xosfaere

Software Developer

2 Responses to RFC2396, Mgrammar, 0.2

  1. Very nice post, thanks man!

    • xosfaere says:

      No problem. I still very much need to complete it as transparent URI processing has been high on my wish-list. The idea that “there is no microformat”, so to speak.
