Last modified: 2014-11-17 10:35:25 UTC
It is important to start a project to give the exact EBNF syntax, which contains all the subtleties of the wiki syntax.
(In reply to comment #0)
> It is important to start a project to give the exact EBNF syntax, which contains all the subtleties of the wiki syntax.
Why don't you start a meta page with the basic framework?
[[meta:EBNF]] http://www.garshol.priv.no/download/text/bnf.html http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html (I didn't know what ebnf stood for...)
I boggled my mind over this recently. What exactly would the [E]BNF for Wiki Syntax describe? In theoretical computer science, formal grammars are used to generate a language (a set of strings). Some grammars can be turned into a characteristic algorithm, i.e. one that determines if a given string is in the language. The algorithm is said to "accept" or "reject" input strings. However, MediaWiki is supposed to accept *ALL* strings: all strings are valid inputs and are turned into some valid XHTML. In practice, grammars are used to write parsers such as the one I'm currently working on. Here, the grammar tells the parser what to do - or more precisely, the production rules do, and as such, they sort of set out the semantics of the mark-up. But how do you clarify semantics without the production rules? Makes you wonder about stuff :)
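To make the accept/reject versus accept-everything distinction concrete, here is a minimal Python sketch. It is a toy, not MediaWiki code: the link pattern `LINK` and the functions `recognizes` and `render` are hypothetical names, and the "language" is just plain text with `[[Target]]` links. The recognizer rejects malformed input, while the renderer is total and leaves malformed markup as literal text.

```python
import html
import re

# Toy link syntax (an assumption for illustration): [[Target]] with no nesting.
LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")

def recognizes(text: str) -> bool:
    """Recognizer: accept only plain text mixed with well-formed [[links]]."""
    rest = LINK.sub("", text)
    return "[[" not in rest and "]]" not in rest

def render(text: str) -> str:
    """Total function: every string produces output; bad markup stays literal."""
    out = []
    pos = 0
    for match in LINK.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        target = html.escape(match.group(1))
        out.append(f'<a href="{target}">{target}</a>')
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

With this sketch, `recognizes("see [[Foo")` is false, yet `render("see [[Foo")` still returns the text unchanged, which is the behaviour a plain recognizer grammar does not describe.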
Oh, and I forgot to mention this. EBNF seems to be for context-free grammars only. The MediaWiki syntax for lists is not context-free however. I am circumventing this in my parser by using a post-processing step, but if you're only writing BNF, you can't do that...
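A post-processing step of the kind mentioned above could look roughly like the following Python sketch. This is not the parser discussed in this comment; it assumes only `*` bullets, and `tokenize` and `nest` are hypothetical names. The first pass records each item's depth as a flat pair, and the second pass turns the depths into nesting, which is the part a pure BNF cannot express.

```python
def tokenize(lines):
    """Pass 1: flat (depth, text) pairs; depth is the number of leading '*'."""
    items = []
    for line in lines:
        stripped = line.lstrip("*")
        depth = len(line) - len(stripped)
        if depth:  # lines without a bullet prefix are ignored in this sketch
            items.append((depth, stripped.strip()))
    return items

def nest(items):
    """Pass 2: post-process the flat items into nested Python lists."""
    root = []
    stack = [root]  # stack[-1] is the list open at the current depth
    for depth, text in items:
        while len(stack) < depth:   # going deeper: open new sublists
            child = []
            stack[-1].append(child)
            stack.append(child)
        while len(stack) > depth:   # going shallower: close sublists
            stack.pop()
        stack[-1].append(text)
    return root
```

For example, `nest(tokenize(["* a", "** b", "** c", "* d"]))` yields `["a", ["b", "c"], "d"]`.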
(In reply to comment #4)
> Oh, and I forgot to mention this. EBNF seems to be for context-free grammars only. The MediaWiki syntax for lists is not context-free however. I am circumventing this in my parser by using a post-processing step, but if you're only writing BNF, you can't do that...
In light of that, is this bug WONTFIX? Or is it possible to describe wiki syntax in some sort of pseudo-BNF, short of duplicating your flex/bison parser?
This bug is a "go write it on Meta" fix. ;-)
Not sure I understand why this was closed. A formal grammar is something we really need (and it may require fixes to the grammar as well ;)
Some work has been going on at mediawiki.org (http://www.mediawiki.org/wiki/Markup_spec and http://www.mediawiki.org/wiki/Markup_spec/BNF/). It's early days and any input would be appreciated.
Another effort on Meta: http://meta.wikimedia.org/wiki/Wikitext_Metasyntax
A hopefully complete representation of the MW 1.12 preprocessor in ABNF is at: http://www.mediawiki.org/wiki/Preprocessor_ABNF
Please note that the set of production rules alone does not allow you to derive the correct parse tree from a given input text. Wikitext is ambiguous in lots of complex and interesting ways. The disambiguation rules need to be specified along with the grammar. I found the preprocessor ABNF project an enlightening exercise. You can say a lot about the syntax in a short space. And while I attempted to explain the disambiguation process, I know of no way to do this rigorously, without resorting to writing algorithms.
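As a toy illustration of why production rules alone are not enough, consider a run of opening braces, where `{{` opens a template and `{{{` opens an argument. The Python sketch below only enumerates the ways a run of n braces can be split into those two openers; it deliberately ignores everything else the real preprocessor considers, and `segmentations` and `prefer_longest_first` are hypothetical names. The tie-break shown is an arbitrary example of a disambiguation rule, not MediaWiki's actual one.

```python
def segmentations(n, parts=(2, 3)):
    """List every way to split a run of n braces into openers of the given sizes."""
    if n == 0:
        return [[]]
    result = []
    for size in parts:
        if size <= n:
            for rest in segmentations(n - size, parts):
                result.append([size] + rest)
    return result

def prefer_longest_first(n):
    """Hypothetical disambiguation rule: pick the split whose first opener is longest."""
    candidates = segmentations(n)
    return max(candidates) if candidates else None
```

A run of five braces has two splits, `[2, 3]` and `[3, 2]`; the grammar permits both, and only an extra rule such as the one sketched here selects a single reading.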
It seems that with http://www.mediawiki.org/wiki/Preprocessor_ABNF this bug is fixed.
No, it is not fixed. That page only describes a tiny portion of parser behaviour.
We have a fairly complete PEG tokenizer grammar in Parsoid (http://www.mediawiki.org/wiki/Parsoid), which describes the context-free portions of wikitext. Context-sensitive portions are handled in token stream transformers. The PEG parse tree is flattened to a token stream so that we can support unbalanced template expansions, and finally converted into a DOM using a tree builder library according to the error recovery algorithms described in the HTML5 spec. The grammar is interspersed with actions and uses syntactic scope flags to compress the grammar productions a bit, so it is not the most readable grammar ever. Unrolling productions for all scope permutations might not help that much either, as this would increase the size of the grammar a lot.
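The flatten-then-rebuild idea can be sketched in a few lines of Python. This is not Parsoid's code and not the HTML5 tree construction algorithm: the tuple-based tree shape, the token format, and the names `flatten` and `build_tree` are assumptions made for illustration, the transformer stage is omitted, and the recovery rule is deliberately simplistic (close open elements up to the matching one, ignore stray end tokens).

```python
def flatten(node):
    """Flatten a nested (tag, children) tree into start/text/end tokens."""
    if isinstance(node, str):
        return [("text", node)]
    tag, children = node
    tokens = [("start", tag)]
    for child in children:
        tokens.extend(flatten(child))
    tokens.append(("end", tag))
    return tokens

def build_tree(tokens):
    """Rebuild a tree from tokens, tolerating unbalanced start/end tokens."""
    root = ("root", [])
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1][1].append(value)
        elif kind == "start":
            node = (value, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "end":
            open_tags = [n[0] for n in stack[1:]]
            if value in open_tags:
                # Close everything up to and including the matching element.
                while stack.pop()[0] != value:
                    pass
            # A stray end token with no matching open element is ignored.
    return root
```

Because the builder works on a flat stream, a token sequence with a missing or extra end token (as an unbalanced template expansion might produce) still yields a well-formed tree.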
Describing all of WikiText in EBNF is simply impossible, as parts of it are context-sensitive. Closing as wontfix for that reason.