Content Tagged with ANTLR + parser

ANTLR Cheat Sheet - ANTLR 3 - ANTLR Project

Cheat sheet on ANTLR's syntax. ANTLR is another language parser generator.

opensource: del.icio.us tag/opensource

MF Bliki: dsl

external DSLs is that it's hard to write a parser. Indeed one of the justifications for using XML as the carrier syntax for an external DSL is that "you

XML: del.icio.us/tag/xml

ANTLR Parser Generator

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

opensource: del.icio.us tag/opensource

The case for a parser replacement

Following up on the last post, I want to give my rationale behind why I think it would be beneficial to produce a replacement for MySQL's recognizer.

Beware, this post has gotten much longer than I intended...

Let me address the common objections first.

Common objections

But, we have an implementation already

Yes, but it is based on YACC and a handwritten lexer, so we are constrained to C/C++, which rules out many interesting applications.

Although there are YACC implementations for languages other than C/C++, they are different products, with different authors, different codebases and different behavior. Also, it's hard to add a new target language because essentially you have to rewrite YACC for that language, thus porting the whole parser generator and not only the code generator. An error-prone and tedious exercise.

The lexer on the other hand must be rewritten completely for each new target language, simply because it's not generated code at all.

Most languages can bind to C/C++ code

Also true, but the point to consider is that ripping out the lexer and parser implementations to use them independently of mysqld is hard. Both of them are closely tied to the server code and contain a great many actions that rely on the surrounding infrastructure.

Aside: I'm told the Workbench Team has done something like this, to avoid having to rewrite the recognizer. Given their goal (produce a developer tool and actually ship it - hooray, btw :)) that was a smart move. My goals are different and I'm not constrained by deadlines regarding this long-term project.

One other point is that sometimes it's not trivial or desired to use C/C++ code in projects. A native language implementation has its advantages, for example debugger support.

A new implementation brings new bugs

Of course it will. But the existing implementation contains bugs, too. By concentrating on the actual language grammar (not its implementation!) and going through the steps to create a new grammar we will find bugs in the existing implementation and that will help everyone. This is not a theoretical advantage, because I have found a bug that leads to a server crash, by writing and testing a new grammar. Taking the step back and thinking about how the language is supposed to work helps to uncover flaws. Parser bugs can become glaringly obvious this way.

Having two implementations also helps with squashing bugs in both. Taking a lot of queries and running them through both will help discover bugs and thus make both of them more robust.
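The idea of running the same queries through both implementations can be sketched roughly as follows. This is only an illustration: `old_parser` and `new_parser` are made-up toy recognizers, not MySQL's or ANTLR's code, and a real harness would compare syntax trees rather than a plain accept/reject answer.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins for two recognizers; neither is MySQL's real parser.
# Each takes a query string and returns True if it accepts the query.
Parser = Callable[[str], bool]


def old_parser(query: str) -> bool:
    # Toy rule: only an upper-case SELECT is accepted.
    return query.strip().startswith("SELECT")


def new_parser(query: str) -> bool:
    # Toy rule: SELECT is accepted in any letter case.
    return query.strip().upper().startswith("SELECT")


def outcome(parser: Parser, query: str) -> str:
    """Classify one run as accept, reject, or crash."""
    try:
        return "accept" if parser(query) else "reject"
    except Exception as exc:  # a crash is a finding too
        return "crash: " + type(exc).__name__


def find_disagreements(
    queries: Iterable[str], a: Parser, b: Parser
) -> List[Tuple[str, str, str]]:
    """Return (query, outcome_a, outcome_b) wherever the two parsers differ."""
    found = []
    for query in queries:
        result_a, result_b = outcome(a, query), outcome(b, query)
        if result_a != result_b:
            found.append((query, result_a, result_b))
    return found


if __name__ == "__main__":
    corpus = ["SELECT 1", "select 1", "DROP TABLE t"]
    for row in find_disagreements(corpus, old_parser, new_parser):
        print(row)
```

Every reported row is a query worth looking at by hand: at least one of the two implementations is wrong about it, or the language definition itself is ambiguous.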

It adds no value to MySQL (= no new feature)

I disagree wholeheartedly. The current state of the parser subsystem is bad. Almost no one really understands what's going on in there, there's no clear separation of code, and it's virtually impossible to create a plugin interface for parsers.

The latter stems from the fact that the parser does a lot more than just parsing. It sets all kinds of hints for later subsystems, there's no clear API for building trees that are used to drive the query execution and it's all being done inline. In short, a tangled mess.

Cleaning that code up will add immense value to MySQL because then it will be possible to support different dialects of SQL or even completely different query languages (as long as it's possible to map them to something that makes sense for the rest of the relational model in MySQL).

I think it is vitally important for MySQL to move to a pluggable architecture all the way through the server. Just look at how much good the Storage Engine concept has done for MySQL. Let's extend that to other parts and make the code more accessible to the community developers. (At the same time our own developers will have a better time, too :))

LL-parsing is for dummies

Many people I have spoken with about my proposal take offense at the LL parsing algorithm ANTLR uses. The general reason given is that LALR(1) parsing is so much more efficient. I do not think that ANTLR, with its LL(*) algorithm, needs to shy away from a comparison in terms of efficiency, but only real benchmarking can tell how big the impact really is. I'm inclined to think that it is actually negligible and that the clarity of implementation and the gain in productivity, as well as the new possibilities that come with the use of ANTLR, outweigh those concerns.

New possibilities

As mentioned above, having a pluggable parser architecture with ANTLR would make it possible to

  • test the parser(s) in isolation, finally making unit tests possible for that subsystem
    • Does anyone doubt that this is a good thing?
  • implement different query languages or dialects for use with MySQL, without having to go through some sort of middleware to provide that.
    • I know people wanted to explore MDX support in MySQL, which is a huge task given the current codebase.
    • Even dynamically changing parsers within one connection is a possibility.
  • enable stuff like query tree rewriting at the grammar level.
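To make the pluggable-parser idea a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: the dialect names, the `Session` class and the two toy parsers are invented for illustration and have nothing to do with the actual server code. The point is only that two dialects can map to the same tree, that the parser can be switched within one session, and that each parser can be tested on its own.

```python
from typing import Callable, Dict

# Hypothetical plugin interface: a parser turns query text into a small tree
# (a plain dict here). Both toy parsers understand exactly one query shape.
Tree = Dict[str, object]
ParserPlugin = Callable[[str], Tree]


def parse_limit_style(query: str) -> Tree:
    # Expects: SELECT <column> FROM <table> LIMIT <n>
    t = query.split()
    if (len(t) != 6 or t[0].upper() != "SELECT"
            or t[2].upper() != "FROM" or t[4].upper() != "LIMIT"):
        raise ValueError("not a limit-style query: " + query)
    return {"column": t[1], "table": t[3], "limit": int(t[5])}


def parse_top_style(query: str) -> Tree:
    # Expects: SELECT TOP <n> <column> FROM <table>
    t = query.split()
    if (len(t) != 6 or t[0].upper() != "SELECT"
            or t[1].upper() != "TOP" or t[4].upper() != "FROM"):
        raise ValueError("not a top-style query: " + query)
    return {"column": t[3], "table": t[5], "limit": int(t[2])}


# Registry of available parser plugins, keyed by a made-up dialect name.
PARSERS: Dict[str, ParserPlugin] = {
    "limit_style": parse_limit_style,
    "top_style": parse_top_style,
}


class Session:
    """One connection; the active parser can be swapped at any time."""

    def __init__(self, parsers: Dict[str, ParserPlugin], dialect: str):
        self._parsers = parsers
        self.use(dialect)

    def use(self, dialect: str) -> None:
        if dialect not in self._parsers:
            raise KeyError("unknown dialect: " + dialect)
        self._dialect = dialect

    def parse(self, query: str) -> Tree:
        return self._parsers[self._dialect](query)


if __name__ == "__main__":
    session = Session(PARSERS, "limit_style")
    print(session.parse("SELECT name FROM users LIMIT 5"))
    session.use("top_style")
    print(session.parse("SELECT TOP 5 name FROM users"))
```

Because both parsers produce the same kind of tree, everything after the parser would not need to know which dialect the query came in.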

Basing a new parser implementation on an actively developed tool has the benefit that we would gain new functionality basically for free. For example, ANTLR will soon support combining existing grammars using composition, making it easy to reuse and modify existing grammars just by specifying the differing rules. An ideal solution for dialects of one language - and dialects SQL has...

A point I briefly mentioned above is ANTLR's capability to generate code for a variety of different target languages, without changing a single line of the grammar itself. There's simply no lock-in to a single language, as there is with YACC or handwritten solutions (actions of course have to be written in the respective target language, but usually that can be factored out into an API). This is a good thing, IMHO, because it would enable heavy reuse of the "official" grammars. Incidentally, that's one major point that comes up as an objection: "Please use the official grammar MySQL uses internally, so it's compatible." With this solution it basically comes for free!

Wouldn't it be awesome if people could just take the grammar file, generate code from it, and use the result in their application? How much easier would it be to make IDEs that "know" about MySQL syntax? Syntax highlighting wouldn't be hard to do anymore. Even query rewriting (and thus support for sharding, parallel queries, backwards compatibility with old schemas, analyzing column or table usage) would be made a lot easier - and there are no regexes involved at all :)

In short: Having easy access to the syntax tree in a language-agnostic way would enable all sorts of interesting applications outside the MySQL server, which I think would be a good thing for the community.

Here's an interesting thought: You could even use a tree building parser and a subsequent phase to check for SQL injections and even filter them out on the fly. Not that you should have SQL injections in the first place, but it's an interesting application.
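A very rough sketch of that injection check, under loud assumptions: instead of a real tree-building parser it uses a tiny hand-written scanner, and it compares the sequence of token kinds rather than a syntax tree. The function names (`token_kinds`, `changes_structure`) are invented for this example. The principle is the same, though: if inserting the user's value changes the structure of the query compared to inserting a harmless value, something is off.

```python
from typing import List


def token_kinds(sql: str) -> List[str]:
    """Scan SQL text into token kinds (simplified; no comments, no backticks)."""
    kinds = []
    i, n = 0, len(sql)
    while i < n:
        c = sql[i]
        if c.isspace():
            i += 1
        elif c == "'":
            i += 1  # skip the opening quote
            while i < n:
                if sql[i] == "'":
                    if i + 1 < n and sql[i + 1] == "'":
                        i += 2  # '' is an escaped quote inside the literal
                        continue
                    break
                i += 1
            if i >= n:
                kinds.append("UNTERMINATED")
            else:
                i += 1  # skip the closing quote
                kinds.append("STRING")
        elif c.isdigit():
            while i < n and sql[i].isdigit():
                i += 1
            kinds.append("NUMBER")
        elif c.isalpha() or c == "_":
            start = i
            while i < n and (sql[i].isalnum() or sql[i] == "_"):
                i += 1
            kinds.append(sql[start:i].upper())  # keyword or identifier
        else:
            kinds.append(c)  # operator or punctuation
            i += 1
    return kinds


def changes_structure(template: str, value: str) -> bool:
    """True if `value` alters the token structure of the filled-in template."""
    baseline = token_kinds(template.format("x"))  # known-harmless value
    actual = token_kinds(template.format(value))
    return baseline != actual


if __name__ == "__main__":
    template = "SELECT * FROM users WHERE name = '{}'"
    for value in ["alice", "O''Brien", "x' OR '1'='1"]:
        print(repr(value), changes_structure(template, value))
```

A filter built on a real parser would compare the trees and could then reject or rewrite the offending query.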

Not to mention how cool it would be to simply tell the MySQL server "The following query is actually using Oracle 8 syntax, please just use the parser for that." :P

MySQL: Planet MySQL

[from wnpxrz] more ANTLR - Java, and comparisons to PLY and PyParsing

"arser. See the first and second essays. There are several reasons to use ANTLR over one of the Python parsers like PLY and PyParsing. The GUI interface is very nice, it has a sophisticated understanding of how to define a grammar, it's top-down approach m

User:jeyrb: jey's network's del.icio.us bookmarks

ANTLR: Easy XML Parsing, based on the ANTLR parser generator

The ANTLR 3 Eclipse Plugin helps you develop ANTLR 3 grammars inside Eclipse. It currently provides a project nature

Eclipse: del.icio.us/tag/eclipse

Fig - Generic configuration language interpreter - ANTLR 3 - ANTLR Project

useful as an intro to ANTLR, but also in its own right as an antidote to XML and .properties config file hell

XML: del.icio.us/tag/xml

Using Parser-Generators to Convert Legacy Data Formats to XML

"A parser-generator is a program which takes a formal description of a grammar (e.g. in BNF) and outputs source code for a parser which will recognise valid strings obeying that grammar and perform actions associated with grammar rules."

XML: del.icio.us/tag/xml

ANTLR Parser Generator

check this out someday ... currently it's out of scope

opensource: del.icio.us tag/opensource

Playing fast and loose with Parsec for parsing in Haskell

"In the Haskell world, a close approximation to ANTLR is available in the form of Parsec, a library for building parsers using combinators."

Haskell: del.icio.us tag/haskell
