» tagged pages
» logout

sorted by: recent | see : popular
Content Tagged with ANTLR + SQL

SQL lexing madness

Continuing on my quest to write a replacement for the MySQL lexer and parser (Hi David! :)), I come across many interesting and slightly disgusting details about SQL in general and MySQL's notion about it specifically.

So here I am reading our fine documentation and our totally "obvious" lexer and parser implementations trying out cornercases to figure out what really is valid and what is not. Finding the occasional bug, of course... I won't link to it from here, but let me assure you, it's a nice one.

Anyway, the latest thing I came across blew my mind:

Consider this simple statement

mysql> CREATE TABLE foo (a INT);

Obvious what it does, isn't it? Totally valid. No problem. Now for fun, let's try to figure out what happens when we use a reserved word:

mysql> CREATE TABLE table (a INT);

Boom:

ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'table (a INT)' at line 1

Ok, fine. I expected that. Extra credit for this one:

mysql> CREATE TABLE test.table (a INT);
Query OK, 0 rows affected (0.04 sec)

Wait a moment...! That ain't right!

Except it is. Puzzled I dug around the parser's YACC grammar and couldn't find any indication that a reserved word is allowed in that place at all. In fact, it expects an identifier:

table_ident:
      ident { $$=new Table_ident($1); }
    | ident '.' ident { $$=new Table_ident(YYTHD, $1,$3,0);}
    | '.' ident { $$=new Table_ident($2);} /* For Delphi */
    ;

The rule ident is made up of several other rules and tokens, none of which include the lexer symbol TABLE_SYM. (And please don't get me started on the /* For Delphi */ comment, please...)

Still puzzled, I took the risk to look into the lexer. Normally I would stay away from that handwritten beast with all its special cases, but alas, it must contain the answer. Must.

Indeed, poking around, there it is:

case MY_LEX_IDENT_SEP:      // Found ident and now '.'
  yylval->lex_str.str= (char*) lip->get_ptr();
  yylval->lex_str.length= 1;
  c= lip->yyGet();                  // should be '.'
  lip->next_state= MY_LEX_IDENT_START;// Next is an ident (not a keyword)

Sweet, isn't it? Made my day.

Effectively, this means that you can use any old keyword in place of tablename when using the database.tablename notation. With the added fun that you now have created a table that you cannot select from without quoting or using the database.tablename notation:

mysql> SELECT a FROM table;
ERROR 1064 (42000): [...]
mysql> SELECT a FROM `table`;
Empty set (0.00 sec)
mysql> SELECT a FROM test.table;
Empty set (0.00 sec)

Another datapoint that allowing reserved words as identifiers is just plain wrong. Plain wrong, I tell you. All it does is to make the life hard on the parser authors. And it doesn't add anything for the user, except allowing constructs to shoot yourself in the foot with. I mean I don't go around in C-land and write stuff like

if (if > 0) {
    ...
}

Do I?

To sum it up, here are the three ways to create a table named table:

mysql> CREATE TABLE `table` (a INT);
mysql> CREATE TABLE test.table (a INT);
mysql> CREATE TABLE .table (a INT);

Lovely.

MySQL: Planet MySQL

Antlr--从Hibernate3如何解释HQL语言学习Antlr的使用 - 花钱的年华 - BlogJava

用Parser把AST抽取出来,再用TreeParser进行动作的Double Pass Builder模式

Hibernate: del.icio.us tag/hibernate