» tagged pages
» logout

(Feed found, click Add Page to syndicate.) Error finding feed, please try again » Find feed title

A Blog Page allows you to add entries, for news or other time sensitive postings

(Login required to save to your tagged pages.)
(or Cancel)

Make further edits, (or Cancel)

(Login required to save to your tagged pages.)
(or Cancel)

(Editing anonymously: to be credited for your changes, login or register a new account)

Change Page Permissions? Changing these permissions will adjust who can modify this page.

Anonymous (change)
(change)
(or Cancel)
Upload an image from your computer:
or Copy an image from a URL:
or Erase the current icon:
Icon Preview:

or Cancel

Erase tokenizer? The contents of tokenizer page and all pages directly attached to tokenizer will be erased.

or Cancel

(Editing anonymously: to be credited for your changes, login or register a new account)

other page actions:
tokenizer

tokenizer

Tags Applied to tokenizer

No one has tagged this page.

tokenizer Wiki Pages

Tag Cloud

To further filter what appears in the Things Tagged tokenizer list, select a tag from the Tag Cloud.
What is tokenizer? Edit this page and describe it here.

sorted by: recent | see : popular
Content Tagged tokenizer

SQL lexing madness

Continuing on my quest to write a replacement for the MySQL lexer and parser (Hi David! :)), I come across many interesting and slightly disgusting details about SQL in general and MySQL's notion about it specifically.

So here I am reading our fine documentation and our totally "obvious" lexer and parser implementations trying out cornercases to figure out what really is valid and what is not. Finding the occasional bug, of course... I won't link to it from here, but let me assure you, it's a nice one.

Anyway, the latest thing I came across blew my mind:

Consider this simple statement

mysql> CREATE TABLE foo (a INT);

Obvious what it does, isn't it? Totally valid. No problem. Now for fun, let's try to figure out what happens when we use a reserved word:

mysql> CREATE TABLE table (a INT);

Boom:

ERROR 1064 (42000): You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'table (a INT)' at line 1

Ok, fine. I expected that. Extra credit for this one:

mysql> CREATE TABLE test.table (a INT);
Query OK, 0 rows affected (0.04 sec)

Wait a moment...! That ain't right!

Except it is. Puzzled I dug around the parser's YACC grammar and couldn't find any indication that a reserved word is allowed in that place at all. In fact, it expects an identifier:

table_ident:
      ident { $$=new Table_ident($1); }
    | ident '.' ident { $$=new Table_ident(YYTHD, $1,$3,0);}
    | '.' ident { $$=new Table_ident($2);} /* For Delphi */
    ;

The rule ident is made up of several other rules and tokens, none of which include the lexer symbol TABLE_SYM. (And please don't get me started on the /* For Delphi */ comment, please...)

Still puzzled, I took the risk to look into the lexer. Normally I would stay away from that handwritten beast with all its special cases, but alas, it must contain the answer. Must.

Indeed, poking around, there it is:

case MY_LEX_IDENT_SEP:      // Found ident and now '.'
  yylval->lex_str.str= (char*) lip->get_ptr();
  yylval->lex_str.length= 1;
  c= lip->yyGet();                  // should be '.'
  lip->next_state= MY_LEX_IDENT_START;// Next is an ident (not a keyword)

Sweet, isn't it? Made my day.

Effectively, this means that you can use any old keyword in place of tablename when using the database.tablename notation. With the added fun that you now have created a table that you cannot select from without quoting or using the database.tablename notation:

mysql> SELECT a FROM table;
ERROR 1064 (42000): [...]
mysql> SELECT a FROM `table`;
Empty set (0.00 sec)
mysql> SELECT a FROM test.table;
Empty set (0.00 sec)

Another datapoint that allowing reserved words as identifiers is just plain wrong. Plain wrong, I tell you. All it does is to make the life hard on the parser authors. And it doesn't add anything for the user, except allowing constructs to shoot yourself in the foot with. I mean I don't go around in C-land and write stuff like

if (if > 0) {
    ...
}

Do I?

To sum it up, here are the three ways to create a table named table:

mysql> CREATE TABLE `table` (a INT);
mysql> CREATE TABLE test.table (a INT);
mysql> CREATE TABLE .table (a INT);

Lovely.

MySQL: Planet MySQL

Dhaka

context free grammar parsing

open-source: del.icio.us tag/open-source

jTokenizer

java string tokenizer, converts strings into tokens. gui now in v2

opensource: del.icio.us tag/opensource

Username:
Password:
(or Cancel)