To avoid having to think about databases with billions of rows, I radically denormalized the data and stored the word coordinates as a blob of JSON in the database. So we just have a word_coordinates_json column in the OCR table, and when we need to look up the coordinates for a page we load the JSON dictionary and we're good to go. JSON works well with Django: the ORM doesn't seem to support storing blobs in the database, but JSON is just text, so it fits in an ordinary text column.
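The pattern described above can be sketched roughly as follows. To keep the example runnable on its own, it uses Python's built-in sqlite3 and json modules rather than Django's ORM. The table name `ocr`, the `page_id` key, the helper function names, and the shape of the coordinate dictionary (word mapped to a list of [x, y, width, height] boxes) are all assumptions made for illustration; only the word_coordinates_json column name comes from the text.

```python
import json
import sqlite3


def create_ocr_table(conn):
    # One row per page; all word coordinates for the page live in a single text column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr ("
        "page_id INTEGER PRIMARY KEY, "
        "word_coordinates_json TEXT NOT NULL)"
    )


def save_word_coordinates(conn, page_id, coordinates):
    # Serialize the whole dictionary to JSON text and store it as one value.
    conn.execute(
        "INSERT OR REPLACE INTO ocr (page_id, word_coordinates_json) VALUES (?, ?)",
        (page_id, json.dumps(coordinates)),
    )


def load_word_coordinates(conn, page_id):
    # Fetch the JSON text for a page and decode it back into a dictionary.
    row = conn.execute(
        "SELECT word_coordinates_json FROM ocr WHERE page_id = ?", (page_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


conn = sqlite3.connect(":memory:")
create_ocr_table(conn)

# Hypothetical data: each word maps to a list of [x, y, width, height] boxes.
save_word_coordinates(
    conn,
    1,
    {
        "lincoln": [[120, 340, 58, 14]],
        "union": [[200, 340, 40, 14], [88, 612, 41, 14]],
    },
)

coords = load_word_coordinates(conn, 1)
print(coords["union"])
```

The trade-off is that the database cannot query individual words or boxes; every lookup loads and decodes the full dictionary for a page, which is fine as long as access is always page by page.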
After seeing Douglas Crockford's talk on JSON at XML 2006 recently, I figured that some sort of great debate between XML and JSON advocates was brewing. I had been waiting for Elliotte Harold's rebuttal of what Crockford is missing, but haven't seen it
* Paste some JSON in each of the text fields. Click "Compare" to see the diff<sep/>green. Deletions are displayed in red.
* It also works as a JSON viewer. Click the dis