created on 02 Jul 2008, by Syndication
Incidentally, if you want to see why the weighting schemes work like
this, consider a database with two documents, one of which contains
the text of the other repeated twice. You probably want to give these
similar weights; certainly the doubled document shouldn't get twice
the weight for most applications.
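To make the doubled-document case concrete, here's a small plain-Python
sketch (not Xapian code) comparing a naive term-frequency weight with
the within-document factor of the textbook BM25 formula. The function
names, the parameter values (k1 = 1, b = 0.5) and the document sizes
are assumptions chosen for illustration. The idf factor is left out
because it's the same for both documents and cancels in the ratio.

```python
def raw_tf_weight(tf: int) -> float:
    """Naive weighting: the score grows linearly with term frequency."""
    return float(tf)


def bm25_tf_weight(tf: int, doc_len: int, avg_len: float,
                   k1: float = 1.0, b: float = 0.5) -> float:
    """Within-document factor of textbook BM25 (idf omitted).

    k1 controls how quickly repeated occurrences saturate; b controls
    how strongly the score is normalised by document length.
    """
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / (length_norm + tf)


# Hypothetical documents: doc B is doc A's text repeated twice.
tf_a, len_a = 3, 100
tf_b, len_b = 2 * tf_a, 2 * len_a
avg_len = (len_a + len_b) / 2  # average length over the two-document database

raw_ratio = raw_tf_weight(tf_b) / raw_tf_weight(tf_a)
bm25_ratio = (bm25_tf_weight(tf_b, len_b, avg_len)
              / bm25_tf_weight(tf_a, len_a, avg_len))

print(f"raw tf ratio: {raw_ratio:.3f}")  # -> raw tf ratio: 2.000
print(f"BM25 ratio:   {bm25_ratio:.3f}")  # -> BM25 ratio:   1.070
```

With naive tf weighting the doubled document scores exactly twice as
high; with BM25 it scores only about 7% higher, thanks to a mix of tf
saturation (k1) and length normalisation (b).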
For BM25 you can adjust a parameter (usually called b) to tune how
much influence the document length has.
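In the usual statement of BM25 the length-normalisation parameter is
b, which runs from 0 (ignore document length entirely) to 1 (fully
normalise by it). This plain-Python sketch uses the same hypothetical
two-document setup (a document and a copy of it repeated twice, with
k1 = 1 assumed) and sweeps b to show how the doubled document's
advantage shrinks. It illustrates the formula only and doesn't use
Xapian's API.

```python
def bm25_tf_weight(tf: float, doc_len: float, avg_len: float,
                   k1: float = 1.0, b: float = 0.5) -> float:
    """Within-document factor of textbook BM25 (idf omitted)."""
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_len)
    return tf * (k1 + 1.0) / (length_norm + tf)


def doubled_doc_ratio(b: float, tf: int = 3, doc_len: int = 100,
                      k1: float = 1.0) -> float:
    """Score of a document repeated twice, relative to the original."""
    avg_len = (doc_len + 2 * doc_len) / 2  # hypothetical two-document database
    single = bm25_tf_weight(tf, doc_len, avg_len, k1, b)
    doubled = bm25_tf_weight(2 * tf, 2 * doc_len, avg_len, k1, b)
    return doubled / single


ratios = {b: doubled_doc_ratio(b) for b in (0.0, 0.25, 0.5, 0.75, 1.0)}
for b, ratio in ratios.items():
    print(f"b = {b:.2f}: doubled/original = {ratio:.3f}")
```

At b = 0 the doubled document still only gets about 1.14 times the
score (that's k1's saturation at work), and at b = 1 the ratio is 1:
doubling both the term frequency and the length leaves the score
unchanged, because the length term then scales exactly with the
document.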
If you're happy using QueryParser, just apply the above to the Query
object it produces (i.e. query in the code snippet above is what
QueryParser returns).
Cheers,
Olly