Recent Documents First in Apache Lucene

I spent a long time looking for a way to put the latest documents first in Apache Lucene, to no avail. Finally, I’ve found a solution that works.

Most of the answers on the web suggested boosting documents based on their date. However, I was unconvinced that these solutions would hold up in the long term. The other day, I came across Apache Lucene Sort Tips, which describes how to use the TopFieldDocCollector. By chance, it mentioned the constant SortField.FIELD_SCORE, which can be used when constructing a multi-field Sort object.

So, the answer is simple, but I thought I’d write a post specifically addressing this use-case so that an answer is easy for others to find. You need a field containing the modified date of each of your documents. Storing this as an ISO 8601 string does the trick. Now you construct a Sort object, passing SortField.FIELD_SCORE as the first field and your date field (descending) as the second, and hey presto!

So, here’s how we create our sort:

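// Sort by relevance score first, then by modified date.
var sort = new Sort(new[] {
    SortField.FIELD_SCORE,
    // The final argument (reverse = true) sorts descending, so newest comes first.
    new SortField("last_modified", SortField.STRING, true)
});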

And use this with a TopFieldDocCollector in the usual way.
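
To see why a plain string field is enough for the date, here is a minimal sketch in ordinary C# with no Lucene involved. The Hit class, the sample values and the RecentFirstSketch name are all made up for illustration. It assumes every date is written in the same fixed-width ISO 8601 format and the same time zone (UTC here), since that is what makes string order match chronological order. It also shows that the second sort key only comes into play when two scores are exactly equal.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for a search hit; this is not a Lucene type.
public sealed class Hit
{
    public string Id { get; }
    public float Score { get; }
    public string LastModified { get; } // ISO 8601, UTC, fixed width (assumed)

    public Hit(string id, float score, string lastModified)
    {
        Id = id;
        Score = score;
        LastModified = lastModified;
    }
}

public static class RecentFirstSketch
{
    // Highest score first; among equal scores, the latest date string first.
    public static Hit[] Order(Hit[] hits)
    {
        return hits
            .OrderByDescending(h => h.Score)
            // Ordinal comparison is a plain character-by-character compare,
            // which matches chronological order for same-format ISO 8601 strings.
            .ThenByDescending(h => h.LastModified, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        var hits = new[]
        {
            new Hit("a", 1.0f, "2009-03-01T10:00:00Z"),
            new Hit("b", 2.0f, "2008-01-15T08:30:00Z"),
            new Hit("c", 1.0f, "2009-11-20T17:45:00Z"),
        };

        Console.WriteLine(string.Join(", ", Order(hits).Select(h => h.Id)));
        // prints "b, c, a"
    }
}
```

Document b wins on score despite being the oldest, while c comes before a because they tie on score and c was modified more recently.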

Massive thanks to the author of the original post. I just thought it was worth posting something specifically for this use-case.

Author: Phill Luby

Phill is a Software Developer and Co-founder of NewRedo Ltd.