Enhanced Indexing and Search

Exposing Content: WAS 3.1.8

The open source tools available to the field of web archiving continue to evolve, and indexing tools in particular have changed significantly in the last two years. WAS, along with most other web archiving projects, is migrating from the NutchWAX search engine to SOLR. Both are open source indexers built on Lucene, but SOLR is more scalable, responds faster, offers more functionality, and is supported by a wider open source community. SOLR will also better support indexing of content captured with duplicate-reduction settings (see item 6.1). Its indexing features will enable faceted browsing of search results and additional search limits, such as curator-supplied topics, file type, and date. This task will require re-indexing all existing WAS content.
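As a rough illustration of what faceted browsing and search limits look like at the query level, the sketch below builds a request URL for Solr's standard `/select` handler. It uses the real Solr request parameters `q`, `fq` (filter query), `facet`, `facet.field`, and `wt`. The core URL and the field names (`topic`, `mime_type`, `crawl_year`, `crawl_date`) are hypothetical; the actual WAS schema may differ.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical schema fields; the real WAS index may name these differently.
FACET_FIELDS = ("topic", "mime_type", "crawl_year")


def build_facet_query(base_url, text, topic=None, mime_type=None, date_range=None):
    """Build a Solr /select URL that returns facet counts and applies optional limits."""
    # Full-text query, facet counts switched on, JSON response format.
    params = [("q", text), ("facet", "true"), ("wt", "json")]
    # One facet.field entry per field to group results by (repeated key).
    for field in FACET_FIELDS:
        params.append(("facet.field", field))
    # Each search limit becomes its own filter query (fq).
    if topic:
        params.append(("fq", f'topic:"{topic}"'))
    if mime_type:
        params.append(("fq", f'mime_type:"{mime_type}"'))
    if date_range:
        start, end = date_range
        # Solr range syntax on a (hypothetical) date field.
        params.append(("fq", f"crawl_date:[{start} TO {end}]"))
    return f"{base_url}/select?{urlencode(params)}"


# Usage: limit a keyword search to one curator-supplied topic and to PDFs.
url = build_facet_query(
    "http://localhost:8983/solr/was", "climate",
    topic="Elections", mime_type="application/pdf",
)
print(parse_qs(urlparse(url).query)["fq"])
```

Expressing each limit as a separate `fq` keeps the main relevance query unchanged while narrowing the result set, and the facet counts returned alongside the results are what a faceted-browsing interface would display.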

