it.unimi.dsi.archive4j.tool
Class Scan

java.lang.Object
  extended by it.unimi.dsi.archive4j.tool.Scan

public class Scan
extends Object

Scans a DocumentSequence to build a SequentialBitstreamArchive or a RandomAccessBitstreamArchive, using preprocessed data generated by Preprocess and MergePreprocessedData.

Each document returned by the sequence is parsed; its terms are fed into a term processor and then checked against the term map produced by the preprocessing phase. Terms that survive both steps become part of the document summary.
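
The filtering described above can be illustrated with a minimal, self-contained sketch. It does not use the archive4j or MG4J classes: the class ScanPipelineSketch, the methods processTerm and survivingTermIds, the lowercasing rule, and the use of a plain java.util.Map as the term map are all assumptions made for illustration. What an actual document summary stores is not specified on this page; here the surviving terms are simply collected as their term-map ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ScanPipelineSketch {
    // Hypothetical stand-in for a term processor: lowercases the term
    // and rejects empty tokens by returning null.
    static String processTerm(String raw) {
        String term = raw.toLowerCase(Locale.ROOT);
        return term.isEmpty() ? null : term;
    }

    // Splits the text into tokens, runs each through the processor, and keeps
    // the ids of the terms that are present in the term map.
    static List<Long> survivingTermIds(String documentText, Map<String, Long> termMap) {
        List<Long> ids = new ArrayList<>();
        for (String raw : documentText.split("[^\\p{L}\\p{N}]+")) {
            String term = processTerm(raw);
            if (term == null) continue;       // dropped by the term processor
            Long id = termMap.get(term);
            if (id != null) ids.add(id);      // kept only if the term map knows it
        }
        return ids;
    }

    public static void main(String[] args) {
        // Assumption: the preprocessing phase produced this tiny term map.
        Map<String, Long> termMap = Map.of("quick", 0L, "fox", 1L);
        System.out.println(survivingTermIds("The Quick brown fox!", termMap)); // prints [0, 1]
    }
}
```

In this sketch "The" and "brown" are discarded because they are absent from the term map, which mirrors the role the preprocessed term map plays in deciding what reaches the summary.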

An optional URI-to-id map can be provided to set the id of each document. The resulting archive will not, in general, be sorted by id, but you can use SortBitstreamArchive to sort it.
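
The following sketch shows why ids taken from a URI-to-id map can leave the output unsorted: documents are written in scan order, while their ids follow the map. It is plain Java and not the archive4j API. The class DocumentIdSketch and the methods assignIds and isSorted are hypothetical, and the fallback to the scan position when no map is given is an assumption, not documented behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class DocumentIdSketch {
    // Hypothetical helper: returns one id per document, in scan order.
    // uriToId may be null, meaning that no URI-to-id map was provided.
    static long[] assignIds(List<String> uris, Map<String, Long> uriToId) {
        long[] ids = new long[uris.size()];
        for (int i = 0; i < ids.length; i++) {
            if (uriToId == null) {
                ids[i] = i; // Assumption: without a map, use the scan position.
            } else {
                Long mapped = uriToId.get(uris.get(i));
                if (mapped == null) throw new IllegalArgumentException("No id for URI " + uris.get(i));
                ids[i] = mapped;
            }
        }
        return ids;
    }

    // True if the ids are in nondecreasing order.
    static boolean isSorted(long[] ids) {
        for (int i = 1; i < ids.length; i++) if (ids[i - 1] > ids[i]) return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> uris = List.of("http://example.com/b", "http://example.com/a");
        Map<String, Long> uriToId = Map.of("http://example.com/a", 0L, "http://example.com/b", 1L);
        long[] ids = assignIds(uris, uriToId);
        System.out.println(Arrays.toString(ids) + " sorted=" + isSorted(ids)); // prints [1, 0] sorted=false
        Arrays.sort(ids); // stands in for a separate sorting pass over the archive
        System.out.println(Arrays.toString(ids) + " sorted=" + isSorted(ids)); // prints [0, 1] sorted=true
    }
}
```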

Author:
Alessio Orlandi, Sebastiano Vigna
See Also:
BitstreamArchiveWriter, SequentialBitstreamArchive, Preprocess, MergePreprocessedData

Constructor Summary
Scan()
           
 
Method Summary
static void main(String[] args)
           
static void run(DocumentSequence sequence, TermProcessor processor, StringMap<? extends CharSequence> terms, StringMap<? extends CharSequence> urls, BitstreamArchiveWriter<ArrayDocumentSummary> writer, String indexedField)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Scan

public Scan()

Method Detail

run

public static void run(DocumentSequence sequence,
                       TermProcessor processor,
                       StringMap<? extends CharSequence> terms,
                       StringMap<? extends CharSequence> urls,
                       BitstreamArchiveWriter<ArrayDocumentSummary> writer,
                       String indexedField)
                throws Exception
Throws:
Exception

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception