Package it.unimi.dsi.archive4j

Classes that represent archives of documents.


Interface Summary
Archive<T extends DocumentSummary> An immutable archive of document summaries.
DocumentSummary A summary of the term information of a document.

Class Summary
ArchiveLoader Static container providing convenience methods to load archives.
ArchiveWriter<T extends DocumentSummary> An abstract archive writer.
ArrayArchive Dynamic implementation of an in-memory archive.
ArrayDocumentSummary A simple, array-based implementation of DocumentSummary.
BitstreamArchiveWriter<T extends DocumentSummary> A writer for SequentialBitstreamArchive or RandomAccessBitstreamArchive archives.
RandomAccessBitstreamArchive An Archive implementation providing random access.
SequentialBitstreamArchive An Archive implementation providing sequential access only.
SequentialBitstreamArchive.CompressionFlags Class representing compression flags for much of the data in this archive.

Enum Summary
ArchiveLoader.PropertyKeys The (capitalized) names for archive properties.
SequentialBitstreamArchive.CompressionFlags.Component Each component of the data file or frequency file.
SequentialBitstreamArchive.PropertyKeys Additional properties (w.r.t.

Package it.unimi.dsi.archive4j Description

Classes in this package implement archives, a form of direct index for document collections. An archive stores some information about the documents of a collection (e.g., document lengths and term counts for a suitable subset of terms) similarly to what an inverted index does. However, an archive provides information per document, not per term.
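
The difference between per-document and per-term access can be illustrated with a small, self-contained sketch. It does not use the archive4j API: the class DirectIndexSketch, the record ToySummary, and both build methods are hypothetical names introduced only for this example, and the whitespace tokenization is an assumption. The sketch builds a toy direct index (one summary per document, holding its length and term counts) and a toy inverted index (one posting map per term) over the same two documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; none of these names belong to archive4j. */
public class DirectIndexSketch {

    /** Hypothetical per-document record: document length plus term counts. */
    public static final class ToySummary {
        public final int length;
        public final Map<String, Integer> termCounts;

        ToySummary(int length, Map<String, Integer> termCounts) {
            this.length = length;
            this.termCounts = termCounts;
        }
    }

    /**
     * Direct index: one summary per document, addressed by document number.
     * Assumes each document is non-empty, whitespace-separated text.
     */
    public static List<ToySummary> buildDirectIndex(List<String> documents) {
        List<ToySummary> archive = new ArrayList<>();
        for (String text : documents) {
            String[] terms = text.trim().split("\\s+");
            Map<String, Integer> counts = new TreeMap<>();
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
            archive.add(new ToySummary(terms.length, counts));
        }
        return archive;
    }

    /** Inverted index: for each term, the documents containing it and the count in each. */
    public static Map<String, Map<Integer, Integer>> buildInvertedIndex(List<String> documents) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            for (String term : documents.get(doc).trim().split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeMap<>()).merge(doc, 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("a rose is a rose", "a daisy");

        // Per-document view: everything known about document 0.
        ToySummary first = buildDirectIndex(documents).get(0);
        System.out.println(first.length + " " + first.termCounts);

        // Per-term view: every document containing the term "a".
        System.out.println(buildInvertedIndex(documents).get("a"));
    }
}
```

In the direct index, answering "which terms occur in document 0, and how often?" is a single lookup, whereas the inverted index would have to be scanned term by term to answer the same question. The reverse holds for "which documents contain this term?".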

To build archives from the command line, see the law.archive.tool package.