it.unimi.dsi.archive4j
Interface Archive<T extends DocumentSummary>

All Superinterfaces:
Closeable, FlyweightPrototype<Archive<T>>, Iterable<T>
All Known Implementing Classes:
ArrayArchive, RandomAccessBitstreamArchive, SequentialBitstreamArchive

public interface Archive<T extends DocumentSummary>
extends Closeable, Iterable<T>, FlyweightPrototype<Archive<T>>

An immutable archive of document summaries.

Archives provide sequential access through an iterator and, optionally, random access by document id or by document index (you can check whether random access is available using hasRandomAccess()). Some global data about the archive is also available.
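The access pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: MiniArchive and ListArchive are hypothetical stand-ins that mirror only a few methods of Archive so the example compiles without archive4j. Zero-based indices and an UnsupportedOperationException for unsupported optional operations are assumptions. The helper uses random access when the archive reports it and falls back to a sequential scan otherwise.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ArchiveAccessSketch {

    /** Hypothetical stand-in mirroring a few Archive methods; not the real archive4j interface. */
    interface MiniArchive<T> extends Closeable, Iterable<T> {
        boolean hasRandomAccess();
        T getDocumentByIndex(int idx) throws IOException;
        int numberOfDocuments();
    }

    /** Toy in-memory archive whose random access can be switched off. */
    static final class ListArchive<T> implements MiniArchive<T> {
        private final List<T> docs;
        private final boolean randomAccess;

        ListArchive(List<T> docs, boolean randomAccess) {
            this.docs = docs;
            this.randomAccess = randomAccess;
        }

        @Override public boolean hasRandomAccess() { return randomAccess; }

        @Override public T getDocumentByIndex(int idx) {
            // Assumption: unsupported optional operations throw, as in java.util collections.
            if (!randomAccess) throw new UnsupportedOperationException();
            return docs.get(idx); // Assumption: indices are zero-based.
        }

        @Override public int numberOfDocuments() { return docs.size(); }
        @Override public Iterator<T> iterator() { return docs.iterator(); }
        @Override public void close() { }
    }

    /** Returns the last document, or null if the archive is empty. */
    static <T> T lastDocument(MiniArchive<T> archive) {
        int n = archive.numberOfDocuments();
        if (n == 0) return null;
        if (archive.hasRandomAccess()) {
            try {
                // Direct jump: only legal when random access is available.
                return archive.getDocumentByIndex(n - 1);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Fallback: sequential access through the iterator always works.
        T last = null;
        for (T doc : archive) last = doc;
        return last;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("first", "second", "third");
        System.out.println(lastDocument(new ListArchive<>(docs, true)));
        System.out.println(lastDocument(new ListArchive<>(docs, false)));
    }
}
```

Checking hasRandomAccess() before calling either lookup method keeps the same caller working against both sequential-only and random-access implementations.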

Author:
Alessio Orlandi, Sebastiano Vigna
See Also:
ArchiveLoader

Method Summary
 int frequency(int term)
          Returns the frequency of a given term.
 T getDocumentById(int id)
          Returns a document given its id (optional operation).
 T getDocumentByIndex(int idx)
          Returns a document by index (position in the archive) (optional operation).
 boolean hasRandomAccess()
          Returns whether the archive supports random access, that is, getDocumentById(int) and getDocumentByIndex(int).
 int numberOfDocuments()
          Returns the number of documents in the archive.
 int numberOfTerms()
          Returns the number of terms in the archive.
 long numberOfWords()
          Returns the number of words in the collection (i.e., the sum of the lengths of all documents).
 
Methods inherited from interface java.io.Closeable
close
 
Methods inherited from interface java.lang.Iterable
iterator
 
Methods inherited from interface it.unimi.dsi.lang.FlyweightPrototype
copy
 

Method Detail

getDocumentById

T getDocumentById(int id) throws IOException
Returns a document given its id (optional operation).

Parameters:
id - a document id.
Returns:
the document with given id, or null if no such document exists.
Throws:
IOException
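Because getDocumentById(int) signals a missing document with null rather than an exception, callers may want to make that case explicit. The sketch below is one possible way to do so. IdLookup, findById and mapBacked are hypothetical names introduced for illustration and are not part of archive4j; the map-backed lookup only imitates the documented contract.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DocumentLookupSketch {

    /** Hypothetical stand-in for the id lookup declared by Archive; not the real interface. */
    interface IdLookup<T> {
        T getDocumentById(int id) throws IOException;
    }

    /** Wraps the nullable result so the "no such document" case is visible in the type. */
    static <T> Optional<T> findById(IdLookup<T> archive, int id) {
        try {
            return Optional.ofNullable(archive.getDocumentById(id));
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
    }

    /** Toy map-backed lookup; Map.get returns null for unknown ids, matching the contract above. */
    static IdLookup<String> mapBacked(Map<Integer, String> docs) {
        return id -> docs.get(id);
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        docs.put(7, "seven");
        IdLookup<String> archive = mapBacked(docs);
        System.out.println(findById(archive, 7).orElse("<missing>"));
        System.out.println(findById(archive, 8).orElse("<missing>"));
    }
}
```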

getDocumentByIndex

T getDocumentByIndex(int idx) throws IOException
Returns a document by index (position in the archive) (optional operation).

Parameters:
idx - the document index.
Returns:
the document at the given index.
Throws:
IOException

hasRandomAccess

boolean hasRandomAccess()
Returns whether the archive supports random access, that is, getDocumentById(int) and getDocumentByIndex(int).

Returns:
whether the archive supports random access.

numberOfDocuments

int numberOfDocuments()
Returns the number of documents in the archive.

Returns:
the number of documents in the archive.

numberOfTerms

int numberOfTerms()
Returns the number of terms in the archive.

Returns:
the number of terms in the archive.

numberOfWords

long numberOfWords()
Returns the number of words in the collection (i.e., the sum of the lengths of all documents).

Returns:
the number of words in the collection.
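Since numberOfWords() is the sum of the lengths of all documents, dividing it by numberOfDocuments() gives the average document length. The sketch below shows that arithmetic. ArchiveCounts and the counts factory are hypothetical stand-ins so the example runs without the library, and returning 0.0 for an empty archive is a choice made here, not documented behaviour.

```java
public class ArchiveStatsSketch {

    /** Hypothetical stand-in exposing two of the global counters of Archive. */
    interface ArchiveCounts {
        int numberOfDocuments();
        long numberOfWords();
    }

    /** Builds a fixed-value stand-in, for illustration only. */
    static ArchiveCounts counts(final int documents, final long words) {
        return new ArchiveCounts() {
            @Override public int numberOfDocuments() { return documents; }
            @Override public long numberOfWords() { return words; }
        };
    }

    /** Average document length in words; 0.0 for an empty archive (a choice made in this sketch). */
    static double averageDocumentLength(ArchiveCounts archive) {
        int n = archive.numberOfDocuments();
        // Cast before dividing so the result is not truncated by integer division.
        return n == 0 ? 0.0 : (double) archive.numberOfWords() / n;
    }

    public static void main(String[] args) {
        // 4 documents totalling 10 words.
        System.out.println(averageDocumentLength(counts(4, 10L)));
    }
}
```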

frequency

int frequency(int term)
Returns the frequency of a given term.

Parameters:
term - a term number.
Returns:
the frequency of the given term.