it.unimi.dsi.archive4j
Class RandomAccessBitstreamArchive

java.lang.Object
  extended by it.unimi.dsi.archive4j.SequentialBitstreamArchive
      extended by it.unimi.dsi.archive4j.RandomAccessBitstreamArchive
All Implemented Interfaces:
Archive<ArrayDocumentSummary>, FlyweightPrototype<Archive<ArrayDocumentSummary>>, Closeable, Iterable<ArrayDocumentSummary>

public class RandomAccessBitstreamArchive
extends SequentialBitstreamArchive
implements Closeable

An Archive implementation providing random access.

Each instance uses a single InputBitStream, so you must not mix sequential and random accesses.
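
The warning above can be illustrated with a toy model. The sketch below does not use archive4j; ToyArchive, its cursor field and scanWithLookup are hypothetical names invented for the illustration. It assumes only what the text states: one stream position is shared by every read on an instance. It also assumes that copy() yields an instance with its own position, which is the usual purpose of a flyweight copy but is not spelled out on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedCursorDemo {
    /** Toy stand-in for an archive backed by one stream: all reads share a single cursor. */
    static final class ToyArchive {
        private final int[] docs;
        private int cursor; // plays the role of the stream position

        ToyArchive(int[] docs) { this.docs = docs; }

        boolean hasNext() { return cursor < docs.length; }

        /** Sequential read: returns the document at the cursor and advances it. */
        int next() { return docs[cursor++]; }

        /** Random access: repositions the shared cursor, then reads. */
        int get(int index) { cursor = index; return docs[cursor++]; }

        /** Hypothetical flyweight-style copy: shares the data, owns a fresh cursor. */
        ToyArchive copy() { return new ToyArchive(docs); }
    }

    /** Scans sequentially while performing one random access per step. */
    static List<Integer> scanWithLookup(ToyArchive sequential, ToyArchive random, int lookupIndex) {
        List<Integer> seen = new ArrayList<>();
        while (sequential.hasNext()) {
            seen.add(sequential.next());
            random.get(lookupIndex); // moves the cursor of whichever instance it is called on
        }
        return seen;
    }

    public static void main(String[] args) {
        ToyArchive mixed = new ToyArchive(new int[] {10, 20, 30});
        // Same instance for both kinds of access: the scan is cut short.
        System.out.println(scanWithLookup(mixed, mixed, 2)); // prints [10]

        ToyArchive scan = new ToyArchive(new int[] {10, 20, 30});
        // Separate instance for random access: the scan sees every document.
        System.out.println(scanWithLookup(scan, scan.copy(), 2)); // prints [10, 20, 30]
    }
}
```

In the toy model the remedy is to keep one instance for iteration and a separate copy for lookups; whether the same pattern is the recommended one for this class should be checked against the library itself.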

Author:
Alessio Orlandi, Sebastiano Vigna
See Also:
BitstreamArchiveWriter, SequentialBitstreamArchive

Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.dsi.archive4j.SequentialBitstreamArchive
SequentialBitstreamArchive.CompressionFlags, SequentialBitstreamArchive.PropertyKeys
 
Field Summary
protected  SparseRank id2Index
          A rank structure whose underlying bit vector marks the ids of the documents missing from the archive.
protected  SimpleSelectZero index2id
          A selection structure on the same array as id2Index.
static String MISSING_EXTENSION
          The extension for the missing-document file.
protected  EliasFanoMonotoneLongBigList offsets
          A selection structure storing the bit offsets of each document.
static String OFFSETS_EXTENSION
          The extension for the offsets file.
 
Fields inherited from class it.unimi.dsi.archive4j.SequentialBitstreamArchive
ARCHIVE_EXTENSION, basename, codings, data, fmbais, frequency, numberOfDocuments, numberOfTerms, numberOfWords, PERM_EXTENSION, rank2Term, uriList
 
Constructor Summary
protected RandomAccessBitstreamArchive(CharSequence basename, EliasFanoMonotoneLongBigList offsets, SparseRank id2Index, int[] rank2Term, Properties properties, List<? extends CharSequence> uriList, int[] frequency)
          Creates a new random-access bitstream archive.
protected RandomAccessBitstreamArchive(RandomAccessBitstreamArchive prototype)
           
 
Method Summary
 RandomAccessBitstreamArchive copy()
           
 ArrayDocumentSummary getDocumentById(int id)
          Returns a document given its id (optional operation).
 ArrayDocumentSummary getDocumentByIndex(int index)
          Returns a document by index (position in the archive) (optional operation).
static RandomAccessBitstreamArchive getInstance(CharSequence basename, Properties properties, CharSequence uriFilename)
          Returns a RandomAccessBitstreamArchive loaded from the given basename and optional URI list.
 boolean hasRandomAccess()
          Returns whether the archive supports random access, that is, Archive.getDocumentById(int) and Archive.getDocumentByIndex(int).
 Iterator<ArrayDocumentSummary> iterator()
          This method returns an efficient iterator.
 
Methods inherited from class it.unimi.dsi.archive4j.SequentialBitstreamArchive
close, ensureOpen, frequency, getCodings, getPermutation, loadFrequencies, numberOfDocuments, numberOfTerms, numberOfWords, readCurrentDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.io.Closeable
close
 

Field Detail

OFFSETS_EXTENSION

public static final String OFFSETS_EXTENSION
The extension for the offsets file.

See Also:
Constant Field Values

MISSING_EXTENSION

public static final String MISSING_EXTENSION
The extension for the missing-document file.

See Also:
Constant Field Values

id2Index

protected SparseRank id2Index
A rank structure whose underlying bit vector marks the ids of the documents missing from the archive.


index2id

protected transient SimpleSelectZero index2id
A selection structure on the same array as id2Index. It is initialised lazily upon the first call to getDocumentByIndex(int).
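
The relation between id2Index and index2id can be sketched with a plain java.util.BitSet in place of the succinct structures. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and the conventions chosen here (zero-based indices, rank counting the missing ids strictly below the given id, -1 for a missing id) are assumptions. A set bit marks a missing id, so the index of a present document is its id minus the number of missing ids before it, and the inverse mapping is a select on the zero bits.

```java
import java.util.BitSet;

public class MissingIdMapping {
    /** Returns the index of the document with the given id, or -1 if the id is marked missing. */
    static int idToIndex(BitSet missing, int id) {
        if (missing.get(id)) return -1;
        // rank: number of missing ids strictly smaller than id
        int rank = missing.get(0, id).cardinality();
        return id - rank;
    }

    /** Returns the id of the document at the given index (select on zero bits, by linear scan). */
    static int indexToId(BitSet missing, int index) {
        int id = missing.nextClearBit(0);
        for (int i = 0; i < index; i++) id = missing.nextClearBit(id + 1);
        return id;
    }

    public static void main(String[] args) {
        BitSet missing = new BitSet();
        missing.set(1);
        missing.set(3); // ids 1 and 3 are absent; ids 0, 2, 4, ... are present

        System.out.println(idToIndex(missing, 4)); // prints 2
        System.out.println(idToIndex(missing, 3)); // prints -1
        System.out.println(indexToId(missing, 2)); // prints 4
    }
}
```

The linear scans above take time proportional to the id; dedicated rank and select structures exist to answer the same queries much faster on large bit vectors.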


offsets

protected EliasFanoMonotoneLongBigList offsets
A selection structure storing the bit offsets of each document.
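
As a rough illustration of how a monotone list of bit offsets supports random access, the sketch below uses a plain long[] in place of EliasFanoMonotoneLongBigList. The class and method names are hypothetical, and the trailing entry marking the end of the last document is an assumption made for the example; this page does not say whether the real list stores one.

```java
public class OffsetLookup {
    /** Byte that contains the given bit offset. */
    static long byteOf(long bitOffset) {
        return bitOffset >>> 3;
    }

    /** Position of the given bit offset inside its byte (0 to 7). */
    static int bitInByte(long bitOffset) {
        return (int) (bitOffset & 7);
    }

    /** Length in bits of the document at the given index, assuming a trailing end offset. */
    static long lengthInBits(long[] offsets, int index) {
        return offsets[index + 1] - offsets[index];
    }

    public static void main(String[] args) {
        // Two documents starting at bits 0 and 13; 40 is the assumed end-of-data entry.
        long[] offsets = {0, 13, 40};

        System.out.println(byteOf(offsets[1]));       // prints 1
        System.out.println(bitInByte(offsets[1]));    // prints 5
        System.out.println(lengthInBits(offsets, 1)); // prints 27
    }
}
```

Because the offsets never decrease, they suit a compressed monotone representation while still allowing a direct jump to the start of any document.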

Constructor Detail

RandomAccessBitstreamArchive

protected RandomAccessBitstreamArchive(CharSequence basename,
                                       EliasFanoMonotoneLongBigList offsets,
                                       SparseRank id2Index,
                                       int[] rank2Term,
                                       Properties properties,
                                       List<? extends CharSequence> uriList,
                                       int[] frequency)
                                throws IOException
Creates a new random-access bitstream archive.

Parameters:
basename - the basename of the archive.
offsets - the list of offsets, pointing at the start of each summary.
id2Index - a ranking structure returning the index of a document given its id.
rank2Term - the permutation from rank to terms.
properties - the properties of the archive.
uriList - an optional list of URIs that will be used to associate a URI to each summary, or null.
frequency - the term frequencies.
Throws:
IOException

RandomAccessBitstreamArchive

protected RandomAccessBitstreamArchive(RandomAccessBitstreamArchive prototype)
                                throws IOException
Throws:
IOException
Method Detail

getDocumentById

public ArrayDocumentSummary getDocumentById(int id)
                                     throws IOException
Description copied from interface: Archive
Returns a document given its id (optional operation).

Specified by:
getDocumentById in interface Archive<ArrayDocumentSummary>
Overrides:
getDocumentById in class SequentialBitstreamArchive
Parameters:
id - a document id.
Returns:
the document with the given id, or null if no such document exists.
Throws:
IOException

getDocumentByIndex

public ArrayDocumentSummary getDocumentByIndex(int index)
                                        throws IOException
Description copied from interface: Archive
Returns a document by index (position in the archive) (optional operation).

Specified by:
getDocumentByIndex in interface Archive<ArrayDocumentSummary>
Overrides:
getDocumentByIndex in class SequentialBitstreamArchive
Parameters:
index - the document index.
Throws:
IOException

hasRandomAccess

public boolean hasRandomAccess()
Description copied from interface: Archive
Returns whether the archive supports random access, that is, Archive.getDocumentById(int) and Archive.getDocumentByIndex(int).

Specified by:
hasRandomAccess in interface Archive<ArrayDocumentSummary>
Overrides:
hasRandomAccess in class SequentialBitstreamArchive
Returns:
whether the archive supports random access.

iterator

public Iterator<ArrayDocumentSummary> iterator()
This method returns an efficient iterator.

Specified by:
iterator in interface Iterable<ArrayDocumentSummary>
Overrides:
iterator in class SequentialBitstreamArchive

copy

public RandomAccessBitstreamArchive copy()
Specified by:
copy in interface FlyweightPrototype<Archive<ArrayDocumentSummary>>
Overrides:
copy in class SequentialBitstreamArchive

getInstance

public static RandomAccessBitstreamArchive getInstance(CharSequence basename,
                                                       Properties properties,
                                                       CharSequence uriFilename)
                                                throws IOException,
                                                       ClassNotFoundException
Returns a RandomAccessBitstreamArchive loaded from the given basename and optional URI list.

Parameters:
basename - the archive basename.
properties - the archive properties.
uriFilename - the filename of a URI list, or null; the file must contain either a StringMap supporting StringMap.list(), or a List of CharSequences.
Returns:
the RandomAccessBitstreamArchive with the given basename and URI list.
Throws:
IOException
ClassNotFoundException