it.unimi.dsi.archive4j
Class ArrayDocumentSummary

java.lang.Object
  extended by it.unimi.dsi.archive4j.ArrayDocumentSummary
All Implemented Interfaces:
DocumentSummary

public class ArrayDocumentSummary
extends Object
implements DocumentSummary

A simple, array-based implementation of DocumentSummary.

Author:
Alessio Orlandi, Sebastiano Vigna

Field Summary
 int[] count
          An array, parallel to term, containing the counts (a.k.a. within-document frequencies) of the respective terms.
protected  int docLength
          The length of the underlying document in words.
protected  int id
          The document id.
protected  int numberOfTerms
          Number of terms in the summary.
protected  boolean sorted
          Whether term is sorted.
 int[] term
          The terms in this document (in increasing order if the summary is sorted).
protected  URI uri
          The document URI, if any, or null.
 
Constructor Summary
ArrayDocumentSummary(int[] terms, int[] counts, int numTerms, int id, URI uri, int wordLength)
          Creates an unsorted array-based document summary using part of the given arrays.
ArrayDocumentSummary(int[] terms, int[] counts, int numTerms, int id, URI uri, int wordLength, boolean sorted)
          Creates an array-based document summary using part of the given arrays.
ArrayDocumentSummary(int[] terms, int[] counts, int id, URI uri, int wordLength)
          Creates an unsorted array-based document summary using the given arrays.
ArrayDocumentSummary(int[] terms, int[] counts, int id, URI uri, int wordLength, boolean sorted)
          Creates an array-based document summary using the given arrays.
 
Method Summary
 int count(int i)
          Returns the count (a.k.a. within-document frequency) of the term of given index.
 boolean equals(Object o)
          Compares this document summary to another object.
 int hashCode()
           
 int id()
          Returns the id of the document this summary represents.
 int indexOf(int t)
          Returns the index of the given term.
 int length()
          Returns the length in words of the document this summary represents.
static ArrayDocumentSummary parse(String line, URI uri)
          Creates a new array-based document summary by parsing ASCII text.
 int size()
          Returns the number of terms in this summary.
 ArrayDocumentSummary sort()
          Sorts this summary.
 boolean sorted()
          Returns whether this summary is sorted.
 int term(int i)
          Returns the term of given index.
 String toString()
          Returns a string representation of this summary that can be read back via parse(String, URI).
 URI uri()
          Returns the document URI.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

term

public final int[] term
The terms in this document (in increasing order if the summary is sorted).


count

public final int[] count
An array, parallel to term, containing the counts (a.k.a. within-document frequencies) of the respective terms.


id

protected final int id
The document id.


uri

protected final URI uri
The document URI, if any, or null.


docLength

protected final int docLength
The length of the underlying document in words.


numberOfTerms

protected final int numberOfTerms
Number of terms in the summary.


sorted

protected boolean sorted
Whether term is sorted.

Constructor Detail

ArrayDocumentSummary

public ArrayDocumentSummary(int[] terms,
                            int[] counts,
                            int id,
                            URI uri,
                            int wordLength)
Creates an unsorted array-based document summary using the given arrays.

Parameters:
terms - the terms in the summary (not necessarily in increasing order).
counts - an array parallel to terms specifying the counts.
id - the id of the document.
uri - a URI for the document, or null.
wordLength - the length in words of the underlying document.

ArrayDocumentSummary

public ArrayDocumentSummary(int[] terms,
                            int[] counts,
                            int id,
                            URI uri,
                            int wordLength,
                            boolean sorted)
Creates an array-based document summary using the given arrays.

Parameters:
terms - the terms in the summary (in increasing order if sorted is true).
counts - an array parallel to terms specifying the counts.
id - the id of the document.
uri - a URI for the document, or null.
wordLength - the length in words of the underlying document.
sorted - if true, terms is sorted (which makes it possible to use binary search to locate a term).

ArrayDocumentSummary

public ArrayDocumentSummary(int[] terms,
                            int[] counts,
                            int numTerms,
                            int id,
                            URI uri,
                            int wordLength,
                            boolean sorted)
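Creates an array-based document summary using part of the given arrays.

Parameters:
terms - the terms in the summary (in increasing order if sorted is true).
counts - an array parallel to terms specifying the counts.
numTerms - the number of terms in the summary, i.e., how many elements of terms and counts are used.
id - the id of the document.
uri - a URI for the document, or null.
wordLength - the length in words of the underlying document.
sorted - if true, terms is sorted (which makes it possible to use binary search to locate a term).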

ArrayDocumentSummary

public ArrayDocumentSummary(int[] terms,
                            int[] counts,
                            int numTerms,
                            int id,
                            URI uri,
                            int wordLength)
Creates an unsorted array-based document summary using part of the given arrays.

Parameters:
terms - the terms in the summary (not necessarily in increasing order).
counts - an array parallel to terms specifying the counts.
numTerms - the number of terms in the summary, i.e., how many elements of terms and counts are used.
id - the id of the document.
uri - a URI for the document, or null.
wordLength - the length in words of the underlying document.

Method Detail

parse

public static ArrayDocumentSummary parse(String line,
                                         URI uri)
Creates a new array-based document summary by parsing ASCII text.

This method is mainly useful for testing and debugging purposes.

Parameters:
line - the ASCII text containing the description of the document summary.
uri - an optional URI that will be associated with the returned summary.
Returns:
a document summary, as specified by the ASCII text in line.

term

public int term(int i)
Description copied from interface: DocumentSummary
Returns the term of given index.

Specified by:
term in interface DocumentSummary
Returns:
the term of given index.

count

public int count(int i)
Description copied from interface: DocumentSummary
Returns the count (a.k.a. within-document frequency) of the term of given index.

Specified by:
count in interface DocumentSummary
Returns:
the count of the term of given index.

sorted

public boolean sorted()
Description copied from interface: DocumentSummary
Returns whether this summary is sorted.

Specified by:
sorted in interface DocumentSummary
Returns:
whether this summary is sorted.

sort

public ArrayDocumentSummary sort()
Description copied from interface: DocumentSummary
Sorts this summary.

After calling this method, DocumentSummary.sorted() will return true.

Specified by:
sort in interface DocumentSummary
Returns:
this document summary.
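Because term and count are parallel arrays, sorting a summary has to move each count together with its term. A minimal sketch of one way to do this over the first n entries, assuming term numbers within a summary are distinct; ParallelSortSketch and sortByTerm are hypothetical names, and this is not necessarily how ArrayDocumentSummary.sort() is implemented.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of archive4j): sorts parallel term/count arrays by term. */
final class ParallelSortSketch {
    private ParallelSortSketch() {}

    /** Sorts the first n entries of term in increasing order, permuting count in the same way. */
    static void sortByTerm(int[] term, int[] count, int n) {
        // Pack each (term, count) pair into one long: the term in the high 32 bits,
        // the count in the low 32 bits as an unsigned value. Comparing packed longs
        // therefore compares terms first (negative terms included).
        long[] packed = new long[n];
        for (int i = 0; i < n; i++) {
            packed[i] = ((long) term[i] << 32) | (count[i] & 0xFFFFFFFFL);
        }
        Arrays.sort(packed);
        // Unpack: the high half restores the term, the low half the original count bits.
        for (int i = 0; i < n; i++) {
            term[i] = (int) (packed[i] >>> 32);
            count[i] = (int) packed[i];
        }
    }
}
```

Packing each pair into a long lets the sketch reuse Arrays.sort(long[]) instead of a hand-written sort over two arrays, at the cost of a temporary array of n longs. Entries at positions n and beyond are left untouched.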

toString

public String toString()
Returns a string representation of this summary that can be read back via parse(String, URI).

Overrides:
toString in class Object

id

public int id()
Description copied from interface: DocumentSummary
Returns the id of the document this summary represents.

Specified by:
id in interface DocumentSummary
Returns:
the id of the document this summary represents.

uri

public URI uri()
Returns the document URI, or null if it is not available.

Specified by:
uri in interface DocumentSummary
Returns:
a URI representing the source of this summary, or null if no such URI is available.

size

public int size()
Description copied from interface: DocumentSummary
Returns the number of terms in this summary.

Note that, due to pruning (e.g., of hapax legomena), the number of terms in this summary might differ from the number of distinct terms in the document it represents.

Specified by:
size in interface DocumentSummary
Returns:
the number of terms in this summary.
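The difference between length() (words in the document) and size() (terms in the summary) is easiest to see when building a summary. A minimal sketch, assuming a document is given as an array of term numbers, one per word; PruningSketch, summarize and the pruneHapax flag are hypothetical and not part of archive4j.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical summarizer (not archive4j code) contrasting document length with summary size. */
final class PruningSketch {
    private PruningSketch() {}

    /** Parallel term/count arrays plus the length in words of the summarized document. */
    static final class Summary {
        final int[] term;
        final int[] count;
        final int length;

        Summary(int[] term, int[] count, int length) {
            this.term = term;
            this.count = count;
            this.length = length;
        }

        /** Number of terms kept in the summary. */
        int size() {
            return term.length;
        }
    }

    /** Counts each term of the document; optionally drops terms occurring exactly once. */
    static Summary summarize(int[] words, boolean pruneHapax) {
        // A TreeMap keeps terms in increasing order, so the resulting arrays are sorted.
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        if (pruneHapax) {
            counts.values().removeIf(c -> c == 1); // remove hapax legomena
        }
        int[] term = new int[counts.size()];
        int[] count = new int[counts.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            term[i] = e.getKey();
            count[i] = e.getValue();
            i++;
        }
        // The length is always the full number of words, pruned or not.
        return new Summary(term, count, words.length);
    }
}
```

For the document {3, 1, 3, 7, 3, 1, 9}, the sketch reports a length of 7 words in both cases, but a size of 4 terms without pruning and 2 terms after terms 7 and 9 are pruned.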

indexOf

public int indexOf(int t)
Description copied from interface: DocumentSummary
Returns the index of the given term.

Specified by:
indexOf in interface DocumentSummary
Parameters:
t - a term number.
Returns:
the index of the given term in this summary, or -1 if the term does not appear in this summary.
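The sorted flag matters mostly for term lookup: a sorted term array can be searched in logarithmic time, while an unsorted one needs a linear scan. A minimal sketch of such a lookup over the first n entries (mirroring the numTerms constructors), assuming the -1 convention documented above; IndexOfSketch is a hypothetical name, not the archive4j implementation.

```java
import java.util.Arrays;

/** Hypothetical lookup helper (not part of archive4j). */
final class IndexOfSketch {
    private IndexOfSketch() {}

    /**
     * Returns the index of t among the first n entries of term, or -1 if absent.
     * Uses binary search when the terms are sorted, a linear scan otherwise.
     */
    static int indexOf(int[] term, int n, boolean sorted, int t) {
        if (sorted) {
            // Arrays.binarySearch returns a negative value (-(insertion point) - 1)
            // when the key is missing; map every such value to -1.
            int pos = Arrays.binarySearch(term, 0, n, t);
            return pos >= 0 ? pos : -1;
        }
        for (int i = 0; i < n; i++) {
            if (term[i] == t) return i;
        }
        return -1;
    }
}
```

Restricting the search to [0, n) means entries beyond the summary's own terms in a larger, reused array are never matched.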

length

public int length()
Description copied from interface: DocumentSummary
Returns the length in words of the document this summary represents.

Specified by:
length in interface DocumentSummary
Returns:
the length in words of the document this summary represents.

hashCode

public int hashCode()
Overrides:
hashCode in class Object

equals

public boolean equals(Object o)
Compares this document summary to another object.

Note that this method is more efficient if at least one of the two summaries is sorted. If you plan on calling this method often, consider using sorted summaries.

Overrides:
equals in class Object
Returns:
true if o is a DocumentSummary with the same id, length, size and term-count information.
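The contract above (same id, length, size and term-count information) appears not to depend on term order, so an unsorted summary may equal a sorted one. A minimal sketch of such an order-insensitive comparison, using a hypothetical SummarySketch class rather than the archive4j implementation and assuming terms within a summary are distinct. It looks terms up by linear scan, which is quadratic in the size; this is presumably the cost that sorted summaries help reduce.

```java
/** Hypothetical minimal summary (not archive4j code) illustrating an order-insensitive equals(). */
final class SummarySketch {
    final int id;
    final int length;
    final int[] term;  // parallel to count, possibly unsorted
    final int[] count;

    SummarySketch(int id, int length, int[] term, int[] count) {
        this.id = id;
        this.length = length;
        this.term = term;
        this.count = count;
    }

    int size() {
        return term.length;
    }

    /** Linear-scan lookup; returns -1 if t does not appear. */
    int indexOf(int t) {
        for (int i = 0; i < term.length; i++) {
            if (term[i] == t) return i;
        }
        return -1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SummarySketch)) return false;
        SummarySketch s = (SummarySketch) o;
        if (id != s.id || length != s.length || size() != s.size()) return false;
        // Equal sizes plus every (term, count) pair of this summary found in the other
        // imply identical term-count information, given distinct terms.
        for (int i = 0; i < term.length; i++) {
            int j = s.indexOf(term[i]);
            if (j < 0 || s.count[j] != count[i]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Summing per-pair hashes makes the result independent of term order,
        // keeping hashCode consistent with the order-insensitive equals above.
        int h = 31 * id + length;
        for (int i = 0; i < term.length; i++) {
            h += term[i] * 31 + count[i];
        }
        return h;
    }
}
```

Whatever the actual implementation does, any hashCode paired with an order-insensitive equals must itself ignore term order; otherwise equal summaries could land in different hash buckets.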