org.basex.util
Class Token

java.lang.Object
  extended by org.basex.util.Token

public final class Token
extends java.lang.Object

This class provides convenience operations for handling 'Tokens'. Tokens are UTF-8 encoded strings, stored in a byte array.

Note that, to guarantee a consistent string representation, all string conversions should be done via the methods of this class.
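
The following sketch illustrates why a single conversion path matters. It uses only the JDK; the class `TokenRoundTrip` and its helpers are hypothetical stand-ins and not the BaseX implementation. `String.getBytes()` without an argument uses the default charset, which is not guaranteed to be UTF-8 on every JDK, so the charset is fixed explicitly here.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only illustration of the token idea; not the BaseX code. */
final class TokenRoundTrip {
  /** Encodes a string as a UTF-8 byte array. */
  static byte[] toToken(String string) {
    return string.getBytes(StandardCharsets.UTF_8);
  }

  /** Decodes a UTF-8 byte array back into a string. */
  static String asString(byte[] token) {
    return new String(token, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // 5 characters, two of them outside ASCII (written as escapes)
    byte[] token = toToken("gr\u00f6\u00dfe");
    System.out.println(token.length);             // 7 bytes
    System.out.println(asString(token).length()); // 5 characters
  }
}
```

The byte length and the character count differ as soon as a token contains non-ASCII characters, which is worth keeping in mind for the position-based methods below.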

Author:
BaseX Team 2005-12, BSD License, Christian Gruen

Field Summary
static byte[] COLON
          Colon.
static java.util.Comparator<byte[]> COMP
          Comparator for byte arrays.
static byte[] EMPTY
          Empty token.
static byte[] FALSE
          Token 'false'.
static byte[] HEX
          Hex codes.
static byte[] INF
          Token 'INF'.
static java.util.Comparator<byte[]> LC_COMP
          Case-insensitive comparator for byte arrays.
static byte[] NINF
          Token '-INF'.
static byte[] NULL
          Token 'null'.
static byte[] ONE
          Number '1'.
static byte[] SLASH
          Slash.
static byte[] SPACE
          Space.
static byte[] TRUE
          Token 'true'.
static java.lang.String UTF16
          UTF16 encoding string.
static java.lang.String UTF162
          UTF16 encoding string (variant).
static java.lang.String UTF16BE
          UTF16BE (=UTF16) encoding string.
static java.lang.String UTF16LE
          UTF16LE encoding string.
static java.lang.String UTF32
          UTF32 encoding string.
static java.lang.String UTF322
          UTF32 encoding string (variant).
static java.lang.String UTF8
          UTF8 encoding string.
static java.lang.String UTF82
          UTF8 encoding string (variant).
static byte[] XML
          XML token.
static byte[] XMLC
          XML token with colon.
static byte[] XMLNS
          XMLNS token.
static byte[] XMLNSC
          XMLNS token with colon.
static byte[] ZERO
          Number '0'.
 
Method Summary
static boolean ascii(byte[] token)
          Checks if the specified token only consists of ASCII characters.
static byte[] chop(byte[] token, int max)
          Chops a token to the specified length and adds dots.
static byte[] chopNumber(byte[] token)
          Finishes the numeric token, removing trailing zeroes.
static int cl(byte cp)
          Returns the length of a UTF8 character, based on the specified byte.
static int cl(byte[] token, int pos)
          Returns the length of a UTF8 character at the specified position.
static byte[] concat(byte[] token1, byte[] token2)
          Concatenates two tokens.
static byte[] concat(byte[] token1, byte[] token2, byte[] token3)
          Concatenates three tokens.
static boolean contains(byte[] token, byte[] sub)
          Checks if the first token contains the second token.
static boolean contains(byte[] token, int c)
          Checks if the first token contains the specified character.
static int cp(byte[] token, int pos)
          Returns the codepoint (Unicode value) of the specified token, starting at the specified position.
static int[] cps(byte[] token)
          Converts a token to a sequence of codepoints.
static byte[] delete(byte[] token, int ch)
          Deletes the specified character from the token.
static int diff(byte[] token, byte[] compare)
          Compares two tokens lexicographically.
static boolean digit(int ch)
          Checks if the specified character is a digit (0 - 9).
static boolean endsWith(byte[] token, byte[] sub)
          Checks if the first token ends with the second token.
static boolean endsWith(byte[] token, int ch)
          Checks if the first token ends with the specified character.
static boolean eq(byte[] token, byte[]... tokens)
          Compares several tokens for equality.
static boolean eq(byte[] token1, byte[] token2)
          Compares two tokens for equality.
static boolean eq(java.lang.String str, java.lang.String... strings)
          Compares several strings for equality.
static boolean eqic(java.lang.String str, java.lang.String... strings)
          Compares several strings for equality, ignoring the case.
static byte[] escape(byte[] token)
          Escapes the specified token.
static boolean ftChar(int ch)
          Returns true if the specified character is a full-text letter or digit.
static int hash(byte[] token)
          Calculates a hash code for the specified token.
static byte[] hex(byte[] val, boolean uc)
          Returns a hex representation of the specified byte array.
static int indexOf(byte[] token, byte[] sub)
          Returns the position of the specified token or -1.
static int indexOf(byte[] token, byte[] sub, int pos)
          Returns the position of the specified token or -1.
static int indexOf(byte[] token, int c)
          Returns the position of the specified character or -1.
static int lastIndexOf(byte[] token, int c)
          Returns the last position of the specified character or -1.
static byte[] lc(byte[] token)
          Converts the specified token to lower case.
static int lc(int ch)
          Converts a character to lower case.
static int len(byte[] token)
          Returns the token length.
static boolean letter(int ch)
          Checks if the specified character is a computer letter (A - Z, a - z, _).
static boolean letterOrDigit(int ch)
          Checks if the specified character is a computer letter or digit.
static byte[] local(byte[] name)
          Returns the local name of the specified name.
static byte[] max(byte[] token, byte[] compare)
          Returns the bigger token.
static java.lang.String md5(java.lang.String string)
          Returns an MD5 hash in lower case.
static byte[] min(byte[] token, byte[] compare)
          Returns the smaller token.
static byte[] norm(byte[] token)
          Normalizes all whitespace occurrences in the specified token.
static int norm(int ch)
          Returns a normalized character without diacritics.
static java.lang.String normEncoding(java.lang.String encoding)
          Returns a unified representation of the specified encoding.
static java.lang.String normEncoding(java.lang.String encoding, java.lang.String old)
          Returns a unified representation of the specified encoding.
static int numDigits(int integer)
          Returns the number of digits of the specified integer.
static byte[] prefix(byte[] name)
          Returns the prefix of the specified token.
static byte[] replace(byte[] token, int search, int replace)
          Replaces the specified character and returns the result token.
static byte[] replaceAll(byte[] token, java.lang.String pattern, java.lang.String replace)
          Replaces all matches of a regular expression in the specified token.
static byte[][] split(byte[] token, int sep)
          Splits a token around matches of the given separator.
static boolean startsWith(byte[] token, byte[] sub)
          Checks if the first token starts with the second token.
static boolean startsWith(byte[] token, int ch)
          Checks if the first token starts with the specified character.
static java.lang.String string(byte[] token)
          Returns the specified token as string.
static java.lang.String string(byte[] token, int start, int length)
          Returns the specified token as string.
static byte[] substring(byte[] token, int start)
          Returns a substring of the specified token.
static byte[] substring(byte[] token, int start, int end)
          Returns a substring of the specified token.
static byte[] subtoken(byte[] token, int start)
          Returns a partial token.
static byte[] subtoken(byte[] token, int start, int end)
          Returns a partial token.
static boolean supported(java.lang.String encoding)
          Checks if the specified encoding is supported.
static double toDouble(byte[] token)
          Converts the specified token into a double value.
static int toInt(byte[] token)
          Converts the specified token into an integer value.
static int toInt(byte[] token, int start, int end)
          Converts the specified token into an integer value.
static int toInt(java.lang.String string)
          Converts the specified string into an integer value.
static byte[] token(boolean bool)
          Creates a byte array representation of the specified boolean value.
static byte[] token(double dbl)
          Creates a byte array representation from the specified double value; inspired by Xavier Franc's Qizx/open processor.
static byte[] token(float flt)
          Creates a byte array representation from the specified float value.
static byte[] token(int integer)
          Creates a byte array representation of the specified integer value.
static byte[] token(long integer)
          Creates a byte array representation from the specified long value, using Java's standard method.
static byte[] token(java.lang.String string)
          Converts a string to a byte array.
static byte[][] tokens(java.lang.String... strings)
          Converts the specified strings to tokens.
static long toLong(byte[] token)
          Converts the specified token into a long value.
static long toLong(byte[] token, int start, int end)
          Converts the specified token into a long value.
static long toLong(java.lang.String string)
          Converts the specified string into a long value.
static int toSimpleInt(byte[] token)
          Converts the specified token into a positive integer value.
static byte[] trim(byte[] token)
          Removes leading and trailing whitespaces from the specified token.
static byte[] uc(byte[] token)
          Converts the specified token to upper case.
static int uc(int ch)
          Converts a character to upper case.
static byte[] uri(byte[] token, boolean iri)
          Returns a URI encoded token.
static byte[] utf8(byte[] token, java.lang.String encoding)
          Converts a token from the input encoding to UTF8.
static boolean ws(byte[] token)
          Checks if the specified token has only whitespaces.
static boolean ws(int ch)
          Checks if the specified character is a whitespace.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EMPTY

public static final byte[] EMPTY
Empty token.


XML

public static final byte[] XML
XML token.


XMLC

public static final byte[] XMLC
XML token with colon.


XMLNS

public static final byte[] XMLNS
XMLNS token.


XMLNSC

public static final byte[] XMLNSC
XMLNS token with colon.


TRUE

public static final byte[] TRUE
Token 'true'.


FALSE

public static final byte[] FALSE
Token 'false'.


NULL

public static final byte[] NULL
Token 'null'.


INF

public static final byte[] INF
Token 'INF'.


NINF

public static final byte[] NINF
Token '-INF'.


SPACE

public static final byte[] SPACE
Space.


ZERO

public static final byte[] ZERO
Number '0'.


ONE

public static final byte[] ONE
Number '1'.


SLASH

public static final byte[] SLASH
Slash.


COLON

public static final byte[] COLON
Colon.


HEX

public static final byte[] HEX
Hex codes.


UTF8

public static final java.lang.String UTF8
UTF8 encoding string.

See Also:
Constant Field Values

UTF82

public static final java.lang.String UTF82
UTF8 encoding string (variant).

See Also:
Constant Field Values

UTF16

public static final java.lang.String UTF16
UTF16 encoding string.

See Also:
Constant Field Values

UTF162

public static final java.lang.String UTF162
UTF16 encoding string (variant).

See Also:
Constant Field Values

UTF16BE

public static final java.lang.String UTF16BE
UTF16BE (=UTF16) encoding string.

See Also:
Constant Field Values

UTF16LE

public static final java.lang.String UTF16LE
UTF16LE encoding string.

See Also:
Constant Field Values

UTF32

public static final java.lang.String UTF32
UTF32 encoding string.

See Also:
Constant Field Values

UTF322

public static final java.lang.String UTF322
UTF32 encoding string (variant).

See Also:
Constant Field Values

COMP

public static final java.util.Comparator<byte[]> COMP
Comparator for byte arrays.


LC_COMP

public static final java.util.Comparator<byte[]> LC_COMP
Case-insensitive comparator for byte arrays.

Method Detail

string

public static java.lang.String string(byte[] token)
Returns the specified token as string.

Parameters:
token - token
Returns:
string

string

public static java.lang.String string(byte[] token,
                                      int start,
                                      int length)
Returns the specified token as string.

Parameters:
token - token
start - start position
length - length
Returns:
string

ascii

public static boolean ascii(byte[] token)
Checks if the specified token only consists of ASCII characters.

Parameters:
token - token
Returns:
result of check

token

public static byte[] token(java.lang.String string)
Converts a string to a byte array. All strings should be converted by this function to guarantee a consistent character conversion.

Parameters:
string - string to be converted
Returns:
byte array

tokens

public static byte[][] tokens(java.lang.String... strings)
Converts the specified strings to tokens.

Parameters:
strings - strings
Returns:
tokens

utf8

public static byte[] utf8(byte[] token,
                          java.lang.String encoding)
Converts a token from the input encoding to UTF8.

Parameters:
token - token to be converted
encoding - input encoding
Returns:
byte array
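
A conversion of this kind can be sketched with the JDK alone by decoding the bytes with the input charset and re-encoding them as UTF-8. The class name `ToUtf8Sketch` is hypothetical, and the actual method may treat unknown encodings or malformed input differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only sketch of re-encoding bytes as UTF-8. */
final class ToUtf8Sketch {
  /** Decodes the input with the named charset and encodes the result as UTF-8. */
  static byte[] utf8(byte[] input, String encoding) {
    String decoded = new String(input, Charset.forName(encoding));
    return decoded.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] latin1 = { (byte) 0xE9 };        // e with acute accent in ISO-8859-1
    byte[] out = utf8(latin1, "ISO-8859-1");
    System.out.println(out.length);         // 2: the UTF-8 form needs two bytes
  }
}
```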

normEncoding

public static java.lang.String normEncoding(java.lang.String encoding)
Returns a unified representation of the specified encoding.

Parameters:
encoding - input encoding (UTF-8 is returned for a null reference)
Returns:
encoding

normEncoding

public static java.lang.String normEncoding(java.lang.String encoding,
                                            java.lang.String old)
Returns a unified representation of the specified encoding.

Parameters:
encoding - input encoding (UTF-8 is returned for a null reference)
old - previous encoding (optional)
Returns:
encoding

supported

public static boolean supported(java.lang.String encoding)
Checks if the specified encoding is supported.

Parameters:
encoding - encoding
Returns:
result of check
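
One way to perform such a check with the JDK is `Charset.isSupported`. The sketch below is an assumption about how the test could be done, not a statement about what this method calls internally; the class name `EncodingCheck` is hypothetical.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of an encoding check based on the JDK charset registry. */
final class EncodingCheck {
  /** Returns true if the JDK knows the named charset; illegal or null names yield false. */
  static boolean supported(String encoding) {
    try {
      return Charset.isSupported(encoding);
    } catch (IllegalArgumentException ex) {
      // thrown for null and for syntactically illegal charset names
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(supported("UTF-8"));            // true
    System.out.println(supported("no such charset!")); // false
  }
}
```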

cp

public static int cp(byte[] token,
                     int pos)
Returns the codepoint (Unicode value) of the specified token, starting at the specified position. Returns the Unicode replacement character for invalid values.

Parameters:
token - token
pos - character position
Returns:
current character
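
The decoding step can be sketched as follows. This is a simplified illustration under stated assumptions: the class `Utf8Decode` is hypothetical, U+FFFD is assumed to be the replacement character, and overlong or otherwise non-canonical sequences are not rejected.

```java
/** Hypothetical sketch of decoding one UTF-8 character from a byte array. */
final class Utf8Decode {
  /** Assumed replacement character for invalid input. */
  static final int REPLACEMENT = 0xFFFD;

  /** Decodes the codepoint whose first byte is at the given position. */
  static int codePointAt(byte[] token, int pos) {
    int first = token[pos] & 0xFF;
    if (first < 0x80) return first;                       // single-byte ASCII
    int len = first >= 0xF0 ? 4 : first >= 0xE0 ? 3 : first >= 0xC0 ? 2 : 0;
    if (len == 0 || pos + len > token.length) return REPLACEMENT;
    int cp = first & (0xFF >> (len + 1));                 // payload bits of the lead byte
    for (int i = 1; i < len; i++) {
      int next = token[pos + i] & 0xFF;
      if ((next & 0xC0) != 0x80) return REPLACEMENT;      // not a continuation byte
      cp = (cp << 6) | (next & 0x3F);
    }
    return cp;
  }

  public static void main(String[] args) {
    byte[] euro = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
    System.out.println(Integer.toHexString(codePointAt(euro, 0))); // 20ac
  }
}
```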

cl

public static int cl(byte cp)
Returns the length of a UTF8 character, based on the specified byte.

Parameters:
cp - codepoint
Returns:
character length

cl

public static int cl(byte[] token,
                     int pos)
Returns the length of a UTF8 character at the specified position.

Parameters:
token - token
pos - position
Returns:
character length
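
In UTF-8, the number of bytes in a character can be read from the high bits of its first byte. The sketch below shows that rule; the class `Utf8Length` is hypothetical, and returning 1 for a stray continuation byte is an assumption made here so that a scanning loop always advances.

```java
/** Hypothetical sketch: byte length of a UTF-8 character, derived from its first byte. */
final class Utf8Length {
  /** Returns the number of bytes of the sequence that starts with the given byte. */
  static int charLength(byte lead) {
    int b = lead & 0xFF;
    if (b < 0x80) return 1;   // 0xxxxxxx: ASCII
    if (b >= 0xF0) return 4;  // 11110xxx
    if (b >= 0xE0) return 3;  // 1110xxxx
    if (b >= 0xC0) return 2;  // 110xxxxx
    return 1;                 // 10xxxxxx: continuation byte, treated as length 1 (assumption)
  }

  /** Returns the length of the character that starts at the given position. */
  static int charLength(byte[] token, int pos) {
    return charLength(token[pos]);
  }

  public static void main(String[] args) {
    byte[] token = { 'a', (byte) 0xC3, (byte) 0xA9 };
    // Step through the token character by character rather than byte by byte.
    for (int p = 0; p < token.length; p += charLength(token, p)) {
      System.out.println("character starts at byte " + p);
    }
  }
}
```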

cps

public static int[] cps(byte[] token)
Converts a token to a sequence of codepoints.

Parameters:
token - token
Returns:
codepoints

len

public static int len(byte[] token)
Returns the token length.

Parameters:
token - token
Returns:
length

token

public static byte[] token(boolean bool)
Creates a byte array representation of the specified boolean value.

Parameters:
bool - boolean value to be converted
Returns:
boolean value in byte array

token

public static byte[] token(int integer)
Creates a byte array representation of the specified integer value.

Parameters:
integer - int value to be converted
Returns:
integer value in byte array

numDigits

public static int numDigits(int integer)
Returns the number of digits of the specified integer.

Parameters:
integer - number to be checked
Returns:
number of digits

token

public static byte[] token(long integer)
Creates a byte array representation from the specified long value, using Java's standard method.

Parameters:
integer - value to be converted
Returns:
byte array

token

public static byte[] token(double dbl)
Creates a byte array representation from the specified double value; inspired by Xavier Franc's Qizx/open processor.

Parameters:
dbl - double value to be converted
Returns:
byte array

token

public static byte[] token(float flt)
Creates a byte array representation from the specified float value.

Parameters:
flt - float value to be converted
Returns:
byte array

chopNumber

public static byte[] chopNumber(byte[] token)
Finishes the numeric token, removing trailing zeroes.

Parameters:
token - token to be modified
Returns:
token

toDouble

public static double toDouble(byte[] token)
Converts the specified token into a double value. Double.NaN is returned if the input is invalid.

Parameters:
token - token to be converted
Returns:
resulting double value
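
Callers that receive `Double.NaN` as an error marker need `Double.isNaN` to detect it, because NaN never compares equal to anything, itself included. The sketch below shows the pattern with a hypothetical JDK-based parser, `DoubleParse.parseOrNaN`; the set of inputs it accepts is that of `Double.parseDouble` and may differ from what this method accepts.

```java
/** Hypothetical sketch of a NaN-on-error parser and the matching caller-side check. */
final class DoubleParse {
  /** Parses the string, returning NaN instead of throwing on invalid input. */
  static double parseOrNaN(String string) {
    try {
      return Double.parseDouble(string);
    } catch (NumberFormatException ex) {
      return Double.NaN;
    }
  }

  public static void main(String[] args) {
    double d = parseOrNaN("abc");
    System.out.println(d == Double.NaN);  // false: this comparison never succeeds
    System.out.println(Double.isNaN(d));  // true: the reliable check
  }
}
```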

toLong

public static long toLong(java.lang.String string)
Converts the specified string into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
string - string to be converted
Returns:
resulting long value

toLong

public static long toLong(byte[] token)
Converts the specified token into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
token - token to be converted
Returns:
resulting long value

toLong

public static long toLong(byte[] token,
                          int start,
                          int end)
Converts the specified token into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
token - token to be converted
start - first byte to be parsed
end - last byte to be parsed (exclusive)
Returns:
resulting long value

toInt

public static int toInt(java.lang.String string)
Converts the specified string into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
string - string to be converted
Returns:
resulting integer value
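
A sentinel return value keeps parsing free of exceptions, but a sentinel cannot be told apart from the same number parsed legitimately. The sketch below illustrates this with a hypothetical JDK-based helper, `SentinelParse.parseOrMin`; whether the methods in this class accept a literal `-2147483648` is not stated here.

```java
/** Hypothetical sketch of sentinel-based integer parsing. */
final class SentinelParse {
  /** Parses the string, returning Integer.MIN_VALUE instead of throwing on invalid input. */
  static int parseOrMin(String string) {
    try {
      return Integer.parseInt(string);
    } catch (NumberFormatException ex) {
      return Integer.MIN_VALUE;
    }
  }

  public static void main(String[] args) {
    int value = parseOrMin("12x");
    if (value == Integer.MIN_VALUE) {
      System.out.println("invalid input (or the literal minimum value)");
    }
  }
}
```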

toInt

public static int toInt(byte[] token)
Converts the specified token into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
token - token to be converted
Returns:
resulting integer value

toInt

public static int toInt(byte[] token,
                        int start,
                        int end)
Converts the specified token into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
token - token to be converted
start - first byte to be parsed
end - last byte to be parsed (exclusive)
Returns:
resulting integer value

toSimpleInt

public static int toSimpleInt(byte[] token)
Converts the specified token into a positive integer value. Integer.MIN_VALUE is returned if non-digits are found or if the input is longer than nine characters.

Parameters:
token - token to be converted
Returns:
resulting integer value
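
The nine-character limit means the accumulated value can never overflow an `int`, so no overflow check is needed inside the loop. A minimal sketch of that logic follows; the class `SimpleIntSketch` is hypothetical, and returning the sentinel for an empty token is an assumption.

```java
/** Hypothetical sketch of a digits-only parser limited to nine characters. */
final class SimpleIntSketch {
  /** Returns the parsed value, or Integer.MIN_VALUE for non-digits or overlong input. */
  static int toSimpleInt(byte[] token) {
    // An empty token is rejected here; that choice is an assumption.
    if (token.length == 0 || token.length > 9) return Integer.MIN_VALUE;
    int value = 0;
    for (byte b : token) {
      if (b < '0' || b > '9') return Integer.MIN_VALUE;
      value = value * 10 + (b - '0'); // at most 999999999, which fits in an int
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(toSimpleInt("123".getBytes(java.nio.charset.StandardCharsets.US_ASCII))); // 123
  }
}
```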

hash

public static int hash(byte[] token)
Calculates a hash code for the specified token.

Parameters:
token - specified token
Returns:
hash code

eq

public static boolean eq(byte[] token1,
                         byte[] token2)
Compares two tokens for equality.

Parameters:
token1 - first token
token2 - token to be compared
Returns:
true if the arrays are equal

eq

public static boolean eq(byte[] token,
                         byte[]... tokens)
Compares several tokens for equality.

Parameters:
token - token
tokens - tokens to be compared
Returns:
true if one test is successful
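
"True if one test is successful" describes any-match semantics: the first token is compared with each candidate, and a single hit is enough. A JDK-only sketch, with the hypothetical name `EqAnySketch.eqAny`:

```java
import java.util.Arrays;

/** Hypothetical sketch of any-match equality over byte arrays. */
final class EqAnySketch {
  /** Returns true if the token equals at least one of the candidates. */
  static boolean eqAny(byte[] token, byte[]... candidates) {
    for (byte[] candidate : candidates) {
      if (Arrays.equals(token, candidate)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    byte[] yes = { 'y' }, no = { 'n' }, input = { 'n' };
    System.out.println(eqAny(input, yes, no)); // true
  }
}
```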

eq

public static boolean eq(java.lang.String str,
                         java.lang.String... strings)
Compares several strings for equality.

Parameters:
str - first string
strings - strings to be compared
Returns:
true if one test is successful

eqic

public static boolean eqic(java.lang.String str,
                           java.lang.String... strings)
Compares several strings for equality, ignoring the case.

Parameters:
str - first string
strings - strings to be compared
Returns:
true if one test is successful

diff

public static int diff(byte[] token,
                       byte[] compare)
Compares two tokens lexicographically.

Parameters:
token - first token
compare - token to be compared
Returns:
0 if tokens are equal, negative if first token is smaller, positive if first token is bigger
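
A lexicographic comparison of byte arrays can be sketched as below. Treating the bytes as unsigned is an assumption made here; it has the convenient property that UTF-8 tokens then sort in codepoint order. The class name `TokenDiff` is hypothetical.

```java
/** Hypothetical sketch of a lexicographic byte array comparison. */
final class TokenDiff {
  /** Returns 0 if equal, a negative value if a sorts first, a positive value otherwise. */
  static int diff(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      int c = (a[i] & 0xFF) - (b[i] & 0xFF); // unsigned byte comparison (assumption)
      if (c != 0) return c;
    }
    return a.length - b.length;              // a shorter prefix sorts first
  }

  public static void main(String[] args) {
    byte[] ab = { 'a', 'b' }, abc = { 'a', 'b', 'c' };
    System.out.println(diff(ab, abc) < 0); // true
  }
}
```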

min

public static byte[] min(byte[] token,
                         byte[] compare)
Returns the smaller token.

Parameters:
token - first token
compare - token to be compared
Returns:
smaller token

max

public static byte[] max(byte[] token,
                         byte[] compare)
Returns the bigger token.

Parameters:
token - first token
compare - token to be compared
Returns:
bigger token

contains

public static boolean contains(byte[] token,
                               byte[] sub)
Checks if the first token contains the second token.

Parameters:
token - token
sub - token to be found
Returns:
result of test

contains

public static boolean contains(byte[] token,
                               int c)
Checks if the first token contains the specified character.

Parameters:
token - token
c - character to be found
Returns:
result of test

indexOf

public static int indexOf(byte[] token,
                          int c)
Returns the position of the specified character or -1.

Parameters:
token - token
c - character to be found
Returns:
position or -1

lastIndexOf

public static int lastIndexOf(byte[] token,
                              int c)
Returns the last position of the specified character or -1.

Parameters:
token - token
c - character to be found
Returns:
position or -1

indexOf

public static int indexOf(byte[] token,
                          byte[] sub)
Returns the position of the specified token or -1.

Parameters:
token - token
sub - token to be found
Returns:
position or -1

indexOf

public static int indexOf(byte[] token,
                          byte[] sub,
                          int pos)
Returns the position of the specified token or -1.

Parameters:
token - token
sub - token to be found
pos - start position
Returns:
position or -1

startsWith

public static boolean startsWith(byte[] token,
                                 int ch)
Checks if the first token starts with the specified character.

Parameters:
token - token
ch - character to be found
Returns:
result of test

startsWith

public static boolean startsWith(byte[] token,
                                 byte[] sub)
Checks if the first token starts with the second token.

Parameters:
token - token
sub - token to be found
Returns:
result of test

endsWith

public static boolean endsWith(byte[] token,
                               int ch)
Checks if the first token ends with the specified character.

Parameters:
token - token
ch - character to be found
Returns:
result of test

endsWith

public static boolean endsWith(byte[] token,
                               byte[] sub)
Checks if the first token ends with the second token.

Parameters:
token - token
sub - token to be found
Returns:
result of test

substring

public static byte[] substring(byte[] token,
                               int start)
Returns a substring of the specified token. Note that this method does not correctly split UTF8 characters; use subtoken(byte[], int) instead.

Parameters:
token - input token
start - start position
Returns:
substring

substring

public static byte[] substring(byte[] token,
                               int start,
                               int end)
Returns a substring of the specified token. Note that this method does not correctly split UTF8 characters; use subtoken(byte[], int, int) instead.

Parameters:
token - input token
start - start position
end - end position
Returns:
substring
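
The warning above can be reproduced with the JDK alone. In the sketch below, `byteSlice` cuts at byte offsets and can land in the middle of a multi-byte character, while `charSlice` cuts at character offsets. Both helpers and the class `ByteSubstringDemo` are hypothetical; they illustrate the problem and do not show how substring or subtoken are implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo: slicing UTF-8 data by bytes versus by characters. */
final class ByteSubstringDemo {
  /** Slices the UTF-8 bytes of the string at byte offsets, then decodes the result. */
  static String byteSlice(String string, int start, int end) {
    byte[] token = string.getBytes(StandardCharsets.UTF_8);
    return new String(Arrays.copyOfRange(token, start, end), StandardCharsets.UTF_8);
  }

  /** Slices the string at codepoint offsets. */
  static String charSlice(String string, int start, int end) {
    return string.substring(string.offsetByCodePoints(0, start),
        string.offsetByCodePoints(0, end));
  }

  public static void main(String[] args) {
    String word = "h\u00e9llo"; // second character needs two bytes in UTF-8
    System.out.println(byteSlice(word, 0, 2)); // 'h' plus a replacement character
    System.out.println(charSlice(word, 0, 2)); // the first two characters, intact
  }
}
```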

subtoken

public static byte[] subtoken(byte[] token,
                              int start)
Returns a partial token.

Parameters:
token - input token
start - start position
Returns:
resulting text

subtoken

public static byte[] subtoken(byte[] token,
                              int start,
                              int end)
Returns a partial token.

Parameters:
token - input token
start - start position
end - end position
Returns:
resulting text

split

public static byte[][] split(byte[] token,
                             int sep)
Splits a token around matches of the given separator.

Parameters:
token - token to be split
sep - separation character
Returns:
array
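
Splitting a byte array around a single-byte separator can be sketched as follows. The class `SplitSketch` is hypothetical, and keeping empty segments between adjacent separators is an assumption; the actual method may drop them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of splitting a token at a separator byte. */
final class SplitSketch {
  /** Splits the token at every occurrence of the separator, keeping empty segments. */
  static byte[][] split(byte[] token, int sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= token.length; i++) {
      // The end of the token closes the last segment.
      if (i == token.length || token[i] == sep) {
        parts.add(Arrays.copyOfRange(token, start, i));
        start = i + 1;
      }
    }
    return parts.toArray(new byte[0][]);
  }

  public static void main(String[] args) {
    byte[] token = { 'a', ',', 'b', ',', ',', 'c' };
    System.out.println(split(token, ',').length); // 4
  }
}
```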

replaceAll

public static byte[] replaceAll(byte[] token,
                                java.lang.String pattern,
                                java.lang.String replace)
Replaces all matches of a regular expression in the specified token.

Parameters:
token - token to match
pattern - regular expression
replace - replacement string
Returns:
resulting string

ws

public static boolean ws(byte[] token)
Checks if the specified token has only whitespaces.

Parameters:
token - token
Returns:
true if all characters are whitespaces

replace

public static byte[] replace(byte[] token,
                             int search,
                             int replace)
Replaces the specified character and returns the result token.

Parameters:
token - token to be checked
search - the character to be replaced
replace - the new character
Returns:
resulting token

trim

public static byte[] trim(byte[] token)
Removes leading and trailing whitespaces from the specified token.

Parameters:
token - token to be trimmed
Returns:
trimmed token

chop

public static byte[] chop(byte[] token,
                          int max)
Chops a token to the specified length and adds dots.

Parameters:
token - token to be chopped
max - maximum length
Returns:
chopped token

concat

public static byte[] concat(byte[] token1,
                            byte[] token2)
Concatenates two tokens.

Parameters:
token1 - first token
token2 - second token
Returns:
resulting array

concat

public static byte[] concat(byte[] token1,
                            byte[] token2,
                            byte[] token3)
Concatenates three tokens. A TokenBuilder instance can be used to concatenate more than three tokens.

Parameters:
token1 - first token
token2 - second token
token3 - third token
Returns:
resulting array
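
Concatenating byte arrays comes down to allocating the target once and copying each part into place. A JDK-only sketch for any number of parts follows; the class `ConcatSketch` is hypothetical and is not a substitute for TokenBuilder.

```java
/** Hypothetical sketch of concatenating several byte arrays. */
final class ConcatSketch {
  /** Returns a new array that holds all parts in order. */
  static byte[] concat(byte[]... parts) {
    int size = 0;
    for (byte[] part : parts) size += part.length;
    byte[] out = new byte[size];
    int pos = 0;
    for (byte[] part : parts) {
      System.arraycopy(part, 0, out, pos, part.length);
      pos += part.length;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] joined = concat(new byte[] { 'x', 'm', 'l' }, new byte[] { ':' }, new byte[] { 'i', 'd' });
    System.out.println(joined.length); // 6
  }
}
```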

delete

public static byte[] delete(byte[] token,
                            int ch)
Deletes the specified character from the token.

Parameters:
token - token
ch - character to be removed
Returns:
resulting token

norm

public static byte[] norm(byte[] token)
Normalizes all whitespace occurrences in the specified token.

Parameters:
token - token
Returns:
normalized token
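
Whitespace normalization commonly means collapsing each run of whitespace into one space and dropping leading and trailing whitespace. The sketch below implements that reading; it is an assumption about the intended behavior, and the class `NormSketch` is hypothetical. UTF-8 continuation and lead bytes are never mistaken for whitespace because their values lie outside the ASCII range.

```java
import java.util.Arrays;

/** Hypothetical sketch of whitespace normalization on a UTF-8 byte array. */
final class NormSketch {
  /** Collapses whitespace runs to one space and trims both ends. */
  static byte[] norm(byte[] token) {
    byte[] out = new byte[token.length];
    int size = 0;
    boolean pendingSpace = false;
    for (byte b : token) {
      boolean ws = b == ' ' || b == '\t' || b == '\n' || b == '\r';
      if (ws) {
        pendingSpace = size > 0;           // leading whitespace is never emitted
      } else {
        if (pendingSpace) out[size++] = ' ';
        pendingSpace = false;
        out[size++] = b;
      }
    }
    return Arrays.copyOf(out, size);       // a trailing pending space is dropped
  }

  public static void main(String[] args) {
    byte[] in = "  a \t b  ".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    System.out.println(new String(norm(in), java.nio.charset.StandardCharsets.US_ASCII)); // a b
  }
}
```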

ws

public static boolean ws(int ch)
Checks if the specified character is a whitespace.

Parameters:
ch - the character to be checked
Returns:
result of check

letter

public static boolean letter(int ch)
Checks if the specified character is a computer letter (A - Z, a - z, _).

Parameters:
ch - the letter to be checked
Returns:
result of check

digit

public static boolean digit(int ch)
Checks if the specified character is a digit (0 - 9).

Parameters:
ch - the character to be checked
Returns:
result of check

letterOrDigit

public static boolean letterOrDigit(int ch)
Checks if the specified character is a computer letter or digit.

Parameters:
ch - the character to be checked
Returns:
result of check

ftChar

public static boolean ftChar(int ch)
Returns true if the specified character is a full-text letter or digit.

Parameters:
ch - character to be tested
Returns:
result of check

uc

public static byte[] uc(byte[] token)
Converts the specified token to upper case.

Parameters:
token - token to be converted
Returns:
resulting token

uc

public static int uc(int ch)
Converts a character to upper case.

Parameters:
ch - character to be converted
Returns:
resulting character

lc

public static byte[] lc(byte[] token)
Converts the specified token to lower case.

Parameters:
token - token to be converted
Returns:
resulting token

lc

public static int lc(int ch)
Converts a character to lower case.

Parameters:
ch - character to be converted
Returns:
resulting character

prefix

public static byte[] prefix(byte[] name)
Returns the prefix of the specified token.

Parameters:
name - name
Returns:
prefix or empty token if no prefix exists

local

public static byte[] local(byte[] name)
Returns the local name of the specified name.

Parameters:
name - name
Returns:
local name
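
For a qualified name such as `xml:lang`, the prefix is the part before the colon and the local name is the part after it. The sketch below shows that split on byte arrays; the class `QNameSketch` is hypothetical, and splitting at the first colon is an assumption.

```java
import java.util.Arrays;

/** Hypothetical sketch of splitting a qualified name into prefix and local name. */
final class QNameSketch {
  /** Returns the position of the first colon, or -1. */
  private static int colon(byte[] name) {
    for (int i = 0; i < name.length; i++) {
      if (name[i] == ':') return i;
    }
    return -1;
  }

  /** Returns the part before the colon, or an empty array if there is none. */
  static byte[] prefix(byte[] name) {
    int i = colon(name);
    return i == -1 ? new byte[0] : Arrays.copyOf(name, i);
  }

  /** Returns the part after the colon, or the whole name if there is none. */
  static byte[] local(byte[] name) {
    int i = colon(name);
    return i == -1 ? name : Arrays.copyOfRange(name, i + 1, name.length);
  }

  public static void main(String[] args) {
    byte[] name = "xml:lang".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
    System.out.println(prefix(name).length + " " + local(name).length); // 3 4
  }
}
```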

uri

public static byte[] uri(byte[] token,
                         boolean iri)
Returns a URI encoded token.

Parameters:
token - token
iri - input
Returns:
encoded token

escape

public static byte[] escape(byte[] token)
Escapes the specified token.

Parameters:
token - token
Returns:
escaped token

md5

public static java.lang.String md5(java.lang.String string)
Returns an MD5 hash in lower case.

Parameters:
string - string to be hashed
Returns:
md5 hash
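
A lower-case MD5 hex digest can be produced with the JDK's `MessageDigest`. The sketch below is one way to do so; the class `Md5Sketch` is hypothetical, and hashing the UTF-8 bytes of the string is an assumption. MD5 is suitable for checksums but not for security-sensitive uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of computing a lower-case MD5 hex string. */
final class Md5Sketch {
  /** Returns the MD5 digest of the string's UTF-8 bytes as 32 lower-case hex digits. */
  static String md5(String string) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5")
          .digest(string.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) sb.append(String.format("%02x", b & 0xFF));
      return sb.toString();
    } catch (NoSuchAlgorithmException ex) {
      // MD5 is a required algorithm on every Java platform implementation.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(md5("abc")); // 900150983cd24fb0d6963f7d28e17f72
  }
}
```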

hex

public static byte[] hex(byte[] val,
                         boolean uc)
Returns a hex representation of the specified byte array.

Parameters:
val - values to be mapped
uc - upper case
Returns:
hex representation
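
Each input byte maps to two hex digits, one per 4-bit half. A table-driven sketch follows; the class `HexSketch` and its digit table are hypothetical and only loosely modeled on the HEX field listed above, whose contents are not shown in this document.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a hex encoder for byte arrays. */
final class HexSketch {
  private static final byte[] DIGITS = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);

  /** Returns two hex digits per input byte, in upper or lower case. */
  static byte[] hex(byte[] val, boolean uc) {
    byte[] out = new byte[val.length * 2];
    for (int i = 0; i < val.length; i++) {
      int b = val[i] & 0xFF;
      out[2 * i] = digit(b >> 4, uc);       // high half
      out[2 * i + 1] = digit(b & 0x0F, uc); // low half
    }
    return out;
  }

  private static byte digit(int d, boolean uc) {
    byte c = DIGITS[d];
    // Adding 0x20 turns the ASCII letters A-F into a-f; digits stay as they are.
    return uc || c <= '9' ? c : (byte) (c + 0x20);
  }

  public static void main(String[] args) {
    byte[] val = { 0x0F, (byte) 0xA0 };
    System.out.println(new String(hex(val, true), StandardCharsets.US_ASCII));  // 0FA0
    System.out.println(new String(hex(val, false), StandardCharsets.US_ASCII)); // 0fa0
  }
}
```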

norm

public static int norm(int ch)
Returns a normalized character without diacritics. This method supports all latin1 characters, including supplements.

Parameters:
ch - character to be normalized
Returns:
resulting character
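
For comparison, diacritics can also be removed with the JDK's `java.text.Normalizer` by decomposing a character and dropping its combining marks. This is a different technique from whatever mapping this method uses, and its results may differ for characters that have no canonical decomposition. The class `DiacriticsSketch` is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical sketch: removing diacritics through Unicode decomposition. */
final class DiacriticsSketch {
  /** Returns the base character of the given codepoint, or the codepoint itself. */
  static int strip(int ch) {
    String decomposed = Normalizer.normalize(
        new String(Character.toChars(ch)), Normalizer.Form.NFD);
    String base = decomposed.replaceAll("\\p{M}+", ""); // drop combining marks
    return base.isEmpty() ? ch : base.codePointAt(0);
  }

  public static void main(String[] args) {
    System.out.println((char) strip(0xE9)); // e
    System.out.println((char) strip(0xC5)); // A
  }
}
```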