edu.rice.cs.util
Class BalancingStreamTokenizer

java.lang.Object
  extended by edu.rice.cs.util.BalancingStreamTokenizer

public class BalancingStreamTokenizer
extends Object

A tokenizer that splits a stream into string tokens while balancing quoting characters.

Version:
$Id$
Author:
Mathias Ricken

Nested Class Summary
static class BalancingStreamTokenizer.KeywordStartsWithQuoteException
          Keyword starts with quote exception.
static class BalancingStreamTokenizer.KeywordStartsWithWhitespaceException
          Keyword starts with whitespace exception.
static class BalancingStreamTokenizer.QuoteStartsWithWhitespaceException
          Quote starts with whitespace exception.
static class BalancingStreamTokenizer.SetupException
          Setup exception.
static class BalancingStreamTokenizer.StartsWithWhitespaceException
          Quote or keyword starts with whitespace exception.
static class BalancingStreamTokenizer.State
          State of the tokenizer.
static class BalancingStreamTokenizer.Token
          Kind of tokens to be returned.
 
Field Summary
protected  Character _escape
          Escape character, if available.
protected  boolean _isEscape
          The current character is the escape character.
 Stack<Integer> _pushed
          Stack of characters having been pushed back.
protected  Reader _reader
          Input Reader.
protected  BalancingStreamTokenizer.State _state
          Current state of the tokenizer.
protected  Stack<BalancingStreamTokenizer.State> _stateStack
          Stack of previous states.
 BalancingStreamTokenizer.Token _token
           
protected  boolean _wasEscape
          The previous character was the escape character.
 
Constructor Summary
BalancingStreamTokenizer(Reader r)
          Create a new balancing stream tokenizer.
BalancingStreamTokenizer(Reader r, Character escape)
          Create a new balancing stream tokenizer.
 
Method Summary
 void addKeyword(String kw)
          Specify a new keyword.
 void addQuotes(String begin, String end)
          Specify a pair of quotes.
 void defaultThreeQuoteCurlySetup()
          Set up a tokenizer that recognizes ", ' and ` quotes and { } braces.
 void defaultThreeQuoteDollarCurlySetup()
          Set up a tokenizer that recognizes ", ' and ` quotes and ${ } braces.
 void defaultThreeQuoteSetup()
          Set up a tokenizer that recognizes ", ' and ` quotes.
 void defaultTwoQuoteCurlySetup()
          Set up a tokenizer that recognizes " and ' quotes and { } braces.
 void defaultTwoQuoteSetup()
          Set up a tokenizer that recognizes " and ' quotes.
 void defaultWhitespaceSetup()
          Set up a tokenizer with just whitespace.
protected  String escape(String s)
           
protected  String findMatch(int c, TreeSet<String> choices, Lambda<String,String> notFoundLambda)
           
 String getNextToken()
          Return the next token, or null if the end of the stream has been reached.
 BalancingStreamTokenizer.State getState()
          Return a copy of the current state of the tokenizer.
protected  int nextToken()
          Return the next token from the pushback stack if the stack is not empty, otherwise from the reader.
protected  void popState()
          Pops the top of the state stack and makes it the current state.
static TreeSet<String> prefixSet(Set<String> set, String prefix)
          Return the subset of the set whose entries begin with the prefix.
protected  void pushState()
          Push the current state onto the stack.
protected  void pushToken(int token)
          Push a token back onto the stack.
 void setState(BalancingStreamTokenizer.State state)
          Set the stream tokenizer to the specified state.
 BalancingStreamTokenizer.Token token()
          Returns the type of the current token.
protected  String unescape(String s)
           
 void whitespace(int... c)
          Specify one or more characters as whitespace.
 void whitespaceRange(int lo, int hi)
          Specify a range of characters as whitespace.
 void wordChars(int... c)
          Specify one or more characters as word characters.
 void wordRange(int lo, int hi)
          Specify a range of characters as word characters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_reader

protected Reader _reader
Input Reader.


_pushed

public Stack<Integer> _pushed
Stack of characters having been pushed back.


_state

protected BalancingStreamTokenizer.State _state
Current state of the tokenizer.


_stateStack

protected Stack<BalancingStreamTokenizer.State> _stateStack
Stack of previous states.


_escape

protected Character _escape
Escape character, if available. A quote or keyword preceded by this character is treated as normal text. To produce the escape character by itself, double it. If the escape character appears alone and does not precede another escape character, whitespace, a quote or a keyword, it is dropped. The escape character CANNOT be declared whitespace. It CAN be part of a quote or keyword, but it must then be doubled in that string; when quotes or keywords are added, any escape character they contain is doubled automatically. If set to null, no escaping is possible.
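The character-level part of these rules can be illustrated with a small stand-alone helper. This is a hypothetical sketch, not the class's escape/unescape methods: the name EscapeSketch.resolve is invented, quotes and keywords are simplified to single characters, and it assumes the escape character is removed from the output once applied (the page does not say whether an escaped quote keeps its escape in the returned token).

```java
import java.util.Set;

/**
 * Hypothetical illustration of the escape rules described for _escape.
 * Simplifications (assumptions, not the documented implementation):
 * quotes and keywords are single characters, and the escape character
 * itself is removed from the output once it has been applied.
 */
final class EscapeSketch {
  static String resolve(String s, char esc, Set<Character> specials) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (c != esc) {          // ordinary character: copy it
        out.append(c);
        i++;
        continue;
      }
      if (i + 1 < s.length()) {
        char next = s.charAt(i + 1);
        if (next == esc || Character.isWhitespace(next) || specials.contains(next)) {
          out.append(next);    // escaped escape, whitespace, quote or keyword: literal text
          i += 2;
          continue;
        }
      }
      i++;                     // lone escape before anything else (or at the end): dropped
    }
    return out.toString();
  }
}
```

With a backslash as the escape character, a doubled backslash yields one backslash, an escaped double quote yields a plain double quote, and a backslash before an ordinary letter simply disappears.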


_wasEscape

protected boolean _wasEscape
The previous character was the escape character.


_isEscape

protected boolean _isEscape
The current character is the escape character.


_token

public volatile BalancingStreamTokenizer.Token _token

Constructor Detail

BalancingStreamTokenizer

public BalancingStreamTokenizer(Reader r)
Create a new balancing stream tokenizer.

Parameters:
r - reader to tokenize

BalancingStreamTokenizer

public BalancingStreamTokenizer(Reader r,
                                Character escape)
Create a new balancing stream tokenizer.

Parameters:
r - reader to tokenize
escape - escape character, or null to disable escaping

Method Detail

defaultWhitespaceSetup

public void defaultWhitespaceSetup()
Set up a tokenizer with just whitespace.


defaultTwoQuoteSetup

public void defaultTwoQuoteSetup()
Set up a tokenizer that recognizes " and ' quotes.


defaultThreeQuoteSetup

public void defaultThreeQuoteSetup()
Set up a tokenizer that recognizes ", ' and ` quotes.


defaultTwoQuoteCurlySetup

public void defaultTwoQuoteCurlySetup()
Set up a tokenizer that recognizes " and ' quotes and { } braces.


defaultThreeQuoteCurlySetup

public void defaultThreeQuoteCurlySetup()
Set up a tokenizer that recognizes ", ' and ` quotes and { } braces.


defaultThreeQuoteDollarCurlySetup

public void defaultThreeQuoteDollarCurlySetup()
Set up a tokenizer that recognizes ", ' and ` quotes and ${ } braces.


nextToken

protected int nextToken()
                 throws IOException
Return the next token from the pushback stack if the stack is not empty, otherwise from the reader.

Returns:
next token, or -1 at the end of the stream
Throws:
IOException - if an I/O error occurs while reading

pushToken

protected void pushToken(int token)
Push a token back onto the stack.

Parameters:
token - token to push back
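The interplay of nextToken() and pushToken(int) is a reader with a pushback stack: pushed-back characters are returned first, in last-in, first-out order, before reading resumes. The following is a minimal sketch of that pattern under stated assumptions; the class PushbackSketch and its method names are hypothetical, and it uses an ArrayDeque where the documented field _pushed is a Stack<Integer>.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a reader with a pushback stack (not the real class). */
final class PushbackSketch {
  private final Reader reader;
  private final Deque<Integer> pushed = new ArrayDeque<>(); // LIFO stack of pushed-back chars

  PushbackSketch(Reader reader) { this.reader = reader; }

  /** Returns a pushed-back character if there is one, otherwise reads; -1 at end of stream. */
  int next() throws IOException {
    if (!pushed.isEmpty()) return pushed.pop();
    return reader.read();
  }

  /** Pushes a character back so that the next call to next() returns it first. */
  void push(int c) { pushed.push(c); }
}
```

This lets a scanner read ahead (for example, to check whether a multi-character quote like ${ follows) and then undo the lookahead if it does not match.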

getState

public BalancingStreamTokenizer.State getState()
Return a copy of the current state of the tokenizer.

Returns:
copy of the state

setState

public void setState(BalancingStreamTokenizer.State state)
Set the stream tokenizer to the specified state.

Parameters:
state - state

pushState

protected void pushState()
Push the current state onto the stack.


popState

protected void popState()
Pops the top of the state stack and makes it the current state.
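getState, setState, pushState and popState together support saving a configuration, changing it temporarily, and restoring it afterwards. The sketch below shows that save/restore pattern only. Its State type is a hypothetical stand-in holding nothing but a set of whitespace characters, because the fields of BalancingStreamTokenizer.State are not documented here, and pushing a copy (rather than the live object) is an assumption made so that later changes cannot leak into the saved state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.TreeSet;

/** Hypothetical sketch of the get/set/push/pop state pattern (not the real class). */
final class StateStackSketch {
  /** Stand-in for BalancingStreamTokenizer.State; the real fields are not documented here. */
  static final class State {
    final TreeSet<Integer> whitespace;
    State(TreeSet<Integer> whitespace) { this.whitespace = whitespace; }
    State copy() { return new State(new TreeSet<>(whitespace)); }
  }

  private State state = new State(new TreeSet<>());
  private final Deque<State> stack = new ArrayDeque<>();

  /** Returns a copy, so callers cannot mutate the live state. */
  State getState() { return state.copy(); }

  /** Replaces the current state with the given one. */
  void setState(State s) { state = s; }

  /** Saves a copy of the current state so that later changes can be undone. */
  void pushState() { stack.push(state.copy()); }

  /** Restores the most recently pushed state. */
  void popState() { state = stack.pop(); }

  void whitespace(int c) { state.whitespace.add(c); }
  boolean isWhitespace(int c) { return state.whitespace.contains(c); }
}
```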


token

public BalancingStreamTokenizer.Token token()
Returns the type of the current token.


wordRange

public void wordRange(int lo,
                      int hi)
Specify a range of characters as word characters.

Parameters:
lo - the character beginning the word character range, inclusive
hi - the character ending the word character range, inclusive

wordChars

public void wordChars(int... c)
Specify one or more characters as word characters.

Parameters:
c - the character(s)

whitespaceRange

public void whitespaceRange(int lo,
                            int hi)
Specify a range of characters as whitespace.

Parameters:
lo - the character beginning the whitespace range, inclusive
hi - the character ending the whitespace range, inclusive

whitespace

public void whitespace(int... c)
Specify one or more characters as whitespace.

Parameters:
c - the character(s)
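wordRange, wordChars, whitespaceRange and whitespace classify characters, with both ends of a range inclusive. A hypothetical character-class table illustrating these four calls might look like the following. The class name CharClassSketch is invented, and the rule that a later call overrides an earlier classification of the same character is an assumption borrowed from java.io.StreamTokenizer, not something this page states.

```java
import java.util.BitSet;

/** Hypothetical character-class table (not the real class); ranges are inclusive. */
final class CharClassSketch {
  private final BitSet word = new BitSet();
  private final BitSet white = new BitSet();

  void wordRange(int lo, int hi) {
    for (int c = lo; c <= hi; c++) { word.set(c); white.clear(c); } // assumed: last call wins
  }
  void wordChars(int... cs) { for (int c : cs) wordRange(c, c); }

  void whitespaceRange(int lo, int hi) {
    for (int c = lo; c <= hi; c++) { white.set(c); word.clear(c); }
  }
  void whitespace(int... cs) { for (int c : cs) whitespaceRange(c, c); }

  // Guard against -1 (end of stream): BitSet.get rejects negative indices.
  boolean isWord(int c) { return c >= 0 && word.get(c); }
  boolean isWhitespace(int c) { return c >= 0 && white.get(c); }
}
```

For example, whitespaceRange(0, ' ') followed by wordRange('!', 255) treats control characters and space as separators and the rest of Latin-1 as word characters.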

addQuotes

public void addQuotes(String begin,
                      String end)
Specify a pair of quotes.

Parameters:
begin - the beginning quotation mark
end - the ending quotation mark
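The "balancing" in the class name refers to keeping a quoted region together until its matching end quote. For a single begin/end pair, this amounts to counting nesting depth. The sketch below is hypothetical (QuoteBalanceSketch.matchEnd is an invented name): it handles one pair at a time and ignores escapes, whereas the real tokenizer balances several pairs at once. Checking for the end quote before the begin quote makes identical pairs such as " and " close at the next occurrence instead of nesting.

```java
/** Hypothetical sketch of balancing one quote pair (not the real class). */
final class QuoteBalanceSketch {
  /**
   * Given that s starts with begin at index start, returns the index just past
   * the matching end quote, or -1 if the region is never closed.
   */
  static int matchEnd(String s, int start, String begin, String end) {
    if (!s.startsWith(begin, start)) throw new IllegalArgumentException("no opening quote at start");
    int depth = 1;
    int i = start + begin.length();
    while (i < s.length()) {
      if (s.startsWith(end, i)) {          // check end first so that " ... " closes
        depth--;
        i += end.length();
        if (depth == 0) return i;
      } else if (s.startsWith(begin, i)) { // nested opening quote
        depth++;
        i += begin.length();
      } else {
        i++;
      }
    }
    return -1;                             // unbalanced
  }
}
```

With the pair { and }, the input {a{b}c} d yields the balanced region {a{b}c}; with the pair ${ and }, ${x} is matched as a whole.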

addKeyword

public void addKeyword(String kw)
Specify a new keyword.

Parameters:
kw - the new keyword
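This page does not say how keywords are recognized when one keyword is a prefix of another, for example "=" and "==". A common approach, assumed here, is longest match: keep extending the candidate text while some keyword still starts with it, and remember the longest complete keyword seen. The sketch below is hypothetical (KeywordMatchSketch.longestMatch is an invented name) and works on a String rather than a Reader.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical longest-match keyword recognizer (not the real class). */
final class KeywordMatchSketch {
  /** Returns the longest keyword in kws that occurs in s at pos, or null if none does. */
  static String longestMatch(String s, int pos, Set<String> kws) {
    String best = null;
    TreeSet<String> candidates = new TreeSet<>(kws);
    StringBuilder prefix = new StringBuilder();
    for (int i = pos; i < s.length() && !candidates.isEmpty(); i++) {
      prefix.append(s.charAt(i));
      String p = prefix.toString();
      candidates.removeIf(k -> !k.startsWith(p)); // narrow to keywords still possible
      if (candidates.contains(p)) best = p;        // p is itself a complete keyword
    }
    return best;
  }
}
```

A tokenizer reading from a stream would push back any characters consumed beyond the longest match.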

getNextToken

public String getNextToken()
                    throws IOException
Return the next token, or null if the end of the stream has been reached.

Returns:
next token, or null if the end of the stream has been reached.
Throws:
IOException - if an I/O error occurs while reading
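Callers typically loop on getNextToken() until it returns null. The miniature below illustrates that contract with a deliberately reduced rule set: unquoted whitespace separates tokens, and text between double quotes stays in one token. MiniTokenizer is a hypothetical class, not the real one, and keeping the quote characters in the returned token is an assumption; the page does not specify it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of the getNextToken() contract (not the real class). */
final class MiniTokenizer {
  private final Reader in;
  MiniTokenizer(Reader in) { this.in = in; }

  /** Returns the next token, or null at the end of the stream. */
  String getNextToken() throws IOException {
    int c = in.read();
    while (c != -1 && Character.isWhitespace(c)) c = in.read(); // skip leading whitespace
    if (c == -1) return null;
    StringBuilder sb = new StringBuilder();
    boolean inQuote = false;
    while (c != -1) {
      if (c == '"') inQuote = !inQuote;                               // enter or leave a quote
      else if (!inQuote && Character.isWhitespace(c)) break;          // unquoted whitespace ends token
      sb.append((char) c);
      c = in.read();
    }
    return sb.toString();
  }

  /** Usage: collect all tokens until getNextToken() returns null. */
  static List<String> tokenize(String s) {
    MiniTokenizer t = new MiniTokenizer(new StringReader(s));
    List<String> out = new ArrayList<>();
    try {
      for (String tok = t.getNextToken(); tok != null; tok = t.getNextToken()) out.add(tok);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

For the input cd "My Documents" x, this yields three tokens: cd, "My Documents" and x.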

prefixSet

public static TreeSet<String> prefixSet(Set<String> set,
                                        String prefix)
Return the subset of the set whose entries begin with the prefix.

Parameters:
set - parent set
prefix - prefix string
Returns:
subset of only those entries that begin with the prefix
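A direct implementation of this contract is a linear scan that keeps the entries starting with the prefix. The following is a sketch of the documented behavior, not necessarily how the class implements it; the name PrefixSetSketch is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the documented prefixSet contract. */
final class PrefixSetSketch {
  /** Returns the entries of set that begin with prefix, in sorted order. */
  static TreeSet<String> prefixSet(Set<String> set, String prefix) {
    TreeSet<String> result = new TreeSet<>();
    for (String s : set) {
      if (s.startsWith(prefix)) result.add(s); // every string starts with ""
    }
    return result;
  }
}
```

Returning a TreeSet keeps the candidates ordered, which suits repeated narrowing as more characters of a quote or keyword are read.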

findMatch

protected String findMatch(int c,
                           TreeSet<String> choices,
                           Lambda<String,String> notFoundLambda)
                    throws IOException
Throws:
IOException - if an I/O error occurs while reading

escape

protected String escape(String s)

unescape

protected String unescape(String s)