org.mmbase.module.lucene.analysis.en
Class StandardCleaningAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.mmbase.module.lucene.analysis.en.StandardCleaningAnalyzer
Direct Known Subclasses:
StandardCleaningAnalyzer

public class StandardCleaningAnalyzer
extends Analyzer

Filters StandardTokenizer with StandardFilter, LowerCaseFilter, StopFilter and ISOLatin1AccentFilter.

Version:
$Id $
Author:
Wouter Heijke

Field Summary
static String[] STOP_WORDS
          An array containing some common English words that are usually not useful for searching.
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
overridesTokenStreamMethod
 
Constructor Summary
StandardCleaningAnalyzer()
          Builds an analyzer.
StandardCleaningAnalyzer(String[] stopWords)
          Builds an analyzer with the given stop words.
 
Method Summary
 void setCleanHtml(boolean clean)
           
 TokenStream tokenStream(String fieldName, Reader reader)
          Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter and an ISOLatin1AccentFilter.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setOverridesTokenStreamMethod, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STOP_WORDS

public static final String[] STOP_WORDS
An array containing some common English words that are usually not useful for searching.

Constructor Detail

StandardCleaningAnalyzer

public StandardCleaningAnalyzer()
Builds an analyzer.


StandardCleaningAnalyzer

public StandardCleaningAnalyzer(String[] stopWords)
Builds an analyzer with the given stop words.

Method Detail

setCleanHtml

public void setCleanHtml(boolean clean)

tokenStream

public TokenStream tokenStream(String fieldName,
                               Reader reader)
Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter and an ISOLatin1AccentFilter.

Specified by:
tokenStream in class Analyzer
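
The filter chain named in the class description can be approximated with the JDK alone, which may help when reasoning about what tokens reach the index. The sketch below is not the MMBase or Lucene implementation. The class name CleaningPipelineSketch, the method analyze and the stop-word set STOP are hypothetical, the tokenizer is a plain split on non-alphanumeric characters, the StandardFilter step is omitted, and accent removal uses Unicode decomposition rather than the ISOLatin1AccentFilter mapping table.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the analyzer's filter chain; uses no Lucene classes.
class CleaningPipelineSketch {

    // Illustrative stop words only; NOT the actual contents of STOP_WORDS.
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Step 1 (tokenizer stand-in): split on runs of non-letter, non-digit characters.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first element
            }
            // Step 2 (LowerCaseFilter stand-in): normalize case.
            String token = raw.toLowerCase(Locale.ROOT);
            // Step 3 (StopFilter stand-in): drop stop words.
            if (stopWords.contains(token)) {
                continue;
            }
            // Step 4 (accent filter stand-in): decompose, then strip combining marks.
            token = Normalizer.normalize(token, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}+", "");
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Input is "The Café of Amélie", written with Unicode escapes.
        System.out.println(analyze("The Caf\u00e9 of Am\u00e9lie", STOP));
        // prints [cafe, amelie]
    }
}
```

The sketch applies the steps in the order the class description lists them, so stop words are matched before accents are removed. Passing a different set as the second argument plays the role of the StandardCleaningAnalyzer(String[] stopWords) constructor.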


MMBase 2.0-SNAPSHOT