org.mmbase.module.lucene.extraction
Interface Extractor

All Known Implementing Classes:
PDFBoxExtractor, POIExcelExtractor, POIWordExtractor, StringsExtractor, SwingRTFExtractor, TextMiningExtractor

public interface Extractor

Content extractor interface. An implementation extracts text from an InputStream for the MIME type it handles.

Author:
Wouter Heijke

Method Summary
 String extract(InputStream source)
          Extract text from a source
 String getMimeType()
          Returns the MIME type this Extractor handles
 void setMimeType(String mimetype)
          Sets the MIME type this Extractor handles
 

Method Detail

setMimeType

void setMimeType(String mimetype)
Sets the MIME type this Extractor handles.

Parameters:
mimetype - String representing the MIME Type

getMimeType

String getMimeType()
Returns the MIME type this Extractor handles.

Returns:
String representing the MIME Type
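
Because each Extractor reports the MIME type it handles, a caller can select one by content type. The sketch below is one possible way to do that and is not taken from MMBase: ExtractorRegistry and StubExtractor are hypothetical names, and the Extractor interface is redeclared locally (mirroring the signatures on this page) only so the example compiles on its own. Ignoring case and parameters such as "; charset=..." during lookup is also an assumption.

```java
import java.io.InputStream;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Local stand-in mirroring the three signatures documented on this page.
interface Extractor {
    void setMimeType(String mimetype);
    String getMimeType();
    String extract(InputStream source) throws Exception;
}

// Hypothetical stub, used only to populate the registry.
class StubExtractor implements Extractor {
    private String mimeType;
    public void setMimeType(String mimetype) { this.mimeType = mimetype; }
    public String getMimeType() { return mimeType; }
    public String extract(InputStream source) { return ""; }
}

// Hypothetical registry keyed by MIME type; not part of MMBase.
class ExtractorRegistry {
    private final Map<String, Extractor> byMimeType = new HashMap<>();

    // Registers the extractor under the MIME type it reports.
    // Assumes getMimeType() is non-null at registration time.
    void register(Extractor extractor) {
        byMimeType.put(normalize(extractor.getMimeType()), extractor);
    }

    // Returns the extractor for the given MIME type, or null if none is registered.
    Extractor find(String mimeType) {
        return byMimeType.get(normalize(mimeType));
    }

    // Drops any parameters after ';' and lower-cases the remainder.
    private static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a StubExtractor registered for "application/pdf", find("Application/PDF; charset=binary") returns that same instance, and find("text/plain") returns null.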

extract

String extract(InputStream source)
               throws Exception
Extract text from a source

Parameters:
source - InputStream where the data comes from
Returns:
String representing the extracted text
Throws:
Exception
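
As an illustration of the contract, here is a minimal sketch of an implementation for plain text. PlainTextExtractor and ExtractDemo are hypothetical names, not among the implementing classes listed above, and the Extractor interface is redeclared locally so the example is self-contained. Decoding as UTF-8, defaulting the MIME type to "text/plain", and leaving the stream for the caller to close are all assumptions; the interface itself does not specify them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Local stand-in mirroring the three signatures documented on this page.
interface Extractor {
    void setMimeType(String mimetype);
    String getMimeType();
    String extract(InputStream source) throws Exception;
}

// Hypothetical implementation: treats the source as UTF-8 encoded text.
class PlainTextExtractor implements Extractor {
    // Assumed default; callers may override it through setMimeType.
    private String mimeType = "text/plain";

    public void setMimeType(String mimetype) { this.mimeType = mimetype; }

    public String getMimeType() { return mimeType; }

    public String extract(InputStream source) throws Exception {
        // InputStream.readAllBytes requires Java 9 or later.
        // The stream is not closed here; that is left to the caller.
        return new String(source.readAllBytes(), StandardCharsets.UTF_8);
    }
}

class ExtractDemo {
    public static void main(String[] args) throws Exception {
        Extractor extractor = new PlainTextExtractor();
        InputStream in = new ByteArrayInputStream(
                "hello index".getBytes(StandardCharsets.UTF_8));
        // Prints the handled MIME type followed by the extracted text.
        System.out.println(extractor.getMimeType() + ": " + extractor.extract(in));
    }
}
```

Since extract declares the checked type Exception, callers have to catch it or propagate it, as main does here.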


MMBase 2.0-SNAPSHOT