org.mmbase.applications.xmlimporter
Class FuzzyStringMatcher
java.lang.Object
|
+--org.mmbase.applications.xmlimporter.FuzzyStringMatcher
- public class FuzzyStringMatcher
- extends java.lang.Object
Utility class, providing methods for a fuzzy comparison between strings.
- Since:
- MMBase-1.5
- Version:
- $Id: FuzzyStringMatcher.java,v 1.2 2002/02/27 16:54:25 pierre Exp $
- Author:
- Rob van Maris (Finalist IT Group)
Method Summary
static float getMatchRate(java.lang.String string1, java.lang.String string2)
Calculates the match rate, a value between 0 and 1, proportional
to the rate the two strings match (1 is exact match).
static int getMismatch(java.lang.String string1, java.lang.String string2)
Calculates the mismatch between two strings.
static java.lang.String normalizeString(java.lang.String str)
Creates a normalized title, e.g. with all non-alphanumeric characters replaced by white space.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getMismatch
public static int getMismatch(java.lang.String string1,
java.lang.String string2)
- Calculates the mismatch between two strings.
- Parameters:
string1 - first string
string2 - second string
- Returns:
- The number of mismatches: the minimum number of typos
necessary to account for the differences between the two strings,
if they were meant to be identical.
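A "minimum number of typos" count of this kind is commonly defined as the Levenshtein edit distance, where each inserted, deleted, or substituted character costs 1. The sketch below illustrates that definition only. Whether getMismatch uses exactly this metric is an assumption, and the class and method names (MismatchSketch, mismatch) are hypothetical, not part of MMBase.

```java
// Illustrative sketch only; not the MMBase source.
// Assumption: the mismatch count behaves like a Levenshtein edit distance.
class MismatchSketch {

    /** Returns the Levenshtein distance between a and b. */
    static int mismatch(String a, String b) {
        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr[j] is the row being filled.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(mismatch("kitten", "sitting")); // prints 3
    }
}
```

Under this definition, identical strings give 0 and an empty string compared with "abc" gives 3.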
getMatchRate
public static float getMatchRate(java.lang.String string1,
java.lang.String string2)
- Calculates the match rate, a value between 0 and 1, proportional
to the rate the two strings match (1 is exact match).
This is calculated as
1 - (mismatch/max(string1.length(), string2.length())).
- Parameters:
string1 - first string
string2 - second string
- Returns:
- The match rate.
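The formula above can be sketched as follows. The helper takes an already computed mismatch count and the two string lengths, so it makes no claim about how the mismatch itself is obtained. The names (MatchRateSketch, matchRate) are hypothetical, and returning 1 for two empty strings is an assumption, since the documented formula would otherwise divide by zero.

```java
// Illustrative sketch only; not the MMBase source.
class MatchRateSketch {

    /** Computes 1 - (mismatch / max(length1, length2)) as a float. */
    static float matchRate(int mismatch, int length1, int length2) {
        int longest = Math.max(length1, length2);
        if (longest == 0) {
            return 1.0f; // assumption: two empty strings count as an exact match
        }
        // Cast before dividing: int / int would truncate the ratio to 0 or 1.
        return 1.0f - (float) mismatch / longest;
    }

    public static void main(String[] args) {
        // 3 mismatches between strings of length 6 and 7: 1 - 3/7, about 0.57
        System.out.println(matchRate(3, 6, 7));
    }
}
```

With no mismatches the rate is 1, and when the mismatch count equals the length of the longer string the rate is 0.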
normalizeString
public static java.lang.String normalizeString(java.lang.String str)
- Creates normalized title, e.g. all non-alphanumeric
characters replaced by white space, all characters converted
to lowercase non-diacritical characters, and all white space
sequences contracted to a single white space character.
This is a convenience method, provided to make string comparison
easier by removing (more or less) arbitrary differences.
- Parameters:
str - The original title.
- Returns:
- The normalized title.
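A normalization of the kind described can be approximated with the standard library, as in the sketch below. It is not the MMBase implementation: java.text.Normalizer was only added in Java 6, later than this build, so the original must handle diacritics differently. The names (NormalizeSketch, normalize) are hypothetical, and restricting "alphanumeric" to a-z and 0-9 after accent removal is an assumption.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustrative sketch only; not the MMBase source.
class NormalizeSketch {

    /** Lowercases, strips diacritics, and collapses everything else to single spaces. */
    static String normalize(String str) {
        // Decompose accented characters into base letter + combining mark.
        String s = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Drop the combining marks, leaving the non-diacritical base letters.
        s = s.replaceAll("\\p{M}+", "");
        s = s.toLowerCase(Locale.ROOT);
        // Replace each run of non-alphanumeric characters (white space included)
        // with one space; this also contracts white space sequences.
        return s.replaceAll("[^a-z0-9]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Caf\u00e9  du-Monde")); // prints cafe du monde
    }
}
```

Note that this sketch does not trim: leading or trailing punctuation becomes a single leading or trailing space, which matches the description above as written.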
MMBase build 1.6.5.20030923