다음을 통해 공유


Levenshtein Algorithm in Visual Basic


Introduction

Levenshtein's distance measures the minimum number of character edits required to change a word into another. In this tip, we'll see a simple implementation of the Levenshtein algorithm in Visual Basic. It will be useful in several situations, when managing - for example - large amount of text, and we are in need of fast and massive modifications in our data, to spot and correct typos, or to match strings in terms of similarity, and not simply in terms of equal/not equal. Levenshtein's algorithm allows to compute a score regarding string similarity. We'll see how in a moment.

Standard Levenshtein Algorithm

Here follows the standard Levenshtein implementation in VB.NET, according to the algorithm as shown at Wikipedia.

Public Function  Levenshtein(ByVal  s As  String, ByVal t As String) As  Integer
  Dim n As Integer  = s.Length
  Dim m As Integer  = t.Length
  Dim d(n + 1, m + 1) As Integer
  
  If n = 0 Then
      Return m
  End If
  If m = 0 Then
      Return n
  End If
  
  Dim i As Integer
  Dim j As Integer
  
  For i = 0 To n
      d(i, 0) = i
  Next
  For j = 0 To m
      d(0, j) = j
  Next
  For i = 1 To n
      For j = 1 To m
          Dim cost As Integer
          If t(j - 1) = s(i - 1) Then
             cost = 0
          Else
             cost = 1
          End If
          d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
      Next
  Next
  Return d(n, m)
End Function

Examples

Using the above function(s) is trivial: it's sufficient to call it by passing two strings, plus the optional boolean flag for case-sensitivity check.

Some examples could be:

MsgBox(Levenshtein("Inwards", "inwards").ToString)       ' Returns 1
MsgBox(Levenshtein("Inwards", "inwards").ToString)       ' Returns 1
MsgBox(Levenshtein("towards", "towards").ToString)       ' Returns 0
MsgBox(Levenshtein("dinner", "breakfast").ToString)      ' Returns 9
MsgBox(Levenshtein("breakfast", "braekfast").ToString)   ' Returns 2
  
MsgBox(Levenshtein("efficient", "sufficient").ToString)  ' Returns 2
MsgBox(Levenshtein("grandma", "anathema").ToString)      ' Returns 5
MsgBox(Levenshtein("aunt", "ant").ToString)              ' Returns 1

The results (1, 0, 0, 9, 2, 2, 5, 1) are the Levenshtein's distances between given strings, i.e., a score regardingstrings similarity. The lower the score, the nearer are the two entities. A value of zero means, obviously, a total convergence of the two. We could use a function like this (with predetermined conditions to be satisfied, like "the score must not exceed 3", for example) to correct typos (as in the "breakfast - braekfast" example), or to search for differences in hypothetical data (like "new York - New York"), and so on.

Bibliography