Text Similarity
Compare the similarity of two texts using Levenshtein distance
Frequently Asked Questions
What is the Levenshtein distance algorithm?
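The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another. For example, turning "kitten" into "sitting" takes 3 edits: substitute "k" with "s", substitute "e" with "i", and insert "g". The similarity percentage is 100% minus the normalized distance, where the normalized distance is the edit distance divided by the length of the longer text.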
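The distance is commonly computed with the dynamic-programming (Wagner-Fischer) method, which works out the edit distance between every prefix of one text and every prefix of the other. Below is a minimal Python sketch of that method. It is not necessarily how this tool is implemented, and the function name `levenshtein` is hypothetical. The sketch keeps only two rows of the table, so memory grows with the length of the second text instead of with the product of both lengths.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (hypothetical helper)."""
    # previous[j] holds the distance between the prefix of a processed
    # so far and b[:j]. Before any characters of a are processed,
    # turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```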
How is text similarity calculated?
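Similarity = (1 - distance / maxLength) × 100%, where maxLength is the length of the longer text. Dividing by maxLength normalizes the score to the range 0% to 100%, because the distance can never exceed the length of the longer text. Identical texts score 100%. The score falls as the texts diverge and reaches 0% when, for example, the two texts share no characters at all.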
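As a worked example, "kitten" and "sitting" are 3 edits apart and the longer text has 7 characters, so Similarity = (1 - 3/7) × 100% ≈ 57.14%. Below is a minimal Python sketch of the formula. The names `levenshtein` and `similarity` are hypothetical. Returning 100% for two empty texts, where maxLength is 0, is an assumption made here to avoid dividing by zero. It is not documented behavior of this tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via a two-row dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity = (1 - distance / maxLength) * 100, as a percentage."""
    max_length = max(len(a), len(b))
    if max_length == 0:
        # Assumption: two empty texts count as identical (100%),
        # which also avoids a division by zero.
        return 100.0
    return (1 - levenshtein(a, b) / max_length) * 100


print(round(similarity("kitten", "sitting"), 2))  # 57.14
print(similarity("abc", "abc"))                   # 100.0
print(similarity("abc", "xyz"))                   # 0.0
```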
When is text similarity useful?
Text similarity is useful for plagiarism detection, identifying duplicate content, comparing document versions, fuzzy matching, ranking spell-checker suggestions, and measuring how closely documents are related.
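For fuzzy matching or ranking spell-checker suggestions, one simple approach is to score each candidate against the input using the similarity formula above and sort the candidates from highest to lowest score. Below is a minimal Python sketch of that idea. The names `best_matches` and `limit` are hypothetical helpers chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via a two-row dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """(1 - distance / maxLength) * 100; two empty texts count as 100%."""
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 100.0
    return (1 - levenshtein(a, b) / max_length) * 100


def best_matches(word: str, candidates: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` candidates ordered by similarity to `word`."""
    # sorted() is stable, so candidates with equal scores keep their input order.
    ranked = sorted(candidates, key=lambda c: similarity(word, c), reverse=True)
    return ranked[:limit]


# "banana" needs 1 edit, "bandana" 2, and "cabana" 3.
print(best_matches("banan", ["cabana", "bandana", "banana"]))
```

For duplicate detection, the same score could instead be compared against a fixed threshold. Any particular cutoff value would be a tuning choice, not something this page specifies.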
What are the limitations of Levenshtein distance?
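Levenshtein distance treats every character edit as equally costly and ignores meaning. In plain Levenshtein distance, even a case change such as "A" to "a" costs as much as any other substitution. The score also depends on text length, because the same number of edits lowers the similarity of two short texts far more than that of two long ones. As a result, the score may not reflect conceptual similarity. For comparing meaning instead of spelling, natural language processing techniques work better.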
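The following sketch illustrates these limitations with hypothetical helpers that use the same formula. A synonym pair scores 0%, a pair with unrelated meanings scores fairly high, and a case change costs a full edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via a two-row dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """(1 - distance / maxLength) * 100; two empty texts count as 100%."""
    max_length = max(len(a), len(b))
    if max_length == 0:
        return 100.0
    return (1 - levenshtein(a, b) / max_length) * 100


pairs = [
    ("car", "automobile"),  # same meaning, but 10 edits apart
    ("car", "cat"),         # different meaning, only 1 substitution apart
    ("Hello", "hello"),     # a case change counts as a full substitution
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.1f}%")
# 'car' vs 'automobile': 0.0%
# 'car' vs 'cat': 66.7%
# 'Hello' vs 'hello': 80.0%
```

If case should not matter, comparing `a.lower()` with `b.lower()` is one simple preprocessing option. It does nothing for the gap between spelling and meaning.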