The new score implemented in the previous version is currently being tested. For very small strings (1 word) it seems significantly better. But for medium strings, say 2 or 3 words, the previous implementation often gave a high score when only one word changed.
This is because the score treated spaces as single items, with the same importance as words.
Example: let's compare "One string" with "The string".
The problem is that the space is a kind of default separator: it is present in most strings, so once the string has been cut into items, spaces no longer carry any information. For that reason, we have now decided to keep counting punctuation (dots, commas and quotation marks really do add information) but not spaces.
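The effect of counting spaces can be sketched in a few lines of Python. This is only an illustration, not the actual scoring code: the tokenizer, the `token_score` function and the use of `difflib.SequenceMatcher` are assumptions made for the example, and the real score weights items differently.

```python
import re
from difflib import SequenceMatcher


def tokenize(text, keep_spaces):
    # Hypothetical tokenizer: each word, each run of whitespace and each
    # punctuation mark becomes one item.
    tokens = re.findall(r"\w+|\s+|[^\w\s]", text)
    if not keep_spaces:
        # New behaviour: punctuation is kept, spaces are dropped.
        tokens = [t for t in tokens if not t.isspace()]
    return tokens


def token_score(a, b, keep_spaces):
    # Share of items that match, relative to the longer item list.
    ta, tb = tokenize(a, keep_spaces), tokenize(b, keep_spaces)
    if not ta and not tb:
        return 1.0
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ta), len(tb))


with_spaces = token_score("One string", "The string", keep_spaces=True)
without_spaces = token_score("One string", "The string", keep_spaces=False)
print(round(with_spaces, 2), round(without_spaces, 2))
```

With spaces counted, "One string" and "The string" share 2 items out of 3 (the space and "string"); without spaces they share 1 item out of 2, which better reflects that half of the words changed.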
When a diff string is displayed in the matches pane, currently, whenever two words differ, one word is shown as a deletion and the other as an insertion.
To make the string shorter, we are now experimenting with the following changes:
Note that these two optimizations are mutually exclusive: if the difference contains both a case change and an & change, we keep the original algorithm.
This is consistent with the way the new score works: case changes or & changes carry less weight in the score than a full change of the word.
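As a rough sketch of how a word pair could be sorted into these categories, the Python below distinguishes a case-only change, an &-only change and a full change. The function name `classify_word_change` and the category labels are hypothetical, and the sketch assumes that an "& change" means a `&` character added to or removed from a word; it does not reproduce how the matches pane renders each case.

```python
def classify_word_change(old, new):
    # Returns "same", "case", "ampersand" or "full" (hypothetical labels).
    if old == new:
        return "same"
    # Case-only: the words are identical once case is ignored.
    if old.lower() == new.lower():
        return "case"
    # &-only: the words are identical once every "&" is removed.
    if old.replace("&", "") == new.replace("&", ""):
        return "ampersand"
    # Anything else, including a combined case and & change,
    # falls back to the original deletion + insertion display.
    return "full"


print(classify_word_change("File", "file"))
print(classify_word_change("&File", "File"))
print(classify_word_change("&File", "file"))
```

The two special categories cannot both apply to the same pair: a word that differs in both case and `&` matches neither test and is treated as a full change, in line with the mutual exclusion described above.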
Remember that all of this is experimental: we are interested in receiving comments in order to build the best possible scoring algorithm.