In the public version of OmegaT, except for regular expressions, the engine always searches for strings: even if you do not use the wildcards (jokers) * or ?, a search for «test» also matches a segment containing «contest», because what is compared is the string, not the word.
In our screens, by contrast, exact and keyword searches must also be complemented by a word mode:
In exact and keyword expression modes, word modes are defined as follows:
Let's see the difference between them in a table. If you typed "test", without wildcard characters, then:
| If the text contains… | Strings | Whole words | Lemmas |
|---|---|---|---|
| test | Yes | Yes | Yes |
| tested | Yes (it "contains" the string t + e + s + t) | No (only the word "test" is accepted) | Yes |
| protest | Yes (it "contains" the string t + e + s + t) | No (only the word "test" is accepted) | No (protest is not a grammatical variant of "test") |
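The three columns of this table can be reproduced with a minimal Python sketch. Everything named here is an assumption made for illustration: the `matches` function, the mode names, and above all the tiny `TOY_LEMMAS` table, which stands in for a real language-aware lemmatizer. It is not the tool's actual implementation.

```python
import re

# Hypothetical toy lemma table; a real lemmatizer would be language-aware.
TOY_LEMMAS = {"tested": "test", "tests": "test", "testing": "test"}

def words(text):
    """Split text into word tokens."""
    return re.findall(r"\w+", text)

def lemma(word):
    """Return the toy lemma of a word (the word itself if unknown)."""
    return TOY_LEMMAS.get(word, word)

def matches(query, text, mode):
    """Check a single-word query without wildcards against text in one word mode."""
    if mode == "strings":
        # Plain substring test: "protest" contains "test".
        return query in text
    if mode == "whole_words":
        # The query must be equal to one of the words of the text.
        return query in words(text)
    if mode == "lemmas":
        # Compare lemmas, so that "tested" matches "test".
        return lemma(query) in [lemma(w) for w in words(text)]
    raise ValueError(f"unknown mode: {mode}")

for text in ("test", "tested", "protest"):
    row = [matches("test", text, m) for m in ("strings", "whole_words", "lemmas")]
    print(text, row)
```

Each printed row lists the result for Strings, Whole words and Lemmas in that order, so it can be compared with the table line by line.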
Now let's do the same with a wildcard. If you typed "t*st":
| If the text contains… | Strings | Whole words | Lemmas |
|---|---|---|---|
| test | Yes | Yes | No (Lemmas mode does not support wildcards) |
| the_file.test | Yes (* means 'any character except space') | No (* does not accept '.') | |
| to test | No (* will reject the space) | No (* does not accept space) | |
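One way to picture the wildcard (joker) behaviour is to translate the query into a regular expression. The Python sketch below is only an illustration under stated assumptions: `wildcard_to_regex` and `spans_whole_text` are hypothetical names, the mapping of * to "non-space characters" follows the table, the mapping of * to "word characters" in whole-words mode is a guess based on the remark that '.' and space are rejected, and ? is assumed to stand for exactly one such character. Each candidate text is checked as a whole, which is how the rows of the table read.

```python
import re

def wildcard_to_regex(query, mode):
    """Translate * and ? into a regular expression for the given word mode."""
    # Strings mode: wildcards stand for non-space characters (per the table).
    # Whole-words mode: restricting them to word characters is an assumption.
    char = r"\S" if mode == "strings" else r"\w"
    parts = []
    for c in query:
        if c == "*":
            parts.append(char + "*")   # any run of allowed characters
        elif c == "?":
            parts.append(char)         # exactly one allowed character (assumed)
        else:
            parts.append(re.escape(c)) # literal character
    return "".join(parts)

def spans_whole_text(query, text, mode):
    """True if the wildcard query can match the whole candidate text."""
    if mode == "lemmas":
        # The table states that Lemmas mode does not support wildcards.
        return False
    return re.fullmatch(wildcard_to_regex(query, mode), text) is not None

for text in ("test", "the_file.test", "to test"):
    print(text, [spans_whole_text("t*st", text, m) for m in ("strings", "whole_words")])
```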
The difference between exact and keyword search does not change, even with lemmas: keyword search means that the lemmas can appear anywhere in the segment, possibly in a different order. In exact search combined with the Lemmas word mode, the segment is tokenized according to the MATCHING mode, meaning that each term is lemmatized but stop words are not removed (note: the exception is the glossary configuration, where GLOSSARY mode is used). Let's say that you typed «test element» in exact search mode; then:
| If the text contains… | Strings | Whole words | Lemmas |
|---|---|---|---|
| test element | Yes | Yes | Yes |
| test elements | Yes (it contains the string t + e + s + t + space + e + l + e + m + e + n + t) | No (only the word "element" is accepted) (*) | Yes |
| tested elements | No (it does not contain the string t + e + s + t + space + e + l + e + m + e + n + t) (*) | No (only the words "test" and "element" are accepted) | Yes |
| contest elements | Yes (it contains the string t + e + s + t + space + e + l + e + m + e + n + t) | No (only the words "test" and "element" are accepted) | No (contest is not a grammatical variant of "test") (*) |
| contested elements | No (it does not contain the string t + e + s + t + space + e + l + e + m + e + n + t) (*) | No (only the words "test" and "element" are accepted) | No (contested is not a grammatical variant of "test") (*) |
For completeness, note that for the entries marked with (*), at least one of the words is correct, meaning that in keyword search this word will be marked. Of course, since keyword search is an AND search, the text is still rejected.
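The contrast between exact search (terms in sequence) and keyword search (an AND over terms found anywhere) can be sketched in Python for the Whole words and Lemmas modes. This is an illustration only: the function names and the `TOY_LEMMAS` table are hypothetical, and the tokenization is a simple split into words, not the MATCHING tokenizer mentioned above.

```python
import re

# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS = {"tested": "test", "elements": "element"}

def tokens(text, use_lemmas):
    """Split text into words, optionally replacing each word by its toy lemma."""
    found = re.findall(r"\w+", text)
    return [TOY_LEMMAS.get(w, w) for w in found] if use_lemmas else found

def exact_match(query, text, use_lemmas=False):
    """Exact search: all query terms must appear consecutively and in order."""
    q, t = tokens(query, use_lemmas), tokens(text, use_lemmas)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def keyword_hits(query, text, use_lemmas=False):
    """Map each query term to whether it occurs anywhere in the text."""
    t = tokens(text, use_lemmas)
    return {term: term in t for term in tokens(query, use_lemmas)}

def keyword_match(query, text, use_lemmas=False):
    """Keyword search is an AND search: every term must be found, in any order."""
    return all(keyword_hits(query, text, use_lemmas).values())

# Whole words: "test" is found but "element" is not, so the AND search fails.
print(keyword_hits("test element", "test elements"))
print(keyword_match("test element", "test elements"))
# Lemmas: order matters for exact search, not for keyword search.
print(exact_match("test element", "elements tested", use_lemmas=True))
print(keyword_match("test element", "elements tested", use_lemmas=True))
```

`keyword_hits` corresponds to the marking described for the (*) entries: a term can be found and marked individually even though the search as a whole rejects the text.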
When you select «Regular expressions», the «Word mode» is replaced by «Regular expression mode»:
The options include:
Partial segment: the segment must contain something matching the regular expression (this is what OmegaT actually does).
Full segment: the entire segment must match the regular expression. This is equivalent to adding \A at the beginning and \z at the end.
Whole words: equivalent to adding \b at the beginning and at the end of the searched text.
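The three regular expression modes amount to wrapping the expression the user typed. The Python sketch below illustrates this under a few assumptions: `build_pattern`, `segment_matches` and the mode names are hypothetical, the non-capturing group `(?:...)` is added so that the anchors also apply to expressions containing alternation, and the end anchor is written `\Z` because that is the traditional spelling in Python's `re` module of what Java-style regular expressions write as `\z`.

```python
import re

def build_pattern(user_regex, mode):
    """Wrap the user's regular expression according to the selected mode."""
    if mode == "partial":
        wrapped = user_regex                       # match anywhere in the segment
    elif mode == "full":
        wrapped = r"\A(?:" + user_regex + r")\Z"   # the entire segment must match
    elif mode == "whole_words":
        wrapped = r"\b(?:" + user_regex + r")\b"   # match must sit on word boundaries
    else:
        raise ValueError(f"unknown mode: {mode}")
    return re.compile(wrapped)

def segment_matches(user_regex, segment, mode):
    """True if the segment satisfies the expression in the given mode."""
    return build_pattern(user_regex, mode).search(segment) is not None

for mode in ("partial", "full", "whole_words"):
    print(mode,
          segment_matches("t.st", "a contest here", mode),
          segment_matches("t.st", "a test here", mode),
          segment_matches("t.st", "test", mode))
```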