Search by author / date in translation memories

In the code of the Searcher class, the following comment can be read:
"Search TM entries, unless we search for date or author.
They are not loaded from external TM, so skip the search in that case."

The phrase "They are not loaded from external TM" was true when it was written (in 2010, according to git blame), but date and author are now stored correctly in the PrepareTMXEntry, so this restriction should no longer be kept.

On the other hand, some years later, when the OmegaT team implemented search inside glossaries, they did not add the same restriction, despite the fact that glossaries do not store date and author!

In DGT-OmegaT, we restored a behaviour consistent with what is actually held in memory. To summarize:

| Search on | Standard OmegaT | DGT-OmegaT |
|---|---|---|
| Project memory | Use filter (date and author are always present) | Use filter (date and author are always present) |
| Orphan segments | Use filter (date and author are always present) | Use filter (date and author are always present) |
| Source files | Not applicable: standard OmegaT does not allow search on source files | Reject (date is not stored, so the entry cannot match the criteria) |
| Translation memories | Reject (initially date and author were not stored, so match was impossible. But now it should...) | Use filter (if filter is present, we use it) |
| Glossaries | Ignore filter (search is not disabled, but filter is ignored) | Reject (date is not stored, so the entry cannot match the criteria) |
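The three behaviours named in the table ("Use filter", "Reject", "Ignore filter") can be sketched as follows. This is only an illustration: the enum `FilterBehaviour` and its method `accepts` are hypothetical names, not taken from the OmegaT or DGT-OmegaT sources, and the sketch assumes that the user has specified an author and that, under "Use filter", an entry without an author does not match.

```java
import java.util.Optional;

/** Hypothetical illustration of the table's three behaviours (not OmegaT code). */
enum FilterBehaviour {
    /** Compare the entry's author with the one requested by the user. */
    USE_FILTER {
        @Override
        boolean accepts(Optional<String> entryAuthor, String requestedAuthor) {
            // Assumption: an entry with no author cannot match.
            return entryAuthor.map(a -> a.equals(requestedAuthor)).orElse(false);
        }
    },
    /** The entry never satisfies an author criterion. */
    REJECT {
        @Override
        boolean accepts(Optional<String> entryAuthor, String requestedAuthor) {
            return false;
        }
    },
    /** The author criterion is not checked at all. */
    IGNORE_FILTER {
        @Override
        boolean accepts(Optional<String> entryAuthor, String requestedAuthor) {
            return true;
        }
    };

    /** Tells whether an entry passes the author criterion under this behaviour. */
    abstract boolean accepts(Optional<String> entryAuthor, String requestedAuthor);
}
```

With these definitions, a glossary entry (no author) is accepted under `IGNORE_FILTER` and refused under `REJECT`, which is the difference between the two columns of the Glossaries row.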

This correction has been submitted to OmegaT (ticket 918), with patches for both OmegaT 3.6 and 4.1, but it has not yet been integrated. Instead, a short discussion started, which eventually led us to consider a potential new option, now implemented in releases 3.4-TEST-3 and 3.5-DEV-3:

Missing field means true

As described in the previous table, the initial idea was that when a field such as date or author is missing from an entry and the user has specified a date or an author, the entry is rejected.

However, glossary entries are not the only ones that can lack author or date: TMX files can store them but are not obliged to do so, and if we extend this to external providers, some of them may store dates while others do not. Some providers may also be able to retrieve date or author in the result without allowing a search on these fields. So, should we always exclude these providers when a required field is missing (as standard OmegaT does for TMX files) or, on the contrary, consider that the entry is always valid (as standard OmegaT does for glossaries)?

Since both approaches are defensible, the best solution is to make this an option selectable by the user. By default the option is false: if you require "author = XXX" and the entry has no author, it does not satisfy "author = XXX". But since this could break compatibility with some providers (at least with all glossaries, and potentially with some external TM providers if they never store dates or authors), the user can now choose to treat these entries as matching instead. Be warned, however: there is only one option, so it is not possible, for example, to set it to true for glossaries and false for everything else (in order to mimic standard OmegaT's behavior).
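The effect of the option can be sketched as a pair of predicates. This is a minimal sketch under assumptions, not the DGT-OmegaT implementation: the class `MissingFieldSketch`, its methods and the parameter `missingFieldMeansTrue` are hypothetical names, and the date criterion is simplified to "changed after a given instant".

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical sketch of the "missing field means true" option (not OmegaT code). */
final class MissingFieldSketch {
    private MissingFieldSketch() {
    }

    /**
     * Author criterion: if the entry stores an author, compare it;
     * otherwise the result is whatever the option says.
     */
    static boolean matchesAuthor(Optional<String> entryAuthor, String requestedAuthor,
            boolean missingFieldMeansTrue) {
        return entryAuthor.map(a -> a.equals(requestedAuthor)).orElse(missingFieldMeansTrue);
    }

    /**
     * Date criterion (simplified): if the entry stores a date, check that it is
     * after the threshold; otherwise the result is whatever the option says.
     */
    static boolean changedAfter(Optional<Instant> entryDate, Instant threshold,
            boolean missingFieldMeansTrue) {
        return entryDate.map(d -> d.isAfter(threshold)).orElse(missingFieldMeansTrue);
    }
}
```

Because a single boolean is passed to every predicate, the sketch also reflects the limitation mentioned above: the same value applies to glossaries, TMX files and any other source that lacks the field.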

Note that the option does not concern project entries: they always store date and author. That is also why the option only exists in the normal search screen (pre-translate and replacement act only on ongoing work; directory search does not store dates). Now that the option is correctly implemented in 3.4-TEST and 3.5-DEV, the previous table should be modified as follows for these releases:

| Search on | Standard OmegaT | DGT-OmegaT 3.3, 3.4 (STABLE) | DGT-OmegaT 3.5-FINAL to 3.7-DEV |
|---|---|---|---|
| Project memory | Use filter (date and author are always present) | Use filter (date and author are always present) | Use filter (date and author are always present) |
| Orphan segments | Use filter (date and author are always present) | Use filter (date and author are always present) | Use filter (date and author are always present) |
| Source files (non-XLIFF) | Not applicable: standard OmegaT does not allow search on source files | Reject (date is not stored, so the entry cannot match the criteria) | Option (date is not stored, so if option is true filter is ignored, else entry is rejected) |
| Source files (XLIFF) | Not applicable: standard OmegaT does not allow search on source files | Reject (date is stored but this version is not yet able to use it in search) | If date/author are present: use them. Else: Option (see previous line) |
| Translation memories | Reject (initially date and author were not stored, so match was impossible. But now it should...) | Use filter (if filter is present, we use it). If the criterion is not present (null), reject the unit | If date/author are present: use them. Else: Option (see previous line) |
| Glossaries | Ignore filter (search is not disabled, but filter is ignored) | Reject (date is not stored, so the entry cannot match the criteria) | Option (date is not stored, so if option is true filter is ignored, else entry is rejected) |

Note: the name of the option is not currently considered fully satisfactory. Do not hesitate to propose another one if you have a better idea.