This feature is integrated in the core of DGT-OmegaT, but is also available as a plugin compatible with standard OmegaT; it has been tested at least with 3.6 (under Java 8), 4.3 and 5.8. The plugin has no additional dependencies except the StAX API, which is normally already present in Java 8 (and in later versions, at least up to Java 17, although we cannot guarantee this for the future).
In both cases, once the filters are installed you will see them in the menu Options => File Filters, but you may have to activate them manually.
This introduces entirely new filters for XLIFF 1.2, XLIFF 2.0 and SDLXLIFF. These filters are based neither on Okapi's filters nor on the current OmegaT XLIFF filter. OmegaT's filter is built on its high-level API, filters3/XMLFilter, with which it is easy to write a new filter but difficult to add complex features: that is probably why it is still monolingual. We based our work on filters2/AbstractFilter instead. Since other XML filters can be built on the same basis, we introduced a new intermediate class, filters2/AbstractXMLFilter, which implements XML parsing with Java StAX. This is not as high-level as filters3, so the source code of such filters is probably harder to read, but we keep all the possibilities of the original AbstractFilter, such as bilingual support, and we can also parse complex XML formats like Microsoft's OpenXML (currently in a test version, almost working but with some known bugs) in a less generic way.
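To give an idea of what StAX-based parsing of a bilingual file looks like, here is a minimal sketch. It is not the code of filters2/AbstractXMLFilter: the class name `StaxXliffSketch` and the method `readUnits` are hypothetical, and the sketch is simplified (inline tags are dropped and nested elements such as `<alt-trans>` are not handled). It only shows how a pull parser can collect the source and the target of each `<trans-unit>` in a single pass.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not the actual DGT-OmegaT filter code.
final class StaxXliffSketch {

    /** Returns one {source, target} pair per trans-unit; target is "" when absent. */
    static List<String[]> readUnits(String xml) {
        List<String[]> units = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            String[] current = null;   // pair being filled, null outside a trans-unit
            StringBuilder text = null; // non-null while inside <source> or <target>
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("trans-unit")) {
                        current = new String[] {"", ""};
                    } else if (current != null
                            && (name.equals("source") || name.equals("target"))) {
                        text = new StringBuilder();
                    }
                    // Inline elements such as <g> are skipped; only their text is kept.
                } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && text != null) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && current != null) {
                    String name = reader.getLocalName();
                    if (text != null && name.equals("source")) {
                        current[0] = text.toString();
                        text = null;
                    } else if (text != null && name.equals("target")) {
                        current[1] = text.toString();
                        text = null;
                    } else if (name.equals("trans-unit")) {
                        units.add(current);
                        current = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot parse XLIFF", e);
        }
        return units;
    }

    public static void main(String[] args) {
        String xml = "<xliff version=\"1.2\"><file><body>"
                + "<trans-unit id=\"1\"><source>Sample <g id=\"18\">Text</g></source>"
                + "<target><g id=\"18\">Texte</g> d'exemple</target></trans-unit>"
                + "</body></file></xliff>";
        for (String[] unit : readUnits(xml)) {
            System.out.println(unit[0] + " => " + unit[1]);
        }
    }
}
```

Because the parser hands over one event at a time, the filter keeps full control over what it does with each element, which is what makes it possible to read source and target together.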
Note: some additions have been made to the filters2/filters3 API in order to implement, for example, the conversion of OmegaT's notes (which no filter wrote to output files until now) into SDLXLIFF comments. That is why the plugin does not offer all features when used in standard OmegaT. If the core team is interested in these features, they should be easy to port to OmegaT 4.
More details, including a diagram, are available in the technical document.
One thing the core filter (filters3/XliffFilter) had and Okapi did not is the ability to choose the tag identifier (i.e. to use <b> for bold or <i> for italics) when the file follows XLIFF conventions. This could not be done via a script, but the StaX filter reuses the same algorithm for XLIFF 1 and an equivalent one for XLIFF 2.
But as usual, SDLXLIFF has its own way of specifying the role of a tag. Fortunately, we managed to implement a specific algorithm for SDLXLIFF, at least when the original file is DOCX. Compare the three options again:
Okapi filter alone | Okapi filter + Perl renumbering | StaX filter |
Sample <g18>Text</g18> | Sample <g0>Text</g0> | Sample <b0>Text</b0> |
<segment 0010> | <segment 0010 (+ 1 more)> | <segment 0010 (+ 1 more)> |
<g18>Texte</g18> d'exemple | <g0>Texte</g0> d'exemple | <b0>Texte</b0> d'exemple |
<end segment> | <end segment> | <end segment> |
Sample <g24>Text</g24> | Sample <g0>Text</g0> | Sample <b0>Text</b0> |
<segment 0015> | <segment 0015 (+ 1 more)> | <segment 0015 (+ 1 more)> |
<g24>Texte</g24> d'exemple | <g0>Texte</g0> d'exemple | <b0>Texte</b0> d'exemple |
<end segment> | <end segment> | <end segment> |
Reads <source> or <seg-source> as source segment | Reads <source> or <seg-source> as source segment | Reads <source> or <seg-source> as source segment |
Reads <target> as auto-populated translation | Reads <target> as auto-populated translation | Reads <target> as auto-populated translation |
Tag is unique at document-level | Tag is reset to 0 at each paragraph | Tag is reset to 0 at each paragraph |
Segments are not recognized as identical | Segments are recognized as identical | Segments are recognized as identical |
Tag type is g (as in xliff) | Tag type is g (as in xliff) | Tag type is detected (here bold), in some cases |
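The effect of resetting tag numbers at each paragraph can be sketched as follows. This is only an illustration, not the algorithm used by the StaX filter or by the Perl script: the class `TagRenumberSketch` and its methods are hypothetical, the renumbering works on OmegaT-style tag shortcuts with a regular expression, and `letterFor` assumes that the role of a tag arrives as an XLIFF 1.2 `ctype` value such as `bold` or `italic`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the actual DGT-OmegaT filter code.
final class TagRenumberSketch {

    // Matches shortcuts such as <g18>, </g18> or <x3/>.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-z]+)(\\d+)(/?)>");

    /** Renumbers tags from 0, in order of first appearance within one paragraph. */
    static String renumber(String paragraph) {
        Map<String, Integer> newIds = new HashMap<>();
        Matcher m = TAG.matcher(paragraph);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(2) + m.group(3); // e.g. "g18"
            Integer id = newIds.get(key);
            if (id == null) {
                id = newIds.size();
                newIds.put(key, id);
            }
            String tag = "<" + m.group(1) + m.group(2) + id + m.group(4) + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Assumed mapping from a ctype value to a tag letter; "g" is the fallback. */
    static String letterFor(String ctype) {
        if ("bold".equals(ctype)) {
            return "b";
        }
        if ("italic".equals(ctype)) {
            return "i";
        }
        return "g";
    }

    public static void main(String[] args) {
        // Both paragraphs give the same text once numbering restarts at 0.
        System.out.println(renumber("Sample <g18>Text</g18>"));
        System.out.println(renumber("Sample <g24>Text</g24>"));
    }
}
```

Once both paragraphs produce the same string, OmegaT can treat them as one repeated segment, which is what the "(+ 1 more)" marker in the table indicates.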
This feature works perfectly in the plugin for standard OmegaT (4 or 5). Be warned, however, that it makes the plugin incompatible with the Okapi filter: if you open with the StaX filter a project that was translated with the Okapi filter, you will lose the translations of segments containing tags, because they are no longer 100% matches. This is not a bug but an incompatibility between filters: doing the opposite (translating with StaX and opening with Okapi) gives the same result, and until now nobody has told them that there was a bug on their side.
When you use bilingual formats in OmegaT and a segment has a translation in the source file, OmegaT is already capable of retrieving it, but it always sets author = unknown and no date.
DGT-OmegaT 3.2 can retrieve the author and date if they are recorded in the source file: this is the case for SDLXLIFF files only (standard XLIFF has no author or date attribute), not for PO files. Example:
Before | After |
Last modified by unknown | Last modified by cordoth on 26-janv.-2016 at 17:06:46 |
Sample <g0>Text</g0> | Sample <g0>Text</g0> |
<segment 0010> | <segment 0010> |
<g0>Texte</g0> d'exemple | <g0>Texte</g0> d'exemple |
<end segment> | <end segment> |
Note that, because it requires some changes in the core OmegaT classes, this feature is not available via the plugin for OmegaT 4.
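As an illustration of where such information may come from, the sketch below reads the author and date of each segment with StAX. It rests on assumptions that are not taken from this page: that SDLXLIFF stores them as `<sdl:value key="last_modified_by">` and `<sdl:value key="modified_on">` children of `<sdl:seg>`, and that the date is kept as a plain string. The class `SdlSegInfoSketch`, its method and the sample values are hypothetical, and this is not the DGT-OmegaT implementation.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration, not the actual DGT-OmegaT filter code.
final class SdlSegInfoSketch {

    /** Maps each segment id to {author, date}; an entry is null when not recorded. */
    static Map<String, String[]> readAuthorAndDate(String xml) {
        Map<String, String[]> result = new LinkedHashMap<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String segId = null;
            String[] info = null; // non-null while inside an <sdl:seg> element
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Only local names are checked here; a real filter should
                    // also check the namespace.
                    if ("seg".equals(r.getLocalName())) {
                        segId = r.getAttributeValue(null, "id");
                        info = new String[] {null, null};
                    } else if (info != null && "value".equals(r.getLocalName())) {
                        String key = r.getAttributeValue(null, "key");
                        String text = r.getElementText(); // moves to the end tag
                        if ("last_modified_by".equals(key)) {
                            info[0] = text;
                        } else if ("modified_on".equals(key)) {
                            info[1] = text; // kept as a raw string, not parsed
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && info != null && "seg".equals(r.getLocalName())) {
                    result.put(segId, info);
                    info = null;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Cannot parse SDLXLIFF", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // The namespace URI and the values below are sample data for the sketch.
        String xml = "<root xmlns:sdl=\"http://sdl.com/FileTypes/SdlXliff/1.0\">"
                + "<sdl:seg-defs><sdl:seg id=\"1\">"
                + "<sdl:value key=\"last_modified_by\">cordoth</sdl:value>"
                + "<sdl:value key=\"modified_on\">01/26/2016 17:06:46</sdl:value>"
                + "</sdl:seg></sdl:seg-defs></root>";
        for (Map.Entry<String, String[]> e : readAuthorAndDate(xml).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()[0] + ", " + e.getValue()[1]);
        }
    }
}
```

A segment without these values keeps null entries, which corresponds to the "unknown" author shown in the Before column.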