Interaction with Trados Studio

Within DGT, most translators work with Trados Studio; only a minority uses OmegaT. Situations where a translator using OmegaT needs to interact with one using Trados are becoming more and more frequent. The following extensions are recent developments that enable OmegaT users to work better with files provided by Trados Studio.

A more detailed, more technical description of all of this is in the dedicated document.

View source/target for SDLXLIFF files

We have already described how this feature worked in DGT-2 here. In DGT-OmegaT 3.1, whose main focus was integration with Trados Studio, the general idea is that when the source files are in SDLXLIFF format, "View source" and "View target" should not display the SDLXLIFF file (which would open in Trados Studio, if you have it) but the file in its native format. More concretely:

  • View source: if the source is in SDLXLIFF format but the corresponding project is not present in Studio, DGT-OmegaT can now extract the native file, which is embedded in Base64 format inside the <internal> markup. The file is extracted into the "source-native" directory, and "View Source" opens this file rather than the SDLXLIFF.
    In DGT-OmegaT 3.2 update 6, this extraction can also be done from the command line, which is useful for those who use SDLXLIFF files with other CAT tools without having Trados Studio at all.
  • If you really want to open the SDLXLIFF in Studio, a new menu item "Open in Trados Studio" is available (grayed out if the file is not SDLXLIFF). It first checks whether the Trados project exists and, if so, compiles the target SDLXLIFF, copies it into the project and opens it.
    Warning: in this case, DGT-OmegaT synchronises from the OmegaT project to the Trados project, but not in the reverse direction! This option is mainly useful to check that Trados Studio handles the generated SDLXLIFF correctly: you should not edit it there, or else you must copy your changes back to OmegaT manually! Release 3.7 implements an experimental approach to synchronise in the reverse direction.
  • View target file: DGT-OmegaT can now create a small Trados Studio project, run Studio's "create translated document" command, and display the result in the native format.
    For this to work, a small batch executable must be installed in the Trados Studio directory. Unlike cross-compilation, it depends on the Trados API, so it only works if Studio is correctly installed on your PC, which means it works only under Windows.
    The advantage over cross-compilation is that you obtain the same result as Studio users: there are no segmentation problems, and even segments containing tags are correctly translated. But being compliant with Studio also means being compatible bug for bug: an SDLXLIFF file that has problems in Studio (for example with links) will have exactly the same problems with this executable, because it does nothing more than call the Studio API, whereas it might work with cross-compilation.
    In any case, you can configure whether cross-compilation should be used as a fallback when this operation fails. Unfortunately, we cannot record inside the result file how the document was generated (whether direct compilation worked or cross-compilation was used), because there are too many native file formats to consider. Instead, we implemented an option that adds to the name of the generated file a short string indicating which algorithm was finally used. If you then open the file in its native tool (such as Word), you should see this information in the window's title bar.
  • Last but not least, all these possibilities, initially designed for View source and View target, now also work for any kind of project compilation, even when it runs in the background.
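The extraction performed by "View source" can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not DGT-OmegaT's code: it supposes that the native file is the Base64 text of the first element whose name starts with "internal" (in XLIFF 1.2 this element is usually <internal-file form="base64">), and the function name extract_internal_file is hypothetical. Depending on how Studio packages the original, the decoded bytes may still need further processing before the native application can open them.

```python
from __future__ import annotations

import base64
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_internal_file(sdlxliff: str | Path, out_path: str | Path) -> bytes:
    """Decode the first Base64 <internal...> element of an SDLXLIFF and save it.

    Assumption: the embedded native file is the text of an element whose local
    name starts with "internal" (for example <internal-file form="base64">).
    """
    for elem in ET.parse(sdlxliff).iter():
        if local_name(elem.tag).startswith("internal") and elem.text:
            # Remove line breaks and indentation around the Base64 payload.
            data = base64.b64decode("".join(elem.text.split()))
            out = Path(out_path)
            out.parent.mkdir(parents=True, exist_ok=True)  # e.g. "source-native"
            out.write_bytes(data)
            return data
    raise ValueError(f"no Base64 <internal...> element found in {sdlxliff}")
```

For example, extract_internal_file("report.docx.sdlxliff", "source-native/report.docx") writes the decoded bytes into a "source-native" directory; a fuller tool could derive the output name from the original attribute of the <file> element instead of hard-coding it.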
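The fallback and file-naming options of "View target" can be summarised as a small control flow. In this hypothetical Python sketch, studio and cross stand for the Studio-based executable and the cross-compilation filter; their signature and the markers "studio" and "cross" added to the file name are assumptions made for illustration, not the strings DGT-OmegaT actually writes.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Hypothetical signature: a compiler reads the SDLXLIFF and writes the native
# document to the given path, raising an exception if it fails.
Compiler = Callable[[Path, Path], None]


def tagged_name(target: Path, method: str) -> Path:
    """Insert a method marker before the extension: doc.docx -> doc.studio.docx."""
    return target.with_name(f"{target.stem}.{method}{target.suffix}")


def compile_target(src: Path, target: Path, studio: Compiler, cross: Compiler,
                   use_fallback: bool = True, tag_name: bool = False) -> Path:
    """Try Studio's own compilation first; optionally fall back to cross-compilation."""
    attempts = [("studio", studio)]
    if use_fallback:
        attempts.append(("cross", cross))
    last_error: Exception | None = None
    for method, compiler in attempts:
        out = tagged_name(target, method) if tag_name else target
        try:
            compiler(src, out)
            return out
        except Exception as err:  # remember the failure, then try the next method
            last_error = err
    raise RuntimeError(f"no compilation method succeeded for {src}") from last_error
```

Putting the marker in the file name rather than in the content keeps this logic independent of the native format, which is the reason given above for that design.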


Renumerotation

The XLIFF filter provided in OmegaT's core only supports files where <target> is filled with the original (source) text. By default, SDLXLIFF files do not follow this convention.

The filter provided by Okapi is a true bilingual filter that distinguishes the <source> and <target> markup in the file. But it has another drawback: when a segment contains tags, for example <x id="158" />, OmegaT displays them as <x158/>. And SDLXLIFF files, in particular, always give each tag a unique number.

Compare the two filters on the same SDLXLIFF file ([+] marks a behaviour we like, [-] one we don't like):

OmegaT filter, as shown in the editor:

<g0>Texte</g0> d'exemple
<segment 0010 (+ 1 more)>

<end segment>

<g0>Texte</g0> d'exemple
<segment 0015 (+ 1 more)>

<end segment>

  • [-] Reads <target> as if it were the source
  • [-] Segment appears as untranslated
  • [+] Tag numbering is reset at paragraph level
  • [+] Segments are recognized as identical
  • Tag type is g (as in XLIFF) for SDLXLIFF, but not for some other XLIFF files

Okapi filter, as shown in the editor:

Sample <g18>Text</g18>
<segment 0010>
<g18>Texte</g18> d'exemple
<end segment>

Sample <g24>Text</g24>
<segment 0015>
<g24>Texte</g24> d'exemple
<end segment>

  • [+] Reads <source> or <seg-source> as the source segment
  • [+] Reads <target> as an auto-populated translation
  • [-] Tag numbers are unique at document level
  • [-] Segments are not recognized as identical
  • Tag type is g (as in XLIFF)

As a consequence, it is virtually impossible to get a 100% match from an external TMX file, because the tag numbering inside the XLIFF will always differ from the numbering inside the TMX. Even repetitions within the same SDLXLIFF (or a previous version of the same document) have different numbers! Worse, segments from tm/auto are ignored entirely, because tm/auto works only with 100% matches!
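To make the problem concrete, here is a minimal Python sketch of paragraph-level renumbering applied to tags as OmegaT displays them (<g18>, </g18>, <x158/>). It renumbers the tags of one paragraph from 0 in order of first appearance, so two repetitions that differ only in their document-level ids become identical strings again. The regular expression and the function name are assumptions; a real filter must also keep the mapping so that it can restore the original ids when it writes the target SDLXLIFF.

```python
import re

# Tags as OmegaT displays them: <g18>, </g18>, <x158/> (letters, then a number).
TAG = re.compile(r"<(/?)([a-z]+)(\d+)(\s*/?)>")


def renumber(paragraph: str) -> str:
    """Renumber the tags of one paragraph from 0, in order of first appearance.

    Opening and closing tags with the same letter and number keep sharing a
    number, so <g18>...</g18> becomes <g0>...</g0>.
    """
    mapping = {}  # "g18" -> 0, "x158" -> 1, ...

    def replace(match):
        slash, kind, number, end = match.groups()
        key = kind + number
        if key not in mapping:
            mapping[key] = len(mapping)
        return f"<{slash}{kind}{mapping[key]}{end}>"

    return TAG.sub(replace, paragraph)


print(renumber("Sample <g18>Text</g18>"))  # Sample <g0>Text</g0>
print(renumber("Sample <g24>Text</g24>"))  # Sample <g0>Text</g0>
```

Both calls return "Sample <g0>Text</g0>", which is exactly what makes an exact TMX or tm/auto match possible again.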

A ticket has already been submitted to the Okapi core team, but nothing has happened so far.

To avoid this problem, two approaches have been tested. Both have been available inside DGT-OmegaT since release 3.1, and also as separate packages that you can install into standard OmegaT (compatible with OmegaT 2.6 or later), which is why each one is detailed on a dedicated page:

1. Renumerotation via scripts

2. New filters plugin

Common features in both approaches:

  • Tag renumbering: OmegaT displays the document with paragraph-based tag numbers, reset to zero at each paragraph (this is the main purpose of these developments)
  • Status conversion: segments that did not have the "Translated" status in the original SDLXLIFF but have been modified in DGT-OmegaT receive the "Translated" status in the target file, in SDLXLIFF format (if the source is SDLXLIFF; otherwise in standard XLIFF format)
    In DGT-OmegaT 3.2 update 8, this also works for segments that had no status before, which is the case when you start from a fresh, untranslated SDLXLIFF project. Older versions, including 3.1, can only change an existing status (for example from Draft to Translated).
    In DGT-OmegaT 3.4, if you are in Revision mode, each segment with the revised status in OmegaT receives the ApprovedTranslation status in the SDLXLIFF.
    DGT-OmegaT 3.2 update 8 also sets the correct author and date (taken from the project memory) for all segments modified in DGT-OmegaT, and leaves the values of other segments intact.
  • Pseudo-tag conversion: DGT-OmegaT's pseudo-tags, which initially worked only for DOCX via an extra Perl script, can now be used in SDLXLIFF; during compilation, they are converted into tags that Trados understands
  • SDLXLIFF notes/comments conversion:
    • When you open an SDLXLIFF file containing comments visible in Studio, these comments are readable in the "Comments" pane (or in Segment Properties in OmegaT 4)
    • When you produce the final SDLXLIFF file, the notes present in the project memory (project_save.tmx) are converted into comments readable by Trados Studio (only in DGT-OmegaT, not in the plugin)
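The status rules listed above can be expressed as a small decision function. In SDLXLIFF, the status (confirmation level) of a segment is stored in the conf attribute of its <sdl:seg> element, and a segment without that attribute has no status. This Python sketch is hypothetical: the parameter names are invented, and the set of statuses treated as "below Translated" is an assumption, since the text does not say what happens to segments that were already approved.

```python
from __future__ import annotations

from typing import Optional

# Values of the conf attribute that this sketch treats as "not yet Translated".
# None means the <sdl:seg> element has no conf attribute at all.
# Assumption: the text does not say how other statuses should be handled.
BELOW_TRANSLATED = {None, "Draft", "RejectedTranslation"}


def convert_status(conf: Optional[str], modified: bool, revised: bool = False,
                   revision_mode: bool = False) -> Optional[str]:
    """Return the conf value to write into the target SDLXLIFF for one segment."""
    if revision_mode and revised:
        # DGT-OmegaT 3.4 behaviour: revised segments become ApprovedTranslation.
        return "ApprovedTranslation"
    if modified and conf in BELOW_TRANSLATED:
        # Since 3.2 update 8 the no-status case (conf is None) is covered too;
        # older versions only changed an existing status.
        return "Translated"
    return conf  # untouched segments keep their original status
```

With these rules, a fresh segment edited in OmegaT goes from no status to "Translated", while segments you never touched keep whatever Studio had written.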

Some additional features have been added only to the StAX filter: they have no equivalent in other OmegaT filters and cannot be achieved using renumerotation:

Re-segmentation of SDLXLIFF files

In Trados, segmentation rules are stored only in translation memory files (SDLTM), which are actually SQLite databases. When you open a project, the rules stored in the first memory are applied or, if no memory is available, the default rules, which are hard-coded and not accessible. Under these conditions, extracting Trados rules, even those modified by a team inside our organisation, and producing something usable by another CAT tool is very difficult.
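Because an SDLTM file is an SQLite database, its structure can be inspected with any SQLite client. The following Python sketch, using only the standard sqlite3 module, opens a memory read-only and lists its tables; which table holds the segmentation rules, and in which format, is not covered here, so this is only a starting point for exploration. The function name is hypothetical.

```python
from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(sdltm_path: str | Path) -> list[str]:
    """List the tables of an SDLTM file, opened read-only as an SQLite database."""
    # A file: URI with mode=ro guarantees the memory is never modified.
    uri = Path(sdltm_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Opening the file read-only is deliberate: it lets you explore a memory shared with Studio users without any risk of altering it.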

As if things were not complicated enough, when you create a project with Trados, the SDLXLIFF file is not segmented! You get one translation unit per paragraph, which is usual, but the <seg-source> markup is not filled using Trados's own rules unless you open the file in Studio and edit at least one segment.

All existing filters (OmegaT's original one, Okapi's, and ours) can use the segmentation from <seg-source> if it exists, in which case you should deactivate "sentence-level segmenting". But if it is missing, either you work by paragraph or you activate "sentence-level segmenting"... but then segmentation uses OmegaT's rules, and you lose one of the biggest advantages of using SDLXLIFF!
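Before choosing between <seg-source> segmentation and OmegaT's sentence-level segmenting, it can help to check whether a file has actually been segmented. This Python sketch (the function name is hypothetical) counts translatable trans-units and how many of them contain a <seg-source> with <mrk mtype="seg"> markers, which is how XLIFF 1.2 encodes segments; a result with few or no segmented units suggests that the file still needs the Analyze step described below.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}


def segmentation_report(path: str) -> tuple[int, int]:
    """Return (translatable trans-units, trans-units with <mrk mtype="seg"> segments)."""
    total = segmented = 0
    for unit in ET.parse(path).getroot().iter(f"{{{XLIFF_NS}}}trans-unit"):
        if unit.get("translate") == "no":
            continue  # skip units explicitly marked as not translatable
        total += 1
        seg_source = unit.find("x:seg-source", NS)
        # Segment markers may be nested inside <g> elements, hence ".//".
        if seg_source is not None and seg_source.find(".//x:mrk[@mtype='seg']", NS) is not None:
            segmented += 1
    return total, segmented
```

If segmented equals total, you can rely on <seg-source> and keep "sentence-level segmenting" off; if it is zero, the file was most likely never opened and edited in Studio.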

To solve this problem, the command-line executable has been modified so that it can call any of the "tasks" provided by the Trados API. The "Analyze" task, which is not executed when you create a project, can then be called to produce a correctly segmented SDLXLIFF file. You can do this through the Wizard (latest version, Java only) or with the Groovy script we also provide, which does nothing more than call the command line. In any case, if you want to use it, don't forget to install the command-line executable (which means you must have Trados installed) and update the script so that it points to its location.
