New filter for MS-Office formats

A new OmegaT filter is available for Microsoft Office formats (DOCX, XLSX and PPTX). Based on the same architecture as the XLIFF filter, it makes it possible to add features that were too difficult to implement in the original (filters3) filter, such as:

  • Correctly balanced tags (paired opening and closing tags), whereas the older filter always produced empty tags
    Sample: the same segment as tagged by the old (filters3) filter and by the new (filters4) filter [screenshots not available]
  • Tries to use a meaningful tag prefix (b for bold, i for italics, etc.); this works in most simple cases
  • Compacts runs of adjacent empty tags. However, as the previous sample shows, it does not fully replace the tag wiper, which would also have merged the adjacent <bN> tags
  • Sets the language of the target document (the older filter left it unchanged). Warning: this works only if the language is qualified with a country code (for example FR-FR, not FR); otherwise Word may misinterpret it
  • DGT-3.4 only: implements the reformatter directly in the filter rather than in a script.
    For the moment we keep the original (script-based) method in DGT-3.3, because we would like to compare the two implementations and make sure the new one gives better results. Don't hesitate to test it and send us your comments.
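To make "balanced tags with meaningful prefixes" more concrete, here is a minimal Java sketch (Java 16+ for records). It assumes a paragraph has already been reduced to a list of runs carrying only bold/italic flags; the `TagBalancer` class, the `Run` record and the `f` fallback prefix for mixed formatting are hypothetical and not taken from the filter's source. Note that step 1 merges adjacent runs with identical formatting, which is closer to what a tag wiper does than to the compaction described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: formatted runs -> text with balanced, named tag pairs. */
final class TagBalancer {

    /** A run of text with simplified formatting flags (assumption for this sketch). */
    record Run(String text, boolean bold, boolean italic) {}

    /** Mnemonic prefix: b for bold, i for italics, f (arbitrary choice here) otherwise. */
    static String prefixFor(Run run) {
        if (run.bold() && !run.italic()) return "b";
        if (run.italic() && !run.bold()) return "i";
        return "f";
    }

    static String toTagged(List<Run> runs) {
        // Step 1: merge adjacent runs with identical formatting (tag-wiper-like).
        List<Run> merged = new ArrayList<>();
        for (Run r : runs) {
            if (!merged.isEmpty()) {
                Run last = merged.get(merged.size() - 1);
                if (last.bold() == r.bold() && last.italic() == r.italic()) {
                    merged.set(merged.size() - 1,
                            new Run(last.text() + r.text(), r.bold(), r.italic()));
                    continue;
                }
            }
            merged.add(r);
        }
        // Step 2: wrap each formatted run in an opening/closing pair with the same name,
        // so every opening tag has its matching closing tag.
        StringBuilder out = new StringBuilder();
        int n = 0;
        for (Run r : merged) {
            if (!r.bold() && !r.italic()) {
                out.append(r.text()); // plain text needs no tag
                continue;
            }
            String name = prefixFor(r) + n++;
            out.append('<').append(name).append('>')
               .append(r.text())
               .append("</").append(name).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
                new Run("This is ", false, false),
                new Run("bo", true, false),
                new Run("ld", true, false),
                new Run(" and ", false, false),
                new Run("italic", false, true),
                new Run(".", false, false));
        // prints: This is <b0>bold</b0> and <i1>italic</i1>.
        System.out.println(toTagged(runs));
    }
}
```

The two bold runs ("bo" and "ld") become a single `<b0>…</b0>` pair instead of two adjacent tag pairs, which is the kind of clutter the tag wiper is meant to remove.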

Let us know whether you think these features justify developing a new filter, or whether you prefer the older one. It was developed to test the possibilities of the new filters4 architecture for complex XML-based formats (filters4 instances are harder to read, but they are more powerful, because filters3 hides a lot of things).

The package also improves the handling of SDLXLIFF files by using less memory, notably by ignoring the large Base64-encoded native file. However, this remains experimental, which is why the new filters plugin in release 2.0 should be considered experimental.
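The memory saving comes from not keeping the embedded native file. Below is a minimal sketch of that idea using the standard StAX API (`javax.xml.stream`), assuming the native file sits in an XLIFF 1.2 `<internal-file>` element, as SDLXLIFF files typically store it. The `SdlXliffSkimmer` class and its `readSources` method are hypothetical; the actual filter may work quite differently. Whether the parser itself buffers the whole Base64 text node depends on the StAX implementation; the sketch only ensures that this text is never stored.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: collect <source> texts while discarding <internal-file> content. */
final class SdlXliffSkimmer {

    static List<String> readSources(Reader in) {
        List<String> sources = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                StringBuilder current = null; // non-null while inside <source>
                int skipDepth = 0;            // > 0 while inside <internal-file>
                while (r.hasNext()) {
                    switch (r.next()) {
                        case XMLStreamConstants.START_ELEMENT -> {
                            String name = r.getLocalName();
                            if (skipDepth > 0 || name.equals("internal-file")) {
                                skipDepth++;
                            } else if (name.equals("source")) {
                                current = new StringBuilder();
                            }
                        }
                        case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA -> {
                            // Base64 text inside <internal-file> is read but never kept.
                            if (skipDepth == 0 && current != null) {
                                current.append(r.getText());
                            }
                        }
                        case XMLStreamConstants.END_ELEMENT -> {
                            if (skipDepth > 0) {
                                skipDepth--;
                            } else if (current != null && r.getLocalName().equals("source")) {
                                sources.add(current.toString());
                                current = null;
                            }
                        }
                        default -> { } // other events are irrelevant here
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XLIFF input", e);
        }
        return sources;
    }
}
```

Because only `<source>` text is accumulated, the memory kept by this code grows with the translatable content rather than with the size of the embedded native file (apart from whatever the parser buffers internally). Inline elements such as `<g>` inside `<source>` are flattened to their text.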

In any case, please do not forget that this filter has only been tested on a small number of documents: consider it a beta version at best. Test it, but do not replace the original filter immediately!

Available as a plugin for OmegaT 4/5, or integrated into the new releases DGT-OmegaT 3.3 update 1 and 3.4 update 9.

Theme: 
OmegaT
