DGT-OmegaT 3.4-TEST-4.0 and 3.5-DEV-5.0 published

Download: 3.4-TEST-4.0, DGT-3.5-DEV-5.0

These changes should reduce memory usage, but apart from that they should be completely transparent to users. If that does not seem to be the case, download the production release (3.3), compare, and tell us.

Browsable memory: use cursors

When a memory is browsable, all of its translation units can be read from beginning to end. This is used by the tm/auto feature in DGT, and also by search in some cases.

The previous release used an Iterable for that: unlike a collection, an Iterable does not require the data to be held in RAM. That is a good thing, because we have users who do not hesitate to put gigabytes of TMX files in the tm/ folder and then ask us why it does not work...

However, while an Iterable avoids holding everything in RAM, it does not avoid creating an object for each iteration. Even if that object becomes eligible for garbage collection as soon as the result has been displayed, this is not neutral in terms of memory usage, or even CPU time, because creating objects takes time!

Now we are trying a new logic based on cursors. A cursor does not really contain the data, even for the current row: each field is loaded into memory only when it is used. The well-known Java interfaces ResultSet (JDBC) and XMLStreamReader (StAX) are cursors, and they inspired this idea.
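To illustrate the difference, here is a minimal sketch of the cursor idea. It is not the actual DGT-OmegaT API: the names TuCursor, ArrayCursor and findTargets are hypothetical, and the in-memory array only stands in for a real TMX-backed source. The point is that the loop never builds one object per translation unit, and the target field is read only for the units that match.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorSketch {

    /** Hypothetical cursor over translation units: fields are read on demand. */
    public interface TuCursor {
        boolean next();      // advance to the next unit; false when exhausted
        String getSource();  // read the source text of the current unit
        String getTarget();  // read the target text of the current unit
    }

    /** Toy in-memory implementation standing in for a real TMX-backed cursor. */
    public static final class ArrayCursor implements TuCursor {
        private final String[][] rows;
        private int pos = -1;
        /** Counts how many times a target field was actually loaded. */
        public int targetReads = 0;

        public ArrayCursor(String[][] rows) {
            this.rows = rows;
        }

        @Override
        public boolean next() {
            return ++pos < rows.length;
        }

        @Override
        public String getSource() {
            return rows[pos][0];
        }

        @Override
        public String getTarget() {
            targetReads++;
            return rows[pos][1];
        }
    }

    /** Collects the targets of all units whose source contains the needle. */
    public static List<String> findTargets(TuCursor cursor, String needle) {
        List<String> hits = new ArrayList<>();
        while (cursor.next()) {
            // Only the source is read for every unit; the target is read
            // only when the unit matches.
            if (cursor.getSource().contains(needle)) {
                hits.add(cursor.getTarget());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ArrayCursor cursor = new ArrayCursor(new String[][] {
            {"Hello world", "Bonjour le monde"},
            {"Good morning", "Bonjour"},
            {"Hello again", "Rebonjour"},
        });
        System.out.println(findTargets(cursor, "Hello"));
        System.out.println("target fields loaded: " + cursor.targetReads);
    }
}
```

With an Iterable, the equivalent loop would receive a fully built unit object at each step, whether or not its fields end up being used.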

Project Memory

Distant project memories have been migrated to cursors as well. Not only can this save memory, but we also hope that the new API is simpler to understand for those who would like to implement a new provider.

Note: since it does not yet have separate test and production environments, the Silvestris Cyclotis project has not yet been migrated to the new API, but we are actively working on it. Normally this should change nothing for the user, but treat it as a good opportunity to check that this is really the case.

DGT-3.5-DEV-5.0 only: use concurrent maps

OmegaT is a highly multithreaded application: each time you go to a segment, simultaneous searches run in the TMX files, in the MT...

To avoid concurrency errors (for example, when you modify the current segment while a search is still running), the initial strategy used by the OmegaT developers was to make a shallow copy of the full maps before starting a search. This can take time and memory.

Here I am trying a new strategy: use the maps from the java.util.concurrent package, which are designed to allow such simultaneous searches and modifications without duplicating any data. I assume that the JDK developers have thought carefully about the best strategy for concurrency. That said, I am not absolutely sure that I made all the changes needed for it to work well, which is why I am leaving it in DEV only for the moment.
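As a rough sketch of what this strategy looks like, the example below searches a ConcurrentHashMap while another thread is replacing its values, without first copying the map. The class and method names (ConcurrentMapSketch, countMatches, searchWhileEditing) and the map contents are made up for illustration and are not taken from the OmegaT code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentMapSketch {

    /** Counts the entries whose value contains the needle, iterating the live map. */
    public static int countMatches(Map<String, String> translations, String needle) {
        int count = 0;
        for (Map.Entry<String, String> entry : translations.entrySet()) {
            if (entry.getValue().contains(needle)) {
                count++;
            }
        }
        return count;
    }

    /** Runs a search while a second thread edits the same map; no copy is made. */
    public static int searchWhileEditing() {
        Map<String, String> translations = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) {
            translations.put("seg" + i, "target " + i);
        }

        // Simulates the user editing segments while the search is running.
        Thread editor = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                translations.put("seg" + i, "edited " + i);
            }
        });
        editor.start();

        // ConcurrentHashMap iterators are weakly consistent: they tolerate
        // concurrent updates and never throw ConcurrentModificationException.
        int seen = countMatches(translations, " ");

        try {
            editor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for editor", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("entries seen during search: " + searchWhileEditing());
    }
}
```

The trade-off is that the search may see a mix of old and new values, whereas a copy made beforehand gives a fixed snapshot. A plain HashMap is not safe for this kind of concurrent access at all, which is why the earlier code copied it first.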

DGT-3.5-DEV-5.0 only: adopt StaX filters 2.1

The changes for StaX filters described here have also been applied to the 3.5-DEV branch. Since StaX filters 2.0 are in 3.4-TEST only (prod still uses 1.1), we deliberately do not apply this change in TEST right now, to make comparison easier.

