Hans Lenting wrote:
I've tried OmegaT's ... SRX file, to no avail.
Where did you get "OmegaT's SRX file"? I have OmegaT but there is no SRX file here. I know OmegaT's own segmentation file is based on SRX, but how do you convert OmegaT's segmentation rules to an actual SRX file?
Could someone please provide a SRX file that offers the option to ignore TAB characters when segmenting?
It is my understanding that SRX does not segment on tab by default, which would mean that either tabs are indicated as break positions somewhere in your current SRX file, or your CAT tool automatically segments by tab before processing the SRX file.
In SRX (according to the 2008 specification), tabs are indicated using \t or \u0009. Does your SRX file specify a break at \t or \u0009 at all?
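If you want to check this without reading the whole file by eye, here is a rough Python sketch that lists every rule whose pattern mentions a tab. It assumes the usual SRX 2.0 layout (rule elements with a break attribute and beforebreak/afterbreak children); the function name find_tab_rules and the sample rules are made up for illustration, and the substring test is only a heuristic, not a real regex parser.

```python
import xml.etree.ElementTree as ET

# Ways a tab might be written in a rule pattern (the last one is a literal tab).
TAB_PATTERNS = ("\\t", "\\u0009", "\\x09", "\t")

# Made-up sample for illustration only; substitute the text of your own SRX file.
SAMPLE_SRX = r'''<srx xmlns="http://www.lisa.org/srx20" version="2.0">
<body><languagerules><languagerule languagerulename="Default">
<rule break="yes"><beforebreak>\t</beforebreak><afterbreak></afterbreak></rule>
<rule break="yes"><beforebreak>[\.\?!]+</beforebreak><afterbreak>\s</afterbreak></rule>
</languagerule></languagerules></body></srx>'''


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}rule' -> 'rule'."""
    return tag.rsplit("}", 1)[-1]


def find_tab_rules(srx_text):
    """Return (break, beforebreak, afterbreak) for every rule that mentions a tab."""
    root = ET.fromstring(srx_text)
    hits = []
    for elem in root.iter():
        if local_name(elem.tag) != "rule":
            continue
        parts = {"beforebreak": "", "afterbreak": ""}
        for child in elem:
            name = local_name(child.tag)
            if name in parts:
                parts[name] = child.text or ""
        mentions_tab = any(
            pattern in text for text in parts.values() for pattern in TAB_PATTERNS
        )
        if mentions_tab:
            # Assumption: a missing break attribute is treated as "yes".
            hits.append(
                (elem.get("break", "yes"), parts["beforebreak"], parts["afterbreak"])
            )
    return hits


if __name__ == "__main__":
    for rule in find_tab_rules(SAMPLE_SRX):
        print(rule)
```

If this finds nothing in your file, the splitting at tabs is probably happening in the CAT tool itself, before the SRX rules are applied.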
There should be no splitting at the TAB characters.
When I get files like this, I replace the tabs with e.g. {{TAB}} (and mark them as internal text), then do the translation, and then afterwards replace the {{TAB}}s with actual tabs again. Ditto line feeds. Ditto line breaks inside tables, etc. You're very brave to fiddle with your CAT tool's advanced segmentation settings.
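For what it's worth, the placeholder round trip can be scripted. This is only a minimal Python sketch of the idea, assuming plain-text input; the function names protect and restore and the {{LF}} placeholder are my own inventions for illustration, and you would still need to mark the placeholders as internal text in your CAT tool.

```python
# Placeholder tokens; {{TAB}} is the one mentioned above, {{LF}} is a hypothetical
# counterpart for line feeds.
PLACEHOLDERS = {
    "\t": "{{TAB}}",
    "\n": "{{LF}}",
}


def protect(text):
    """Replace tabs and line feeds with visible placeholders before translation."""
    for token in PLACEHOLDERS.values():
        if token in text:
            # Refuse to continue, otherwise restore() could not tell them apart.
            raise ValueError(f"source text already contains {token!r}")
    for char, token in PLACEHOLDERS.items():
        text = text.replace(char, token)
    return text


def restore(text):
    """Turn the placeholders back into real tabs and line feeds after translation."""
    for char, token in PLACEHOLDERS.items():
        text = text.replace(token, char)
    return text


if __name__ == "__main__":
    original = "Name\tPrice\nApple\t1.50"
    protected = protect(original)
    print(protected)
    print(restore(protected) == original)
```

The check in protect is there because a source file that already contains the literal text {{TAB}} would otherwise come back with an extra tab it never had.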
[Edited at 2020-02-27 12:20 GMT]