snowball vs. lucene tokenizers
Thread poster: Deborah Kolosova
United States
Russian to English
Nov 3, 2011

The instructions on the OmegaT site for installing tokenizers say you should select the appropriate tokenizer from the list. For my source language, Russian, two are listed: the SnowballRussianTokenizer and the LuceneRussianTokenizer. What is the difference, and which one is better to use? Or does each have its own advantages?

 
Susan Welsh
United States
Russian to English
I use lucene
Nov 3, 2011

My recollection of past discussions is that lucene has a "stop word" function that snowball does not (meaning it ignores little irrelevant words like "and" and "the" when matching segments). Someone will probably correct me if I'm wrong. You can try them both and see what you like.

I translate from Russian, and lucene works great for me.

Susan
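
To make the "stop word" idea above concrete, here is a minimal sketch of how ignoring small function words can change a match score between two segments. It is a toy illustration only, not the code OmegaT, Lucene, or Snowball actually run: the word list, the function names (tokenize, match_score), and the simple token-overlap score are all assumptions made up for this example.

```python
# Toy illustration only: this is NOT OmegaT's, Lucene's, or Snowball's code.
# STOP_WORDS, tokenize and match_score are hypothetical names for this sketch.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def tokenize(text, drop_stop_words):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def match_score(a, b, drop_stop_words):
    """Share of distinct tokens common to both segments (Jaccard index)."""
    tokens_a = set(tokenize(a, drop_stop_words))
    tokens_b = set(tokenize(b, drop_stop_words))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


source = "Open the file and save the report"
candidate = "Open file, save report"

# Keeping stop words: "the" and "and" count as differences.
with_stop = match_score(source, candidate, drop_stop_words=False)
# Dropping stop words: only the content words are compared.
without_stop = match_score(source, candidate, drop_stop_words=True)

print(f"keeping stop words:  {with_stop:.2f}")
print(f"dropping stop words: {without_stop:.2f}")
```

In this sketch the two segments score higher once "and" and "the" are dropped, which is the kind of effect described above. How each OmegaT tokenizer really handles stop words and stemming may differ, so trying both on your own texts is still the best test.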


 

