Translation Memory

Translation memory (TM) is a concept widely accepted by translators. It is based on the observation that translators spend a lot of time translating very similar texts, i.e. doing repetitive tasks, which is something computers are usually better at. This is especially true when translating program user interfaces (probably by far the most common use of poEdit). Translation memory remembers all your past translations and can retrieve them later, when you are translating something similar. This is an important property of the TM database: it is organized in such a way that it is relatively fast to retrieve translations of sentences that differ in one or more words from the one you are translating.

Setting up TM

The very first thing you have to do in order to use TM is to set up the database. This can be done in the File/Preferences dialog, on the Translation Memory tab.

Here you can set where to store the database (most users won't need to change the default value) and which languages you translate to (in the control called My Languages). Press "Add" to add a new language. Languages are identified by their ISO 639 two-letter codes.

Next, click the "Generate database" button and fill in the search paths. These are the directories where poEdit will look for existing catalogs, from which it will build your personalized TM. poEdit can extract translations from files in three formats: PO files (as used by poEdit), their compiled version (MO files), and RPM packages (this feature is Unix only). It searches not only the directories you entered but all their subdirectories as well. The most common way of filling the database is to point poEdit to the /usr/share/locale and /usr/local/share/locale directories. (Windows users: just copy these files from some friendly Unix box.) Alternatively, you may put your Linux installation CD into the drive and scan the RPMs in /mnt/cdrom (of course, this only applies to RPM-based distros).
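The recursive search described above can be pictured with a small Python sketch. This is only an illustration, not poEdit's own code: the function name find_catalogs is hypothetical, and RPM packages are left out for brevity.

```python
from pathlib import Path


def find_catalogs(search_paths):
    """Yield every .po and .mo file under the given directories, recursively.

    Hypothetical helper for illustration; it does not handle RPM packages.
    """
    for root in search_paths:
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # skip search paths that do not exist
        # rglob("*") walks the directory and all of its subdirectories
        for path in sorted(root_path.rglob("*")):
            if path.is_file() and path.suffix in (".po", ".mo"):
                yield path
```

Calling list(find_catalogs(["/usr/share/locale", "/usr/local/share/locale"])) would collect the catalogs from the two directories mentioned above, if they exist on your system.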

If you decide to add your own directories to the search, it's important to understand how the lookup works. poEdit builds one database per language (chosen by you), so it has to recognize each catalog's language somehow. Unfortunately, there is no way of telling the language from a PO or MO file itself, because gettext looks catalogs up by their names. poEdit does the same. Make sure all catalogs you want to scan match one of these wildcards (this is just an example: substitute "cs" with any ISO 639 code and "CZ" with any country code; "po" may be "po" or "mo", and "foo" can be replaced with anything):

cs.po
*/cs.po
*/cs/LC_MESSAGES/foo.po
*/cs/foo.po
*/cs_CZ.po
*/cs_CZ/LC_MESSAGES/foo.po
*/cs_CZ/foo.po

Be prepared for the scan to take a while.
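To check in advance whether your own catalogs will be recognized, the wildcard list above can be approximated with a few regular expressions. The following Python sketch is an assumption about how such matching could work, not poEdit's actual implementation. The name guess_language and the languages parameter (the codes you entered in My Languages) are hypothetical, and returning the locale exactly as it appears in the path is a choice made for this example.

```python
import re

# Locale part of the path: a two-letter ISO 639 code, optionally followed
# by an underscore and a two-letter country code (e.g. "cs" or "cs_CZ").
_LOCALE = r"(?P<locale>(?P<lang>[a-z]{2})(?:_[A-Z]{2})?)"

_PATTERNS = [
    # cs.po, */cs.po, */cs_CZ.po
    re.compile(rf"(?:^|/){_LOCALE}\.(?:po|mo)$"),
    # */cs/LC_MESSAGES/foo.po, */cs_CZ/LC_MESSAGES/foo.po
    re.compile(rf"/{_LOCALE}/LC_MESSAGES/[^/]+\.(?:po|mo)$"),
    # */cs/foo.po, */cs_CZ/foo.po
    re.compile(rf"/{_LOCALE}/[^/]+\.(?:po|mo)$"),
]


def guess_language(path, languages):
    """Return the locale found in the catalog path, or None.

    `languages` is the set of two-letter codes the user translates to;
    paths whose language part is not in that set are ignored.
    """
    normalized = path.replace("\\", "/")  # accept Windows-style separators
    for pattern in _PATTERNS:
        match = pattern.search(normalized)
        if match and match.group("lang") in languages:
            return match.group("locale")
    return None
```

In this sketch, a catalog such as /usr/share/locale/cs/LC_MESSAGES/foo.mo is recognized when "cs" is among your languages, while a file like /home/me/po/foo.po is not, because its name and parent directory carry no language code.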

The Configuration section contains various parameters that affect TM's capabilities. "Max. number of missing words" and "Max. difference in sentence length" are self-explanatory. They are parameters of the database retrieval function: the higher these values are, the more matches database queries return and the less exact those matches are. "Automatically translate when updating catalog" tells poEdit to attempt to translate all new strings gathered during a catalog update. Such automatically translated strings are marked with a gray computer icon.
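To get a feel for how the two retrieval parameters trade quantity for exactness, here is a deliberately naive Python sketch. It scans a plain dictionary instead of an indexed database, so it says nothing about how poEdit stores or searches its TM. The names find_matches, max_missing_words and max_length_delta are hypothetical, and measuring sentence length in words is an assumption made for this example.

```python
def find_matches(query, memory, max_missing_words=2, max_length_delta=2):
    """Return (missing, source, translation) tuples, most exact first.

    `memory` maps source sentences to their translations.
    A stored sentence is accepted when
      * its length in words differs from the query's by at most
        `max_length_delta` (assumption: length is counted in words), and
      * at most `max_missing_words` words of the query do not occur in it.
    """
    query_words = query.lower().split()
    results = []
    for source, translation in memory.items():
        source_words = source.lower().split()
        if abs(len(source_words) - len(query_words)) > max_length_delta:
            continue
        missing = sum(1 for word in query_words if word not in source_words)
        if missing <= max_missing_words:
            results.append((missing, source, translation))
    # Fewer missing words means a more exact match, so list those first.
    results.sort(key=lambda item: item[0])
    return results
```

With both limits set to 0, only sentences containing every word of the query and having the same number of words are returned. Raising either limit lets in more candidates, which are progressively rougher.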

Using TM

If you enabled the option mentioned above, TM will be used when updating catalogs. This is not always optimal: for example, you might decide not to use poEdit's update feature at all, or the suggested translation may be wrong and you want to try other possibilities (if no exact match is found, TM usually returns several rough translations that you can choose from). To access all rough translations for the currently selected string in the translations list (poEdit's main window), simply right-click the item. The popup menu will contain a list of translations obtained from TM. Don't panic if there's no translation; it just means that the database does not contain anything related.

Whenever you save a catalog, all modified entries are stored in your TM database, together with their translations. The next time you use it, TM will know them. This approach keeps the TM seamlessly up to date and adapts it to the specific domain you work in.
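The save-time update can be pictured roughly as follows. This Python sketch only illustrates the idea: the Entry class, the store_modified function and the dictionary used as the database are hypothetical, and skipping entries with an empty translation is an assumption rather than documented poEdit behaviour.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One catalog entry (hypothetical structure for this example)."""
    source: str
    translation: str
    modified: bool = False


def store_modified(memory, entries):
    """Copy modified, translated entries into `memory`; return how many."""
    stored = 0
    for entry in entries:
        # Assumption: entries without a translation are not worth storing.
        if entry.modified and entry.translation:
            memory[entry.source] = entry.translation
            entry.modified = False  # the entry is now in sync with the TM
            stored += 1
    return stored
```

Because only modified entries are written, repeated saves stay cheap, and the memory gradually fills with the vocabulary of the projects you actually work on.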