Using DéjàVu’s Lexicon

and the Build Lexicon function in particular

12 June 2002

Atril's DéjàVu is a tool for translators, of the kind known as a translation memory.

DéjàVu stores elements from translations in three different ways:

The MDB or Memory Database. This is for complete "segments", or sentences.
The TDB or Terminology Database. This is for terms, i.e. single words, or short combinations of words.
The Lexicon. Also intended for terms.

So at first sight the TDB and the Lexicon seem to be the same. But there are important differences:

The Lexicon is project specific. The TDB can be and should be shared by many projects. This makes the Lexicon useful when a client or subject requires special translations that differ from those of the same terms in other projects. The Lexicon then takes precedence: it is searched first, and if a term is found, it overrides the term that may be in the TDB.
The Lexicon can be filled - if you wish - using the Build Lexicon function. But it can also be filled - like the TDB - on the fly while translating. What is best depends on the project and on you, the translator.
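
The precedence rule can be pictured with a small sketch. This is only my own model of the behaviour described here, not DéjàVu's actual code: the function name look_up_term, the two dictionaries and the sample English-Dutch terms are all made up for illustration.

```python
# Hypothetical model of "Lexicon first, then TDB"; not DéjàVu's real implementation.
from typing import Dict, Optional


def look_up_term(source: str, lexicon: Dict[str, str], tdb: Dict[str, str]) -> Optional[str]:
    """Return the Lexicon translation if there is one, otherwise the TDB translation."""
    target = lexicon.get(source)
    if target:  # a Lexicon entry with an empty target is ignored
        return target
    return tdb.get(source)


# Invented sample data: the shared TDB and one client's project Lexicon.
tdb = {"monitor": "beeldscherm", "keyboard": "toetsenbord"}
lexicon = {"monitor": "monitor", "printer": ""}

print(look_up_term("monitor", lexicon, tdb))   # the Lexicon entry overrides the TDB
print(look_up_term("keyboard", lexicon, tdb))  # not in the Lexicon, so the TDB is used
```

The empty entry for "printer" is skipped, which matches the way DV ignores Lexicon entries that have no target.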

Most experienced DV users recommend using a single MDB and a single TDB, shared between all projects. In both databases, segments and terms are flagged with the subject and client defined for the project, so it is important to set the project's subject and client before you import any files into the project. Using these flags, DV can pick the right term when there are multiple hits: it uses the one that best fits the current project's settings for subject and client.
This means it is safe to send a client-specific Lexicon to the TDB after the project is completed, so its contents are available for re-use: the client who wanted this specific terminology is marked in the TDB, so for a later project for that same client these terms take precedence.
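
How "fits best" is worked out internally I don't know. The sketch below simply assumes one point for a matching subject and one for a matching client, with the earliest entry winning a tie. The TdbEntry class, the best_match function and the sample entries are hypothetical.

```python
# Hypothetical scoring of TDB hits by subject and client; the real rules may differ.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TdbEntry:
    source: str
    target: str
    subject: str
    client: str


def best_match(entries: List[TdbEntry], source: str,
               subject: str, client: str) -> Optional[TdbEntry]:
    """Pick the hit whose subject and client flags best fit the current project."""
    hits = [e for e in entries if e.source == source]
    if not hits:
        return None
    # Assumed scoring: one point per matching flag; max() keeps the first hit on a tie.
    return max(hits, key=lambda e: (e.subject == subject) + (e.client == client))


# Invented sample data: the same source term, flagged for two different clients.
entries = [
    TdbEntry("drive", "station", subject="IT", client="Client A"),
    TdbEntry("drive", "aandrijving", subject="Engineering", client="Client B"),
]

hit = best_match(entries, "drive", subject="Engineering", client="Client B")
print(hit.target if hit else None)
```

Under these assumptions a later project for Client B gets Client B's term, even though both translations sit in the same shared TDB.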

Filling the Lexicon

There are basically three ways to fill the Lexicon. They may of course also be combined:

Obtained from client.
Built on the fly.
Prepared using "Build Lexicon".

To build a Lexicon on the fly, while doing the translation, you translate a segment, and before continuing to the next segment, you mark a source term (mouse, or shift-(ctrl)-arrows) and the corresponding term in the target window (press Tab to get there, then mark using mouse or keyboard), and finally press Ctrl-F9. This is similar to sending a term to the TDB, using Ctrl-F11. In fact, you use the Lexicon as a project-specific intermediate stage to the TDB. Later on, you'll send the Lexicon to the TDB, but not just now.

Advantages of filling on the fly:

You decide what to put in the Lexicon, so you can judge how useful it is to put just that term in. The Lexicon won't contain any noise.
A Lexicon built in this manner is a useful end-result of a project, to yourself and - if so agreed - also to fellow translators in a team, and to the client.

Disadvantages:

Even though there are keyboard shortcuts, putting terms in the Lexicon this way requires rather a lot of manual work, which interrupts the natural flow of translating.
You can decide what to put in the Lexicon, but you also must. This can be difficult to judge. It's probably easiest if you're translating a subject that is rather familiar to you, but then, many terms will already be in the TDB.
To get the maximum effect, you should put the term in the Lexicon the first time it occurs. But it may be difficult to decide on the best translation just then. If you change your mind later, corrections will be necessary, not only to the Lexicon itself, but also to the segments in which the term was used. DéjàVu helps you do this with the Check Terminology functions (under Tools), but even then it means extra work.

A different approach is to set up the Lexicon, not (only) while you translate, but before you start translating. DéjàVu helps you with the Build Lexicon function, to be found in the menu under File / Lexicon / Build. This function takes all the words from the source files, groups identical words together, and counts how many times they occur. It generates a Lexicon containing only source words, not target words. You can however resolve the Lexicon with the TDB, which means terms already in the TDB from earlier projects are used to fill in the Lexicon's target terms. This hardly seems useful at first sight, because if these terms are already in the TDB, DV will find them there, even if no translation is available in the Lexicon. But seeing that the term is already known from the TDB saves you the trouble of thinking up a translation for the Lexicon.
(You can even resolve the Lexicon with the MDB, but it doesn't make much sense, because it can cause a lot of fuzzy matches (short source term - longer target segment) that aren't useful in the Lexicon. Perhaps it is useful if you first switch the fuzziness off).
Whatever is still untranslated after resolving with the TDB, you must either translate yourself or decide that it doesn't belong in the Lexicon.
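
To make the idea concrete, here is a minimal sketch of what building and resolving amount to: count the source words, then copy in the translations the TDB already has. It is an illustration under my own assumptions (words are lowercased and split on non-word characters, and the TDB is a plain dictionary), not a description of how DV does it. The names build_lexicon, resolve_with_tdb and by_frequency are invented.

```python
# Hypothetical sketch of "Build Lexicon" followed by "resolve with the TDB".
import re
from collections import Counter
from typing import Dict, List, Tuple


def build_lexicon(text: str) -> Dict[str, dict]:
    """Count every source word; targets start out empty."""
    words = re.findall(r"\w+", text.lower())  # assumption: case is ignored
    return {word: {"count": n, "target": ""} for word, n in Counter(words).items()}


def resolve_with_tdb(lexicon: Dict[str, dict], tdb: Dict[str, str]) -> None:
    """Fill in empty targets for source words the TDB already knows."""
    for source, entry in lexicon.items():
        if not entry["target"] and source in tdb:
            entry["target"] = tdb[source]


def by_frequency(lexicon: Dict[str, dict]) -> List[Tuple[str, dict]]:
    """List the entries with the most frequent words first."""
    return sorted(lexicon.items(), key=lambda item: (-item[1]["count"], item[0]))


# Invented sample text and TDB.
source_text = "Press the button. The button starts the pump."
lexicon = build_lexicon(source_text)
resolve_with_tdb(lexicon, {"button": "knop"})

for source, entry in by_frequency(lexicon):
    print(entry["count"], source, "->", entry["target"] or "(untranslated)")
```

The entries still marked untranslated in the output are the ones left for you to translate or discard.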

Advantages of using the Build Lexicon approach:

DV tells you how many times each term occurs. You can have the list displayed with the most frequent words first. That means you can concentrate your effort on frequent words, which will save you typing and will improve consistency, and ignore infrequent words.
When dealing with a subject you're not yet very familiar with, this can be a good way to "get into" the text. You look up frequent terms to see them in context, in order to decide upon the best translation. Thus you get a better idea of what the text is about, where problem areas lie, what requires research. It also helps limit the number of terminology decisions you prefer to redo later.

Disadvantages:

You'll see a lot of trivial words among the most frequent ones, like (assuming English as the source language) and, is, that, are, this, it, in, on. Depending on the language pair, it is not always useful to have such words in a Lexicon, because the translation will vary widely with context. Where it is useful, you'll probably already have these in your TDB anyway.
Also, such trivial words don't look very good in a project- and subject-specific Lexicon that you'd like to share with others, like fellow translators or clients.
It is often useful to have DV generate terms consisting of more than one word, up to a maximum of three or four. (It takes some extra time, but a modern computer has hardly any problems with that). That generates not only some useful combinations, but also a lot of noise. Even if you ignore these useless terms, along with a lot of trivial "dictionary" words, you still have to look at them and decide whether to translate them in the Lexicon or not. You may even have to look up the context to see whether they really are useless.
When you are done, you can easily delete the untranslated terms: select segments with empty target, right mouse button, clear all translation, select all segments again. Or you can leave them in, with their empty target, in which case DV will ignore them.
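
The noise from multi-word terms is easy to see in a small sketch. The code below generates every run of up to max_words consecutive words within a segment, which is my own assumption about what such a function might do. The names candidate_terms and prune_untranslated, and the sample segments, are invented.

```python
# Hypothetical sketch of multi-word term candidates and of dropping untranslated entries.
import re
from collections import Counter
from typing import Dict, List


def candidate_terms(segments: List[str], max_words: int = 3) -> Counter:
    """Count every run of 1..max_words consecutive words within each segment."""
    counts: Counter = Counter()
    for segment in segments:
        words = re.findall(r"\w+", segment.lower())  # assumption: case is ignored
        for n in range(1, max_words + 1):
            for start in range(len(words) - n + 1):
                counts[" ".join(words[start:start + n])] += 1
    return counts


def prune_untranslated(lexicon: Dict[str, str]) -> Dict[str, str]:
    """Keep only the entries that were given a translation."""
    return {source: target for source, target in lexicon.items() if target}


# Invented sample segments.
segments = ["The pressure valve is open", "Close the pressure valve"]
counts = candidate_terms(segments)

repeated = [term for term, n in counts.items() if n > 1]
print(len(counts), "candidates, of which", len(repeated), "occur more than once")

# After translating only the useful candidates, drop the rest.
print(prune_untranslated({"pressure valve": "drukventiel", "valve is": "", "is open": ""}))
```

Even with two short segments, most candidates occur only once and are of no use as terms, which is exactly the sifting work described above.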

Disclaimer: In this article I express only my own views. I have no business association with Atril other than having obtained a software license to use DéjàVu.