Match rates from translation memories and LiveDocs corpora

When memoQ looks for a segment in a translation memory or a LiveDocs corpus, it does not only find segments with exactly the same text. It also finds segments that are slightly (or quite) different, as well as segments where both the text and its context are identical.

When you use a match from a translation memory, you reuse an earlier translation – of the same text or a slightly different one. In most cases, you will need to look at the earlier translation and fix it, so that it will match your current source text better. In theory, you need to work less if the match is closer to the source text.

This is why memoQ gives a score to each translation memory match. The score is a percentage between 50% and 102%. This topic summarizes the different match rates and explains what each match type tells you about the text.

Exact match (100%)

The source text of the segment is exactly the same as the source text of the match from the translation memory. The context of the segment does not match, or nothing is known about it.

Nearly exact match (95%-99%)

The source text of the segment is the same as the source text of the match, except for slight differences: numbers, tags, punctuation marks and spaces might be different.

Fuzzy match (50%-94%)

The source text is similar to the source text in the match, but there are differences in the wording, too. In terms of the editing needed, there are three classes of fuzzy matches. These are listed separately when memoQ analyzes the text with a translation memory and produces statistics.

▪High fuzzy (85%-94%): In average-length or longer segments (8-10 words or more), normally there is a difference of one word. In pre-translation, this is what a good match means (by default, because that can be configured).

▪Medium fuzzy (75%-84%): In average-length or longer segments (8-10 words or more), normally there is a difference of two words.

▪Low fuzzy (50%-74%): In average-length or longer segments (8-10 words or more), the difference is more than two words. In pre-translation, Any match means all sorts of fuzzy matches together, and they start at 50% (by default, because that can also be configured).

Note: If the segment text is shorter than average, the match rate and the actual difference in the text may not correspond as clearly as described above.
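The ranges in this topic can be summarized as a small lookup. The following Python sketch is for illustration only and is not part of memoQ: the function name `match_category` and the "no match" label for rates below 50% are assumptions. The 101% and 102% categories are the context matches described later in this topic.

```python
def match_category(rate: int) -> str:
    """Map a match rate (whole percent) to the category named in this topic.

    Hypothetical helper for illustration; memoQ does not expose this function.
    """
    if rate < 0 or rate > 102:
        raise ValueError("match rates in this topic do not exceed 102%")
    if rate == 102:
        return "double context match"
    if rate == 101:
        return "context match"
    if rate == 100:
        return "exact match"
    if rate >= 95:
        return "nearly exact match"
    if rate >= 85:
        return "high fuzzy"
    if rate >= 75:
        return "medium fuzzy"
    if rate >= 50:
        return "low fuzzy"
    # Assumption: anything below the 50% lower bound is treated as no match.
    return "no match"


for rate in (102, 100, 97, 88, 79, 63, 40):
    print(f"{rate}% -> {match_category(rate)}")
```

The boundaries follow the ranges listed above, so 94% still counts as high fuzzy and 95% is the first nearly exact rate.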

Exact match with context or Context match (101%)

The source text of the segment is exactly the same as the source text of the match from the translation memory. In addition, the context of the source text is also the same as the context that was stored in the translation memory.

▪In running text, the context is the source text of the previous and the next segment.

▪In structured (XML etc.) or tabular (Excel, CSV etc.) documents, the context can be an identifier label (ID).

Double context match (102% or XLT)

The source text of the segment is exactly the same as the source text of the match from the translation memory. In addition, both types of context (the surrounding segments and the ID) are present in the document as well as in the translation memory match, and they are also the same.

Context matches and double context matches can be used to reconstruct the exact same translation when a document or at least a part of it is exactly the same as the original document was.
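The relationship between the 100%, 101% and 102% rates can be sketched as follows. This is a simplified Python illustration, not memoQ's actual logic: the function name and the two boolean parameters are assumptions, and the sketch presumes that the source text is already identical.

```python
def exact_match_rate(surrounding_context_same: bool, id_context_same: bool) -> int:
    """Return 100, 101 or 102 for a match whose source text is identical.

    Hypothetical helper for illustration.
    surrounding_context_same: the previous and next segments are the same
        in the document and in the translation memory entry.
    id_context_same: the identifier (ID) is the same in both places.
    A context that is missing on either side counts as "not the same".
    """
    if surrounding_context_same and id_context_same:
        return 102  # double context match
    if surrounding_context_same or id_context_same:
        return 101  # context match
    return 100      # exact match, context differs or is unknown


print(exact_match_rate(True, True))
print(exact_match_rate(True, False))
print(exact_match_rate(False, False))
```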

Why do I see a different match rate?

memoQ may show a different match rate than the text would suggest. On the one hand, memoQ takes into account where the match comes from, not just the text itself. On the other hand, memoQ can correct the translation of matches in some special cases.

▪Penalties: memoQ will use a lower match rate if there is a penalty for a match. Penalties are set up in the settings of translation memories and LiveDocs. Here are the most frequent cases where memoQ may apply a penalty:

o The match comes from a certain TM that we know to be unreliable.

o The match comes from a certain user (translator) whom we consider unreliable.

o The match originally comes from the alignment of two documents, where we cannot be certain that the alignment links were fully reviewed.

o The match comes from a LiveDocs corpus where the corpus is not confirmed to be fully reviewed.

o The match comes from a document or document pair in a LiveDocs corpus where the alignment of the documents is not confirmed to be fully reviewed.

▪Automatic adjustments: memoQ substitutes numbers when it can. You might receive a nearly exact match where a number was originally different but memoQ has already fixed it, so that you see the correct number in the translation.

▪Patching: In the translation of a segment, memoQ will adjust tags and even the text from term base matches when possible. When this happens, memoQ will also boost the match rate for the segment, mostly to match the categories suggested under Fuzzy matches. When a match is boosted this way, the percent value will be preceded by an exclamation mark (!).
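The effect of penalties and the exclamation mark for patched matches can be illustrated with a short Python sketch. It is not memoQ's implementation. It assumes that penalties are subtracted from the match rate as percentage points and that the result does not go below zero. The function names and the exact display format are also assumptions.

```python
from typing import Iterable


def apply_penalties(rate: int, penalties: Iterable[int]) -> int:
    """Lower a match rate by the given penalties.

    Assumption: each penalty is a number of percentage points, the penalties
    add up, and the result never drops below 0.
    """
    return max(rate - sum(penalties), 0)


def format_rate(rate: int, patched: bool = False) -> str:
    """Format a match rate; a patched (boosted) match gets a leading '!'."""
    prefix = "!" if patched else ""
    return f"{prefix}{rate}%"


# A context match from a TM that carries a hypothetical 5-point penalty.
print(format_rate(apply_penalties(101, [5])))

# A fuzzy match that was boosted by patching.
print(format_rate(85, patched=True))
```

With a 5-point penalty, a 101% context match is shown as 96%, so it no longer counts as an exact match.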