Segmentation rules

When you add a translation document to a project, memoQ segments it during import. Segmentation is the process of splitting the text of a translation document into segments, also called translation units. The translator then translates each unit, producing a translation pair that consists of the original segment (the source segment) and its translation (the target segment). Translation pairs can be stored in a translation memory.

By default, segment boundaries fall at punctuation marks. In most cases, a full stop marks the end of a sentence, and a sentence is usually a meaningful unit that can be translated on its own. However, there are exceptions: a full stop is not always followed by a new sentence. In some languages, for example, a full stop follows an ordinal number (German writes "3. Mai" for "3 May").
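To make the ordinal-number problem concrete, here is a minimal Python sketch of a naive splitter that breaks after every full stop, exclamation mark, or question mark followed by whitespace. It is not memoQ's implementation; the function name and the sample sentence are made up for illustration.

```python
import re

def naive_segments(text: str) -> list[str]:
    """Split after '.', '!' or '?' whenever whitespace follows (illustrative only)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Hypothetical German sample: "3." is an ordinal, not the end of a sentence.
german = "Das Treffen findet am 3. Mai statt. Bitte kommen Sie pünktlich."

print(naive_segments(german))
# ['Das Treffen findet am 3.', 'Mai statt.', 'Bitte kommen Sie pünktlich.']
```

The first sentence is cut in two after the ordinal "3.", which is exactly the kind of case that language-specific exceptions have to handle.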

Segmentation is defined by segmentation rules, which are specific to each source language. In memoQ, segmentation rules are resources that can be edited. memoQ comes with pre-defined segmentation rules that should be sufficient in most cases. However, you can also edit and share segmentation rules, which makes segmentation in memoQ more flexible and more accurate.

Important: If you are migrating from another tool, memoQ supports the SRX (Segmentation Rules eXchange) 1.0 standard to provide 100% compatibility. Because documents are then segmented the same way as in the other tool, your existing translation memories can be reused without loss. To import an SRX file, select the segmentation rule resource and click Edit.

Note: memoQ's segmentation is based on regular expressions – an extremely powerful tool for working with text. Among other things, regular expressions make it easy for users (translators) to customize the segmentation rules.
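As a rough illustration of how regular expressions can drive segmentation, the Python sketch below combines one break rule with one exception (no-break) rule. The structure, with a pattern for the text before a position and a pattern for the text after it and the first matching rule deciding, is loosely modeled on SRX-style rules. It is an assumption made for illustration, not memoQ's actual rule format or engine, and the `Rule` type, `RULES` list, and `segment` function are hypothetical names.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    is_break: bool       # True: break here; False: exception, do not break
    before: re.Pattern   # must match at the end of the text before the position
    after: re.Pattern    # must match at the start of the text after the position

# Hypothetical rule set; exceptions come first so they win over the break rule.
RULES = [
    # Exception: a one- or two-digit ordinal such as "3." does not end a segment.
    Rule(False, re.compile(r"\b\d{1,2}\.$"), re.compile(r"\s")),
    # Break rule: sentence-ending punctuation followed by whitespace.
    Rule(True, re.compile(r"[.!?]$"), re.compile(r"\s")),
]

def segment(text: str, rules: list[Rule] = RULES) -> list[str]:
    """Check every position against the rules; the first matching rule decides."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        before, after = text[:pos], text[pos:]
        for rule in rules:
            if rule.before.search(before) and rule.after.match(after):
                if rule.is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins, whether break or exception
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments

print(segment("Das Treffen findet am 3. Mai statt. Bitte kommen Sie pünktlich."))
# ['Das Treffen findet am 3. Mai statt.', 'Bitte kommen Sie pünktlich.']
```

The exception shown here is deliberately crude: it would also suppress a legitimate break after a sentence that happens to end in a number, so real rule sets need more context in their patterns.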