1. At the top of the Document import settings dialog, click Add cascading filter. In the Add cascading filter dialog, choose the Regex tagger.
2. Click OK. The Document import settings dialog now displays the options for the Regex tagger.
3. Add three rules in this dialog. First, we add a rule to replace opening XML tags that look like this: <tag_text>. Fill in the Regular expression, Tag type, and Display text fields as follows, and click Add after each rule:
Regular expression: <[^/]*?>
We need to find the shortest character sequences that start with <, end with >, and contain no forward slash. The [^/] class matches any single character except a forward slash, and the *? quantifier tells memoQ to repeat it zero or more times, but as few times as possible. This way, memoQ stops matching at the first > character after the <.
Note: If we used the more common greedy * quantifier instead of *?, memoQ would look for the longest possible match and stop at the last > character it can reach without crossing a forward slash, possibly swallowing several XML tags and the text between them.
Tag type: Open
Display text: $0
$0 stands for the entire match, so this copies the full tag text into the inline tag that memoQ displays.
4. Next, we need to replace closing tags that look like this: </tag_text>.
Regular expression: </[^/]*?>
This is similar to the previous expression, but it looks for character sequences that start with </ instead of <. Inside the tag text, forward slashes are still not allowed.
Tag type: Close
Display text: $0
5. Third and last, we need to cover empty tags that look like this: <tag_text/>.
Regular expression: <[^/]*?/>
Tag type: Empty
Display text: $0
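The behavior of the three expressions can be tried outside memoQ. The following is a minimal sketch in Python, assuming that Python's re module treats character classes and the lazy *? quantifier the same way memoQ's regex engine does. The sample strings, the RULES list, and the classify helper are hypothetical and only illustrate the rules; they are not part of memoQ.

```python
import re

# The three rules from steps 3-5 as (regular expression, tag type) pairs.
# Every rule uses $0 (the whole match) as its Display text.
RULES = [
    (r"<[^/]*?>", "Open"),
    (r"</[^/]*?>", "Close"),
    (r"<[^/]*?/>", "Empty"),
]

def classify(tag):
    """Hypothetical helper: return the type of the rule whose pattern matches the whole tag."""
    for pattern, tag_type in RULES:
        if re.fullmatch(pattern, tag):
            return tag_type
    return None

for tag in ["<b>", "</b>", "<br/>"]:
    print(tag, classify(tag))
# prints three lines: "<b> Open", "</b> Close", "<br/> Empty"

# Lazy *? versus greedy * in the opening-tag rule (step 3 and its note).
sample = "<p>Press <b>OK</b>.</p>"
print(re.findall(r"<[^/]*?>", sample))  # ['<p>', '<b>']
print(re.findall(r"<[^/]*>", sample))   # ['<p>Press <b>']

# Group 0 is the entire match, which $0 refers to; Python writes it as \g<0>.
# The square brackets are only a stand-in for memoQ's inline tag.
print(re.sub(r"</[^/]*?>", r"[\g<0>]", sample))  # <p>Press <b>OK[</b>].[</p>]

# Caveat: [^/] also matches ">", so the empty-tag pattern can run across an
# opening tag when no slash separates it from the next empty tag.
print(re.search(r"<[^/]*?/>", "<b>OK <br/>").group(0))  # <b>OK <br/>
# Excluding ">" as well keeps the match inside a single tag.
print(re.search(r"<[^/>]*?/>", "<b>OK <br/>").group(0))  # <br/>
```

The last two lines point at a limitation of the empty-tag expression: a character class such as [^/>] keeps each match inside one tag. Whether memoQ would actually produce the wider match depends on how it resolves rules that match at the same position, which this sketch does not model.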
When the rules are set up, the Document import settings dialog should look like this: