Chaining filters: processing a document within a document
After a filter processes a document, you can run another filter to further process the text. For example, cells in an Excel workbook might contain HTML markup. In this case, you can apply the HTML filter to the contents of the cells, turning HTML markup into sensible inline tags.
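The idea of a two-stage chain can be sketched in a few lines of Python. This is only an illustration, not memoQ's implementation: the functions first_filter and second_filter, the list of cell strings, and the numbered {1}, {2} placeholders are all assumptions made for the example.

```python
import re

# Simplified pattern for an opening or closing HTML tag (illustrative only;
# a real HTML filter handles far more cases than this).
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")

def first_filter(cells):
    """Stand-in for the first filter: pass on the text of non-empty cells."""
    return [cell for cell in cells if cell]

def second_filter(text):
    """Stand-in for the second filter: replace each HTML tag with a numbered placeholder."""
    counter = 0

    def to_placeholder(match):
        nonlocal counter
        counter += 1
        return "{" + str(counter) + "}"

    return HTML_TAG.sub(to_placeholder, text)

# Hypothetical cell contents from a workbook.
cells = ["Click <b>Save</b> to continue.", "", "Plain text"]
segments = [second_filter(text) for text in first_filter(cells)]
print(segments)
```

The second stage only ever sees the text that the first stage passes on, which is why the order of the filters in the chain matters.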
Or, a plain text file may contain lines in the 'name=value' form, where the 'name=' part should not be touched. In this case, you can apply the Regex tagger to turn the 'name=' parts into inline tags.
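As a rough illustration of the 'name=value' case, the following Python sketch marks the 'name=' part of each line. The pattern and the <tag>...</tag> wrapper are assumptions chosen for the example. They are not the rule syntax or the tag format that the Regex tagger itself uses.

```python
import re

# Hypothetical rule: a key made of word characters, dots, or hyphens at the
# start of a line, followed by '='.
NAME_PART = re.compile(r"^[\w.-]+=", re.MULTILINE)

def tag_name_parts(text: str) -> str:
    """Wrap each 'name=' prefix in a placeholder tag so it stays protected."""
    return NAME_PART.sub(lambda m: f"<tag>{m.group(0)}</tag>", text)

sample = "title=Welcome back\nbutton.ok=Save changes"
print(tag_name_parts(sample))
```

Only the key and the equals sign end up inside the placeholder. The value after it stays as translatable text.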
To run another filter after the first one, click the Add cascading filter... link below the Filter configuration box. The Add cascading filter dialog appears.
Important: You cannot specify a second filter if the first filter returns inline tags. You can still use the XML filter as the first filter, but the second filter will fail if the XML filter returns at least one inline tag.
Note: In some cases, XML- and HTML-based filters (including .DOCX) represent whitespace characters (tabs, newlines) with inline tags. However, if there is a second filter, the first filter will pass on the whitespace characters as they are, and these characters will be replaced with inline tags only after the second filter has processed the text.
Note: This does not apply to MS Office documents that contain embedded objects or documents – memoQ still does not import the embedded objects.
Note: If you have an XML -> HTML filter chain, and the XML filter imports context or comments, that information is lost. Only the Regex Tagger keeps such information from a previous filter.
Cascading filters vs. encoding
If you attach the HTML filter, the Regex Text filter, etc. as a second or further filter in a cascading filter chain, you can change the import/export encoding options for that filter. But if you do change the encoding, you might get export problems. memoQ relies on the content being passed along the chain (from filter 1 to filter 2 at import, and back at export) in UTF-8 encoding, which is the case if you do not change the import/export encodings. If you change the encoding in the second or a later filter, the export may fail or produce corrupted characters.
Therefore, we recommend that you do not change the encoding options for the second, third, etc. filters in your filter chain. You can change the encoding in the first filter of the filter chain.
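The kind of corruption described above can be reproduced outside memoQ. The Python sketch below assumes that text is handed over as UTF-8 bytes and that a later stage decodes those bytes with a different encoding. The round_trip function and the choice of cp1252 are illustrative assumptions, not a description of memoQ internals.

```python
def round_trip(text: str, second_filter_encoding: str) -> str:
    """Hand text over as UTF-8 bytes, then decode it with the given encoding."""
    # The chain is assumed to pass content on as UTF-8.
    passed_on = text.encode("utf-8")
    # A hypothetical later filter reads the bytes with its own configured encoding.
    return passed_on.decode(second_filter_encoding, errors="replace")

original = "café"
print(round_trip(original, "utf-8"))   # encoding left unchanged
print(round_trip(original, "cp1252"))  # encoding changed in the later filter
```

With the encoding left at UTF-8, the text survives the hand-over intact. With a mismatched encoding, every non-ASCII character is misread, which is the corruption that can show up at export.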