PDF (Portable Document Format) files
memoQ can import PDF files. On its own, memoQ can either open them as plain text, or convert them into DOCX first and then import the DOCX file.
The PDF format is not designed for translation. It is closer to a printed page than to an editable document. Whenever possible, work on the original document that was converted into PDF. Then you can convert the translated document to PDF.
The TransPDF service is no longer available in memoQ 10.4 and later.
When you import PDF documents into memoQ, remember that:
- Can't export PDF: If the source document is PDF, memoQ exports the translation in plain text or in DOCX, depending on the method of the import.
- Can't import password-protected PDF files.
- Can't import scanned PDF files: memoQ doesn't extract text from scanned PDF files, where the pages are saved as images and not as text. To translate these documents, run them through a page reader (OCR) program such as Nuance OmniPage or ABBYY FineReader (PDF Reader). These programs save well-formed DOCX files where the text flow and the formatting are retained as much as possible.
- Text may become garbled: PDF is not a text format. Normally, it doesn't try to preserve the text flow. As a result, some of the text may be missing or may appear in the wrong order when you import a PDF into memoQ. When this happens, use an OCR program (see above).
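Because memoQ can't import password-protected PDF files, it can help to check a file before you add it to a project. The following is a minimal sketch, not part of memoQ: the function names are hypothetical, and the check is a heuristic that looks for an /Encrypt entry in the raw bytes of the file. It flags any encrypted PDF, whether or not a password is needed to open it, and it can give a false positive if that string happens to appear in uncompressed content.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic check: encrypted PDFs reference an /Encrypt dictionary.

    The entry lives in the trailer (or cross-reference stream dictionary),
    which is stored uncompressed, so a raw byte search usually finds it.
    """
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Hypothetical convenience wrapper that reads the file from disk."""
    return looks_encrypted(Path(path).read_bytes())
```

If the check returns True, remove the protection in a PDF tool before importing, or ask for the original document.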
How to get here
- In the Document import options window, select the PDF files, and click Change filter and configuration.
- The Document import settings window appears. From the Filter drop-down list, choose PDF (Portable Document Format).
What can you do?
Normally, memoQ imports PDF files by converting them into Word documents (DOCX) first. This keeps most of the formatting from the original PDF file. However, memoQ cannot always import PDF files with this option. For example, if the PDF document contains text in images, memoQ will not recognize that text.
If you need the plain text only, click the Import by converting to Plain Text radio button. This is not recommended, though. Don't use plain text to import documents for translation. You can use plain text when you import documents into a LiveDocs corpus, either on their own or for alignment.
When you import a PDF as plain text, all formatting is lost.
Plain-text import has no settings: If you import a PDF document as plain text, there is nothing further to configure.
The export format follows the import method: memoQ can't export a PDF file. If you import a PDF as DOCX, memoQ exports a DOCX file. If it is imported as plain text, memoQ exports a plain-text file.
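The relationship between import method and export format can be summarized as a simple lookup. This is an illustrative sketch only: the keys and the function name are made up for this example and do not correspond to any memoQ API or setting name.

```python
# Hypothetical labels for the two import methods described above.
EXPORT_FORMAT_BY_IMPORT_METHOD = {
    "convert_to_docx": "DOCX",
    "convert_to_plain_text": "plain text",
}


def export_format(import_method: str) -> str:
    """Return the format memoQ exports for a PDF imported with this method."""
    try:
        return EXPORT_FORMAT_BY_IMPORT_METHOD[import_method]
    except KeyError:
        # PDF itself is never a possible export format.
        raise ValueError(f"unknown import method: {import_method!r}") from None
```

In both cases the result is never a PDF: to deliver a PDF, convert the exported file in another program.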
Normally, memoQ imports PDF documents by converting them into Word documents (DOCX) first.
To set up how the PDF is converted into a Word document (DOCX), use these options:
Under Conversion mode, choose what to preserve: the text flow (the order of the text), or the formatting.
- If text flow is more important, and you can afford to lose some of the formatting, click the Text flow conversion (might slightly change formatting) radio button.
- If keeping the formatting is more important, and you can afford to lose some of the text, click the Attempt to keep formatting (some text may be lost) radio button.
Under Conversion options, you can set the character spacing and the bulleted lists in the converted Word document.
- To set the character spacing: Check the Specify relative horizontal proximity (between 0 and 1) check box, and enter a number between 0 and 1. The number 1 means that each character occupies the space that the font size specifies. If the number goes below 1, the characters get closer to each other. Normally, you don't need to change this setting.
- To recognize bulleted lists: Check the Recognize bullet point symbols check box. Normally, memoQ doesn't do that. If you check this, bulleted lists in the PDF documents will become bulleted lists in the resulting Word document. In memoQ, this means that there will be no extra symbols at the beginning of bullet points.
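To keep track of the settings used for a batch of PDF files, the conversion options above can be modeled as a small record. This is a sketch under stated assumptions: the class and field names are hypothetical and are not a memoQ API. The only rules taken from this topic are the two conversion modes, the 0 to 1 range for horizontal proximity, and the fact that bullet recognition is off unless you turn it on.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the two Conversion mode radio buttons.
CONVERSION_MODES = ("text_flow", "keep_formatting")


@dataclass(frozen=True)
class PdfConversionSettings:
    conversion_mode: str
    # None models a cleared "Specify relative horizontal proximity" check box.
    horizontal_proximity: Optional[float] = None
    # Off unless the user checks "Recognize bullet point symbols".
    recognize_bullets: bool = False

    def __post_init__(self) -> None:
        if self.conversion_mode not in CONVERSION_MODES:
            raise ValueError(f"unknown conversion mode: {self.conversion_mode!r}")
        p = self.horizontal_proximity
        if p is not None and not 0 <= p <= 1:
            raise ValueError("horizontal proximity must be between 0 and 1")
```

For example, PdfConversionSettings("text_flow", recognize_bullets=True) describes a text flow conversion with bullet recognition turned on, while a proximity of 1.5 is rejected.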
On the DOCX options tab, you can control how memoQ imports the converted Word document (DOCX) file.
To learn more about these settings: See the topic about Microsoft Word 2007 and higher (DOCX).
When you finish
- To confirm the settings, and return to the Document import options window: Click OK. In the Document import options window, click OK again to start importing the documents.
- To return to the Document import options window, and not change the filter settings: Click Cancel.
- If this is a cascading filter, you can change the settings of another filter in the chain: Click the name of the filter at the top of the window.
memoQ doesn't import PDF directly
memoQ relies on external modules to import PDF documents. These modules are installed with memoQ, but come from other software makers.
To convert PDF documents into Word (DOCX), memoQ uses Aspose.PDF. To learn how this is done: See the developer's web page.
To convert PDF documents into plain text, memoQ uses Xpdf. Xpdf copyright © 1996-2009 Glyph & Cog, LLC.