PDF (Portable Document Format) files
memoQ can import PDF files. On its own, memoQ can open them as plain text, or convert them into DOCX first and import the DOCX file.
To make sure all PDF documents are imported successfully, even if they have text in images: Use the TransPDF service. You can choose to use TransPDF in this Document import settings window. Before you do this, you need to register with TransPDF and save your TransPDF account in memoQ.
You need to pay for TransPDF: TransPDF is not free. After you register, you can produce 25 pages of translated PDF for free, but you need to pay for the rest. TransPDF charges you by the number of final, translated pages that you export. So, the PDF is imported for free, and you pay when you export the finished work.
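The page counting described above can be sketched as a small calculation. This is only an illustration: the function and constant names are hypothetical, it assumes the 25 free pages are counted once per account across all exports, and it does not model prices, which this topic does not state.

```python
# Free allowance mentioned in this topic (assumed to apply once per account).
FREE_EXPORT_PAGES = 25


def billable_pages(total_exported_pages: int, free_pages: int = FREE_EXPORT_PAGES) -> int:
    """Return how many exported, translated pages exceed the free allowance."""
    if total_exported_pages < 0:
        raise ValueError("total_exported_pages must not be negative")
    # Importing is free; only exported pages beyond the allowance count.
    return max(0, total_exported_pages - free_pages)


print(billable_pages(10))  # 0
print(billable_pages(40))  # 15
```

For example, exporting 40 translated pages in total would leave 15 pages to pay for under these assumptions.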
If you do not use TransPDF but rely on memoQ to import PDF documents, you need to live with these limitations:
- Can't export PDF: If the source document is PDF, memoQ exports the translation in plain text or in DOCX, depending on the method of the import.
- Can't import password-protected PDF files. You need to supply the password to TransPDF, too.
- Can't import scanned PDF files: Without TransPDF, memoQ doesn't extract text from scanned PDF files, where the pages are saved as images and not as text. To translate these documents, run them through a page reader program such as Nuance OmniPage or ABBYY FineReader (PDF Reader). These programs save well-formed DOCX files where the text flow and the formatting are retained as much as possible. Or, use TransPDF: it is probably cheaper than these two.
- Text may become garbled: PDF is not a text format. Normally, it doesn't try to preserve the text flow. As a result, some of the text may be missing or may appear in the wrong order when you import a PDF into memoQ. When this happens, run the documents through a page reader program such as Nuance OmniPage or ABBYY FineReader (PDF Reader). These programs save well-formed DOCX files where the text flow and the formatting are retained as much as possible. This may happen with TransPDF, too, although it is less likely.
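The limitations above can be summarized as a simple decision sketch. The class, field, and label names below are hypothetical and are not memoQ settings; the code only restates the rules from the list and does not inspect any PDF file.

```python
from dataclasses import dataclass

# Hypothetical route labels for this sketch; they are not memoQ identifiers.
BUILT_IN = "memoQ built-in import"
TRANSPDF = "TransPDF"
TRANSPDF_OR_PAGE_READER = "TransPDF, or a page reader program first"


@dataclass
class PdfJob:
    """What you know about the PDF and about the deliverable."""
    scanned_or_text_in_images: bool = False
    password_protected: bool = False  # TransPDF needs the password, too
    needs_pdf_export: bool = False


def suggest_import_route(job: PdfJob) -> str:
    """Pick an import route based on the limitations listed above."""
    if job.scanned_or_text_in_images:
        # Without TransPDF, memoQ does not extract text from images.
        return TRANSPDF_OR_PAGE_READER
    if job.password_protected or job.needs_pdf_export:
        # Without TransPDF, memoQ cannot open protected PDFs or export PDF.
        return TRANSPDF
    return BUILT_IN


print(suggest_import_route(PdfJob()))
print(suggest_import_route(PdfJob(needs_pdf_export=True)))
print(suggest_import_route(PdfJob(scanned_or_text_in_images=True)))
```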
How to get here
- Start importing a Portable Document Format (PDF) file.
- In the Document import options window, select the PDF files, and click Change filter and configuration.
- The Document import settings window appears. From the Filter drop-down list, choose PDF (Portable Document Format).
What can you do?
Normally, memoQ imports PDF files by converting them into Word documents (DOCX) first. This keeps most of the formatting from the original PDF file. However, memoQ cannot always import PDF files with this option. For example, if the PDF document contains text in images, memoQ will not recognize that text.
To make sure the PDF documents are imported successfully, even if the text appears in images: Click the Import through TransPDF radio button. Before you do this, you need to register with TransPDF and save your TransPDF account in memoQ.
You need to pay for TransPDF: TransPDF is not free. After you register, you can produce 25 pages of translated PDF for free, but you need to pay for the rest. TransPDF charges you by the number of final, translated pages that you export. So, the PDF is imported for free, and you pay when you export the finished work.
If you need the plain text only, click the Import by converting to Plain Text radio button. This is not recommended, though. Don't use plain text to import documents for translation. You can use plain text when you import documents into a LiveDocs corpus, either on their own or for alignment.
When you import a PDF as plain text, all formatting is lost.
Plain-text import has no settings: If you import a PDF document as plain text, there are no more settings.
The exported file has the format of the import method: On its own, memoQ can't export a PDF file. If you import a PDF as DOCX, memoQ exports a DOCX file. If it is imported as plain text, memoQ exports a plain-text file.
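The relation between import method and export format can be written down as a lookup table. This is a minimal sketch: the keys and the function name are hypothetical labels for the three radio buttons, not memoQ identifiers, and the TransPDF entry follows from the statement that TransPDF produces translated PDF pages on export.

```python
# Hypothetical keys for the three import methods described in this topic.
EXPORT_FORMAT_BY_IMPORT_METHOD = {
    "transpdf": "PDF",           # TransPDF produces the translated PDF
    "docx": "DOCX",              # converted to Word first, exported as DOCX
    "plain_text": "plain text",  # all formatting is lost on import
}


def export_format(import_method: str) -> str:
    """Return the format of the exported translation for an import method."""
    try:
        return EXPORT_FORMAT_BY_IMPORT_METHOD[import_method]
    except KeyError:
        raise ValueError(f"unknown import method: {import_method!r}") from None


print(export_format("docx"))        # DOCX
print(export_format("plain_text"))  # plain text
```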
Normally, memoQ imports PDF documents by converting them into Word documents (DOCX) first.
To set up how the PDF is converted into a Word document (DOCX), use these options:
Under Conversion mode, choose what to preserve: the text flow (the order of the text), or the formatting.
- If text flow is more important (and you can afford losing some of the formatting), click the Text flow conversion (might slightly change formatting) radio button.
- If keeping the formatting is more important (and you can afford losing some of the text), click the Attempt to keep formatting (some text may be lost) radio button.
Under Conversion options, you can set the character spacing and the bulleted lists in the converted Word document.
- To set the character spacing: Check the Specify relative horizontal proximity (between 0 and 1) check box, and enter a number between 0 and 1. The number 1 means that each character occupies the space that the font size specifies. If the number goes below 1, the characters get closer to each other. Normally, you don't need to change this setting.
- To recognize bulleted lists: Check the Recognize bullet point symbols check box. Normally, memoQ doesn't do that. If you check this, bulleted lists in the PDF documents will become bulleted lists in the resulting Word document. In memoQ, this means that there will be no extra symbols at the beginning of bullet points.
On the DOCX options tab, you can control how memoQ imports the converted Word document (DOCX) file.
To learn more about these settings: See the topic about Microsoft Word 2007 and higher (DOCX).
When you finish
To confirm the settings, and return to the Document import options window: Click OK.
To return to the Document import options window, and not change the filter settings: Click Cancel.
If this is a cascading filter, you can change the settings of another filter in the chain: Click the name of the filter at the top of the window.
In the Document import options window: Click OK again to start importing the documents.
memoQ doesn't import PDF directly
memoQ relies on external modules to import PDF documents. These modules are installed with memoQ, but come from other software makers.
To convert PDF documents into Word (DOCX), memoQ uses Aspose.PDF. To learn how this is done: See the developer's web page.
To convert PDF documents into plain text, memoQ uses Xpdf. Xpdf copyright © 1996-2009 Glyph & Cog, LLC.