Statistics
In the Statistics window, you can count the words, segments, and characters in the project or in the selected documents. You can analyze them against the translation memories and the LiveDocs corpora, and find out how much you actually need to work, considering the resources you already have.
You use the Statistics command to find out how much you will need to bill for this project - again, based on the analysis.
This is the manual way: In project templates, memoQ can run an analysis report automatically.
This is the fine-tuned way: In the project overview, you can run quick analysis reports on the entire project, with fewer settings.
How to get here
- Open a project. Or, create a project and import documents.
- You may open a document for translation - if you need to analyze part of a document.
- In the Documents ribbon, click Statistics.
The Statistics window opens.
Requires memoQ project manager: You need the project manager edition of memoQ to manage online projects.
You need to be a project manager or an administrator: You may manage online projects only if you are a member of the Project managers or Administrators group on the memoQ server – or if you have the Project manager role in the project.
- Open an online project for management. Or, create an online project and import documents.
- In the memoQ online project window, choose Translations.
- Under the document list, click Statistics.
The Statistics window opens.
What can you do?
A scope tells memoQ which documents to look at. You have the following options - choose one radio button:
- Project: memoQ analyzes all segments in all documents of the current project. If the project has two or more target languages, memoQ will check segments in every target language.
- Active document: memoQ analyzes all segments in the active document. The active document is the one you are looking at in the translation editor. You can choose this only if you are working on a document in the translation editor.
- Selected documents: memoQ analyzes all segments in the selected documents. You can choose this only if you select several documents in Translations under Project home. It doesn't work when the translation editor is open.
- From cursor: memoQ analyzes segments below the current position in the active document. The active document is the one you are looking at in the translation editor. You can choose this only if you are working on a document in the translation editor.
- Open documents: memoQ analyzes all segments in every document that is open in a translation editor tab.
- Selection: memoQ analyzes the selected segments in the active document. The active document is the one you are looking at in the translation editor. You can choose this only if you are working on a document in the translation editor.
- Work on views check box: Check this to make memoQ go through segments in the views in the current project. You can choose this only if there is at least one view in the project.
To analyze segments in just one target language: Before opening the Statistics window, choose a language on the Translations pane of Project home. Select all documents, then open Statistics, and choose Selected documents.
Compared to translating the text from scratch, you need to work less on a segment if there's a translation memory match - in theory. Before you start working, you need to have an idea how much that work will be. Especially when the job is about upgrading the translation of some documents to newer versions. In this case, the actual translation work can be as little as 10% (or less) of the total word count of the source documents - because for much of the text, you can use the translation from before.
memoQ gives you the word counts grouped by match categories: you'll know how many words are in segments that have 100% matches; how many have 95-99% matches etc.
For each match category, you can have a weight. The weight is between 0% (no work at all) and 100% (translating every word from scratch). You multiply the word count in a category by the weight. Then you'll have a theoretical, weighted, word count.
Example: If a segment has a 90% match, that usually means a difference of one word. Your weight for that match category (85-94%) could be 50%. If the segment is 10 words, you count 5 words for that segment.
Counting this makes sense only if your project has at least one translation memory or LiveDocs corpus.
To set this up for yourself in the Create analysis report window, use these options:
These settings are for a local project.
- Check the Use project TMs and corpora check box. This makes sure that memoQ checks the segments in every translation memory and LiveDocs corpus in your project.
- Check the Calculate homogeneity check box. memoQ counts internal fuzzy similarities, too. That is, memoQ predicts what matches you will get during translation while your translation memory is filled up.
Use this only if you really work on this job alone: Don't check the Calculate homogeneity check box if others will also work on this translation.
- Clear the Include locked rows check box: If you received documents that contain locked rows, most of the time your client means that you mustn't touch those.
- Make sure that the Repetitions take precedence over 100% check box is checked: Use this option if a consistent translation is more important than using all possible matches from the translation memory.
- Make sure that the Cross-file repetitions check box is checked: Since you will work on this translation on your own, you can use a segment that is repeated from another document in the project. Clear this check box only if others will work on this job, too.
- Check the Show weighted counts check box. memoQ will calculate an approximate "actual" word count of the job. To do that, memoQ uses the weights set in Options (Miscellaneous category, Weighted counts tab).
Managing an online project? If you are running Statistics on an online project, memoQ will use the weights from the memoQ server. To set up weights on a memoQ server, use the Server Administrator. Choose Weighted counts, and check or set the weights.
If several translators work on a set of documents, you can't predict when a translator will translate a segment or a document. You can't know who will be able to use translations from others. As a result, you can't predict what internal repetitions will be used.
Bottom line: If the translation is done by a team rather than one single translator, you can't have a precise estimate of the work in advance. To find out how much the work actually is, use Post-translation analysis after the translation is finished.
To set this up for your team in the Create analysis report window, use these options:
These settings are for an online project. Use them in a local project, too, if you are a project manager, and you plan to publish the project on a server, or to distribute the project using packages.
- Check the Use project TMs and corpora check box. This makes sure that memoQ checks the segments in every translation memory and LiveDocs corpus in your project.
-
Clear the Calculate homogeneity check box. memoQ shouldn't count internal fuzzy similarities because we don't know who will translate a segment first.
For more details: see Help about Homogeneity and repetitions in a project.
- Clear the Include locked rows check box: If you received documents that contain locked rows, most of the time your client means that you mustn't touch those.
- Make sure that the Repetitions take precedence over 100% check box is checked: Use this option if a consistent translation is more important than using all possible matches from the translation memory.
- Make sure that the Cross-file repetitions check box is cleared: We don't know who will be able to use repetitions.
- Check the Show weighted counts check box. memoQ will calculate an approximate "actual" word count of the job. To do that, memoQ uses the weights from the Weighted counts tab of the Miscellaneous pane of Options.
Managing an online project? If you are running Statistics on an online project, memoQ will use the weights from the memoQ server. To set up weights on a memoQ server, use the Server Administrator. Choose Weighted counts, and check or set the weights.
To simply have an idea about the size of the project, use these settings:
- Clear the Use project TMs and corpora check box.
- Clear the Calculate homogeneity check box.
- Check the Include locked rows check box.
- Check the Repetitions take precedence over 100% check box.
- Clear the Cross-file repetitions check box.
An editor - or proofreader - needs to work on the translation that is already there. Their work doesn't depend on translation memory matches. They will work on the entire text, so it makes no sense to analyze the work through translation memories.
- Clear the Use project TMs and corpora check box.
- Clear the Calculate homogeneity check box.
- If the editor needs to review all the translation, not just the newly translated segments: Check the Include locked rows check box.
- Check the Repetitions take precedence over 100% check box.
- Clear the Cross-file repetitions check box.
In most cases, especially when you translate from English, the work is measured by the word count of the source text. But in some markets or subject fields, translation is measured by the number of characters.
In the resulting table, memoQ always gives you the character count as well as the word count. In most cases, you need to count the spaces, too.
Make sure that the Include spaces in character counts check box is checked. memoQ counts every space separately. Two spaces right after each other count as two, not one.
Some document formats bring along a lot of inline tags in the text. Such formats are XML, HTML, PDF, InDesign, sometimes Microsoft Word - and potentially many others.
Inserting these tags in the right places can be a lot of work. The analysis report must reflect that.
Normally, memoQ counts tags, but in a separate number. That is not easy to include in the final word count.
To set this up, you can count tags as words or characters.
In the Tag weight row, type a number in the word(s) box. For example, if you type 0.25, memoQ counts one word after every four inline tags - or one-quarter word after every tag.
You can also count this with the characters. Type a number in the character(s) box. For example, if you type 2, memoQ counts two characters after every tag.
In the Statistics window, there are several options that tell memoQ what details to include in the report.
- To get a report separately for each document: Check the Show results for each file check box.
- To show the total count of segments, words, and characters, not just the analysis results: Check the Show counts check box. Normally, memoQ does this.
- To get a status report - how many segments or words are confirmed, edited, pre-translated, or untouched: Check the Status report check box.
- To learn the size of the translation: Check the Include target counts check box. Use this if you are billing from the target text, not from the source.
Don't use Trados 2007-like word counts: Normally, memoQ counts words like Microsoft Word does. In the past, when Trados 2007 or earlier (Trados Translator's Workbench) used to be a dominant translation tool, it was important that memoQ could produce similar word counts - so that translation companies could compare them. This is no longer the case. Use the Trados 2007-like word counts only if your client still works with an early Trados version, and they insist on using it.
Billing from target counts, not from the source? Check the Include target counts check box.
After you set all the options above, you can run the analysis. Click Calculate.
This may take several minutes or even longer, depending on the size of the project, and the resources you are using.
When the analysis is finished, memoQ shows it at the bottom of the Statistics window:
You can review it here. While you are preparing the project, you may decide to close this window, and go back to the project. For example, you may want to add a new translation memory, so that the No match counts will be lower.
Or, if you need to send on the analysis: You can save it in a file. memoQ can save reports you can open in Excel, or reports that you can open in a Web browser.
To save the analysis results:
- Click Export. The Export statistics result window opens.
- Choose one of the formats:
- HTML (Reflecting displayed results): Saves the displayed statistics as HTML file.
- CSV (Reflecting displayed results): Save the results in a CSV file (to be opened in Excel).
- CSV (Per-file, TRADOS-compatible): Saves the results in a CSV file, where the details of each document occupy exactly one row. This is the old Trados style.
- CSV (Per-file, All information): Save the results in a CSV file, where the results are laid out exactly as in the Statistics window.
- If you choose one of the CSV formats, you can choose the separator character that memoQ uses to delimit the columns in the table. There is no reason to use anything but the tab character: Under CSV separator, click Tab.
- Click Export. A Save As window opens. Find a folder and a name for the report file, and click Save. memoQ exports the report, and returns to the Statistics window.
While memoQ calculates the analysis, it can collect the relevant segments from the translation memories and the LiveDocs corpora of the project. You can put together a special translation memory called the Project TM that contains only those segments that the analysis found.
To do that:
- Before running the analysis, check the Create Project TM check box.
- Run the analysis: Click Calculate.
- Save the translation memory: Click Project TM. The Export Project TM window opens:
- Choose where memoQ should put the segments:
- You can save them in a TMX file, so that it can be imported on another computer, into a different translation tool.
- You can save them in a translation memory that is already in your project. From the Name drop-down box, choose the translation memory.
- Or, you can create a new translation memory in the project, and save the segments there.
- Click Export. memoQ saves the segments.
If you choose TMX file: A Save As window opens. Choose a folder and a name for the file, and click Save.
The results have two parts: a Counts section and one or more Analysis sections. This depends on the number of translation memories in your project, and the settings in theShow results for each file or the Details by source check boxes.
Scope: The scope of the analysis selected in the Select scope section.
Resources: Indicates the resources against which the results were gained. Here you find the name of a translation memory or "Homogeneity" for homogeneity checks. If these are aggregate results, you see the caption Every TM or Every TM, Homogeneity.
Type column:
- All: This row indicates the number of all source segments, source words and source characters and the source-wordcount based percentage within the selected scope.
- X-translated: This row indicates the number of x-translated source segments, source words and source characters and the source-wordcount based percentage within the selected scope.
- Repetition: This row indicates the number of repeated source segments, source words and source characters and the source-wordcount based percentage within the selected scope.
Analysis works for the selected scope: For example, if you have two documents in your project, and both contain the same segment only once, the statistics calculated for the project scope will show one segment as repetition. If you calculate statistics separately for the two documents, the results will not show any repetitions.
This difference may be significant if you plan to split a large project between different translators, because the overall statistics for the complete project may show a considerably higher rate of repetitions that the different subsets of documents in themselves.
- Not started: Number of untouched source segments, source words and source characters and the percentage of the text counted from the word count.
- Pre-translated: Number of pre-translated source segments, source words and source characters and the percentage of the text counted from the word count.
- Fragments: Number of segments, source words and source characters and the percentage of the text counted from the word count, where there are fragment-assembled matches.
- Edited: Number of edited source segments, source words and source characters and the percentage of the text counted from the word count.
- Translator confirmed: Number of confirmed source segments, source words and source characters and the percentage of the text counted from the word count.
- Reviewer 1 confirmed: Number of Reviewer 1 confirmed source segments, source words and source characters and the percentage of the text counted from the word count.
- Reviewer 2 confirmed (proofread): Number of Reviewer 2 source segments, source words and source characters the percentage of the text counted from the word count.
- Locked: Number of locked source segments, source words and source characters and the percentage of the text counted from the word count.
- Percentage ranges: These rows show the number of source segments, source words and source characters and the percentage of the text counted from the word count, for segments that have a match that falls in the same category. For example, if you see 5 after 75-84%, when the resource is Every TM, and the scope is the Project, it means that the combination of all translation memories will give 75-84% matches for five segments.
Segments column: Number of source segments, specified by the Type column, within the selected scope.
Source words column: Number of source words, specified by the Type column, within the selected scope. If the Tag weight is not 0, this may be higher than the actual word count.
Source chars column: Number of source characters, specified by the Type column, within the selected scope. Character counts include white space but do not include uninterpreted formatting tags. If the Tag weight is not 0, this may be higher than the actual character count.
Source tags column: Number of tags included in the segments specified by the Type column, within the selected scope.
Percent column: Percentage of the source words in this category against the total word count, within the selected scope. The sum of all percentages may not be precisely 100% because of rounding margins.
Target words column: Number of target words, specified by the Type column, within the selected scope. This column appears only if the Include target counts check box is checked.
Target chars column: Number of target characters, specified by the Type column, within the selected scope. This column appears only if the Include target counts check box is checked.
In the project manager edition of memoQ, you can have two or more target languages in a project - in a local project and in online projects, too.
When you run Statistics for all the target languages of the project, you will get extra details:
When you export the analysis, memoQ now adds a separate row for each target language document for the HTML and CSV (reflecting shown results) options. If you choose to export as CSV (per file, Trados-compatible) or CSV (per file, all information), memoQ will export a CSV with a prefix for each target language, e.g. [ger] sample.txt:
Note: When you run statistics in the all languages mode in a local project, the option to create a Project TM is not available.
If the source language of the project is an East Asian language, memoQ adds the Source non-Asian words and Source Asian characters columns to the statistics results:
Japanese and Chinese do not use spaces to separate words from each other, so the word count will not be reliable. Use the Asian character count instead. In the Statistics window, the Source words column shows the sum of Source non-Asian words and Source Asian characters (source words = characters).
Korean uses spaces: Korean, unlike Japanese and Chinese, uses spaces to separate words. If Korean is your source language, you can use word count, and Korean is counted as alphabetical script like German.
When you finish
To return to Project home, to the memoQ online project window, or to the translation editor: Click Close.