Term extraction editor
When you extract terms from the source documents, translation memories or LiveDocs corpora in your project, memoQ displays the candidate list in a candidate list editor tab. In this tab, you can clean up, filter, edit, and reconfigure the candidate list. You can also confirm candidates that will be added to a term base created from the candidate list.
In the candidate list editor, memoQ shows the list of term candidates that belong to one term extraction session. This list is saved in the project.
How to get here
To start a new term extraction session:
- Create or open a project.
Add text before you run term extraction: The project needs to have the text to process. The text can be in project documents, translation memories, or LiveDocs corpora.
- In the project, import the documents, or add the translation memories and LiveDocs corpora you need.
memoQ can use existing term bases to help with term extraction: Before you run term extraction, add those term bases to the project, too.
- On the Preparation ribbon, click the Extract Terms icon. The Extract candidates window opens.
If this is not the first time you run term extraction in this project: The Extract terms window opens first. If you're sure you need to run a new round of term extraction, click Start new session. To learn more, see the Help page about the Extract terms window.
memoQ extracts the candidates from the documents, and opens the candidate list editor automatically.
To return to an earlier term extraction session:
- Open a local project.
- On the Preparation ribbon, click the Extract Terms icon. If you ran term extraction before in this project: The Extract terms window opens.
If there are no previous term extraction sessions: The Extract candidates window opens instead.
- Select the session you want to return to. Click Continue session.
The session opens in the candidate list editor tab.
What can you do?
The result of term extraction is a list of term candidates. A candidate either becomes a term in a term base, or is dropped from the list.
The candidate list editor lists the term candidates in a table. The table contains the following columns:
- Status: Shows the status of the candidate. This column can contain one of the following values:
- Candidate: This is a candidate that was not hidden, dropped, or accepted. Initially, the status of all candidates is Candidate.
- Accepted: These candidates were accepted using the Accept link or the Ctrl+Enter key shortcut. Only accepted candidates are copied to the final term base.
- Dropped: These candidates were discarded using the Drop link or the Ctrl+D key shortcut. Dropped candidates stay on the list, but they are sorted to the end of the list when you use the Re-sort now link or the Ctrl+R key shortcut.
- (number): Shows the position of the candidate within the list.
- $ (score): Shows how confident memoQ is that the candidate is a valid term. It is computed from the frequency and the length of the candidate. memoQ computes a higher score if the candidate or part of it was found in the term bases used in the project.
Note: The frequency of a word or expression is the number of times it occurs in the source text where candidates were extracted from.
- Length: The length of the candidate, in characters.
- Hides shorter: Each row contains an eye icon that shows whether all other candidates that are parts of this one are hidden. For example, closing the eye for the candidate "a common scenario" hides "common", "scenario", "common scenario", and so on. Click the eye icon to hide or unhide these candidates.
Note: Hidden candidates remain visible, but if you re-sort the list (using the Re-sort now link or the Ctrl+R key shortcut), they will be sorted to the end of the list.
- Source: The source term as it was extracted from the text. You can edit this cell. The Source cell can contain additional information:
- Also: This candidate was merged with another, and now they form a single term base entry that has two or more source-language alternatives. The word Also is followed by a list of alternatives.
- Original: This candidate was edited, and the source-language expression is not the same as the one memoQ extracted from the text. The word Original is followed by the original candidate.
- Source example: An example showing how the term is used. You can type into this field. To copy the selected source segment from the Occurrences field: Click the Add source as example link or press Ctrl+S.
- Target: The target term. You can type translations for the candidates yourself. To add one from the Occurrences field: Right-click the occurrence, and from the menu, choose Add as Target. To make memoQ fill in the cells: On the ribbon, click the Look up terms now button.
- Target example: An example showing how the term is used. You can type into this field. To copy the selected target segment from the Occurrences field: Click the Add target as example link or press Ctrl+T.
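The Hides shorter column relies on the idea of a candidate being "part of" a longer one. The minimal Python sketch below illustrates that idea with the "a common scenario" example. It is not memoQ's implementation: it assumes that "part of" means a contiguous run of whole words, and the function names `word_runs` and `shorter_candidates` are hypothetical.

```python
def word_runs(term: str) -> set[str]:
    """Return every contiguous run of whole words in term, except term itself."""
    words = term.split()
    runs = set()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            # Skip the full term; only strictly shorter runs count.
            if (start, end) != (0, len(words)):
                runs.add(" ".join(words[start:end]))
    return runs


def shorter_candidates(current: str, candidates: list[str]) -> list[str]:
    """Candidates that closing the eye of `current` would hide (assumed rule)."""
    runs = word_runs(current)
    return [c for c in candidates if c in runs]


candidates = ["a common scenario", "common", "scenario",
              "common scenario", "scenario planning"]
print(shorter_candidates("a common scenario", candidates))
# ['common', 'scenario', 'common scenario']
```

"scenario planning" is not hidden because it is not contained in "a common scenario", even though it shares a word with it.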
By default, the list is sorted by status, then by score, then by frequency, and then in alphabetical order. This sort order means that accepted candidates are listed on top. These are followed by unmodified candidates (with the Candidate status), then dropped candidates, and then hidden candidates.
Within the same status, candidates are listed in descending order by score. If two candidates have the same score (which is rare), they are sorted by frequency (also in descending order).
The candidates are always sorted by status first. To change how candidates are sorted within each status: Click the header of the column you want to sort by.
Note: The Status column header is not clickable. To sort the list by status after you accepted or dropped some candidates: Click the Re-sort now link.
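The default sort order described above can be expressed as a single sort key. This Python sketch is only an illustration under stated assumptions: the `Row` class, `GROUP_ORDER`, and `re_sort` are hypothetical names, the relative order of dropped and hidden rows follows the default-sort paragraph, and the rule that a hidden row is grouped as hidden only while its status is still Candidate is our own assumption.

```python
from dataclasses import dataclass

# Assumed group order, taken from the default-sort description:
# accepted, then plain candidates, then dropped, then hidden.
GROUP_ORDER = {"Accepted": 0, "Candidate": 1, "Dropped": 2, "Hidden": 3}


@dataclass
class Row:
    source: str
    status: str        # "Accepted", "Candidate" or "Dropped"
    score: float
    frequency: int
    hidden: bool = False


def sort_group(row: Row) -> int:
    # Assumption: hiding only changes the group of rows still marked Candidate.
    if row.hidden and row.status == "Candidate":
        return GROUP_ORDER["Hidden"]
    return GROUP_ORDER[row.status]


def re_sort(rows: list[Row]) -> list[Row]:
    # Status group first, then score and frequency (descending), then A to Z.
    return sorted(rows, key=lambda r: (sort_group(r), -r.score,
                                       -r.frequency, r.source.lower()))


rows = [
    Row("scenario", "Candidate", 40, 3, hidden=True),
    Row("term base", "Candidate", 75, 12),
    Row("memoQ", "Dropped", 90, 30),
    Row("translation memory", "Accepted", 60, 8),
    Row("candidate list", "Candidate", 75, 20),
]
print([r.source for r in re_sort(rows)])
# ['translation memory', 'candidate list', 'term base', 'memoQ', 'scenario']
```

"candidate list" comes before "term base" because both have a score of 75, so the higher frequency (20 against 12) decides.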
In the Candidates section of the Term Extraction ribbon, click Candidate statistics. The Term extraction statistics window opens.
- The Term base matching section shows how many of the candidates have full or partial matches in the term base.
- The Status section shows how many of the candidates you accepted, dropped or hid.
- The Sources and Settings sections show information about the term extraction session.
To export the statistics report to an HTML file: Click Export.
To close the Term extraction statistics window: Click Close.
You can move around in the candidate list using the down and up arrow keys. You can also use the mouse. If you click a row, that becomes the active one.
- You can use the PageDown and PageUp keys to jump down or up.
- To jump to the end of the list, press Ctrl+End.
- To jump to the beginning of the list, press Ctrl+Home.
- To make the active row stay in the middle of the screen: In the View ribbon's Layout section, click Active row, then choose In the middle.
- To let the active row move freely: In the View ribbon's Layout section, click Active row, then choose Anywhere.
- To jump between candidates of a specific status: In the Term Extraction ribbon's Go to section, choose a status from the dropdown, then click Next or Previous (or press Ctrl+Alt+Down or Ctrl+Alt+Up).
- You can change row height in the candidate list and in the occurrence list by dragging.
You can type in both the Source and Target columns. When you edit the term candidate in the Source column, its original form is displayed at the bottom of the cell, with the Original label.
You can select one or more rows:
- Click a row to select it.
- To select multiple rows, click the first one, then press and hold down Ctrl, and click all the others one by one.
- To select a contiguous sequence of rows, click the first one, then press and hold down Shift, then click the last row.
- To select all rows, click Select all rows in the Term section of the ribbon, or press Ctrl+Shift+A. (Ctrl+A selects the source text or the target text in the current row.)
The candidate list may contain many irrelevant phrases that should not be accepted as terms. Some candidates may be synonyms (different phrases with the same meaning) or parts of other candidates, and from a group of synonyms you might accept only one as a term. This means you need to clean the list before you can use it.
When you clean the list, you can accept, drop (discard), hide, and merge candidates. Merging candidates means that you treat a group of candidates as a single terminology entry where the source-language term can take multiple forms.
Only accepted candidates will be copied into the final term base. Candidates you hide and don't accept will not be copied.
- Filtering: You can filter the list using the Filter text box. If you type a phrase in the text box, memoQ shows only those candidates that contain the phrase. When you click the text box, the previous filter phrases appear. If you check the Only with TB result check box, the list will contain only those candidates that have one or more hits in the term bases used in the term extraction session. You can also press Ctrl+Shift+F to access the Filter text box.
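The two filter conditions combine as "contains the phrase" and, optionally, "has at least one term base hit". The Python sketch below shows that combination. It is an illustration, not memoQ's code: the `Entry` class, its `tb_hits` field, and `filter_candidates` are hypothetical, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    source: str
    tb_hits: int = 0  # hypothetical: number of hits in the session's term bases


def filter_candidates(entries: list[Entry], phrase: str = "",
                      only_with_tb_result: bool = False) -> list[Entry]:
    """Keep entries containing `phrase`; optionally require a term base hit."""
    needle = phrase.casefold()  # assumption: matching ignores case
    return [
        e for e in entries
        if needle in e.source.casefold()
        and (not only_with_tb_result or e.tb_hits > 0)
    ]


entries = [Entry("term base", 2), Entry("term extraction"), Entry("stop word", 1)]
print([e.source for e in filter_candidates(entries, "term")])
# ['term base', 'term extraction']
print([e.source for e in filter_candidates(entries, "term", only_with_tb_result=True)])
# ['term base']
```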
The Term Extraction ribbon offers commands to clean the term candidate list. All commands have a shortcut key to speed up your work.
- Drop Term: Marks the current candidate or the selected candidates as Dropped. Dropped candidates remain on the list, but when you sort the list again using the Re-sort now command, they are sorted to the end of the list. Shortcut key: Ctrl+D.
- Accept as term: Marks the current candidate or the selected candidates as Accepted. Accepted candidates will be copied to the final term base. When you sort the list using the Re-sort now command, accepted candidates are sorted to the beginning of the list. Shortcut key: Ctrl+Enter.
- Hide/unhide shorter: Hides those candidates where the source term is a part of the current candidate. Hidden candidates are sorted to the end of the list. If the shorter candidates are hidden, this command uncovers them (it works like a toggle). Shortcut key: Ctrl+L.
- Merge candidates: If two or more candidates are selected, this command merges them into a single candidate (a single row in the list). The new row shows the first selected candidate as the main term, and lists the other candidates after the word Also. This command is similar to the Join segments command in the translation grid, so its shortcut key is Ctrl+J.
- Unmerge: If the current candidate is a merged one (that is, it was created from multiple original candidates), this command unmerges it: the original candidates become separate candidates again. This command is similar to the Split segment command in the translation grid, so its shortcut key is Ctrl+T.
- Prefix merge and hide: This command looks for candidates in the list that have the same prefix as the current one. If memoQ finds two or more candidates with the same prefix, it automatically merges them. Normally, a source term must include a prefix marker – the pipe | character – to run this command (example: system|s). However, if there is no prefix marker in the source term, memoQ displays the No prefix markers in term window to ask for confirmation. If that is confirmed, the entire source term in the current candidate is used as a prefix. Shortcut key: Ctrl+M.
- Add as stop word: Displays a menu with three items:
- Add selection: Use this option when you selected a part of a candidate. The New stop word window opens and you can add the selected text part to the current stop word list.
- Add selected candidate(s): Use this option when you selected one or more candidates. The New stop word window opens and you can add the selected candidates to the current stop word list.
- Add all dropped: The Add all dropped as stop words window opens. Choose if you want to add all the candidates you marked as Dropped to the current stop word list, or to a new one. If you choose a new one, the Create new stop word list window opens.
If the current stop word list is read-only, you can only add stop words to a new stop word list; the other options are not available. This is also true if you did not choose a stop word list for this session.
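Merge candidates, Unmerge, and Prefix merge and hide all work on the same idea: one row that holds a main term plus alternatives listed after Also. The Python sketch below models that idea under stated assumptions. The names `Candidate`, `merge`, `unmerge`, `strip_marker`, and `prefix_merge` are hypothetical; prefix matching is assumed to be a plain "starts with" test on the source term with the | marker removed; the confirmation step that memoQ shows when there is no marker is left out; and placing the merged row first is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    source: str
    also: list[str] = field(default_factory=list)  # alternatives shown after "Also"


def merge(selected: list[Candidate]) -> Candidate:
    """Merge candidates: the first selected one becomes the main term."""
    main, *rest = selected
    also = list(main.also)
    for cand in rest:
        also.append(cand.source)
        also.extend(cand.also)
    return Candidate(main.source, also)


def unmerge(merged: Candidate) -> list[Candidate]:
    """Unmerge: each original form becomes a separate candidate again."""
    return [Candidate(merged.source)] + [Candidate(s) for s in merged.also]


def strip_marker(source: str) -> str:
    return source.replace("|", "")


def prefix_merge(current: Candidate, rows: list[Candidate]) -> list[Candidate]:
    """Merge all rows whose source starts with the prefix of `current`."""
    # Text before the first "|" is the prefix; with no marker, split() returns
    # the whole source term, which then serves as the prefix.
    prefix = current.source.split("|", 1)[0]
    matches = [r for r in rows if strip_marker(r.source).startswith(prefix)]
    if len(matches) < 2:
        return rows  # nothing to merge
    # Stable sort that moves the current candidate to the front (main term).
    matches.sort(key=lambda r: r is not current)
    merged = merge([Candidate(strip_marker(m.source), m.also) for m in matches])
    others = [r for r in rows if not any(r is m for m in matches)]
    return [merged] + others  # merged row placed first for simplicity


rows = [Candidate("system|s"), Candidate("system"), Candidate("network")]
result = prefix_merge(rows[0], rows)
print(result[0])           # Candidate(source='systems', also=['system'])
print(unmerge(result[0]))  # [Candidate(source='systems', also=[]), Candidate(source='system', also=[])]
```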
As you clean up the list, dropped, accepted, and unprocessed candidates may get mixed in the display. To tidy up the view, click the Re-sort now link, or press Ctrl+R: this will sort the list, so that the accepted candidates come first, the unprocessed candidates second, hidden candidates third, and dropped candidates last. The display jumps to the first unprocessed candidate. If you scroll up after re-sorting the list, you will find the accepted candidates.
You have four ways to fill in the target cells for the candidates:
- Typing it in: You can simply type one or more target-language equivalents (translations) in the target cell for a candidate. If you want to enter two or more translations, separate them with a semicolon (;).
- Dragging from the term base hit: If the term bases returned one or more hits for the current candidate, they are displayed in the lower-right corner of the candidate list editor tab. If there are multiple hits, they are listed on the left side of the Term base results panel. You can either click a hit to display its details, or you can move up and down in the list using the Ctrl+Up and Ctrl+Down key shortcuts. The selected entry is displayed in a formatted layout on the right side of the Term base results panel. You can select text in this entry, and drag it to the target cell.
- Dragging from the Occurrences list: In the lower-left corner, the candidate list editor displays the Occurrences panel, which shows the text where the source term was found during term extraction. If there are no term base hits, or they are not relevant, you can use the context of the term to determine its translation. This is a concordance view. If the source document contains translations, those are also displayed. You can drag selected text from the Occurrences panel to the target cell. If there are multiple occurrences, you can navigate between them using Ctrl+Up and Ctrl+Down.
Note: The Occurrences list starts with concordance hits from the translation memories used in the project. These items are displayed with a blue background. Items with a white background are the occurrences in the source documents.
Note: You can use the same key shortcuts to navigate among the term base hits and the occurrences. You can switch between the two using the Toggle between occurrences and term base results link, or the Ctrl+G key shortcut.
- Repeating the lookup in the term bases of the project: If you click the Look Up Terms Now button on the Term Extraction ribbon tab, memoQ displays the Look up terms now window, and searches through all term bases in the project for possible hits. You can change the term base ranking in the project, and then run Look Up Terms Now to get different results from the ones you got with the initial term extraction.
Normally, memoQ searches all the term bases in the project. If you want to search the highest-ranking term base only: In the Look up terms now window, click the Term base with the highest rank only radio button, and click OK.
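When you type two or more translations into a target cell, the semicolon acts as the separator. The small Python sketch below shows how such a cell breaks down into separate translations. The function name `split_translations` is hypothetical, and trimming surrounding spaces and ignoring empty parts are assumptions, not documented memoQ behavior.

```python
def split_translations(cell: str) -> list[str]:
    """Split a target cell into separate translations at each semicolon."""
    # Assumption: surrounding spaces are trimmed and empty parts are ignored.
    return [part.strip() for part in cell.split(";") if part.strip()]


print(split_translations("Termdatenbank; Terminologiedatenbank"))
# ['Termdatenbank', 'Terminologiedatenbank']
```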
To fill the candidates' example cells:
- Source example: You can type into this field. To copy the selected source segment from the Occurrences field: Click the Add source as example link or press Ctrl+S.
- Target example: You can type into this field. To copy the selected target segment from the Occurrences field: Click the Add target as example link or press Ctrl+T.
Some commands affect the entire term extraction session:
- Target language drop-down list: If the current project has two or more target languages, you can look up and enter translations in all target languages. Use this drop-down list to select the target language you want to work with. If there is only one target language in the project, this drop-down list is grayed out.
- Accepted items provide lookup results check box: If this is checked, the term extraction session works like an ordinary term base. When you work on a document in the translation grid, accepted terms from the term extraction session appear in the Translation results pane. The check box is checked by default.
- Export To TaaS icon: Displays the upload window to upload the accepted terms to your TaaS collection.
- Restart Session icon: Displays the Extract candidates window, and starts the session again. The candidate list is cleared, and a new one is created. You will lose all changes you made to the candidate list – use this command with care.
- Export To Term Base icon: Displays the Export accepted terms to term base window, and copies the accepted terms, their translations, and examples to a term base you choose.
- Export to Excel icon: Displays the Export accepted candidates window, a Save window where you choose where memoQ saves the accepted terms, their translations, and examples as an XLSX file.
When you finish
To switch to another tab: Press Ctrl+Tab (You can also click the other tab.)
To close the candidate list editor tab: Click the Close button or press Ctrl+F4.