Understanding word count differences in memoQ

If you've ever compared word counts from memoQ with those from Microsoft Word or other CAT tools, you've probably noticed that the numbers don't always match. Each tool has its own rules for what counts as a word and for handling segmentation, numbers, and tags, so counts can differ by several percent.


memoQ doesn't just count raw words. It considers how much of each segment is already translated or can be reused. This helps translators and project managers get a more accurate picture of how much work is actually needed.


Here’s what affects word count in memoQ and how to interpret it:

1. Content import and filtering

memoQ lets you define what to include or exclude during import. This affects your word count.

For example:

  • Using cascading filters to remove HTML code

    If you want to import an Excel file that contains HTML code, memoQ lets you use cascading filters to clean it up: first you apply the Excel filter, then you add the HTML filter. memoQ then removes the HTML code during import, so it isn't included in your word count analysis. Some other tools can't filter out this code and may count it as regular text, which inflates the word count.

  • Importing different formats

    Importing the same content in different formats (e.g., as DOCX or RTF) can produce different counts, because memoQ applies a different import filter to each format.

  • Excluding elements from import

    You can exclude elements such as XML attributes, footnotes, text boxes, or specific code by using the Regex text filter or the import filter settings. This can significantly reduce your word count, because only the content that actually needs translation is counted.
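To see why filtering changes the numbers, here is a minimal Python sketch (not memoQ's implementation) that counts whitespace-delimited words before and after removing markup. Python's standard `html.parser` stands in for an HTML filter cascaded after a spreadsheet filter, and the `[ID-1042]`-style code pattern passed to the regex exclusion is a hypothetical example.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps text nodes and drops tags, roughly like a cascaded HTML filter."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(cell: str) -> str:
    """Return only the human-readable text of an HTML fragment."""
    collector = _TextCollector()
    collector.feed(cell)
    collector.close()
    return " ".join(collector.parts)


def exclude_pattern(text: str, pattern: str) -> str:
    """Blank out every regex match so it is not counted (hypothetical exclusion rule)."""
    return re.sub(pattern, " ", text)


def count_words(text: str) -> int:
    """Count whitespace-delimited tokens."""
    return len(text.split())


cell = '<a href="https://example.com" title="Open page">Open</a>'
print(count_words(cell))              # 4: markup fragments counted as words
print(count_words(strip_html(cell)))  # 1: only "Open" remains

row = "Save [ID-1042] and exit"
print(count_words(row))                                  # 4
print(count_words(exclude_pattern(row, r"\[ID-\d+\]")))  # 3
```

The same cell yields four "words" when the markup is counted and one when it is filtered out, which is the kind of gap that appears when one tool strips code and another doesn't.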

2. Word-counting methods

memoQ offers two word-counting methods:

  • memoQ (default):

    • Counts any text between spaces as a word (like MS Word).

    • Numbers and hyphenated words are each counted as a single word.

    This method is closer to how modern tools count and is recommended for simplicity and consistency.

  • Trados: Mimics older Trados (6.5) behavior:

    • Hyphenated words are counted as multiple words (e.g., hydroxypropyl‑beta‑cyclodextrin = 3 words).

    • Numbers like 255,234 are ignored (not counted), but 25 cups counts as two words.

    This method is mainly useful for legacy workflows.

Switch to Trados‑style counting when your figures need to follow Trados rules, which treat hyphens, segmentation, and similar cases differently.
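The hyphen rule that separates the two methods can be sketched in Python. This is a simplified approximation, not memoQ's code: the Trados-style function only splits hyphenated words and deliberately does not model the number rules described above.

```python
import re

# Hyphen-minus and the non-breaking hyphen (U+2011) both split words in this sketch.
_HYPHENS = re.compile(r"[-\u2011]")


def count_memoq_style(text: str) -> int:
    """Any whitespace-delimited token is one word, so hyphenated words count once."""
    return len(text.split())


def count_trados_style(text: str) -> int:
    """Simplified Trados-style count: each hyphen-separated part is a separate word.

    Number handling (e.g. ignoring 255,234) is not modelled here.
    """
    total = 0
    for token in text.split():
        total += len([part for part in _HYPHENS.split(token) if part])
    return total


term = "hydroxypropyl-beta-cyclodextrin"
print(count_memoq_style(term))   # 1
print(count_trados_style(term))  # 3
```

On text full of compound terms, this one rule alone can push the Trados-style count noticeably above the memoQ count.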

3. Tags and formatting matter

memoQ detects inline tags and counts them separately from words. You can assign each tag a weight (e.g., 0.25 words per tag) so that tags contribute to the word count.
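A minimal Python sketch of how a per-tag weight feeds into the count. Purely for illustration, it assumes inline tags appear as angle-bracket markup such as `<b>`; memoQ itself displays tags as tag objects, and its exact rules may differ. The 0.25 default mirrors the example weight above.

```python
import re

# Hypothetical tag representation: any <...> sequence counts as one inline tag.
_TAG = re.compile(r"<[^>]+>")


def count_with_tags(segment: str, tag_weight: float = 0.25) -> float:
    """Words plus a fractional weight for every inline tag in the segment."""
    tags = _TAG.findall(segment)
    words = _TAG.sub(" ", segment).split()
    return len(words) + len(tags) * tag_weight


print(count_with_tags("Click <b>Save</b> now"))       # 3.5 (3 words + 2 tags x 0.25)
print(count_with_tags("Click <b>Save</b> now", 0.0))  # 3.0
```

Setting the weight to 0 reduces the result to a plain word count, which is one reason tag-heavy files can compare differently across tools.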

4. Fuzzy matches reduce weighted word counts

When you're working in the Statistics window, memoQ segments the text and uses your translation memory (TM) to categorize matches.

Instead of treating all words equally, memoQ applies a weight based on match quality.

Weighted word counts reflect actual effort, not just raw word numbers.

Match type                   Word count weight
No match (0–49%)             100%
Fuzzy (50–74%)               80%
Fuzzy (75–99%)               50%
Exact match (100%)           30%
Repetition                   20%
Context match (101–102%)     10%

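The table can be expressed as a lookup, sketched below in Python. The function name and the repetition flag are illustrative assumptions, the sketch assumes 95–99% matches share the 50% weight of the 75–94% band, and in memoQ the weights themselves are configurable.

```python
def match_weight(score: int, is_repetition: bool = False) -> float:
    """Map a segment's best TM match score (0-102) to its word-count weight.

    Weights follow the example table; they are configurable in memoQ.
    """
    if not 0 <= score <= 102:
        raise ValueError(f"match score out of range: {score}")
    if is_repetition:
        return 0.20  # repeated segment within the document
    if score >= 101:
        return 0.10  # context match (101-102%)
    if score == 100:
        return 0.30  # exact match
    if score >= 75:
        return 0.50  # fuzzy 75-99% (95-99% assumed to share this band)
    if score >= 50:
        return 0.80  # fuzzy 50-74%
    return 1.00      # no usable match (0-49%)


for s in (30, 60, 80, 97, 100, 101):
    print(s, match_weight(s))
```

Checking bands from the highest score downward keeps each threshold a single comparison.
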
  • What are fuzzy matches?

    When memoQ compares your content to translation memory (TM) segments, it scores similarity:

    • Exact match (100%) – identical segment.

    • Context match (101–102%) – identical, with the same surrounding context.

    • Fuzzy match (50–99%) – similar but not identical. A higher fuzzy percentage means a closer match, which means less work and a lower weighted word count.

  • How do fuzzy matches lower word counts?

    Instead of counting every matched word as a full new word, memoQ applies weights:

    100% matches count as only 30% of their words.

    75–99% fuzzy = 50% weight.

    50–74% fuzzy = 80% weight.

    Below 50% = 100% (full effort).

    For example:

    1,000 words of 100% matches count as 1,000 x 30% = 300 weighted words, not 1,000.

    or

    1,000 words, all with 75–94% fuzzy matches, count as 1,000 x 50% = 500 weighted words.

    or

    A 200-word document with 50 words of 100% matches, 50 words of 50–74% fuzzy matches, 50 words with no match, and 50 words of repetitions counts as:

    Words                    Weight    Weighted words
    50 words, 100% match       30%     50 x 0.3 = 15
    50 words, fuzzy            80%     50 x 0.8 = 40
    50 words, no match        100%     50 x 1   = 50
    50 words, repetitions      20%     50 x 0.2 = 10

    Total = 115 weighted words
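
The arithmetic in these examples is a weighted sum: the words in each category times that category's weight. A short Python sketch reproduces it; the category names are hypothetical labels, weights are kept as integer percentages so the results are exact, and the fuzzy portion of the 200-word example uses the 80% weight shown there.

```python
# Weights as integer percentages, taken from this article's example values.
WEIGHT_PCT = {
    "no_match": 100,    # 0-49%
    "fuzzy_50_74": 80,
    "fuzzy_75_99": 50,
    "exact_100": 30,
    "repetition": 20,
    "context_101": 10,
}


def weighted_words(breakdown: dict[str, int]) -> float:
    """Sum of words x weight over every match category in the breakdown."""
    return sum(words * WEIGHT_PCT[category] / 100 for category, words in breakdown.items())


print(weighted_words({"exact_100": 1000}))    # 300.0
print(weighted_words({"fuzzy_75_99": 1000}))  # 500.0
print(weighted_words({"exact_100": 50, "fuzzy_50_74": 50,
                      "no_match": 50, "repetition": 50}))  # 115.0
```
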

You can customize these weights in the Miscellaneous pane, on the Weighted counts tab.

Tips

  • Always check both raw and weighted word counts in your project statistics.

  • For consistency, make sure all team members use the same count settings.

  • If you compare with Word or other tools, remember that memoQ's weighted counts measure effort, not just quantity.

Why does this matter?

  • Accurate effort estimates: You know what really needs translating.

  • Fair billing: Clients don’t overpay for already translated content.

  • Smarter planning: Helps you assign the right resources and time.