Lemmatization is the process of transforming a word into its base form, or lemma. The lemma differs depending on the part of speech. For example:
- for verbs, it is the infinitive;
- for nouns, the singular form;
- for comparative and superlative adjectives, the positive degree.
Why lemmatization matters
Lemmatization is widely used by different tools and specialists, including:
- Search engines: connect user queries with relevant page content by reducing words to their base forms.
- SEO specialists: normalize all keyword forms on a page to ensure coverage of less common but important variations.
- Sociologists: analyze political speeches (e.g., a president’s address) by lemmatizing transcripts to measure tone and identify key recurring terms.
- Keyword research tools: group queries by lemma to avoid missing important variations when planning campaigns.
- PPC experts: lemmatize keywords before creating frequency dictionaries, making it easier to identify common concepts and match them to relevant landing pages.
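To make the frequency-dictionary idea above concrete, here is a minimal Python sketch. The `LEMMAS` pairs and the `lemma_frequencies` helper are hypothetical names made up for illustration; they are not part of !SEMTools or any real library.

```python
from collections import Counter

# Hypothetical word-to-lemma pairs, for illustration only.
LEMMAS = {"shoes": "shoe", "running": "run", "buying": "buy"}


def lemma_frequencies(queries, lemmas=LEMMAS):
    """Count how often each lemma occurs across a list of keyword phrases."""
    counts = Counter()
    for query in queries:
        for word in query.split():
            # Words missing from the dictionary are counted as they are.
            counts[lemmas.get(word, word)] += 1
    return counts


queries = ["running shoes", "buy shoe", "buying run shoes"]
frequencies = lemma_frequencies(queries)
print(frequencies.most_common())
```

Without lemmatization, "shoes" and "shoe" would be counted as two separate entries; after it, all three queries contribute to the same "shoe" count.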
Lemmatization made easy in Excel
For marketers without programming experience, using command-line tools or coding solutions for lemmatization is rarely practical. However, in keyword work — clustering, building frequency dictionaries, n-gram analysis, removing hidden duplicates — lemmatization can be invaluable.
That’s why I created a built-in lemmatizer in the !SEMTools add-in. With it, you can lemmatize keyword lists of any size directly in Excel in just a couple of clicks.
How the !SEMTools lemmatizer works
!SEMTools uses a precompiled dictionary of common word–lemma pairs, which I maintain and regularly update. Because the dictionary is large (about 10 MB), it isn’t built into the add-in itself. Instead, it’s downloaded from this site the first time you run the process, if it’s not already stored locally.
The tool is extremely fast — processing tens of thousands of lines per second — so you can handle large keyword lists without overloading your computer.
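For readers curious about the general idea, dictionary-based lemmatization can be sketched in a few lines of Python. This is only an illustration of the lookup approach, not the add-in's actual code: the `LEMMAS` pairs and the `lemmatize` function are hypothetical, and the real dictionary's file format is not described here.

```python
# Hypothetical word-to-lemma pairs; a real dictionary would be far larger.
LEMMAS = {
    "running": "run",
    "ran": "run",
    "shoes": "shoe",
    "better": "good",
    "best": "good",
}


def lemmatize(phrase, lemmas=LEMMAS):
    """Replace each space-separated word with its lemma when one is known."""
    # Words that are not in the dictionary are kept unchanged.
    return " ".join(lemmas.get(word, word) for word in phrase.split())


print(lemmatize("best running shoes"))  # prints "good run shoe"
```

Each word costs a single lookup in a hash table, which is one plausible reason a precompiled dictionary can process long keyword lists quickly.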
Limitations and recommendations
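- Convert all text to lowercase before running lemmatization.
- Punctuation is not ignored and is treated as part of the word, so a token such as "shoes," (with the comma attached) is handled as a different word from "shoes". Remove punctuation and other non-letter characters, keeping the spaces between words, before processing.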
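If your keyword list comes from a script rather than from Excel, the two preparation steps above could look roughly like this in Python. The `normalize` function is a hypothetical helper, and replacing non-letter characters with a space (rather than deleting them outright) is an assumption made here so that hyphenated words do not get glued together.

```python
import re

# Matches any character that is neither a letter nor whitespace:
# punctuation, digits, and underscores.
NON_LETTER = re.compile(r"[^\w\s]|[\d_]")


def normalize(text):
    """Lowercase the text and strip non-letter characters, keeping word gaps."""
    lowered = text.lower()
    letters_only = NON_LETTER.sub(" ", lowered)
    # Collapse the runs of spaces left behind by the substitution.
    return " ".join(letters_only.split())


print(normalize("Buy Running-Shoes, size 42!"))  # prints "buy running shoes size"
```

Note that this also drops digits, following the recommendation literally; if numbers matter in your keywords (model names, sizes), you may want to keep them.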
Need fast, accurate lemmatization without programming? Install !SEMTools and process massive keyword lists directly in Excel with just a couple of clicks!