Features of !SEMTools

Delete repeated (duplicate) words in Excel

When analyzing text data, you’ll often need to remove repeated words. Duplicates may come from automation errors or from manual input; either way, they need to go.

Remove Duplicates Across a Column (keep first occurrence)

If you don’t yet know which words repeat most, first compute the most frequent words and build a list to act on. You can do that in the SEO & PPC tools group. If the words are short, a naive “replace with empty” can damage valid text, because a short word also matches as a fragment inside longer words. The safer route is to use the “delete by list” tools, which match whole words only. See the Delete Words section.
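To illustrate why whole-word matching matters, here is a minimal Python sketch that contrasts a naive substring replacement with a whole-word deletion. The function names are hypothetical and the code only illustrates the idea; it is not how !SEMTools is implemented.

```python
import re

def delete_words_naive(text, words):
    # Naive approach: plain substring replacement, which also hits fragments
    # of longer words.
    for w in words:
        text = text.replace(w, "")
    return text

def delete_words_whole(text, words):
    # Whole-word approach: \b anchors keep "cat" from matching inside "catalog".
    if not words:
        return text
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse the extra spaces left behind by the removed words.
    return " ".join(cleaned.split())

print(repr(delete_words_naive("cat catalog for cat owners", ["cat"])))
print(repr(delete_words_whole("cat catalog for cat owners", ["cat"])))
```

The naive version turns “catalog” into “alog”, while the whole-word version leaves it intact and returns “catalog for owners”.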

Need to remove all duplicate words across a column, keeping only the first occurrence of each word? There’s a one-click tool for that in the SEO & PPC group, and it works well on large keyword lists.
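As a rough sketch of one plausible reading of column-level deduplication, the Python function below keeps a word only the first time it appears anywhere in the column. The function name and the case-insensitive comparison are assumptions made for illustration.

```python
def dedupe_words_across_column(cells):
    # `seen` is shared by all cells, so a word survives only at its
    # first appearance in the whole column.
    seen = set()
    result = []
    for cell in cells:
        kept = []
        for word in cell.split():
            key = word.casefold()  # assumption: compare case-insensitively
            if key not in seen:
                seen.add(key)
                kept.append(word)
        result.append(" ".join(kept))
    return result

column = ["buy red shoes", "red shoes online", "buy shoes cheap"]
print(dedupe_words_across_column(column))
# ['buy red shoes', 'online', 'cheap']
```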

Remove Duplicate Words Inside Each Cell

Before deleting, you can quickly verify that duplicates exist using the checker: Find repeating words in cells. On big datasets (tens or hundreds of thousands of rows), this pre-check runs noticeably faster than the removal itself.
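A check like this can be sketched in Python as a function that stops at the first repeat it finds. The name has_repeated_words is hypothetical, and the case-insensitive comparison is an assumption.

```python
def has_repeated_words(cell):
    seen = set()
    for word in cell.split():
        key = word.casefold()  # assumption: "Red" and "red" count as repeats
        if key in seen:
            return True  # stop at the first repeat, no need to scan further
        seen.add(key)
    return False

rows = ["buy shoes online", "cheap cheap flights", "Red red wine"]
print([has_repeated_words(r) for r in rows])
# [False, True, True]
```

Because the check only answers yes or no, it never has to rebuild the cell text.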

Excel doesn’t include a built-in function to remove duplicate words inside a cell. Conceptually the algorithm is simple: split a cell into words, scan left-to-right, and keep a word only the first time it appears.
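The algorithm described above can be sketched in a few lines of Python. This illustrates the idea only and is not !SEMTools’ own implementation; the function name and the ignore_case parameter are hypothetical.

```python
def remove_duplicate_words(cell, ignore_case=True):
    seen = set()
    kept = []
    # Scan left to right and keep a word only the first time it appears.
    for word in cell.split():
        key = word.casefold() if ignore_case else word
        if key not in seen:
            seen.add(key)
            kept.append(word)  # keep the original spelling of the first occurrence
    return " ".join(kept)

print(remove_duplicate_words("Buy shoes buy SHOES online"))
# Buy shoes online
print(remove_duplicate_words("Buy shoes buy SHOES online", ignore_case=False))
# Buy shoes buy SHOES online
```

Splitting on whitespace means punctuation stays attached to words, so “shoes,” and “shoes” would count as different words in this sketch.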

With !SEMTools you can do this in two clicks. The procedure removes duplicate words case-insensitively. You’ll find it on the DELETE tab under Delete Words. Here’s how it looks:

Figure: Remove exact duplicate words in all selected cells with !SEMTools

Remove Duplicates by Lemmas (word forms)

Sometimes you want “duplicate” to mean not just exact matches but also other forms of the same word. In that case, enable lemmatization so that words are compared by their lemmas (base forms). The tool then removes repeats by lemma, leaving the first form that occurs in each cell. Case is ignored by design.

Figure: Remove duplicate words considering word forms (by lemma) with !SEMTools
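Lemma-based deduplication can be sketched in Python by comparing lemmas instead of the words themselves. The tiny TOY_LEMMAS table below is a hypothetical stand-in for a real lemmatizer, which would rely on a full morphological dictionary; none of these names come from !SEMTools.

```python
# Hypothetical lookup table standing in for a real lemmatizer.
TOY_LEMMAS = {"shoes": "shoe", "buying": "buy", "bought": "buy"}

def toy_lemmatize(word):
    w = word.casefold()          # case is ignored
    return TOY_LEMMAS.get(w, w)  # unknown words are their own lemma

def remove_duplicates_by_lemma(cell, lemmatize=toy_lemmatize):
    seen = set()
    kept = []
    for word in cell.split():
        lemma = lemmatize(word)
        if lemma not in seen:
            seen.add(lemma)
            kept.append(word)    # keep the first form that occurred
    return " ".join(kept)

print(remove_duplicates_by_lemma("Buy shoe buying shoes bought online"))
# Buy shoe online
```

Passing the lemmatizer as a parameter keeps the deduplication loop unchanged if the toy table is swapped for a real lemmatization library.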

Remove Duplicates Across a Range, Gather Uniques, Count, and Output a List

For semantic analysis you may need to deduplicate across the entire range, not just within each cell, then list unique words and their frequencies. Use the frequency dictionary tool in SEO & PPC tools to build a clean word frequency dictionary.
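For readers who want to see what such a frequency dictionary amounts to, here is a minimal Python sketch using the standard library’s collections.Counter. The function name and the lowercasing step are assumptions made for illustration.

```python
from collections import Counter

def word_frequencies(cells):
    counts = Counter()
    for cell in cells:
        # Count every word in every cell of the range, ignoring case.
        counts.update(word.casefold() for word in cell.split())
    # most_common() returns (word, count) pairs, most frequent first.
    return counts.most_common()

cells = ["buy red shoes", "red shoes online", "buy shoes"]
for word, count in word_frequencies(cells):
    print(word, count)
```

Each unique word appears once in the output, next to the number of times it occurs across the whole range.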

Tip: before running any of these procedures, it’s a good idea to remove punctuation.
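The reason is that punctuation attached to a word makes it look different from its clean twin. A minimal Python sketch of such a cleanup step, with a hypothetical function name, might look like this:

```python
import re

def strip_punctuation(text):
    # Replace every character that is neither a word character nor
    # whitespace with a space, then collapse runs of spaces.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return " ".join(cleaned.split())

print(strip_punctuation("shoes, shoes! buy (shoes)."))
# shoes shoes buy shoes
```

This simple pattern also splits words on apostrophes and hyphens, so it may need adjusting for text where those matter.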

Need to remove duplicate words in Excel?
!SEMTools handles this (and hundreds of other text tasks) in just a couple of clicks.
