Features of !SEMTools

Delete repeated (duplicate) words in Excel

Repeated words inside cells in Excel. First of repeated is red, 2nd and further are magenta

When analyzing text data, you’ll often need to remove repeated words. Sometimes duplicates come from automation errors, sometimes from manual input — either way, they need to go.

Before deleting, you can quickly verify that duplicate words are there after all: Find repeating words in cells. On big datasets (tens or hundreds of thousands of rows), this pre-check might save you time.

Remove duplicate words with a formula

This formula is designed to take a string of text from cell A1 and remove any duplicate words. The formula splits a sentence into words, removes duplicates, and then stitches the unique words back into a new sentence. But first, you need to remove punctuation.

Why Clean Punctuation First?
Punctuation can create false uniques. To a formula, “service.” and “service” are two different words. The Delete Punctuation tool in !SEMTools strips these characters first, ensuring that words are compared based on their actual letters, not the punctuation attached to them. This is a critical first step for reliable text cleaning.

=TEXTJOIN("",TRUE,IF(EXACT(LOWER(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)),UPPER(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)))+ISNUMBER(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)+0),MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1),""))

After the punctuation marks are gone, it’s time for deleting repetitive words:

=TEXTJOIN(" ";TRUE;UNIQUE(TEXTSPLIT(A1;" ");TRUE))

Formula explanation from inside out:

  1. TEXTSPLIT(A1;” “): This is the core of the formula. It takes the text in cell A1 and splits it into separate pieces (an array) wherever it finds a space character (” “).
    • Example: If A3 contains “apple banana apple cherry banana”, this function returns the array: {“apple”, “banana”, “apple”, “cherry”, “banana”}.
  2. UNIQUE( … ; TRUE): This function takes the array created by TEXTSPLIT and removes duplicate values.
    • The second argument TRUE tells Excel to compare the values exactly as they are (case-sensitive).
    • Continuing the example: It processes the array {“apple”, “banana”, “apple”, “cherry”, “banana”} and returns a new array with only the first occurrence of each unique word: {“apple”, “banana”, “cherry”}.
  3. TEXTJOIN(” “; TRUE; … ): Finally, this function takes the array of unique words and combines them back into a single text string.
    • The first argument ” ” is the delimiter; it specifies that a space should be placed between each joined word.
    • The second argument TRUE tells the function to ignore empty cells (which is good practice, even if there are none in this case).
    • Final result: It joins the array {“apple”, “banana”, “cherry”} to produce the final output: “apple banana cherry”.

Excel version support

This formula uses three modern dynamic array functions: TEXTSPLIT, UNIQUE, and TEXTJOIN.

These functions are not available in Excel 2019 or earlier perpetual (standalone) versions.

They are supported in:

  • Microsoft 365 (all subscribed versions, for Windows and Mac)
  • Excel for the Web (free version)

If you or someone you share the file with is using an older version of Excel (like Excel 2016, 2013, etc.), the formula will result in a #NAME? error.

Remove duplicate words in 2 clicks with !SEMTools

While the formula method works for exact matches, real-world text data — like customer reviews — is messy. The formula fails if the same word appears with different punctuation or in various forms (like “run,” “running,” and “ran”). To get clean data for accurate sentiment analysis or topic modeling, you need a more powerful approach.

With !SEMTools, you can achieve this in two clicks, with options for both simple and advanced deduplication. You’ll find it on the DELETE tab under Delete Words.

Remove Exact Duplicate Words

This is the simplest method to clean your data by removing words repeated identically. If you use latest Microsoft 365, you have a formula alternative for this task, as mentioned earlier.

It instantly deletes every repeated word that is spelled exactly the same way. It’s the perfect tool for fixing simple typos or cleaning up repetitive language, making your text concise for reports and summaries.

Remove Duplicates Using Word Forms (Lemmas)

This is the intelligent option for deeper analysis. It uses lemmatization, which finds the base form of a word (its “lemma”) before checking for duplicates. For example, in a sentence like “I ran to the bank, running late, the run was terrible,” it recognizes “ran,” “running,” and “run” as forms of the same word.

It keeps the first one it finds, removing the repetitive noise to reveal the true meaning of the text. This is essential for accurate sentiment analysis, as it stops common verbs from skewing your results and lets the important descriptive words stand out.

Ignore Punctuation When Checking for Duplicates

This option ensures that words are not treated as different just because of punctuation. For instance, it will correctly identify “service”, “service.” and “service!” as the same word. By ignoring commas, periods, and exclamation marks, this feature guarantees you are finding true duplicates, leading to a much cleaner and more reliable result.

Remove exactly matching duplicate words

All three of these powerful text-cleaning functions are available in !SEMTools for Excel, helping you get rid of complex formulas without worrying about your Excel version.

Remove duplicates across a column (keep first occurrence)

If you don’t yet know which words repeat most, first compute the most frequent words and build a list to act on. You can do that in the SEO & PPC tools group. If the words are short, a naive “replace with empty” might damage valid text; the safer route is to use “delete by list” tools that match whole words only. See the Delete Words section.

Need to remove all duplicates within a column while keeping only the first occurrence in each cell? There’s a one-click tool for that in the SEO & PPC group (works great for large keyword lists).

Remove duplicates across a range, gather uniques, count, and output a List

For semantic analysis you may need to deduplicate across the entire range, not just within each cell, then list unique words and their frequencies. Use the frequency dictionary tool in SEO & PPC tools to build a clean word frequency dictionary.

Tip: before running any of these procedures, it’s also a good idea to remove punctuation.

Need to remove duplicate words in Excel?
!SEMTools handles this (and hundreds of other text tasks) in just a couple of clicks.

This post is also available in RU.