Features of !SEMTools

Extract a given list of words from cells in Excel

When working with large text datasets, there are times when you need to find words from a specific list in a column. But not just find them—if a cell contains any of those words, you may also want to display the words themselves. If your word list is small, you can use Excel’s Advanced Filter, but when it’s not, the task becomes difficult—especially when your source text contains hundreds or thousands of rows. The main challenges:

  • You can’t search words by simple substring matching—short words can be part of other words. For example, “eco” is inside “economy,” and “economy” is inside “economical.”
  • If you need to find all words from the list (not just the first match), you can’t remove extra content from a cell until the search is complete and all matches have been identified.
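
The first challenge can be illustrated outside Excel with a short Python sketch. It is not how !SEMTools works internally; the helper name `contains_word` is hypothetical and exists only to contrast substring search with whole-word matching.

```python
import re

def contains_word(text: str, word: str) -> bool:
    """Return True only if `word` occurs in `text` as a whole word (case-insensitive)."""
    # \b marks a word boundary, so "eco" will not match inside "economy".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

text = "the economy is growing"
print("eco" in text)                          # True: plain substring search gives a false positive
print(contains_word(text, "eco"))             # False: whole-word search does not
print(contains_word("an Eco hotel", "eco"))   # True: a real whole-word match
```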

Below is how you can solve these tasks with the !SEMTools Excel add-in.

Extract words from a specified list

This feature instantly removes all words from text except those found in the list you provide—effectively extracting the matching words. Similar tools include Find words from a list and Remove a list of words from text.

To preserve your original data, copy the column first and extract words from the copy. In the example below, several hundred words are extracted from more than 5,000 copied cells.

Here’s what happens: every word in a cell is checked against the list. Words found in the list are kept and all other words are removed, so a cell with no matches ends up empty. If several words in a cell are found in the list, all of them remain.
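
For readers who want to reproduce the same logic outside Excel, here is a minimal Python sketch of the behavior described above. It is an illustration under assumptions, not the add-in's actual code: the function name `extract_listed_words` is hypothetical, words are assumed to be separated by whitespace, and repeated matches are kept as they appear.

```python
def extract_listed_words(cell: str, word_list) -> str:
    """Keep only the words of `cell` that appear in `word_list` (case-insensitive)."""
    wanted = {w.casefold() for w in word_list}
    # Scan every word, so all matches are kept, not just the first one.
    kept = [token for token in cell.split() if token.casefold() in wanted]
    return " ".join(kept)

markers = ["uncle", "friend"]
cells = ["My dear uncle and my friend", "No markers here", "Friend of a FRIEND"]

for cell in cells:
    print(repr(extract_listed_words(cell, markers)))
# 'uncle friend'
# ''
# 'Friend FRIEND'
```

Because the original text is overwritten in this model too, the same advice applies: work on a copy of the column.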

Extracting a list of words from a dataset—example with 130 relationship markers in the novel “Eugene Onegin.”

For fully accurate results, first remove all punctuation from the cells. The tool is case-insensitive. Note again that it matches whole words only, however short they are. If your list contains multi-word phrases as well as single words, use the similar Extract phrases feature instead, because phrases of two or more words are skipped in word-extraction mode.
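
To see why punctuation matters, consider the sketch below. It assumes words are compared as whitespace-separated tokens, which is an assumption for illustration and not a statement about the add-in's internals; `strip_punctuation` is a hypothetical helper.

```python
import re

def strip_punctuation(text: str) -> str:
    """Replace punctuation with spaces and collapse repeated whitespace."""
    # [^\w\s] matches anything that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    return " ".join(cleaned.split())

raw = "My uncle, my friend!"
print(raw.split())                      # ['My', 'uncle,', 'my', 'friend!']
print(strip_punctuation(raw).split())   # ['My', 'uncle', 'my', 'friend']
```

With the punctuation still attached, the tokens "uncle," and "friend!" would not equal the list entries "uncle" and "friend", so both matches would be missed.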

Extract words from built-in marker lists

This group of macros lets you extract built-in marker lists such as rental terms, sales terms, review indicators, and more—new entity lists are added as the add-in develops. These are especially popular among SEO and paid search specialists. Below is an example of extracting commercial markers.

Figure: commercial markers in text.

Extract all unique words from the current list

For a detailed guide on how to analyze a list of phrases and gather all unique words from it, see my other article: N-gram analysis in Excel.
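
As a rough idea of what gathering unique words involves, here is a small Python sketch. It is not taken from the linked article; the function name `unique_words` is hypothetical, and it assumes whitespace-separated words compared without regard to case.

```python
def unique_words(phrases):
    """Return each distinct word once, in order of first appearance (case-insensitive)."""
    seen = {}  # a dict preserves insertion order, so first occurrences stay in order
    for phrase in phrases:
        for word in phrase.split():
            seen.setdefault(word.casefold(), None)
    return list(seen)

print(unique_words(["buy red shoes", "Red shoes sale"]))
# ['buy', 'red', 'shoes', 'sale']
```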
