Digital Taxonomy Information Center Support Center

Text Match Rules Syntax

Introduction
Items in a Codeit codeframe can have text matching rules attached to them.
For example, you might specify a rule that any verbatim containing the text "Coke" or "Pepsi" should be automatically mapped to "Code (2) - Colas".
You can allow the AI system to use these rules in the autocoding process. So, in the example above, the verbatim "I like Coke" would be automatically coded as code 2.
Text matching rules are also useful for filtering items. So, in the example above, filtering on the text match rules for "Code (2)" would display any items containing the text "Coke" or "Pepsi".
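To make the idea concrete, here is a minimal Python sketch of rule-based autocoding and filtering. It is not Codeit's implementation: the `CODEFRAME` dictionary, the helper names `autocode` and `filter_verbatims`, and the assumption that matching ignores case are all hypothetical, and Python's `re` module stands in for whatever regex engine Codeit actually uses.

```python
import re

# Hypothetical codeframe: code label -> text match rule (a regular expression).
CODEFRAME = {
    "Code (2) - Colas": r"Coke|Pepsi",
}

def autocode(verbatim: str) -> list[str]:
    """Return every code whose rule matches somewhere in the verbatim.

    Case-insensitive matching is an assumption, not confirmed Codeit behaviour.
    """
    return [
        code
        for code, rule in CODEFRAME.items()
        if re.search(rule, verbatim, re.IGNORECASE)
    ]

def filter_verbatims(verbatims: list[str], code: str) -> list[str]:
    """Keep only the verbatims that match the rule attached to `code`."""
    rule = CODEFRAME[code]
    return [v for v in verbatims if re.search(rule, v, re.IGNORECASE)]

print(autocode("I like Coke"))  # ['Code (2) - Colas']
print(filter_verbatims(["I like Coke", "Water is best", "Pepsi Max"], "Code (2) - Colas"))
# ['I like Coke', 'Pepsi Max']
```

The same rule serves both purposes: autocoding asks "which codes match this verbatim?", while filtering asks "which verbatims match this code?".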


Syntax
The Codeit text match rules use Regular Expressions as their syntax. Regular Expressions are a common technique for defining text matching patterns.
Although Regular Expressions are very powerful and can get quite elaborate, you can achieve a lot with very simple expressions.
Taking the example above, "Coke" or "Pepsi" can be defined by the regular expression "Coke|Pepsi", where the "|" symbol represents OR. In this example, we are searching for the string of characters "coke" OR "pepsi" anywhere in the verbatim text. To match only words that "begin" with "coke" or "pepsi", we would use:

\bcoke|\bpepsi - The "\b" syntax stands for a word "boundary": the point between a word character (a letter, digit or underscore) and anything else, including the start or end of the verbatim. Here it means no letter may come immediately before "coke" or "pepsi", so the word must begin with the letters "coke" OR "pepsi". This regular expression would match "coke", "cokes", "cokey" OR "pepsi", "pepsilicious". The string must begin with "coke" OR "pepsi", but any number of trailing characters are accepted.
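A quick way to see what "\b" changes is to compare the two patterns side by side. This Python sketch uses the standard `re` module as a stand-in for Codeit's matcher and assumes case-insensitive matching; neither detail is confirmed by this article, and `matches` is a hypothetical helper.

```python
import re

ANYWHERE = r"coke|pepsi"        # matches the letters anywhere, even mid-word
WORD_START = r"\bcoke|\bpepsi"  # the match must start at a word boundary

def matches(pattern: str, verbatim: str) -> bool:
    # re.search scans the whole verbatim; IGNORECASE is an assumption.
    return re.search(pattern, verbatim, re.IGNORECASE) is not None

for text in ["I drink cokes", "pepsilicious!", "dietcoke please"]:
    print(f"{text!r}: anywhere={matches(ANYWHERE, text)}, word_start={matches(WORD_START, text)}")
```

Only "dietcoke please" separates the two patterns: "coke|pepsi" finds "coke" inside the word, while "\bcoke" does not, because "t" and "c" are both letters and so there is no boundary between them.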

\bcoke\b|\bpepsi\b - This is a strict match for exactly the words "coke" OR "pepsi", with no letters attached before or after. There must be a boundary at the beginning and end of each string.
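The whole-word version can be checked the same way. As before, this is a sketch using Python's `re` module, with case-insensitive matching assumed and `strict_match` a hypothetical helper name.

```python
import re

STRICT = r"\bcoke\b|\bpepsi\b"  # whole words only

def strict_match(verbatim: str) -> bool:
    # Case-insensitive matching is an assumption about Codeit's behaviour.
    return re.search(STRICT, verbatim, re.IGNORECASE) is not None

print(strict_match("A Coke, please"))  # True: the comma counts as a boundary
print(strict_match("two cokes"))       # False: "s" follows "coke"
print(strict_match("pepsilicious"))    # False: "l" follows "pepsi"
```

Punctuation and spaces both count as boundaries, so a strict rule still matches "Coke," or "pepsi." at the end of a sentence.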

More Examples

^none$ - The entire verbatim is just "none". The "^" character represents the beginning of the entire verbatim and the "$" character represents the end of the entire verbatim.
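Anchoring with "^" and "$" means nothing may come before or after "none". The Python sketch below illustrates this, again assuming case-insensitive matching and using `re` as a stand-in for Codeit's engine; `is_just_none` is a hypothetical helper.

```python
import re

NONE_ONLY = r"^none$"  # ^ = start of the verbatim, $ = end of the verbatim

def is_just_none(verbatim: str) -> bool:
    # IGNORECASE is an assumption about how Codeit compares text.
    return re.search(NONE_ONLY, verbatim, re.IGNORECASE) is not None

print(is_just_none("None"))         # True
print(is_just_none("none really"))  # False: text follows "none"
print(is_just_none("I have none"))  # False: text precedes "none"
```

One Python-specific caveat: "$" also matches just before a final newline, so "none\n" counts as a match too. Other regex engines may treat that edge case differently.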

good.flavour - To represent "any" character in your string, use the "." character. This expression would match "good" followed by exactly one character of any kind, then "flavour". Examples: "good flavour", "good-flavour", "good#flavour".
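Because "." stands for exactly one character, the number of characters between "good" and "flavour" matters. This Python sketch shows that; `re` is used as a stand-in for Codeit's engine and `matches` is a hypothetical helper.

```python
import re

PATTERN = r"good.flavour"  # "." stands for exactly one character of any kind

def matches(text: str) -> bool:
    return re.search(PATTERN, text) is not None

for text in ["good flavour", "good-flavour", "good#flavour", "goodflavour", "good  flavour"]:
    print(f"{text!r}: {matches(text)}")
```

The first three match. "goodflavour" fails because there is no character between the words, and "good  flavour" (two spaces) fails because there are two. In Python, "." also does not match a newline unless the `re.DOTALL` flag is set.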

\blike.{1,10}\bflavour - This regular expression introduces the "near" concept. The "." character matches any single character, so by adding the range "{1,10}" we allow between 1 and 10 of "any character" to occur between our matches for "like" and "flavour". This expression would match "I like the flavour" and "I like the rich flavour": in each case "like" is followed by "flavour" within 10 characters (including spaces).
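The 10-character limit can be checked by counting the gap between the two words. The Python sketch below assumes case-insensitive matching and uses `re` as a stand-in for Codeit's engine; `like_near_flavour` is a hypothetical helper.

```python
import re

NEAR = r"\blike.{1,10}\bflavour"  # "like", then 1 to 10 characters, then "flavour"

def like_near_flavour(verbatim: str) -> bool:
    return re.search(NEAR, verbatim, re.IGNORECASE) is not None

print(like_near_flavour("I like the flavour"))              # True: gap " the " is 5 characters
print(like_near_flavour("I like the rich flavour"))         # True: gap " the rich " is 10 characters
print(like_near_flavour("I like the really rich flavour"))  # False: gap is 17 characters
```

The spaces count toward the limit, which is why " the rich " (10 characters) only just fits.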

(like|love|prefer).{1,20}(flavour|taste|smell) - Complex expressions can use parentheses to group alternatives. Here, we are searching for the strings "like", "love" or "prefer" followed within 20 characters by "flavour", "taste" or "smell".
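Each pair of parentheses groups one set of alternatives, and in Python each group also records which alternative matched. This sketch uses `re` as a stand-in for Codeit's engine and assumes case-insensitive matching. It also shows one pitfall: without "\b", "like" is found inside "dislike". The `STRICTER` pattern is one possible refinement using the "\b" boundary described earlier, not a rule taken from Codeit.

```python
import re

NEAR_ANY = r"(like|love|prefer).{1,20}(flavour|taste|smell)"
# Hypothetical refinement: require word boundaries before each group.
STRICTER = r"\b(like|love|prefer).{1,20}\b(flavour|taste|smell)"

def found(pattern: str, verbatim: str) -> bool:
    return re.search(pattern, verbatim, re.IGNORECASE) is not None

m = re.search(NEAR_ANY, "I love the taste", re.IGNORECASE)
if m:
    # group(1) and group(2) hold the alternatives that actually matched.
    print(m.group(1), m.group(2))  # love taste

print(found(NEAR_ANY, "I dislike the taste"))  # True: "like" inside "dislike"
print(found(STRICTER, "I dislike the taste"))  # False: no boundary before "like"
```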

\bflavo[ur] - To accept any one of several characters at a single position, enclose those characters in the "[" and "]" symbols. In this example, "[ur]" accepts either "u" or "r", so we will match "flavou" and "flavor". We are assuming that a verbatim matching "flavou" most likely contains flavour, flavours, flavouring, etc.
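A bracketed set stands for exactly one character, chosen from the set. This Python sketch (with `re` as a stand-in for Codeit's engine and `flavo_match` a hypothetical helper) shows which words pass.

```python
import re

FLAVO = r"\bflavo[ur]"  # "flavo" followed by exactly one "u" or "r"

def flavo_match(word: str) -> bool:
    return re.search(FLAVO, word) is not None

for word in ["flavour", "flavor", "flavourings", "flavonoid", "flavo"]:
    print(word, flavo_match(word))
```

"flavour", "flavor" and "flavourings" match. "flavonoid" fails because "n" is not in the set, and "flavo" fails because the set still needs one character after it.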

\brec[ei][ei]ve - To accommodate common misspellings, you might allow for the commonly confused characters in your expression. This example matches "receive", "recieve", "receeve" and "reciive", allowing for transposition of the "e" and "i" characters.
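Two sets of two characters allow 2 x 2 = 4 spellings. This Python sketch enumerates them with `itertools.product` and confirms each one matches; `re` stands in for Codeit's engine.

```python
import re
from itertools import product

RECEIVE = r"\brec[ei][ei]ve"

# Each [ei] allows "e" or "i", so two of them give four possible spellings.
spellings = ["rec" + a + b + "ve" for a, b in product("ei", repeat=2)]
print(spellings)  # ['receeve', 'receive', 'recieve', 'reciive']

print(all(re.search(RECEIVE, s) for s in spellings))  # True
print(re.search(RECEIVE, "recive") is not None)       # False: only one vowel
```

A spelling with only one vowel, such as "recive", is not covered, because each bracketed set must match exactly one character.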


Links
A Regular Expression "cheat sheet" is available here