Regular Expression

Rules on YOCTOL.AI come in forms such as NLU, keyword, and regular expression. A regular expression describes a pattern that text must follow. It is used to inspect, search, or replace text that matches the pattern. Common abbreviations for regular expression are Regexp, Regex, and RE.

Example

There are hundreds of ways to express the same meaning in any language, and different sentence structures can convey similar meanings. If we examine and categorize these sentences carefully, many of them turn out to share a small number of structures.

Take the following as an example: Large cup of coke, Medium cup of coke, Small cup of sprite and Medium cup of Fanta. We know that there are two basic features in these phrases: the size of the cup and the type of the drink.

If we put Large, Medium and Small into one group, and Coke, Sprite and Fanta into another, we have a basic rule for this kind of phrase.

Size: Large, Medium, Small

Kind: Coke, Sprite, Fanta

With regular expressions, we can reduce these four phrases to a single rule:

(large|medium|small) cup of (coke|sprite|fanta)
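As a rough illustration, the rule above can be tried out in Python's standard `re` module. This is only a sketch: the function name `parse_order` is hypothetical, and the case-insensitive flag is an added assumption so that "Large" and "large" are treated alike.

```python
import re

# Pattern copied from the rule above. IGNORECASE is an assumption added
# here so that capitalized input such as "Large" still matches.
DRINK_ORDER = re.compile(
    r"(large|medium|small) cup of (coke|sprite|fanta)", re.IGNORECASE
)

def parse_order(text):
    """Return (size, kind) in lowercase if the whole text is a drink order, else None."""
    match = DRINK_ORDER.fullmatch(text.strip())
    if match is None:
        return None
    size, kind = match.groups()
    return size.lower(), kind.lower()

for phrase in ["Large cup of coke", "Medium cup of Fanta", "Huge cup of coke"]:
    print(phrase, "->", parse_order(phrase))
```

The two parenthesized groups correspond to the Size and Kind groups, so a match also tells us which size and which drink were requested.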

Common circumstances for regular expressions

Regular expressions are often used for emails, phone numbers, birthdays, or any other text with a fixed structure.

Phone number: ^(\+1)?\d{10}$

Social Security Number: ^\d{3}-\d{2}-\d{4}$

Email (gmail for example): ^.+@gmail\.com$

Birthday: ^\d{4}-\d{2}-\d{2}$
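The patterns above could be checked in Python along the following lines. This is a minimal sketch: the dictionary keys and the helper `is_valid` are hypothetical names, the phone pattern is written with `\d` for digits, and the dot in `gmail\.com` is escaped so that it matches a literal dot.

```python
import re

# Hypothetical labels mapped to the patterns discussed above.
PATTERNS = {
    "phone": r"^(\+1)?\d{10}$",        # optional +1, then exactly ten digits
    "ssn": r"^\d{3}-\d{2}-\d{4}$",     # 3 digits, 2 digits, 4 digits
    "gmail": r"^.+@gmail\.com$",       # at least one character before @gmail.com
    "birthday": r"^\d{4}-\d{2}-\d{2}$",  # YYYY-MM-DD shape
}

def is_valid(kind, text):
    """True if the text has the shape described by the named pattern."""
    return re.search(PATTERNS[kind], text) is not None

samples = [
    ("phone", "+14155550123"),
    ("phone", "415-555-0123"),
    ("ssn", "123-45-6789"),
    ("gmail", "someone@gmail.com"),
    ("birthday", "1990-01-31"),
    ("birthday", "1990-1-31"),
]
for kind, text in samples:
    print(kind, repr(text), is_valid(kind, text))
```

These patterns only check the shape of the text. A string such as 9999-99-99 still passes the birthday pattern, so a real date check would need additional validation.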

When to use regular expressions

When building a chatbot, we often want certain word combinations in a customer's message to map to a specific intent, which in turn triggers a response in the dialogue. These combinations often follow a recognizable pattern.

This is where regular expressions come in handy. When encountering situations like this, you should sort the customers’ messages and use regular expressions to categorize, exclude, or set limitations to the messages. We will demonstrate an actual use case later.
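To make the idea concrete, here is a small sketch of routing messages to intents with regular expressions. It is not how YOCTOL.AI evaluates rules internally; the intent names, the greeting pattern, and the function `detect_intent` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical intent names paired with patterns. The first rule whose
# pattern is found in the message wins.
INTENT_RULES = [
    ("order_drink", re.compile(
        r"(large|medium|small) cup of (coke|sprite|fanta)", re.IGNORECASE)),
    ("greeting", re.compile(r"^(hi|hello|hey)\b", re.IGNORECASE)),
]

def detect_intent(message):
    """Return the name of the first matching intent, or None if nothing matches."""
    for intent, pattern in INTENT_RULES:
        if pattern.search(message):
            return intent
    return None

for message in ["I'd like a large cup of coke please", "Hello there", "history"]:
    print(repr(message), "->", detect_intent(message))
```

Messages that match no rule return None, which a real bot might hand over to another rule type or to a fallback response.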

Symbols

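Basic rule: The quantifier symbols (*, + and ?) control how many times the character before them may occur. The other symbols below match particular characters or positions, or group parts of the pattern. We introduce a few common symbols here. For more symbols, there are many cheat sheets available online.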

. any character, e.g., . >>> a, b, c, d

* 0 or more, e.g., a* >>> Ø, a, aa, aaa

+ 1 or more, e.g., a+ >>> a, aa, aaa, aaaa

? 0 or 1, e.g., a? >>> Ø, a
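The behavior of `.`, `*`, `+` and `?` can be verified with a short Python check. This is a sketch using `re.fullmatch`, which requires the whole text to match; the helper name `fully_matches` is hypothetical, and the empty string stands in for Ø.

```python
import re

def fully_matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
checks = [
    (".", "a", True),     # any single character
    (".", "ab", False),   # but only one of them
    ("a*", "", True),     # zero occurrences are allowed
    ("a*", "aaa", True),
    ("a+", "", False),    # at least one occurrence is required
    ("a+", "aaa", True),
    ("a?", "", True),     # zero or one occurrence
    ("a?", "aa", False),
]

for pattern, text, expected in checks:
    result = fully_matches(pattern, text)
    print(f"{pattern!r} vs {text!r}: {result} (expected {expected})")
```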

^ start of a string: Regular Expression: /^hello/ Example: "hello!" -> Match, "hey hello" -> No Match

$ end of a string: Regular Expression: /.+eat$/ Example: "I want to eat" -> Match, "I want to eat food" -> No Match

| or e.g., a|b >>> a or b

() groups part of a pattern and limits the scope of |, e.g., (apple|banana) >>> apple, banana

[ ] any element within it, e.g., [abc] >>> any of a or b or c.

{ } number of repetitions of the item before it, e.g., .{2} (any two characters) >>> ab, bc, cd, dw

\d any digit, e.g., \d{3} >>> 123, 456, 789
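The anchors, alternation, character sets, repetition counts and `\d` can be exercised in the same way. The sketch below uses Python's `re` module; the helper `found` is a hypothetical name, and the `$` example is written as `.+eat$` because a leading `+` would have nothing to repeat.

```python
import re

def found(pattern, text):
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# ^ anchors the match to the start of the string.
print(found(r"^hello", "hello!"), found(r"^hello", "hey hello"))

# $ anchors the match to the end of the string.
print(found(r".+eat$", "I want to eat"), found(r".+eat$", "I want to eat food"))

# ( ) with | offers alternatives.
fruit_ok = re.fullmatch(r"(apple|banana)", "banana") is not None

# [ ] matches any one of the listed characters.
letters = re.findall(r"[abc]", "cabbage")

# { } fixes the repetition count; \d matches a single digit.
two_chars = re.fullmatch(r".{2}", "dw") is not None
digit_groups = re.findall(r"\d{3}", "123-456-7890")

print(fruit_ok, letters, two_chars, digit_groups)
```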

Example

Say for an online clothes shop, we want to train an intent that knows the different colors of the different types of clothes in stock.

Customers can combine colors and clothes types in many ways:

  • Red jeans, blue shirt, yellow shorts, green jacket, yellow jacket, green shorts

Colors and types can be combined in many ways, so listing every combination by hand is impractical. Instead, we can use a regular expression to capture all the possibilities.

Color: Red, Blue, Yellow, Green

Type: Jeans, Shirt, Shorts, Jacket

Regular expression:

(red|blue|yellow|green) (jeans|shirt|shorts|jacket)

However, users may add in words between or around these keywords, for example:

  • Red pink shirt, green colored shorts…

In this case, the regular expression would be:

(red|blue|yellow|green) (.* )?(jeans|shirt|shorts|jacket)

Going a step further, we may also want to capture how intense the color is. In that case, the regex would be (note the space between the intensity and the color):

(light|deep) (red|blue|yellow|green) (.* )?(jeans|shirt|shorts|jacket)
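As a sketch of how the captured parts might be read back, the Python example below gives each group a name. The group names and the function `describe` are hypothetical, the case-insensitive flag is an assumption, and a space is placed between the intensity and the color so that "light blue" matches.

```python
import re

# Named groups (hypothetical names) make the captured parts easier to read.
CLOTHES = re.compile(
    r"(?P<intensity>light|deep) "
    r"(?P<color>red|blue|yellow|green) "
    r"(.* )?"
    r"(?P<type>jeans|shirt|shorts|jacket)",
    re.IGNORECASE,
)

def describe(message):
    """Return the intensity, color and type found in the message, or None."""
    match = CLOTHES.search(message)
    if match is None:
        return None
    return {name: value.lower() for name, value in match.groupdict().items()}

print(describe("Do you have a light blue cotton shirt?"))
print(describe("Deep red jacket"))
print(describe("blue shirt"))
```

With this pattern the intensity word is required, so "blue shirt" alone does not match. Wrapping the first part as `((light|deep) )?` would make it optional.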

If you want to rule out sentences that ask for a certain style of something, the regex would be:

^I (need|want) the .* style$
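Because this pattern is anchored with `^` and `$`, it only matches messages that have exactly this shape from start to end. The Python sketch below shows the effect and uses the pattern to filter a list; the names `is_style_request` and `remaining` are hypothetical.

```python
import re

# Pattern from the text. It is case-sensitive as written.
STYLE_REQUEST = re.compile(r"^I (need|want) the .* style$")

def is_style_request(message):
    """True if the whole message has the form 'I need/want the ... style'."""
    return STYLE_REQUEST.search(message) is not None

messages = [
    "I want the vintage style",
    "I need the 90s street style",
    "I want the vintage style shirt",   # extra word after "style"
    "Do I need the casual style",       # does not start with "I"
]

# Keep only the messages that are NOT style requests.
remaining = [m for m in messages if not is_style_request(m)]
print(remaining)
```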

Learning Resources

When designing regular expressions, along with knowing what the symbols mean, you also need to check if the expression you made actually captures what you want. The following websites can assist you:

  1. Regex testing: https://regex101.com/

  2. Regex testing: https://regexr.com