Regular Expressions (regex) Basics

The following table provides some basics for using Regular Expressions within the ViDi GUI.

Anchors

^

Specifies the start of a string or a line. For example, ^0 matches strings that start with a 0.

Note: When the ^ character is the first character inside square brackets, it negates the set instead. For example, [^0] matches any single character that is not a 0.

$

Specifies the end of a string or a line. For example, 0$ matches strings that end with a 0.
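
As a rough illustration of the anchors, the sketch below uses Python's re module. That choice is an assumption made only for demonstration, since the regex engine behind the ViDi GUI may differ in some details, and the sample strings are hypothetical.

```python
import re

# Hypothetical sample strings, used only for illustration.
samples = ["0123", "1230", "0", "1203"]

# ^0 matches strings that start with a 0.
starts_with_zero = [s for s in samples if re.search(r"^0", s)]

# 0$ matches strings that end with a 0.
ends_with_zero = [s for s in samples if re.search(r"0$", s)]

# Inside brackets, ^ negates the set: [^0] matches any character that is not a 0.
has_non_zero = [s for s in samples if re.search(r"[^0]", s)]

print(starts_with_zero)  # ['0123', '0']
print(ends_with_zero)    # ['1230', '0']
print(has_non_zero)      # ['0123', '1230', '1203']
```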

Basic Matching

.

Matches any single character.

Note: To match a period, you must escape the dot by using a backslash. For example, you would enter: \.

\d

Matches any single digit from 0 through 9.

A specific digit can also be matched literally by typing that digit.

\w

Matches any single letter, digit or the underscore character (_).

A specific letter can also be matched literally by typing it (a-z or A-Z). Literal matches are case sensitive.

\s

Matches a single whitespace character, such as a space or a tab.
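
The basic matching symbols can be tried out with a short sketch. It assumes Python's re module purely for illustration (the ViDi GUI's own engine may behave slightly differently), and the sample text is hypothetical.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "img_01 v2.5"

any_char = re.findall(r"v.", text)     # "v" followed by any single character
literal_dot = re.findall(r"\.", text)  # an escaped dot matches only a period
digits = re.findall(r"\d", text)       # each individual digit
word_chars = re.findall(r"\w", text)   # letters, digits and the underscore
whitespace = re.findall(r"\s", text)   # each whitespace character

print(any_char)     # ['v2']
print(literal_dot)  # ['.']
print(digits)       # ['0', '1', '2', '5']
print(word_chars)   # ['i', 'm', 'g', '_', '0', '1', 'v', '2', '5']
print(whitespace)   # [' ']
```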

Specific Characters

[...]

The square brackets [ ] are used to match one character out of a set, which is defined within the square brackets. For example, [a-z] matches one lowercase letter from a to z, and T[ao]p matches either Tap or Top. The brackets can be used in compound structures, such as [A-C][0-3][g-i], which would match strings such as A0g, A0h, B1i, and C3g.
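
A minimal sketch of the bracket examples above, assuming Python's re module for illustration only. The candidate strings that should not match (Tip, D0g, A4g, A0j) are hypothetical additions that show where the sets stop.

```python
import re

# T[ao]p matches Tap or Top, but not Tip.
words = ["Tap", "Top", "Tip"]
tap_top = [w for w in words if re.fullmatch(r"T[ao]p", w)]

# [A-C][0-3][g-i] matches one character from each of the three ranges.
codes = ["A0g", "B1i", "C3g", "D0g", "A4g", "A0j"]
valid_codes = [c for c in codes if re.fullmatch(r"[A-C][0-3][g-i]", c)]

print(tap_top)      # ['Tap', 'Top']
print(valid_codes)  # ['A0g', 'B1i', 'C3g']
```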

Repetitions

{...}

The curly braces { } are used to denote repetitions of the preceding character or set. For example, t{2} matches two t's in a row; [def]{3} matches three characters in a row, each of which could be a d, e or f; and .{1,4} matches between one and four of any character.

*

Denotes zero or more instances of the preceding character or group. For example, a* would match zero or more a characters.

+

Denotes one or more instances of the preceding character or group. For example, [nop]+ would match one or more of the n, o, or p characters.
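
The repetition symbols can be compared side by side in a short sketch. Python's re module is assumed here for illustration only, and the sample words are hypothetical.

```python
import re

# t{2}: exactly two t's in a row.
two_ts = re.findall(r"t{2}", "letter")

# [def]{3}: three characters in a row, each one a d, e or f.
three_def = re.findall(r"[def]{3}", "feed defer")

# .{1,4}: between one and four of any character (whole string must fit).
lengths = ["", "ab", "abcd", "abcde"]
one_to_four = [s for s in lengths if re.fullmatch(r".{1,4}", s)]

# a*: zero or more a characters after the b.
zero_or_more = [s for s in ["b", "ba", "baaa", "bc"] if re.fullmatch(r"ba*", s)]

# [nop]+: one or more of the n, o or p characters.
one_or_more = re.findall(r"[nop]+", "noon stop")

print(two_ts)        # ['tt']
print(three_def)     # ['fee', 'def']
print(one_to_four)   # ['ab', 'abcd']
print(zero_or_more)  # ['b', 'ba', 'baaa']
print(one_or_more)   # ['noon', 'op']
```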

Optional Characters

?

The question mark ? is used to match either zero or one of the preceding character or group. For example, 12?3 will match either 123 or 13.

Note: To match a question mark, you must escape the question mark by using a backslash. For example, you would enter: \?
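
A small sketch of the optional character, again assuming Python's re module for illustration only. The extra candidates (1223 and 12) are hypothetical and show strings that 12?3 rejects.

```python
import re

# 12?3: the 2 is optional, so both 123 and 13 match.
candidates = ["123", "13", "1223", "12"]
optional_two = [s for s in candidates if re.fullmatch(r"12?3", s)]

# \?: an escaped question mark matches a literal question mark.
literal_question = re.findall(r"\?", "Is it 13?")

print(optional_two)      # ['123', '13']
print(literal_question)  # ['?']
```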

Grouping

(...)

The parentheses are used to define groups of characters, so that the sub-pattern within a pair of parentheses constitutes a group. This can be very useful in extracting information from image filenames. For example, if you had curated your images to use a certain naming convention, like Good_0001.png, and you wanted to only return those images, you could use ^(Good.+)\.png$

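You can also use the parentheses to capture nested groups. Using the example above, you could refine the search based on the digits, such as ^(Good_(\d+))\.png$, where the underscore is included so that the pattern still matches Good_0001.png and the inner group captures only the digits.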
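
The nested groups can be sketched as follows, assuming Python's re module for illustration only. The filenames are hypothetical, and the underscore is written into the pattern so that names following the Good_0001.png convention match.

```python
import re

# Hypothetical filenames, used only for illustration.
filenames = ["Good_0001.png", "Good_0042.png", "Bad_0003.png", "Good_0005.jpg"]

# Group 1 is the name without the extension; group 2 is the digits only.
pattern = re.compile(r"^(Good_(\d+))\.png$")

captured = []
for name in filenames:
    match = pattern.match(name)
    if match:
        captured.append((match.group(1), match.group(2)))

print(captured)  # [('Good_0001', '0001'), ('Good_0042', '0042')]
```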

The quantifiers described above can also be used within parentheses to capture patterns. For example, if you wanted to capture image dimensions in which both the width and the height are four-digit numbers (1000 through 9999), such as 1920x1080, you could use (\d{4})x(\d{4})
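
A minimal sketch of the (\d{4})x(\d{4}) pattern, assuming Python's re module for illustration only. The filenames are hypothetical.

```python
import re

pattern = re.compile(r"(\d{4})x(\d{4})")

# Both numbers have four digits, so each one is captured in its own group.
large = pattern.search("scan_1920x1080.png")
width, height = large.groups()

# Three-digit numbers do not satisfy \d{4}, so there is no match.
small = pattern.search("thumb_640x480.png")

print(width, height)  # 1920 1080
print(small)          # None
```

Because the pattern is not anchored, it can also match a four-digit run inside a longer number, so anchors or surrounding literal characters may be worth adding when that matters.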

The logical OR operator | can be used to denote different possible sets of characters. For example, if you wanted to match filenames that contain "scratch", "dent" or "hole", such as Bad0001scratch.png, you could use ^(Bad(\d+)(scratch|dent|hole))\.png$
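
The alternation can be sketched as follows, assuming Python's re module for illustration only. The filenames are hypothetical, and Bad0003stain.png is included to show a name that none of the three alternatives accepts.

```python
import re

# Hypothetical filenames, used only for illustration.
filenames = [
    "Bad0001scratch.png",
    "Bad0002dent.png",
    "Bad0003stain.png",
    "Bad0004hole.png",
]

# Group 3 holds whichever of the three alternatives matched.
pattern = re.compile(r"^(Bad(\d+)(scratch|dent|hole))\.png$")

defects = {}
for name in filenames:
    match = pattern.match(name)
    if match:
        defects[name] = match.group(3)

print(defects)
# {'Bad0001scratch.png': 'scratch', 'Bad0002dent.png': 'dent', 'Bad0004hole.png': 'hole'}
```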