JavaScript Regular Expressions

Someone on Stack Overflow gave me a link to read after solving my problem of displaying the reply from my Python script. It’s about regular expressions. Here’s a summary, plus some notes to self, on what I’ve learned from that page.

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, replace, search, and split methods of String.

A regular expression can be constructed in one of two ways:

  1. Regular expression literal:
    var re = /ab+c/;
    -compiled when the script is loaded, so it performs better when the pattern stays constant
  2. Constructor function of the RegExp object:
    var re = new RegExp("ab+c");
    -compiled at run time; use it when the pattern will change or comes from another source (e.g. user input)
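
A quick sketch to try the two forms side by side. The userInput value and the escapeRegExp helper are made-up names for illustration, not something from the page; the helper is only one common way to make outside text safe to use as a pattern.

```javascript
// Literal form: the pattern is fixed in the source code.
const literal = /ab+c/;

// Constructor form: the pattern is built from a string at run time.
const built = new RegExp("ab+c");

// Both describe the same pattern.
console.log(literal.source === built.source); // true
console.log(literal.test("xabbbcx"));          // true
console.log(built.test("ac"));                 // false (needs at least one "b")

// Hypothetical helper: escape special characters so that text coming
// from another source (e.g. user input) is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1"; // assumed example input
const fromInput = new RegExp(escapeRegExp(userInput));
console.log(fromInput.test("1+1=2")); // true
console.log(fromInput.test("11"));    // false ("+" is no longer a quantifier)
```

Without the escaping step, "1+1" would be read as the pattern "one or more 1s, then a 1" and would match "11".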

How to write a Regular Expression pattern?

The pattern is made up of simple characters or a combination of simple and special characters.

Using Simple Patterns:
– constructed of characters (like abcd) for which you want to find a direct match.
– e.g. /abc/ matches only when exactly the characters ‘abc’ occur in the string, together and in that order.
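
A small sketch of a simple pattern at work; the sample strings are just assumed test inputs.

```javascript
const simple = /abc/;

console.log(simple.test("Hi, do you know your abc's?")); // true
console.log(simple.test("Grab crab")); // false: "ab c" is broken up by a space
console.log(simple.test("acb"));       // false: right characters, wrong order
```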

Using Special Characters (like \ ^ $ * + ?):
– when the search requires something more than a direct match (such as finding more than one occurrence, or finding whitespace or special characters)
– some of the common special characters that I might use:
backslash \
-precedes a non-special character to indicate that the next character is special and not to be interpreted literally. e.g. \b – the special word boundary character
-precedes a special character to indicate that the next character is not special and should be interpreted literally. e.g. \* matches a literal *
^
-matches beginning of input
-it has other uses too; please refer to the complete list on the page
$
-matches end of input
*
-matches preceding character 0 or more times.
+
-matches preceding character 1 or more times.
?
-matches preceding character 0 or 1 time.
Decimal point .
-matches any single character except the newline character. what??
(x)
-matches ‘x’ and remembers the match.
-() is the capturing parentheses.
-I have a string “foo bar foo bar”: with /(foo) (bar) \1 \2/, ‘(foo)’ and ‘(bar)’ match the first two words, while \1 and \2 match the last two words of the string. what??
(?:x)
-matches x but does not remember the match.
-() is the non-capturing parentheses
x(?=y)
-matches ‘x’ only if ‘x’ is followed by ‘y’ -lookahead
x(?!y)
-matches ‘x’ only if ‘x’ is NOT followed by ‘y’ -negated lookahead
x|y
-matches either ‘x’ or ‘y’
{n}
-matches exactly n occurrences of the preceding character
{n,m}
-matches at least n and at most m occurrences of the preceding character. m can be omitted: {n,} means at least n occurrences.
[xyz]
-character set. – this pattern type matches any one of the characters in the brackets.
-most special characters (such as . and *) are not special inside the character set.
-hyphen can be used to specify range.
[^xyz]
-negated/complemented character set.
-matches anything not enclosed in the brackets.
-hyphen can be used too.
[\b]
-matches a backspace
\b
-matches a word boundary (a position where a word character is not followed or preceded by another word character)
-when put in front of a character, it matches if there is no word character before it
-when put behind a character, it matches if there is no word character after it
\B
-matches non-word boundary
-matches a position where the previous and next character are of the same type
-beginning and end of a string are considered non-words
\d
-matches a digit character
\D
-matches non digit character
\s
-matches a single white space character including space, tab, etc
and blah blah…
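
To check my reading of the list above, here is a sketch that runs one small example per entry. The firstMatch helper is a made-up name for illustration; it just returns the matched text or null. The sample strings are assumed test inputs.

```javascript
// Hypothetical helper: text of the first match, or null if none.
const firstMatch = (re, text) => {
  const m = re.exec(text);
  return m === null ? null : m[0];
};

// ^ and $ anchor the match to the beginning / end of the input.
console.log(/^A/.test("An E"));  // true
console.log(/^A/.test("an A"));  // false
console.log(/t$/.test("eat"));   // true
console.log(/t$/.test("eater")); // false

// * + ? and {n,m} repeat the preceding character.
console.log(firstMatch(/bo*/, "A ghost booooed")); // "boooo"
console.log(firstMatch(/a+/, "caaandy"));          // "aaa"
console.log(firstMatch(/e?le?/, "angel"));         // "el"
console.log(firstMatch(/a{2}/, "candy"));          // null
console.log(firstMatch(/a{1,3}/, "caaaaandy"));    // "aaa"

// Backslash makes a special character literal.
console.log(firstMatch(/a\*/, "a* b")); // "a*"

// (?:x) groups without remembering: only the whole match is returned.
console.log(/(?:foo){1,2}/.exec("foofoofoo").length); // 1

// Lookahead and negated lookahead do not consume what they check.
console.log(firstMatch(/Jack(?=Sprat)/, "JackSprat")); // "Jack"
console.log(firstMatch(/Jack(?=Sprat)/, "JackFrost")); // null
console.log(firstMatch(/\d+(?!\.)/, "3.141"));         // "141"

// Alternation and character sets.
console.log(firstMatch(/green|red/, "red apple")); // "red"
console.log(firstMatch(/[a-d]/, "brisket"));       // "b"
console.log(firstMatch(/[^a-c]/, "brisket"));      // "r"

// \d, \D and \s character classes.
console.log(firstMatch(/\d/, "B2 is the suite number")); // "2"
console.log(firstMatch(/\D/, "42a"));                    // "a"
console.log(firstMatch(/\s\w*/, "foo bar"));             // " bar"
```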

Working with Regular Expressions

Regular expressions are used with the RegExp methods test and exec and with the String methods match, replace, search, and split. These methods are explained in detail in the JavaScript Reference.

Method    Description
exec      A RegExp method that executes a search for a match in a string. It returns an array of information, or null on a mismatch.
test      A RegExp method that tests for a match in a string. It returns true or false.
match     A String method that executes a search for a match in a string. It returns an array of information, or null on a mismatch.
search    A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.
replace   A String method that executes a search for a match in a string, and replaces the matched substring with a replacement substring.
split     A String method that uses a regular expression or a fixed string to break a string into an array of substrings.
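
Here is a short sketch that calls each of the six methods once on the same string, to see what they return. The sample text is an assumed input, loosely modelled on the replies from my script.

```javascript
const text = "Count=12, Count=7"; // assumed sample input

// exec: array with the whole match, the captured groups, and an index.
const viaExec = /Count=(\d+)/.exec(text);
console.log(viaExec[0], viaExec[1], viaExec.index); // Count=12 12 0

// test: plain true or false.
console.log(/Sensor/.test(text)); // false

// match: array of matches (all of them with the g flag), or null.
console.log(text.match(/\d+/g));   // [ '12', '7' ]
console.log(text.match(/Sensor/)); // null

// search: index of the first match, or -1.
console.log(text.search(/\d/)); // 6
console.log(text.search(/x/));  // -1

// replace: new string with the matched part swapped out.
console.log(text.replace(/Count=/g, "")); // 12, 7

// split: array of the pieces between the separators.
console.log(text.split(/,\s*/)); // [ 'Count=12', 'Count=7' ]
```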

How I use it in my program

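// fragment from my script: continues an earlier "if" branch
else if (/^Count=/.test(reply)) {
    // reply starts with "Count=": show only the value
    $('#status_table tr #' + eachStatus).empty().append(reply.replace(/Count=/, ''));
} else if (/^Sensor\s*=/.test(reply)) {
    // reply starts with "Sensor=" (optional whitespace around "=")
    $('#status_table tr #' + eachStatus).empty().append(reply.replace(/Sensor\s*=\s*/, ''));
}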

/^Count=/ matches replies that start with “Count=”, and /^Sensor\s*=/ matches replies that start with “Sensor=”, where \s* allows zero or more whitespace characters between “Sensor” and “=”.
.test(reply) tests the string for a match and returns true or false, which is why it can sit directly inside the “if” condition.
Syntax: regexp.test(str)
reply.replace(/Count=/, '') simply replaces the matched “Count=” with '' (an empty string), leaving only the value.
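
The same branching can be tried without the page or jQuery. This is only a sketch: extractValue is a made-up function name, and the sample replies are assumed inputs, not real output from my script.

```javascript
// Hypothetical stand-in for the branches above: returns the text that
// would be appended to the table cell, or null if no branch applies.
function extractValue(reply) {
  if (/^Count=/.test(reply)) {
    return reply.replace(/Count=/, "");
  } else if (/^Sensor\s*=/.test(reply)) {
    return reply.replace(/Sensor\s*=\s*/, "");
  }
  return null;
}

console.log(extractValue("Count=42"));    // "42"
console.log(extractValue("Sensor = on")); // "on"
console.log(extractValue("Sensor=off"));  // "off"
console.log(extractValue("My Count=42")); // null: "Count=" is not at the start
```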

UPDATE!

I’ve got a reply to my question! I’ll paste it down here. Also, there is another good reference source.

Parentheses (aka capture groups)

Parentheses are used to indicate a group of symbols in the regular expression that, when matched, are ‘remembered’ in the match result. Each matched group is labelled in numbered order, as \1, \2, and so on. In the example /(foo) (bar) \1 \2/ we remember the match foo as \1, and the match bar as \2. This means that the string “foo bar foo bar” matches the regular expression because the third and fourth terms (the \1 and \2) are matching the first and second capture groups (i.e. (foo) and (bar)). You can use capture groups in JavaScript like this:

/id:(\d+)/.exec("the item has id:57") // => ["id:57", "57"]

Note that in the return we get the whole match, and the subsequent groups that were captured.
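
To convince myself about \1 and \2, here is a small sketch with the same pattern. The second test string and the name-swapping example are assumed inputs added for illustration.

```javascript
const repeated = /(foo) (bar) \1 \2/;

// \1 must repeat whatever group 1 matched, \2 whatever group 2 matched.
console.log(repeated.test("foo bar foo bar")); // true
console.log(repeated.test("foo bar bar foo")); // false

// exec returns the whole match first, then each captured group.
const m = repeated.exec("foo bar foo bar");
console.log(m[0]); // "foo bar foo bar"
console.log(m[1]); // "foo"
console.log(m[2]); // "bar"

// In a replacement string the groups are written $1, $2, ...
console.log("John Smith".replace(/(\w+)\s(\w+)/, "$2, $1")); // "Smith, John"
```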

Decimal point (aka wildcard)

A decimal point is used to represent a single character that can have any value. This means that the regular expression /.n/ will match any two character string where the second character is an ‘n’. So /.n/.test("on") // => true, /.n/.test("an") // => true but /.n/.test("or") // => false. DrC brings up a good point in the comments that this won’t match a newline character. That holds whether or not multiline mode is on, because the m flag only changes how ^ and $ behave, so it only becomes an issue when the input string itself contains line breaks.
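
A sketch of the wildcard and the newline case. The twoLines string is an assumed input; the s (dotAll) flag used at the end is a newer addition to JavaScript (ES2018), so older browsers may not accept it.

```javascript
console.log(/.n/.test("on")); // true
console.log(/.n/.test("an")); // true
console.log(/.n/.test("or")); // false

const twoLines = "a\nb"; // assumed input containing a line break

console.log(/a.b/.test(twoLines));      // false: . skips the newline
console.log(/a.b/m.test(twoLines));     // false: m only changes ^ and $
console.log(/a.b/s.test(twoLines));     // true: s lets . match newlines
console.log(/a[\s\S]b/.test(twoLines)); // true: workaround without the s flag
```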

Word boundaries

A word boundary matches the position between a word character and a non-word character, that is, directly before or directly after a word; the start and end of the string count as well. In JavaScript the word characters are any alphanumeric character and the underscore (per MDN); non-word is everything else. The trick for word boundaries is that they are zero-width assertions, which means they don’t count as a character. That’s why /\w\b\w/ will never match: you can never have a word boundary between two word characters.
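
A short sketch of word boundaries in practice. The catWord pattern and the sample strings are made up for illustration.

```javascript
// Never matches: there is no boundary between two word characters.
console.log(/\w\b\w/.test("any text at all")); // false

const catWord = /\bcat\b/; // "cat" as a whole word

console.log(catWord.test("the cat sat")); // true
console.log(catWord.test("concatenate")); // false: letters on both sides
console.log(catWord.test("cat"));         // true: string start and end count

// Zero width: the boundaries add nothing to the matched text.
console.log(catWord.exec("the cat sat")[0].length); // 3

console.log(/oon\b/.test("moon")); // true: nothing follows the final "n"
console.log(/oo\b/.test("moon"));  // false: "oo" is followed by "n"
```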

Non-word boundaries

The opposite of a word boundary: instead of matching a point that goes from non-word to word, or word to non-word (i.e. the ends of a word), it matches points where the characters on both sides are of the same type. In the MDN example of /\B../ applied to “noonday”, the match starts at the first point in the string that lies between two characters of the same type and then takes the next two characters; here that point is between the first ‘n’ and ‘o’, and the next two characters are “oo”. In the second example, /y\B./ applied to “possibly yesterday”, we are looking for the character ‘y’ followed by a character of matching type (so a word character), and the ‘.’ will match that second character. So it won’t match on the ‘y’ at the end of “possibly”, because the next character is a space, which is a non-word character. It will match the ‘y’ at the beginning of “yesterday”, because it’s followed by a word character, which is then included in the match by the ‘.’ in the regular expression.
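
Both examples can be run directly. The strings “noonday” and “possibly yesterday” are the ones used in the MDN examples; the variable names are made up for illustration.

```javascript
// First point between two same-type characters, plus the next two characters.
const sameType = /\B../.exec("noonday");
console.log(sameType[0], sameType.index); // oo 1

// A "y" that is followed by another word character.
const yThenWord = /y\B./.exec("possibly yesterday");
console.log(yThenWord[0], yThenWord.index); // ye 9

// The "y" ending "possibly" is followed by a space, so it cannot match.
console.log(/y\B./.test("possibly")); // false
```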

Overall, regular expressions are popular in many languages and rest on a sound theoretical basis, so there’s a lot of material on these characters. In general, JavaScript regular expressions are very similar to Perl-compatible regular expressions (PCRE), but not exactly the same, so the majority of your questions about JavaScript regular expressions would be answered by any PCRE regex tutorial (of which there are many).
