Regular Expressions aka Regexes

String functions let us search for exact occurrences of strings, or string fragments, within other strings in order to program some reaction. Regular expressions allow much more sophisticated searching: they let us go beyond exact matches and instead look for patterns and react to those. Colloquially, a pattern is something that other things may resemble without necessarily looking exactly like it. Like most other programming languages, JavaScript has a set of functions for this.

Regular expressions are a prominent part of several essential programs of the UNIX operating system, the mother of all Linuxes, BSDs, OS X, etc. These operating systems offer command-line tools that use them. From there regular expressions spread into programming languages.

There are three uses for regular expressions: matching, which can also be used to extract information from a string; substituting new text for matching text; and splitting a string into an array of smaller chunks [Tatr13].
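The three uses can be sketched in a few lines. This is only an illustration; the sample string and the variable names are assumptions made for the sketch, and the methods used are the standard test(), match(), replace(), and split().

```javascript
'use strict';
const line = 'mælk=800, ost=123, rugbrød=14';   // hypothetical sample data

// 1. Matching: test for a pattern, or extract the text that matched.
const hasNumber = /\d+/.test(line);             // true
const firstNumber = line.match(/\d+/)[0];       // '800'

// 2. Substituting: replace every run of digits with a placeholder.
const masked = line.replace(/\d+/g, '#');       // 'mælk=#, ost=#, rugbrød=#'

// 3. Splitting: break the string at a comma followed by optional white space.
const parts = line.split(/,\s*/);               // [ 'mælk=800', 'ost=123', 'rugbrød=14' ]

console.log(hasNumber, firstNumber, masked, parts);
```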

A regular expression is a pattern delimited so that it is understood by the language functions catering for it. In a JavaScript regex literal the delimiter is the slash, /, as is conventional in many other languages.

What is Regex?

Well, it looks like this:

let regex0 = /cat/;                     // a literal regex object
                                        // or
let regex1 = new RegExp('cat');         // a constructed regex object

The above is a regular expression searching for the word cat. The first form is a regex literal: like a string literal, its pattern is fixed when the code is written. The second form calls the RegExp constructor with the pattern as a string, so the pattern can be assembled at run time; use that form when you need a variable pattern. The examples below use literals, together with two very common special characters in regular expressions.
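As a minimal sketch of the constructed form, the hypothetical helper below (containsWord is not part of the original text) builds its pattern from an argument. It assumes the word contains no characters with a special meaning in regular expressions.

```javascript
'use strict';
// Hypothetical helper: builds a regex from a word known only at run time.
// Assumes word contains no regex meta characters.
const containsWord = function (word, text) {
    const re = new RegExp(word, 'i');   // the i flag makes the search case-insensitive
    return re.test(text);
};

console.log(containsWord('cat', 'Avital named her Cat Toulouse.'));  // true
console.log(containsWord('dog', 'Avital named her cat Toulouse.'));  // false
```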

Example 13.4. Regex, First Example
'use strict';
let s = '';
let haystack = 'Avital named her cat Toulouse.';
let re0 = /cat/;
if (re0.test(haystack)) {               // returns true
    s = `I found the cat in "${haystack}"`;
} else {
    s = `No cat found in "${haystack}"`;
}
console.log(s);

haystack = 'The human cat they called him.';
re0 = /^cat/;
if (re0.test(haystack)) {               // returns false
    s = `I found the cat at the beginning of "${haystack}"`;
} else {
    s = `Not found at beginning of "${haystack}"`;
}
console.log(s);

haystack = 'She loves that cat';
re0 = /cat$/;
if (re0.test(haystack)) {               // returns true
    s = `I found the cat at the end of "${haystack}"`;
} else {
    s = `Not found at end of "${haystack}"`;
}
console.log(s);

The first test returns true because the string cat is found in the haystack. In the second test the caret (^) anchors the regular expression to the beginning of the haystack, and because there is no cat there, it returns false. In the third test the dollar sign ($) anchors the regular expression to the end of the haystack, where it finds its target and consequently returns true.
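The two anchors may also be combined, in which case the pattern must cover the whole haystack. A small sketch, with test strings chosen only for illustration:

```javascript
'use strict';
const exactlyCat = /^cat$/;             // cat, and nothing before or after it
const results = ['cat', 'cat food', 'wildcat'].map(function (s) {
    return exactlyCat.test(s);
});
console.log(results);                   // [ true, false, false ]
```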

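Regular expression syntax defines a series of meta characters, ie characters with a special meaning in regular expressions. They are:

. \ + * ? [ ^ ] $ ( ) { } |

In JavaScript the characters = ! < > : are special only directly after (? in a group construct, and the hyphen, -, only inside a character class. If you need to include a meta character in a search string, ie if you need to search for a $ for example, you must escape it with a backslash. In a regex literal the slash, /, must be escaped as well, because it is the delimiter. Escaping a character that does not need it, such as the colon below, is harmless as long as the u flag is not used. See these examples: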

Example 13.5. An Example Searching for a Meta Character
'use strict';
/love\?/.test('What is love? He asked.');   // returns true
/http\:\/\//.test('http://x15.dk');         // returns true

Arguably a bit more pedagogical, but equivalent:

Example 13.6. An Equivalent Example Searching for a Meta Character
'use strict';
let re0 = /love\?/;
re0.test('What is love? He asked.');    // returns true
let re1 = /http\:\/\//;
re1.test('http://x15.dk');              // returns true
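When the search text is not known in advance, the escaping can be done by a small function before the text is handed to the RegExp constructor. The helper name escapeRegExp is an assumption for this sketch, not a built-in of the language used here; it puts a backslash in front of every meta character.

```javascript
'use strict';
// Hypothetical helper: prefixes every meta character with a backslash.
// In the replacement string, $& stands for the matched character.
const escapeRegExp = function (text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
};

const needle = 'love?';
const re = new RegExp(escapeRegExp(needle));
console.log(re.source);                             // love\?
console.log(re.test('What is love? He asked.'));    // true
console.log(re.test('What is lov He asked.'));      // false
```

Without the escaping, the pattern love? would mean lov followed by an optional e, and the last test would return true.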

Character Classes in Regex

To search for any one of a group of characters, you may formulate a character class, enclosed in square brackets, with single characters, ranges of characters, or a combination of those.

Example 13.7. Searching With Character Classes
'use strict';
let re0 = /[abc123]/;
re0.test('b');                          // true

re0 = /[a-d1-5]/;
re0.test('d');                          // true

re0 = /[a-d1-5]/;
re0.test('9');                          // false

re0 = /[0-9a-zA-Z_]/;
re0.test('_$x');                        // true, the underscore matches

If you need to group characters according to type, regular expressions offer you predefined character classes for your typing convenience. Each represents a group of characters as opposed to one exact character.

\d
Represents any digit
\D
Represents anything but digits
\w
Represents any word character, ie letters, digits, or underscores
\W
Represents anything but word characters
\s
Represents any white space character, ie spaces, tabs, line feeds, carriage returns, or form feeds
\S
Represents anything but white space characters
.
Any character except newline

The following two expressions both search for a digit:

/[0-9]/
/\d/

It is legal to use the predefined character classes inside your self-defined character classes:

/[nm\d]/

Searches for an n, an m, or any digit.

Example 13.8. Searching With Character Classes Representing Types
'use strict';
let re0 = /\d[A-Z]/;
re0.test('3D');                         // true
re0.test('CD');                         // false

re0 = /\S\S\S/;
re0.test('6&c');                        // true
re0.test('6 c');                        // false

re0 = /Pl..../;
re0.test('Please');                     // true

Matching Multiple Characters

Sometimes we need to match a number of occurrences of, for example, digits, letters, or special characters. We may introduce multiplicity into regexes as follows:

*
From 0, zero, to many occurrences
+
From 1, one, to many occurrences
?
Exists? 0, zero, or one occurrence
{n}
Occurs exactly n times
{n,}
Occurs n or more times
{n,m}
Occurs between n and m times

Example 13.9. Searching With Multipliers
'use strict';
let re0 = /row,? your boat/;
re0.test('row, row, row your boat');    // true

row,? is the word row followed by zero or one comma.
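The brace forms of the multipliers can be sketched as follows. The patterns and test strings are illustrative assumptions; the anchors ^ and $ are used so that the whole string must fit the pattern.

```javascript
'use strict';
const isoDate = /^\d{4}-\d{2}-\d{2}$/;      // exactly 4, 2, and 2 digits
const dateOk = isoDate.test('2024-02-29');          // true
const dateBad = isoDate.test('24-2-29');            // false

const shortWord = /^\w{3,5}$/;              // between 3 and 5 word characters
const wordOk = shortWord.test('boat');              // true
const wordBad = shortWord.test('hovercraft');       // false

const doubleO = /o{2,}/;                    // two or more consecutive o's
const ooOk = doubleO.test('cuckoo');                // true
const ooBad = doubleO.test('boat');                 // false

console.log(dateOk, dateBad, wordOk, wordBad, ooOk, ooBad);
```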

Subpatterns and Reusability

Subpatterns are placed in parentheses, (). This allows a multiplicity designator to be applied to the whole subpattern. Reusing the example from above, the subpattern below is the word row, an optional comma, and a space, and it must occur at least once (+).

Example 13.10. Searching With Subpatterns
'use strict';
let re0 = /(row,? )+your boat/;         // the subpattern already ends in a space
re0.test('row, row, row your boat');    // true

Subpatterns may be reused without typing them again. The first subpattern in a regex is implicitly numbered 1, the second 2, etc. A subpattern is reused by writing \1, \2, and so forth, which matches the same text that the corresponding subpattern matched. The backslash is a slightly awkward choice of designator, because it has a side effect: in a regex literal \1 can be written as is, but in a pattern string passed to the RegExp constructor the backslash must be doubled, '\\1'. Otherwise the string parser treats \1 as an escape sequence before the regex engine ever sees it.
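A small sketch of a reused subpattern in JavaScript, written both as a regex literal and as a string for the RegExp constructor. The pattern, which looks for a word repeated after a space, and the test sentences are assumptions chosen for illustration.

```javascript
'use strict';
const doubled = /(\w+) \1/;                         // literal: \1 written as is
const doubledCtor = new RegExp('(\\w+) \\1');       // string: every backslash doubled

const hitLiteral = doubled.test('Paris in the the spring');     // true
const hitCtor = doubledCtor.test('Paris in the the spring');    // true
const miss = doubled.test('Paris in the spring');               // false
const sameSource = doubled.source === doubledCtor.source;       // true

console.log(hitLiteral, hitCtor, miss, sameSource);
```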

The following example, slightly adapted from Dow10, illustrates the reuse of subpatterns.

Example 13.11. Searching With Reused Subpatterns
'use strict';
let myPets = 'favoritePet=Blondie, Maverick=dog, Blondie=cat';
let rx = /favoritePet\=(\w+).*\1\=(\w+)/;
let matches = myPets.match(rx);
console.log(`My favorite pet is a ${matches[2]} called ${matches[1]}.`);
// Displays "My favorite pet is a cat called Blondie."

You find two subpatterns in the above example. The first matches the name in 'favoritePet=Blondie'; the second finds the species in 'Blondie=cat'. The '.*' matches any number of characters before \1 looks for Blondie again, ie the text matched by the first subpattern. The second subpattern matches the word characters after the last equals sign (=).
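The array returned by match() holds the complete match at index 0 and the subpatterns from index 1 onwards. As a sketch, the same search can unpack the array by destructuring; the variable names whole, petName, and species are assumptions for this illustration.

```javascript
'use strict';
const myPets = 'favoritePet=Blondie, Maverick=dog, Blondie=cat';
const rx = /favoritePet=(\w+).*\1=(\w+)/;

// Index 0 is the complete match, 1 and 2 are the two subpatterns.
const [whole, petName, species] = myPets.match(rx);
console.log(whole);     // favoritePet=Blondie, Maverick=dog, Blondie=cat
console.log(petName);   // Blondie
console.log(species);   // cat
```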

Example 13.12. Look for a Number in a String
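'use strict';
// Returns the first number found in s, or NaN if s contains none.
const numero = function (s) {
    let r = /(\d+)/;
    let a = s.match(r);                 // null when nothing matches
    return a === null ? NaN : Number(a[1]);
};

while (true) {
    let s = prompt('enter string with a number');
    if (s === null)                     // the user pressed Cancel
        break;
    let n = numero(s);
    console.log(n);
    if (n === 666)
        break;
}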

Example 13.13. Look for a Number in a String - 2
'use strict';
let s = 'mælk=800 bla bla ost=123 bla bla rugbrød=14';
let r = /(\d+)/;
let a = s.match(r);
console.log(a);


r = /(\d+)/g;
a = s.match(r);
console.log(a);

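The output array differs depending on the presence of the g flag. Without it, match() returns the first match followed by its subpatterns, here '800' and '800', together with the properties index and input. With g it returns all complete matches, here '800', '123', and '14', and no subpatterns.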
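The difference can be sketched as follows, together with matchAll(), which keeps the subpatterns even when the g flag is set. The pattern for the names uses [^\s=]+, ie anything but white space and equals signs, because \w does not cover letters such as æ and ø; that pattern and the variable names are assumptions for this sketch.

```javascript
'use strict';
const s = 'mælk=800 bla bla ost=123 bla bla rugbrød=14';

// Without g: first match, its subpattern, and the index property.
const first = s.match(/(\d+)/);
console.log(first[0], first[1], first.index);   // 800 800 5

// With g: all complete matches, no subpatterns.
const all = s.match(/(\d+)/g);
console.log(all);                               // [ '800', '123', '14' ]

// matchAll() requires g and yields one complete match array per hit.
const pairs = Array.from(s.matchAll(/([^\s=]+)=(\d+)/g), function (m) {
    return [m[1], Number(m[2])];
});
console.log(pairs);     // [ [ 'mælk', 800 ], [ 'ost', 123 ], [ 'rugbrød', 14 ] ]
```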


Anchors

^
Anchors the regex at the beginning of a string
$
Anchors the regex at the end of a string
\A, \z, \Z
Anchors found in PCRE-style engines such as PHP's; JavaScript does not support them, and without the u flag they merely match the letters A, z, and Z
\b
Anchors the regex at a word boundary.

Example 13.14. Searching With Word Boundaries
'use strict';
let re0 = /over/;
re0.test('My hovercraft is full of eels');      // true
let re1 = /\bover\b/;
re1.test('My hovercraft is full of eels');      // false
re1.test('One flew over the cuckoo’s nest');    // true

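\B
Anchors the regex at anything but a word boundary
\G
Anchors the regex at the starting offset of a search in PCRE-style engines; JavaScript does not support it, but the sticky flag, y, gives a similar effect by requiring the match to start at lastIndex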
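A sketch of \B, and of the sticky flag y, which is the nearest thing JavaScript offers to anchoring at a starting offset since it has no \G of its own. The test strings are assumptions for this illustration.

```javascript
'use strict';
const text = 'My hovercraft is full of eels';

// \B on both sides: over must sit inside a longer word.
const inside = /\Bover\B/;
const insideHits = [inside.test(text), inside.test('One flew over the nest')];
console.log(insideHits);                        // [ true, false ]

// Sticky flag y: the match must start exactly at lastIndex.
const sticky = /eels/y;
sticky.lastIndex = text.indexOf('eels');
const atOffset = sticky.test(text);             // true
sticky.lastIndex = 0;
const atStart = sticky.test(text);              // false, the text starts with My
console.log(atOffset, atStart);
```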

An Example Inspired by [Doy11]

Example 13.15. The Page regexDoyle.html
<!doctype html>
<html>
    <head>
        <meta charset='utf-8'/>
        <meta name='viewport' content='width=device-width, initial-scale=1.0'>
        <title class='title'></title>
        <style>
            form {
                width: 30em;
            }
            #url {
                width: 90%;
            }
        </style>
        <script src='regexDoyle.js'></script>
    </head>
    <body>
        <header><h1 class='title'></h1></header>
        <main>
            <h2>Enter a URL to scan:</h2>
            <form action='#' method='post'>
                <div>
                    <label for='url'>URL:</label>
                    <input type='url' id='url' placeholder='http://www.example.org/example.txt'/>
                    <p></p>
                    <label> </label>
                    <input type='button' id='btn' value='Find Links on that page'/>
                </div>
            </form>
            <p id='remote'></p>
        </main>
        <footer></footer>
    </body>
</html>


Example 13.16. The Code
'use strict';
const $ = function (foo) { return document.getElementById(foo); };
const ajaxobj = new XMLHttpRequest();

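const getFile = function (ajax, url, callback) {
    console.log(`nml: ${url}`);
    try {
        // Assign rather than add the handlers, so that repeated clicks
        // do not stack up one more listener per request.
        ajax.onload = callback;
        ajax.onerror = function () {            // network or cross-origin failure
            window.alert(`Request for ${url} failed.`);
        };
        ajax.open('get', url);
        ajax.send();
    } catch(err) {
        window.alert(`Request failed:\n${err.message}`);
    }
};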

const handler = function (e) {
    $('remote').innerHTML = '';
    let s = e.target.responseText;

    // Subpattern 1 holds the value of href. matchAll() keeps the
    // subpatterns with the g flag, which match() does not.
    let r = /<a\s+href=['"](.+?)['"].*?>/gi;
    let links = Array.from(s.matchAll(r), function (m) { return m[1]; });

    let di = document.createElement('div');
    let h2 = document.createElement('h2');
    // url is not in scope here, so read it from the input field.
    let h2t = document.createTextNode(`Linked URLs found at ${$('url').value}`);
    h2.appendChild(h2t);
    di.appendChild(h2);
    let ul = document.createElement('ul');
    for (let elm of links) {                    // empty array if nothing matched
        let li = document.createElement('li');
        let lit = document.createTextNode(elm);
        li.appendChild(lit);
        ul.appendChild(li);
    }
    di.appendChild(ul);
    $('remote').appendChild(di);
};

const getRemoteContent = function (e) {
    let url = $('url').value;
    e.preventDefault();
    getFile(ajaxobj, url, handler);
}

const showStarter = function () {
    const titles = document.getElementsByClassName('title');
    for (let title of titles)
        title.innerHTML = 'Get Linked URLs from Webpage';
    $('btn').addEventListener('click', getRemoteContent);
}

window.addEventListener("load", showStarter);                   // kick off JS
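The link extraction itself can be tried without a browser. The sketch below isolates it in a hypothetical function, extractLinks, which is not part of the page above; its pattern is an assumed, slightly more tolerant variant that accepts other attributes before href. A regular expression will not cope with every piece of HTML found in the wild, so this is an illustration rather than a robust parser.

```javascript
'use strict';
// Hypothetical helper: returns the href values of the <a> tags in html.
const extractLinks = function (html) {
    const r = /<a\s+[^>]*?href=['"](.+?)['"][^>]*>/gi;
    // Subpattern 1 of every match is the URL between the quotes.
    return Array.from(html.matchAll(r), function (m) { return m[1]; });
};

const sample = '<p><a href="http://x15.dk">one</a> and ' +
               "<A class='ext' HREF='https://www.example.org/a.txt'>two</A></p>";
const links = extractLinks(sample);
console.log(links);     // [ 'http://x15.dk', 'https://www.example.org/a.txt' ]
```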