Regular Expressions in JavaScript


Regular expressions let you perform flexible searches for words and phrases in text in order to delete, extract, or replace them.

Syntax:

//First option for creating a regular expression
var regexp = new RegExp(pattern, modifiers);
//Second option for creating a regular expression
var regexp = /pattern/modifiers;

pattern specifies the character pattern to search for.

modifiers customize the search behavior:

  • i - case-insensitive search;
  • g - global search (all matches in the string are found, not just the first);
  • m - multiline search (^ and $ also match at the start and end of each line).
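A minimal sketch of how each modifier changes the result of String.prototype.match(); the sample text is made up for illustration:

```javascript
// Sample text with two lines and mixed case (illustrative only)
const text = "Cat and cat\ncat again";

// Without g, match() returns only the first match (with its position)
const first = text.match(/cat/);
console.log(first[0], first.index); // cat 8

// g: all matches; i: ignore case
console.log(text.match(/cat/g));  // [ 'cat', 'cat' ]
console.log(text.match(/cat/gi)); // [ 'Cat', 'cat', 'cat' ]

// m: ^ matches at the start of every line, not only at the start of the text
console.log(text.match(/^cat/g));  // null
console.log(text.match(/^cat/gm)); // [ 'cat' ]
```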

Search for words and expressions

The simplest use of regular expressions is to search for words and expressions in various texts.

Here is an example of searching with different modifiers:

//Set the regular expression rv1 (no modifiers)
rv1 = /Russia/;
//Set the regular expression rv2 (global search)
rv2 = /Russia/g;
//Set the regular expression rv3 (global, case-insensitive search)
rv3 = /Russia/ig;
//Sample text:
//"Russia is the largest state in the world. Russia borders on 18 countries. RUSSIA is a successor state of the USSR."
//rv1 finds only the first "Russia".
//rv2 finds both occurrences of "Russia", but not "RUSSIA".
//rv3 finds "Russia", "Russia" and "RUSSIA".
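The same three expressions can be run directly; this sketch uses console.log instead of highlighting so the matches can be inspected:

```javascript
const text =
  "Russia is the largest state in the world. Russia borders on 18 countries. " +
  "RUSSIA is a successor state of the USSR.";

const rv1 = /Russia/;   // first match only
const rv2 = /Russia/g;  // all matches, case-sensitive
const rv3 = /Russia/ig; // all matches, case-insensitive

console.log(text.match(rv1)[0]); // Russia
console.log(text.match(rv2));    // [ 'Russia', 'Russia' ]
console.log(text.match(rv3));    // [ 'Russia', 'Russia', 'RUSSIA' ]
```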

Special characters

In addition to ordinary characters, regular expression patterns can use special characters (metacharacters). They are described in the table below:

Special character Description
. Matches any character except line terminators (such as the newline character).
\w Matches any word character: a Latin letter, a digit, or an underscore.
\W Matches any character that is not a word character.
\d Matches a digit (0-9).
\D Matches any character that is not a digit.
\s Matches a whitespace character.
\S Matches any non-whitespace character.
\b Matches a word boundary (the beginning or end of a word).
\B Matches a position that is not a word boundary.
\n Matches the newline character.

/* The reg1 expression finds all words made of two arbitrary characters followed by "vet". Since the words in the sentence are separated by spaces, we add the special character \s at the beginning and at the end */
var reg1 = /\s..vet\s/g;
var txt = " privet covet corduroy closet ";
document.write(txt.match(reg1) + "<br>"); // writes " covet "
/* The reg2 expression finds all words made of three arbitrary characters followed by "vet" */
var reg2 = /\s...vet\s/g;
document.write(txt.match(reg2) + "<br>"); // writes " privet "
var txt1 = " pri2vet privet 1vet ";
/* The reg3 expression finds "pri", followed by exactly one digit, followed by "vet" */
var reg3 = /pri\dvet/g;
document.write(txt1.match(reg3) + "<br>"); // writes pri2vet
// The expression reg4 finds all the digits in the text
var reg4 = /\d/g;
var txt2 = "5 years of study, 3 years of sailing, 9 years of shooting.";
document.write(txt2.match(reg4) + "<br>"); // writes 5,3,9
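\w (and therefore \b) only knows the ASCII word characters A-Z, a-z, 0-9 and _, while \s only looks at whitespace. A short check with illustrative accented words shows why \s can be the safer choice for text that is not plain English:

```javascript
// \w matches only [A-Za-z0-9_], so accented letters are not "word" characters
console.log("café".match(/\w+/g));   // [ 'caf' ]
console.log("Straße".match(/\w+/g)); // [ 'Stra', 'e' ]

// \s only cares about whitespace, so it splits on spaces regardless of alphabet
console.log(" café Straße ".match(/\s\S+/g)); // [ ' café', ' Straße' ]
```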


Symbols in square brackets

Square brackets such as [keyu] let you specify a group of characters; the class matches any one of them.

A ^ character at the start of the brackets, as in [^kwg], means that any character except the listed ones should be matched.

A dash (-) between characters in square brackets, as in [a-z], specifies a range of characters.

You can also search for digits using square brackets, for example with [0-9].

//Set the regular expression reg1
var reg1 = /\sco[tdg]\s/g;
//Set the text string txt1
var txt1 = " cot coat cod code cog cogs ";
//Use the regular expression reg1 to search the string txt1
document.write(txt1.match(reg1) + "<br>"); // finds " cot ", " cod " and " cog "
var reg2 = /\sslo[^tg]/g;
var txt2 = " slot slog slow ";
document.write(txt2.match(reg2) + "<br>"); // finds " slow"
var reg3 = /[0-9]/g;
var txt3 = "5 years of study, 3 years of swimming, 9 years of shooting";
document.write(txt3.match(reg3)); // writes 5,3,9
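A negated class matches everything that is not listed, including spaces and punctuation, which makes it handy for stripping characters. A sketch with a made-up phone number:

```javascript
// [^0-9] matches every character that is NOT a digit (spaces, brackets, dashes, ...)
const phone = "+1 (555) 010-9999"; // sample input
const digitsOnly = phone.replace(/[^0-9]/g, "");
console.log(digitsOnly); // 15550109999

// A range inside brackets: one lowercase Latin letter followed by a digit
console.log("a1 B2 c3".match(/[a-z][0-9]/g)); // [ 'a1', 'c3' ]
```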


Quantifiers

A quantifier is a construct that specifies how many times the preceding character or group of characters must occur in a match.

Syntax:

//The preceding character must occur exactly x times: {x}
//The preceding character must occur from x to y times inclusive: {x,y}
//The preceding character must occur at least x times: {x,}
//The preceding character must occur 0 or more times: *
//The preceding character must occur 1 or more times: +
//The preceding character must occur 0 or 1 time: ?


//Set the regular expression rv1
rv1 = /ko{5}shka/g;
//Set the regular expression rv2
rv2 = /ko{3,}shka/g;
//Set the regular expression rv3
rv3 = /ko+shka/g;
//Set the regular expression rv4
rv4 = /ko?shka/g;
//Set the regular expression rv5
rv5 = /ko*shka/g;
//Sample text: kshka koshka kooshka koooshka kooooshka koooooshka kooooooshka
//rv1 matches: koooooshka
//rv2 matches: koooshka kooooshka koooooshka kooooooshka
//rv3 matches: koshka kooshka koooshka kooooshka koooooshka kooooooshka
//rv4 matches: kshka koshka
//rv5 matches: every word in the text
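A runnable sketch of the five quantifiers; the sample text contains words with 0 to 6 letters "o" between "k" and "shka":

```javascript
const words = "kshka koshka kooshka koooshka kooooshka koooooshka kooooooshka";

console.log(words.match(/ko{5}shka/g));  // exactly 5 "o": [ 'koooooshka' ]
console.log(words.match(/ko{3,}shka/g)); // 3 or more: 4 words
console.log(words.match(/ko+shka/g));    // 1 or more: 6 words
console.log(words.match(/ko?shka/g));    // 0 or 1: [ 'kshka', 'koshka' ]
console.log(words.match(/ko*shka/g));    // 0 or more: all 7 words
```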

Note: to use a special character (such as . * + ? ( ) or { }) as an ordinary character, put a \ in front of it.

Using parentheses

By enclosing part of a regular expression pattern in parentheses, you tell the expression to remember the match found by that part of the pattern. The saved match can be used later in your code.

For example, the regular expression /(Dmitry)\sVasiliev/ will find the string “Dmitry Vasiliev” and remember the substring “Dmitry”.

In the example below, we use the replace() method to change the order of words in the text. We use $1 and $2 to access stored matches.

var regexp = /(Dmitry)\s(Vasiliev)/;
var text = "Dmitry Vasiliev";
var newtext = text.replace(regexp, "$2 $1");
document.write(newtext); // Vasiliev Dmitry
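The remembered substrings are also available directly in the array returned by exec() (or match() without g). A short sketch using the same pattern:

```javascript
const regexp = /(Dmitry)\s(Vasiliev)/;
const result = regexp.exec("Dmitry Vasiliev");

console.log(result[0]); // Dmitry Vasiliev (the whole match)
console.log(result[1]); // Dmitry (first group)
console.log(result[2]); // Vasiliev (second group)

// The same groups referenced in a replacement string
console.log("Dmitry Vasiliev".replace(regexp, "$2, $1")); // Vasiliev, Dmitry
```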


Parentheses can also group characters so that a quantifier applies to the whole group rather than to a single character.
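A minimal comparison of a quantifier with and without a group (the sample string is illustrative):

```javascript
// Without parentheses, + applies only to the last character "a"
console.log("hahaha".match(/ha+/)[0]);   // ha
// With parentheses, + applies to the whole group "ha"
console.log("hahaha".match(/(ha)+/)[0]); // hahaha
```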

This article covers the basics of using regular expressions in JavaScript.

Introduction

What is a regular expression?

A JS regular expression is a sequence of characters that forms a search rule. This rule can then be used to search through text as well as replace it. In practice, a regular expression can even consist of a single character, but more complex search patterns are more common.

In JavaScript, regular expressions are also objects. They are patterns used to match sequences of characters in strings. They are used with the exec() and test() methods of the RegExp object, and with the match(), replace(), search() and split() methods of the String object.

Example

var pattern = /example/i

/example/i is a regular expression. example is the pattern (used in the search). i is a modifier that makes the search case-insensitive.

Preparing a regular expression

JS regular expressions consist of a pattern and a modifier. The syntax will be something like this:

/pattern/modifiers;

The pattern specifies the search rule. It consists of simple characters, such as /abc/, or a combination of simple and special characters, such as /ab*c/ or /Chapter (\d+)\.\d*/.


Modifiers change how the search is performed: for example, they make it case-insensitive (i) or global (g).


Now we are ready to use JS regular expressions. There are two main ways to do this: call the methods of a regular expression object, or pass a regular expression to the methods of a string.

Using a regular expression object

Create a regular expression object

This object describes a character pattern. It is used for pattern matching. There are two ways to construct a regular expression object.

Method 1: Using a regular expression literal that consists of a pattern enclosed in slashes, for example:

var reg = /ab+c/;

Regular expression literals are compiled when the script is loaded. If the regular expression stays constant, using a literal can improve performance.

Method 2: Calling the constructor function of the RegExp object, for example:

var reg = new RegExp("ab+c");

Using the constructor compiles the regular expression at runtime. Use this method if the regular expression will change or you do not know the pattern in advance, for example, when the pattern comes from a search query entered by the user.
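When the pattern comes from user input, characters such as + or . would be interpreted as metacharacters, so a common approach is to escape them first. The sketch below uses a hypothetical hand-written helper, escapeRegExp, and a made-up query:

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pretend this string was typed by a user into a search box
const userQuery = "1+1";
const re = new RegExp(escapeRegExp(userQuery), "g");

console.log(re.source);                    // 1\+1
console.log("1+1=2, 1+1 again".match(re)); // [ '1+1', '1+1' ]
```

Without the escaping step, new RegExp("1+1") would mean "one or more 1s followed by 1" and would not find the literal text "1+1".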

Regular Expression Object Methods

Let's take a look at a few common regular expression object methods:

  • compile() (deprecated since JavaScript 1.5) - recompiles the regular expression;
  • exec() - searches for a match in a string and returns an array describing the first match, or null;
  • test() - searches for a match in a string and returns true or false;
  • toString() - returns the string form of the regular expression.

Examples

Using test()

The test() method is a method of the RegExp object. It searches a string for the pattern and returns true or false depending on the result. The following JS regular expression example searches a string for the character "e":

var patt = /e/; patt.test("The best things in the world are free!");

Since the string contains "e", the result of this code is true.

Regular expressions do not have to be placed in a variable. The same query can be done in one line:

/e/.test("The best things in the world are free!");

Using exec()

exec() searches a string using the given pattern and returns an array containing the matched text. If no match is found, the result is null.

Let's look at the method in action, using the same character "e":

/e/.exec("The best things in the world are free!");

Since the string contains "e", the result of this code is an array whose first element is "e".
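Besides the matched text, the array returned by exec() carries the position of the match in its index property. A short sketch:

```javascript
const result = /e/.exec("The best things in the world are free!");

console.log(result[0]);    // e
console.log(result.index); // 2 (position of the first "e")

// No match: exec() returns null
console.log(/z/.exec("The best things in the world are free!")); // null
```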

Applying a regular expression to a string

In JavaScript, regular expressions are also often used with two methods of the String object, search() and replace(), to search and replace text.

  • search() - uses an expression to search for a match and returns the position of the first match, or -1 if there is none;
  • replace() - returns a modified string in which the pattern is replaced.

Examples

Using a JS regular expression to perform a case-insensitive search for "w3schools" in a string:

var str = "Visit W3Schools"; var n = str.search(/w3schools/i);

The result in n will be 6.

The search() method also accepts a string as its argument; the string is converted to a regular expression. For example, str.search("W3Schools") also returns 6.
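Because a string argument is turned into a regular expression, metacharacters in it keep their special meaning. A sketch comparing the cases:

```javascript
const str = "Visit W3Schools";

console.log(str.search(/w3schools/i)); // 6
console.log(str.search("W3Schools"));  // 6
console.log(str.search("w3schools"));  // -1 (no i flag, so case matters)

// The string is converted to a regular expression, so "." means "any character"
console.log("x.y".search("."));   // 0, not 1
console.log("x.y".search("\\.")); // 1
```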

In JavaScript, regular expressions are represented by RegExp objects. RegExp objects can be created using the RegExp() constructor, but more often they are created using a special literal syntax. Just as string literals are specified as characters enclosed in quotation marks, regular expression literals are specified as characters enclosed between a pair of slashes (/).

/pattern/flags
new RegExp("pattern"[, flags])

pattern is the regular expression to search for (replacement is covered later), and flags is a string containing any combination of the characters g (global search), i (case-insensitive search) and m (multiline search). The first form is used often, the second only sometimes. Both create equivalent RegExp objects.
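A quick check that the two forms produce equivalent (but separate) objects; the pattern and test string are illustrative:

```javascript
const literal = /ab+c/i;
const constructed = new RegExp("ab+c", "i");

// Both objects have the same pattern text and flags...
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true
// ...but they are still two distinct objects
console.log(literal === constructed);                // false

console.log(literal.test("xABBBCx"), constructed.test("xABBBCx")); // true true
```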

Search options

When creating a regular expression, you can specify additional search options (the flags described above).

Characters in JavaScript Regular Expressions

Symbol Meaning
Alphanumeric characters Match themselves
\0 The NUL character (\u0000)
\t Tab (\u0009)
\n Line feed (\u000A)
\v Vertical tab (\u000B)
\f Form feed (\u000C)
\r Carriage return (\u000D)
\xnn A Latin-1 character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx A Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX The control character "X"; for example, the sequence \cJ is equivalent to the newline character \n
\ Before an ordinary character, makes it special. For example, /s/ simply looks for the character "s", but /\s/ denotes a whitespace character. Conversely, before a special character such as *, it makes the character ordinary. For example, /a*/ searches for 0 or more consecutive "a" characters, while /a\*/ finds the string "a*".
^ Matches the beginning of the input. If the multiline flag (m) is set, it also matches at the start of each line. For example, /^A/ will not find the "A" in "an A", but will find the first "A" in "An A."
$ Matches the end of the input. If the multiline flag is set, it also matches at the end of each line. For example, /t$/ will not find the "t" in "eater", but will find it in "eat".
* Matches the preceding element 0 or more times. For example, /bo*/ will find "boooo" in "A ghost booooed" and "b" in "A bird warbled", but will find nothing in "A goat grunted".
+ Matches the preceding element 1 or more times; equivalent to {1,}. For example, /a+/ will match the "a" in "candy" and all the "a"s in "caaaaaaandy".
? Makes the preceding element optional. For example, /e?le?/ will match "el" in "angel" and "le" in "angle". Used immediately after one of the quantifiers *, +, ? or {}, it makes the quantifier "non-greedy" (matching as few repetitions as possible), as opposed to the default "greedy" mode, which matches as many repetitions as possible even if the next pattern element could also match. ? is also used in lookahead assertions and non-capturing groups, (?=), (?!) and (?:), described below.
. (Decimal point) Matches any single character except line terminators: \n, \r, \u2028 or \u2029 (you can use [\s\S] to match any character, including line terminators). For example, /.n/ will match "an" and "on" in "nay, an apple is on the tree", but not "nay".
(x) Matches x and remembers the match. These are called "capturing parentheses". For example, /(foo)/ will find and remember "foo" in "foo bar". The matched substring is stored in the result array and in the legacy RegExp properties $1, ..., $9. Parentheses also combine their contents into a single pattern element; for example, (abc)* repeats abc 0 or more times.
(?:x) Matches x but does not remember the match. These are called "non-capturing parentheses". The matched substring is not stored in the result array or in the RegExp properties. Like all parentheses, they combine their contents into a single subpattern.
x(?=y) Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ will only match "Jack" if it is followed by "Sprat", and /Jack(?=Sprat|Frost)/ will only match "Jack" if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" appears in the match.
x(?!y) Matches x only if x is not followed by y. For example, /\d+(?!\.)/ will only match a number if it is not followed by a decimal point: /\d+(?!\.)/.exec("3.141") will find "141", not "3".
x|y Matches x or y. For example, /green|red/ will match "green" in "green apple" and "red" in "red apple".
{n} Where n is a positive integer. Matches exactly n repetitions of the preceding element. For example, /a{2}/ will not find the "a" in "candy", but will find both "a"s in "caandy" and the first two "a"s in "caaandy".
{n,} Where n is a positive integer. Matches n or more repetitions of the preceding element. For example, /a{2,}/ will not find the "a" in "candy", but will find all the "a"s in "caandy" and in "caaaaaaandy".
{n,m} Where n and m are positive integers. Matches from n to m repetitions of the preceding element.
[xyz] A character set. Matches any one of the listed characters. You can specify a range by using a dash. For example, [abcd] is the same as [a-d]. It matches "b" in "brisket" and "a" and "c" in "ache".
[^xyz] A negated character set: matches any character not listed. You can also specify a range. For example, [^abc] is the same as [^a-c]. It finds "r" in "brisket" and "h" in "chop".
[\b] Matches the backspace character. (Not to be confused with \b.)
\b Matches a word boundary: a position between a (Latin) word character and a non-word character such as a space. (Not to be confused with [\b].) For example, /\bn\w/ will match "no" in "noonday"; /\wy\b/ will find "ly" in "possibly yesterday".
\B Matches a position that is not a word boundary. For example, /\w\Bn/ will match "on" in "noonday", and /y\B\w/ will match "ye" in "possibly yesterday".
\cX Where X is a letter from A to Z. Matches a control character in a string. For example, /\cM/ represents the Ctrl-M character.
\d Matches a digit (0-9); equivalent to [0-9]. For example, /\d/ or /[0-9]/ will match the "2" in "B2 is the suite number".
\D Matches any character that is not a digit; equivalent to [^0-9]. For example, /\D/ or /[^0-9]/ will match the "B" in "B2 is the suite number".
\s Matches any whitespace character, including space, tab, newline, and other Unicode whitespace characters. For example, /\s\w*/ will match " bar" in "foo bar".
\S Matches any character except whitespace. For example, /\S\w*/ will match "foo" in "foo bar".
\v Matches the vertical tab character.
\w Matches any (Latin) word character, including letters, digits and underscores; equivalent to [A-Za-z0-9_]. For example, /\w/ will match "a" in "apple", "5" in "$5.28", and "3" in "3D".
\W Matches any non-word character; equivalent to [^A-Za-z0-9_]. For example, /\W/ and /[^A-Za-z0-9_]/ will both match "%" in "50%".

Working with Regular Expressions in Javascript

In JavaScript, regular expressions are used through methods of the String class (match, split, replace) and of the RegExp class (exec, test):

regexp.exec(str) - searches the string for a match of the pattern. Returns an array (the matched text followed by the captured groups) if there is a match, or null if nothing is found. With the g modifier, each call returns the next match after the previous one: the position where the last match ended is stored in the regexp's lastIndex property.

str.match(regexp) - finds the parts of the string that match the pattern. With the g modifier, match() returns an array of all matches, or null (rather than an empty array) if there are none. Without the g modifier, it works like exec().

regexp.test(str) - checks whether the string matches the pattern. Returns true if there is a match and false otherwise.

str.split(regexp) - splits the string it is called on into an array of substrings, using the matches of the argument as delimiters.

str.replace(regexp, mix) - returns a new string in which the matches of the pattern are replaced. The first parameter can also be a plain string rather than a regular expression. Without the g modifier, only the first occurrence is replaced; with g, a global replacement occurs, i.e. all occurrences in the string are changed. mix is the replacement: a string (which may contain special replacement patterns) or a function.
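A short sketch of split() with a regular expression delimiter and of replace() with and without the g modifier (sample strings are illustrative):

```javascript
// split() with a regular expression: any run of commas, semicolons or spaces is a delimiter
console.log("a, b;c  d".split(/[,; ]+/)); // [ 'a', 'b', 'c', 'd' ]

// replace() without g changes only the first occurrence...
console.log("one two one".replace(/one/, "1"));  // 1 two one
// ...and with g it changes all of them
console.log("one two one".replace(/one/g, "1")); // 1 two 1
```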

Special characters in the replacement string
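In a replacement string, $& inserts the whole match, $1, $2, ... insert the capturing groups, $` inserts the text before the match, $' the text after it, and $$ a literal dollar sign. A sketch with made-up strings:

```javascript
const s = "price: 42";

console.log(s.replace(/\d+/, "[$&]"));                   // price: [42]
console.log(s.replace(/(\w+): (\d+)/, "$2 is the $1")); // 42 is the price
console.log(s.replace(/\d+/, "$$$&"));                   // price: $42
console.log("abc".replace(/b/, "[$`|$']"));              // a[a|c]c
```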

Replacement via function

If you pass a function as the second parameter, it is called for each match, and its return value is used as the replacement string. The first argument of the function is the matched substring. If the first argument of replace is a RegExp object, the next n arguments contain the matches of its capturing groups. The last two arguments are the position in the string where the match occurred and the string itself.
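A sketch of a replacement function, first using only the match and then reading a capturing group and the match position (sample strings are illustrative):

```javascript
// Double every number; the callback receives the matched text first
const text = "3 apples and 12 pears";
const doubled = text.replace(/\d+/g, (match) => String(Number(match) * 2));
console.log(doubled); // 6 apples and 24 pears

// With one capturing group the arguments are (match, group1, offset, string)
const positions = [];
"a1b2".replace(/([a-z])\d/g, (match, letter, offset) => {
  positions.push(letter + "@" + offset);
  return match; // leave the text unchanged
});
console.log(positions); // [ 'a@0', 'b@2' ]
```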

new RegExp(pattern[, flags])

If the regular expression is known in advance, the literal syntax (/test/i) is preferred.

If the regular expression is not known in advance, it is better to build it from a string using the constructor (new RegExp).

Note that the backslash \ is the escape character in string literals, so when a pattern is written as a string for new RegExp, every backslash must be doubled: \\
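A minimal sketch of what goes wrong when the backslash is not doubled in the string passed to new RegExp:

```javascript
// In a string literal "\\d" is the two characters \ and d
const fromString = new RegExp("\\d+");
const literal = /\d+/;
console.log(fromString.source === literal.source); // true

// A single backslash is swallowed by the string literal: "\d" is just "d"
const broken = new RegExp("\d+");
console.log(broken.source);      // d+
console.log(broken.test("123")); // false
```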

Flags

i ignore case when matching

g global matching: finds all matches of the pattern, unlike the default local matching, which finds only the first one

Operators

i (flag): makes the expression case-insensitive. Example: /testik/i
g (flag): global search. Example: /testik/g
m (flag): multiline mode, in which ^ and $ also match at line breaks; useful for multi-line text such as the contents of a textarea
[] (character class operator): matches any one character from the set. Example: [a-z] - any character in the range from a to z
^ (caret operator inside brackets): matches any character except those in the set. Example: [^a-z] - any character EXCEPT those in the range from a to z
- (hyphen operator): specifies an inclusive range of values. Example: [a-z] - any character in the range from a to z
\ (escape operator): escapes the following character. Example: /\\/ matches a backslash
^ (start-of-match operator): the match must occur at the beginning. Example: /^testik/g
$ (end-of-match operator): the match must occur at the end. Example: /testik$/g
? (? operator): makes the preceding character optional. Example: /t?est/g
+ (+ operator): the preceding character must occur once or more. Example: /t+est/g
* (* operator): the preceding character may occur any number of times or not at all. Example: /t*est/g
{} ({} operator): sets a fixed number of repetitions of the preceding character. Example: /t{4}est/g
{,} ({,} operator): sets the number of repetitions of the preceding character within limits. Example: /t{4,9}est/g

Predefined Character Classes

Class Matches
\t Horizontal tab
\n Line feed (newline)
. Any character except line terminators such as the line feed
\d Any decimal digit; equivalent to [0-9]
\D Any character other than a decimal digit; equivalent to [^0-9]
\w Any word character (digits, Latin letters and underscore); equivalent to [A-Za-z0-9_]
\W Any character other than digits, Latin letters and underscore; equivalent to [^A-Za-z0-9_]
\s Any whitespace character
\S Any character except whitespace
\b A word boundary
\B A position that is not a word boundary, i.e. inside a word

Grouping ()

If you want to apply an operator such as + to a group of characters, enclose the group in parentheses: /(abcd)+/.

Capturing groups

The part of a regular expression enclosed in parentheses () is called a capturing group (a capture).

Consider the following example:

/^([abc])k\1/

\1 does not mean "any character from a, b, c".
\1 means the same character that the first group matched. That is, the character \1 must match is unknown until the first group has matched: the pattern matches "aka" but not "akb".
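A quick check of the backreference pattern /^([abc])k\1/ (the sample strings are illustrative):

```javascript
// \1 must repeat exactly the character captured by ([abc])
const re = /^([abc])k\1/;

console.log(re.test("aka")); // true
console.log(re.test("bkb")); // true
console.log(re.test("akb")); // false: \1 is "a", not "any of a, b, c"
```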

Non-capturing groups

Parentheses () are used in two cases: for grouping and for capturing. Sometimes, however, we need parentheses only for grouping and the captures are not needed; removing unnecessary captures also makes the regular expression engine's job easier.

To make a group non-capturing, put ?: right after the opening parenthesis: (?:...)

str = "<div>Hello world!</div>";
found = str.match(/<(?:\/?)(?:\w+)(?:[^>]*?)>/i);
console.log("found without captures: ", found); // ["<div>"]

test function

regexp.test(str)

The test function checks whether the regular expression matches the string (str). It returns either true or false.

Usage example:


function codeF(str) {
  return /^\d{5}-\d{2}/.test(str);
}
console.log(codeF("12345-12ss")); // true
console.log(codeF("1245-12ss")); // false

match function

str.match(regexp)

The match function returns an array of values, or null if no matches are found. Note: if the regular expression does not have the g flag (global search), match returns the first match in the string and, as the example shows, the array also contains the CAPTURES (the parts of the regular expression enclosed in parentheses).


str = "For information, please refer to: Chapter 3.4.5.1";
re = /chapter (\d+(\.\d)*)/i; // with captures (no global flag)
found = str.match(re);
console.log(found); // ["Chapter 3.4.5.1", "3.4.5.1", ".1"]

If you pass match() a global regular expression (with the g flag), it also returns an array, but the array contains only the GLOBAL matches. That is, the captured groups are not returned.


str = "For information, refer to: Chapter 3.4.5.1, Chapter 7.5";
re = /chapter (\d+(\.\d)*)/ig; // global: captures are not returned
found = str.match(re);
console.log(found); // ["Chapter 3.4.5.1", "Chapter 7.5"]

exec function

regexp.exec(str)

The exec function checks whether a regular expression matches a string (str). It returns an array of results (including the captured groups) or null. If the g flag is set, each subsequent call to exec (for example, in a while loop) moves on to the next match, because exec automatically updates lastIndex, the index where the last match ended.


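var html = '<div class="test"><b>BAM!</b> <em>BUM!</em></div>';
var reg = /<(\/?)(\w+)([^>]*?)>/g;
var match;
// Call exec only once per iteration; calling it again inside the loop would skip matches
while ((match = reg.exec(html)) !== null) {
  console.log(match);
}
/* ['<div class="test">', '', 'div', ' class="test"']
   ['<b>', '', 'b', '']
   ['</b>', '/', 'b', '']
   ['<em>', '', 'em', '']
   ['</em>', '/', 'em', '']
   ['</div>', '/', 'div', ''] */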
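The lastIndex mechanism has a side effect worth knowing: a regular expression with the g flag keeps its position between calls, so reusing it with test() on the same string can alternate between true and false. A minimal demonstration:

```javascript
const re = /a/g;

console.log(re.test("a"), re.lastIndex); // true 1
console.log(re.test("a"), re.lastIndex); // false 0 (the search started at index 1)
console.log(re.test("a"), re.lastIndex); // true 1
```

Creating a new regular expression for each check, or dropping the g flag when only a yes/no answer is needed, avoids this.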

Without the global flag, the match and exec methods work identically: they return an array with the first match and its captures.


// match
var html = '<div class="test"><b>BAM!</b> <em>BUM!</em></div>';
var reg = /<(\/?)(\w+)([^>]*?)>/; // without the global flag
console.log(html.match(reg)); // ['<div class="test">', '', 'div', ' class="test"']
// exec
console.log(reg.exec(html)); // ['<div class="test">', '', 'div', ' class="test"']

replace function

str.replace(regexp, newSubStr|function)
  • regexp - a regular expression;
  • newSubStr - the string that replaces the match found in the text;
  • function - called for each match found, with a variable list of parameters (recall that a global search finds all matches of the pattern in the string).

The return value of this function serves as a replacement.

Function parameters:

  • 1 - The complete matched substring.
  • 2 - The values of the capturing groups (one argument per group).
  • 3 - The index (position) of the match in the source string.
  • 4 - The source string.

The method does not change the calling string, but returns a new one after replacing the matches. To perform a global search and replace, use regexp with the g flag.

"GHGHGHGTTTT".replace(/[A-Z]/g, "K"); // "KKKKKKKKKKK"


function upLetter(allStr, letter) {
  return letter.toUpperCase();
}
var res = "border-top-width".replace(/-(\w)/g, upLetter);
console.log(res); // borderTopWidth

Regular expressions are a language that describes string patterns using metacharacters. A metacharacter is a character in a regular expression that describes a class of characters in a string, marks the position of a substring, specifies the number of repetitions, or groups characters into a substring. For example, the metacharacter \d describes digits, and $ denotes the end of a line. A regular expression can also contain ordinary characters that describe themselves. In JavaScript, the set and meaning of metacharacters are defined by the ECMAScript specification; the syntax is close to that of Perl and the PCRE library, and most of their common features are supported.

Scope of regular expressions

Regular expressions are typically used for the following tasks:

  • Matching. The goal is to find out whether a text matches a given regular expression.
  • Search. Using regular expressions, it is convenient to find the corresponding substrings and extract them from the text.
  • Replacement. Regular expressions often help not only to find, but also to replace a substring in the text that matches the regular expression.

Ultimately, using regular expressions you can, for example:

  • Check that the user data in the form is filled out correctly.
  • Find a link to an image in the text entered by the user so that it can be automatically attached to the message.
  • Remove html tags from the text.
  • Check code before compilation for simple syntax errors.

Features of regular expressions in JS. Regular Expression Literals

The main feature of regular expressions in JS is that there is a separate type of literal for them. Just as string literals are surrounded by quotation marks, regular expression literals are surrounded by slashes (/). Thus, JS code can contain expressions like:

console.log(typeof /tcoder/); // object

The same regular expression can also be created with the RegExp constructor:

var pattern = new RegExp("tcoder");

This creation method is usually used when you need to use variables in a regular expression, or create a regular expression dynamically. In all other cases, regular expression literals are used due to the shorter syntax and the absence of the need to additionally escape some characters.

Characters in regular expressions

Alphanumeric characters in regular expressions are not metacharacters and describe themselves. This means that the regular expression /tcoder/ will match the substring tcoder. You can also specify non-printing characters, such as newline (\n), tab (\t), and so on; these also match themselves. Preceding a letter with a backslash (\) turns it into a metacharacter, if such a metacharacter exists. For example, the letter "d" becomes a metacharacter describing digits when it is preceded by a backslash (\d).

Character classes

Single characters in regular expressions can be grouped into classes using square brackets. A class created this way matches any one of the characters it contains. For example, the regular expression /[tcoder]/ matches any of the letters "t", "c", "o", "d", "e", "r".

You can also specify a range of characters in a class using a hyphen. For example, the class [a-f] is equivalent to the class [abcdef]. Some metacharacters already describe character classes; for example, the \d metacharacter is equivalent to the class [0-9]. Metacharacters describing character classes can also be included in classes. For example, the class [\da-f] matches the digits and the letters "a", "b", "c", "d", "e", "f", that is, any hexadecimal digit.
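A class that mixes a metacharacter and a range can validate hexadecimal strings. The sketch below uses a hypothetical helper, isHex, and adds the i flag so that uppercase A-F is accepted too:

```javascript
// Hypothetical helper: [\da-f] matches one hexadecimal digit; i also accepts A-F
const isHex = (s) => /^[\da-f]+$/i.test(s);

console.log(isHex("1f3A")); // true
console.log(isHex("1g3"));  // false
```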

It is also possible to describe a character class by specifying characters that should not be included in it. This is done using the metacharacter ^. For example, the class [^\d] will match any character other than a number.

Repetitions

Now we can describe, say, a decimal number of any given length simply by writing as many \d metacharacters in a row as there are digits in the number. This approach is clearly inconvenient. Moreover, we cannot describe a range of repetitions; for example, we cannot describe a number with one or two digits. Fortunately, regular expressions can describe repetition ranges: after the character, specify the range of repetitions in curly braces. For example, the regular expression /tco{1,3}der/ will match the strings "tcoder", "tcooder" and "tcoooder". If you omit the maximum number of repetitions, leaving the minimum number followed by a comma, you specify that many repetitions or more. For example, the regular expression /bo{2,}bs/ will match the strings "boobs", "booobs", "boooobs" and so on, with any number of letters "o", at least two.

If you put just one number in the curly braces, without a comma, it specifies the exact number of repetitions. For example, the regular expression /\d{5}/ matches five consecutive digits, such as a five-digit number.

Some repetition ranges are used so often that they have their own metacharacters: ? is the same as {0,1}, * is the same as {0,}, and + is the same as {1,}.
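A small sketch that checks the shorthand quantifiers against their explicit {n,m} forms on a few illustrative inputs:

```javascript
// ?, * and + are shorthands for {0,1}, {0,} and {1,}
const samples = ["", "7", "42", "x"];
const pairs = [
  [/^\d?$/, /^\d{0,1}$/],
  [/^\d*$/, /^\d{0,}$/],
  [/^\d+$/, /^\d{1,}$/],
];

for (const [shorthand, explicit] of pairs) {
  // No g flag, so test() keeps no state between calls
  const same = samples.every((s) => shorthand.test(s) === explicit.test(s));
  console.log(shorthand.source, "behaves like", explicit.source, same); // ... true
}
```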

Greedy repetitions

By default, quantifiers describe the maximum number of repetitions: of all the numbers of repetitions in the specified range, the largest possible one is chosen. Such repetitions are called greedy. For example, the regular expression /\d+/ in the string "yeah!!111" will match the substring "111", not "11" or "1", even though the metacharacter "+" describes one or more repetitions.

If you want non-greedy (lazy) repetition, that is, the smallest possible number of repetitions from the specified range, simply put a ? after the repetition range. For example, in the string "yeah!!111" the regular expression /\d+?/ matches the substring "1", and the regular expression /\d{2,}?/ matches the substring "11".
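The greedy and lazy variants can be compared on the lesson's "yeah!!111" string (a minimal sketch; without the g flag, match returns the first match, and [0] is its text):

```javascript
const text = "yeah!!111";

console.log(text.match(/\d+/)[0]);     // "111": greedy, as many digits as possible
console.log(text.match(/\d+?/)[0]);    // "1":   lazy, as few as possible (at least one)
console.log(text.match(/\d{2,}/)[0]);  // "111": greedy
console.log(text.match(/\d{2,}?/)[0]); // "11":  lazy, but still at least two
```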

Non-greedy repetition has an important feature worth noting. Consider the regular expression /bo{2,}?bs/. In the string "i like big boooobs" it matches, just as the greedy version would, the substring "boooobs", and not "boobs", as one might expect. The reason is that a single match is always one contiguous substring: the regular expression cannot glue together "boo" and "bs" taken from different places in the string.
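A sketch of the same effect with the "tcoder" word, plus a case where laziness does shorten the match (the <b> tag string is just an illustration). A lazy quantifier stops as early as it can, but only at a point where the rest of the pattern still matches, and the match never skips characters.

```javascript
// The lazy o{1,}? still has to reach "der", so it takes all three o's.
const word = "i like tcoooder";
console.log(word.match(/tco{1,}?der/)[0]); // "tcoooder", not "tcoder"

// Here laziness matters: the first ">" is enough for the rest of the pattern.
const html = "<b>bold</b>";
console.log(html.match(/<.+>/)[0]);  // "<b>bold</b>" (greedy)
console.log(html.match(/<.+?>/)[0]); // "<b>" (lazy)
```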

Alternatives

Regular expressions can also contain alternatives, which describe a set of strings matching either one part of the regular expression or another. The alternatives are separated by a vertical bar (|). For example, the regular expression /two|twice|2/ can match the substring "two", the substring "twice" or the substring "2". Alternatives are tried from left to right, and the first one that succeeds is used, so each match is described by exactly one alternative. For example, in the string "I like javascript" the regular expression /java|script/ first matches the substring "java"; it never matches "javascript" as a whole.
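A small sketch showing that each match comes from a single alternative and that the order of alternatives matters (the sample strings are ours). With the g flag, match simply continues after "java" and finds "script" as a separate match.

```javascript
console.log("two, twice, 2".match(/two|twice|2/g)); // ["two", "twice", "2"]

const text = "I like javascript";
console.log(text.match(/java|script/)[0]); // "java"
console.log(text.match(/java|script/g));   // ["java", "script"], two separate matches

// Left-to-right order: the first alternative that fits wins.
console.log(text.match(/java|javascript/)[0]); // "java"
console.log(text.match(/javascript|java/)[0]); // "javascript"
```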

Groups

To treat several characters as a single unit when applying repetition ranges, alternatives and other constructs, simply enclose them in parentheses. For example, the regular expression /true(coder)?/ matches the strings "truecoder" and "true".
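A minimal sketch of what the parentheses change: without them, the quantifier applies only to the last character. The ^ and $ anchors and the extra /^(ab)+$/ pattern are our additions for illustration.

```javascript
const withGroup = /^true(coder)?$/;
console.log(withGroup.test("true"));      // true
console.log(withGroup.test("truecoder")); // true
console.log(withGroup.test("truecode"));  // false

const noGroup = /^truecoder?$/; // ? applies only to the final "r"
console.log(noGroup.test("truecode"));    // true
console.log(noGroup.test("true"));        // false

const repeated = /^(ab)+$/; // + repeats the whole group
console.log(repeated.test("ababab"));     // true
console.log(repeated.test("abb"));        // false
```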

Backreferences

Besides combining characters into a single unit, parentheses let you refer back to the substring they matched: write a backslash followed by the number of the group's left parenthesis. Parentheses are numbered from left to right, starting with one. For example, in the regular expression /(one(two)(three))(four)/, \1 refers to "onetwothree", \2 to "two", \3 to "three" and \4 to "four". As an example of such a reference, the regular expression /(\d)\1/ matches two identical digits in a row, such as "77". An important limitation is that backreferences cannot be used inside character classes. For example, a two-digit number with different digits cannot be described as /(\d)[^\1]/: inside brackets, \1 is not a reference to the group.
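A sketch of group numbering and backreferences. The "different digits" pattern is our workaround, not part of the lesson: it uses the negative lookahead (?!…) described later in this article. Note that outside Unicode mode JavaScript reads \1 inside brackets as the character with code 1, so [^\1] accepts almost anything.

```javascript
const m = "onetwothreefour".match(/(one(two)(three))(four)/);
console.log(m[1], m[2], m[3], m[4]); // onetwothree two three four

const sameTwoDigits = /^(\d)\1$/;
console.log(sameTwoDigits.test("77")); // true
console.log(sameTwoDigits.test("78")); // false

// Does NOT work as intended: [^\1] is "any char except U+0001".
console.log(/^(\d)[^\1]$/.test("77")); // true

// Workaround: the second digit must not repeat group 1.
const differentTwoDigits = /^(\d)(?!\1)\d$/;
console.log(differentTwoDigits.test("78")); // true
console.log(differentTwoDigits.test("77")); // false
```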

Non-capturing parentheses

Often you just want to group characters without creating a backreference. In this case, write ?: immediately after the opening parenthesis. For example, in the regular expression /(one)(?:two)(three)/, \2 refers to "three".

Such parentheses are called non-capturing. They have another important feature, which we will talk about in the next lesson.
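A minimal sketch showing that a (?:…) group takes no number, both in the match result and in backreferences (the /^(a)(?:b)(c)\2$/ pattern is our own illustration):

```javascript
const m = "onetwothree".match(/(one)(?:two)(three)/);
console.log(m.length); // 3: the full match plus two captured groups
console.log(m[1]);     // "one"
console.log(m[2]);     // "three"

// The same numbering is used by backreferences: \2 is (c).
console.log(/^(a)(?:b)(c)\2$/.test("abcc")); // true
```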

Specifying a position

Regular expressions also have metacharacters that denote a position in the string rather than a character. The most commonly used are ^ and $, which denote the beginning and the end of the string (or of a line, with the m flag described below). For example, the regular expression /\..+$/ matches the extension at the end of a file name, and the regular expression /^\d/ matches a digit at the very beginning of the string, if the string starts with one.
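A sketch of both anchors on sample file names and strings (ours). The search is leftmost, so /\..+$/ starts at the first dot; the /\.[^.]+$/ variant is our own addition for when only the last extension is wanted.

```javascript
const ext = /\..+$/;
console.log("photo.jpg".match(ext)[0]);      // ".jpg"
console.log("archive.tar.gz".match(ext)[0]); // ".tar.gz": the match starts at the first dot

const lastExt = /\.[^.]+$/;
console.log("archive.tar.gz".match(lastExt)[0]); // ".gz"

const leadingDigit = /^\d/;
console.log("2024 report".match(leadingDigit)[0]); // "2"
console.log("report 2024".match(leadingDigit));    // null: the string does not start with a digit
```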

Positive and negative lookahead

Regular expressions can also describe a substring that is (or is not) followed by a substring matching another pattern. Suppose we need to find the word "java" only when it is followed by "script". This can be done with the regular expression /java(?=script)/. If we need "java" that is not followed by "script", we can use the regular expression /java(?!script)/. The lookahead part itself is not included in the match.
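A sketch comparing the two lookaheads on a sample string of ours. matchAll (which requires the g flag) yields match objects; we keep only their positions to show which "java" was found.

```javascript
const text = "java javascript";
const beforeScript = /java(?=script)/g;
const notBeforeScript = /java(?!script)/g;

console.log([...text.matchAll(beforeScript)].map((m) => m.index));    // [5]
console.log([...text.matchAll(notBeforeScript)].map((m) => m.index)); // [0]

// The lookahead text "script" is checked but not consumed.
console.log(text.match(beforeScript)); // ["java"]
```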

Let's collect everything we talked about above into one table.

Symbol Meaning
a|b Matches either a or b.
(…) Grouping parentheses. The substring matched by the pattern in parentheses can also be referenced later.
(?:…) Grouping only, without creating a backreference.
\n Backreference to the substring matched by the nth group.
^ The beginning of the input, or the beginning of a line with the m flag.
$ The end of the input, or the end of a line with the m flag.
a(?=b) Matches the substring described by pattern a only if it is followed by the substring described by pattern b.
a(?!b) Matches the substring described by pattern a only if it is not followed by the substring described by pattern b.

Flags

And finally, the last element of regular expression syntax: flags. Flags specify matching rules that apply to the entire regular expression. Unlike all other elements of regular expression syntax, they are written immediately after the regular expression literal, or passed as a string in the second argument of the RegExp constructor.

This lesson covers the three classic regular expression flags in JavaScript (modern versions of the language also support s, u, y, d and v):

i – when this flag is set, letter case is ignored; for example, the regular expression /javascript/i matches the strings "javascript", "JavaScript", "JAVASCRIPT", "jAvAScript", etc.
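A quick check of the i flag on the spellings from the text (a minimal sketch; the constructor form with "i" as the second argument behaves the same as the literal):

```javascript
const re = /javascript/i;
const variants = ["javascript", "JavaScript", "JAVASCRIPT", "jAvAScript"];

console.log(variants.every((s) => re.test(s)));                        // true
console.log(variants.every((s) => new RegExp("javascript", "i").test(s))); // true
console.log(/javascript/.test("JavaScript"));                          // false without i
```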

m – this flag enables multi-line mode. If the text contains newline characters and this flag is set, the symbols ^ and $ match not only at the beginning and end of the entire text but also at the beginning and end of each line. For example, the regular expression /line$/m matches the substring "line" both in the string "first line" and in the string "one\nsecond line\ntwo".
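A sketch of the difference the m flag makes on the lesson's multi-line string. The /^\w+/gm pattern (first word of each line) is our own illustration.

```javascript
const text = "one\nsecond line\ntwo";

console.log(/line$/.test(text));         // false: without m, $ matches only at the very end ("two")
console.log(/line$/m.test(text));        // true:  with m, $ also matches before "\n"
console.log(/line$/.test("first line")); // true:  here "line" ends the whole string

// With g and m together, ^ finds the start of every line.
console.log(text.match(/^\w+/gm)); // ["one", "second", "two"]
```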

g – enables global search: with this flag, the regular expression matches all substrings that fit it, not just the first one, as is the case without the flag.
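A sketch of the g flag using the "Russia" text from the beginning of the lesson. The last lines show a common pitfall worth knowing: a g regular expression reused with test() remembers where its previous search ended (its lastIndex property).

```javascript
const text =
  "Russia is the largest state in the world. Russia borders on 18 countries. RUSSIA is a successor state of the USSR.";

const first = text.match(/Russia/); // without g: details of the first match only
console.log(first[0], first.index);  // Russia 0
console.log(text.match(/Russia/g));  // ["Russia", "Russia"]
console.log(text.match(/Russia/gi)); // ["Russia", "Russia", "RUSSIA"]

// A g regex keeps state between test() calls.
const re = /a/g;
console.log(re.test("a")); // true, and re.lastIndex becomes 1
console.log(re.test("a")); // false: the search resumed at index 1
```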

Flags can be combined in any order, so /tcoder/mig, /tcoder/gim, /tcoder/gmi, etc., are the same thing. The order also does not matter when the flags are passed as a string in the second argument of the RegExp constructor: new RegExp("tcoder", "im") and new RegExp("tcoder", "mi") are the same thing.
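One way to see that the order is irrelevant: the flags property of a RegExp reports the flags in a fixed canonical order regardless of how they were written. A minimal sketch; the last part shows that repeating a flag is rejected.

```javascript
const a = /tcoder/mig;
const b = /tcoder/gim;
const c = new RegExp("tcoder", "im");
const d = new RegExp("tcoder", "mi");

console.log(a.flags, b.flags); // gim gim
console.log(c.flags, d.flags); // im im

// Repeating a flag is a syntax error.
let duplicateRejected = false;
try {
  new RegExp("tcoder", "gg");
} catch (e) {
  duplicateRejected = e instanceof SyntaxError;
}
console.log(duplicateRejected); // true
```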

P.S.

Regular expressions are a very powerful and handy tool for working with strings, sometimes letting you reduce many lines of code to a single expression. Unfortunately, their syntax can be complex and hard to read, and even an experienced developer can forget what a complex regular expression written a couple of days ago means if it was not commented. For these reasons, it is sometimes better to give up regular expressions in favor of ordinary string methods.


