---
title: Regular expressions
slug: Web/JavaScript/Reference/Regular_expressions
page-type: landing-page
browser-compat: javascript.regular_expressions
---
{{jsSidebar}}
A **regular expression** (_regex_ for short) allows developers to match strings against a pattern, extract submatch information, or simply test if the string conforms to that pattern. Regular expressions are used in many programming languages, and JavaScript's syntax is inspired by [Perl](https://www.perl.org/).
You are encouraged to read the [regular expressions guide](/en-US/docs/Web/JavaScript/Guide/Regular_expressions) to get an overview of the available regex syntaxes and how they work.
## Description
[_Regular expressions_](https://en.wikipedia.org/wiki/Regular_expression) are an important concept in formal language theory. They are a way to describe a possibly infinite set of character strings (called a _language_). A regular expression, at its core, needs the following features:
- A set of _characters_ that can be used in the language, called the _alphabet_.
- _Concatenation_: `ab` means "the character `a` followed by the character `b`".
- _Union_: `a|b` means "either `a` or `b`".
- _Kleene star_: `a*` means "zero or more `a` characters".
Assuming a finite alphabet (such as the 26 letters of the English alphabet, or the entire Unicode character set), all regular languages can be generated by the features above. Of course, many patterns are very tedious to express this way (such as "10 digits" or "a character that's not a space"), so JavaScript regular expressions include many shorthands, introduced below.
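As a rough illustration of why the shorthands matter, the sketch below writes "one or more digits" twice: once using only concatenation, union, and the Kleene star, and once using shorthands. The variable names and sample strings are made up for this example, and the `^` and `$` anchors are only there to force a whole-string match.

```js
// Built only from union (|), concatenation, and Kleene star (*).
// The parentheses are used purely for grouping.
const digitsCore = /^(0|1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*$/;

// The same language written with the \d and + shorthands.
const digitsShort = /^\d+$/;

const samples = ["2024", "7", "", "12a"];
const coreResults = samples.map((s) => digitsCore.test(s));
const shortResults = samples.map((s) => digitsShort.test(s));

console.log(coreResults); // [true, true, false, false]
console.log(shortResults); // [true, true, false, false]
```

Both patterns accept and reject the same sample strings; the shorthand version is simply easier to read and maintain.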
> [!NOTE]
> JavaScript regular expressions are in fact not regular, due to the existence of [backreferences](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference) (a regular language must be recognizable with a finite number of states). However, they are still a very useful feature.
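To make the note above concrete, here is a minimal sketch of a pattern that relies on a backreference to require the same number of `a` characters on both sides of a `b`. Strings of this shape cannot be described with concatenation, union, and Kleene star alone. The name `balanced` is hypothetical.

```js
// \1 must repeat exactly what the first group captured, so the run of
// "a" characters after the "b" has to be as long as the run before it.
const balanced = /^(a+)b\1$/;

console.log(balanced.test("aabaa")); // true
console.log(balanced.test("aabaaa")); // false
console.log(balanced.test("aba")); // true
```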
### Creating regular expressions
A regular expression is typically created as a literal by enclosing a pattern in forward slashes (`/`):
```js
const regex1 = /ab+c/g;
```
Regular expressions can also be created with the {{jsxref("RegExp/RegExp", "RegExp()")}} constructor:
```js
const regex2 = new RegExp("ab+c", "g");
```
The two forms produce equivalent regular expressions at runtime, although the choice between them may affect performance, static analyzability, and the ergonomics of escaping characters. For more information, see the [`RegExp`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#literal_notation_and_constructor) reference.
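The escaping difference is probably the easiest one to trip over. In the sketch below (variable names and the `unit` value are hypothetical), the constructor receives a string, so each backslash meant for the regex has to be doubled to survive string parsing. In exchange, the constructor can assemble a pattern from values that are only known at runtime.

```js
// Literal: backslashes are written once.
const fromLiteral = /\d+\.\d+/;
// Constructor: the pattern is a string, so each backslash is doubled.
const fromConstructor = new RegExp("\\d+\\.\\d+");

const input = "pi is about 3.14";
const literalMatch = input.match(fromLiteral)[0];
const constructorMatch = input.match(fromConstructor)[0];
console.log(literalMatch, constructorMatch); // "3.14" "3.14"

// The constructor can build a pattern from runtime values.
// `unit` is a made-up value; it contains no special regex characters.
const unit = "px";
const sized = new RegExp("\\d+" + unit);
const sizedMatches = sized.test("width: 12px");
console.log(sizedMatches); // true
```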
### Regex flags
Flags are special parameters that can change the way a regular expression is interpreted or the way it interacts with the input text. Each flag corresponds to one accessor property on the `RegExp` object.
| Flag | Description | Corresponding property |
| ---- | --------------------------------------------------------------------------------------------- | ----------------------------------------------- |
| `d` | Generate indices for substring matches. | {{jsxref("RegExp/hasIndices", "hasIndices")}} |
| `g` | Global search. | {{jsxref("RegExp/global", "global")}} |
| `i` | Case-insensitive search. | {{jsxref("RegExp/ignoreCase", "ignoreCase")}} |
| `m` | Allows `^` and `$` to match next to newline characters. | {{jsxref("RegExp/multiline", "multiline")}} |
| `s` | Allows `.` to match newline characters. | {{jsxref("RegExp/dotAll", "dotAll")}} |
| `u` | "Unicode"; treat a pattern as a sequence of Unicode code points. | {{jsxref("RegExp/unicode", "unicode")}} |
| `v` | An upgrade to the `u` mode with more Unicode features. | {{jsxref("RegExp/unicodeSets", "unicodeSets")}} |
| `y` | Perform a "sticky" search that matches starting at the current position in the target string. | {{jsxref("RegExp/sticky", "sticky")}} |
The `i`, `m`, and `s` flags can be enabled or disabled for specific parts of a regex using the [modifier](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier) syntax.
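The following sketch shows a few of the flags from the table in action, along with their accessor properties. The sample text and variable names are made up for illustration.

```js
// Flags may be written in any order; `flags` reports them in a fixed order.
const regex = /^b.*$/smig;
console.log(regex.flags); // "gims"
console.log(regex.global, regex.ignoreCase, regex.multiline, regex.dotAll);
// true true true true

const text = "Apple\nbanana\nBerry";
// `g` finds every match, `i` ignores case,
// and `m` lets ^ and $ match next to newline characters.
const lines = text.match(/^b\w*$/gim);
console.log(lines); // ["banana", "Berry"]

// `s` lets the wildcard match a newline character.
const withoutS = /a.b/.test("a\nb");
const withS = /a.b/s.test("a\nb");
console.log(withoutS, withS); // false true

// `y` only attempts a match at the position stored in lastIndex.
const sticky = /\d/y;
sticky.lastIndex = 1;
const stickyAtOne = sticky.test("a1");
console.log(stickyAtOne); // true
```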
The sections below list all available regex syntaxes, grouped by their syntactic nature.
### Assertions
Assertions are constructs that test whether the string meets a certain condition at the specified position, but do not consume characters. Assertions cannot be [quantified](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier).
- [Input boundary assertion: `^`, `$`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Input_boundary_assertion)
- : Asserts that the current position is the start or end of input, or start or end of a line if the `m` flag is set.
- [Lookahead assertion: `(?=...)`, `(?!...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion)
- : Asserts that the current position is followed or not followed by a certain pattern.
- [Lookbehind assertion: `(?<=...)`, `(?<!...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookbehind_assertion)
- : Asserts that the current position is preceded or not preceded by a certain pattern.
- [Word boundary assertion: `\b`, `\B`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion)
- : Asserts that the current position is a word boundary.
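The assertions listed above can be sketched as follows. Because assertions do not consume characters, the text they test never appears in the match result. All names and sample strings are illustrative.

```js
// Lookbehind and lookahead: digits preceded by "$" and followed by ".".
const price = /(?<=\$)\d+(?=\.)/;
const dollars = "Total: $42.50".match(price)[0];
console.log(dollars); // "42"

// Word boundary: "cat" as a whole word only, not inside "concat".
const wholeWords = "cat concat cat".match(/\bcat\b/g);
console.log(wholeWords.length); // 2

// Negative lookahead: a word that does not start with "admin".
const notAdmin = /^(?!admin)\w+$/;
console.log(notAdmin.test("guest"), notAdmin.test("adminUser")); // true false

// Input boundaries: with `m`, ^ and $ also match at line breaks.
const perLine = "one\ntwo".match(/^\w+$/gm);
const wholeInput = /^\w+$/.test("one\ntwo");
console.log(perLine, wholeInput); // ["one", "two"] false
```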
### Atoms
Atoms are the most basic units of a regular expression. Each atom _consumes_ one or more characters in the string, and either fails the match or allows the pattern to continue matching with the next atom.
- [Backreference: `\1`, `\2`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference)
- : Matches a previously matched subpattern captured with a capturing group.
- [Capturing group: `(...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Capturing_group)
- : Matches a subpattern and remembers information about the match.
- [Character class: `[...]`, `[^...]`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class)
- : Matches any character in or not in a set of characters. When the [`v`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets) flag is enabled, it can also be used to match finite-length strings.
- [Character class escape: `\d`, `\D`, `\w`, `\W`, `\s`, `\S`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class_escape)
- : Matches any character in or not in a predefined set of characters.
- [Character escape: `\n`, `\u{...}`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape)
- : Matches a character that may not be able to be conveniently represented in its literal form.
- [Literal character: `a`, `b`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character)
- : Matches a specific character.
- [Modifier: `(?ims-ims:...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier)
- : Overrides flag settings in a specific part of a regular expression.
- [Named backreference: `\k<name>`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Named_backreference)
- : Matches a previously matched subpattern captured with a named capturing group.
- [Named capturing group: `(?<name>...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Named_capturing_group)
- : Matches a subpattern and remembers information about the match. The group can later be identified by a custom name instead of by its index in the pattern.
- [Non-capturing group: `(?:...)`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Non-capturing_group)
- : Matches a subpattern without remembering information about the match.
- [Unicode character class escape: `\p{...}`, `\P{...}`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape)
- : Matches a set of characters specified by a Unicode property. When the [`v`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets) flag is enabled, it can also be used to match finite-length strings.
- [Wildcard: `.`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Wildcard)
- : Matches any character except line terminators, unless the `s` flag is set.
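A short sketch combining several of the atoms above: named and numbered capturing groups, backreferences, and a non-capturing group. The patterns and sample strings are invented for this example.

```js
// Named capturing groups, also reachable by index.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = "Released on 2024-03-15.".match(dateRe);
console.log(dateMatch.groups.year, dateMatch[2]); // "2024" "03"

// Backreference: \1 must repeat what the first group captured.
const doubled = /\b(\w+) \1\b/;
const repeated = "it was was fine".match(doubled);
console.log(repeated[0], repeated[1]); // "was was" "was"

// Named backreference: the closing quote must equal the opening one.
const quoted = /(?<quote>["'])(.*?)\k<quote>/;
const inner = 'say "hi" now'.match(quoted)[2];
console.log(inner); // "hi"

// Non-capturing group: groups for the quantifier, remembers nothing.
const abRun = "ababab".match(/(?:ab)+/);
console.log(abRun[0], abRun.length); // "ababab" 1
```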
### Other features
These features do not specify any pattern themselves, but are used to compose patterns.
- [Disjunction: `|`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Disjunction)
- : Matches any of a set of alternatives separated by the `|` character.
- [Quantifier: `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier)
- : Matches an atom a certain number of times.
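As a brief sketch of how these two features compose atoms into larger patterns (the `color` and `hex` patterns are made up for illustration):

```js
// Disjunction: any one of the alternatives separated by "|".
const color = /^(?:gr[ae]y|black|white)$/;
console.log(color.test("grey"), color.test("gray"), color.test("green"));
// true true false

// Quantifiers: {3} repeats exactly three times; ? makes the group optional.
const hex = /^#[0-9a-f]{3}(?:[0-9a-f]{3})?$/i;
console.log(hex.test("#fff"), hex.test("#FFAA00"), hex.test("#ffff"));
// true true false

// {n,m} matches between n and m times, taking as many as it can.
const run = "aaaa".match(/a{2,3}/)[0];
console.log(run); // "aaa"
```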
### Escape sequences
_Escape sequences_ in regexes refer to any kind of syntax formed by `\` followed by one or more characters. They may serve very different purposes depending on what follows `\`. Below is a list of all valid "escape sequences":
| Escape sequence | Followed by | Meaning |
| --------------- | ----------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `\B` | None | [Non-word-boundary assertion][WBA] |
| `\D` | None | [Character class escape][CCE] representing non-digit characters |
| `\P` | `{`, a Unicode property and/or value, then `}` | [Unicode character class escape][UCCE] representing characters without the specified Unicode property |
| `\S` | None | [Character class escape][CCE] representing non-white-space characters |
| `\W` | None | [Character class escape][CCE] representing non-word characters |
| `\b` | None | [Word boundary assertion][WBA]; inside [character classes][CC], represents U+0008 (BACKSPACE) |
| `\c` | A letter from `A` to `Z` or `a` to `z` | A [character escape][CE] representing the control character with value equal to the letter's character value modulo 32 |
| `\d` | None | [Character class escape][CCE] representing digit characters (`0` to `9`) |
| `\f` | None | [Character escape][CE] representing U+000C (FORM FEED) |
| `\k` | `<`, an identifier, then `>` | A [named backreference][NBR] |
| `\n` | None | [Character escape][CE] representing U+000A (LINE FEED) |
| `\p` | `{`, a Unicode property and/or value, then `}` | [Unicode character class escape][UCCE] representing characters with the specified Unicode property |
| `\q`            | `{`, a string, then `}`                                           | Only valid inside [`v`-mode character classes][VCC]; represents the string to be matched literally                     |
| `\r` | None | [Character escape][CE] representing U+000D (CARRIAGE RETURN) |
| `\s` | None | [Character class escape][CCE] representing whitespace characters |
| `\t` | None | [Character escape][CE] representing U+0009 (CHARACTER TABULATION) |
| `\u` | 4 hexadecimal digits; or `{`, 1 to 6 hexadecimal digits, then `}` | [Character escape][CE] representing the character with the given code point |
| `\v` | None | [Character escape][CE] representing U+000B (LINE TABULATION) |
| `\w` | None | [Character class escape][CCE] representing word characters (`A` to `Z`, `a` to `z`, `0` to `9`, `_`) |
| `\x` | 2 hexadecimal digits | [Character escape][CE] representing the character with the given value |
| `\0` | None | [Character escape][CE] representing U+0000 (NULL) |
[CC]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class
[CCE]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class_escape
[CE]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape
[NBR]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Named_backreference
[UCCE]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape
[VCC]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class#v-mode_character_class
[WBA]: /en-US/docs/Web/JavaScript/Reference/Regular_expressions/Word_boundary_assertion
`\` followed by any other digit character becomes a [legacy octal escape sequence](/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features#escape_sequences), which is forbidden in [Unicode-aware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode).
In addition, `\` can be followed by some non-letter-or-digit characters, in which case the escape sequence is always a [character escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape) representing the escaped character itself:
<!-- Note: the {} need to be double-escaped, once for Yari -->
- `\$`, `\(`, `\)`, `\*`, `\+`, `\.`, `\/`, `\?`, `\[`, `\\`, `\]`, `\^`, `\\{`, `\|`, `\\}`: valid everywhere
- `\-`: only valid inside [character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class)
- `\!`, `\#`, `\%`, `\&`, `\,`, `\:`, `\;`, `\<`, `\=`, `\>`, `\@`, `` \` ``, `\~`: only valid inside [`v`-mode character classes](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class#v-mode_character_class)
The other {{Glossary("ASCII")}} characters, namely the space character, `"`, `'`, `_`, and any letter character not mentioned above, are not valid escape sequences. In [Unicode-unaware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode), escape sequences that are not one of the above become _identity escapes_: they represent the character that follows the backslash. For example, `\a` represents the character `a`. This behavior limits the ability to introduce new escape sequences without causing backward compatibility issues, and is therefore forbidden in Unicode-aware mode.
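The difference between the two modes can be sketched as below. The patterns that might be invalid are passed to the `RegExp()` constructor so that any error surfaces at runtime, where it can be caught, rather than when the script is parsed. Variable names are illustrative.

```js
// Unicode-unaware mode: \a is an identity escape and matches "a".
const legacy = new RegExp("\\a");
const legacyMatches = legacy.test("a");
console.log(legacyMatches); // true

// Unicode-aware mode: the same pattern is rejected.
let unicodeError = null;
try {
  new RegExp("\\a", "u");
} catch (error) {
  unicodeError = error;
}
console.log(unicodeError instanceof SyntaxError); // true

// Escaped syntax characters represent themselves in both modes.
const escapedSyntax = /\$\(\)/.test("$()");
// \cJ is the control character with value 74 % 32 = 10 (LINE FEED).
const controlJ = /\cJ/.test("\n");
// Inside a character class, \b is U+0008 (BACKSPACE).
const backspace = /[\b]/.test("\b");
// \x41 and \u{41} both represent "A"; the braces form needs the `u` flag.
const hexA = /\x41/.test("A");
const codePoint = /\u{41}/u.test("A");
console.log(escapedSyntax, controlJ, backspace, hexA, codePoint);
// true true true true true
```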
## Specifications
{{Specifications}}
## Browser compatibility
{{Compat}}
## See also
- [Regular expressions](/en-US/docs/Web/JavaScript/Guide/Regular_expressions) guide
- {{jsxref("RegExp")}}