Unix regular expressions. Using Grep and Regular Expressions to Find Text Patterns in Linux

One of the most useful and feature-rich commands in the Linux terminal is the “grep” command. Grep is an acronym that stands for “global regular expression print” (that is, “search everywhere for strings matching a regular expression and print them out”). This means that grep can be used to see if input matches specified patterns.

This seemingly simple program is very powerful when used correctly. Its ability to filter input based on complex rules makes it a popular link in many command pipelines.

This tutorial looks at some of the grep command's capabilities and then moves on to using regular expressions. All the techniques described in this guide can be applied to managing a virtual server.

Basics of use

In its simplest form, grep is used to find matches of literal patterns in a text file. If you give grep a search word, it prints every line in the file that contains that word.

As an example, you can use grep to find lines containing the word "GNU" in version 3 of the GNU General Public License on an Ubuntu system.

cd /usr/share/common-licenses
grep "GNU" GPL-3
GNU GENERAL PUBLIC LICENSE
...
13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...

The first argument, "GNU", is the pattern to search for, and the second argument, "GPL-3", is the input file to search.

As a result, all lines containing the text pattern will be output. In some Linux distributions the pattern you are looking for will be highlighted in the output lines.
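To try this without the license file, here is a minimal sketch that searches a small throwaway file. The file contents and the variable names `sample` and `matches` are assumptions made for illustration only.

```shell
# Create a small throwaway file (hypothetical contents).
sample=$(mktemp)
printf '%s\n' 'GNU General Public License' 'free software' 'the GNU project' > "$sample"

# First argument: the pattern. Second argument: the file to search.
matches=$(grep "GNU" "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Only the first and third lines are printed, because the second line does not contain "GNU".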

General options

By default, the grep command simply searches for strictly specified patterns in the input file and prints the lines it finds. However, grep's behavior can be changed by adding some additional flags.

If you need to ignore the case of the search pattern and match both uppercase and lowercase variations, you can use the "-i" or "--ignore-case" option.

As an example, you can use grep to search the same file for the word "license" written in uppercase, lowercase, or mixed case.

grep -i "license" GPL-3
GNU GENERAL PUBLIC LICENSE
of this license document, but changing it is not allowed.
The GNU General Public License is a free, copyleft license for
The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to


"This License" refers to version 3 of the GNU General Public License.
"The Program" refers to any copyrightable work licensed under this
...
...

As you can see, the output contains "LICENSE", "license", and "License". If there was an instance of "LiCeNsE" in the file, it would also be output.

If you need to find all lines that do not contain the specified pattern, you can use the "-v" or "--invert-match" option.

As an example, you could use the following command to search the BSD license for all lines that do not contain the word "the":

grep -v "the" BSD
All rights reserved.
Redistribution and use in source and binary forms, with or without
are met:
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

As you can see, the last two lines were output as not containing the word "the": they contain only the uppercase "THE", and the "ignore case" option was not used.

It is often useful to know the line numbers where the matches were found. You can print them with the "-n" or "--line-number" option.

If you apply this flag in the previous example, the following result will be displayed:

grep -vn "the" BSD
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13: may be used to endorse or promote products derived from this software
14: without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

You can now refer to the line number when you need to make changes on each line that does not contain "the".
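The three options above can be tried together on a few lines of text. This is only a sketch: the three input lines and the variable names are invented for illustration.

```shell
# Hypothetical three-line input, piped to grep on stdin.
text=$(printf '%s\n' 'The License' 'no match here' 'under the LICENSE')

# -i: ignore case, so "License" and "LICENSE" both match "license".
ci=$(printf '%s\n' "$text" | grep -i "license")

# -v inverts the match; -n prefixes each printed line with its line number.
# Line 1 is printed because "The" is not the same as lowercase "the".
inv=$(printf '%s\n' "$text" | grep -vn "the")

printf '%s\n' "$ci" "$inv"
```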

Regular Expressions

As mentioned in the introduction, grep stands for “global regular expression print”. A regular expression is a text string that describes a specific search pattern.

Different applications and programming languages use regular expressions slightly differently. This tutorial covers only a small subset of the ways to describe patterns for grep.

Literal matches

In the examples above, the searches for "GNU" and "the" used very simple regular expressions that exactly matched the character strings "GNU" and "the".

It is more accurate to think of them as matches of strings of characters rather than as matches of words. Once you become familiar with more complex patterns, this distinction will become more significant.

Patterns that exactly match the given characters are called "literals" because they match the pattern literally, character by character.

All alphabetic and numeric characters (and some other characters) match literally unless they have been modified by other expression mechanisms.

Anchor matches

Anchors are special characters that indicate the location in a string of the desired match.

For example, you can restrict the search to lines that contain "GNU" at the very beginning. To do this, place the anchor "^" before the literal string.

This example only prints lines that contain the word "GNU" at the beginning.

grep "^GNU" GPL-3
GNU General Public License for most of our software; it applies also to
GNU General Public License, you may choose any version ever published

Likewise, the anchor "$" can be used after a literal string to indicate that the match is valid only if the character string being searched is at the end of the text string.

The following regular expression prints only those lines that contain "and" at the end:

grep "and$" GPL-3
that there is no warranty for this free software. For both users' and
The precise terms and conditions for copying, distribution and


alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
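As a small illustration of both anchors, the sketch below runs them against four invented lines; the sample text and variable names are assumptions. Single quotes are used so that the shell never tries to interpret the "$".

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'GNU is first' 'not GNU first' 'modify and' 'and modify')

# ^ anchors the match to the start of the line.
starts=$(printf '%s\n' "$text" | grep '^GNU')

# $ anchors the match to the end of the line.
ends=$(printf '%s\n' "$text" | grep 'and$')

printf '%s\n' "$starts" "$ends"
```

The first command prints only "GNU is first" and the second only "modify and".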

Match any character

The period (.) is used in regular expressions to indicate that any character can appear at the specified location.

For example, if you want to find matches that contain two characters and then the sequence "cept", you would use the following pattern:

grep "..cept" GPL-3
use, which is precisely where it is most unacceptable. Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the

form of a separately written license, or stated as exceptions;
You may not propagate or modify a covered work except as expressly
9. Acceptance Not Required for Having Copies.
...
...

As you can see, the results include the words “accept” and “except”, as well as variations of these words. The pattern would also match the sequence “z2cept” if it were in the text.
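The behavior of the period can be checked on a handful of words. The word list here is made up for the purpose of the sketch.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' 'accept' 'except' 'z2cept' 'cept' 'xcept')

# Each period stands for exactly one character, so at least two
# characters must precede "cept" for a line to match.
hits=$(printf '%s\n' "$words" | grep '..cept')

printf '%s\n' "$hits"
```

"cept" and "xcept" are not printed because fewer than two characters come before "cept".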

Bracket expressions

By placing a group of characters within square brackets ("[" and "]"), you can indicate that any one of the characters in the brackets can appear at that position.

This means that if you need to find strings containing "too" or "two", you can express both variations concisely with the following pattern:

grep "t[wo]o" GPL-3
your programs, too.

Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.

Corresponding Source from a network server at no charge.
...
...

As you can see, both variations were found in the file.

Bracket expressions also provide several other useful features. You can indicate that everything except the characters in the brackets matches by starting the list of characters with the character "^".

This example works like the pattern ".ode", except that it does not match the sequence "code".

grep "[^c]ode" GPL-3
1. Source Code.
model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:

It's worth noting that the second line output contains the word "code". This is not a regex or grep error.

Rather, this line was printed because it also contains the pattern-matching sequence "mode" found in the word "model". That is, the string was printed because it matched the pattern.
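A short sketch of both bracket forms, run against invented sample lines (the text and variable names are assumptions):

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'too' 'two' 'tao' 'object code' 'mode of use' 'code model')

# [wo] matches exactly one character that is either "w" or "o".
either=$(printf '%s\n' "$text" | grep 't[wo]o')

# [^c] matches any one character except "c".
notc=$(printf '%s\n' "$text" | grep '[^c]ode')

printf '%s\n' "$either" "$notc"
```

The second search prints "mode of use" and "code model". The latter appears despite containing "code", because "model" supplies a match of its own.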

Another useful feature of brackets is the ability to specify a range of characters instead of entering each character separately.

This means that if you need to find every line that starts with a capital letter, you can use the following pattern:

grep "^[A-Z]" GPL-3
GNU General Public License for most of our software; it applies also to

License. Each licensee is addressed as "you". "Licenses" and


System Libraries, or general-purpose tools or generally available free
Source.

...
...

Because of legacy collation (sorting) issues, it is often more accurate to use POSIX character classes instead of character ranges like the one in the example above.

There are many character classes not covered in this guide; for example, to do the same thing as in the example above, you can use the character class "[:upper:]" inside a bracket expression.

grep "^[[:upper:]]" GPL-3
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License. Each licensee is addressed as "you". "Licenses" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...
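Here is a minimal sketch of POSIX character classes on invented input. Note the doubled brackets: the class name `[:upper:]` is only valid inside a bracket expression. The sample lines and variable names are assumptions.

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'Source.' 'source.' '3 clauses' 'License')

# Lines that start with an uppercase letter.
upper=$(printf '%s\n' "$text" | grep '^[[:upper:]]')

# Lines that start with a digit.
digit=$(printf '%s\n' "$text" | grep '^[[:digit:]]')

printf '%s\n' "$upper" "$digit"
```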

Repeat pattern (0 or more times)

One of the most commonly used metacharacters is the "*" symbol, which means "repeat the previous character or expression 0 or more times."

For example, if you want to find every line that contains an opening and a closing parenthesis with only letters and single spaces between them, you can use the following expression:

grep "([A-Za-z ]*)" GPL-3

distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
(including a physical distribution medium), accompanied by the
(including a physical distribution medium), accompanied by a
place (gratis or for a charge), and offer equivalent access to the
...
...

Escaping metacharacters

Sometimes you may need to look for a literal period or a literal opening bracket. Because these characters have a special meaning in regular expressions, you need to "escape" them to tell grep that their special meaning is not wanted in this case.

To escape a character that would normally have a special meaning, put a backslash (\) before it.

For example, if you need to find a line that starts with a capital letter and ends with a period, you can use the expression below. The backslash before the last dot tells grep to "escape" it, so that the last dot represents a literal dot and does not mean "any character":

grep "^[A-Z].*\.$" GPL-3
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.
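The star and the escaped period can be checked together on a few invented lines. This is a sketch under the assumption that plain grep (basic regular expressions) is used, where unescaped parentheses are literal characters.

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'Source.' 'no period here' 'Ends with any char!' '(a note) inside')

# [A-Z] one capital letter, .* any run of characters, \. a literal period.
period=$(printf '%s\n' "$text" | grep '^[A-Z].*\.$')

# Literal parentheses around zero or more letters or spaces.
paren=$(printf '%s\n' "$text" | grep '([A-Za-z ]*)')

printf '%s\n' "$period" "$paren"
```

Without the backslash, the final "." would accept any character, and "Ends with any char!" would match as well.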

Extended Regular Expressions

The grep command can also be used with an extended regular expression language by using the -E flag or by calling the egrep command instead of grep. (Recent GNU grep releases mark egrep as obsolescent, so grep -E is the safer choice.)

These commands open up the capabilities of "extended regular expressions". Extended regular expressions include all the basic metacharacters, as well as additional metacharacters to express more complex matches.

Grouping

One of the simplest and most useful features that extended regular expressions provide is the ability to group expressions and use them as a single unit.

Parentheses are used to group expressions. In basic regular expressions, parentheses act as grouping operators only when they are escaped with a backslash; in extended regular expressions they group as-is:

grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt

The above expressions are equivalent.
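To see the equivalence, the sketch below applies the star to a whole group in both syntaxes. The input lines and variable names are hypothetical.

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'abcabcd' 'abd' 'd')

# Basic syntax: the group is written with escaped parentheses.
bre=$(printf '%s\n' "$text" | grep '^\(abc\)*d$')

# Extended syntax: the same group with plain parentheses.
ere=$(printf '%s\n' "$text" | grep -E '^(abc)*d$')

printf '%s\n' "$bre"
```

Both commands print "abcabcd" and "d": the group "abc" repeats as a unit zero or more times, so "abd" does not match.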

Alternation

Just as square brackets define the possible matches for a single character, alternation allows you to specify alternative matches for strings of characters or sets of expressions.

Alternation is indicated with the vertical bar symbol "|". It is often used inside a group to indicate that any one of two or more possible options should be considered a match.

In this example, you need to look for “GPL” or “General Public License”:

grep -E "(GPL|General Public License)" GPL-3
The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
Developers that use the GNU GPL protect your rights with two steps:
For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...

Alternation can choose between more than two options: add the remaining options to the group, separating each one with the vertical bar symbol "|".
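A brief sketch of alternation with two and with three options; the sample lines and variable names are invented for illustration.

```shell
# Hypothetical input lines.
text=$(printf '%s\n' 'the GPL explains' 'General Public License' 'public domain')

# Two alternatives inside a group.
alt=$(printf '%s\n' "$text" | grep -E '(GPL|General Public License)')

# Three alternatives; -c prints the number of matching lines.
three=$(printf '%s\n' "$text" | grep -Ec 'GPL|License|domain')

printf '%s\n' "$alt" "$three"
```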

Quantifiers

In extended regular expressions, there are metacharacters that indicate how often a character is repeated, much like the metacharacter "*" indicates that the previous character or string of characters matches 0 or more times.

To match a character zero or one time, you can use the "?" character. It makes the previous character or group of characters essentially optional.

In this example, placing the sequence "copy" in an optional group matches both "copyright" and "right":

grep -E "(copy)?right" GPL-3
Copyright (C) 2007 Free Software Foundation, Inc.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
"Copyright" also means copyright-like laws that apply to other kinds of
...
...

The "+" character matches expressions 1 or more times. It works almost like the "*" symbol, but when using "+" the expression must match at least 1 time.

The following expression matches the string "free" plus 1 or more characters that are not whitespace:

grep -E "free[^[:space:]]+" GPL-3
The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for

freedoms that you received. You must make sure that they, too, receive
protecting users' freedom to change the software. The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
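The difference between "?" and "+" shows up clearly on a short word list. The words and variable names below are assumptions made for this sketch.

```shell
# Hypothetical word list.
text=$(printf '%s\n' 'copyright' 'right' 'free' 'freedom')

# (copy)? makes the whole group optional.
opt=$(printf '%s\n' "$text" | grep -E '^(copy)?right$')

# + requires at least one non-whitespace character after "free".
plus=$(printf '%s\n' "$text" | grep -E 'free[^[:space:]]+')

printf '%s\n' "$opt" "$plus"
```

The bare word "free" is not printed by the second command, because nothing follows it. With "*" in place of "+" it would match.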

Specifying the number of repetitions

If necessary, you can use braces ("{" and "}"). These symbols are used to indicate the exact number, a range, or the upper and lower limits of the number of times an expression may match.

If you need to find all lines that contain three vowels in a row, you can use the following expression:

grep -E "[AEIOUaeiou]{3}" GPL-3
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this

If you need to find all words consisting of 16-20 characters, use the following expression:

grep -E "[[:alpha:]]{16,20}" GPL-3
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
c) Prohibiting misrepresentation of the origin of that material, or
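Both brace forms can be tried on a few sample words. The word list and variable names are hypothetical.

```shell
# Hypothetical word list.
text=$(printf '%s\n' 'previous' 'queue' 'responsibilities' 'short')

# {3}: exactly three vowels in a row somewhere in the line.
vowels=$(printf '%s\n' "$text" | grep -E '[AEIOUaeiou]{3}')

# {16,20}: a run of 16 to 20 letters.
long=$(printf '%s\n' "$text" | grep -E '[[:alpha:]]{16,20}')

printf '%s\n' "$vowels" "$long"
```

Because the second pattern is not anchored, a line with a run of more than 20 letters would also match: the pattern only needs to find 16 consecutive letters somewhere in the line.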

Conclusions

In many cases, the grep command is useful for finding patterns within files or in a file system hierarchy. It saves a lot of time, so it's worth familiarizing yourself with its options and syntax.

Regular expressions are even more versatile and can be used in many popular programs. For example, many text editors use regular expressions to search and replace text.

Moreover, most modern programming languages use regular expressions to operate on specific pieces of data. Knowing how to work with regular expressions comes in handy when solving common computing problems.

Regular expressions are a very powerful tool for searching text by pattern, processing and modifying strings, which can be used to solve many problems. Here are the main ones:

  • Text input check;
  • Search and replace text in a file;
  • Batch renaming of files;
  • Interaction with services such as Apache;
  • Checking a string for matching a pattern.

This is far from a complete list; regular expressions allow you to do much more. To new users they may seem too complicated, since they are written in a special language. But given the capabilities they provide, every system administrator should know and be able to use Linux regular expressions.

In this article, we'll look at bash regular expressions for beginners so that you can understand all the features of this tool.

There are two types of characters that can be used in regular expressions:

  • ordinary characters;
  • metacharacters.

Ordinary characters are the letters, numbers, and punctuation marks that make up any string. All texts are made up of such characters, and you can use them in regular expressions to find the desired position in the text.

Metacharacters are something else: they are what give regular expressions their power. With metacharacters you can do much more than just search for a single character. You can search for combinations of characters, allow a variable number of characters, and select ranges. All special characters can be divided into two types: replacement characters, which stand in for ordinary characters, and operators, which indicate how many times a character can be repeated. The regular expression syntax looks like this:

regular_character special_character_operator

special_replacement_character special_character_operator

  • \ - the escape character: alphabetic special sequences begin with a backslash, and it is also used when a punctuation metacharacter must be treated as a literal character;
  • ^ - indicates the beginning of the line;
  • $ - indicates the end of the line;
  • * - the previous character may be repeated 0 or more times;
  • + - the previous character must be repeated one or more times;
  • ? - the previous character may occur zero times or once;
  • {n} - the previous character must be repeated exactly n times;
  • {N,n} - the previous character may be repeated from N to n times;
  • . - any character except line feed;
  • [az] - any character specified in the brackets;
  • x|y - character x or character y;
  • [^az] - any character except those specified in the brackets;
  • [a-z] - any character from the specified range;
  • [^a-z] - any character that is not in the range;
  • \b - a word boundary;
  • \B - a position that is not a word boundary; for example, ux\B matches the "ux" in uxb or tuxedo, but not in Linux;
  • \d - a digit;
  • \D - a non-digit character;
  • \n - line feed character;
  • \s - a whitespace character: space, tab, and so on;
  • \S - any non-whitespace character;
  • \t - tab character;
  • \v - vertical tab character;
  • \w - any word character: a letter, a digit, or an underscore;
  • \W - any character that is not a word character;
  • \uXXXX - the Unicode character with hexadecimal code XXXX.

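It is important to note that you must put a backslash before alphabetic special characters to indicate that a special character comes next. The reverse is also true: if you want to use a special character that is normally written without a backslash as an ordinary character, you have to add a backslash. Note also that grep's basic and extended syntaxes do not support every escape in this list: GNU grep accepts \w, \W, \s, \S, \b, and \B as extensions, but \d, \D, \n, \t, \v, and \uXXXX generally require a Perl-compatible engine (for example, grep -P where it is available).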

For example, suppose you want to find the string 1+2=3 in the text. If you use this string as an extended regular expression, you won't find it, because the plus is interpreted as a special character meaning that the preceding 1 must be repeated one or more times. So it needs to be escaped: 1\+2=3. Without escaping, the regular expression matches only strings such as 12=3 or 112=3. There is no need to put a backslash in front of the equals sign, because it is not a special character.
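The effect of escaping the plus sign can be checked directly. This sketch assumes extended syntax (grep -E); the three input lines are invented.

```shell
# Hypothetical input lines.
text=$(printf '%s\n' '1+2=3' '12=3' '112=3')

# Unescaped: "1+" means one or more "1" characters, followed by "2=3".
unescaped=$(printf '%s\n' "$text" | grep -E '1+2=3')

# Escaped: "\+" is a literal plus sign.
escaped=$(printf '%s\n' "$text" | grep -E '1\+2=3')

printf '%s\n' "$unescaped" "$escaped"
```

The unescaped pattern prints "12=3" and "112=3" but not "1+2=3"; the escaped pattern prints only "1+2=3".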

Examples of using regular expressions

Now that we've covered the basics, all that remains is to put what you've learned about Linux grep regular expressions into practice. Two very useful special characters are ^ and $, which indicate the beginning and end of a line. For example, suppose we want to list all users registered in our system whose name starts with s. For that we can use the regular expression "^s" with the egrep command:

egrep "^s" /etc/passwd

If we want to select lines based on the last characters in the line, we can use $ for this. For example, let's select all system users without a shell; the records for such users end in false:

egrep "false$" /etc/passwd

To display usernames that begin with s or d, use this expression:

egrep "^[sd]" /etc/passwd

The same result can be obtained by using the "|" symbol. The first option is more suitable for ranges, and the second is more often used for ordinary either/or choices:

egrep "^(s|d)" /etc/passwd

Now let's select all users whose name is exactly three characters long. The username ends with a colon, so we can say that any word character must be repeated three times before the colon:

egrep "^\w{3}:" /etc/passwd
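The same searches can be tried safely on made-up records instead of the real /etc/passwd. In this sketch the four records are hypothetical, grep -E stands in for egrep, and the POSIX class [[:alnum:]_] is used as a portable substitute for \w.

```shell
# Hypothetical passwd-style records.
passwd=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'sshd:x:74:74::/var/empty:/bin/false' \
  'daemon:x:1:1::/:/usr/sbin/nologin' \
  'bin:x:2:2::/bin:/bin/false')

# Names starting with s or d, written two ways.
by_class=$(printf '%s\n' "$passwd" | grep -E '^[sd]')
by_alt=$(printf '%s\n' "$passwd" | grep -E '^(s|d)')

# Records ending in "false".
noshell=$(printf '%s\n' "$passwd" | grep -E 'false$')

# Names exactly three characters long.
three=$(printf '%s\n' "$passwd" | grep -E '^[[:alnum:]_]{3}:')

printf '%s\n' "$by_class" "$noshell" "$three"
```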

Conclusions

In this article we covered Linux regular expressions, but that was just the basics. If you dig a little deeper, you will find that you can do a lot more interesting things with this tool. Taking the time to master regular expressions will definitely be worth it.

One of the most useful and feature-rich commands in the Linux terminal is the "grep" command. The name is an acronym of the English phrase "search Globally for lines matching the Regular Expression, and Print them". The "grep" command scans the input stream line by line, looking for matches, and outputs (filters) only those lines that contain text matching the given pattern, a regular expression.

Regular expressions are a special formal language for searching and manipulating substrings in text, based on the use of metacharacters. Almost all modern programming languages now have built-in support for regular expressions for text processing. Historically, however, the popularization of this approach owes much to the UNIX world, in particular to the ideas embodied in commands such as "grep" and "sed". The "everything is a file" philosophy permeates UNIX, and command of the tools for working with text files is one of the essential skills for every Linux user.

SAMPLE

A simple search for all lines that contain the text "Adams". This and the subsequent examples follow the same order: the command line first, then the stdin input, and then the stdout output.

The "grep" command has an impressive number of options that you can specify when running it. You can do a lot of useful things with these options, and you don't even need to be well versed in regular expression syntax.

OPTIONS

To begin with, "grep" can not only filter the standard input stdin but also search through files. By default, grep searches only the files named on the command line, but with the very useful --recursive option you can tell grep to search recursively starting from a given directory.
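A minimal sketch of a recursive search in a throwaway directory tree. The directory layout, file names, and contents are invented for illustration; -r is the short form of --recursive, and -l prints only the names of files that contain a match.

```shell
# Build a hypothetical directory tree in a temporary location.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' 'John Adams' > "$dir/top.txt"
printf '%s\n' 'Samuel Adams' 'Thomas Jefferson' > "$dir/sub/nested.txt"
printf '%s\n' 'George Washington' > "$dir/sub/other.txt"

# Search every file under $dir and list the files that contain "Adams".
found=$(grep -rl "Adams" "$dir" | sort)
count=$(printf '%s\n' "$found" | wc -l | tr -d ' ')

printf '%s\n' "$found"
rm -rf "$dir"
```

Two files are listed: top.txt and sub/nested.txt. The file sub/other.txt is searched too, but it contains no match.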

By default, the grep command is case sensitive. The following example shows a case-insensitive search, in which "Adams" and "adams" are treated as the same:

grep --ignore-case "adams"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
John Adams, 1797-1801

Inverted search: all lines are displayed except those that contain the specified pattern:

grep --invert-match "Adams"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
George Washington, 1789-1797
Thomas Jefferson, 1801-1809

Options can, of course, be combined with each other. For example, an inverted search that also displays the line numbers of the lines it prints:

grep --line-number --invert-match "Adams"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
1:George Washington, 1789-1797
3:Thomas Jefferson, 1801-1809

Coloring. Sometimes it is convenient to have the matched text highlighted in color. grep already supports this; you only need to turn it on:

grep --line-number --color=always "Adams"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
2:John Adams, 1797-1801

Suppose we want to select all errors from a log file, and we know that the line after an error may contain helpful information. In that case it is convenient to print several lines of context. By default, grep prints only the line where the match was found, but there are several options to make it print more. To print additional lines (in our case two) after each match:

grep --color=always -A2 "Adams"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817
James Monroe, 1817-1825

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817

Likewise, to print additional lines before each match:

grep --color=always -B2 "James"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817
James Monroe, 1817-1825

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817
James Monroe, 1817-1825

Most often, however, you need symmetric context, and there is an even shorter notation for this. Let's print two lines both above and below each match:

grep --color=always -C2 "James"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817
James Monroe, 1817-1825
John Quincy Adams, 1825-1829
Andrew Jackson, 1829-1837
Martin Van Buren, 1837-1841

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809
James Madison, 1809-1817
James Monroe, 1817-1825
John Quincy Adams, 1825-1829
Andrew Jackson, 1829-1837
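The three context options can be compared side by side with one line of context each. The shortened name list and the variable names are assumptions made for this sketch.

```shell
# Hypothetical five-line input.
list=$(printf '%s\n' 'Washington' 'Adams' 'Jefferson' 'Madison' 'Monroe')

# -A1: the matching line plus one line after it.
after=$(printf '%s\n' "$list" | grep -A1 'Adams')

# -B1: the matching line plus one line before it.
before=$(printf '%s\n' "$list" | grep -B1 'Madison')

# -C1: one line of context on both sides.
around=$(printf '%s\n' "$list" | grep -C1 'Jefferson')

printf '%s\n' "$after" "$before" "$around"
```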

When you search for qwe, by default "grep" also outputs lines containing qwe123, 345qwerty, and similar combinations. Let's find only those lines that contain the pattern as a whole word:

grep --word-regexp --color=always "John"

Input (stdin):
John Fitzgerald Kennedy, 1961-1963
Lyndon Baines Johnson, 1963-1969

Output (stdout):
John Fitzgerald Kennedy, 1961-1963

And finally, if you only want the number of matching lines, printed as a single number, and nothing else:

grep --count --color=always "John"

Input (stdin):
John Fitzgerald Kennedy, 1961-1963
Lyndon Baines Johnson, 1963-1969
Richard Milhous Nixon, 1969-1974

Output (stdout):
2
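The whole-word and counting options can be reproduced with their short forms, -w and -c. This sketch reuses the three names from the example; the variable names are assumptions.

```shell
names=$(printf '%s\n' 'John Fitzgerald Kennedy' 'Lyndon Baines Johnson' 'Richard Milhous Nixon')

# -w: "John" must appear as a whole word, so "Johnson" does not count.
whole=$(printf '%s\n' "$names" | grep -w 'John')

# -c: print only the number of matching lines ("Johnson" contains "John").
count=$(printf '%s\n' "$names" | grep -c 'John')

printf '%s\n' "$whole" "$count"
```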

It's worth noting that most options have a short counterpart; for example, --ignore-case can be shortened to -i, and so on.

BASIC REGULAR EXPRESSIONS

All regular expressions consist of two types of characters: standard text characters called literals, and special characters called metacharacters. In the previous examples, the search was carried out using literals (exact match of letters), but what follows will be much more interesting. Welcome to the world of regular expressions!

The caret ^ and the dollar sign $ have special meanings in a regular expression. They are called "anchors". Anchors are special characters that indicate where in a string the desired match must be located. When the search reaches an anchor, it checks whether the position matches, and if so, it continues following the pattern without adding anything to the result.

The caret anchor is used to indicate that the regular expression must match from the beginning of the line:

grep --color=always "^J"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
John Adams, 1797-1801

Similarly, the dollar anchor is used at the end of the pattern to indicate that the match is valid only if the character string being searched for is at the end of the line:

grep --color=always "9$"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
Thomas Jefferson, 1801-1809

Any character. The dot character is used in regular expressions to indicate that absolutely any character can appear at the specified location:

grep --color=always "0.$"

Escaping. If you need to find a literal dot, escaping will help. An escape character (usually a backslash) placed before a character such as a dot turns the metacharacter into a literal:

grep --color=always "\."

Input (stdin):
George Washington. 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
George Washington. 1789-1797

Character classes. Regular expressions can use ranges and character classes. Square brackets are used for this when writing the pattern. By placing a group of characters (including characters that would otherwise be interpreted as metacharacters) within square brackets, you can indicate that any one of the characters in the brackets can appear at that position:

grep --color=always "0"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Range. Inside square brackets, a range is written as two characters separated by a hyphen, for example 0-9 (decimal digits) or 0-9a-fA-F (hexadecimal digits):

grep --color=always "[0-9]"

Input (stdin):
George Washington, ???
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Negation. If the first character of the expression in square brackets is a caret, the remaining characters are taken as a set of characters that must not be present at the given position of the regular expression:

grep --color=always "[^7]$"

Input (stdin):
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output (stdout):
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

POSIX character classes. There is a set of predefined character classes that you can use in regular expressions. There are about a dozen of them; a quick look through the manual is enough to understand the purpose of each. For example, let's keep only the lines that consist entirely of hexadecimal digits:

grep --color=always "^[[:xdigit:]]*$"

Input (stdin):
4.2
42
42abc

Output (stdout):
42
42abc

GIST | Repeat (0 or more times). One of the most commonly used metacharacters is the asterisk symbol, which means “repeat the previous character or expression zero or more times”:

grep --color=always "^*$"

Input:
George Washington, ???
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output:
George Washington, ???
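A minimal sketch of the asterisk on its own, using a made-up word list (the data and the pattern "ab*c" are assumptions chosen for illustration): the "b" may appear any number of times, including not at all.

```shell
data='ac
abc
abbbc
adc'

# "ab*c": an "a", then zero or more "b" characters, then a "c".
star=$(printf '%s\n' "$data" | grep 'ab*c')

printf '%s\n' "$star"
```

Only "adc" is left out, because the character between "a" and "c" is not a "b".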

Regular expressions come in two flavors: basic (BRE, basic regular expressions) and extended (ERE, extended regular expressions). BRE recognizes the metacharacters ^ $ . [ ] * and treats all other characters as literals. ERE adds the metacharacters ( ) { } ? + | and the functionality that goes with them. To confuse matters further, grep treats the characters ( ) { } in BRE as metacharacters only when they are escaped with a backslash, whereas in ERE placing a backslash in front of any metacharacter makes it a literal.
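The reversed role of the backslash in the two flavors is easiest to see on a pair of lines, one containing "aa" and one containing the literal text "a{2}". The sketch below assumes GNU grep; the sample data and variable names are illustrative.

```shell
data='aa
a{2}'

# BRE: bare braces are literal, so this finds the text "a{2}".
bre_plain=$(printf '%s\n' "$data" | grep 'a{2}')

# BRE: escaped braces form an interval, "a" repeated twice.
bre_escaped=$(printf '%s\n' "$data" | grep 'a\{2\}')

# ERE: bare braces form the interval...
ere_plain=$(printf '%s\n' "$data" | grep -E 'a{2}')

# ...and escaping them makes them literal again.
ere_escaped=$(printf '%s\n' "$data" | grep -E 'a\{2\}')

printf 'BRE a{2}: %s\nBRE a\\{2\\}: %s\n' "$bre_plain" "$bre_escaped"
printf 'ERE a{2}: %s\nERE a\\{2\\}: %s\n' "$ere_plain" "$ere_escaped"
```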

ADVANCED REGULAR EXPRESSIONS

GIST | Disjunction. Just as square brackets specify different possible matches for a single character, a disjunction allows you to specify alternative matches for strings of characters or expressions. The vertical bar symbol is used to indicate disjunction:

grep --extended-regexp --color=always "George|John"

Input:
George Washington, 1789-1797
John Adams, 1797-1801
Thomas Jefferson, 1801-1809

Output:
George Washington, 1789-1797
John Adams, 1797-1801

GIST | Match zero or one time. In extended regular expressions, there are several additional metacharacters that indicate how often a character or expression is repeated (similar to how the asterisk metacharacter indicates matches of 0 or more times). One such metacharacter is the question mark, which makes the previous character or expression essentially optional:

grep --extended-regexp --color=always "^(Andrew )?John"

Input:
John Adams, 1797-1801
Andrew Johnson, 1865-1869
Lyndon Baines Johnson, 1963-1969

Output:
John Adams, 1797-1801
Andrew Johnson, 1865-1869

GIST | Match one or more times. For this purpose, a metacharacter in the form of a plus sign is provided. It works almost like the asterisk symbol, except that the expression must match at least once:

grep --extended-regexp --color=always "^[[:alpha:] ]+$"

Input:
John Adams
Andrew Johnson, 1865-1869
Lyndon Baines Johnson, 1963-1969

Output:
John Adams

GIST | Match the specified number of times. You can use curly braces for this. These metacharacters are used to indicate the exact number, range, and upper and lower limit of the number of matches of an expression:

grep --extended-regexp --color=always "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

Input:
42
127.0.0.1

Output:
127.0.0.1
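One caveat is worth a quick sketch: an interval counts characters, it does not check numeric values. The anchored pattern and the sample data below are illustrative assumptions; the pattern accepts anything shaped like a dotted quad, including groups above 255.

```shell
data='42
127.0.0.1
999.999.999.999'

# Three groups of "1 to 3 digits plus a dot", then a final group of 1 to 3 digits.
ip_like=$(printf '%s\n' "$data" | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$')

printf '%s\n' "$ip_like"
```

Both dotted lines are printed, so real address validation would need an extra range check on each group.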

The grep command is so useful, feature-rich, and easy to use that once you know it, you can't imagine working without it.

grep stands for 'global regular expression printer'. grep extracts from text files the lines that contain user-specified text.

grep can be used in two ways - on its own or in combination with streams.

grep offers extensive functionality thanks to the large number of options it supports, such as searching with a fixed string pattern, a regular expression pattern, or Perl-compatible regular expressions.

The grep tool has several variants, including egrep (Extended GREP), fgrep (Fixed GREP), and rgrep (recursive GREP), which differ only slightly from the original grep. pgrep (Process GREP) is a separate tool that searches running processes rather than text.

grep options

$ grep -V
grep (GNU grep) 2.10
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+

There are variants of the grep utility: egrep (with extended regular expression processing), fgrep (which treats the characters $*^|()\ as literals), and rgrep (with recursive search enabled).

    egrep is the same as grep -E

    fgrep is the same as grep -F

    rgrep is the same as grep -r
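The effect of fixed-string matching can be seen on two short lines. In this sketch (sample data and variable names are assumptions) the dot is a metacharacter for plain grep but an ordinary character for grep -F.

```shell
data='a.c
abc'

# Plain grep: "." matches any character, so both lines match.
regex=$(printf '%s\n' "$data" | grep 'a.c')

# grep -F (what fgrep stands for): the pattern is taken literally.
fixed=$(printf '%s\n' "$data" | grep -F 'a.c')

printf 'regex:\n%s\nfixed:\n%s\n' "$regex" "$fixed"
```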

    grep [-b] [-c] [-i] [-l] [-n] [-s] [-v] restricted_regex_BRE [file...]

The grep command matches lines in the source files against the pattern given by restricted_regex_BRE. If no files are specified, standard input is used. Normally, each matching line is copied to standard output; if there are several source files, the file name is printed before each matching line. grep uses a compact, non-deterministic algorithm. Restricted regular expressions (expressions built from strings of characters using a limited set of alphanumeric and special characters) are interpreted as patterns. They have the same meaning as regular expressions in ed.

To protect the characters $, *, [, ^, |, (, ), and \ from interpretation by the shell, it is easiest to enclose restricted_regex_BRE in single quotes.

Options:

-b Prefixes each line with the number of the block in which it was found. This can be useful when locating blocks by context (blocks are numbered starting from 0).
-c Prints only the number of lines containing the pattern.
-h Prevents the name of the file containing the matched line from being printed before the line itself. Used when searching across multiple files.
-i Ignores case when making comparisons.
-l Prints only the names of the files containing matching lines, one per line. If the pattern is found on several lines of a file, the file name is not repeated.
-n Prints before each line its number in the file (lines are numbered starting from 1).
-s Suppresses messages about non-existent or unreadable files.
-v Prints all lines except those containing the pattern.
-w Searches for the expression as a word, as if it were surrounded by the metacharacters \< and \>.
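Several of these options can be tried on two throwaway files. The sketch below is a minimal illustration; the file names one.txt and two.txt and their contents are invented for the example and created in a temporary directory.

```shell
dir=$(mktemp -d)
printf 'alpha\nbeta\nAlpha again\n' > "$dir/one.txt"
printf 'gamma\n' > "$dir/two.txt"

# -c: count the matching lines (case-sensitive, so only "alpha" counts).
count=$(grep -c 'alpha' "$dir/one.txt")

# -i: ignore case, so "Alpha again" is counted as well.
count_nocase=$(grep -c -i 'alpha' "$dir/one.txt")

# -n: prefix the matching line with its line number.
numbered=$(grep -n 'beta' "$dir/one.txt")

# -v: print the lines that do NOT match.
inverted=$(grep -v -i 'alpha' "$dir/one.txt")

# -l: print only the names of the files that contain a match.
files=$(cd "$dir" && grep -l 'beta' one.txt two.txt)

rm -rf "$dir"
printf '%s %s %s %s %s\n' "$count" "$count_nocase" "$numbered" "$inverted" "$files"
```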

grep --help

Usage: grep [OPTION]... PATTERN [FILE]...
Searches for PATTERN in each FILE or standard input.
By default, PATTERN is a basic regular expression (BRE).
Example: grep -i "hello world" menu.h main.c

Selecting the type of regular expression and its interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of fixed strings separated by newlines
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           take PATTERN from FILE
  -i, --ignore-case         ignore case differences
  -w, --word-regexp         PATTERN must match only whole words
  -x, --line-regexp         PATTERN must match only whole lines
  -z, --null-data           lines are terminated by a null byte rather than a newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines
  -V, --version             print version information and exit
      --help                show this help and exit
      --mmap                ignored, kept for backwards compatibility

Output control:
  -m, --max-count=NUM       stop after NUM matches
  -b, --byte-offset         print the byte offset along with the output lines
  -n, --line-number         print the line number along with the output lines
      --line-buffered       flush the buffer after each line
  -H, --with-filename       print the file name for each match
  -h, --no-filename         do not prefix output with the file name
      --label=LABEL         use LABEL as the file name for standard input
  -o, --only-matching       show only the part of the line that matches PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are of TYPE:
                            binary, text, or without-match
  -a, --text                same as --binary-files=text
  -I                        same as --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION can be read, recurse, or skip
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION can be read or skip
  -R, -r, --recursive       same as --directories=recurse
      --include=FILE_PATTERN  process only files matching FILE_PATTERN
      --exclude=FILE_PATTERN  skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN directories matching PATTERN will be skipped
  -L, --files-without-match print only the names of FILEs without matches
  -l, --files-with-matches  print only the names of FILEs with matches
  -c, --count               print only the number of matching lines per FILE
  -T, --initial-tab         align with tabs (if necessary)
  -Z, --null                print a 0 byte after the FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of preceding context
  -A, --after-context=NUM   print NUM lines of following context
  -C, --context[=NUM]       print NUM lines of context
  -NUM                      same as --context=NUM
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN can be always, never, or auto
  -U, --binary              do not remove CR characters at the end of the line (MSDOS)
  -u, --unix-byte-offsets   report offsets as if there were no CRs (MSDOS)

"egrep" means "grep -E". "fgrep" means "grep -F".
It is better not to run grep as "egrep" or "fgrep".
When FILE is not specified, or when FILE is -, standard input is read.
If fewer than two files are specified, -h is assumed.
The exit code is 0 if a match is found and 1 if not.
If an error occurs and the -q option was not given, the exit code is 2.
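The exit codes mentioned at the end of the help text are what make grep usable in scripts. A small sketch, using a temporary file invented for the example, captures all three values:

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/sample.txt"

# 0: at least one line matched (-q prints nothing, only sets the status).
found=0
grep -q 'hello' "$dir/sample.txt" || found=$?

# 1: no line matched.
missing=0
grep -q 'bye' "$dir/sample.txt" || missing=$?

# 2: an error occurred (the file does not exist; -s hides the message).
failed=0
grep -s 'hello' "$dir/no-such-file" || failed=$?

rm -rf "$dir"
printf 'found=%s missing=%s failed=%s\n' "$found" "$missing" "$failed"
```

The `|| var=$?` form is used so that the sketch also behaves under `set -e`, where a bare failing grep would stop the script.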

The grep utility is a very powerful tool for searching and filtering text information. This article shows several examples of its use that will allow you to appreciate its capabilities.
The main use of grep is to search for words or phrases in files and output streams. To search, type the query and the search area (a file) on the command line.
For example, to find the string “needle” in the haystack.txt file, use the following command:

$ grep needle haystack.txt

As a result, grep will display all occurrences of needle that it encounters in the contents of the haystack.txt file. It's important to note that in this case, grep is looking for a set of characters, not a word. For example, strings that include the word “needless” and other words that contain the sequence “needle” will be displayed.


To tell grep that you are looking for a specific word, use the -w switch. This switch limits the search to the specified word only. A word here is a match delimited on both sides by whitespace, punctuation, or the beginning or end of the line.

$ grep -w needle haystack.txt
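The difference between a substring search and a whole-word search shows up on three sample lines. This is a sketch with invented data; note that the hyphen in "needle-point" counts as a word delimiter.

```shell
data='a needle here
needless to say
needle-point'

# Without -w every line containing the character sequence matches.
substring=$(printf '%s\n' "$data" | grep -c 'needle')

# With -w "needless" is rejected, because the match is not a whole word.
whole_word=$(printf '%s\n' "$data" | grep -w 'needle')

printf 'substring matches: %s\nwhole-word matches:\n%s\n' "$substring" "$whole_word"
```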

It is not necessary to limit the search to just one file; grep can search across a group of files, and the search results will indicate the file in which the match was found. The -n switch will also add the line number in which the match was found, and the -r switch will allow you to perform a recursive search. This is very convenient when searching among files with program source codes.

$ grep -rnw function_name /home/www/dev/myprogram/

The file name will be listed before each match. If you need to hide file names, use the -h switch; conversely, if you only need the file names, specify the -l switch.
In the following example, we will search for URLs in an IRC log file and show the last 10 matches.

$ grep -wo "http://.*" channel.log | tail

The -o option tells grep to print only the part of the line that matches the pattern rather than the entire line. Using a pipe, we pass the output of grep to the tail command, which by default prints the last 10 lines.
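Because ".*" is greedy, the pattern above keeps everything from "http://" to the end of the line. A hedged variation, shown here on invented log lines, stops at the first space instead; the "[^ ]*" pattern is an assumption of this sketch, not part of the original command.

```shell
log='12:00 <bob> see http://example.com/a for details
12:01 <amy> nothing to link
12:02 <bob> http://example.org/b'

# ".*" runs to the end of the line, so trailing words are included.
greedy=$(printf '%s\n' "$log" | grep -o 'http://.*')

# "[^ ]*" stops at the first space, leaving just the URL.
urls=$(printf '%s\n' "$log" | grep -o 'http://[^ ]*')

printf 'greedy:\n%s\nurls only:\n%s\n' "$greedy" "$urls"
```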
Now we will count the number of messages sent to the irc channel by certain users. For example, all the messages I sent from home and work. They differ in nickname, at home I use the nickname user_at_home, and at work user_at_work.

$ grep -c "^user_at_\(home\|work\)" channel.log

With the -c option, grep prints only the number of matching lines, not the matches themselves. The search pattern is enclosed in quotes because it contains special characters that the shell could otherwise interpret as control characters. Note that the quotation marks are not part of the search pattern. The backslash "\" is used to escape special characters; here it gives the parentheses and the vertical bar their grouping and alternation meaning, because grep uses basic regular expressions by default.
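The same count can be written in basic or extended syntax. The sketch below uses invented log lines; the "\|" alternation in a basic regular expression is a GNU extension, so the -E form is the more portable of the two.

```shell
log='user_at_home: hi
user_at_work: hello
guest: ping user_at_home
user_at_home: bye'

# BRE: grouping and alternation need backslashes (GNU extension).
bre_count=$(printf '%s\n' "$log" | grep -c '^user_at_\(home\|work\)')

# ERE: the same pattern without backslashes.
ere_count=$(printf '%s\n' "$log" | grep -c -E '^user_at_(home|work)')

printf 'BRE count: %s, ERE count: %s\n' "$bre_count" "$ere_count"
```

The line from "guest" is not counted, because the nickname does not appear at the start of the line.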
Let's search for messages from people who like to “scream” in the channel. By “scream” we mean messages written in all CAPITAL letters. To exclude accidental hits on abbreviations, we will search for words of five or more characters:

$ grep -w "[A-Z]\{5,\}" channel.log
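A quick check of the idea on invented chat lines: words of five or more capital letters are reported, while a short abbreviation such as FAQ is not. LC_ALL=C is an assumption added here so that the A-Z range means plain ASCII capitals.

```shell
log='<bob> HELLO EVERYONE
<amy> please read the FAQ
<bob> ok'

# Lines containing a whole word made of at least five capital letters.
shouting=$(printf '%s\n' "$log" | LC_ALL=C grep -w '[A-Z]\{5,\}')

# The same search with -o lists just the shouted words.
shouted_words=$(printf '%s\n' "$log" | LC_ALL=C grep -o -w '[A-Z]\{5,\}')

printf 'lines:\n%s\nwords:\n%s\n' "$shouting" "$shouted_words"
```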

For a more detailed description, you can refer to the grep man page.
A few more examples:

# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

Displays lines from the /etc/passwd file that contain the string root.

# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
12:operator:x:11:0:operator:/root:/sbin/nologin

In addition, the numbers of the lines that contain the searched string are displayed.

# grep -v bash /etc/passwd | grep -v nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
news:x:9:13:news:/var/spool/news:
mailnull:x:47:47::/var/spool/mqueue:/dev/null
xfs:x:43:43:X Font Server:/etc/X11/fs:/bin/false
rpc:x:32:32:Portmapper RPC user:/:/bin/false
nscd:x:28:28:NSCD Daemon:/:/bin/false
named:x:25:25:Named:/var/named:/bin/false
squid:x:23:23::/var/spool/squid:/dev/null
ldap:x:55:55:LDAP User:/var/lib/ldap:/bin/false
apache:x:48:48:Apache:/var/www:/bin/false

Checks which users are not using bash, excluding those user accounts that have nologin specified as their shell.

# grep -c false /etc/passwd
7

Counts the number of accounts that have /bin/false as their shell.

# grep -i games ~/.bash* | grep -v history

This command displays lines containing “games”, in upper or lower case, from all files in the current user's home directory whose names begin with .bash. The second grep drops every output line that contains the string history, which excludes matches found in ~/.bash_history, where the same string may well appear. Note that “games” is only an example; you can substitute any other word.
grep command and regular expressions

Unlike the previous example, we will now display only those lines that begin with the string “root”:

# grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash

If we want to see which accounts have no shell assigned at all, we look for lines ending with a ":" character:

# grep :$ /etc/passwd
news:x:9:13:news:/var/spool/news:

To check whether the PATH variable is exported in your ~/.bashrc file, first select the lines containing "export" and then look for lines containing a word that starts with the string "PATH"; this way MANPATH and other possible paths will not be displayed:

# grep export ~/.bashrc | grep '\<PATH'
export PATH="/bin:/usr/lib/mh:/lib:/usr/bin:/usr/local/bin:/usr/ucb:/usr/dbin:$PATH"
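The effect of the start-of-word anchor can be reproduced without touching a real ~/.bashrc. In this sketch the two export lines are invented sample data, and "\<" is assumed to be available (it is supported by GNU grep).

```shell
rc='export PATH="/bin:/usr/bin:$PATH"
export MANPATH="/usr/share/man"'

# A plain search for PATH also hits MANPATH.
plain=$(printf '%s\n' "$rc" | grep -c 'PATH')

# "\<PATH" requires PATH to start a word, which rules out MANPATH.
word_start=$(printf '%s\n' "$rc" | grep '\<PATH')

printf 'plain matches: %s\nword-start match: %s\n' "$plain" "$word_start"
```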

Character classes

An expression in square brackets is a list of characters enclosed between "[" and "]". It matches any single character in that list; if the first character of the list is "^", it matches any character that is NOT in the list. For example, the regular expression "[0123456789]" matches any single digit.

Within an expression in square brackets, you can specify a range consisting of two characters separated by a hyphen. The expression then matches any single character that, according to the sorting rules, falls between these two characters, inclusive; this takes into account the collation order and character set specified by the locale. For example, in the default C locale, the expression "[a-d]" is equivalent to "[abcd]". Many locales sort characters in dictionary order, and in these locales "[a-d]" is generally not equivalent to "[abcd]"; it may be equivalent to "[aBbCcDd]", for example. To get the traditional interpretation of bracket expressions, you can use the C locale by setting the environment variable LC_ALL to the value "C".

Finally, there are specially named character classes, which are specified inside expressions in square brackets. For more information about these predefined expressions, see the man pages or grep command documentation.

# grep '[yf]' /etc/group
sys:x:3:root,bin,adm
tty:x:5:
mail:x:12:mail,postfix
ftp:x:50:
nobody:x:99:
floppy:x:19:
xfs:x:43:
nfsnobody:x:65534:
postfix:x:89:

The example displays all lines that contain either the character "y" or the character "f".
Universal characters (metacharacters)

Use "." to match any single character. For example, here is how to get a list of all English words in a dictionary that have five characters, start with “c” and end with “h” (useful for solving crossword puzzles):

# grep '\<c...h\>' /usr/share/dict/words
catch
clash
cloth
coach
couch
cough
crash
crush

If you want to display lines that contain a literal period character, specify the -F option in the grep command. The symbols "\<" and "\>" match the empty string at the beginning and, respectively, at the end of a word, so the pattern has to match a whole word; this works here because the words file lists one word per line. If you want to find every occurrence of the pattern in a text regardless of word boundaries, omit "\<" and "\>"; for a more precise whole-word search, use the -w switch.

To similarly find words that can have any number of characters between the “c” and the “h”, use a dot followed by an asterisk (.*). The example below selects all words starting with "c" and ending with "h" from the system dictionary:

# grep '\<c.*h\>' /usr/share/dict/words
caliph
cash
catch
cheesecloth
cheetah
--output omitted--
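Since /usr/share/dict/words is not installed everywhere, the two dictionary searches can be sketched on a short invented word list. The patterns assume the GNU word anchors "\<" and "\>".

```shell
words='cash
catch
chat
church
couch
scratch'

# Exactly five letters: "c", any three characters, "h".
five=$(printf '%s\n' "$words" | grep '\<c...h\>')

# Any length: "c", any run of characters, "h" at the end of the word.
any_len=$(printf '%s\n' "$words" | grep '\<c.*h\>')

printf 'five letters:\n%s\nany length:\n%s\n' "$five" "$any_len"
```

"scratch" is skipped in both cases because its "c" does not start the word, and "chat" is skipped because it does not end in "h".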

If you want to find the literal asterisk character in a file or output stream, enclose it in single quotes. The user in the example below first tries to look for an asterisk in the /etc/profile file without quotes, which finds nothing. When quotes are used, the result is output:

# grep * /etc/profile
# grep '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
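The reason the unquoted form fails is that the shell expands the asterisk into file names before grep ever runs. The sketch below makes this visible in a temporary directory; the file name profile and its contents are stand-ins invented for the example.

```shell
dir=$(mktemp -d)
printf 'for i in /etc/profile.d/*.sh ; do\ndone\n' > "$dir/profile"

# Unquoted: the shell replaces * with the names of the files in the current
# directory; echo shows the command line grep would actually receive.
expanded=$(cd "$dir" && echo grep * profile)

# Quoted: the asterisk reaches grep unchanged and is matched literally.
quoted=$(grep '*' "$dir/profile")

rm -rf "$dir"
printf 'unquoted becomes: %s\nquoted finds: %s\n' "$expanded" "$quoted"
```

With a single file in the directory, the unquoted command turns into a search for the word "profile", not for an asterisk.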


