Regular expressions (regexps) are one of the advanced concepts we need for writing efficient shell scripts and for effective system administration. For easier understanding, regular expressions can be divided into three types:
1) Basic regular expressions
2) Interval regular expressions (use option -E for grep and -r for sed)
3) Extended regular expressions (use option -E for grep and -r for sed)
Some FAQs before starting with regular expressions:
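What is a regular expression? A regular expression is a pattern that describes a set of strings; a tool uses it to find matching text in a given string.
Which commands/programming languages support regular expressions? vi, rename, grep, sed, awk, perl, python etc. (tr is often listed here too, but it only accepts character sets and ranges, not full regular expressions.)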
Basic Regular Expressions
Basic regular expressions: This set includes the most basic regular expressions, which do not require any extra options to use. They were developed a long time ago.
^ - Caret symbol, matches at the beginning of a line
$ - Matches at the end of a line
* - 0 or more occurrences of the previous character
. - Matches any single character
[] - Matches any one character from a set or range
[^char] - Negates a character set: matches any one character not listed
\<word\> - Matches a whole word
\ - Escape character
Let's start with examples so that we can understand regexps better.
^ Regular Expression
Example 1: Find all the regular files in a given directory
    ls -l | grep '^-'
As you are aware, the first character of each line in ls -l output is - for regular files and d for directories. The ^ symbol matches the start of a line, so ^- selects every line that starts with -, which indicates a regular file in Linux/Unix. To find all the directories in a folder, use grep '^d' with ls -l:
    ls -l | grep '^d'
How about character files and block files?
    ls -l | grep '^c'
    ls -l | grep '^b'
We can even find the lines which are commented out using the ^ operator:
    grep '^#' filename
How about finding lines in a file which start with 'abc'?
    grep '^abc' filename
There are many more uses for the ^ operator.
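To try the ^ anchor without touching real files, here is a small sketch. The scratch directory and the file names in it are invented for the demo, and mktemp -d is assumed to be available on your system.

```shell
# Hypothetical scratch directory; all names below are invented for this demo.
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/run.sh"
mkdir "$demo/logs"

# ^- keeps lines starting with "-" (regular files); ^d keeps directories.
file_count=$(ls -l "$demo" | grep -c '^-')
dir_count=$(ls -l "$demo" | grep -c '^d')
echo "regular files: $file_count, directories: $dir_count"

# ^# only matches a "#" in the very first column, so the indented comment is skipped.
comment_count=$(printf '# a comment\nabc=1\n  # indented\n' | grep -c '^#')
echo "comment lines: $comment_count"

rm -rf "$demo"
```

The -c option makes grep print the number of matching lines instead of the lines themselves, which is handy for quick checks like this.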
$ Regular Expression
Example 2: Match all the files whose names end with sh
    ls -l | grep 'sh$'
As $ indicates the end of the line, and the file name is the last field of each ls -l line, the above command lists all the files whose names end with sh. How about finding lines in a file which end with dead?
    grep 'dead$' filename
How about finding empty lines in a file?
    grep '^$' filename
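A quick sketch of the $ anchor on made-up input. The names are fed in with printf rather than ls, so the result does not depend on what is in your current directory.

```shell
# Sample names are invented; in practice they would come from ls.
sh_names=$(printf 'backup.sh\ndeploy.sh\nshell.txt\nREADME\n' | grep 'sh$')
echo "$sh_names"

# ^$ means "start of line immediately followed by end of line", i.e. an empty line.
blank_count=$(printf 'one\n\ntwo\n\n' | grep -c '^$')
echo "empty lines: $blank_count"
```

Notice that shell.txt is not selected: it contains sh, but not at the end of the line.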
* Regular Expression
Example 3: Match all files which have the word twt, twet, tweet etc. in the file name.
    ls -l | grep 'twe*t'
How about searching a file for the word apple where it has been misspelled as ale, aple, appple, apppple etc.? To find all these patterns:
    grep 'ap*le' filename
Readers should observe that the above pattern matches even the word ale, as * means 0 or more occurrences of the previous character.
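Here is a small sketch of how * behaves, using an invented word list instead of a real file. The last two words are there to show what does not match.

```shell
# Invented word list: four spellings that fit a, any number of p, then le,
# plus two words (able, ample) that do not.
matches=$(printf 'ale\naple\napple\nappple\nable\nample\n' | grep 'ap*le')
echo "$matches"
```

able and ample are skipped because the letter between a and le is not p, and * only repeats the character right before it.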
. Regular Expression
Example 4: Find files which have any single character between t and t in the file name.
    ls -l | grep 't.t'
Here . matches any single character, so the pattern matches tat, t3t, t.t, t&t etc.: any single character between the two t letters. How about finding all the file names which start with a and end with x using regular expressions?
    ls | grep '^a.*x$'
Note: in the combination .* the . indicates any character and the * repeats it 0 or more times, so .* matches any number of characters. The ^ and $ anchors are needed here: without them, a.*x matches an a followed later by an x anywhere in the line. Suppose you have files such as awx, awex, aweex, awasdfx, a35dfetrx etc.; the command finds all the files/folders which start with a and end with x.
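The difference between . on its own, .* without anchors, and .* with anchors is easy to see on a small invented list of names. This is only a sketch; the names are hypothetical.

```shell
# "." must consume exactly one character, so tt and toot do not match t.t
single_count=$(printf 'tat\nt3t\nt.t\ntt\ntoot\n' | grep -c 't.t')
echo "t.t matches: $single_count"

# Hypothetical file names for the a...x case.
names='awx
awex
a35dfetrx
maxim
ax
awxy'

# Unanchored: any line with an a followed somewhere later by an x.
loose_count=$(printf '%s\n' "$names" | grep -c 'a.*x')

# Anchored: the line must start with a and end with x.
anchored=$(printf '%s\n' "$names" | grep '^a.*x$')

echo "unanchored matches: $loose_count"
echo "$anchored"
```

The unanchored pattern also picks up maxim and awxy, which is usually not what you want when filtering file names.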
[] Square braces/Brackets Regular Expression
Example 5: Find all the files which contain a single digit between a and x in the file name
    ls -l | grep 'a[0-9]x'
This will find files such as a0xsdf, asda1xsdfas, asdfdsara9xsdf etc. Wherever the pattern sees a followed by one digit followed by x, it matches. Some range examples for you:
[a-z] - Matches any single character from a to z.
[A-Z] - Matches any single character from A to Z.
[0-9] - Matches any single character from 0 to 9.
[a-zA-Z0-9] - Matches any single character from a to z, A to Z or 0 to 9.
[!@#$%^] - Matches any single !, @, #, $, % or ^ character.
You just have to think about what you want to match and keep those characters inside the brackets.
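A short sketch of bracket ranges on invented input. LC_ALL=C is set here as a precaution, on the assumption that ranges such as [A-Z] can behave differently in some locales.

```shell
# Exactly one digit is allowed between a and x, so a12x is not matched,
# and the pattern is case-sensitive, so A5x is not matched either.
digit_matches=$(printf 'a0xsdf\nasda1xsdfas\nabx\na12x\nA5x\n' | LC_ALL=C grep 'a[0-9]x')
echo "$digit_matches"

# [A-Z] needs at least one uppercase letter somewhere in the line.
upper_count=$(printf 'abc\nAbc\n123\n' | LC_ALL=C grep -c '[A-Z]')
echo "lines with an uppercase letter: $upper_count"
```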
[^char] Regular Expression
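Example 6: Match file names that contain a character other than a, b or c
    ls | grep '[^abc]'
[^abc] matches any single character that is not a, b or c, so this lists every file name containing at least one such character. Note that it does not exclude names which contain a, b or c; to list only the names with no a, b or c at all, invert the match instead:
    ls | grep -v '[abc]'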
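A negated bracket expression matches one character outside the set, so a line is selected as soon as it contains any such character. The sketch below compares that with two ways of selecting lines that contain no a, b or c at all; the sample names are invented.

```shell
# Hypothetical names to filter.
names='abc
cab
xyz
abcd
notes'

# Lines containing at least one character that is NOT a, b or c.
has_other=$(printf '%s\n' "$names" | grep '[^abc]')

# Lines containing no a, b or c at all: invert the match with -v ...
none_v=$(printf '%s\n' "$names" | grep -v '[abc]')

# ... or require every character on the line to be outside the set.
none_anchored=$(printf '%s\n' "$names" | grep '^[^abc]*$')

echo "$has_other"
echo "$none_v"
```

abcd shows up in the first result because of the d, even though it also contains a, b and c.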
\<word\> Regular Expression
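Example 7: Search for the word abc; I should not get abcxyz or readabc in my output.
    grep '\<abc\>' filename
Here \< matches at the start of a word and \> matches at the end of a word, so abc is matched only when it stands alone as a whole word.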
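A small sketch of whole-word matching on invented lines. It assumes GNU grep, where \< and \> are supported; the -w option shown next to it is a common alternative that should give the same result for a simple word like this.

```shell
# Invented sample lines: only the first and last contain abc as a whole word.
lines='abc
abcxyz
readabc
say abc now'

# \< and \> anchor the pattern to the start and end of a word (GNU grep assumed).
word_matches=$(printf '%s\n' "$lines" | grep '\<abc\>')

# -w asks grep to match whole words only.
w_matches=$(printf '%s\n' "$lines" | grep -w 'abc')

echo "$word_matches"
```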
Escape Regular Expression
Example 8: Find lines which contain [ in a file. As [ is a special character, we have to escape it:
    grep "\[" filename
or
    grep '[[]' filename
Note: Inside [] most special characters lose their special meaning, so if you want to find a special character you can keep it in [] and it will not be treated as special.
Note: There is no need to use -E to use these regular expressions with grep. We also have egrep and fgrep, which are equal to "grep -E" and "grep -F" respectively. I suggest you just concentrate on grep to complete your work; don't go for other commands if grep can resolve your issue.
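The sketch below shows three ways to search for a literal [ on invented input: a backslash escape, a bracket expression, and grep -F, which treats the whole pattern as a fixed string.

```shell
# Invented sample lines; two of them contain a literal "[".
text='array[0]
plain
list[1]'

# Escape the bracket with a backslash.
escaped_count=$(printf '%s\n' "$text" | grep -c '\[')

# Put the bracket inside a bracket expression.
bracket_count=$(printf '%s\n' "$text" | grep -c '[[]')

# -F disables regular expressions entirely, so "[" is just a character.
fixed_count=$(printf '%s\n' "$text" | grep -F -c '[')

echo "$escaped_count $bracket_count $fixed_count"
```

When the text you are looking for contains many special characters, -F is often the simplest option because nothing needs escaping.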
The remaining types of regular expressions will be discussed in the next post.