In today's post, I will cover repetitions, grouping, and capturing.
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Regex_Pattern {
    public static void main(String[] args) {
        Regex_Test tester = new Regex_Test();
        tester.checker("__________"); // paste the regex pattern here
    }
}

class Regex_Test {
    public void checker(String Regex_Pattern) {
        Scanner Input = new Scanner(System.in);
        String Test_String = Input.nextLine();
        Pattern p = Pattern.compile(Regex_Pattern);
        Matcher m = p.matcher(Test_String);
        int Count = 0;
        while (m.find()) {
            Count += 1;
        }
        System.out.format("Number of matches : %d", Count);
    }
}
Matching {x} Repetitions
The {x} quantifier matches exactly x repetitions of the preceding character, character class, or group.
Pattern : \w{2}
Test_String: do
Pattern : [abc]{2}
Test_String: abcabc (3 matches: ab, ca, bc)
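To see what an exact-count quantifier actually picks up, here is a minimal sketch that lists every match instead of only counting them. The class name ExactRepetitionDemo and the helper allMatches are my own names for illustration and are not part of the harness above. Note that find() reports non-overlapping matches, scanning left to right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactRepetitionDemo {
    // Hypothetical helper: collects every non-overlapping match that find() reports.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\w{2}", "do"));        // [do]
        System.out.println(allMatches("[abc]{2}", "abcabc"));  // [ab, ca, bc]
    }
}
```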
Matching {x,y} Repetitions
The {x,y} quantifier matches between x and y repetitions (both inclusive) of the preceding character, character class, or group. Omitting y, as in {x,}, matches x or more repetitions.
Pattern : [xyz]{5,} (matches the character x, y, or z 5 or more times)
Test_String: xyzxyz
Pattern : \w{1,4}\d{4,}
Test_String: abcb45677777abcd
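The range quantifiers are greedy, so they take as many repetitions as they can while still letting the rest of the pattern match. The sketch below prints the first matched substring for the patterns in this section. RangeRepetitionDemo and firstMatch are illustrative names I made up, and the test string xyzx is an extra example of my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeRepetitionDemo {
    // Hypothetical helper: returns the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \w{1,4} takes "abcb", then \d{4,} takes all eight digits.
        System.out.println(firstMatch("\\w{1,4}\\d{4,}", "abcb45677777abcd")); // abcb45677777
        // Six characters from [xyz] satisfy "5 or more".
        System.out.println(firstMatch("[xyz]{5,}", "xyzxyz"));                 // xyzxyz
        // Only four characters, so {5,} cannot match.
        System.out.println(firstMatch("[xyz]{5,}", "xyzx"));                   // null
    }
}
```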
Matching Zero Or More Repetitions
The * symbol matches 0 or more repetitions of the preceding token, that is, of the preceding character, character class, or group.
Pattern : \w{1,4}\d*
Test_String: abcb45677777
Matching One Or More Repetitions
The + symbol matches 1 or more repetitions of the preceding token, that is, of the preceding character, character class, or group.
Pattern : Ab+s
Test_String: As Abbbbss
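The difference between * and + is easiest to see by running both on the same test string. In this small sketch the pattern Ab*s is my own addition for comparison, and StarPlusDemo and countMatches are illustrative names. countMatches mirrors the counting loop from the harness at the top of the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarPlusDemo {
    // Hypothetical helper: counts non-overlapping matches, like the checker above.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // b* allows zero b's, so both "As" and "Abbbbs" match.
        System.out.println(countMatches("Ab*s", "As Abbbbss"));         // 2
        // b+ needs at least one b, so only "Abbbbs" matches.
        System.out.println(countMatches("Ab+s", "As Abbbbss"));         // 1
        // \d* happily consumes all the trailing digits in one match.
        System.out.println(countMatches("\\w{1,4}\\d*", "abcb45677777")); // 1
    }
}
```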
Matching Word Boundaries
\b
\b matches a word boundary: a position between a word character and a non-word character, or the start or end of the string when the adjacent character is a word character.
Three different positions qualify for word boundaries :
- Before the first character in the string, if the first character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
- After the last character in the string, if the last character is a word character.
\B
Not a word boundary. Matches any position that is not a word boundary.
Pattern : \bcat\b
Test_String: A cat (1 match)
Test_String: Acat (0 matches)
Pattern : \Bcat\b
Test_String: Acat (1 match)
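Since \b and \B match positions rather than characters, it helps to check each pattern against both test strings. This is a minimal sketch. BoundaryDemo and found are names I chose for illustration, and the last check (\Bcat\b against A cat) is an extra case of my own.

```java
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The space before "cat" and the end of the string are both boundaries.
        System.out.println(found("\\bcat\\b", "A cat")); // true
        // "A" and "c" are both word characters, so there is no boundary before "cat".
        System.out.println(found("\\bcat\\b", "Acat"));  // false
        // \B requires exactly that missing boundary before "cat".
        System.out.println(found("\\Bcat\\b", "Acat"));  // true
        System.out.println(found("\\Bcat\\b", "A cat")); // false
    }
}
```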
Capturing & Non-Capturing Groups
()
Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference. This allows us to apply quantifiers to the whole group. The parentheses also create a numbered capturing group, which stores the part of the string matched by the part of the regex inside the parentheses.
(?:)
Groups multiple tokens together without creating a capture group
Pattern : abc(.*)ij
Test_String: abcefgijklmn
Pattern : (?:ha)+
Test_String: hahaha haa hah! (3 matches)
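The practical difference between the two kinds of group shows up when you ask the Matcher what it captured. Below is a minimal sketch using the standard Matcher.group(int) and Matcher.groupCount() methods. GroupDemo, firstGroup, and groupCount are illustrative names of my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: text captured by group 1 of the first match, or null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (.*) captures everything between "abc" and "ij".
        System.out.println(firstGroup("abc(.*)ij", "abcefgijklmn")); // efg
        System.out.println(groupCount("abc(.*)ij"));                 // 1
        // (?:ha)+ groups "ha" for the + quantifier but captures nothing.
        System.out.println(groupCount("(?:ha)+"));                   // 0
    }
}
```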
Alternative Matching
|
The | symbol acts like a boolean OR: it matches the expression before or after the |. This is also known as alternation.
Inside a character class, | is just a literal character; inside a group, it separates entire expressions (i.e., everything to the left or everything to the right of the vertical bar). Use parentheses to limit the scope of an alternation.
Pattern : b(a|e|i)d
Test_String: bad bud bod bed bid (3 matches)
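To show why the parentheses matter, the sketch below runs the grouped pattern next to an ungrouped variant. The ungrouped pattern ba|e|id is my own example, and AlternationDemo and allMatches are illustrative names. Without parentheses, the alternation splits the whole expression into ba, e, and id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: collects every non-overlapping match in order.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "bad bud bod bed bid";
        // The group limits the alternation to the middle letter.
        System.out.println(allMatches("b(a|e|i)d", text)); // [bad, bed, bid]
        // Ungrouped, the alternatives are "ba", "e", and "id".
        System.out.println(allMatches("ba|e|id", text));   // [ba, e, id]
    }
}
```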
Note: In a Java string literal, write \\ wherever the regex needs a single \ (for example, "\\w{2}" for the pattern \w{2}).
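Here is a small sketch of what the doubled backslash does. The compiler turns each \\ in the source into a single backslash, so the regex engine receives the pattern as written in this post. EscapeDemo and WORD_CAT are names I made up for the example. Be aware that a single "\b" in a Java string literal is the backspace character rather than a word boundary, and "\w" does not compile at all.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Seven characters at runtime: \ b c a t \ b
    static final String WORD_CAT = "\\bcat\\b";

    public static void main(String[] args) {
        System.out.println(WORD_CAT);          // \bcat\b
        System.out.println(WORD_CAT.length()); // 7
        System.out.println(Pattern.compile(WORD_CAT).matcher("A cat").find()); // true
    }
}
```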
In the next post, I will discuss backreferences and assertions.