So far we have covered basic regex syntax, character classes, and grouping, and we are almost done with the basics of regex. In this post we will look at backreferences and assertions, which are a little trickier.

Before that, please make sure you are comfortable with the concepts discussed in the last two posts.
Here are the links: Part1, Part2
REGEX_PART3

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex_Pattern {    

    public static void main(String[] args) {
        
        Regex_Test tester = new Regex_Test();
        tester.checker("__________"); // paste your regex pattern between the quotes
    
    }
}

class Regex_Test {

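    // Reads one line from standard input and prints how many
    // non-overlapping matches of the pattern it contains.
    public void checker(String Regex_Pattern){
    
        Scanner Input = new Scanner(System.in);
        String Test_String = Input.nextLine();
        Pattern p = Pattern.compile(Regex_Pattern);
        Matcher m = p.matcher(Test_String);
        int Count = 0;
        while(m.find()){   // each find() call moves on to the next match
            Count += 1;
        }
        System.out.format("Number of matches : %d",Count);
    }   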
    
}
Let's Get Started.

Matching Same Text Again & Again

This is where the story of backreferences begins. The syntax is "\group_number", which is also known as a numeric reference. Let's understand it with an example.

Pattern: (\w)a\1
Test_String: hah dad bad dab gag gab
So \group_number matches the same text that the capture group with that number matched. For example, \1 matches the text captured by the first group, and similarly \4 matches the text captured by the 4th group. In the pattern above, (\w) is the first capture group and matches any word character, next comes a literal "a", and then \1 must match the same character that the first group captured. In the test string this matches hah, dad, and gag, but not bad, dab, or gab. Let's take another example.

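Pattern: (\w)(\w)d\2\1
Test_String: madam
Here \2 repeats the second captured character (a) and \1 repeats the first one (m), so the pattern matches the whole word madam.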
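The two patterns above can be tried out with a small Java sketch. The class name BackrefDemo and the helper findAll are made up for this illustration; only the standard java.util.regex package is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Hypothetical helper: collects the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // \1 must repeat whatever (\w) captured.
        System.out.println(findAll("(\\w)a\\1", "hah dad bad dab gag gab")); // [hah, dad, gag]
        // \2 and then \1 mirror the first two characters around the d.
        System.out.println(findAll("(\\w)(\\w)d\\2\\1", "madam"));           // [madam]
    }
}
```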

Backreferences To Failed Groups

Capturing group that matches nothing

Pattern: (m?)a\1
Test_String: a
Test_String: mam
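Here m? is optional. For the test string a, m? matches nothing, so the group (m?) still takes part in the match but captures an empty string, and \1 then successfully matches that empty string. For mam, the group captures m and \1 matches the second m.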
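A short Java sketch makes the captured value visible. EmptyGroupDemo and firstGroup are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyGroupDemo {
    // Hypothetical helper: returns what group 1 captured, or null if nothing matched.
    static String firstGroup(String input) {
        Matcher m = Pattern.compile("(m?)a\\1").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // For "a" the group takes part in the match but captures the empty string.
        System.out.println("[" + firstGroup("a") + "]");   // []
        // For "mam" the group captures m, and \1 matches the second m.
        System.out.println("[" + firstGroup("mam") + "]"); // [m]
    }
}
```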

Capturing group that didn't participate in the match at all

Pattern: (b)?o\1
Test_String: o (no match)
In most regex flavors (JavaScript is an exception), (b)?o\1 fails to match o.
Here (b) does not match at all. Because the whole group is optional thanks to '?' (which matches 0 or 1 of the preceding token), the regex engine skips it and goes on to match o. But \1 now references a group that did not participate in the match attempt at all, so the backreference fails and the whole match fails with it.
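The difference between (b)? and (b?) can be checked in Java, which behaves like most flavors here. This is only a sketch; UnsetGroupDemo and found are made-up names, and the input bob is an extra test string that is not part of the original example.

```java
import java.util.regex.Pattern;

class UnsetGroupDemo {
    // Hypothetical helper: true if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Group 1 never participates, so \1 fails.
        System.out.println(found("(b)?o\\1", "o"));   // false
        // Group 1 captures b, so \1 matches the trailing b.
        System.out.println(found("(b)?o\\1", "bob")); // true
        // With the ? inside the group, the group participates and captures "".
        System.out.println(found("(b?)o\\1", "o"));   // true
    }
}
```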
You can solve this hackerrank problem for this topic.

Forward References

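A forward reference is a backreference to a group that appears later in the pattern. It is useful with repeated groups: on the first pass through the repetition the referenced group has not matched yet, so the reference fails, but on later passes the regex engine evaluates the reference after the group has already been matched.
Let's solve a hackerrank problem to understand this.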

Question: You have a test string S.
Your task is to write a regex which will match S with the following condition(s):
1. S consists of tic or tac.
2. tic should not be an immediate neighbor of itself.
3. The first tic must occur only when tac has appeared at least twice before.

Valid Strings: tactactictactic, tactactic
Invalid Strings: tactactictactictictac, tactictac

Approach: From the 2nd point it is clear that there is a repeating pattern of tactic. From the 3rd point we can confirm that the string should start with tactactic, followed by the repeating pattern. Try to write the pattern on your own first, then look at the solution. Hint: use ^ and $.

Pattern: ^(\2tic|(tac))+$

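Explanation: There are two groups. The first (outer) group is either the text of the second group followed by tic (\2tic), or the second group itself. The second group is (tac). On the first repetition \2 has nothing to refer to, so only (tac) can match; on later repetitions \2tic can match tactic.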
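To see the forward reference at work, the pattern can be run against the valid and invalid strings from the question. This is a minimal Java sketch; ForwardRefDemo and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

class ForwardRefDemo {
    // \2 refers to (tac), which appears later in the pattern than the reference.
    static final Pattern TIC_TAC = Pattern.compile("^(\\2tic|(tac))+$");

    // Hypothetical helper: true if the whole string satisfies the pattern.
    static boolean isValid(String s) {
        return TIC_TAC.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "tactactictactic", "tactactic",       // valid in the question
            "tactactictactictictac", "tactictac"  // invalid in the question
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first two strings print true and the last two print false.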

Positive Lookahead

Syntax: regex_1(?=regex_2)
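Matches regex_1 only when it is immediately followed by a match of regex_2, without including the regex_2 part in the result. Lookahead only asserts whether a match is possible or not; it does not consume any characters.
Pattern: \d(?=em)
Test_String: 1em 2pt 3em 4px
Here 1 and 3 match, because they are the only digits followed by em.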
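A quick Java check of this pattern, written as a sketch with made-up names (LookaheadDemo, findAll), shows that the em part is tested but never becomes part of the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: collects the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only the digit is returned; "em" is checked by the lookahead but not consumed.
        System.out.println(findAll("\\d(?=em)", "1em 2pt 3em 4px")); // [1, 3]
    }
}
```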

Now solve this hackerrank problem,
Question: You have a test string S. Write a regex that can match all occurrences of o followed immediately by 'oo' in S.
Answer: o(?=oo)

Negative Lookahead

Syntax: regex_1(?!regex_2)
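Matches regex_1 only when it is not immediately followed by a match of regex_2 (if regex_2 does match there, the result is discarded). Like positive lookahead, it only asserts whether a match is possible or not and does not consume any characters.
Pattern: \d(?!em)
Test_String: 1em 2pt 3em 4px
Here 2 and 4 match, because they are the digits that are not followed by em.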

Now solve this hackerrank problem,
Question: You have a test string S. Write a regex which can match all characters which are not immediately followed by that same character. If S = goooo, then the regex should match g and the last o, because the first g is not followed by g and the last o is not followed by o.
Answer: (.)(?!\1)
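Both negative lookahead patterns from this section can be tried in Java. The sketch below reports the index where each match starts, which makes it easy to see which characters were picked; NegativeLookaheadDemo and matchStarts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    // Hypothetical helper: returns the start index of every match.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The digits 2 (index 4) and 4 (index 12) are not followed by em.
        System.out.println(matchStarts("\\d(?!em)", "1em 2pt 3em 4px")); // [4, 12]
        // g (index 0) and the last o (index 4) are not followed by themselves.
        System.out.println(matchStarts("(.)(?!\\1)", "goooo"));          // [0, 4]
    }
}
```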

Positive Lookbehind

Syntax: (?<=regex_2)regex_1
Matches regex_1 only when it is immediately preceded by a match of regex_2, without including the regex_2 part in the result. Lookbehind does not consume the text matched by regex_2; it only asserts whether a match is possible or not.
Pattern: (?<=[a-z])[aeiou]
Test_String: he1o
Here only e matches: it is preceded by the letter h, while o is preceded by the digit 1.
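One way to see that the lookbehind text is not consumed is to replace every match with an underscore. This is just a sketch; LookbehindDemo and mask are made-up names, while String.replaceAll is the standard Java method.

```java
class LookbehindDemo {
    // Hypothetical helper: replaces every vowel that follows a lowercase letter with "_".
    static String mask(String input) {
        return input.replaceAll("(?<=[a-z])[aeiou]", "_");
    }

    public static void main(String[] args) {
        // Only e is replaced; the h checked by the lookbehind stays untouched.
        System.out.println(mask("he1o")); // h_1o
    }
}
```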

Now solve this hackerrank problem,
Question: You have a test string S. Write a regex which can match all the occurrences of a digit which are immediately preceded by an odd digit.
Answer: (?<=[13579])\d

Negative Lookbehind

Syntax: (?<!regex_2)regex_1
Matches regex_1 only when it is not immediately preceded by a match of regex_2 (if regex_2 does match there, the result is discarded). Lookbehind does not consume the text matched by regex_2; it only asserts whether a match is possible or not.
Pattern: (?<![a-z])[aeiou]
Test_String: he1o
Here only o matches: it is preceded by the digit 1, while e is preceded by the letter h.
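The same replacement trick works for the negative case. In this sketch NegativeLookbehindDemo and mask are hypothetical names, and apple is an extra input added here to show that a vowel at the very start of the string also passes the check, since nothing precedes it.

```java
class NegativeLookbehindDemo {
    // Hypothetical helper: replaces every vowel NOT preceded by a lowercase letter with "_".
    static String mask(String input) {
        return input.replaceAll("(?<![a-z])[aeiou]", "_");
    }

    public static void main(String[] args) {
        // o follows the digit 1, so it is replaced; e follows h, so it is kept.
        System.out.println(mask("he1o"));  // he1_
        // The leading a has no character before it, so the negative lookbehind succeeds.
        System.out.println(mask("apple")); // _pple
    }
}
```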

Now solve this hackerrank problem,
Question: You have a test string S. Write a regex which can match all the occurrences of characters which are not immediately preceded by vowels (a, e, i, u, o, A, E, I, O, U).
Answer: (?<![aeiouAEIOU]).

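Note: In Java source code, write \\ wherever the pattern has \, because the backslash is also the escape character in Java string literals. For example, \d(?=em) becomes "\\d(?=em)".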
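The effect of the doubled backslash can be seen in a few lines of Java. EscapeDemo and hasDigit are names invented for this sketch.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the input contains at least one digit.
    static boolean hasDigit(String input) {
        // The Java literal "\\d" holds two characters, \ and d,
        // which the regex engine then reads as the token \d.
        return Pattern.compile("\\d").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("\\d".length()); // 2
        System.out.println(hasDigit("3em")); // true
        System.out.println(hasDigit("em"));  // false
    }
}
```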

I hope you now understand the basic concepts of regular expressions.
In the next post, we will solve application-related problems from the Hackerrank platform.