Original

Z algorithm - string searching


Assume that we have a string s of length n. Z algorithm runs in linear time O(n) to calculate the lengths of the longest substrings starting from s[i] (i=0, 1...n-1) which is also a prefix of s[0..n-1].

Steps


z is an array that stores the lengths of the common prefixes between each suffix s[i:] and the original string s. Z algorithm loops through i = 1 to n-1 to calculate z[i]. We use L and R (left and right) to keep track of the substring with max R (s[L...R]) which is also the prefix of s.

1. Let L = R = 0, z[0] = n initially.

2. Loop through i = 1 to n-1

1) if i > R:

Use brute force to find common prefix between s and s[i:] and update z[i], L and R.

2) if i <= R:

We've already known that s[L...R] == s[0...R-L]. Therefore, instead of comparing s and s[i:] directly, we can first compare s[L...R] and s[i...R] to decide weather it is neccessary to further compare s[R-i+1:] with s[R+1:] if s[L...R] == s[0..R-L] == s[i...R].

To compare s[L...R] and s[i...R], there is no need to start from scratch. Because the relationship between s[L...R] and s[i...R...] is similar to the relationship between s[0:] and s[i-L:]. We can use the value of z[i-L] that we have already calculated to simplify the calculation of z[i].

PS: The idea here seems like the state transition in dynamic programming to me.

a) If z[i] < R-i+1: z[i] = z[k]

b) If z[i] >= R-i+1: Further compare s[R+1:] and s[R-i+1] until they mismatch. Update L and R.


Illustration:

Application of Z algorithm to string searching


Z algorithm can be applied to string searching by connecting two strings with a dummy character.

For example: find "abc" in "adsjdabcsbdbabc"

  • s = "abc$adsjdabcsbdbabc"

  • The problem will be converted to calculate z[4...] of s and find where z[i] == 3.

Python 3 implementation


def Z(s):
    n = len(s)
    z = [n] * n
    l = r = 0
    for i in range(1, n):
        if i > r:
            l = r = i
            while r < n and s[r-i] == s[r]:
                r += 1
            z[i] = r-l
            r -= 1
        else:
            k = i-l
            if z[k] < r-i+1:
                z[i] = z[k]
            else:
                l = i
                while r < n and s[r-i] == s[r]:
                    r += 1
                z[i] = r-l
                r -= 1
    print(z)


References:

[1] GeeksforGeeks Z algorithm (Linear time pattern searching Algorithm)

0
  • By Yuechun Wang
  • Published 2020-09-23
  • The materials on this website may be freely copied and distributed for noncommercial personal use only so long as our copyright notice and website address is included.
  • Comments

    Geechi
    Good work!
    Yuechun
    Author
    @ Geechi
    Thank you (●'◡'●)