# Pattern Matching for Strings

Formally learned in CS240.

Pattern Matching Definition

Search for a string (pattern) in a large body of text

- $T[0..n−1]$ – The
text(or haystack) being searched within- $P[0..m−1]$ – The
pattern(or needle) being searched for- Strings over
alphabet$Σ$- Return smallest $i$ such that $P[j]=T[i+j]for0≤j≤m−1$
- This is the first occurrence of $P$ in $T$
- If $P$ does not occur in $T$, return
`FAIL`

Applications:

- Information Retrieval (text editors, search engines)
- Bioinformatics
- Data Mining

### How to solve

The easiest way to solve this is brute forcing, by checking every possible guess. But worst case performance is $Θ((n−m)⋅m)$.

How to improve?

- Do extra
**preprocessing**on the pattern $P$- Karp-Rabin Algorithm
- Boyer-Moore Algorithm
- Deterministic finite automata (DFA), KMP Algorithm
- We
**eliminate guesses**based on completed matches and mismatches.

- Do extra
**preprocessing**on the text $T$- Suffix Tree
- Suffix Array
- We
**create a data structure**to find matches easily.