algorithms

Algorithms Handbook

View on GitHub

Longest common substring

Problem

Find the largest substring that two strings have in common.

Solution

We’ll use dynamic programming. First, build a grid:

Walkthrough

Consider the strings “fish” and “hish”:

You can cheat a little because you already know what the solution should be (ish) and then try to find the formula to keep filling the grid.

Alt text

Here’s the formula for filling in each cell:

Alt text

if word_a[i] == word_b[j]:
    cell[i][j] = cell[i-1][j-1] + 1
else:
  cell[i][j] = 0

It contrast with the knapsack problem, he solution is the largest number in the grid, and it may not be the last cell.

Longest common subsequence example

What if you want to know if “fosh” is closer to “fish” or “fort”. The longest substring length is the same: two letters. Nevertheless, “fosh” is closer to “fish”.

Alt text

In this case you need to calculate the longest common subsequence.

Here’s the final grid:

Alt text

Here’s the formula for filling in each cell:

Alt text

if word_a[i] == word_b[j]:
    cell[i][j] = cell[i-1][j-1] + 1
else:
  cell[i][j] = max(cell[i-1][j], cell[i][j-1])