# Subsequence

In mathematics, a **subsequence** is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. For example, the sequence is a subsequence of obtained after removal of elements , , and . The relation of one sequence being the subsequence of another is a preorder.

The subsequence should not be confused with substring which can be derived from the above string by deleting substring . The substring is a refinement of the subsequence.

## Common subsequence

Given two sequences *X* and *Y*, a sequence *Z* is said to be a *common subsequence* of *X* and *Y*, if *Z* is a subsequence of both *X* and *Y*. For example, if

- and

then a common subsequence of *X* and *Y* could be

This would *not* be the *longest common subsequence*, since *Z* only has length 3, and the common subsequence has length 4. The longest common subsequence of *X* and *Y* is .

## Applications

Subsequences have applications to computer science,^{[1]} especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.

Take two sequences of DNA containing 37 elements, say:

`SEQ`_{1}= ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA`SEQ`_{2}= CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

`LCS`_{(SEQ1,SEQ2)}=**CGTTCGGCTATGCTTCTACTTATTCTA**

This can be illustrated by highlighting the 27 elements of the longest common subsequence into the initial sequences:

`SEQ`_{1}= A**CG**G**T**G**TCG**T**GCTATGCT**GA**T**G**CT**G**ACTTAT**A**T**G**CTA**`SEQ`_{2}=**CGTTCGGCTAT**C**G**TA**C**G**TTCTA**TT**CT**A**T**G**ATT**T**CTA**A

Another way to show this is to *align* the two sequences, *i.e.*, to position elements of the longest common subsequence in a same column (indicated by the vertical bar) and to introduce a special character (here, a dash) in one sequence when two elements in the same column differ:

`SEQ`_{1}= ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-`| || ||| ||||| | | | | || | || | || | |||``SEQ`_{2}= -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.

## Theorems

- Every infinite sequence of real numbers has an infinite monotone subsequence (This is a lemma used in the proof of the Bolzano–Weierstrass theorem).
- Every bounded infinite sequence in
**R**^{n}has a convergent subsequence (This is the Bolzano–Weierstrass theorem). - For every integers
*r*and*s*, every finite sequence of length at least (*r*− 1)(*s*− 1) + 1 contains a monotonically increasing subsequence of length*r**or*a monotonically decreasing subsequence of length*s*(This is the Erdős–Szekeres theorem).

## See also

- Subsequential limit - the limit of some subsequence.
- Limit superior and limit inferior
- Longest increasing subsequence problem

## Notes

- ↑ In computer science,
*string*is often used as a synonym for*sequence*, but it is important to note that*substring*and*subsequence*are not synonyms. Substrings are*consecutive*parts of a string, while subsequences need not be. This means that a substring of a string is always a subsequence of the string, but a subsequence of a string is not always a substring of the string, see: Gusfield, Dan (1999) [1997].*Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology*. USA: Cambridge University Press. p. 4. ISBN 0-521-58519-8.

*This article incorporates material from subsequence on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.*