Sequence Database
In the field of bioinformatics, a sequence database is a large collection of DNA, protein,
or other sequences stored on a computer. A database can include
sequences from only one organism, for example a database of all the
proteins in Saccharomyces cerevisiae, or it can include sequences from all organisms whose DNA has been sequenced.
Search issues
Sequence databases can be searched using a variety of methods. The
most common is probably a similarity search: the user starts from a
target protein or gene whose sequence is already known and looks for
similar sequences in the database. The BLAST program is a tool of this type.
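
The idea of ranking database records by similarity to a known query can be illustrated with a minimal sketch. This is not how BLAST works internally; it is a toy comparison based on shared short substrings (k-mers), and the function names, record identifiers, and sequences below are all made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, database, k=3, top=3):
    """Rank database records by shared k-mer content with the query."""
    q = kmers(query, k)
    scored = [(jaccard(q, kmers(seq, k)), name) for name, seq in database.items()]
    # Highest score first; ties broken by record name for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, round(score, 3)) for score, name in scored[:top]]


# Hypothetical toy database; identifiers and sequences are invented.
toy_db = {
    "rec1": "ATGGCGTACGTTAGC",
    "rec2": "ATGGCGTACGTTAGG",
    "rec3": "TTTTTTTTTTTTTTT",
}
hits = search("ATGGCGTACGTTAGC", toy_db)
```

Here the identical record ranks first, the record differing in one base ranks second, and the unrelated record ranks last. A real search tool would also report alignments and a measure of statistical significance, which this sketch omits.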
Many inputs create inconsistencies
A major problem with all the large genetic sequence databases is
that records are deposited in them from a wide range of sources, from
individual researchers to large genome sequencing centers. As a result,
the sequences themselves, and especially the biological annotations
attached to these sequences, vary tremendously in quality. There is
also much redundancy, as multiple labs often submit numerous sequences
that are identical, or nearly identical, to others in the databases.
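
The redundancy described above can be made concrete with a small sketch that groups records carrying exactly the same sequence. It assumes records arrive as simple (identifier, sequence) pairs, which is a simplification; the identifiers and sequences are invented, and nearly identical sequences would need a clustering step that is not shown here.

```python
from collections import defaultdict


def group_identical(records):
    """Map each distinct sequence to the list of record IDs that carry it."""
    groups = defaultdict(list)
    for rec_id, seq in records:
        key = "".join(seq.split()).upper()  # ignore case and whitespace
        groups[key].append(rec_id)
    return dict(groups)


# Hypothetical submissions from different labs; all values are invented.
submissions = [
    ("labA_001", "ATGGCGTAC"),
    ("labB_104", "atggcgtac"),
    ("labC_017", "ATGGCGTAA"),
]
groups = group_identical(submissions)
# Keep only sequences that were submitted more than once.
redundant = {seq: ids for seq, ids in groups.items() if len(ids) > 1}
```
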
Many annotations are based not on laboratory experiments, but on the
results of sequence similarity searches against previously annotated
sequences. Once a sequence has been annotated based on similarity to
others and deposited in the database, it can itself become the basis
for future annotations. This leads to the transitive annotation
problem: several such annotation transfers by sequence similarity may
lie between a particular database record and actual wet lab
experimental information, and an error made at any step is carried
into every later record. Therefore, one must always regard the
biological annotations in major sequence databases with a considerable
degree of skepticism, unless they can be verified by reference to
published papers describing high-quality experimental data, or at least
by reference to a human-curated sequence database.
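
The transitive annotation problem can be pictured as a chain of "copied from" links that may or may not end at an experiment. The sketch below counts the similarity-based transfers between a record and experimental evidence. The provenance mapping is a hypothetical data structure invented for this example; real database records do not necessarily expose their annotation history in this form.

```python
def transfers_to_experiment(record_id, source_of):
    """Count similarity-based transfers between a record and an experiment.

    source_of maps a record ID to either the string "experiment" or the ID
    of the record its annotation was copied from. Returns None when the
    chain never reaches an experiment (unknown record or circular chain).
    """
    seen = set()
    steps = 0
    current = record_id
    while current not in seen:
        seen.add(current)
        source = source_of.get(current)
        if source is None:
            return None  # provenance unknown
        if source == "experiment":
            return steps
        steps += 1
        current = source
    return None  # circular chain of annotations


# Hypothetical provenance table; record IDs are invented.
provenance = {
    "P1": "experiment",  # annotated from wet lab data
    "P2": "P1",          # annotated by similarity to P1
    "P3": "P2",          # annotated by similarity to P2
    "P4": "P5",          # P4 and P5 only reference each other
    "P5": "P4",
}
depth_p3 = transfers_to_experiment("P3", provenance)
depth_p4 = transfers_to_experiment("P4", provenance)
```

In this toy table, P3 sits two transfers away from experimental evidence, while P4 and P5 have no experimental support at all. The larger the count, the more opportunities an error has had to propagate.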
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Sequence Database"