Alignment: Representation of two or more protein or nucleotide sequences in
which homologous amino acids or nucleotides are placed in the same columns,
while missing amino acids or nucleotides are replaced by gaps.
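As a small illustration of the column-and-gap idea, the sketch below treats each aligned sequence as a row of equal length, with "-" standing for a gap, and computes the percent identity over the columns where both rows hold a residue. The function names and the choice of "-" as the gap character are assumptions made for this example only.

```python
def alignment_columns(row_a, row_b):
    """Pair up the characters of two aligned rows, one tuple per column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return list(zip(row_a, row_b))


def percent_identity(row_a, row_b, gap="-"):
    """Percent of gap-free columns in which both rows hold the same residue."""
    columns = alignment_columns(row_a, row_b)
    # Only columns without a gap on either side count as aligned positions.
    aligned = [(a, b) for a, b in columns if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


if __name__ == "__main__":
    # Third column holds a gap in the first row; last column is a mismatch.
    print(percent_identity("AC-GT", "ACTGA"))
```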
BankIt: A program developed by NCBI (National Center for Biotechnology
Information) for submitting your own sequences to GenBank.
BLAST (Basic Local Alignment Search Tool): Homology search program for
comparing a query sequence with all the sequences contained in a database.
BLOSUM (Blocks Substitution Matrix): A popular substitution matrix for aligning
protein sequences.
Biological Databases:
Clade: Group of related species consisting of a common ancestor and all of its
descendants.
Cladogram:
Phylogenetic tree showing the relationship between the species.
ClustalW: Multiple sequence alignment program.
DALI:
Structure database search program.
DDBJ: DNA Data Bank of Japan.
Dendrogram: Phylogenetic tree. The ClustalW guide tree is often referred to as
a dendrogram. It is saved in a file with a .dnd extension.
Dot plot:
Method for representing the similarity between two sequences without using an
alignment.
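A dot plot can be sketched in a few lines: one sequence runs along the horizontal axis, the other along the vertical axis, and a dot is drawn wherever the two positions match. The version below is a minimal sketch; the `window` parameter (compare short words instead of single letters to reduce noise) and the "*" and "." rendering are assumptions for this example, not part of any particular tool.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return a matrix of booleans.

    Cell [i][j] is True when the word of length `window` starting at
    seq_y[i] equals the word of the same length starting at seq_x[j].
    """
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [seq_y[i:i + window] == seq_x[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Shared segments show up as diagonal runs of '*'.
    print(render(dot_plot("ACGTACGT", "CGTAC", window=2)))
```

Diagonal runs of dots mark segments the two sequences share, which is why the plot reveals similarity without committing to any single alignment.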
EBI: European Bioinformatics Institute. The European counterpart of NCBI in
the US.
EMBL:
European Molecular Biology Laboratory. This acronym often refers to the
nucleotide database the laboratory maintains.
Entrez: The NCBI database querying system, similar to SRS at the EBI. Best
known in the context of the PubMed or Medline bibliographic databases.
Ensembl: Human/mouse genome database.
E value: Expectation value. Given a database and the score of a hit, the E
value tells you how many hits with at least that score you could expect to find
just by chance. In sequence analysis, a good E value must be very low.
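To make the dependence on database size and score concrete, the sketch below uses the standard relation between a normalized bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n is the total length of the database. This formula is not given in the glossary itself; it is brought in here as an illustration, and real search programs apply further corrections (such as effective lengths) that are left out. The function and parameter names are hypothetical.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    Simplified sketch: uses raw lengths, without the edge-effect
    corrections applied by real search programs.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same query and database; a higher score gives a much lower E value.
    for score in (30, 40, 50):
        print(score, expect_value(score, 300, 1_000_000_000))
```

Each additional bit of score halves the E value, while doubling the database size doubles it, which is why the same hit can look less significant in a larger database.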
ExPASy: A server maintained by the Swiss Institute of Bioinformatics. ExPASy is
the home of SWISS-PROT, the annotated protein database.
FASTA: One of the first popular programs for searching databases. By extension,
FASTA has become the name of the sequence format used by the FASTA program.
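In the FASTA format, each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. The parser below is a minimal sketch of reading that layout from a string; the function name is hypothetical, and it keeps the whole header line as the record name. Records with identical headers would overwrite each other in this simple version.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]       # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 demo\nACGT\nTTAA\n>seq2\nMKV\n"
    for name, sequence in parse_fasta(example).items():
        print(name, sequence)
```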
GenBank: The main nucleotide database, maintained by NCBI. It exchanges data
with the EMBL and DDBJ databases, so all three hold the same sequences.
GenScan:
Gene prediction software program.
Gibbs sampler: Local multiple sequence alignment method that uses a stochastic
algorithm.
Global alignment: An alignment of two sequences where no amino acid or
nucleotide is discarded. They are all either aligned with other amino
acids/nucleotides or aligned with gaps.
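A global alignment of this kind can be computed with dynamic programming in the style of the Needleman-Wunsch algorithm, which the glossary does not name but which is the classic method for the task. The sketch below is illustrative only: the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions, and a real tool would use a substitution matrix and more refined gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner so every residue is used.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, total = global_align("ACGT", "AGT")
    print(top)
    print(bottom)
    print(total)
```

Because the traceback starts at the last cell and ends at the first, every residue of both sequences appears in the output, either paired with another residue or with a gap.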
KEGG: Kyoto Encyclopedia of Genes and Genomes. A world-famous Japanese database
on genomes and biochemical pathways, including metabolic pathways.
InterPro:
Protein domain database.
Lalign: A popular tool for finding the ten (or more) best local alignments
between two sequences.
Medline: A collection of bibliographic references produced by the U.S. National
Library of Medicine and searchable through NCBI's PubMed.
Mfold: RNA
structure prediction software program.
Multiple
Alignment: An alignment of more than two sequences.
NCBI: National Center for Biotechnology Information. A component of the U.S.
National Institutes of Health dedicated to bioinformatics research, software
development, and the service and maintenance of leading public resources such
as the GenBank (sequences) and PubMed (bibliography) databases. The United
States counterpart of the EBI in Europe.
NJ: Neighbor Joining. One of the most popular methods for reconstructing
phylogenetic trees.
Nucleotide Sequence Databases: GenBank, EMBL, DDBJ
ORF: Open Reading Frame. A stretch of DNA sequence that, read in a given frame,
contains no stop codons.
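Following the definition above literally, the sketch below splits a reading frame into stretches that contain no stop codon and reports the longest one across the three forward frames. It assumes the standard genetic code (stop codons TAA, TAG, TGA), ignores the reverse strand, and does not require a start codon, which many gene-finding tools do; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stop_free_stretches(dna, frame=0):
    """Return (start, end) index pairs of stop-free stretches in one frame."""
    dna = dna.upper()
    stretches = []
    start = frame
    for pos in range(frame, len(dna) - 2, 3):
        if dna[pos:pos + 3] in STOP_CODONS:
            if pos > start:
                stretches.append((start, pos))
            start = pos + 3  # resume after the stop codon
    # Close the final stretch at the last complete codon of this frame.
    end = frame + ((len(dna) - frame) // 3) * 3
    if end > start:
        stretches.append((start, end))
    return stretches


def longest_orf(dna):
    """Longest stop-free stretch over the three forward reading frames."""
    best = ""
    for frame in range(3):
        for start, end in stop_free_stretches(dna, frame):
            if end - start > len(best):
                best = dna[start:end].upper()
    return best


if __name__ == "__main__":
    sequence = "ATGAAATAGCCC"
    for frame in range(3):
        print(frame, stop_free_stretches(sequence, frame))
    print(longest_orf(sequence))
```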
OMIM:
Genetic disease database.
Pairwise alignment: Alignment of two (a pair of) sequences.
PAM (Point Accepted Mutation): A popular substitution matrix for aligning
proteins.
Parsimony: A technique for reconstructing phylogenetic trees.
PDB: Protein Data Bank. A database of publicly available three-dimensional
structures. It contains mostly proteins but also a few DNA and RNA structures.
Pfam: Protein families. A collection of profiles for detecting domains and
protein families.
Phylip:
Everything on phylogeny. A powerful package for building phylogenetic trees.
PIR: Protein Information Resource. An annotated protein database similar to
SWISS-PROT. It is also the name of a sequence format similar to FASTA.
PROSITE: A
popular collection of protein domains and patterns.
PubMed: NCBI's efficient implementation of the Medline bibliographic database
produced by the U.S. National Library of Medicine.
Query:
Question asked when searching a database.
RasMol: Popular software package for visualizing three-dimensional structures.
SRS: Sequence Retrieval System. The system used at the EBI to search databases
with keywords. It is similar to Entrez at NCBI.
SWISS-PROT: One of the most extensive annotated protein databases available.
T-Coffee: A package for computing, evaluating and combining multiple sequence
alignments.
TrEMBL: Translated EMBL, which contains all the putative protein sequences
contained in the nucleotide databases.
UPGMA: Unweighted Pair Group Method with Arithmetic Mean. A method for
reconstructing phylogenetic trees.
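The UPGMA procedure can be sketched as repeated merging: find the two closest clusters, join them, and set the distance from the new cluster to every other cluster to the size-weighted average of the two old distances. The code below is a minimal sketch under stated assumptions: the input is a symmetric distance matrix given as a list of lists, the tree is returned as nested tuples without branch lengths, and the function name and example distances are made up for illustration.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster labelled items by UPGMA; return the tree as nested tuples."""
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    # Distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        merged = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[tuple(sorted((i, k)))]
            d_jk = dist[tuple(sorted((j, k)))]
            # Arithmetic mean over all leaf pairs, via size weighting.
            merged[k] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        # Drop every distance that involves the two merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # new ids are always the largest
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1

    return next(iter(trees.values()))


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    distances = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(labels, distances))
```

In this example A and B are joined first, then C, then D, so the nesting of the tuples mirrors the order in which the clusters were merged.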