Glossary of Bioinformatics Terms - BLAST BLOSUM E value KEGG FASTA RasMol Tcoffee upgma

Alignment: A representation of two or more protein or nucleotide sequences in which homologous amino acids or nucleotides are placed in the same columns and missing amino acids or nucleotides are replaced by gaps.

BankIt: A program developed by NCBI (National Center for Biotechnology Information) for submitting your own sequences to GenBank.

BLAST (Basic Local Alignment Search Tool): A family of homology search programs that compare a query sequence with all the sequences contained in a database.

BLOSUM (Blocks Substitution Matrix): A popular substitution matrix for aligning protein sequences.

Clade: A group consisting of a common ancestor and all the species descended from it.

Cladogram: Phylogenetic tree showing the relationships between species.

Clustal W: Multiple sequence alignment program.

DALI: Structure database search program.

DDBJ: DNA Data Bank of Japan.

Dendrogram: Phylogenetic tree. The Clustal W guide tree is often referred to as a dendrogram; Clustal W saves it in a file with a .dnd extension.

Dot plot: Method for representing the similarity between two sequences without using an alignment.
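
As a rough illustration of the idea, a dot plot can be built by comparing every position of one sequence with every position of the other and marking the matches. The Python sketch below does exactly that for single residues; the function names dot_plot and render are made up for this example, and real tools usually compare short windows rather than single letters to reduce noise.

```python
def dot_plot(seq_a, seq_b):
    """Return a matrix where cell [i][j] is True when seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix, seq_a, seq_b):
    """Draw the matrix as text: seq_b runs across the top, seq_a down the side."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


# Example: similar regions show up as diagonal runs of '*'.
a, b = "GATTACA", "GATCACA"
print(render(dot_plot(a, b), a, b))
```

Similar regions appear as diagonal lines, and no gaps or scores are needed, which is why the method works without an alignment.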

EBI: European Bioinformatics Institute. The European counterpart of the NCBI in the US.

EMBL: European Molecular Biology Laboratory. This acronym often refers to the nucleotide database the laboratory maintains.

Entrez: The NCBI database querying system, similar to SRS at the EBI. Best known in the context of the PubMed or Medline bibliographical databases.

Ensembl: Human/mouse genome database.

E value: Expectation value. Given a database and the score of a hit, the E value tells you how many hits with at least that score you could expect to find just by chance. In sequence analysis, a good E value is very low (close to zero).
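
For local alignment scores, a common way to estimate the E value is the Karlin-Altschul formula, E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length and n the total length of the database. The Python sketch below simply evaluates that formula. The function name and the default values of K and lambda are illustrative assumptions; real programs derive these two parameters from the scoring matrix and gap penalties in use.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Karlin-Altschul estimate of the number of chance hits scoring >= raw_score.

    lam and k are placeholder statistical parameters (assumed values);
    they depend on the scoring system and must not be taken as universal.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# The same score is less surprising in a bigger database,
# and a higher score gives a smaller E value.
small_db = expect_value(raw_score=80, query_len=250, db_len=1_000_000)
large_db = expect_value(raw_score=80, query_len=250, db_len=100_000_000)
print(small_db, large_db)
```

The E value grows linearly with the size of the search space and shrinks exponentially with the score, so the same hit looks weaker when the database is larger.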

ExPASy: A server maintained by the Swiss Institute of Bioinformatics. ExPASy is the home of Swiss-Prot, the annotated protein database.

FASTA: One of the first popular programs for searching databases. By extension, FASTA has become the name of the sequence format used by the FASTA program.
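
In the FASTA format, each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. A minimal Python reader might look like the sketch below; the function name parse_fasta is an assumption for this example, and it keeps only the first word of each header as the record name.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after '>' as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


example = ">seq1 demo sequence\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))
```

For real work, an established parser such as the one in Biopython is usually a safer choice than a hand-written one.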

GenBank: The main nucleotide database, maintained by NCBI, which exchanges its data with the EMBL and DDBJ databases.

GenScan: Gene prediction software program.

Gibbs sampler: Local multiple sequence alignment method that uses a stochastic algorithm.

Global alignment: An alignment of two sequences where no amino acid or nucleotide is discarded. They are all either aligned with other amino acids/nucleotides or aligned with gaps.
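
The classic dynamic-programming approach to global alignment is the Needleman-Wunsch algorithm. The Python sketch below is a minimal version of it, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) chosen only for illustration; the function name global_align is also an assumption. Real aligners use substitution matrices such as PAM or BLOSUM and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner so every residue is used.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = None
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if sub is not None and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = global_align("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Because the traceback always runs from the last cell to the first, every residue of both sequences ends up in the alignment, paired either with another residue or with a gap.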

KEGG: Kyoto Encyclopedia of Genes and Genomes. A world-famous Japanese database on genomes and biochemical (metabolic) pathways.

InterPro: Protein domain database.

Lalign: A popular tool for finding the best local alignments (ten or more) between two sequences.

Medline: A collection of bibliographic references produced by the US National Library of Medicine and searched through NCBI's PubMed.

Mfold: RNA structure prediction software program.

Multiple Alignment: An alignment of more than two sequences.

NCBI: National Center for Biotechnology Information. A component of the US National Institutes of Health dedicated to bioinformatics research, software development, and the service and maintenance of leading public resources such as the GenBank (sequences) and PubMed (bibliography) databases. The United States counterpart of the EBI in Europe.

NJ: Neighbor Joining. One of the most popular distance-based methods for reconstructing phylogenetic trees.

Nucleotide Sequence Databases: GenBank, EMBL, DDBJ

ORF: Open Reading Frame. A stretch of DNA sequence that, read codon by codon in one frame, contains no stop codons.
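
Following the definition above, stop-free stretches can be found by reading the sequence codon by codon in each frame and cutting at every stop codon. The Python sketch below does this for the three forward frames only, using the standard stop codons TAA, TAG and TGA. The function name and the min_codons parameter are assumptions for this example, and the sketch does not require a start codon (ATG) as many gene finders do.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(dna, min_codons=1):
    """Return (frame, start, end) for each stop-free stretch in the forward frames.

    start and end are 0-based string positions; dna[start:end] holds the stretch.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        start = frame
        # Walk through the complete codons of this frame.
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    found.append((frame, start, pos))
                start = pos + 3  # next stretch begins after the stop codon
        # Stretch running up to the last complete codon of the frame.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            found.append((frame, start, end))
    return found


for frame, start, end in open_reading_frames("ATGAAATAGCCC"):
    print(frame, start, end)
```

A fuller version would also scan the reverse complement, which gives the usual six reading frames.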

OMIM: Genetic disease database.

Pairwise alignment: Alignment of two (a pair of) sequences.

PAM (Point Accepted Mutation): A popular substitution matrix for aligning proteins.

Parsimony: A technique for reconstructing phylogenetic trees.

PDB: Protein Data Bank. A database that collects the publicly available three-dimensional structures of biological macromolecules. It contains mostly proteins but also a few DNA and RNA structures.

Pfam: Protein families. A collection of profiles for detecting domains and protein families.

PHYLIP: Phylogeny Inference Package. A powerful package for building phylogenetic trees.

PIR: Protein Information Resource. An annotated protein database similar to Swiss-Prot. PIR is also the name of a sequence format similar to FASTA.

PROSITE: A popular collection of protein domains and patterns.

PubMed: NCBI's efficient implementation of the Medline bibliographical database, which is produced by the US National Library of Medicine.

Query: Question asked when searching a database.

RasMol: Popular software package for visualizing three dimensional structures.

SRS: Sequence Retrieval System. The system used at the EBI to search databases with keywords. It is similar to Entrez at NCBI.

Swiss-Prot: One of the most extensive annotated protein databases available.

T-Coffee: A package for computing, evaluating and combining multiple sequence alignments.

TrEMBL: Translated EMBL, which contains all the putative protein sequences translated from the nucleotide databases.

UPGMA: Unweighted Pair Group Method with Arithmetic Mean. A method for reconstructing phylogenetic trees.
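
In outline, UPGMA repeatedly merges the two closest clusters and sets the distance from the new cluster to every other cluster to the size-weighted average of the two old distances. The Python sketch below shows that loop under simple assumptions: the function name upgma and the input layout (a dict keyed by frozenset pairs) are choices made for this example, the distances are invented, and only the tree topology is returned, not the branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster names by UPGMA; dist maps frozenset({a, b}) to a distance.

    Returns the tree topology as nested tuples, e.g. ("C", ("A", "B")).
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = dict(dist)                       # working copy of the distances
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to each remaining cluster: average weighted by cluster size.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Invented distances for four taxa, for illustration only.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
distances = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], distances))
```

UPGMA assumes that all lineages evolve at the same rate, so it is best suited to data where that assumption is reasonable.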
