Bioinformatics_lectures / lecture5.pptx

The different types of Databases in Bioinformatics

2) Database:

Organisation:

Flat files

Relational databases

Object-oriented databases

….

Availability:

Publicly available, no restriction

Available, but with copyright

Accessible, but not downloadable

Academic, but not freely available

Commercial

Curators:

Large, public institution (EMBL, NCBI)

Quasi-academic institute (Swiss Institute of Bioinformatics, TIGR,…)

Academic group or scientist

Commercial company
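
The difference between the flat-file and relational organisations listed above can be illustrated with a small sketch. The record below is a deliberately simplified, illustrative entry in a line-code style (it is not a copy of the real SwissProt entry), and the table layout and function names are assumptions made for this example only.

```python
import sqlite3

# Simplified, illustrative flat-file record: a two-letter line code,
# three spaces, then the value. "//" ends the record.
FLAT_FILE_RECORD = """\
ID   TPIS_CHICK
AC   P00940;
DE   Triose phosphate isomerase
//
"""


def parse_flat_record(text):
    """Turn one flat-file record into a dict keyed by line code."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        code, _, value = line.partition("   ")
        fields[code] = value.strip().rstrip(";")
    return fields


def load_into_relational(fields):
    """Store the parsed fields in a (hypothetical) one-table relational schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "accession TEXT PRIMARY KEY, identifier TEXT, description TEXT)"
    )
    conn.execute(
        "INSERT INTO entry VALUES (?, ?, ?)",
        (fields["AC"], fields["ID"], fields["DE"]),
    )
    return conn


fields = parse_flat_record(FLAT_FILE_RECORD)
conn = load_into_relational(fields)
row = conn.execute(
    "SELECT identifier FROM entry WHERE accession = ?", ("P00940",)
).fetchone()
```

In the flat file, every query means re-reading and re-parsing text; in the relational form, the same data can be looked up by key with SQL.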

Identifiers and Accession numbers

Identifier: a string of letters and digits that is generally human-readable (“understandable”).

Example: TPIS_CHICK (triose phosphate isomerase from chicken, Gallus gallus) in SwissProt

The identifier can change (at the curator's discretion)

Accession code: a string of letters and digits that uniquely identifies an entry in its database.

The accession number for TPIS_CHICK in SwissProt is P00940

An accession number should not change!
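
One practical consequence is that the accession number, not the identifier, should be used as the lookup key. The following is a minimal sketch of that idea; the `Entry` and `EntryStore` classes and the renamed identifier `TPIS_RENAMED` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    accession: str   # stable: should never change
    identifier: str  # human-readable: a curator may rename it


class EntryStore:
    """Toy in-memory store that keys entries by accession number."""

    def __init__(self):
        self._by_accession = {}

    def add(self, entry):
        self._by_accession[entry.accession] = entry

    def rename(self, accession, new_identifier):
        # Only the identifier changes; the key stays the same.
        self._by_accession[accession].identifier = new_identifier

    def get(self, accession):
        return self._by_accession[accession]


store = EntryStore()
store.add(Entry(accession="P00940", identifier="TPIS_CHICK"))

# Hypothetical rename by a curator; "TPIS_RENAMED" is an invented name.
store.rename("P00940", "TPIS_RENAMED")

entry = store.get("P00940")  # the accession-based lookup still works
```

Any reference stored as `P00940` survives the rename, whereas a reference stored as `TPIS_CHICK` would no longer match the entry.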

An accession number

An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.

Examples (all for retinol-binding protein, RBP4):

DNA

X02775        GenBank genomic DNA sequence
NT_030059     Genomic contig
rs7079946     dbSNP (single nucleotide polymorphism)

RNA

N91759.1      An expressed sequence tag (1 of 170)
NM_006744     RefSeq DNA sequence (from a transcript)

Protein

NP_007635     RefSeq protein
AAC02945      GenBank protein
Q28369        SwissProt protein
1KT7          Protein Data Bank structure record
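
The examples above differ visibly in format, so the source database can often be guessed from the shape of the accession alone. The sketch below is a heuristic, not an official validator: the patterns are simplified, cover only the formats shown in the examples, and some formats genuinely overlap (for instance, Q28369 also has the one-letter-plus-five-digits shape of a GenBank nucleotide accession), so the order of the checks is an assumption of this example.

```python
import re

# Heuristic patterns, checked in order; the first match wins.
ACCESSION_PATTERNS = [
    ("RefSeq", re.compile(r"^[A-Z]{2}_\d+$")),
    ("dbSNP", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("PDB", re.compile(r"^\d[A-Za-z0-9]{3}$")),
    ("UniProt", re.compile(
        r"^(?:[OPQ]\d[A-Z0-9]{3}\d|[A-NR-Z]\d(?:[A-Z][A-Z0-9]{2}\d){1,2})$")),
    ("GenBank protein", re.compile(r"^[A-Z]{3}\d{5}$")),
    ("GenBank nucleotide", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
]


def classify_accession(accession):
    """Guess the source database from the shape of an accession string."""
    # Drop a trailing version suffix such as ".1" before matching.
    core = accession.strip().split(".")[0]
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(core):
            return label
    return "unknown"


guesses = {
    acc: classify_accession(acc)
    for acc in ["X02775", "NT_030059", "rs7079946", "N91759.1",
                "NM_006744", "NP_007635", "AAC02945", "Q28369", "1KT7"]
}
```

A guess of this kind is only a convenience for routing a query; the authoritative answer is whether the record actually exists in the database in question.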

 

Nucleotide Sequence Databases

3 main databases

EMBL: www.ebi.ac.uk/embl

GenBank: www.ncbi.nlm.nih.gov/GenBank

DDBJ: www.ddbj.nig.ac.jp

The 3 databases are synchronized on a daily basis, and the accession numbers are consistent.

There are no legal restrictions on the use of these databases. However, some of the sequences in the databases are patented.
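
Because the accession numbers are consistent across the three databases, the same accession can be used to request a record from more than one of them. The sketch below only builds the request URLs and does not contact any server. It assumes the NCBI E-utilities `efetch` endpoint and the ENA Browser API (the current home of the EMBL nucleotide data); these endpoints are not part of the original slides and may change. DDBJ is left out because its retrieval URL is not given here.

```python
from urllib.parse import quote, urlencode


def genbank_fasta_url(accession):
    """URL for a FASTA record via NCBI E-utilities efetch (assumed endpoint)."""
    query = urlencode({
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?{query}"


def ena_fasta_url(accession):
    """URL for a FASTA record via the ENA Browser API (assumed endpoint)."""
    return f"https://www.ebi.ac.uk/ena/browser/api/fasta/{quote(accession)}"


# The same accession (the GenBank RBP4 genomic sequence) in both services.
urls = {
    "GenBank": genbank_fasta_url("X02775"),
    "EMBL/ENA": ena_fasta_url("X02775"),
}
```

Fetching either URL (for example with `urllib.request.urlopen`) should return the same sequence, possibly with a differently formatted header line.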

EMBL and DDBJ

Collaborative effort with NCBI GenBank

Searchable databases of gene information

EMBL

Gene Expression and Protein Sequences

UniProt Knowledgebase

A complete annotated protein sequence database

Macromolecular Structure Database

European Project for the management and distribution of data on macromolecular structures

ArrayExpress

Gene expression data

IntAct

Provides a freely available, open source database system and analysis tools for protein interaction data

DDBJ

Tools to compare nucleic acid sequences and amino acid sequences

fasta, blastn, tblastx

nucleotide : nucleotide

fastx, blastx

nucleotide : amino acid

tfasta, tfastx, tblastn

amino acid : nucleotide

blastp

amino acid : amino acid
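
Programs that compare a nucleotide sequence with an amino acid sequence (such as blastx or tblastn) work by translating the nucleotide sequence in all six reading frames and comparing the resulting proteins. The sketch below shows only that translation step, using the standard genetic code. It is an illustration of the idea, not the implementation used by any of these tools, and the function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate both strands in all three offsets: frames +1..+3, -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
```

For the toy input `ATGGCCTAA`, frame +1 reads ATG GCC TAA, that is, Met, Ala, stop (`MA*`); a translated search would then align each of the six frames against the protein sequences.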

www.ncbi.nlm.nih.gov
