Bioinformatics_lectures / lecture9.pptx

STOCHASTIC LOGIC PROGRAMS

Generalisation of HMMs

Probabilistic logic programs

More expressive language than LPs

Quantitative rather than qualitative

Express arbitrary intervals over probability distributions

Issues in learning SLPs

Structure estimation

Parameter estimation

Applications

More appropriate for biochemical networks
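To make the "probabilistic logic program" idea concrete, here is a minimal sketch in Python, assuming a heavily simplified propositional case: each predicate has a set of labelled clauses whose probability labels sum to 1, and a derivation's probability is the product of the labels used. The `CLAUSES` table, the predicate name `S` and the `sample` helper are all hypothetical illustrations, not part of any SLP system; a real SLP also has logical variables and unification, which this toy omits.

```python
import random

# Hypothetical toy program. Each predicate maps to labelled clauses
# (probability, body); the labels for one predicate sum to 1.
# Strings that are keys of CLAUSES are predicates; anything else is an
# emitted symbol. This particular program behaves like a one-state HMM.
CLAUSES = {
    "S": [(0.4, ["a", "S"]), (0.3, ["b", "S"]), (0.3, [])],
}

def sample(pred, rng):
    """Sample one derivation from `pred`.

    Returns (emitted symbols, probability of the derivation).
    """
    clauses = CLAUSES[pred]
    prob, body = rng.choices(clauses, weights=[p for p, _ in clauses])[0]
    symbols = []
    for item in body:
        if item in CLAUSES:
            # Recurse into a sub-goal and multiply in its probability.
            sub_symbols, sub_prob = sample(item, rng)
            symbols += sub_symbols
            prob *= sub_prob
        else:
            symbols.append(item)
    return symbols, prob

if __name__ == "__main__":
    print(sample("S", random.Random(0)))
```

In this picture, parameter estimation means fitting the numeric labels from observed derivations, and structure estimation means learning the clauses themselves.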

AUTOMATED THEORY FORMATION

Descriptive learning technique

Which can also be used for prediction tasks

Cycle of activity

Form concepts, make hypotheses, explain hypotheses, evaluate concepts, start again

15 production rules for concepts

7 methods to discover and extract conjectures

Uses third party software to prove/disprove (maths)

25 heuristic measures of interestingness

OTHER MACHINE LEARNING METHODS

Genetic algorithms

To perform ILP search (Alireza)

Bayes nets

Introduction of hidden nodes (Philip)

Kernel methods

Relational kernels for SVMs and regression (Huma)

Action Languages

Stochastic (re)actions (Hiraoki)

BIOINFORMATICS OVERVIEW

“Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman)

Not just storage and analysis of huge DNA sequences

“Bioinformaticians have to be a Jack of all trades and a master of one” (Charlie Hodgman, GSK)

Highly collaborative

biology, mathematics, statistics, computer science, biochemistry, physics, chemistry, medicine, …

FROM SEQUENCE TO STRUCTURE

attcgatcgatcgatcgatcaggcgcgcta

cgagcggcgaggacctcatcatcgatcag…

MRPQAPGSLVDPNEDELRMAPWYWGRISREEA

KSILHGKPDGSFLVRDALSMKGEYTLTLMKDG

CEKLIKICHMDRKYGFIETDLFNSVVEMINYY

KENSLSMYNKTLDITLSNPIVRAREDEESQPH

GDLCLLSNEFIRTCQLLQNLEQNLENKRNSFN

AIREELQEKKLHQSVFGNTEKIFRNQIKLNES

FMKAPADA……

There is a computer program…?

PROBLEM NUMBER ONE

From protein sequence to protein function

HGP data needs to be interpreted

Genome is split into genes, each of which codes for a protein

Biological function of protein dictated by structure

Structure of many proteins already determined

By X-ray crystallography

Best idea so far: given a new gene sequence

Find sequence most similar to it with known structure

And look at the structure/function of the protein

Other alternatives

Use ML techniques to predict where secondary structures will occur (e.g., hairpins, alpha-helices, beta-sheets)
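The "find the most similar sequence with known structure" idea can be sketched as follows. This is a minimal illustration, assuming a toy scoring scheme (match +1, mismatch -1, gap -1) and a made-up database `KNOWN` whose structure labels are hypothetical. The two keys are simply the first ten residues of two lines from the sequence slide. Real tools use substitution matrices and faster heuristic search rather than this plain Needleman-Wunsch score.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score (keeps two rows of the DP table)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

# Hypothetical database: sequence fragment -> invented structure label.
KNOWN = {
    "MRPQAPGSLV": "fold A (hypothetical)",
    "KSILHGKPDG": "fold B (hypothetical)",
}

def closest_known(query):
    """Return the known sequence that aligns best with `query`, plus its label."""
    best = max(KNOWN, key=lambda seq: alignment_score(query, seq))
    return best, KNOWN[best]

if __name__ == "__main__":
    print(closest_known("MRPQAPGSLL"))
```

The query inherits the structure (and, by the argument above, the likely function) of its best match, which is only as reliable as the similarity is high.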

PROBLEM NUMBER TWO

Drug companies lose millions

Developing drugs which turn out to be toxic

Predictive Toxicology

Determine in advance which will be toxic

Approach 1: Mapping molecules to toxicity

Using ML and statistical techniques

Approach 2: Producing metabolic explanations of toxic effects

Using probabilistic logics to represent pathways

And learning structures and parameters over this

OTHER AIMS OF BIOINFORMATICS

Organisation of Data

Cross referencing

Data integration is a massive problem

Analysing data from

High-throughput methods for gene expression

Ask Yike about this!

Produce Ontologies

And get everyone to use them?

SOME CURRENT BIOINFORMATICS PROJECTS

SGC

The Substructure Server

SGC and SHM

Discovery in medical ontologies

SHM

Studying biochemical networks (£400k, BBSRC)

Closed loop learning (£200k, EPSRC)

The Metalog project (£1.1 million, DTI)

APRIL 2 (£400k, EC)

A SUBSTRUCTURE SERVER

Lesson from Automated Theorem Proving

Best (most complex) methods not most used

Other considerations: ease of use, stability, simplicity, e.g., Otter

Aim: provide a simple predictive toxicology program

Via a server with a very simple interface

Sub-projects

Find substructures in many positives, few negatives: Colton

Simple Prolog program, writing Java version, use ILP??

Put program on server: Anandathiyagar (MSc.)

Distribute process over our Linux cluster: Darby (MEng.)

Babel preprocessor (50+ representations), RasMol back-end: ???
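The first sub-project, finding substructures that occur in many positives and few negatives, can be sketched as a simple counting exercise. The version below is in Python rather than the Prolog or Java mentioned on the slide, and it is not the project's actual program: it assumes each molecule has already been reduced to a set of substructure labels, and the function name, thresholds and example data are all hypothetical (the molecules are invented and say nothing about real toxicity).

```python
from collections import Counter

def discriminating_substructures(positives, negatives, min_pos=2, max_neg=0):
    """List substructures found in >= min_pos positives and <= max_neg negatives.

    Each molecule is a set of substructure labels. Returns tuples of
    (substructure, positive count, negative count), best first.
    """
    pos_counts = Counter(s for mol in positives for s in set(mol))
    neg_counts = Counter(s for mol in negatives for s in set(mol))
    hits = [
        (s, n, neg_counts[s])
        for s, n in pos_counts.items()
        if n >= min_pos and neg_counts[s] <= max_neg
    ]
    # Most positives first, then fewest negatives, then alphabetical.
    return sorted(hits, key=lambda h: (-h[1], h[2], h[0]))

if __name__ == "__main__":
    # Invented example data, for illustration only.
    toxic = [{"nitro", "benzene"}, {"nitro", "amine"}, {"benzene", "nitro", "chloro"}]
    non_toxic = [{"benzene", "hydroxyl"}, {"amine"}]
    print(discriminating_substructures(toxic, non_toxic))
```

Relaxing `max_neg` trades precision for coverage; an ILP system would search a much richer space of structural descriptions than this flat label count.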
