- LECTURE 9
- MACHINE LEARNING OVERVIEW
- LEARNING
- MAINTAINING A BALANCE
- A PARTIAL CHARACTERISATION OF LEARNING TASKS
- MAINTAINING A BALANCE
- MAINTAINING A BALANCE
- INDUCTIVE LOGIC PROGRAMMING
- EXAMPLE LEARNED LP
- STOCHASTIC LOGIC PROGRAMS
- AUTOMATED THEORY FORMATION
- OTHER MACHINE LEARNING METHODS
- BIOINFORMATICS OVERVIEW
- FROM SEQUENCE TO STRUCTURE
- PROBLEM NUMBER ONE
- PROBLEM NUMBER TWO
- OTHER AIMS OF BIOINFORMATICS
- SOME CURRENT BIOINFORMATICS PROJECTS
- A SUBSTRUCTURE SERVER
- THE SUBSTRUCTURE SERVER
- USING MEDICAL ONTOLOGIES
- GENE ONTOLOGY DISCOVERY
- STUDYING BIOCHEMICAL NETWORKS
- CLOSED LOOP MACHINE LEARNING
- FUTURE DIRECTIONS FOR MACHINE LEARNING IN BIOINFORMATICS
- BIOCHEMICAL PATHWAYS
STOCHASTIC LOGIC PROGRAMS
Generalisation of HMMs
Probabilistic logic programs
More expressive language than LPs
Quantitative rather than qualitative
Express arbitrary intervals over probability distributions
Issues in learning SLPs
Structure estimation
Parameter estimation
Applications
More appropriate for biochemical networks
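To make "probabilistic logic programs" a little more concrete, here is a minimal sketch of a propositional special case of an SLP (in effect a stochastic grammar): each clause carries a probability label, the labels of clauses sharing a head sum to 1, and a derivation's probability is the product of the labels used. The toy program, its symbols `s`, `a`, `b` and its probabilities are invented for illustration and are not taken from the lecture; structure and parameter estimation are not shown.

```python
import random

# Hypothetical toy program: head -> list of (probability label, clause body).
# Symbols that never appear as a head are treated as emitted terminals.
CLAUSES = {
    "s": [(0.4, ["a", "s"]), (0.3, ["b", "s"]), (0.3, [])],
}


def check_normalised(clauses, tol=1e-9):
    """Labels of clauses that share a head must sum to 1."""
    return all(
        abs(sum(p for p, _ in alternatives) - 1.0) <= tol
        for alternatives in clauses.values()
    )


def sample(symbol, clauses, rng):
    """Sample one derivation from `symbol`.

    Returns (emitted terminals, probability of the derivation).
    """
    if symbol not in clauses:
        return [symbol], 1.0
    alternatives = clauses[symbol]
    weights = [p for p, _ in alternatives]
    prob, body = rng.choices(alternatives, weights=weights, k=1)[0]
    emitted = []
    for sub_symbol in body:
        sub_emitted, sub_prob = sample(sub_symbol, clauses, rng)
        emitted.extend(sub_emitted)
        prob *= sub_prob
    return emitted, prob


if __name__ == "__main__":
    rng = random.Random(0)
    sequence, probability = sample("s", CLAUSES, rng)
    print("".join(sequence), probability)
```

This right-linear program behaves like a one-state HMM with a stop probability; a full SLP would also allow clauses with arguments and unification, which this sketch leaves out.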
AUTOMATED THEORY FORMATION
Descriptive learning technique
Which can also be used for prediction tasks
Cycle of activity
Form concepts, make hypotheses, explain hypotheses, evaluate concepts, start again, …
15 production rules for concepts
7 methods to discover and extract conjectures
Uses third party software to prove/disprove (maths)
25 heuristic measures of interestingness
OTHER MACHINE LEARNING METHODS
Genetic algorithms
To perform ILP search (Alireza)
Bayes nets
Introduction of hidden nodes (Philip)
Kernel methods
Relational kernels for SVMs and regression (Huma)
Action Languages
Stochastic (re)actions (Hiroaki)
BIOINFORMATICS OVERVIEW
“Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman)
Not just storage and analysis of huge DNA sequences
“Bioinformaticians have to be a Jack of all trades and a master of one” (Charlie Hodgman, GSK)
Highly collaborative
biology, mathematics, statistics, computer science, biochemistry, physics, chemistry, medicine, …
FROM SEQUENCE TO STRUCTURE
attcgatcgatcgatcgatcaggcgcgcta
cgagcggcgaggacctcatcatcgatcag…
MRPQAPGSLVDPNEDELRMAPWYWGRISREEA
KSILHGKPDGSFLVRDALSMKGEYTLTLMKDG
CEKLIKICHMDRKYGFIETDLFNSVVEMINYY
KENSLSMYNKTLDITLSNPIVRAREDEESQPH
GDLCLLSNEFIRTCQLLQNLEQNLENKRNSFN
AIREELQEKKLHQSVFGNTEKIFRNQIKLNES
FMKAPADA……
There is a computer program…?
PROBLEM NUMBER ONE
From protein sequence to protein function
HGP data needs to be interpreted
Genome is split into genes, each of which codes for a protein
Biological function of protein dictated by structure
Structure of many proteins already determined
By X-ray crystallography
Best idea so far: given a new gene sequence
Find the most similar sequence whose structure is known
And look at the structure/function of that protein
Other alternatives
Use ML techniques to predict where secondary structures will occur (e.g., hairpins, alpha-helices, beta-sheets)
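The "best idea so far" above amounts to a nearest-neighbour lookup. A minimal sketch follows, assuming a small in-memory table of sequences with known annotations. The sequences, the annotations and the function name `nearest_known` are hypothetical, and `difflib.SequenceMatcher` is only a crude stand-in for the alignment-based similarity search that would be used in practice.

```python
from difflib import SequenceMatcher

# Hypothetical mini-database: protein sequence -> known structure/function note.
KNOWN = {
    "MKTAYIAKQR": "annotation 1 (placeholder)",
    "GSHMLEDPVA": "annotation 2 (placeholder)",
    "MKVLAAGIVG": "annotation 3 (placeholder)",
}


def similarity(a, b):
    """Crude similarity in [0, 1]; not a biological alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def nearest_known(query, known=KNOWN):
    """Return (sequence, annotation, score) for the most similar known entry."""
    best = max(known, key=lambda seq: similarity(query, seq))
    return best, known[best], similarity(query, best)


if __name__ == "__main__":
    print(nearest_known("MKTAYIAKQK"))
```

The returned annotation is only a hypothesis about the new protein: a high similarity score suggests, but does not guarantee, similar structure or function.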
PROBLEM NUMBER TWO
Drug companies lose millions
Developing drugs which turn out to be toxic
Predictive Toxicology
Determine in advance which will be toxic
Approach 1: Mapping molecules to toxicity
Using ML and statistical techniques
Approach 2:
Producing metabolic explanations of toxic effects
Using probabilistic logics to represent pathways
And learning structures and parameters over this
OTHER AIMS OF BIOINFORMATICS
Organisation of Data
Cross referencing
Data integration is a massive problem
Analysing data from
High-throughput methods for gene expression
Ask Yike about this!
Produce Ontologies
And get everyone to use them?
SOME CURRENT BIOINFORMATICS PROJECTS
SGC
The Substructure Server
SGC and SHM
Discovery in medical ontologies
SHM
Studying biochemical networks (£400k, BBSRC)
Closed loop learning (£200k, EPSRC)
The Metalog project (£1.1 million, DTI)
APRIL 2 (£400k, EC)
A SUBSTRUCTURE SERVER
Lesson from Automated Theorem Proving
Best (most complex) methods not most used
Other considerations: ease of use, stability, simplicity, e.g., Otter
Aim: provide a simple predictive toxicology program
Via a server with a very simple interface
Sub-projects
Find substructures in many positives, few negatives: Colton
Simple Prolog program, writing Java version, use ILP??
Put program on server: Anandathiyagar (MSc.)
Distribute process over our Linux cluster: Darby (MEng.)
Babel preprocessor (50+ representations), Rasmol back-end: ???
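The first sub-project, finding substructures that occur in many positives and few negatives, can be sketched as a simple coverage filter. This is an illustrative stand-in, not the Prolog program mentioned above: molecules are represented as plain strings and a "substructure" is any contiguous substring, whereas a real system would search over molecular graphs. The function names, thresholds and toy molecules are all assumptions.

```python
def substrings(s, min_len, max_len):
    """All contiguous substrings of s with length in [min_len, max_len]."""
    return {
        s[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(s) - n + 1)
    }


def rank_substructures(positives, negatives, min_len=2, max_len=4,
                       min_pos=0.5, max_neg=0.25):
    """Rank candidate substructures taken from the positive examples.

    Keeps candidates found in at least `min_pos` of the positives and at
    most `max_neg` of the negatives, best (pos - neg) first.
    """
    candidates = set().union(
        *(substrings(m, min_len, max_len) for m in positives)
    )
    results = []
    for sub in candidates:
        pos = sum(sub in m for m in positives) / len(positives)
        neg = sum(sub in m for m in negatives) / len(negatives) if negatives else 0.0
        if pos >= min_pos and neg <= max_neg:
            results.append((sub, pos, neg))
    # Sort by coverage gap, then prefer longer (more specific) substructures.
    results.sort(key=lambda r: (-(r[1] - r[2]), -len(r[0]), r[0]))
    return results


if __name__ == "__main__":
    toxic = ["CCN=O", "N=OCC", "CN=OC"]   # hypothetical positives
    non_toxic = ["CCCC", "CCOC"]          # hypothetical negatives
    for sub, pos, neg in rank_substructures(toxic, non_toxic):
        print(sub, round(pos, 2), round(neg, 2))
```

Keeping the scoring this simple fits the stated aim of a simple predictive toxicology program; whether ILP would do better is the open question flagged in the sub-project list.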