Daniel Fredouille
Address: The Robert Gordon University, School of Computing, Saint Andrew Street, Aberdeen AB25 1HG, United Kingdom
Phone: +44 (0) 1224 262574
Email: df comp.rgu.ac.uk
Welcome to my professional page.
General introduction:
My research is in the domain of machine learning and is applied to the field of bioinformatics.
More precisely, I am a specialist in grammatical inference, that is, the discovery of grammatical models that characterize sets of sequences. I mainly work on obtaining such models to characterize sets of proteins sharing a common biological function.
Apart from my speciality (grammatical inference), I am also very interested in:
- Biological networks
- Protein structure prediction
- Knowledge formalisation for its introduction in machine learning tools
- (Inductive) Logic Programming
- Compression measures (MML and MDL encoding)
Machine learning is the area of study concerned with how a computational system can acquire knowledge from its experiences and observations. In grammatical inference more specifically, the knowledge is represented as a formal grammar, and the main observations are strings that either belong or do not belong to the language of the grammar we want to acquire.
The link with the application domain of bioinformatics is the following: biological sequences such as DNA, RNA or proteins are specialized machines in cells, each of which fulfils its own task. They comprise a linear sequence of units, where each unit is a member of a small set of known chemical residues. Thus RNA, DNA or proteins can be represented by strings of letters from a well-defined chemical alphabet.
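To make the setting concrete, here is a minimal Python sketch of the objects involved: sequences as strings over an alphabet, and a check that a candidate model agrees with positive and negative examples. The function names and the toy target language (DNA strings starting with "ATG") are assumptions chosen for illustration only, not part of any tool mentioned on this page.

```python
# Alphabets: the four DNA bases and the standard one-letter amino acid code.
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def over_alphabet(sequence, alphabet):
    """Return True if every symbol of the sequence belongs to the alphabet."""
    return all(symbol in alphabet for symbol in sequence)


def is_consistent(accepts, positives, negatives):
    """A candidate model is consistent with a sample if it accepts every
    positive example and rejects every negative example."""
    return (all(accepts(s) for s in positives)
            and not any(accepts(s) for s in negatives))


def candidate(sequence):
    """Hypothetical candidate model: DNA strings that start with 'ATG'."""
    return over_alphabet(sequence, DNA_ALPHABET) and sequence.startswith("ATG")


positives = ["ATGAAA", "ATGC"]   # strings assumed to belong to the target language
negatives = ["TTGAAA", "AT"]     # strings assumed not to belong to it
print(is_consistent(candidate, positives, negatives))
```

In grammatical inference the candidate would be a grammar or automaton rather than a hand-written predicate, but the consistency test it has to pass is the same.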
My research aims to characterize a set of biological sequences sharing the same biological function, using formal grammars as models. Molecular biologists can use such a characterization to:
- Discover new biological sequences with the same biological function,
- Obtain information on how the biological function is realized.
The characterization is formalized by a model that has to be discovered automatically from examples of sequences that possess the function. The models usually used by the bioinformatics community to represent biological sequences are Prosite motifs, profiles, or (hidden) Markov models. Databases of models for families of proteins are available on the Web (see for instance the Prosite web site - http://www.expasy.org/prosite/ - for profiles and motifs, and the Pfam web site - http://www.sanger.ac.uk/Software/Pfam/index.shtml - for Markov models). These databases are valuable tools for molecular biologists.
Amongst models enabling sequence classification, some can be seen as "black boxes": they return a classification without any explanation of how it was reached (this is the case, for instance, with neural networks). We are more interested in inferring explicit models, i.e. models that can be interpreted by an expert in the application domain. The most explicit models used in bioinformatics for sequence classification are Prosite motifs. Prosite motifs are in fact restrictions of the much more general and powerful formalism of formal grammars that we consider.
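One way to see why Prosite motifs are a restricted formalism is that a pattern can be translated mechanically into a regular expression, which is itself a special case of a formal grammar. The sketch below is an illustration under assumptions: the function name `prosite_to_regex` is hypothetical, only the common pattern elements are handled (residues, `x`, `[...]` sets, `{...}` exclusions, `(n)` and `(n,m)` repetitions, and the `<` / `>` end anchors), and the example pattern is simply written in the Prosite style.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regular expression.

    Sketch only: rarer syntax (for example anchors inside brackets) is not handled.
    """
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            regex = core                      # one residue from the set
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = re.escape(core)           # a single fixed residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchored_start else "") + regex
                     + ("$" if anchored_end else ""))
    return "".join(parts)


# Example pattern written in the Prosite style.
motif = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(motif)
print(regex)
print(bool(re.search(regex, "MKCAACAAALAAAAAAAAHAAAHGG")))
```

Because every such pattern compiles to a regular expression, it cannot express nested or long-distance dependencies, which is what the more general grammar classes are meant to capture.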
Efficient Biological Grammar Acquisition:
My current work in the project Efficient Biological Grammar Acquisition deals with Context Free Grammar inference using Inductive Logic Programming. It takes place at the School of Computing, Robert Gordon University, under the direction of Chris Bryant.
Keywords : Machine learning, bioinformatics, Context-Free grammatical inference, Context-Free grammar parsing, Inductive Logic Programming.
Summary : This project (official web site) is concerned with the automatic acquisition of context-free grammars from sets of biological sequences. The aim of the project is to develop an efficient method of acquiring biological grammars in collaboration with GlaxoSmithKline. We envisage a method which uses 1) efficient techniques for discovering nonterminals which are potentially pertinent to the subsequent induction of a biological grammar and 2) a parser which increases the speed at which biological grammars may be acquired.
Products :
[FB05] + associated software. (concerns part (2) of the project, the choice of the parser to use)
[BF05] + associated software. (concerns part (2) of the project, the parser used during inference)
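For readers unfamiliar with context-free parsing, the following sketch shows the classical CYK recognition algorithm for a grammar in Chomsky normal form. It is not the parser developed or evaluated in this project; it is a textbook technique given here only to illustrate what testing a sequence against a context-free grammar involves. The function name, the rule encoding, and the toy grammar (fully nested a-u and g-c base pairs, as in an idealised RNA stem) are assumptions made for the example.

```python
from itertools import product


def cyk_accepts(word, unary_rules, binary_rules, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.

    unary_rules:  dict terminal -> set of nonterminals A with a rule A -> terminal
    binary_rules: dict (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[i][l - 1] holds the nonterminals deriving the substring word[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = set(unary_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= binary_rules.get(pair, set())
    return start in table[0][n - 1]


# Toy grammar (assumption): S -> a S u | u S a | g S c | c S g | a u | u a | g c | c g,
# rewritten in Chomsky normal form with helper nonterminals.
unary_rules = {"a": {"NA"}, "u": {"NU"}, "g": {"NG"}, "c": {"NC"}}
binary_rules = {
    ("NA", "XU"): {"S"}, ("NU", "XA"): {"S"},
    ("NG", "XC"): {"S"}, ("NC", "XG"): {"S"},
    ("NA", "NU"): {"S"}, ("NU", "NA"): {"S"},
    ("NG", "NC"): {"S"}, ("NC", "NG"): {"S"},
    ("S", "NU"): {"XU"}, ("S", "NA"): {"XA"},
    ("S", "NC"): {"XC"}, ("S", "NG"): {"XG"},
}

print(cyk_accepts("gcaugc", unary_rules, binary_rules))
print(cyk_accepts("gcaugg", unary_rules, binary_rules))
```

CYK runs in time cubic in the sequence length, and during inference the parser is called on many sequences for many candidate grammars, which is one reason the choice of parser matters for efficiency.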
Introducing bias in regular grammatical inference:
In collaboration with the French INRIA-IRISA-Symbiose research team, this project aims at improving automata/regular grammar inference algorithms by providing tools to introduce bias during inference.
Keywords : Machine learning, Regular Grammar Inference, Background Knowledge, Automata.
Summary : This project is concerned with the automatic acquisition of regular grammars (often represented by automata) from sequences and background knowledge. The aim of the project is to develop the tools needed to introduce background knowledge during this kind of inference. It deals with the representations used to store the background knowledge given by an expert and with algorithms to integrate this knowledge into the inference process [CFKH04].
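As background, many regular grammar inference algorithms start from a prefix tree acceptor (PTA) built from the positive examples and then generalise it by merging states. The sketch below only builds the PTA and tests membership; it is a standard starting point and not the algorithm of [CFKH04]. The function names and the toy sample are assumptions for illustration.

```python
def build_pta(positive_examples):
    """Build a prefix tree acceptor: one state per distinct prefix of the sample.

    Returns (transitions, accepting), where transitions maps
    state -> {symbol: next state} and state 0 is the initial state.
    """
    transitions = {0: {}}
    accepting = set()
    for word in positive_examples:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, word):
    """Return True if the automaton reaches an accepting state on the word."""
    transitions, accepting = automaton
    state = 0
    for symbol in word:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in accepting


# Toy sample (assumption): three short strings over the alphabet {a, b, c}.
pta = build_pta(["ab", "abc", "b"])
print(len(pta[0]), accepts(pta, "abc"), accepts(pta, "a"))
```

The PTA accepts exactly the sample. Generalisation comes from merging its states, and this is the stage at which a bias such as expert background knowledge could restrict which merges are allowed.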
Products : [CFKH04]
I am an advisor for Selpi's PhD, which concerns active learning, ILP, and biological applications. See her web page, http://www.comp.rgu.ac.uk/staff/ses/, for more information.
I am an advisor for Thierry Manner's PhD, which concerns MDL/MML encoding for evaluating grammars that represent biological sequences (http://www.comp.rgu.ac.uk/staff/tm/).
Student project supervision
Several projects are being implemented and tested under my supervision; please contact me for more information. Current projects are:
- Inferring Prosite-like motifs from positive examples only, using genetic algorithms.
- Design of a graphical user interface for calling biological sequence analysis software.
If you want more detail on grammatical inference in general, see the community web site: http://eurise.univ-st-etienne.fr/gi/. If you are more interested in my research topic, you can consult my thesis; however, it is written in French. See the publication section of this page for English publications.
For any questions, criticism, or proposals for collaboration, just contact me by e-mail:
dfcomp.rgu.ac.uk.