Daniel Fredouille
Address:
  The Robert Gordon University
  School of Computing
  Saint Andrew Street
  Aberdeen AB25 1HG
  United Kingdom
Phone: +44 (0) 1224 262574
Email: df (at) comp.rgu.ac.uk

Welcome to my professional page.

Presentation
Publications
Software
Curriculum vitae (updated 15/11/2005)

Presentation

Research interests:
My research lies in the domain of machine learning, applied to the field of bioinformatics.
More precisely, I specialize in grammatical inference, that is, the discovery of grammatical models that characterize sets of sequences. I mainly work on obtaining such models to characterize sets of proteins that share a common biological function.

Apart from my speciality (grammatical inference), I am also very interested in:
General introduction:
Machine learning is the area of study concerned with how a computational system can acquire knowledge from its experiences and observations. In grammatical inference specifically, the knowledge is represented as a formal grammar, and the observations are mainly strings that either belong or do not belong to the language of the grammar we want to acquire.
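To make this setting concrete, the following minimal sketch shows the objects involved: a hidden target language, given here by a small deterministic finite automaton, and a sample of observed strings labeled as positive or negative by it. The automaton (strings over {a, b} with an even number of a's) and the names `accepts` and `label_sample` are hypothetical illustrations, not taken from any of the work described on this page.

```python
# Hypothetical target language: strings over {a, b} with an even number of 'a's.
# 2-state DFA: state 0 = "even so far" (initial and accepting), state 1 = "odd so far".
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 1,
}
INITIAL_STATE = 0
ACCEPTING_STATES = {0}


def accepts(word: str) -> bool:
    """Return True if the DFA accepts `word`, i.e. the word belongs to the language."""
    state = INITIAL_STATE
    for symbol in word:
        if (state, symbol) not in TRANSITIONS:  # symbol outside the alphabet
            return False
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING_STATES


def label_sample(words):
    """Split observed strings into positive and negative examples of the language."""
    positives = [w for w in words if accepts(w)]
    negatives = [w for w in words if not accepts(w)]
    return positives, negatives


if __name__ == "__main__":
    pos, neg = label_sample(["", "aa", "ab", "bab", "aab", "a"])
    print("positive:", pos)  # positive: ['', 'aa', 'aab']
    print("negative:", neg)  # negative: ['ab', 'bab', 'a']
```

An inference algorithm would only see the two lists `pos` and `neg`, and its task is to recover an automaton or grammar equivalent to the hidden one.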

The link with the application domain of bioinformatics is the following: biological molecules such as DNA, RNA and proteins are specialized machines in cells, each fulfilling its own task. They consist of a linear sequence of units, each unit being a member of a small set of known chemical residues. Thus DNA, RNA and proteins can be represented as strings over a small, well-defined chemical alphabet (for instance, four nucleotides for DNA and twenty amino acids for proteins).

My research aims to characterize sets of biological sequences sharing the same biological function, using formal grammars as models. These characterizations are intended to be used by molecular biologists, for example to:

The characterization is formalized by a model that has to be discovered automatically from examples of sequences possessing the function. The usual models used in the bioinformatics community to represent biological sequences are Prosite motifs, profiles and (hidden) Markov models. Databases of models for protein families are available on the Web (see for instance the Prosite web site, http://www.expasy.org/prosite/, for profiles and motifs, and the Pfam web site, http://www.sanger.ac.uk/Software/Pfam/index.shtml, for hidden Markov models). These databases are valuable tools for molecular biologists.

Among the models used for sequence classification, some can be seen as "black boxes": they return a classification without any explanation of how it was obtained (this is the case, for instance, with neural networks). We are more interested in inferring explicit models, i.e. models that can be interpreted by an expert in the application domain. The most explicit models used in bioinformatics for sequence classification are Prosite motifs. Prosite motifs are in fact a restricted case of the much more general and powerful formalism of formal grammars that we consider.
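One way to see that Prosite motifs are a restricted kind of grammar is that a Prosite pattern can be rewritten as a regular expression, so the set of sequences it matches is a regular language. The sketch below assumes only the core pattern syntax (`x` for any residue, `[...]` for any of, `{...}` for none of, `(n)` and `(n,m)` repetitions, `<` and `>` terminal anchors) and ignores extensions such as `>` inside brackets; the function name `prosite_to_regex` is hypothetical and the example pattern is illustrative.

```python
import re

# One Prosite element: 'x', a single residue, [..] (any of) or {..} (none of),
# optionally followed by a repetition count (n) or range (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified Prosite pattern into an equivalent Python regex."""
    pattern = pattern.strip().rstrip(".")   # the final period is optional here
    anchor_start = pattern.startswith("<")  # '<': N-terminal anchor
    anchor_end = pattern.endswith(">")      # '>': C-terminal anchor
    pattern = pattern.lstrip("<").rstrip(">")
    pieces = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported Prosite element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            piece = "."                         # any amino acid
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"  # any amino acid except these
        else:
            piece = residue                     # 'C' or '[LIVM]' is already regex syntax
        if low is not None:
            piece += "{" + low + ("," + high if high is not None else "") + "}"
        pieces.append(piece)
    return ("^" if anchor_start else "") + "".join(pieces) + ("$" if anchor_end else "")


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
    print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
    protein = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
    print(bool(re.search(regex, protein)))  # True
```

The translation is purely local (one regex piece per pattern element), which reflects how limited the Prosite formalism is compared with general formal grammars: it cannot express, for example, long-range pairings between residues.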

Projects I am involved in:
Efficient Biological Grammar Acquisition:
My current work on the Efficient Biological Grammar Acquisition project deals with context-free grammar inference using Inductive Logic Programming. It takes place at the School of Computing, Robert Gordon University, under the direction of Chris Bryant.
Keywords: Machine learning, bioinformatics, context-free grammatical inference, context-free grammar parsing, Inductive Logic Programming.
Summary: This project (official web site) is concerned with the automatic acquisition of context-free grammars from sets of biological sequences. Its aim is to develop, in collaboration with GlaxoSmithKline, an efficient method for acquiring biological grammars. We envisage a method that uses 1) efficient techniques for discovering nonterminals that are potentially relevant to the subsequent induction of a biological grammar and 2) a parser that increases the speed at which biological grammars can be acquired.
Products:
[FB05] + associated software (concerns part (2) of the project: choosing which parser to use).
[BF05] + associated software (concerns part (2) of the project: the parser used during inference).
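Parsing is the step that decides whether a sequence is generated by a candidate context-free grammar, and it is repeated many times during grammar induction. As a generic point of reference only, the sketch below is a textbook CYK recognizer for grammars in Chomsky normal form; it is not the parser described in [FB05] or [BF05]. The toy grammar, which generates nested complementary pairs loosely like an RNA stem without a loop, and all names in the code are hypothetical.

```python
# Hypothetical toy grammar in Chomsky normal form (CNF) generating nested
# complementary pairs, loosely like an RNA stem without a loop:
#   S -> a S u | c S g | a u | c g
BINARY_RULES = {
    "S": [("A", "X"), ("C", "Y"), ("A", "U"), ("C", "G")],
    "X": [("S", "U")],
    "Y": [("S", "G")],
}
TERMINAL_RULES = {"A": "a", "U": "u", "C": "c", "G": "g"}  # nonterminal -> terminal


def cyk_recognize(word: str, start: str = "S") -> bool:
    """Return True if the CNF grammar derives `word` from `start` (CYK algorithm)."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no S -> epsilon rule
    # table[i][k] holds the nonterminals deriving the substring word[i:i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = {lhs for lhs, t in TERMINAL_RULES.items() if t == symbol}
    for length in range(2, n + 1):            # substring length
        for i in range(n - length + 1):       # substring start
            cell = table[i][length - 1]
            for split in range(1, length):    # length of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, pairs in BINARY_RULES.items():
                    if any(b in left and c in right for b, c in pairs):
                        cell.add(lhs)
    return start in table[0][n - 1]


if __name__ == "__main__":
    for seq in ["au", "caug", "acgu", "aug", "ua"]:
        print(seq, cyk_recognize(seq))
```

CYK runs in time cubic in the sequence length (times the grammar size), which gives an idea of why parsing speed matters when candidate grammars must be checked against many long biological sequences.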

Introducing bias in regular grammatical inference:
In collaboration with the French INRIA-IRISA-Symbiose research team, this project aims to improve automata/regular grammar inference algorithms by providing tools to introduce bias into the inference.
Keywords: Machine learning, regular grammar inference, background knowledge, automata.
Summary: This project is concerned with the automatic acquisition of regular grammars (often represented by automata) from sequences and background knowledge. Its aim is to develop the tools needed to introduce background knowledge into this kind of inference. It deals both with the representations used to store the background knowledge given by an expert and with algorithms that integrate this knowledge into the inference process [CFKH04].
Products: [CFKH04]
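State-merging inference algorithms typically start from an automaton that accepts exactly the positive examples and then generalize it by merging states; deciding which merges are allowed is one natural place where bias can act. The sketch below is generic and covers only one common starting point, the prefix tree acceptor (PTA); it does not implement the domain or typing bias of [CFKH04], nondeterministic methods may start from a different automaton, and the names `build_pta` and `pta_accepts` are hypothetical.

```python
def build_pta(positives):
    """Build a prefix tree acceptor (PTA) accepting exactly the positive strings.

    States are integers; state 0 is the root, i.e. the empty prefix.
    Returns (transitions, accepting_states, number_of_states).
    """
    transitions = {}  # (state, symbol) -> state
    accepting = set()
    n_states = 1      # only the root so far
    for word in positives:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:  # new prefix: create a state
                transitions[(state, symbol)] = n_states
                n_states += 1
            state = transitions[(state, symbol)]
        accepting.add(state)                         # the whole word is accepted
    return transitions, accepting, n_states


def pta_accepts(transitions, accepting, word):
    """Run the (deterministic) PTA on `word`."""
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting


if __name__ == "__main__":
    delta, final, size = build_pta(["ab", "abc", "b"])
    print(size)                              # 5
    print(pta_accepts(delta, final, "abc"))  # True
    print(pta_accepts(delta, final, "a"))    # False
```

The PTA generalizes nothing by itself: every string it accepts is in the sample. Generalization only comes from the later merging steps, which is why constraining those merges with expert knowledge can steer the result.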

PhD supervision
I am an advisor on Selpi's PhD, concerning active learning, ILP and biological applications. See her web page, http://www.comp.rgu.ac.uk/staff/ses/, for more information.
I am an advisor on Thierry Manner's PhD, concerning MDL/MML encoding to evaluate grammars representing biological sequences (http://www.comp.rgu.ac.uk/staff/tm/).

Student project supervision
Several projects are being implemented and tested under my supervision; please contact me for more information. Current projects are:
More information:
If you want more detail on grammatical inference in general, see the web site of the grammatical inference community: http://eurise.univ-st-etienne.fr/gi/. If you are interested in my own research topic, you can consult my thesis, although it is written in French; see the Publications section of this page for English publications.

For any questions, criticism or collaboration proposals, just contact me by e-mail:
df (at) comp.rgu.ac.uk.

Publications

In English

2005
[BF05] Bryant, C.H., & Fredouille, D. - A Parser for the Efficient Induction of Biological Grammars. ILP'05, 2005 (.ps.gz). Late-breaking paper format. Slides available (.ppt). Please have a look at the associated web page for datasets and software.

[FB05] Fredouille, D., & Bryant, C.H. - Speeding up Parsing of Biological Context-Free Grammars. CPM'05, 2005 (.ps.gz). Slides available (.ppt). Please have a look at the associated web page for datasets and software.

2004
[CFKH04] Coste, F., Fredouille, D., Kermorvant, C., & de la Higuera, C. - Introducing Domain and Typing Bias in Automata Inference. ICGI-04, 2004 (.ps.gz). Slides available (.ppt).

2003
[CF03a] Coste, F., & Fredouille, D. - What is the search space for the inference of non deterministic, unambiguous and deterministic automata? Technical report RR-4907, INRIA, 2003. Available free of charge on the INRIA web site.

[CF03c] Coste, F., & Fredouille, D. - Unambiguous automata inference by means of state-merging methods. ECML'03, 2003 (.ps.gz). Complementary experiments (.ps.gz) and benchmarks (.tar.gz) available. Electronic version of the paper available from September 2004; slides available (.ppt).

2000
[CF00] Coste, F., & Fredouille, D. - Efficient ambiguity detection in C-NFA, a step toward inference of nondeterministic automata. ICGI-00, 2000 (.ps.gz, .pdf) and experimental material (.tar.gz).

In French

2004
[CKIFD04] Coste, F., Kerbellec, G., Idmont, B., Fredouille, D., & Delamarche, C. - Apprentissage d'automates par fusions de paires de fragments significativement similaires et premières expérimentations sur les protéines MIP. JOBIM'04, 2004 (.doc). (Learning automata by merging pairs of significantly similar fragments, and first experiments on the MIP protein family.)

2003
[CF03d] Fredouille, D. - Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique. PhD thesis, University of Rennes I, France (.ps.gz, .pdf); presentation slides (.ppt). (Inferring nondeterministic finite automata by managing ambiguity, with a view to applications in bioinformatics.)

[CF03b] Coste, F., & Fredouille, D. - Introduction de connaissances structurelles et langagières pour l'apprentissage d'automates. CAp'03, 2003 (.ps.gz, .pdf). (Introducing structural and language-related knowledge for automata inference.)

2001
[CF01] Coste, F., & Fredouille, D. - Inférence d'AFNs : restriction de l'espace de recherche aux automates non ambigus. CAp'01, 2001 (.ps.gz, .pdf). (Inferring NFAs: restricting the search space to unambiguous automata.)

2000
[FM00] Fredouille, D., & Miclet, L. - Expériences sur l'inférence de langages par spécialisation. CAp'00, 2000. (Experiments on language inference by specialization.)