Curriculum Vitae: Daniel FREDOUILLE

Professional address: The Robert Gordon University
School of Computing
Saint Andrew Street
Aberdeen AB25 1HG, United Kingdom
Professional phone: +44 (0) 1224 262574
E-mail: df[at]comp[dot]rgu[dot]ac[dot]uk
Web: http://www.comp.rgu.ac.uk/staff/df/

Field: Computer science; machine learning with applications in bioinformatics.

Specialty: Grammar and automata inference, symbolic machine learning and Inductive Logic Programming.

Looking for: a temporary or permanent position as Researcher, Lecturer, Engineer or Developer in a research team in the Montréal region. Available from May-June 2006.

Overview of education and experience

My main training is in computer science. My degrees include an engineering degree and a PhD in computer science, the latter obtained in a bioinformatics research team (SYMBIOSE team, IRISA, INRIA/CNRS, France). My working experience is twofold:

I am interested in applications of computing that help us better understand living organisms. More specifically, I would like to apply machine learning techniques to bioinformatics.

My specialty is grammatical inference, that is, the (automatic) discovery of grammatical models that characterize sets of sequences. My main work is to obtain such models for sets of protein sequences sharing a common biological function. The models can then be used to predict the function of biological sequences that have not yet been associated with a function.

This CV presents my qualifications with respect to the following topics: diplomas, research projects, publications, supervision, teaching, software development, and other activities/qualifications.

Diplomas

1999-2003 PhD in Computer Science, specialization in bioinformatics. IRISA laboratory, University of Rennes I.
1998-1999 DEA in Computer Science (postgraduate research diploma). University of Rennes I - IFSIC, Rennes, France.
1996-1999 Engineering Diploma in Computer Science, major in software and computer systems, minor in electronics. Engineering school ENSSAT, Lannion, France.
1995-1996 DUETI (university diploma for technological studies in a foreign country), computer science and electronics. University of Sherbrooke, Québec, Canada.
1993-1995 DUT (university diploma for technological studies) in electrical engineering and computer science. University of Annecy le Vieux, France.
1993 Baccalauréat E (high school graduation in science and technology), Lycée Saint Exupéry, Bellegarde, France.

Research

My research concerns the automatic acquisition (or inference) of grammars from biological sequences sharing a common biological function. We call such grammars biological grammars. A biological grammar can be seen as a set of rules deciding whether a sequence of letters (representing a DNA, RNA or protein molecule) possesses a given biological function. The aim of the inference process is to obtain grammars that can be used:
  1. to predict whether sequences of unknown function carry the function represented by the grammar;
  2. to help molecular biologists understand how this function is carried by the sequences.

Point 1 is achieved by analyzing sequences of unknown function with the obtained grammars: if the grammar accepts a sequence, this is interpreted as a good indicator that the sequence shares the function represented by the grammar. Point 2 relies on the fact that grammars are not black boxes: their rules are explicit and can therefore be analyzed by molecular biologists. Because the inferred grammars represent what the sequences given to the inference process have in common, this analysis is a first step toward explaining how the function is carried by these sequences.

Grammar inference from biological sequences is a hard task in terms of algorithmic complexity. During my career I have had the opportunity to tackle it with several machine learning techniques: Inductive Logic Programming, automata inference and Genetic Programming. The projects I have worked on are detailed below, followed by a list of publications and a description of the projects I supervise or have supervised.

Project: Efficient Biological Grammar Acquisition

Time Period: From March 2004.
Position: Research Fellow, funded by a UK EPSRC grant
Supervisor: Chris Bryant (chb[at]comp[dot]rgu[dot]ac[dot]uk)
Summary: The aim of the project is to improve, and then apply, Inductive Logic Programming (ILP) techniques to automatically acquire grammars representing the function of a set of biological sequences (here, proteins). This work is done in collaboration with the multinational GlaxoSmithKline, which provides the data and expertise on these data.
Publications: [FB05, BF05]

PhD: Inferring Non-Deterministic Finite Automata, Using Ambiguity Management, for Potential Applications in Bioinformatics.

Time period: From September 1999 to January 2004.
Position: PhD student funded by the French Ministry of Research from Sept. 1999 to Sept. 2002; part-time lecturer from Oct. 2002 to March 2004.
Supervisors: Jacques Nicolas (jnicolas[at]irisa[dot]fr) and François Coste (fcoste[at]irisa[dot]fr), Symbiose research team, IRISA, University of Rennes I, France.
Summary: This project considers the inference of grammars in the form of nondeterministic finite automata. The inference of these models has received little attention, although they are arguably better suited to representing sets of biological sequences than the deterministic automata usually considered in regular grammatical inference. The work brought the following contributions to the field:
  • An efficient method to take counter-examples into account [CF00].
  • The formalization of the search space for a hierarchy of subclasses of nondeterministic automata [CF03a].
  • The reduction of the search space to unambiguous automata together with an inference algorithm for these automata [CF01, CF03c].
  • Tools for the formalization and the introduction of expert knowledge into inference [CF03b, CFKH04].
Results of applying this inference to biological data are available [CKIFD04].

Other research projects

Research at the Swiss Institute of Bioinformatics, Geneva (Nov.-Dec. 2003)
Supervisor: Robin Gras, formerly at the Swiss Institute of Bioinformatics.
Summary: The project consisted of a feasibility study of automata inference using Genetic Programming techniques.

French DEA project (1999)
Supervisor: Laurent Miclet (miclet[at]enssat[dot]fr), engineering school ENSSAT, University of Rennes I, France.
Summary: We studied the impact of inferring automata using a general-to-specific search instead of the usual specific-to-general approach [FM00].

Publications

[BF05] C.H. Bryant and D. Fredouille. A Parser for the Efficient Induction of Biological Grammars. ILP'05, late-breaking papers track, 2005.
[FB05] D. Fredouille and C.H. Bryant. Speeding up Parsing of Biological Context-Free Grammars. CPM'05, 2005.
[CFKH04] F. Coste, D. Fredouille, C. Kermorvant and C. de la Higuera. Introducing Domain and Typing Bias in Automata Inference. ICGI-04, 2004.
[CKIFD04] F. Coste, G. Kerbellec, B. Idmont, D. Fredouille and C. Delamarche. Apprentissage d'automates par fusions de paires de fragments significativement similaires et premières expérimentations sur les protéines MIP [Automata learning by merging pairs of significantly similar fragments, and first experiments on MIP proteins]. JOBIM'04, 2004.
[CF03a] F. Coste and D. Fredouille. What is the search space for the inference of non deterministic, unambiguous and deterministic automata? Technical report RR-4907, INRIA, 2003.
[CF03b] F. Coste and D. Fredouille. Introduction de connaissances structurelles et langagières pour l'apprentissage d'automates [Introducing structural and language knowledge into automata learning]. CAp'03, 2003.
[CF03c] F. Coste and D. Fredouille. Unambiguous automata inference by means of state-merging methods. ECML'03, 2003.
[CF01] F. Coste and D. Fredouille. Inférence d'AFNs : restriction de l'espace de recherche aux automates non ambigus [NFA inference: restricting the search space to unambiguous automata]. CAp'01, 2001.
[CF00] F. Coste and D. Fredouille. Efficient ambiguity detection in C-NFA, a step toward inference of nondeterministic automata. ICGI-00, 2000.
[FM00] D. Fredouille and L. Miclet. Expériences sur l'inférence de langages par spécialisation [Experiments on language inference by specialization]. CAp'00, 2000.

Supervision

PhD supervision:
Advisor for the PhD theses of Selpi (from Sept. 2004) and Thierry Manner (from Sept. 2005), on active learning of biological networks using Inductive Logic Programming (ILP) and on inference of biological grammars with ILP based on compression scores, respectively.
Master's student supervision:
Since 2004, supervision of four projects (3 to 6 months each). Two of these projects concern the inference of Prosite patterns (a restricted form of grammar) from biological data using genetic programming. The other two concern the creation of a graphical user interface for bioinformatics tools.
Engineering student supervision:
During summer 2001, supervision of a student from the INSA engineering school: creation of a database for the address book of the "Genopôle Ouest" (a French bioinformatics institution) and its web interface in PHP3.

Teaching experience

The following table summarizes the courses I have taught. Each course is given in the following format:
Course subject: lecture hours (number of groups); tutorial hours (number of groups), level of study, institution - programme name.

The level of study is indicated by "Bac+x", meaning students in a programme leading to a diploma x years after the baccalauréat (high school graduation). Institution and programme names are French abbreviations. For more details on the programmes, please consult the institutions' websites.

Year    Courses

2003-2004 Graph Algorithms: lectures 12 hours (2 groups); tutorial 32 hours (2 groups), Bac+5 engineering students, IFSIC - DICC.
Initiation to computer science for biologists: lectures 16 hours (1 group); tutorial 16 hours (2 groups), Bac+2 biology students, IFSIC - DESS.
Initiation to computer science for biologists: lectures 16 hours (1 group); tutorial 16 hours (1 group), Bac+4 students, University of Rennes I - Master in biochemistry.
2002-2003 Compilation: tutorial 20 hours (2 groups), Bac+2 students, IFSIC - DESS CCI.
Databases: tutorial 16 hours (2 groups), Bac+4 students, IFSIC - IUP-MIAGE 1.
Algorithms and complexity: lectures 24 hours (2 groups); tutorial 16 hours (2 groups), Bac+4 students, IFSIC - IUP-MIAGE 1.
Graph Algorithms: lectures 14 hours (2 groups); tutorial 28 hours (2 groups), Bac+5 engineering students, IFSIC - DICC 2.
Initiation to computer science: tutorial 16 hours (1 group), Bac+2 students, IFSIC - DEUG 1.
2001-2002 Bioinformatics: combined lectures and tutorials 8 hours (2 groups), continuing education for adults, staff development in the INRA and CNRS research institutions.
2000-2001 Artificial Intelligence: tutorial 24 hours (2 groups), Bac+5 engineering students, IFSIC - DICC 2.
Initiation to computer science: tutorial 16 hours (1 group), Bac+2 students, IFSIC - DEUG 1.
1999-2000 Initiation to computer science: lectures 12 hours (1 group); tutorial 32 hours (2 groups), Bac+2 students, IFSIC - DEUG 1.
1998-1999 Algorithms for graphic display: tutorial 20 hours (2 groups), Bac+5 engineering students, ENSSAT 2.
Total: 354 hours (102 lecturing hours, 252 tutoring hours)

Software development

From March 2004 Context: software associated with the "efficient biological grammar acquisition" project
Location: Robert Gordon University, Aberdeen, UK.
Object: implementation of Context-Free Grammar parsers. Programming languages: C++, C, Prolog and Python.
Oct. 1999 to Feb. 2004 Context: Software developed for the PhD thesis.
Location: IRISA laboratory, Rennes, France.
Object: Software platform for grammatical inference algorithms. Programming language: C++.
1995-1996 Context: Project associated with the DUETI diploma.
Location: University of Sherbrooke, Québec, Canada.
Object: Implementation and comparison of audio signal compression algorithms. Programming language: C.
May to June 1995 Context: Internship for the DUT diploma.
Location: ICN company, Informatique et Commandes Numériques (computer science and numerical control), Saint Jean de Maurienne, France.
Object: Creating an interface between a numerical control unit and a PC. Programming languages: C and Pascal.

Other activities/qualifications

Representative and association activities

2004-2005 Representative for Research Fellows and Research Assistants at the School of Computing research committee, the Robert Gordon University.
2002 Representative for PhD students at the scientific committee of the University of Rennes I, Rennes, France.
2001-2002 Representative for PhD students at the doctoral school MATISSE, Rennes, France.
1999-2000 Vice-chairman of ADOC, the association of PhD students of doctoral school MATISSE, Rennes, France.
1996-1997 Secretary of AEAE, the student association of the engineering school ENSSAT, Lannion, France.

Languages:

Miscellaneous:

Hobbies

Outdoor activities: camping and climbing.
Other hobbies: choral singing (four-part barbershop harmony, and traditional songs from various countries).