
Dr Chris Bryant


Brief Curriculum Vitae

Research Interests

Since 1994, I have been investigating how symbolic machine learning can be applied to real-world problems, mainly in the field of molecular biology. Machine learning can be broadly defined as the study of computer programs that improve their performance at some task through experience [T.M. Mitchell, 1997]. My research interests include:

Research Projects

The following is a brief description of the projects I have worked on, together with links to the corresponding project pages where you will find more details.

Efficient Biological Grammar Acquisition

I was the principal investigator on the recently completed EPSRC Efficient Biological Grammar Acquisition project (GR/S68682, £110K), with GlaxoSmithKline as an industrial collaborator. Linguistic methods have provided some interesting results in biology. However, hand-crafting grammars is difficult and, because it requires human expertise, expensive. Given the enormous volume of data arising from genome projects, there is therefore a need to automate the acquisition of grammars from sets of biological sequences. Prior to this project, the speed at which inductive logic programming (ILP) systems could generate biological grammars had been a bottleneck. We have developed a more efficient method of acquiring such grammars using ILP: speed-ups of more than five-fold were observed in 80% of the experiments. We have also proposed methods for obtaining different sets of background knowledge and studied the impact of these sets on inference results. All but one of our proposed sets of background knowledge have a statistically significant positive impact on the predictive power of the inferred rules, either directly or through interactions with other sets.

Learning which uORFs Regulate Gene Expression

Regulation of gene expression is central to biology. However, a holistic understanding of the mechanisms that regulate gene expression is still far beyond current biological knowledge, mainly because very little is known about regulatory elements. This project concerns one of these elements, namely the upstream Open Reading Frames (uORFs) in the yeast Saccharomyces cerevisiae. Our approach applies a combination of ILP and bioinformatics tools to data integrated from several resources.

Closed Loop Machine Learning

This was a collaborative enterprise aimed at partially automating some aspects of scientific work: forming hypotheses, devising trials to discriminate between competing hypotheses, physically performing these trials and then using their results to converge upon an accurate hypothesis. We have developed ASE-Progol, a potential component of the reasoning carried out by an "artificial scientist". ASE-Progol is an Active Learning system which uses inductive logic programming to construct hypothesised first-order theories and a CART-like algorithm to select trials for eliminating ILP-derived hypotheses. In simulated yeast growth tests, ASE-Progol was used to rediscover how genes participate in the aromatic amino acid pathway of the yeast Saccharomyces cerevisiae. The cost of the chemicals consumed in converging upon a hypothesis with an accuracy of around 88% was reduced by five orders of magnitude when trials were selected by ASE-Progol rather than sampled at random. While the naive strategy of always choosing the cheapest trial from the set of candidate trials led to lower cumulative costs than ASE-Progol, both the naive and the random strategies took significantly longer than ASE-Progol to converge upon a final hypothesis. For example, to reach an accuracy of 80%, ASE-Progol required 4 days, whereas random sampling required 6 days and the naive strategy required 10 days. Following the completion of the Closed Loop Machine Learning project, ASE-Progol was incorporated into The Robot Scientist (see Nature 427(6971):247-252, 2004).
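The idea of selecting trials to eliminate competing hypotheses can be illustrated with a minimal sketch. It assumes a hypothetical, propositional setting: each candidate hypothesis has a probability and predicts a boolean outcome (for example, growth or no growth) for every candidate trial, and each trial has a known cost. The trial is chosen by expected information gain per unit cost, an entropy-based splitting criterion in the spirit of decision-tree induction (CART itself classically uses Gini impurity). This is not necessarily the criterion ASE-Progol used, and all names (`select_trial`, `eliminate`, the trial and hypothesis labels) are illustrative.

```python
import math
from typing import Dict, List

# Hypothetical data model (not ASE-Progol's actual representation):
Hypotheses = Dict[str, float]             # hypothesis -> probability
Predictions = Dict[str, Dict[str, bool]]  # trial -> {hypothesis -> predicted outcome}


def entropy(weights: List[float]) -> float:
    """Shannon entropy in bits of the normalised weights; zero weights are skipped."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def expected_entropy(hyps: Hypotheses, outcomes: Dict[str, bool]) -> float:
    """Expected entropy of the surviving hypotheses after a trial's outcome is observed."""
    total = sum(hyps.values())
    result = 0.0
    for outcome in (True, False):
        branch = [p for h, p in hyps.items() if outcomes[h] == outcome]
        mass = sum(branch)
        if mass > 0:
            result += (mass / total) * entropy(branch)
    return result


def select_trial(hyps: Hypotheses, preds: Predictions, costs: Dict[str, float]) -> str:
    """Return the trial with the highest expected information gain per unit cost."""
    prior = entropy(list(hyps.values()))

    def score(trial: str) -> float:
        return (prior - expected_entropy(hyps, preds[trial])) / costs[trial]

    return max(preds, key=score)


def eliminate(hyps: Hypotheses, preds: Predictions, trial: str, observed: bool) -> Hypotheses:
    """Keep only the hypotheses whose prediction for `trial` matches the observation."""
    return {h: p for h, p in hyps.items() if preds[trial][h] == observed}


if __name__ == "__main__":
    hyps = {"H1": 0.25, "H2": 0.25, "H3": 0.25, "H4": 0.25}
    preds = {
        "t_split": {"H1": True, "H2": True, "H3": False, "H4": False},
        "t_single": {"H1": True, "H2": False, "H3": False, "H4": False},
    }
    costs = {"t_split": 1.0, "t_single": 1.0}
    chosen = select_trial(hyps, preds, costs)
    print(chosen)  # t_split: it halves the hypothesis set at equal cost
    print(eliminate(hyps, preds, chosen, observed=True))  # {'H1': 0.25, 'H2': 0.25}
```

Raising the cost of `t_split` to 2.0 makes the cheaper but less informative `t_single` preferable, which illustrates the trade-off between trial cost and speed of convergence described in the paragraph.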

Using Machine Learning to Discover Diagnostic Sequence Motifs

The main aim of this project was to investigate whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of membership of biological sequence families. The positive-only learning framework of the inductive logic programming (ILP) system CProgol was used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). This was the first real-world scientific application of the positive-only learning framework of the ILP system Progol and the first attempt to acquire a grammar for a biological domain using ILP. Performance was measured using a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity.

IMPRESS project

The main aim of IMPRESS was to create new methods for the automated validation of requirements specifications using machine learning techniques. I conducted experiments to assess the suitability of the theory revision tool Forte for the project and subsequently developed a theory revision system for logic programs which identifies the parts of a program that need refining and then automatically revises them. I also contributed towards a method for generating a proof tree from an instance and a general logic program, i.e. one that includes negative literals. This method differed from previous work in the field in that negative literals were first unfolded and then transformed using De Morgan's laws, so that the tree explicitly included negative clauses.
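The unfolding-plus-De Morgan step can be sketched for the simplest case, assuming a ground (variable-free) program read under Clark's completion, so that an atom is equivalent to the disjunction of its clause bodies. Negating that disjunction gives a conjunction of negated bodies, and each negated body becomes a disjunction of negated literals. The representation and the function names `negate` and `unfold_negation` are hypothetical; handling variables would additionally require unification and quantifier treatment that this sketch omits.

```python
from typing import Dict, List, Tuple, Union

Literal = Union[str, Tuple[str, str]]     # "a" (positive) or ("not", "a") (negative)
Program = Dict[str, List[List[Literal]]]  # head atom -> list of clause bodies


def negate(lit: Literal) -> Literal:
    """Negate a literal, cancelling a double negation if present."""
    if isinstance(lit, tuple):
        return lit[1]
    return ("not", lit)


def unfold_negation(program: Program, atom: str) -> List[List[Literal]]:
    """Unfold not(atom) one step and push the negation inward with De Morgan's laws.

    atom <-> B1 or B2 or ...   (each Bi is a conjunction of literals), so
    not(atom) <-> not(B1) and not(B2) and ...,
    where each not(Bi) is the disjunction of the negations of Bi's literals.
    Returns a conjunction (outer list) of disjunctions (inner lists).
    """
    bodies = program.get(atom, [])
    return [[negate(lit) for lit in body] for body in bodies]


if __name__ == "__main__":
    # p :- a, b.      p :- c, not d.
    program = {"p": [["a", "b"], ["c", ("not", "d")]]}
    print(unfold_negation(program, "p"))
    # [[('not', 'a'), ('not', 'b')], [('not', 'c'), 'd']]
```

An atom with no clauses yields an empty conjunction (true), and a fact, whose body is empty, yields an empty disjunction (false), which matches negation as failure for ground atoms.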

Doctoral Project

My doctoral programme of research included the application of machine learning techniques to analytical chemistry. Three systems were applied to data from a database of published enantioseparations by high-performance liquid chromatography performed on commercially available chiral stationary phases (CSPs). The aim was to induce generalisations that recommend particular CSP chiral selectors based on the structural features of an enantiomer pair.

Research Students

Teaching

My teaching materials can be accessed from here.

Administrative Responsibilities

Contact Details

RGU School of Computing, St. Andrew Street, Aberdeen, AB25 1HG, United Kingdom.

Research group: Computational Intelligence

This page is maintained by Chris Bryant.
Last updated on 14 March 2008.