This page contains software I developed and available
under the term of the GNU public license.
The software with a "coming soon" notice are already
implemented but not completely debugged/tested. If you are interested, you
can ask them by e-mailing me, and it will probably motivate me to run more
intensive tests on them to provide you a version as safe as possible.
If something does not work, just ask me, perhaps I forgot to update the
software on the web.
The following sections are available:
General data structures
implemented in C
I needed
vectors,
dictionaries and
profilers in C,
so I implemented them (you will also find a
boolean type and
msgassert
function).
The most achieved structures are certainly the ones related to vectors.
The others are more basics implementations of their respective concepts.
I used them less, so it is more likely for them to contain bugs.
Documentation has been generated through
Doxygen, and can be found
here. The best way to go
through it is certainly the module index.
If you are trying to find
lists, hash tables, sets and maps in C,
have a look at the following web page
http://users.footprints.net/~kaz/kazlib_doc/.
Boolean:
Msgassert:
- Description: a home made assertion function which enables the printing
of a message before exiting.
- Files: msgassert.h
- Documentation
Vectors:
Forewords:
For the following, most of the functionalities have been
inspired by the C++ vector class. Structures vector.c and vector2D.c are container
of elements of any type. To enable this genericity, elements are accessed
through void* pointers, and the type is given dynamically (at initialisation
time) as the size of the stored structure in byte.
For those who think that, for stored structures of small size
(e.g. integers), this would not be efficient, it looks like the -O2 option
of gcc does a good job on that and on my small test sets, the vector structure
was as fast as using an array statically allocated.
The structure vector_bool.c is a specialisation of vector.c
to store booleans more efficiently.
- vectors of any element type (requires bool.h)
- Description: array of elements with random
O(1) access to existing elements and log(N) adjunction of elements at the
end (a little like the vector structure of C++ STL).
- Files: vector.h vector.c
test_vector.c
- Documentation
- vectors of boolean (requires bool.h)
- vectors of vectors of any element type (requires bool.h, vector.h,
vector.c)
Dictionary:
Profiler:
Grammars
I implemented a context-free grammar data structure in C, this structure
is used for the parsers of the
next subsection, but it
can also be used as a stand alone representation of a grammar.
To the grammar structure itself, you can add an interface, enabling to give
nice names to terminals and nonterminals.
Parsers
Software developed for the
Efficient
Biological Grammar Acquisition project.
C Parser for deterministic regular grammars
- Description: This is a very simple matrix implementation of Deterministic
Finite Automata with transitions labelled by characters. It provides structures
and methods to parse simple text being given a grammar under the form of
a deterministic automaton (very restricted, but very useful).
- Files: dfa_parser.h dfa_parser.c test_dfa_parser.c
- Documentation:
C Parser for context-free grammars
The parsers we developed have three main differences with the ones
that can be found usually:
- They are recognizer: they do not return one ore more
parse tree but only yes or no if the string is accepted or rejected. This
enable some speed accelerations. However, If you are interested in parse
trees, they could be easily reconstructed from the internal data structures.
I have to modify these files one day for that aim, let me know if you are
interested, It could give me some motivations :)
- They are not parser compiler but parsers: The usual approach
to parse a grammar is to generate a parser program from the grammar, compile
this program and run it with your input string(s). If your grammar often
changes, this solution is not adapted since it implies compiling the grammar
each time (which can be long). Here, the grammar is just an input file to
the parser.
- They are optimised with respect to biological grammar parsing (see
the web pages below for explanations).
Implementations of parsers actually available are:
- C implementation of the Depth-First Top-Down and Chart parsers. Please
refer to the following web pages:
- Depth-First Top-Down (more efficient than in CPM'05): ILP'05
- Chart and Depth-First Top-Down parsers: CPM'05
Link of a parser in the Prolog system Yap
The depth first top-down parser available on this page can be called from
Prolog programs using the
Yap
interpreter. Please refer to the following we page for this software:
ILP'05.
Other parsers / parser compilers and software related to parsing:
If you are interested about other software to parse context-free grammars,
here is a list of web sites / software implemented by other people and
related to parsing.
- Parsing the whole class of context-free grammars:
- The Accent
parser generator.
- http://www.kohsuke.org/compling/tomita/:
An C implementation of the Tomita parser, and a C++ implementation. But
these parsers have still has some bugs, and the implementation is very slow.
Look at them only if you are interested by this specific algorithm.
- Any Prolog program is able to parse the whole class of context
free grammar using the good representation (yes, even left-recursive grammars).
But it is easier if you have a Definite Clause Grammar representation for
it (you are then restricted to non left-recursive grammars).
- Parsing subclasses of context-free grammars:
- Parser generator for linear, directional methods:
- lex-yacc:
parser generator for C.
- Bison: GNU version of yacc.
- Zyacc:
parser generator for LALR(1) grammars.
- SableCC: LALR parser
generator for Java.
- ANTLR: LL parser generator
for C++, java.
- PRECC: parser generator.
- Gold: parser
generator (I think LALR(1)) independent from programming language.
- ProGrammar:
parser generator and environment for developping parsers.
- Other subclasses of context-free grammars
- Other softwares related to Context-Free Grammars and parsing:
- genRgenS:
generator of random strings with respect to a CFG.
- The Muggleton random logic programs, can be used to generate
strings from logic programs, these logic programs can be representing grammars.
Some implementations available in Inductive Logic Programming systems like
CProgol or Aleph.