4.4 Discovery and Presentation of the Narrative Structure
4.4.1 Detecting and Presenting the Main Characters
As mentioned in 4.2.3, we used a large list of names of men and women in the Bible (e-Resource 2000) to help detect character names in the e-Book.
Identifying character names in the text is similar to identifying stop words in a text retrieval process. We wrote a Java program to process the e-Book chapter files. The program scanned each chapter line by line and compared each word with the stop-word list and the character name lists to determine whether it was a stop word, a content word, a man's name, or a woman's name. Content words were also stemmed for later retrieval, using Porter's stemming algorithm (Porter 1980). We then created an inverted file (Baeza-Yates and Ribeiro-Neto 1999; Frakes and Baeza-Yates 1992) for each chapter. An inverted file is like the index at the back of a book, which lists index terms alphabetically together with the page numbers where they can be found. Instead of page numbers, the inverted file lists a document (chapter) identifier together with the positions of the term in the document, the term frequency, attributes of the term (e.g., gender for names), and so on.
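The scanning and indexing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program we actually wrote: the class and method names (ChapterIndexer, Posting, index) are hypothetical, word positions are counted as word offsets from the start of the chapter, and stem is a toy placeholder that only strips a trailing "s", standing in for the full Porter algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: classifies each word of a chapter and builds an inverted file.
class ChapterIndexer {
    enum Kind { CONTENT_WORD, MAN_NAME, WOMAN_NAME }

    /** One inverted-file entry: the term's attribute and its word positions. */
    static final class Posting {
        final Kind kind;
        final List<Integer> positions = new ArrayList<>();
        Posting(Kind kind) { this.kind = kind; }
        int frequency() { return positions.size(); }
    }

    private final Set<String> stopWords;   // assumed to be stored in lower case
    private final Set<String> menNames;    // assumed to be stored capitalized
    private final Set<String> womenNames;  // assumed to be stored capitalized

    ChapterIndexer(Set<String> stopWords, Set<String> menNames, Set<String> womenNames) {
        this.stopWords = stopWords;
        this.menNames = menNames;
        this.womenNames = womenNames;
    }

    // Placeholder only: NOT Porter's algorithm, just removes a trailing "s".
    static String stem(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    /** Scans the chapter line by line and returns term -> posting. */
    Map<String, Posting> index(List<String> lines) {
        // TreeMap keeps the terms sorted, like a back-of-book index.
        Map<String, Posting> inverted = new TreeMap<>();
        int position = 0; // running word offset within the chapter
        for (String line : lines) {
            for (String token : line.split("[^A-Za-z]+")) {
                if (token.isEmpty()) continue;
                position++;
                if (stopWords.contains(token.toLowerCase(Locale.ROOT))) continue;
                final Kind kind;
                final String term;
                if (menNames.contains(token)) {
                    kind = Kind.MAN_NAME;
                    term = token;
                } else if (womenNames.contains(token)) {
                    kind = Kind.WOMAN_NAME;
                    term = token;
                } else {
                    kind = Kind.CONTENT_WORD;
                    term = stem(token.toLowerCase(Locale.ROOT));
                }
                inverted.computeIfAbsent(term, t -> new Posting(kind)).positions.add(position);
            }
        }
        return inverted;
    }
}
```

In this sketch, names keep their capitalization while content words are lower-cased and stemmed, so a name and a content word never share an index entry. Stop words still advance the position counter, so that stored positions refer to offsets in the original text.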
When a name is searched for, we retrieve it from the index stored in the inverted file to obtain the locations of the word. We also provide statistics for the name, for example, how many times it appears in the whole Bible, the section, the book, and the chapter. These counts can indicate the importance of a character, since the most frequently mentioned name usually belongs to the most important character in a book.
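The name statistics can be illustrated with a short sketch. It assumes that the frequency of a name in each chapter has already been read from that chapter's inverted file and that each chapter is labeled with its section and book; the class NameStatistics and its methods are hypothetical names chosen for illustration, not part of the actual system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: aggregates per-chapter frequencies of ONE name
// at the four levels mentioned in the text (Bible, section, book, chapter).
class NameStatistics {
    /** Frequency of the name in a single chapter. */
    static final class ChapterCount {
        final String section;
        final String book;
        final int chapter;
        final int frequency;
        ChapterCount(String section, String book, int chapter, int frequency) {
            this.section = section;
            this.book = book;
            this.chapter = chapter;
            this.frequency = frequency;
        }
    }

    private final List<ChapterCount> counts = new ArrayList<>();

    /** Records the term frequency taken from one chapter's inverted file. */
    void add(String section, String book, int chapter, int frequency) {
        counts.add(new ChapterCount(section, book, chapter, frequency));
    }

    // Sums the frequencies of all chapters that fall within the given scope.
    private int total(Predicate<ChapterCount> scope) {
        return counts.stream().filter(scope).mapToInt(c -> c.frequency).sum();
    }

    int inBible() { return total(c -> true); }

    int inSection(String section) { return total(c -> c.section.equals(section)); }

    int inBook(String book) { return total(c -> c.book.equals(book)); }

    int inChapter(String book, int chapter) {
        return total(c -> c.book.equals(book) && c.chapter == chapter);
    }
}
```

Summing on demand keeps the sketch simple. A system answering many queries might instead precompute the totals when the inverted files are built.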
Also using the index, the user interface provides a list of women and a list of men in the current chapter (right column in Figure 18, 4.3.2), which is updated dynamically when the chapter changes. A name can be highlighted once it has been searched for.