I made these projects while studying at uni. Natural language processing (NLP) is a fairly unexplored area of computing, with more opportunities becoming available as technology improves.
XML Text Summariser

PERL


The Perl program ‘summary’ takes two arguments: an input XML file and a summarisation percentage (of total sentences).

e.g. % summary 9405001.sent 10 (produces a summary 10% of the original size)

The program uses keywords, cue phrases, bonus/stigma words, stop words and position in paragraph to perform the summarisation.
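As a rough illustration of how these features can be combined into a sentence score, here is a minimal Python sketch. It is not the downloadable Perl program: the word lists, the weights and the function names are all assumptions made up for this example, and stemming is left out.

```python
import re

# Illustrative word lists and weights only; the real lists ship with the download.
STOP_WORDS = {"the", "a", "of", "is", "in", "this", "and", "to", "we"}
CUE_PHRASES = ("in conclusion", "this paper", "we propose")
BONUS_WORDS = {"significant", "important"}
STIGMA_WORDS = {"perhaps", "hardly"}


def words(sentence):
    """Lower-case a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarise(paragraphs, percent):
    """paragraphs is a list of lists of sentences.

    Returns the top `percent` of sentences, kept in their original order.
    """
    # Keyword weight: how often each non-stop word occurs in the document.
    freq = {}
    for para in paragraphs:
        for sentence in para:
            for w in words(sentence):
                if w not in STOP_WORDS:
                    freq[w] = freq.get(w, 0) + 1

    scored = []
    index = 0
    for para in paragraphs:
        for pos, sentence in enumerate(para):
            ws = words(sentence)
            content = [w for w in ws if w not in STOP_WORDS]
            score = sum(freq[w] for w in content) / max(len(content), 1)
            low = sentence.lower()
            score += 2.0 * sum(p in low for p in CUE_PHRASES)    # cue phrases
            score += 1.0 * sum(w in BONUS_WORDS for w in ws)     # bonus words
            score -= 1.0 * sum(w in STIGMA_WORDS for w in ws)    # stigma words
            if pos == 0:
                score += 1.0                                     # paragraph opener
            scored.append((score, index, sentence))
            index += 1

    keep = max(1, round(len(scored) * percent / 100))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:keep]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

Calling `summarise(paragraphs, 10)` mirrors the command-line example above: it keeps roughly 10% of the sentences, and always at least one.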

Files included are:

  • summary – Perl main program script.
  • stemming.pm – Perl implementation of Porter’s Algorithm for word stemming.
  • stop_words – A list of stop words used by summary.

Full program documentation available with download.

Download here [15kB]


Data Extraction

PERL


The program uses cues such as keywords, position in text, text format and zoning to determine possible answers for each field. (For complete details see the documentation or source code.) This project received top marks.

Program overview:

The basic program flow is as follows:

  1. Check program arguments and directories.
  2. Then for each file:
  3. Extract mail header and subject line.
  4. Split the file between paragraphs.
  5. Then for each paragraph:
  6. Try to extract relevant fields using the subroutines.
  7. If fields are not found, run a backup routine to find more possible answers.
  8. Write the file to the results directory in the specified format.
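The flow above can be sketched in Python for a single message held in memory. This is only an outline under stated assumptions: the field names (`place`, `date`), the regular expressions and the function name `process_message` are hypothetical, since the real fields and cues are described only in the download. It also assumes the backup routine runs once after all paragraphs have been tried. Argument checking and writing to the results directory (steps 1 and 8) are left out.

```python
import re

# Hypothetical extractors: the actual fields are not listed on this page.
PRIMARY = {
    "date": re.compile(r"^Date:[ \t]*(.+)$", re.I | re.M),
    "place": re.compile(r"^(?:Place|Location):[ \t]*(.+)$", re.I | re.M),
}
# Looser fallback patterns, tried only for fields still missing.
BACKUP = {
    "date": re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b"),
}


def process_message(text):
    """Extract the subject line and any recognised fields from one message."""
    # Step 3: the mail header ends at the first blank line.
    header, _, body = text.partition("\n\n")
    m = re.search(r"^Subject:[ \t]*(.*)$", header, re.M)
    subject = m.group(1).strip() if m else ""

    # Step 4: split the body into paragraphs on blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]

    # Steps 5-6: try each primary extractor on each paragraph.
    fields = {}
    for para in paragraphs:
        for name, pattern in PRIMARY.items():
            if name not in fields:
                m = pattern.search(para)
                if m:
                    fields[name] = m.group(1).strip()

    # Step 7: backup routine for fields that were not found.
    for name, pattern in BACKUP.items():
        if name not in fields:
            m = pattern.search(body)
            if m:
                fields[name] = m.group(1).strip()

    return {"subject": subject, **fields}
```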

Download here [361kB]

Morphology

Prolog


This program simulates the morphographemic processes of productive English morphology. It tries to cover epenthesis, gemination, elision, k-insertion and i-to-e replacement.

The program works both ways - adding morphemes and removing them.
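To give a feel for these spelling rules, here is a small Python sketch of the "adding" direction only. It is not a translation of the Prolog program: the function name `attach` and the exact rule conditions are simplifying assumptions. It covers four of the processes and leaves out i-to-e replacement, whose precise rule is documented only in the download.

```python
import re

VOWELS = "aeiou"


def attach(stem, suffix):
    """Join a stem and a non-empty suffix, applying simplified spelling rules."""
    # k-insertion: panic + ing -> panicking
    if (stem.endswith("c") and len(stem) > 1
            and stem[-2] in VOWELS and suffix[0] in "ei"):
        return stem + "k" + suffix
    # Epenthesis: fox + s -> foxes
    if suffix == "s" and re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "e" + suffix
    # Elision: make + ing -> making (but see + ing -> seeing)
    if (stem.endswith("e") and not stem.endswith(("ee", "oe", "ye"))
            and suffix[0] in VOWELS):
        return stem[:-1] + suffix
    # Gemination: run + ing -> running.
    # Approximation: only stems with a single vowel followed by one consonant.
    if suffix[0] in VOWELS and re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):
        return stem + stem[-1] + suffix
    return stem + suffix
```

For example, `attach("stop", "ed")` doubles the final consonant while `attach("walk", "ing")` leaves the stem alone. Real gemination depends on stress, which this sketch does not model.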

Documentation available with download.

Download here [8kB]

Definite Clause Grammar recogniser

Prolog


The program contains a definite clause grammar covering a range of sentences about flight movements.

Sentences are parsed to check if they meet the required grammar constraints.
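The idea of a recogniser can be shown with a toy example. The sketch below is Python, not Prolog, and uses generator functions to imitate the backtracking a definite clause grammar gets for free. The grammar rules and the lexicon are invented for illustration and are far smaller than the grammar in the download.

```python
# Hypothetical toy lexicon; the real grammar covers many more constructions.
LEXICON = {
    "det": {"the", "a"},
    "noun": {"flight", "plane"},
    "verb": {"departs", "arrives", "lands"},
    "prep": {"from", "to", "in", "at"},
    "city": {"melbourne", "sydney"},
}


def word(cat):
    """Parser that consumes one token of the given lexical category."""
    def parse(tokens, i):
        if i < len(tokens) and tokens[i] in LEXICON[cat]:
            yield i + 1
    return parse


def seq(*parsers):
    """Parser that applies each parser in turn (like a DCG rule body)."""
    def parse(tokens, i):
        if not parsers:
            yield i
            return
        first, rest = parsers[0], seq(*parsers[1:])
        for j in first(tokens, i):
            yield from rest(tokens, j)
    return parse


def alt(*parsers):
    """Parser that tries every alternative (like several DCG clauses)."""
    def parse(tokens, i):
        for p in parsers:
            yield from p(tokens, i)
    return parse


np = seq(word("det"), word("noun"))
pp = seq(word("prep"), word("city"))
vp = alt(seq(word("verb"), pp, pp), seq(word("verb"), pp), word("verb"))
sentence = seq(np, vp)


def recognise(text):
    """True if the whole token sequence is a sentence of the toy grammar."""
    tokens = text.lower().split()
    return any(end == len(tokens) for end in sentence(tokens, 0))
```

Each parser yields every position where it could stop, so `recognise` accepts a sentence only when some parse consumes all of the tokens.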

For a full description of the linguistic phenomena covered by the program, and the complete documentation, download the archive.

Download here [12kB]

Natural Language database query system

Prolog


This program builds a natural language interface to a database containing information on scheduled flights.

A definite clause grammar is used to cover the various types of questions one would ask about air flights.

e.g. How many flights are there from Melbourne to Sydney between noon and 3am?
e.g. What flights can I take to Sydney from Melbourne on Saturdays before 10:00?
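As a very rough picture of what answering such questions involves, the Python sketch below maps a question onto a filter over a tiny in-memory schedule. It swaps the definite clause grammar for a few regular expressions, so it is not how the Prolog program works. The schedule data, flight numbers and the function name `answer` are made up, and day-of-week and "between" constraints are ignored.

```python
import re

# Hypothetical schedule: (flight number, origin, destination, departure "HH:MM").
FLIGHTS = [
    ("QF401", "melbourne", "sydney", "08:30"),
    ("QF403", "melbourne", "sydney", "13:15"),
    ("QF500", "sydney", "brisbane", "09:00"),
]


def answer(question):
    """Return a count for 'how many' questions, else a list of flight numbers."""
    q = question.lower()
    origin = re.search(r"\bfrom (\w+)", q)
    dest = re.search(r"\bto (\w+)", q)
    before = re.search(r"\bbefore (\d{1,2}):(\d{2})", q)
    # Zero-padded "HH:MM" strings compare correctly as plain strings.
    limit = "%02d:%s" % (int(before.group(1)), before.group(2)) if before else None

    matches = [
        number
        for number, frm, to, departs in FLIGHTS
        if (origin is None or frm == origin.group(1))
        and (dest is None or to == dest.group(1))
        and (limit is None or departs < limit)
    ]
    return len(matches) if q.startswith("how many") else matches
```

A grammar-based approach handles word order and nested constraints much more gracefully than these patterns, which is the reason the real program uses one.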

Full program description and documentation available with download.

Download here [43kB]