Latest update Dec 22, 2022.

Assignment 3 - Some Simple Natural Language Processing

The task of this lab assignment is to implement some extremely rudimentary natural language processing. Your program will read strings from the console or a text file, interpret these strings as lists of words, and parse the lists of words as sentences. The sentences are to be represented in symbolic form, so that they can be processed at a higher level.

More specifically, we have the following restrictions:

Your system should not be case-sensitive: it must recognize words written in any mix of upper- and lower-case letters. However, sentences should be output in proper format, with the first letter of the first word capitalized and all other letters in lower case. Claims, when written by your system, should be terminated by a period, and questions by a question mark. Input sentences, however, should be recognized as claims or questions regardless of how they are terminated.
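The output format described above can be sketched as follows. This is only an illustration: the name formatSentence and the choice of passing the terminator as a string are assumptions, not part of the assignment.

```fsharp
// Hypothetical helper: lower-cases all words, capitalizes the first letter
// of the first word, joins the words with spaces and appends the terminator.
let formatSentence (terminator: string) (words: string list) : string =
    match words |> List.map (fun (w: string) -> w.ToLowerInvariant()) with
    | [] -> ""
    | first :: rest ->
        let capitalized =
            if first = "" then first
            else first.Substring(0, 1).ToUpperInvariant() + first.Substring(1)
        String.concat " " (capitalized :: rest) + terminator
```

For example, formatSentence "?" ["SÖKER"; "kaka"; "MAKA"] evaluates to "Söker kaka maka?".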

You decide the rules for how to transform a string into a list of words. However, your decision must be sensible, and you must be able to justify it.
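One possible rule, shown here only as a sketch: split on whitespace and common punctuation, drop empty pieces, and lower-case everything so that later steps can ignore case. The name toWords and the exact set of separator characters are assumptions; other rules may be just as sensible.

```fsharp
// Hypothetical tokenizer: separators are an assumed set of whitespace
// and punctuation characters; empty entries are discarded.
let toWords (line: string) : string list =
    let separators = [| ' '; '\t'; '.'; ','; '?'; '!' |]
    line.ToLowerInvariant().Split(separators, System.StringSplitOptions.RemoveEmptyEntries)
    |> List.ofArray
```

With this rule, toWords "Kaka  söker MAKA." evaluates to ["kaka"; "söker"; "maka"], so the terminator in the input plays no role in how the sentence is later classified.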

Now write a program that does the following:

  1. Reads a line from either the console or a text file with lines containing sentences (your solution must be able to handle both, without recompilation or similar changes),
  2. Transforms it into a list of words,
  3. Tries to interpret the list of words as either a claim or a question,
  4. If it is interpreted as a claim, prints the corresponding question on the screen,
  5. If it is interpreted as a question, prints the corresponding claim on the screen,
  6. If it fails to interpret the list of words, terminates. Otherwise, repeats from 1.
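Step 1 can be handled without recompilation by choosing the input source at run time. The sketch below assumes that an optional file name is given as a command-line argument; the names openReader and nextLine are hypothetical, and the missing-file case is reported as a value rather than left to raise an exception.

```fsharp
open System
open System.IO

// Hypothetical: no argument means console input, one argument means a file.
let openReader (args: string list) : Result<TextReader, string> =
    match args with
    | [] -> Ok Console.In
    | [ path ] when File.Exists path -> Ok (new StreamReader(path) :> TextReader)
    | [ path ] -> Error (sprintf "Cannot find input file '%s'." path)
    | _ -> Error "Expected at most one argument: the input file."

// ReadLine returns null at end of input; turn that into an option value.
let nextLine (reader: TextReader) : string option =
    match reader.ReadLine() with
    | null -> None
    | line -> Some line
```

Because both Console.In and StreamReader are TextReader values, the rest of the program does not need to know where the lines come from.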

Example: the input "kaka söker maka" should be interpreted as a claim, and the program should print the corresponding question "Söker kaka maka?". Conversely, given a question, the program should print the corresponding claim.

In all cases, the program should terminate in an orderly fashion, with a relevant exit message. Exception-based termination (failwith, try - catch, etc.) will not be accepted, and the same goes for error handling. Instead, you should implement value-based handling of errors and program termination; see the slides for Lecture F6.
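As an illustration of what value-based termination can look like (the slides for Lecture F6 remain the authoritative reference), the sketch below uses the standard Result type. The function run and its two parameters are hypothetical: nextLine yields the next input line or None, and respond either produces a reply or an error description. The loop ends by returning the exit message as an ordinary value.

```fsharp
// Hypothetical main loop: no exceptions are raised or caught; failure to
// interpret a line, or end of input, simply ends the recursion with a message.
let rec run (nextLine: unit -> string option)
            (respond: string -> Result<string, string>) : string =
    match nextLine () with
    | None -> "End of input, exiting."
    | Some line ->
        match respond line with
        | Ok reply ->
            printfn "%s" reply
            run nextLine respond
        | Error reason ->
            sprintf "Could not interpret '%s' (%s), exiting." line reason
```

The caller prints the returned string once, which gives a single place where the program terminates.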

Think carefully about how to structure your solution into different functions, and how to represent sentences. Just using strings throughout will not be accepted; instead, you should have a high-level representation that captures the structure of the sentences.
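A minimal sketch of such a representation is given below. It assumes a toy grammar inferred only from the example "kaka söker maka": a sentence has a subject, a verb and an object, a claim is written subject-verb-object and a question verb-subject-object. The type Sentence and the functions flip, toWordList and terminator are hypothetical, and how to recognize which word is the verb is deliberately left open.

```fsharp
// Hypothetical representation: both cases carry the same three parts.
type Sentence =
    | Claim of subject: string * verb: string * obj: string
    | Question of subject: string * verb: string * obj: string

// Turns a claim into the corresponding question and vice versa.
let flip (sentence: Sentence) : Sentence =
    match sentence with
    | Claim (s, v, o) -> Question (s, v, o)
    | Question (s, v, o) -> Claim (s, v, o)

// Word order differs between claims and questions (assumed grammar).
let toWordList (sentence: Sentence) : string list =
    match sentence with
    | Claim (s, v, o) -> [ s; v; o ]
    | Question (s, v, o) -> [ v; s; o ]

// Terminator used when the sentence is written out.
let terminator (sentence: Sentence) : string =
    match sentence with
    | Claim _ -> "."
    | Question _ -> "?"
```

For example, Claim ("kaka", "söker", "maka") |> flip |> toWordList evaluates to ["söker"; "kaka"; "maka"]. Keeping the word order in toWordList means that the conversion between claim and question never has to manipulate strings.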

Hints: System.Console.ReadLine() : string reads a line from the console. The methods .ToLower() and .ToUpper() on strings can also come in handy.


Björn Lisper
bjorn.lisper (at) mdu.se