Formal Language

Formal language can describe sets of strings with mathematical rigour.

Some terminology:

  • alphabet = finite set of symbols
    • Some examples of alphabets include the set of binary digits , the set of decimal digits and the set of uppercase and lowercase letters
  • word = finite sequence of symbols drawn from a given alphabet
    • the empty word has 0 symbols
  • formal language = set of words over some alphabet

Some examples of formal languages are:

  • the set of binary strings of length 32 (i.e. MIPS words)
  • the set of prime numbers when written in decimal
  • the set of strings of letters that begin with the letter
  • The set of valid Scala programs
Make sure to understand the difference between symbol, word, alphabet, and language.

We can classify a language into a number of language classes:

  • Finite languages are finite sets of words.
  • Regular languages are sets of words that can be specified using the techniques that we will learn in this module.
  • Context-Free Language are sets of words that can be specified using more powerful techniques that we will learn in the next module.
  • Decidable languages are sets of words for which there exists an algorithm that decides membership, i.e., decides whether or not some given word is in the set. Decidable and undecidable languages are studied in depth in CS 341

Parsing and Scanning

Scanning = partitioning an input program (string of characters as a Regular Language) into a sequence of tokens, done through the use of DFAs Parsing = using a Parsing = using a Context-Free Language to recognize the tree structure of the program (ex: determining that a procedure is nested in another)