author: Michael Mulder
title: Ye Olde Blazon Compiler
keywords: domain-specific language, compiler construction
topics: Case studies and Applications, Languages, Software Technology
committee: Vadim Zaytsev
started: November 2020
end: January 2021

Description

If you like compiler construction and medieval heraldry, read on…

When software engineers build compilers in practice, they do not always follow the path prescribed by compiler textbooks: from a textual representation of an algorithm to executable machine code. In fact, building such compilers is quite niche work: one either becomes an expert and makes an entire career out of it, or never comes into contact with it at all. Far more popular and widespread are compilers and translators of software languages [2,3,4], a broader term that covers, besides programming languages, also modelling languages like UML, markup languages like MediaWiki, query languages like SQL, and so forth, up to and including domain-specific languages [1,6], which are specific to a particular domain such as banking, equation solving, data science, or rendering [4,5,6]. To fit the needs of their intended end users, these languages quite often have to be somewhat unconventional: they may have a graphical notation instead of or in addition to a textual one, be integrated into an existing framework, or be semi-structured.

Semi-structured languages are those that impose some structure on their sentences, but do not define them fully (think JSON instead of SQL). For example, a structured language would only allow the user to write something like “while (x>0) x = f(x);” and would refuse the same sentence even if only a semicolon were missing from it, whereas a semi-structured language can permit sentences like “as a user, I can back up my data”, where "as a" and "I can" carry some special meaning but the rest can be anything. Even so, the supporting infrastructure will definitely utilise the noun that comes after "as a" to add it to the list of all possible users/stakeholders, and perform other checks and actions. Examples of semi-structured languages are legalese (where freeform English text is filled with verbs like "must" and "should not" and nouns like "the customer" or "the tenant", which carry concrete meaning to judges and lawyers) and the language for user stories (used as the example earlier in this paragraph).
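
To make the user-story example concrete, here is a minimal sketch of how such a sentence could be processed. It assumes a single sentence shape ("as a/an ROLE, I can ACTION"); the names `UserStory`, `STORY_PATTERN`, `parseUserStory` and `collectRoles` are hypothetical and only illustrate the idea, they are not part of any existing tool.

```typescript
// Hypothetical sketch: every name here is an illustrative assumption.
interface UserStory {
  role: string;   // the noun phrase after "as a" / "as an"
  action: string; // the free-form remainder after "I can"
}

// Only the markers "as a(n)" and "I can" are fixed; the rest is free text.
const STORY_PATTERN = /^\s*as an?\s+(.+?)\s*,\s*I can\s+(.+?)\s*\.?\s*$/i;

// Returns the extracted parts, or null when the markers are absent.
function parseUserStory(sentence: string): UserStory | null {
  const match = STORY_PATTERN.exec(sentence);
  if (match === null) {
    return null;
  }
  const [, role = "", action = ""] = match;
  return { role, action };
}

// Builds the list of distinct roles (stakeholders) mentioned in the stories.
function collectRoles(sentences: string[]): string[] {
  const roles: string[] = [];
  for (const sentence of sentences) {
    const story = parseUserStory(sentence);
    if (story === null) {
      continue; // not a user story: skipped rather than rejected
    }
    const role = story.role.toLowerCase();
    if (roles.indexOf(role) < 0) {
      roles.push(role);
    }
  }
  return roles;
}
```

The action part is never analysed: the sketch only relies on the two markers, which is what makes the language semi-structured rather than structured.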

If heraldry is taken as a problem domain, it turns out that there already is a language for blazonry, linking textual semi-structured descriptions of coats of arms to their visual representation. The language is based on English, but with an uncommon word order in some places, and with French words dominating its vocabulary. In this language, for example, "Azure, a bend or" means that the shield is a particular tint of blue and is crossed diagonally by a broad golden band running from the top left to the bottom right corner. Similarly, "Azure, a bend sinister or" means the same shield with the golden band going from the top right to the bottom left corner, because "sinister" is a modifier word that horizontally flips the figure ("charge") that precedes it. For animal figures, "displayed" means with spread wings, "passant" means walking, etc.
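
As an illustration of how such descriptions might be read by a program, the following sketch extracts a field tincture, one charge, the "sinister" modifier and the charge tincture from a short blazon. It is a toy under stated assumptions: the vocabulary is a tiny subset, only one charge is supported, and all names (`Blazon`, `parseBlazon`, `TINCTURES`, `CHARGES`) are hypothetical.

```typescript
// Hypothetical sketch: a tiny vocabulary and a single-charge blazon only.
type Tincture = "azure" | "gules" | "vert" | "sable" | "or" | "argent";
const TINCTURES: readonly Tincture[] = ["azure", "gules", "vert", "sable", "or", "argent"];
const CHARGES: readonly string[] = ["bend", "fess", "pale", "chevron"];

interface Blazon {
  field: Tincture | null;          // colour of the shield
  charge: string | null;           // the figure placed on it
  sinister: boolean;               // true if the charge is flipped horizontally
  chargeTincture: Tincture | null; // colour of the charge
  problems: string[];              // messages about parts that were not understood
}

function asTincture(word: string): Tincture | null {
  for (const tincture of TINCTURES) {
    if (tincture === word) {
      return tincture;
    }
  }
  return null;
}

// Extracts as much as possible and reports the rest instead of failing.
function parseBlazon(text: string): Blazon {
  const blazon: Blazon = {
    field: null, charge: null, sinister: false, chargeTincture: null, problems: [],
  };
  const words = text.toLowerCase().split(/[^a-z]+/).filter((w) => w.length > 0);
  for (const word of words) {
    const tincture = asTincture(word);
    if (tincture !== null) {
      if (blazon.field === null) {
        blazon.field = tincture; // the first tincture names the field
      } else if (blazon.charge !== null && blazon.chargeTincture === null) {
        blazon.chargeTincture = tincture; // a tincture after the charge colours it
      } else {
        blazon.problems.push(`unexpected tincture "${word}"`);
      }
    } else if (word === "a" || word === "an") {
      // articles carry no information in this sketch
    } else if (CHARGES.indexOf(word) >= 0) {
      if (blazon.charge === null) {
        blazon.charge = word;
      } else {
        blazon.problems.push(`second charge "${word}" ignored`);
      }
    } else if (word === "sinister") {
      if (blazon.charge !== null) {
        blazon.sinister = true; // modifies the charge that precedes it
      } else {
        blazon.problems.push('"sinister" has no preceding charge');
      }
    } else {
      blazon.problems.push(`unrecognised word "${word}"`);
    }
  }
  if (blazon.field === null) {
    blazon.problems.push("no field tincture found");
  }
  return blazon;
}
```

For instance, `parseBlazon("Azure, a wobbly bend or")` would still recover the azure field and the golden bend, while recording the unknown word in `problems` rather than rejecting the whole sentence.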

The project comprises the following steps:

  • Domain analysis of heraldry/blazon, to extract the core vocabulary for tinctures, divisions, charges, attitudes, etc.
  • Design of the blazon language, based on the domain model (minor simplifications are allowed).
  • Construction of a semi-structured parser that extracts as much blazon information from a user-given text as possible, and gives guiding error messages about unclear parts.
  • Construction of a static checker that rejects certain combinations (e.g., "nowed", or knotted, is an "attitude" that can only be used on something with a long tail, so a bear or a fish cannot be knotted; "volant", or flying, can only refer to birds or dragons, so a deer or a tree cannot spread their wings; "dormant", or sleeping, can only be used on live subjects, so a crown or a severed hand cannot be sleeping; etc.).
  • Construction of a code generator that produces an SVG file combining existing elements from a built-in library into one picture according to the best possible interpretation of the given blazon description (the elements themselves do not need to be made, since Wikimedia Commons already has hundreds of them in open access).
  • Lightweight validation of the result by composing several blazon specifications from existing sources, and uploading compiled results to Wikimedia Commons.
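
The static checker and code generator steps in the list above can be sketched roughly as follows. This is a sketch under assumptions, not a prescribed design: the trait table, the attitude rules, the colour values and the function names (`checkAttitude`, `renderBend`, `lookup`) are all hypothetical, and the generator draws primitive SVG shapes instead of combining library elements from Wikimedia Commons.

```typescript
// Hypothetical sketch: tables and names are illustrative assumptions.
interface Traits { alive: boolean; wings: boolean; longTail: boolean; }

const CHARGE_TRAITS: Record<string, Traits> = {
  serpent: { alive: true,  wings: false, longTail: true  },
  eagle:   { alive: true,  wings: true,  longTail: false },
  dragon:  { alive: true,  wings: true,  longTail: true  },
  bear:    { alive: true,  wings: false, longTail: false },
  deer:    { alive: true,  wings: false, longTail: false },
  crown:   { alive: false, wings: false, longTail: false },
};

// Each attitude names the trait that a charge must have to carry it.
const ATTITUDE_REQUIRES: Record<string, keyof Traits> = {
  nowed: "longTail", // knotted
  volant: "wings",   // flying
  dormant: "alive",  // sleeping
};

// Assumed placeholder colours; real heraldic palettes vary.
const COLOURS: Record<string, string> = {
  azure: "#0000ff", gules: "#ff0000", or: "#ffd700", argent: "#ffffff",
};

// Looks up own keys only, so words like "toString" are not found by accident.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Static check: returns null if the combination is allowed, else a message.
function checkAttitude(charge: string, attitude: string): string | null {
  const traits = lookup(CHARGE_TRAITS, charge);
  if (traits === undefined) {
    return `unknown charge "${charge}"`;
  }
  const required = lookup(ATTITUDE_REQUIRES, attitude);
  if (required === undefined) {
    return `unknown attitude "${attitude}"`;
  }
  return traits[required] ? null : `a ${charge} cannot be ${attitude}`;
}

// Code generation: a square field crossed by a diagonal band (a bend).
// A bend sinister is the same band mirrored horizontally.
function renderBend(field: string, bend: string, sinister: boolean): string {
  const fieldColour = lookup(COLOURS, field);
  const bendColour = lookup(COLOURS, bend);
  if (fieldColour === undefined || bendColour === undefined) {
    throw new Error("unknown tincture");
  }
  const x1 = sinister ? 100 : 0; // the band starts at the top left, or top right if sinister
  const x2 = 100 - x1;           // and ends in the opposite bottom corner
  return [
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">',
    `<rect width="100" height="100" fill="${fieldColour}"/>`,
    `<line x1="${x1}" y1="0" x2="${x2}" y2="100" stroke="${bendColour}" stroke-width="20"/>`,
    "</svg>",
  ].join("");
}
```

Keeping the rules in data tables rather than in code is one possible design choice: new charges and attitudes found during domain analysis could then be added without touching the checker itself.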

To conclude, this project will allow you to go through all phases of a typical real software (language) engineering process, but on a small, feasible scale, while being guided by someone with first-hand experience in both industrial software engineering and compiler construction. The implementation platform is up to the student, but since this project will ideally result in a live website and run in a browser, it is advisable to choose a language that is strongly typed and compiles to JavaScript, like Dart, Kotlin, TypeScript, Elm or Nim.

Supervision will be provided by dr. V. Zaytsev (aka grammarware) in English, Dutch, Russian, or any combination thereof, as chosen by the student.

References

  1. SLEBoK, domain-specific language, 2017.
  2. SLEBoK, software language, 2017.
  3. SLEBoK, software language engineering, 2017.
  4. R. Lämmel, Software Languages: Syntax, Semantics, and Metaprogramming, 2018.
  5. F. Tomassetti, V. Zaytsev, Reflections on the Lack of Adoption of Domain Specific Languages, OOPSLE@STAF, 2020.
  6. M. Völter, S. Benz, C. Dietrich, B. Engelmann, M. Helander, L.C.L. Kats, E. Visser, G. Wachsmuth, DSL Engineering: Designing, Implementing and Using Domain-Specific Languages, 2013.