title: Improving domain connectivity through graph analysis
company: ING
keywords: Code generation, graph analysis, connectivity, path-finding
topics: Algorithms and Data Structures , Case studies and Applications
committee: Hajo Broersma ,
Arend Rensink

Description

ING

We are a large, international financial services provider based in The Netherlands. As we are a highly digitalized company, we also have lots of IT systems that we design, create, implement and maintain. As you can imagine this quite an effort and we always seek ways to improve this.We aim at reducing our turnaround time, having a means to improve our time-to-market. At the end of the day, this can be a way to save time and money, freeing resources to be spent at more interesting engineering challenges.

Context

A class of software systems are the so-called middleware components, pieces of software that act as a bridge between other systems. Interestingly, these components also share a lot of commonalities.

As part of a research project we have a means of generating software components limited to these middleware components. In order to do so, we require large domain models to be available. The more detailed the models, the easier it is to generate the aforementioned software components. These models are essentially (hyper)graphs, in which vertices denote type information and hyperedges describe operations that are possible with these types. This makes the graph a reflection of a (partial) business domain.

Any business has separate “subdomains” that are either barely related, or not at all related, to other subdomains. For example, a legal department might be almost completely unrelated to the finance domain. In the aforementioned hypergraph this effect can arise as having various disconnected components. Or, when there are just a small number of connections between “subdomains”, by having a number of clusters within the graph (possibly found with graph clustering algorithms).

Challenge

We use this hypergraph to automatically discover paths throughout given a desired start and requirement. If such a path exists, we can generate a middleware component. We are interested in the case where we cannot find a path, for example because there is no connection between two disconnected components or because two clusters are not connected tight enough. This poses a problem, since software generation then becomes impossible.

However, if we can add additional edges to the graph that could be sufficient to provide us with a path. Looking from a conceptual perspective, this sometimes is also a viable option.

Assignment

We would like you to provide us with an algorithm to make best-effort suggestions in introducing edge(s) between disconnected components or graph clusters, making aforementioned pathfinding a more viable solution. These suggestions should take the semantics of the types (and therefore implicit business concepts) into account. We want a set of suggestions about where it might be viable to introduce new hyperedges in order to increase the connectedness of the graph.

This algorithm could be steered by a (number of) heuristic(s) that can be used to evaluate the graph and its traversal context, in order to make suggestions as to where new edges can be introduced. Of course, this happens on an abstract level, but because this model is derived from real domain context, we would like to use this to prioritize work on existing systems as to extend capabilities of these systems.

As an example, a related problem would be a nationwide power grid where we need to expand the capacity of the grid. Due to geographical constraints like a river, it might be either too costly or even impossible to introduce arbitrary new powerlines. So, a heuristic must be used in order to suggest “best” opportunities to extend the network.