Below you can find a list of topics that I am currently offering for MSc thesis work. Some of these topics can also be adapted into a BSc thesis topic or expanded into a PhD topic. In general, we are interested in the following areas: Description Logics, Ontologies, Web Ontology Language OWL, Graph Databases, Query Languages for Graph Databases, Logic Programming, Answer Set Programming, Knowledge Graphs, Constraint Languages (for Graph-structured Data), W3C SHACL, Semantic Web, Reasoning about Actions & Change, Knowledge Representation & Reasoning (KR&R), Automata over Infinite Words and Trees, Combining Machine Learning with Symbolic Methods, Explainability of Reasoning Outcomes.
If you find one of the topics below interesting, or if you want to propose a new topic related to the above areas, then you are welcome to reach us by email at mantas.simkus@tuwien.ac.at.
Recent AI research has produced two powerful yet orthogonal families of methods for modelling and performing sophisticated computational tasks on graph-structured data, which is becoming increasingly popular and prominent (see Knowledge Graphs, Graph Databases, the Semantic Web standards). The first family consists of Graph Neural Networks (GNNs), a Machine Learning technique for various classification and prediction tasks on graphs. The second family comprises various logics based on the so-called two-variable fragment (FO2) of first-order logic. This family, which we call 2VLs, includes several prominent formalisms designed to model and reason about graph-structured data, like various Description Logics (DLs), the W3C Web Ontology Language (OWL), and the W3C Shapes Constraint Language (SHACL). While GNNs and 2VLs share the underlying graph data model and purpose, they are also complementary in several ways: (a) explainability of 2VLs vs. the black-box nature of GNNs, (b) exact reasoning in 2VLs vs. approximate inferences in GNNs, (c) computational intractability of 2VLs vs. efficiency of GNNs, and (d) manual representation of domain knowledge in 2VLs vs. automated learning of representations in GNNs.
The main goal of this project is to obtain methods and techniques for reconciling 2VLs and GNNs, aiming at hybrid frameworks that have the benefits and avoid the drawbacks of the two ingredients. Specifically, the first concrete goal is to find ways in which GNNs can be used to efficiently approximate 2VL-based logical reasoning on graph-structured data, i.e., to speed up reasoning while giving up some precision guarantees. The second concrete goal is to develop methods that allow a logic reasoner to consult a GNN in order to guide the automated reasoning process. This is particularly interesting for inference problems in 2VLs that are expensive due to non-deterministic computations.
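To give a flavour of the second goal, below is a minimal sketch in Python (all names are hypothetical, not an existing system) of a tableau-style reasoner consulting a learned scoring function when choosing which disjunct to expand first; a trained GNN would replace the stub scorer.

    def gnn_score(node, concept):
        # Stub for a trained GNN: estimate how promising it is to extend
        # the label of `node` with `concept` (higher = try first).
        return (hash((node, concept)) % 100) / 100.0  # placeholder only

    def expand_disjunction(node, disjuncts, try_branch):
        # Order the non-deterministic choices by the learned heuristic
        # instead of a fixed syntactic order; this changes only the search
        # order, so soundness and completeness are unaffected.
        for concept in sorted(disjuncts, key=lambda c: -gnn_score(node, c)):
            if try_branch(node, concept):
                return True
        return False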
Machine Learning techniques based on Reinforcement Learning (RL) have shown the potential to yield extremely capable AI systems (e.g., by playing Go and Atari games at a professional level), and their capabilities are likely to have a profound impact on society. A key task in the development of an RL system is to design a reward function for an AI agent. However, reward functions are currently often developed and represented in ad hoc ways (e.g., using general-purpose programming languages), which makes it difficult to inspect them, to explain them, and to ensure that they properly align with the values and expectations of the ultimate users of the system. Moreover, as the agent’s capabilities and freedoms expand, the number of possible states of the world that need to be taken into account in the reward function explodes exponentially. This complexity makes it difficult for developers to exhaustively and adequately penalize all dangerous actions of the agent.
The main goal of this project is to develop new knowledge representation languages and associated reasoning techniques that are specifically geared towards representing and reasoning about reward functions, thus providing a safer and more convenient alternative to the ad hoc, error-prone representations currently employed. The developed logic-based methods should make it possible to write down reward functions that can be inspected for defects by means of automated inference, as well as shared and reused by multiple parties (thus facilitating the establishment of trust).
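As a toy illustration of the kind of automated inspection we have in mind, the following sketch (using the clingo Python package; the predicates and the defect check are invented for this example) represents a reward function as ASP facts and rules, and flags dangerous states that the specification fails to penalize.

    import clingo

    spec = """
    state(s1;s2;s3).
    dangerous(s2;s3).
    reward(s1,10).
    reward(s2,-5).

    penalized(S) :- reward(S,R), R < 0.
    uncovered(S) :- dangerous(S), not penalized(S).
    #show uncovered/1.
    """
    ctl = clingo.Control()
    ctl.add("base", [], spec)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)  # prints uncovered(s3): a defect in the specification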
Virtual Knowledge Graphs (VKGs), also known as Ontology-based Data Access (OBDA), represent a powerful approach to data integration [Xiao et al., 2019; Schneider and Simkus, 2020]. In OBDA, an ontology serves as a conceptual model for the domain of interest, offering a unified view of heterogeneous data sources. Leveraging Description Logics (DLs), OBDA systems use automated reasoning to infer missing information and address data incompleteness.
A key component of OBDA is the mapping framework, which links data sources to the ontology. In traditional OBDA, information flow is unidirectional: data flows from sources to the ontology through mappings, and reasoning occurs solely at the ontology level without modifying the original data sources.
This work explores an enhanced OBDA framework that enables bidirectional information flow, where inferences made at the ontology level are propagated back to the data sources. This approach could provide a more powerful data integration framework, though it may introduce additional computational challenges. Our objectives include formalizing this bidirectional OBDA framework, studying query answering in this setting, and developing a prototype system to demonstrate its feasibility.
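The following toy illustration (all names are hypothetical) shows the classical, unidirectional setting in plain Python; the commented step (2) marks exactly the backward direction that this project would add.

    # (hypothetical) rows of a relational source, e.g.
    # SQL: SELECT student, course FROM enrollment
    source_rows = [("ann", "db1"), ("bob", "db1")]

    def mapping(row):
        # one mapping assertion: an enrollment row yields three ontology-level facts
        s, c = row
        return [("Student", s), ("Course", c), ("enrolledIn", s, c)]

    # (1) forward direction: materialize the ontology-level assertions
    abox = [fact for row in source_rows for fact in mapping(row)]
    print(abox)

    # (2) backward direction (the open problem studied here): given a fact
    # *inferred* at the ontology level, e.g. ("Person", "ann"), decide which
    # updates of the relational source should reflect it.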
[Schneider and Simkus, 2020] Schneider, T. and Simkus, M. (2020). Ontologies and data management: A brief survey. Künstliche Intell., 34(3):329–353.
[Xiao et al., 2019] Xiao, G., Ding, L., Cogrel, B., and Calvanese, D. (2019). Virtual Knowledge Graphs: An Overview of Systems and Use Cases. Data Intelligence, 1(3):201–223.
Finite state automata that run over infinite words are an important model of computation and a key tool for reasoning about dynamic systems. Roughly speaking, a dynamic system is a system whose state may evolve over time, possibly indefinitely. For instance, the state of an automated agent may evolve over time due to the actions the agent takes, as well as due to changes in the environment or other agents’ actions. There are several logic-based formalisms for modeling and automated reasoning about dynamic systems, like various temporal logics and variants of Propositional Dynamic Logic (PDL). Finite state automata over infinite words play a key role in these and related formalisms, since they allow one to obtain effective and often worst-case optimal algorithms for reasoning in these logics. For instance, the formula satisfiability problem in a given temporal logic can often be formulated as the problem of checking whether a specially crafted automaton accepts some infinite word. Thus obtaining efficient methods for reasoning about automata over infinite words is a highly relevant research problem: if we obtain such methods for automata, they can be applied to enable efficient reasoning in multiple other relevant settings.
The goal of this work is to study the possible use of Answer Set Programming (ASP) to enable efficient reasoning about automata over infinite words. Specifically, the aim is to find translations from automata into logic programs so that efficient ASP solvers can be used for reasoning about automata. For instance, checking the existence of an infinite word that is accepted by an automaton should correspond to checking the existence of an answer set of a specially constructed ASP program. Since standard ASP solvers handle finite structures only, it is challenging to see how reasoning about infinite words can be enabled in the context of ASP. Fortunately, an accepting run of an automaton over infinite words can often be witnessed by an accepting lasso, which can be finitely represented but may still be large. Thus in this work we plan to explore how ASP can be used to generate (hopefully short) accepting lassos for automata over infinite words. The specific models of automata that we would like to cover are Alternating Automata over Infinite Words (AWAs) and 2-Way Alternating Automata over Infinite Words (2AWAs). Studying generalizations of the developed methods to automata over infinite trees is also of great interest.
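As a first impression of the intended encodings, here is a sketch (assuming the clingo Python package) for the simpler case of a non-alternating Büchi automaton: the program guesses a word and a run of bounded length together with a loop-back position, and checks that the loop visits an accepting state. AWAs and 2AWAs will require considerably richer encodings.

    import clingo

    prog = """
    #const n=4.
    step(0..n-1).

    % toy automaton: accepts words with infinitely many b's
    state(q0;q1). init(q0). acc(q1). sym(a;b).
    trans(q0,a,q0). trans(q0,b,q1). trans(q1,a,q0). trans(q1,b,q1).

    1 { at(T,Q) : state(Q) } 1 :- step(T).   % guess a run ...
    1 { word(T,S) : sym(S) } 1 :- step(T).   % ... over a guessed word
    1 { loop(L) : step(L) } 1.               % the last position wraps back to L

    :- at(0,Q), not init(Q).
    :- step(T), T < n-1, at(T,Q), word(T,S), at(T+1,Q2), not trans(Q,S,Q2).
    :- at(n-1,Q), word(n-1,S), loop(L), at(L,Q2), not trans(Q,S,Q2).

    accepting :- loop(L), step(T), T >= L, at(T,Q), acc(Q).
    :- not accepting.
    #show word/2. #show loop/1.
    """
    ctl = clingo.Control()
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)  # a lasso witnessing an accepted infinite word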
The two-variable fragment FO2 of first-order logic consists of theories that allow at most two variables in formulas. This fragment is especially suitable for modeling and reasoning in the context of graph-structured data, as it makes it easy to talk about nodes and edges of a graph. In fact, FO2 subsumes many existing Description Logics (DLs) [Borgida, 1996], which further witnesses the importance of this fragment. Testing satisfiability (and thus entailment) in theories of FO2 is decidable and NEXPTIME-complete, but implementations of sound and complete algorithms for reasoning in this fragment are largely not available.
The goal of this work is to understand how efficient reasoning in FO2 can be implemented. We will start with the mosaic-based algorithm in [Rudolph and Simkus, 2018], which was in fact not intended for implementation (but rather to show a complexity upper bound for the more expressive triguarded fragment). The goal is to implement an optimized version of the mosaic-based algorithm via a translation into Answer Set Programming (ASP), i.e., we want to devise an ASP program that non-deterministically constructs a mosaic for an FO2 formula, if such a mosaic exists. This task is not trivial because mosaics might grow exponentially in the size of the input formula. We need a smart way to ensure that we can spot formula unsatisfiability as early as possible, so that exponentially sized mosaics can be avoided in most cases.
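The rough guess-and-check shape of such a program is sketched below (assuming the clingo Python package). The predicates onetype/1, need/2 and link/3 are placeholders that an assumed preprocessor would compute from the normal-form input formula, and the stub instance is purely illustrative.

    import clingo

    prog = """
    % stub instance, assumed to be produced by a preprocessing step:
    % 1-types, their witness requirements, and admissible 2-type links
    onetype(t1;t2). need(t1,1). link(t1,1,t2).

    { realized(T) } :- onetype(T).   % guess which 1-types are realized
    realized(t1).                    % e.g. the type of a distinguished element

    witnessed(T,I) :- realized(T), need(T,I), link(T,I,T2), realized(T2).
    :- realized(T), need(T,I), not witnessed(T,I).
    #show realized/1.
    """
    ctl = clingo.Control()
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)  # realized(t1) realized(t2)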
[Borgida, 1996] Borgida, A. (1996). On the relative expressiveness of description logics and predicate logics. Artificial Intelligence, 82(1):353–367.
[Rudolph and Simkus, 2018] Rudolph, S. and Simkus, M. (2018). The triguarded fragment of first-order logic. 22nd International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR-22).
Description Logics (DLs) are a family of formalisms for writing ontologies. DLs provide the logic-based foundations of the W3C standard OWL, which is the accepted standardized ontology language for the Semantic Web. Unfortunately, reasoning in expressive DLs like ALC is computationally expensive, and thus methods to reduce the computational costs are of great interest.
The goal of this work is to study how various notions of structural decompositions (like tree decompositions that yield small treewidth) may be applied in order to improve the performance of DL reasoners. Some initial work on this was presented in [Simancik et al., 2014] in the context of consequence-based reasoning procedures for the DL ALCI. Another approach can be found in [Zhao, 2022], where the notion of modules is used. Overall, this area is largely unexplored. In this work, we will explore how structural decompositions can be used to improve automata-based procedures for reasoning in DLs. It may be worth starting with decomposition techniques for alternating word automata with succinctly represented alphabets, which should already provide significant insights into the area.
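As a first experiment in this direction, one can view an ABox as an undirected graph and compute an approximate tree decomposition with off-the-shelf tools; the sketch below (a plain networkx experiment on a toy ABox, not a DL reasoner) shows the kind of preprocessing envisaged.

    import networkx as nx
    from networkx.algorithms.approximation import treewidth_min_degree

    # toy role assertions, viewed as undirected edges between individuals
    abox_edges = [("ann", "db1"), ("bob", "db1"), ("db1", "tuwien")]
    g = nx.Graph(abox_edges)

    width, decomposition = treewidth_min_degree(g)
    print("approximate treewidth:", width)
    for bag in decomposition.nodes:   # bags are frozensets of individuals
        print("bag:", set(bag))       # each bag could feed a localized reasoning step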
[Simancik et al., 2014] Simancik, F., Motik, B., and Horrocks, I. (2014). Consequence-based and fixed-parameter tractable reasoning in description logics. Artif. Intell., 209:29–77.
[Zhao, 2022] Zhao, H. (2022). Atomic Decomposition-based Ontology Classification. PhD thesis, University of Manchester.
Description Logics (DLs) are logic-based formalisms for Knowledge Representation and Reasoning (KR&R). DLs provide the formal foundations of the W3C standard OWL, which offers a standard machine-readable and machine-interpretable language for describing resources on the Web. DLs make it convenient to describe graph-structured data and are closely related to the increasingly popular Knowledge Graphs (KGs). The logic-based foundations of DLs allow us to create automated reasoners that can automatically infer non-trivial and useful new knowledge from large and possibly very complex data sources.
Answer Set Programming (ASP) is a well-established area of Logic Programming. Very efficient ASP systems are available, which makes it promising to reuse such systems for automated reasoning in DLs. Several previous works have proposed methods to automatically translate (compile) a DL knowledge base into an ASP program, which in turn can be efficiently evaluated using an existing reasoner. The previous approaches, however, suffer from efficiency problems, which this project aims to address. See [Bajraktari et al., 2018] for an example where a translation into ASP is considered. By “data-independent translation” we mean a translation that converts the TBox of a DL knowledge base into an ASP program independently of the ground facts in the knowledge base: in this way, when the data changes, we do not need to run the translation again.
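For Horn fragments such as EL, a data-independent translation is easy to illustrate; the sketch below (assuming the clingo Python package; the axioms and names are invented) translates two TBox axioms into ASP rules once, independently of the ABox facts. The hard cases targeted by this project lie outside such fragments.

    import clingo

    # Hypothetical TBox, translated once, independently of the data:
    #   Student ⊑ Person               person(X) :- student(X).
    #   ∃enrolledIn.Course ⊑ Student   student(X) :- enrolledin(X,Y), course(Y).
    tbox_program = """
    person(X) :- student(X).
    student(X) :- enrolledin(X,Y), course(Y).
    """
    abox_facts = "enrolledin(ann,db1). course(db1)."

    ctl = clingo.Control()
    ctl.add("base", [], tbox_program + abox_facts)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)  # derives student(ann) and person(ann)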
The goal of this master thesis is to present a new method for translating DL knowledge bases into ASP programs by using abductive reasoning [Eiter and Gottlob, 1995; Homola et al., 2023] and techniques related to conflict-driven clause learning (from the area of SAT solving).
[Eiter and Gottlob, 1995] Eiter, T. and Gottlob, G. (1995). The complexity of logic-based abduction. J. ACM, 42(1):3–42.
[Homola et al., 2023] Homola, M., Pukancova, J., Boborova, J., and Balintova, I. (2023). Merge, explain, iterate: A combination of MHS and MXP in an ABox abduction solver. In JELIA 2023.
Ontologies expressed in various Description Logics are a useful source of formally represented domain knowledge. For instance, the medical ontology SNOMED CT, which is used by the healthcare systems of several countries, provides a formalization of over 300,000 medical terms. The importance of ontologies is also witnessed by the W3C ontology language OWL, which is an industry standard for writing and sharing ontologies in the Semantic Web. Ontologies, however, are often too broad when it comes to the development of applications for specific and limited purposes. In particular, there is a need for (automated) methods to obtain, given a broad ontology as input, a more restricted knowledge base that is limited to the knowledge relevant to the specific application in question. More precisely, there is a need to automatically extract, from a given ontology, a smaller collection of integrity constraints that can be used for managing data in the target application.
In this work we will study ways to extract so-called SHACL constraints from an ontology, possibly an OWL ontology. SHACL is a recent W3C standard for expressing constraints on graph-structured data, specifically on RDF data. The SHACL standard provides a way to greatly enhance and speed up application development in the context of graph-structured data. However, obtaining SHACL constraints for a concrete application is not straightforward, as it requires effort in modeling the application domain. Fortunately, ontologies are a good source from which SHACL constraints can be harvested, and thus the creation of SHACL constraints can be largely guided by existing ontologies. Our goal in this work is to automate the derivation of SHACL constraints to the largest extent possible. One possible direction is to apply learning techniques to learn the constraints from the ontology and a small set of example data.
For this topic we first need to understand the foundational aspects of transforming ontologies into (SHACL) constraints. For this, some ideas from module extraction, interpolation, forgetting, and computational learning theory might be needed. In a second step, an implementation to test and evaluate the ideas would be desirable.
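As a very simple instance of such a transformation, an axiom of the form "every A must have a p-successor" can be rendered as a SHACL shape with a minimum cardinality constraint. The helper below is a hypothetical sketch (prefix declarations omitted), not a general translation.

    def existential_axiom_to_shacl(cls: str, prop: str) -> str:
        # Render the axiom "cls ⊑ ∃prop" as a SHACL node shape that requires
        # at least one prop-value on every instance of cls.
        return (f"ex:{cls}Shape a sh:NodeShape ;\n"
                f"    sh:targetClass ex:{cls} ;\n"
                f"    sh:property [ sh:path ex:{prop} ; sh:minCount 1 ] .")

    print(existential_axiom_to_shacl("Person", "hasName"))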
One of the standard reasoning problems in Description Logics (DLs) is the problem of answering conjunctive queries (CQs) over knowledge bases. A knowledge base K = (T, A) consists of a DL terminology T (or, more generally, a TBox) coupled with a collection A of facts (often called an ABox). CQs are a basic query language corresponding to the select-project-join queries of SQL. It is known that answering such queries over knowledge bases in expressive DLs like ALCI is NP-complete in data complexity, i.e., in the case when only the size of the ABox is assumed to vary, while the TBox is considered to be fixed.
While being rather expressive, CQs lack the navigational features of modern query languages for querying graph-structured data. By queries with navigational features we mean extensions of CQs with regular expressions over binary predicates. Such features are standard in the SPARQL query language (via property paths) as well as in query languages for graph databases. The goal of this work is to prove that navigational queries over knowledge bases written in expressive DLs of the ELI family are NP-complete when the so-called closed predicates are supported, i.e., we want to prove that adding navigational features to CQs does not increase the data complexity. This work will involve the development of a query answering algorithm that runs in nondeterministic polynomial time in the size of the input ABox, including a rigorous proof of its correctness and an analysis of its complexity. An implementation of some limited cases by means of a rewriting into other query languages is also a possibility.
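To give a feeling for the navigational features in question, the snippet below (using the rdflib Python package on a toy RDF graph) evaluates a SPARQL property path computing the transitive closure of a binary predicate, something plain CQs cannot express.

    import rdflib

    g = rdflib.Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:a ex:supervises ex:b .
    ex:b ex:supervises ex:c .
    """, format="turtle")

    # 'supervises+' is a property path: one or more supervises-steps
    q = """
    PREFIX ex: <http://example.org/>
    SELECT ?x ?y WHERE { ?x ex:supervises+ ?y }
    """
    for row in g.query(q):
        print(row.x, row.y)   # includes the derived pair (ex:a, ex:c)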
Integrity constraints are essential in database systems for ensuring data quality and maintaining the expected structure of stored information. In the context of graph-structured data, the new W3C standard SHACL provides a framework for specifying integrity constraints on RDF graphs. However, when a data graph violates these constraints, identifying and resolving the violations can be challenging.
The objective of this thesis is to develop a tool based on Answer Set Programming (ASP) to generate candidate updates for the graph database. This tool aims to facilitate a semi-automated approach to repairing constraint violations, improving both efficiency and accuracy in maintaining data integrity.
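A minimal sketch of the intended encoding style is shown below (assuming the clingo Python package; the constraint "every person must have an email" and all predicate names are invented for this example). The program guesses additions and deletions, enforces the constraint on the repaired graph, and minimizes the number of changes.

    import clingo

    prog = """
    person(ann). person(bob).
    email(ann, "a@x.org").

    % candidate repairs: delete a person node or add a placeholder email
    { del_person(P) } :- person(P).
    { add_email(P) }  :- person(P).

    person_after(P) :- person(P), not del_person(P).
    has_email(P) :- email(P,_).
    has_email(P) :- add_email(P).

    % the SHACL-like constraint must hold after the repair
    :- person_after(P), not has_email(P).

    #minimize { 1,del(P) : del_person(P); 1,add(P) : add_email(P) }.
    #show del_person/1. #show add_email/1.
    """
    ctl = clingo.Control()
    ctl.add("base", [], prog)
    ctl.ground([("base", [])])
    ctl.solve(on_model=print)  # optimal repair: add_email(bob)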
This topic is offered in cooperation with Shqiponja Ahmetaj.
The use of Machine Learning (ML) for classification and prediction tasks in graph-structured data, such as Knowledge Graphs and Graph Databases, is becoming increasingly widespread. Graph Neural Networks (GNNs), in particular, have emerged as popular tools for these tasks. However, GNNs often function as black-box models, making their outputs difficult to interpret. This lack of explainability limits their applicability in contexts where transparency, trustworthiness, and interpretability are critical.
Recent research has revealed a strong connection between GNNs and Description Logics (DLs), logic-based knowledge representation languages well-suited for developing transparent and explainable reasoning methods for graph-structured data. This connection provides an opportunity to create DL-based classifiers for graphs that are both highly explainable and competitive with GNNs in terms of performance.
The objective of this thesis is to investigate approaches for learning classification tasks over graphs in the context of Description Logics, exploiting their connection to GNNs. A promising avenue involves the development of methods for learning symbolic classifiers (e.g., in the form of DL TBoxes) using SAT solvers.
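As a toy instance of this avenue, the sketch below (using the PySAT package; the features and examples are invented) encodes as a SAT problem the search for a conjunction of atomic concepts that holds on all positive example nodes and on no negative one.

    from pysat.solvers import Glucose3

    features = ["Student", "Employed", "Senior"]
    positives = [{"Student", "Employed"}, {"Student", "Employed", "Senior"}]
    negatives = [{"Employed"}, {"Senior"}]

    var = {f: i + 1 for i, f in enumerate(features)}  # one SAT variable per feature
    clauses = []
    for p in positives:   # every selected feature must hold on each positive node
        clauses += [[-var[f]] for f in features if f not in p]
    for n in negatives:   # some selected feature must fail on each negative node
        clauses.append([var[f] for f in features if f not in n])

    with Glucose3(bootstrap_with=clauses) as solver:
        if solver.solve():
            model = set(solver.get_model())
            print([f for f in features if var[f] in model])  # e.g. ['Student']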
This topic is offered in cooperation with Sanja Lukumbuzya and Magdalena Ortiz.
The W3C has recently introduced SHACL, a language for specifying integrity constraints for graph-structured data on the Web (specifically, for RDF graphs). Given its relevance, SHACL has attracted attention in the database theory and data management communities. One of the key tasks in managing graph data is ensuring data consistency when handling a high volume of updates, coming either from users or from application transactions.
The objective of this thesis is to develop automated reasoning methods for the static analysis of transactions over a graph database. The method and tool developed should allow for checking whether a given transaction preserves the integrity constraints expressed in SHACL. This work will involve designing algorithms and some implementation, potentially using existing theorem provers. In particular, the starting point will be the initial work [1] in the context of constraints expressed in Description Logics, which is a family of formalisms closely related to SHACL.
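A useful baseline and testing oracle, though not the static analysis this thesis aims at, is to apply a transaction to a concrete graph and re-validate; the sketch below does this with the rdflib and pySHACL packages on an invented shape and transaction.

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF
    from pyshacl import validate

    shapes = Graph().parse(data="""
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix ex: <http://example.org/> .
    ex:PersonShape a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [ sh:path ex:email ; sh:minCount 1 ] .
    """, format="turtle")

    data = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:ann a ex:Person ; ex:email "a@x.org" .
    """, format="turtle")

    # the transaction: insert a Person node without an email
    EX = Namespace("http://example.org/")
    data.add((EX.bob, RDF.type, EX.Person))

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)  # False: the transaction breaks ex:PersonShape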
[1] Shqiponja Ahmetaj, Diego Calvanese, Magdalena Ortiz, and Mantas Šimkus. 2017. Managing Change in Graph-Structured Data Using Description Logics. ACM Trans. Comput. Logic 18, 4, Article 27 (October 2017), 35 pages. https://doi.org/10.1145/3143803
This topic is offered in cooperation with Shqiponja Ahmetaj.
Description Logics (DLs) are formal frameworks for Knowledge Representation and Reasoning (KR&R). They form the foundation of the W3C standard OWL, a machine-readable and machine-interpretable language for describing web resources. DLs are particularly suited for modeling graph-structured data, making them closely related to the increasingly popular Knowledge Graphs (KGs). With efficient automated reasoners, DLs enable the inference of complex, non-trivial knowledge from large datasets.
Standard DLs adopt the "classical semantics" of First-Order Logic (FOL). While powerful, this approach is unsuitable for some applications, prompting the development of alternatives like the W3C standard SHACL. Recently, we proposed a "stable model" semantics for DL terminologies, which introduces features such as negation by default while preserving many of the computational advantages of classical semantics [1]. DL terminologies, consisting solely of concept definitions (e.g., "A student is a person attending a university"), benefit from this new semantics but remain computationally demanding.
In a separate context, we introduced a "well-founded semantics" for SHACL, which balances semantic rigor and computational efficiency for RDF graphs [2]. The primary aim of this work is to adapt these results to DL terminologies, providing a framework that combines the advantages of both OWL and SHACL. Subsequent steps include developing algorithms and analyzing the computational complexity of reasoning with well-founded semantics in DL terminologies. For example, we hypothesize that satisfiability and concept implication in ALC terminologies are computable in single exponential time.
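To fix intuitions about the semantics itself, the following sketch implements the classical alternating-fixpoint construction of the well-founded model for ground propositional normal programs; this is a standard textbook construction, far removed from a reasoner for DL terminologies.

    # A rule is (head, positive_body, negative_body) over propositional atoms.

    def least_model_of_reduct(prog, assumed_true):
        # Least model of the Gelfond-Lifschitz reduct of prog w.r.t. assumed_true.
        model, changed = set(), True
        while changed:
            changed = False
            for head, pos, neg in prog:
                if pos <= model and neg.isdisjoint(assumed_true) and head not in model:
                    model.add(head)
                    changed = True
        return model

    def well_founded(prog):
        true = set()
        while True:
            upper = least_model_of_reduct(prog, true)      # true-or-undefined atoms
            new_true = least_model_of_reduct(prog, upper)  # surely-true atoms
            if new_true == true:
                return true, upper
            true = new_true

    rules = [("a", frozenset(), frozenset({"b"})),  # a :- not b.
             ("b", frozenset(), frozenset({"a"})),  # b :- not a.
             ("c", frozenset(), frozenset())]       # c.
    t, u = well_founded(rules)
    print("true:", t, "undefined:", u - t)  # true: {'c'}  undefined: {'a', 'b'}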
This foundational research aims to bridge the gap between the W3C standards OWL and SHACL, offering a unified and efficient framework for reasoning in diverse KR&R applications.
PG-Types is a new ISO standard that provides a basis for integrity constraints over Property Graphs (such as the graph databases of Neo4j). It offers a concise and expressive way to specify valid node types within a graph database. These node types can be directly used to identify illegal nodes or serve as building blocks for more advanced integrity constraints expressed through PG-Keys.
A key feature of PG-Types is type inheritance, enabling the creation of compact and human-readable type definitions. However, its inheritance mechanism does not support overriding, meaning that an object of type T must satisfy all requirements of its supertypes. While this provides simplicity, it can be overly restrictive and limits the ability to model exceptions. For example, rules such as "every person must have a personal number, unless they are a recent immigrant" cannot be naturally expressed.
The goal of this work is to extend PG-Types with specificity-based overriding. This will involve formalising the semantics of such an extension, understanding the computational complexity of various tasks, and possibly an implementation of selected algorithms.
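The intended behaviour can be sketched as follows (a hypothetical Python model of type definitions; the names and the resolution policy are assumptions, and pinning down such a policy formally is precisely part of the thesis).

    # Each type lists its supertype and, per property, whether it is required;
    # under specificity-based overriding the most specific verdict wins.
    types = {
        "Person":          {"super": None,     "required": {"personalNumber": True}},
        "RecentImmigrant": {"super": "Person", "required": {"personalNumber": False}},
    }

    def effective_requirements(t):
        # Walk from t up the supertype chain; earlier (more specific) verdicts win.
        verdicts, cur = {}, t
        while cur is not None:
            for prop, req in types[cur]["required"].items():
                verdicts.setdefault(prop, req)
            cur = types[cur]["super"]
        return {p for p, r in verdicts.items() if r}

    print(effective_requirements("Person"))           # {'personalNumber'}
    print(effective_requirements("RecentImmigrant"))  # set(): the exception applies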