Automated Multimodal Knowledge Extraction from Publications in the Context of Structural Equation Models

German title: "Automatisierte Multimodale Wissensextraktion aus Publikationen am Beispiel von Strukturgleichungsmodellen", part of the KIT Excellence Initiative "Forschungsdatenmanagement" (research data management)



Scientific output - or at least the number of publications - is growing exponentially, making it increasingly difficult to maintain a comprehensive overview of related work. At the same time, established guidelines for good research practice highlight the importance of acknowledging prior work.

The goal of our project is to develop a platform to effectively extract, explore, and aggregate knowledge from scientific publications. Specifically, we plan to combine a multimodal learning approach with natural language processing and computer vision techniques to automate the extraction of knowledge captured in scientific publications. The project is organized into four work packages:

  1. Provision of an annotated data set (publications with manually labelled data)
  2. Feature engineering
  3. Selection and comparison of algorithms
  4. Prototypical realization and evaluation of the knowledge extraction platform


We chose Structural Equation Models (SEMs) as our application context. SEMs were primarily developed for application in psychology and related social sciences. They are nowadays established in various economic fields and in information systems, where our research group is at home. In this field, SEMs consist of a measurement model and a structural model. The former describes the relationships between latent and observable variables, while the latter deals with the interrelations among latent variables. Both have established visualizations and thus fit our goal of automated knowledge extraction well. We aim to aggregate the extracted knowledge systematically, enabling researchers to explore previous SEMs and the underlying theories and to integrate this knowledge appropriately into their own work.
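To make the two parts of an SEM concrete, the following is a minimal sketch of how one extracted model could be represented in code. It is an illustration under our own assumptions, not the project's actual data model: the class name `StructuralEquationModel`, its fields, and the construct and indicator names in the example are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StructuralEquationModel:
    """Hypothetical container for one SEM extracted from a publication."""

    # Measurement model: latent variable -> its observable indicators.
    measurement: dict[str, list[str]] = field(default_factory=dict)
    # Structural model: directed paths (source latent, target latent).
    structural: list[tuple[str, str]] = field(default_factory=list)

    def latent_variables(self) -> set[str]:
        """All latent variables named in either sub-model."""
        names = set(self.measurement)
        for source, target in self.structural:
            names.update((source, target))
        return names

    def predecessors(self, latent: str) -> list[str]:
        """Latent variables with a direct path into `latent`."""
        return [s for s, t in self.structural if t == latent]


# Illustrative placeholder values only; not taken from any extracted paper.
model = StructuralEquationModel(
    measurement={
        "Perceived Usefulness": ["PU1", "PU2", "PU3"],
        "Perceived Ease of Use": ["PEOU1", "PEOU2"],
        "Intention to Use": ["IU1", "IU2"],
    },
    structural=[
        ("Perceived Ease of Use", "Perceived Usefulness"),
        ("Perceived Usefulness", "Intention to Use"),
        ("Perceived Ease of Use", "Intention to Use"),
    ],
)
```

Keeping the measurement and structural parts in separate fields mirrors the distinction drawn above and would allow models from different publications to be compared construct by construct, for example via `model.predecessors("Intention to Use")`.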

Prior Work

In line with this work, the Digital Scientific Knowledge Network (DISKNET) was created. In its preliminary form, it already facilitates the systematic accumulation of knowledge from the information systems community, centered around constructs from SEMs. To date, the project has produced our freely accessible platform and a publication.