Interactive Labeling for ML-based Structural Formula Extraction: Scaling Human Accessibility of Documents for Visually Impaired People

  • Subject: Joint Master’s thesis offered by IAR/SZS (research group Prof. Stiefelhagen, CV:HCI) and IISM (research group Prof. Mädche, ISSD) for both computer science and information systems students. Open for applications.
  • Type: Master’s thesis
  • Supervisors: Merlin Knäble & Dr. Thorsten Schwarz (SZS)
  • Status: Open

  • Image: Person in a lab coat holding a 3D model of a molecule’s chemical structure.

Problem

Scientific publications, lecture slides, and other documents convey information not only in plain text but also in figures and images. This makes documents less accessible for humans and machines alike. Automated metadata extraction, full-text search, and information aggregation are all impaired. Less obvious, but potentially even more important, is that human accessibility also suffers. Figures are often entirely incomprehensible to visually impaired users, and people less familiar with the domain could also benefit from support. This limits visually impaired people’s access to, for example, graphical representations of structural formulas, even though these graphics are often a crucial part of lecture slides and scientific publications in fields such as chemistry.

Goals

The goal of this Master’s thesis is to design, develop, and evaluate an interactive labeling system that supports the accessibility of figures. Here, interactive labeling refers to a human-machine cooperative approach that combines automatic and manual steps. Structural formulas from chemistry are a suitable application context for this system, as they are frequently used and their notation standards are well established. We envision a semi-automated approach in which user input is supported by the machine. Well-structured tasks like these lend themselves to support by machine learning models. Because a user is always involved, the model does not need to achieve near-perfect accuracy; it should instead support users with suggestions. Allowing the model to improve with new user input would be a bonus.
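The human-machine cooperation described above can be illustrated with a minimal Python sketch. It is not a prescribed design: the class `LabelingSession`, the `suggest` and `review` callables, and the acceptance-rate metric are all hypothetical names introduced for illustration. The idea is that the model proposes a label, the user accepts or corrects it, and every confirmed label is kept so that it could later serve as new training data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LabelingSession:
    """Hypothetical helper: pairs model suggestions with user decisions."""

    # Assumed model interface: figure id -> suggested chemfig string.
    suggest: Callable[[str], str]
    # Confirmed (figure id, final label) pairs, usable as future training data.
    confirmed: List[Tuple[str, str]] = field(default_factory=list)
    corrections: int = 0

    def label(self, figure_id: str, review: Callable[[str], str]) -> str:
        """Ask the model for a suggestion and let the user accept or correct it."""
        suggestion = self.suggest(figure_id)
        final = review(suggestion)  # the user returns the label they approve
        if final != suggestion:
            self.corrections += 1
        self.confirmed.append((figure_id, final))
        return final

    def acceptance_rate(self) -> float:
        """Fraction of suggestions the user accepted unchanged."""
        if not self.confirmed:
            return 0.0
        return 1 - self.corrections / len(self.confirmed)


def dummy_model(figure_id: str) -> str:
    # Stand-in for a trained model: always proposes a six-membered ring.
    return r"\chemfig{*6(=-=-=-)}"


session = LabelingSession(suggest=dummy_model)
session.label("fig-1", lambda s: s)                       # user accepts
session.label("fig-2", lambda s: r"\chemfig{*5(-----)}")  # user corrects
```

An acceptance rate like this is only one conceivable way to judge whether suggestions help users even when the model is far from perfect.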

As a first step, we expect the student to survey the state of the art of such systems and identify components that could be reused or adapted to this context. Afterwards, the solution should be developed. A full-fledged evaluation of the system is expected as well.

The typical workflow for the system should look as follows:

  • Import a PDF document into the system.
  • The system suggests areas in which figures containing chemical formulas could be found.
  • Correct the system’s suggestions.
  • Crop out all marked areas to obtain individual figures.
  • For each figure, create
    • a chemfig representation of the figure (e.g. “\chemfig{*6(=-=-=-)}”),
    • a purely visual, non-interpretive textual description of the figure (e.g. “a hexagon where three edges are double lines”),
    • and an interpretation of the figure (e.g. “Benzene”).
  • The system supports the user in creating the above representations with automatically generated suggestions. To this end, a model that translates images to chemfig should be trained on automatically generated training data.
  • Export an accessible EPUB v3 in which the original figure is augmented with the above data as alternative versions.
  • Export a version of the figure for use on a braille printer (Open Document Graphic format).
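
To make the data produced by this workflow concrete, the following Python sketch shows one possible record for a single cropped figure and how it might be turned into an XHTML fragment for the EPUB export. The class `FigureAnnotation`, its field names, the bounding-box convention, and the choice of a `<figure>` element with `alt` text and a caption are assumptions made for illustration, not requirements of the thesis.

```python
from dataclasses import dataclass
from html import escape
from typing import Tuple


@dataclass
class FigureAnnotation:
    """Hypothetical record for one figure cropped from the imported PDF."""

    page: int
    # Assumed convention: (x0, y0, x1, y1) of the marked area on the page.
    bbox: Tuple[float, float, float, float]
    chemfig: str         # machine-readable representation
    description: str     # purely visual description
    interpretation: str  # what the figure means

    def alt_text(self) -> str:
        """Combine interpretation and visual description into one alt text."""
        return f"{self.interpretation}: {self.description}"

    def to_xhtml(self, image_href: str) -> str:
        """Build an XHTML fragment that carries all three representations."""
        return (
            "<figure>"
            f'<img src="{escape(image_href, quote=True)}" '
            f'alt="{escape(self.alt_text(), quote=True)}"/>'
            f"<figcaption>{escape(self.interpretation)} "
            f"<code>{escape(self.chemfig)}</code></figcaption>"
            "</figure>"
        )


# Example values taken from the workflow above; page and bbox are made up.
benzene = FigureAnnotation(
    page=3,
    bbox=(72.0, 120.0, 200.0, 248.0),
    chemfig=r"\chemfig{*6(=-=-=-)}",
    description="a hexagon where three edges are double lines",
    interpretation="Benzene",
)
fragment = benzene.to_xhtml("images/fig-1.png")
```

Keeping the three representations in one record means the same data could feed both the EPUB export and the braille-printer export, although the latter would need its own rendering step.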

Requirements

We expect the student to be familiar with web development. The system should be developed with a modern web application frontend framework (e.g. Angular, React, or Vue) and a JavaScript or Python backend.

Contact

If you are interested in this topic and want to apply for this thesis, please contact Dr. Thorsten Schwarz (IAR/SZS) and Merlin Knäble (ISSD) with a short motivation statement, your CV, and a current transcript of records. Feel free to contact us beforehand if you have any questions.