Bachelor and Master Theses

To apply for conducting this thesis, please contact the thesis supervisor(s).
Title: Handling confidential data in LLM prompts
Subject: Software engineering, Computer science
Level: Advanced
Description:

The advent of LLMs has opened a myriad of opportunities for automating tasks. The most powerful LLMs require very extensive computational resources, which are provided at the price of paid subscriptions and data sharing (both prompts and potential feedback are retained by the provider). Companies, and users in general, who want to use LLMs must therefore take into account that prompts must not include sensitive data, which may otherwise be disclosed. There exist paid subscriptions that guarantee prompt data will not be used; however, this alternative might not be desirable when, e.g., exploring the potential use of LLMs for selected tasks. Another alternative, which can be considered the default in many research works, is to run a freely available model locally. As already mentioned, this latter alternative is limited by the available computing power; moreover, the most powerful models are generally not released for free usage, so result quality may be limited. Some solutions based on homomorphic encryption have proposed using LLMs on encrypted data [1]. Nonetheless, the computational overhead of handling such formats is still overwhelming due to the size of the models.

This thesis explores a third possibility: masking/filtering confidential information in prompts so that the full power of publicly available LLMs can still be used. This mechanism shall be transparent to the user:

- the user configures what information shall be kept confidential;

- a masking layer is responsible both for automatically filtering/transforming the input data sent to the LLM and for reconstructing the "non-masked" result once the output is obtained from the LLM.
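To make the intended mechanism concrete, the two steps above can be sketched as a minimal Python prototype. This is purely illustrative: the class name, placeholder format, and simple string substitution are assumptions, not a prescribed design; a real solution would need more robust detection of sensitive data (e.g. named-entity recognition).

```python
class MaskingLayer:
    """Illustrative sketch of a transparent masking layer for LLM prompts.

    User-configured confidential terms are replaced with placeholders
    before the prompt is sent to an external LLM; the placeholders are
    mapped back to the original terms in the model's output.
    """

    def __init__(self, confidential_terms):
        # user configuration: which strings must be kept confidential
        self.mapping = {
            term: f"<MASK_{i}>" for i, term in enumerate(confidential_terms)
        }

    def mask(self, prompt: str) -> str:
        # replace each confidential term with its placeholder
        for term, placeholder in self.mapping.items():
            prompt = prompt.replace(term, placeholder)
        return prompt

    def unmask(self, llm_output: str) -> str:
        # restore the original terms in the model's response
        for term, placeholder in self.mapping.items():
            llm_output = llm_output.replace(placeholder, term)
        return llm_output


# hypothetical usage with made-up confidential identifiers
layer = MaskingLayer(["AcmeCorp", "project-x"])
masked = layer.mask("Summarize the AcmeCorp project-x requirements.")
# masked == "Summarize the <MASK_0> <MASK_1> requirements."
restored = layer.unmask(masked)
# restored == "Summarize the AcmeCorp project-x requirements."
```

The key design point is that masking and unmasking share one mapping, so the round trip is lossless for the user while the provider only ever sees placeholders.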

The work includes:

- exploration of the state of the art;

- selection of a software engineering relevant scenario (e.g. requirement analysis, code summarization, debugging, etc.);

- selection of an appropriate dataset to test and validate the proposed solution.

1. Rho, Donghwan, Taeseong Kim, Minje Park, Jung Woo Kim, Hyunsik Chae, Jung Hee Cheon and Ernest K. Ryu. “Encryption-Friendly LLM Architecture.” (2024). https://doi.org/10.48550/arXiv.2410.02486

Start date: 2025-01-01
End date: 2025-06-08
Prerequisites:

- knowledge of LLMs is required;

- knowledge of Python is required.

IDT supervisors: Antonio Cicchetti
Examiner:
Comments:

This thesis is also suitable for two students

Company contact:

This work is done in collaboration with Riccardo Rubei riccardo.rubei@mdu.se