Examining Llama 2's Propensity for Following the System Prompt

Large Language Models seem to 'forget' their system prompt over the course of a long conversation. Can we measure this effect?

Introduction

I’ll write this blog post after the paper is accepted. For now, here is a benchmark of system prompts that I created: https://huggingface.co/datasets/Naomibas/llm-system-prompts-benchmark.

Edit 2/19: The paper is now on arXiv! See here: https://arxiv.org/abs/2402.10962.