
Multi‑Speaker Conversational LLM

Adi Tsach

Supervised by Noam Rotstein

Abstract

When this project was launched, most production voice interfaces for large language models (LLMs) collapsed audio into text and then reasoned as if the input had been typed. Transcription discards information that is crucial in natural multi‑party conversation: who is speaking, how they sound, when turns begin and end, and how emphasis and emotion evolve over time. This project designs and implements a speaker‑aware dialog system that treats audio as a first‑class signal. Its central thesis is that speaker‑ and time‑aware interfaces are necessary to move beyond the single‑speaker voice‑assistant paradigm and accommodate complex multi‑speaker conversations. We operationalize this thesis through a system that listens, segments, aligns, attributes, retrieves, and then reasons. The implementation runs fully locally, which matters for privacy in meetings, yet is designed to be swappable: every stage (ASR, diarization, retrieval, LLM, TTS) can be replaced as better components appear.
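The abstract does not spell out component interfaces, so the following is only a minimal Python sketch of how such a swappable, speaker‑aware pipeline might be wired. All class and method names here (Segment, ASR, Diarizer, Retriever, SpeakerAwareDialog, and so on) are illustrative assumptions, not the project's actual code.

```python
# A minimal sketch of the swappable pipeline described in the abstract.
# Every name below is a hypothetical placeholder: each stage (ASR,
# diarization, retrieval, LLM, TTS) sits behind a small interface so
# any component can be replaced as better ones appear.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Segment:
    """One attributed span of speech: who spoke, when, and what was said."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


class ASR(Protocol):
    def transcribe(self, audio: bytes) -> list[Segment]: ...

class Diarizer(Protocol):
    def assign_speakers(self, audio: bytes, segments: list[Segment]) -> list[Segment]: ...

class Retriever(Protocol):
    def relevant_context(self, query: str, history: list[Segment]) -> list[Segment]: ...

class LLM(Protocol):
    def respond(self, query: str, context: list[Segment]) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class SpeakerAwareDialog:
    """Wires the stages together; swapping a stage only means passing in a
    different object that satisfies the corresponding Protocol."""

    def __init__(self, asr: ASR, diarizer: Diarizer,
                 retriever: Retriever, llm: LLM, tts: TTS) -> None:
        self.asr, self.diarizer = asr, diarizer
        self.retriever, self.llm, self.tts = retriever, llm, tts
        self.history: list[Segment] = []

    def handle_turn(self, audio: bytes) -> bytes:
        # Listen and segment: transcribe the incoming audio into timed spans.
        segments = self.asr.transcribe(audio)
        # Align and attribute: label each span with its speaker.
        segments = self.diarizer.assign_speakers(audio, segments)
        self.history.extend(segments)
        # Retrieve: pull the parts of the conversation relevant to the latest turn.
        query = segments[-1].text
        context = self.retriever.relevant_context(query, self.history)
        # Reason: the LLM answers with speaker- and time-aware context.
        reply = self.llm.respond(query, context)
        # Speak the reply back; all stages can run fully locally.
        return self.tts.synthesize(reply)
```

Because the stages are only coupled through these small interfaces, replacing, say, the ASR or the TTS engine does not require touching the rest of the loop, which is the swappability the abstract describes.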

Pictures

Picture 1: Multi‑Speaker Conversational LLM
Project Report

Please see the project report.

Final Presentation

Please see the final presentation.

Demo

Please see the demo clip.

Copyright © 2016 by Geometric Image Processing Lab. All rights reserved.