A. ŠARAS
2025 to now · personal research · now building

Hibiki LT ↔ EN: real-time speech translation.

Teaching a simultaneous speech translation model a language pair it has never heard: Lithuanian and English. On a laptop.

What it is

Kyutai's Hibiki translates speech to speech in real time, but only for the language pairs it was trained on. Lithuanian is not one of them. This project adapts Hibiki to LT ↔ EN end to end: dataset construction, synthetic speech generation, a custom fine-tuning fork, and fully on-device inference.

Everything runs locally on Apple Silicon through MLX. No cloud, no API, no audio leaving the machine. Current throughput is around 12.5 tokens per second on an M4, with a real-time factor of 1.41x in batch mode.
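For readers unfamiliar with the metric, here is a minimal sketch of how a real-time factor can be computed. It assumes the "x" convention means speed relative to real time (seconds of audio handled per second of compute), which is one of two common conventions; the function names and the sample timings are hypothetical, chosen for illustration only, and are not measurements from this project.

```python
def real_time_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock compute.

    Assumed convention: values above 1.0 mean faster than real time.
    Some tools report the inverse (compute time / audio time) instead.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def keeps_up(rtf: float) -> bool:
    """True when processing is at least as fast as the incoming audio."""
    return rtf >= 1.0


# Hypothetical timings, not project measurements:
# 14.1 s of audio translated in 10.0 s of compute.
rtf = real_time_factor(audio_seconds=14.1, wall_seconds=10.0)
print(f"real-time factor: {rtf:.2f}x, keeps up: {keeps_up(rtf)}")
```

Under this convention, anything at or above 1.0x can in principle keep pace with live speech, although batch-mode figures do not account for streaming latency.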

The work

Concepts

Speech-to-speech · Fine-tuning · MLX · Mimi codec · Apple Silicon · FastAPI · Data curation

What it taught me

Adapting a frontier model to a low-resource language is 10% modelling and 90% building the dataset the model wishes existed.