TTS providers return audio in different ways: streaming chunk sizes vary, and so does the amount of silence that precedes the first meaningful audio. This tool makes the comparison fair by measuring the time to first byte (TTFB), then running voice activity detection (VAD) on the generated audio to measure the length of that leading silence. Together the two give the time to first real audio, which is what users actually experience.
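
As a rough illustration of the measurement described above, here is a minimal sketch, not this tool's actual implementation. It assumes the provider stream is an iterable of byte chunks containing 16-bit little-endian mono PCM, and that time to first real audio is the time to first byte plus the leading silence. A simple RMS energy threshold stands in for a real voice activity detector, and all function names and the threshold value are hypothetical.

```python
import math
import struct
import time
from typing import Iterable, Tuple


def collect_with_ttfb(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Consume a chunk stream; return (seconds until first non-empty chunk, all audio)."""
    start = time.perf_counter()
    ttfb = None
    audio = bytearray()
    for chunk in chunks:
        if chunk and ttfb is None:
            ttfb = time.perf_counter() - start
        audio.extend(chunk)
    if ttfb is None:
        raise ValueError("stream produced no audio")
    return ttfb, bytes(audio)


def leading_silence_seconds(
    pcm: bytes,
    sample_rate: int,
    frame_ms: int = 10,
    rms_threshold: float = 500.0,  # hypothetical value; tune per provider
) -> float:
    """Duration of the initial run of frames whose RMS is below the threshold.

    Assumption: 16-bit little-endian mono PCM. An energy threshold is a
    crude stand-in for a real VAD model.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_samples = len(pcm) // 2
    samples = struct.unpack("<%dh" % n_samples, pcm[: n_samples * 2])
    for start in range(0, n_samples - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= rms_threshold:
            return start / sample_rate
    # No frame crossed the threshold: treat the whole clip as silence.
    return n_samples / sample_rate


def time_to_first_real_audio(chunks: Iterable[bytes], sample_rate: int) -> float:
    """Time to first byte plus leading silence, in seconds."""
    ttfb, audio = collect_with_ttfb(chunks)
    return ttfb + leading_silence_seconds(audio, sample_rate)
```

Under these assumptions, a provider that responds in 150 ms but pads its audio with 300 ms of silence would score 450 ms, which is worse than a provider that responds in 250 ms with 50 ms of padding.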