We did explore using standard protocols to support third-party headsets. The most natural solution would have been to listen for AT+BVRA (the voice recognition command), which most headsets send after a button is held down for a couple of seconds. It didn't fit our desired UX, though. We wanted a hold-while-talking interaction, rather than holding a button for a couple of seconds, waiting for a beep, then releasing it and starting to talk.
We also considered listening for AVRCP key events to detect when a particular button (probably play/pause, which seems to be the most prominent button on most headsets) was pressed and released. That approach would have been hacky, though, and we ran into several problems. For one thing, many headsets power off if the play/pause button is held down for several seconds.
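To make the press/release idea, and its main failure mode, a bit more concrete, here is a minimal sketch of how such events could be mapped onto a hold-while-talking interaction. Everything in it is hypothetical: `KeyEvent`, `KeyState`, `HoldToTalk`, the `"PLAY_PAUSE"` key name and the action strings are illustrative stand-ins, not a real AVRCP or platform Bluetooth API. It assumes the stack delivers a "pressed" event followed by a "released" event for the same key, and that a headset powering off shows up as a disconnect rather than a release.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class KeyState(Enum):
    PRESSED = "pressed"
    RELEASED = "released"


@dataclass
class KeyEvent:
    key: str          # hypothetical key name, e.g. "PLAY_PAUSE"
    state: KeyState
    t: float          # timestamp in seconds


class HoldToTalk:
    """Maps press/release events for one key onto listening actions.

    Hypothetical sketch only: real key events would come from the platform's
    Bluetooth stack, not from this class.
    """

    def __init__(self, key: str = "PLAY_PAUSE") -> None:
        self.key = key
        self.pressed_at: Optional[float] = None  # set while the key is held
        self.actions: List[str] = []             # log of actions taken

    def on_key_event(self, ev: KeyEvent) -> None:
        if ev.key != self.key:
            return  # ignore every other button
        if ev.state is KeyState.PRESSED and self.pressed_at is None:
            # Key went down: start capturing speech immediately.
            self.pressed_at = ev.t
            self.actions.append("start_listening")
        elif ev.state is KeyState.RELEASED and self.pressed_at is not None:
            # Key came up: stop capturing and hand the utterance off.
            self.pressed_at = None
            self.actions.append("stop_listening")

    def on_disconnect(self) -> None:
        # Many headsets power off when play/pause is held for several
        # seconds, so no release event ever arrives; the link just drops.
        if self.pressed_at is not None:
            self.pressed_at = None
            self.actions.append("abort_listening")


# Usage: a normal short hold, then a long hold that powers the headset off.
ok = HoldToTalk()
ok.on_key_event(KeyEvent("PLAY_PAUSE", KeyState.PRESSED, 0.0))
ok.on_key_event(KeyEvent("PLAY_PAUSE", KeyState.RELEASED, 2.5))
print(ok.actions)

long_hold = HoldToTalk()
long_hold.on_key_event(KeyEvent("PLAY_PAUSE", KeyState.PRESSED, 0.0))
long_hold.on_disconnect()
print(long_hold.actions)
```

The second case is the awkward one: the only signal that the user "let go" is the headset disappearing, so the utterance has to be discarded rather than translated.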
We also had concerns about audio quality with third-party headsets, especially those that didn't support modern versions of SCO (which introduced new codecs with 16 kHz support, among other improvements) and those with poor antennas that lead to high packet loss. SCO is a lossy protocol, so we still receive some speech and attempt to translate it, but accuracy suffers. We worried that any accuracy problems would make Google Translate look bad, even when the headset was to blame.