
Some time ago we wrote a series of articles on how to measure the quality of speech recognition systems properly, and we actually measured the metrics of the solutions available at the time, both commercial and non-commercial (the series: 1, 2, 3). An excerpt from that series was published on Habr as this article, but since then there has been no large-scale update of the study worth publishing on Habr (at the very least, that takes a lot of effort and preparation).
Some time has passed, and it is now time to update our study and make it a truly ultimate comparison. Compared with the previous study, the following has been changed or added:
- Many validation sets from different real-world domains have been added;
- , ;
- , ;
- (, );
- , - "", "";
(. ) :
- `wav` ( PCM); - 8 ( , );
- - -, "" , , ;
- — WER. 20% WER, 5% WER ( , );
- 1 . 2-3 ( "" ). 500 !;
- ( , " "), ;
- , . 1 .. WER, ;
- `ogg/opus` , , , "" ; - (8/16 kHz), ;
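All quality numbers below are WER (word error rate). For reference, here is a minimal Python sketch of the standard definition: the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. It assumes that text normalization (case, punctuation and so on) has already been applied. Pooling edits over a whole set in `corpus_wer` is an assumption about aggregation, not necessarily how the tables were computed. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to match an empty hypothesis prefix
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance; assumes both strings are already normalized."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total edits over total reference words for all (reference, hypothesis) pairs."""
    edits = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words


# 1 substitution (sat -> sit) + 1 deletion (the) over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Pooling gives long utterances more weight than averaging per-utterance WERs would, so the choice between the two can shift the numbers somewhat. WER can also exceed 100% when a hypothesis contains many insertions.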
, Silero bleeding edge, production . — WER ( WER ).
| | Ashmanov | Google (default) | Google (enhanced) | Sber | Sber (IVR) | Silero (prod) | Silero new (bleeding edge) | Tinkoff | Yandex |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 11 | 10 | 7 | 7 | 6 | 8 | 13 | ||
| 35 | 24 | 6 | 30 | 27 | 27 | 14 | |||
| 24 | 39 | 41 | 20 | 16 | 11 | 15 | 13 | ||
| () | 47 | 16 | 18 | 22 | 32 | 13 | 12 | 21 | 15 |
| 28 | 27 | 24 | 18 | 14 | 12 | 20 | 21 | ||
| () | 31 | 37 | 37 | 24 | 33 | 25 | 24 | 23 | 22 |
| 31 | 36 | 37 | 26 | 21 | 22 | 25 | 21 | ||
| 22 | 60 | 54 | 19 | 24 | 20 | 28 | 22 | ||
| 24 | 61 | 40 | 26 | 18 | 15 | 27 | 23 | ||
| () | 42 | 49 | 8 | 41 | 27 | 52 | 18 | ||
| 62 | 30 | 32 | 24 | 28 | 39 | 35 | 28 | 25 | |
| (e-commerce) | 34 | 45 | 43 | 34 | 45 | 29 | 29 | 31 | 28 |
| 34 | 29 | 29 | 31 | 20 | 20 | 31 | 29 | ||
| Yellow pages | 45 | 43 | 49 | 41 | 32 | 29 | 31 | 30 | |
| () | 43 | 55 | 59 | 41 | 67 | 38 | 37 | 33 | 32 |
| YouTube | 32 | 50 | 41 | 34 | 28 | 25 | 38 | 32 | |
| () | 44 | 72 | 66 | 46 | 41 | 35 | 38 | 35 | |
| 50 | 37 | 40 | 50 | 35 | 33 | 42 | 38 | ||
| 61 | 68 | 68 | 54 | 41 | 32 | 43 | 42 | ||
| , | 54 | 70 | 60 | 61 | 43 | 41 | 56 | 54 | |
| 39 | 50 | 53 | 32 | 25 | 20 | 27 |
WER, .
( , , , - ). . ( , ).
| | Ashmanov | Google (default) | Google (enhanced) | Sber | Sber (IVR) | Silero | Tinkoff | Yandex |
|---|---|---|---|---|---|---|---|---|
| 0% | 0% | 0% | 0% | 0% | 5% | 4% | ||
| 0% | 2% | 0% | 0% | 4% | 0% | |||
| 1% | 12% | 13% | 6% | 0% | 2% | 1% | ||
| () | 0% | 0% | 0% | 1% | 0% | 0% | 7% | 0% |
| 0% | 1% | 0% | 0% | 0% | 2% | 0% | ||
| () | 0% | 0% | 0% | 2% | 0% | 0% | 6% | 0% |
| 0% | 8% | 10% | 4% | 0% | 4% | 0% | ||
| 0% | 22% | 6% | 2% | 0% | 1% | 0% | ||
| 0% | 19% | 2% | 3% | 1% | 4% | 0% | ||
| () | 0% | 12% | 0% | 0% | 1% | 0% | ||
| 0% | 2% | 3% | 1% | 1% | 0% | 5% | 1% | |
| (e-commerce) | 0% | 0% | 0% | 7% | 1% | 0% | 7% | 0% |
| 0% | 0% | 0% | 1% | 0% | 4% | 0% | ||
| Yellow pages | 1% | 13% | 9% | 14% | 0% | 2% | 2% | |
| () | 0% | 0% | 7% | 35% | 9% | 0% | 5% | 0% |
| YouTube | 0% | 13% | 1% | 6% | 0% | 1% | 0% | |
| () | 1% | 33% | 12% | 17% | 5% | 1% | 1% | |
| 0% | 1% | 0% | 7% | 0% | 6% | 1% | ||
| 3% | 26% | 28% | 25% | 0% | 2% | 4% | ||
| , | 2% | 19% | 3% | 25% | 0% | 1% | 1% | |
| 1% | 12% | 14% | 9% | 0% | 3% | 0% |
, .
, , . Tinkoff — , , . " " (, 1/10 ) . IVR , 8 kHz, , . — , , . — Google, .
, production / ( "" 10% ):
| Ashmanov | 0 | 7 |
| Google | 1 | 13 (9 enhanced) |
| Sber | 2 | 0 |
| Sber IVR | 4 | 4 |
| Silero | 13 | 0 |
| Tinkoff | 6 | 2 |
| Yandex | 10 | 1 |
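The tally above appears to count how often each production system lands near the best or near the worst result on a dataset, using a 10% margin. The exact rule cannot be recovered from this text. The Python sketch below therefore assumes a relative 10% tolerance around the best (lowest) and the worst (highest) WER on each dataset. `count_near_extreme` and the toy numbers are hypothetical and are not the article's data.

```python
from typing import Dict

# Hypothetical input: dataset -> {system -> WER in %}
WerTable = Dict[str, Dict[str, float]]


def count_near_extreme(table: WerTable, tolerance: float = 0.10) -> Dict[str, Dict[str, int]]:
    """Count datasets where each system is within `tolerance` (relative) of the best or the worst WER."""
    counts: Dict[str, Dict[str, int]] = {}
    for scores in table.values():
        best, worst = min(scores.values()), max(scores.values())
        for system, wer in scores.items():
            c = counts.setdefault(system, {"best": 0, "worst": 0})
            if wer <= best * (1 + tolerance):   # lower WER is better
                c["best"] += 1
            if wer >= worst * (1 - tolerance):  # close to the highest WER
                c["worst"] += 1
    return counts


toy = {
    "dataset_1": {"A": 10, "B": 11, "C": 15},
    "dataset_2": {"A": 20, "B": 30, "C": 21},
}
print(count_near_extreme(toy))
# {'A': {'best': 2, 'worst': 0}, 'B': {'best': 1, 'worst': 1}, 'C': {'best': 1, 'worst': 1}}
```

With a margin, several systems can share first (or last) place on the same dataset. As a result, the counts per column do not have to sum to the number of datasets.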
— , . " " — . bleeding edge ( ), " " , 17 21. , .
gRPC API. SMB , . ( , ). , "" , . 40 ( PDF), .
Tinkoff gRPC, ( , ). enterprise ( , ) , , . , .
… , , . , b2b , , . 500- 200 . -, "" .

, ( ) ( — ). 1 (RTS = 1 / RTF):
| | RTS per Thread | Threads | |
|---|---|---|---|
| Ashmanov | 0.2 | 8 | |
| Ashmanov | 1.7 | 1 | |
| Google | 4.3 | 8 | |
| Google enhanced | 2.9 | 8 | |
| Sber | 13.6 | 8 | |
| Sber | 14.1 | 1 | |
| Silero | 2.5 | 8 | 4-core, 1080 |
| Silero | 3.8 | 4 | 4-core, 1080 |
| Silero | 6.0 | 8 | 12 cores, |
| Silero | 9.7 | 1 | 12 cores, |
| Tinkoff | 1.4 | 8 | |
| Tinkoff | 2.2 | 1 | |
| Yandex | 5.5 | 2 | 8 — |
RTS, .
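RTS is defined above as 1 / RTF, so both can be computed from the same two measurements: the duration of the audio and the wall-clock time spent processing it. The Python sketch below is illustrative and its function names are hypothetical. `total_rts` assumes that throughput scales linearly with the number of threads, which real servers only approximate.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF: processing time per second of audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def real_time_speed(processing_seconds: float, audio_seconds: float) -> float:
    """RTS = 1 / RTF: seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


def total_rts(rts_per_thread: float, threads: int) -> float:
    """Aggregate RTS over parallel threads, assuming perfectly linear scaling."""
    return rts_per_thread * threads


def seconds_to_process(audio_seconds: float, rts: float) -> float:
    """Wall-clock time needed for a given amount of audio at a given RTS."""
    return audio_seconds / rts


# One hour of audio at RTS 150 takes 3600 / 150 = 24 seconds
print(seconds_to_process(3600, 150))  # 24.0
```

Read this way, the ~150 RTS figure quoted in the text means roughly an hour of audio per 24 seconds of processing.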
( , ) ( ), . VDS, Nvidia Tesla, - ( — ). .
, EX51-SSD-GPU, . , , .
. 12 + GPU ~150 RTS. , 12+ , . , - . aspirational 2-3 .
( ), ( ) . — ( ), . - … 60 !

, Open STT, , , . - . , . , .
/
, 1080 Ti, 2080 Ti. , .
We sent data to Yandex in `opus` format. We ran a few tests, and Yandex does not seem to show any particular difference between `wav` and `opus`.
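The methodology mentions `wav` (PCM), `ogg/opus`, and sample rates of 8 and 16 kHz. For reference, here is a minimal sketch that uses only Python's standard `wave` module. It packs samples into an in-memory WAV file, then reads back the sample rate and duration. The 16-bit mono layout and the helper names are assumptions. Encoding to `opus` requires an external encoder, which is not shown here.

```python
import io
import struct
import wave
from typing import Sequence, Tuple


def pcm16_wav_bytes(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Pack signed 16-bit samples into a mono PCM WAV file held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def wav_info(data: bytes) -> Tuple[int, float]:
    """Return (sample rate in Hz, duration in seconds) of a WAV file given as bytes."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        rate = wf.getframerate()
        return rate, wf.getnframes() / rate


one_second_8k = pcm16_wav_bytes([0] * 8000, sample_rate=8000)
print(wav_info(one_second_8k))  # (8000, 1.0)
```

The duration obtained this way is the denominator of RTF. At the same duration, a 16 kHz file carries twice as many samples as an 8 kHz one.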