Mozilla Common Voice 7.0 reaches more than 13,000 hours of voice data

NVIDIA and Mozilla recently announced the release of "Mozilla Common Voice 7.0", a new version that contains more than 13,000 hours of crowdsourced voice data and adds 16 more languages. Compared with the previous update, the volume of speech material in the collection has grown by almost 50%.

The number of supported languages has also risen from 60 to 76. Belarusian, Kazakh, Uzbek, Bulgarian, Armenian, Azerbaijani and Bashkir are supported for the first time.

For those unfamiliar with Common Voice, it is the largest open voice dataset in the world and is designed to democratize voice technology. It is used by researchers, academics and developers around the world.

Contributors mobilize their communities to donate voice data to the public MCV database, which anyone can then use to train voice-enabled technologies. As part of NVIDIA's collaboration with Mozilla Common Voice, models trained on this and other public datasets are freely available through an open-source toolkit called NVIDIA NeMo.

The project aims to organize collaborative work on a database of voice samples that covers the full diversity of voices and speaking styles. The resulting database, which contains recordings of different pronunciations of common phrases of human speech, can be used without restrictions in machine learning systems and research projects.

According to the author of the actively developed Vosk speech recognition library, the Common Voice set has several shortcomings: a one-sided voice composition (a predominance of men in their 20s and 30s and a shortage of recordings of women, children and older people), a lack of vocabulary variety (the same phrases are repeated), and distribution of the recordings in MP3 format, which is prone to distortion.

About Common Voice 7.0

In this new version, more than 75,000 people took part in preparing English-language material, contributing 2,637 hours of validated speech (in the previous version there were 66,000 participants and 1,686 hours).

As mentioned at the beginning, this release adds 16 new languages to the Common Voice dataset, for a total of 76. The top five languages by total hours are English (2,630 hours), Kinyarwanda (2,260), German (1,040), Catalan (920) and Esperanto (840).

The languages with the largest percentage growth are Thai (almost 20-fold, from 12 to 250 hours), Luganda (9-fold, from 8 to 80 hours), Esperanto (more than 7-fold, from 100 to 840 hours) and Tamil (more than 8-fold, from 24 to 220 hours). Interestingly, Kinyarwanda ranks second by amount of data collected, with 2,260 hours, ahead of German, Catalan and Esperanto. The dataset now contains more than 182,000 unique voices, a 25% growth in the contributor community in just six months.
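The growth multipliers above can be recomputed from the quoted hour totals, as in the short sketch below. It uses only the rounded figures given in this article. Because those totals are rounded, ratios computed from them (for example 80 / 8 = 10 for Luganda) differ somewhat from the quoted multipliers. That the announced multipliers came from unrounded data is an assumption, not something the source states.

```python
# Hour totals (before, after) as quoted in the article; these are rounded figures.
growth = {
    "Thai": (12, 250),
    "Luganda": (8, 80),
    "Esperanto": (100, 840),
    "Tamil": (24, 220),
}


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


for language, (before, after) in growth.items():
    # Print the ratio rounded to one decimal place.
    print(f"{language}: {growth_factor(before, after):.1f}x")

# Thai: 20.8x
# Luganda: 10.0x
# Esperanto: 8.4x
# Tamil: 9.2x
```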

As part of its participation in the project, NVIDIA has also prepared pretrained models that are ready to use in machine learning systems and are based on the collected data (they are compatible with PyTorch). The models are distributed as part of the free and open-source NVIDIA NeMo toolkit, which is already used, for example, in the automated voice services of MTS and Sberbank.

The models are intended for speech recognition, speech synthesis and natural language processing systems. They can help researchers build voice dialogue systems, transcription platforms and automated call centers. Unlike earlier projects, the published models are not limited to English recognition and cover a variety of languages, accents and speaking styles.

Finally, if you are interested in learning more, you can check the details at the following link.

