Mozilla Common Voice 7.0 reaches more than 13,000 hours of voice data

NVIDIA and Mozilla recently announced the release of a new version, Mozilla Common Voice 7.0, which contains more than 13,000 hours of crowd-sourced voice data and adds 16 more languages. Compared with the previous update, the volume of speech material in the collection has grown by almost 50%.

The number of supported languages has also risen from 60 to 76, with first-time support for Belarusian, Kazakh, Uzbek, Bulgarian, Armenian, Azerbaijani and Bashkir, among others.

For those unfamiliar with Common Voice, it is the world's largest open voice dataset, and it was created to democratize voice technology. It is used by researchers, academics and developers around the world.

Contributors mobilize their own communities to donate voice data to the public MCV database, which anyone can use to train voice-enabled technology. As part of NVIDIA's collaboration with Mozilla Common Voice, models trained on this and other public datasets are available for free through an open source toolkit called NVIDIA NeMo.

The project aims to organize a collaborative effort to build a database of voice samples that takes into account the full variety of voices and speaking styles. The collected database, with recordings of different pronunciations of phrases typical of human speech, can be used without restrictions in machine learning systems and research projects.

According to the author of a continuous speech recognition library, the Common Voice set has three shortcomings: the voice material is one-sided (mostly men in their 20s and 30s, with little material from women, children and the elderly), the vocabulary lacks variety (the same phrases are repeated), and the recordings are distributed in MP3, a format prone to distortion.
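The two measurable criticisms above (demographic skew and repeated phrases) can be checked against the metadata that ships with a dataset release. The following is a minimal sketch, assuming a tab-separated metadata file with `client_id`, `path`, `sentence`, `age` and `gender` columns; the column names, the value labels and the sample rows are illustrative assumptions, not taken from the article.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the layout assumed for a Common Voice style
# metadata file. Column names and values are assumptions for illustration.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\n"
    "a1\tclip_0001.mp3\tHello world\ttwenties\tmale\n"
    "a1\tclip_0002.mp3\tGood morning\ttwenties\tmale\n"
    "b2\tclip_0003.mp3\tHello world\tthirties\tmale\n"
    "c3\tclip_0004.mp3\tHow are you\tfifties\tfemale\n"
    "d4\tclip_0005.mp3\tHello world\t\t\n"
)


def read_rows(tsv_text):
    """Parse tab-separated metadata into a list of dicts, one per clip."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def demographic_shares(rows):
    """Return (gender, age) groups with their share of clips, largest first."""
    counts = Counter()
    for row in rows:
        gender = row.get("gender") or "unspecified"
        age = row.get("age") or "unspecified"
        counts[(gender, age)] += 1
    total = sum(counts.values())
    return [(group, n / total) for group, n in counts.most_common()]


def repeated_sentences(rows):
    """Return the sentences that were recorded more than once."""
    counts = Counter(row["sentence"] for row in rows)
    return {sentence: n for sentence, n in counts.items() if n > 1}


rows = read_rows(SAMPLE_TSV)
shares = demographic_shares(rows)
repeats = repeated_sentences(rows)

for (gender, age), share in shares:
    print(f"{gender:12s} {age:12s} {share:.0%}")
print("repeated:", repeats)
```

On a real release, the same functions could be pointed at the contents of the metadata file instead of `SAMPLE_TSV`, provided the column names match.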

About the new version of Common Voice 7.0

In this new version, more than 75,000 people took part in preparing the English-language material, dictating 2,637 hours of validated speech (previously there were 66,000 participants and 1,686 hours).

As mentioned at the beginning, this new version adds 16 new languages to the Common Voice dataset, for a total of 76. The five languages with the most total hours are English (2,630 hours), Kinyarwanda (2,260), German (1,040), Catalan (920) and Esperanto (840).

The languages that grew the most in percentage terms are Thai (almost 20-fold growth, from 12 hours to 250 hours), Luganda (ninefold growth, from 8 hours to 80 hours), Esperanto (more than sevenfold growth, from 100 hours to 840 hours) and Tamil (more than eightfold growth, from 24 hours to 220 hours). Notably, Kinyarwanda ranks second by volume of collected data, with 2,260 hours. It is followed by German (1,040), Catalan (920) and Esperanto (840). The dataset now contains more than 182,000 unique voices, a 25% growth in the contributor community in just six months.
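The growth multiples quoted in the article can be reproduced with simple arithmetic. The sketch below uses only the before and after hour counts given above for English validated speech (1,686 to 2,637), Thai (12 to 250) and Tamil (24 to 220); the function names are hypothetical helpers introduced for illustration.

```python
def growth_factor(before_hours, after_hours):
    """How many times larger the new total is than the old one."""
    return after_hours / before_hours


def percent_increase(before_hours, after_hours):
    """Relative increase over the old total, in percent."""
    return (after_hours - before_hours) / before_hours * 100.0


# (hours before, hours after), as quoted in the article
figures = {
    "English (validated)": (1686, 2637),
    "Thai": (12, 250),
    "Tamil": (24, 220),
}

for language, (before, after) in figures.items():
    factor = growth_factor(before, after)
    increase = percent_increase(before, after)
    print(f"{language}: x{factor:.1f} (+{increase:.0f}%)")
```

The computed ratios land close to the rounded multiples in the text. Small differences are expected, since the published hour counts are themselves rounded.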

It is also mentioned that, as part of its participation in the project, NVIDIA has prepared ready-to-use trained models for machine learning systems based on the collected data (compatible with PyTorch). The models are distributed as part of the free and open NVIDIA NeMo toolkit, which is already used, for example, in the automated voice services of MTS and Sberbank.

The models target speech recognition, speech synthesis and natural language processing, and may be useful to researchers building voice dialogue systems, transcription platforms and automated call centers. Unlike earlier projects, the published models are not limited to recognizing English and cover a variety of languages, accents and forms of speech.

Finally, if you are interested in learning more, you can check the details at the following link.

