Meta (formerly Facebook) recently unveiled a new audio codec called EnCodec, which uses machine learning techniques to increase the compression ratio without loss of quality.
The new method can compress and decompress audio in real time while achieving state-of-the-art size reduction. The codec can be used both for streaming audio in real time and for encoding audio to be stored in files for later use.
"Today, we are detailing the progress our AI research team (FAIR) has made in the area of AI-powered audio hyper-compression. Imagine listening to a friend's audio message in an area with poor connectivity and not having it stall or glitch. Our research shows how we can use AI to help us achieve this."
Two EnCodec models are available for download:
- A causal model that uses a 24 kHz sampling rate, supports only monophonic audio, and was trained on a variety of audio data (suitable for speech encoding). The model can be used to pack audio data for transmission at bit rates of 1.5, 3, 6, 12, and 24 kbps.
- A non-causal model that uses a 48 kHz sampling rate, supports stereo sound, and was trained on music only. The model supports bit rates of 3, 6, 12, and 24 kbps.
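To put these bit rates in perspective, the sketch below compares them with uncompressed audio. It assumes 16-bit PCM samples for the uncompressed baseline, which the announcement does not state, and the helper names `pcm_kbps` and `encoded_bytes` are made up for this illustration.

```python
def pcm_kbps(sample_rate_hz, channels, bits_per_sample=16):
    """Bit rate of uncompressed PCM audio in kbps (16-bit samples are an assumption)."""
    return sample_rate_hz * channels * bits_per_sample / 1000


def encoded_bytes(bitrate_kbps, seconds):
    """Payload size in bytes of a constant-bit-rate stream (1 kbps = 1000 bit/s)."""
    return bitrate_kbps * 1000 * seconds / 8


# Bit rates listed for the 24 kHz mono model.
mono_rates = (1.5, 3, 6, 12, 24)
raw_mono = pcm_kbps(24_000, 1)

for rate in mono_rates:
    size_kb = encoded_bytes(rate, 60) / 1000
    print(f"{rate:>4} kbps: {size_kb:6.2f} kB per minute, "
          f"{raw_mono / rate:.0f}x smaller than raw PCM")
```

The same helpers apply to the 48 kHz stereo model by calling `pcm_kbps(48_000, 2)` and iterating over its four bit rates.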
For each model, an additional language model has been prepared, which allows a significant further increase in the compression ratio (up to 40%) without loss of quality. Unlike earlier projects that applied machine learning to audio compression, EnCodec can be used not only to pack speech but also to compress music at a 48 kHz sampling rate, which corresponds to the level of audio CDs.
According to the developers of the new codec, compared with MP3 at a bit rate of 64 kbps they were able to increase the audio compression ratio roughly tenfold while keeping the same level of quality (for example, where MP3 needs 64 kbps of bandwidth, 6 kbps is enough for EnCodec to deliver the same quality).
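The "roughly tenfold" figure follows directly from the two bit rates quoted in the example. A minimal check, using hypothetical helper names:

```python
def compression_gain(reference_kbps, codec_kbps):
    """How many times smaller the new stream is than the reference stream."""
    return reference_kbps / codec_kbps


def bandwidth_saved(reference_kbps, codec_kbps):
    """Fraction of the reference bandwidth that is no longer needed."""
    return 1 - codec_kbps / reference_kbps


gain = compression_gain(64, 6)    # about 10.7
saved = bandwidth_saved(64, 6)    # about 0.91
print(f"{gain:.1f}x smaller, {saved:.0%} less bandwidth")
```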
"The compressed data can then be decoded using a neural network. We achieved an approximate 10x compression rate compared with MP3 at 64 kbps, without a loss of quality. While these techniques have been explored before for speech, we are the first to make them work for 48 kHz sampled stereo audio (i.e., CD quality), which is the standard for music distribution."
The codec architecture is built on a neural network with a “transformer” architecture and consists of four components: an encoder, a quantizer, a decoder, and a discriminator:
- The encoder extracts parameters from the audio data and converts them into a packed stream at a lower frame rate.
- The quantizer (RVQ, Residual Vector Quantizer) converts the encoder's output stream into sets of packets, compressing the information according to the selected bit rate. The quantizer's output is a compressed representation of the data, suitable for transmission over a network or storage on disk.
- The decoder decodes the compressed representation of the data and reconstructs the original sound wave.
- The discriminator improves the quality of the generated samples, taking into account a model of human auditory perception.
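The idea behind the quantizer stage can be illustrated with a toy residual vector quantizer. This is a generic sketch of the technique, not EnCodec's actual implementation: the codebooks here are random rather than trained, each one contains a zero vector so that adding a stage can never increase the error, and all function names and sizes are invented for the example.

```python
import random


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_codebooks(num_stages, size, dim, seed=0):
    """Toy codebooks: random entries (not trained), finer at each later stage."""
    rng = random.Random(seed)
    books = []
    for stage in range(num_stages):
        scale = 0.5 ** stage
        book = [[0.0] * dim]  # zero entry: a stage may choose to add nothing
        book += [[rng.uniform(-scale, scale) for _ in range(dim)]
                 for _ in range(size - 1)]
        books.append(book)
    return books


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    indices = []
    for book in codebooks:
        best = min(range(len(book)), key=lambda i: sq_error(residual, book[i]))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, book[best])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected entries; passing fewer indices uses fewer stages."""
    out = [0.0] * len(codebooks[0][0])
    for i, book in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, book[i])]
    return out


codebooks = make_codebooks(num_stages=4, size=16, dim=4)
frame = [0.3, -0.7, 0.1, 0.5]  # stand-in for one encoder output frame
codes = rvq_encode(frame, codebooks)
errors = [sq_error(frame, rvq_decode(codes[:k], codebooks)) for k in range(1, 5)]
print(codes, errors)
```

Each stage adds one more index per frame, so keeping only the first few indices gives a smaller but coarser code. That is one way a residual quantizer can trade bit rate against fidelity.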
Regardless of the quality level and bit rate, the models used for encoding and decoding have fairly modest resource requirements (the computation needed for real-time operation runs on a single CPU core).
Finally, for those interested, the reference implementation of EnCodec is written in Python using the PyTorch framework and is released under the CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial) license, for non-commercial use only.
If you are interested in learning more about it, you can check the details at the following link.