EnCodec, Meta's new audio codec

EnCodec is a neural-network-based codec with a compression ratio of roughly 10x

Meta (formerly Facebook) recently introduced its new audio codec, EnCodec, which uses machine learning methods to increase the compression ratio without a loss of quality.

The new method can compress and decompress audio in real time while achieving state-of-the-art size reductions. The codec can be used both for streaming audio in real time and for encoding audio to be stored in files for later use.

"Today, we are detailing progress made by Fundamental AI Research (FAIR) in the area of AI-powered audio hyper-compression. Imagine listening to a friend's audio message in an area with poor connectivity without it stalling or glitching. Our research shows how we can use AI to help achieve this."

EnCodec offers two models that are available for download:

  1. A causal model that uses a 24 kHz sampling rate, supports only monophonic audio, and was trained on a variety of audio data (suitable for speech coding). The model can pack audio data for transmission at bitrates of 1.5, 3, 6, 12, and 24 kbps.
  2. A non-causal model that uses a 48 kHz sampling rate, supports stereo audio, and was trained on music only. The model supports bitrates of 3, 6, 12, and 24 kbps.
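To put the bitrates listed above in perspective, here is a minimal sketch that computes how much data a constant-bitrate stream occupies over a given duration. The function name `encoded_size_bytes` and the two constants are hypothetical, introduced only for illustration, and the calculation ignores any container or header overhead.

```python
# Target bitrates (kbps) as listed in the article for each model.
BITRATES_24KHZ_MONO = [1.5, 3, 6, 12, 24]
BITRATES_48KHZ_STEREO = [3, 6, 12, 24]


def encoded_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Payload size in bytes of a constant-bitrate stream (no header overhead)."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8


if __name__ == "__main__":
    # One minute of mono audio at each bitrate of the 24 kHz model.
    for rate in BITRATES_24KHZ_MONO:
        size_kb = encoded_size_bytes(rate, 60) / 1000
        print(f"{rate:>4} kbps -> {size_kb:.2f} kB per minute")
```

At 6 kbps, for example, one minute of audio comes to 45 kB of payload.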

For each model, an additional language model has been prepared that allows a significant further increase in the compression ratio (up to 40%) without a loss of quality. Unlike earlier projects that applied machine learning techniques to audio compression, EnCodec can be used not only to pack speech but also to compress music at a 48 kHz sampling rate, which corresponds to audio CD quality.

According to the developers of the new codec, compared with the MP3 format at a bitrate of 64 kbps, they were able to increase the audio compression ratio roughly tenfold while maintaining the same level of quality (for example, where MP3 needs 64 kbps of bandwidth, 6 kbps is enough for EnCodec to transmit at the same quality).
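The arithmetic behind the "roughly tenfold" figure can be checked with a short sketch. The helper names below are hypothetical, and the three-minute track is an assumed example rather than a measurement from the source.

```python
def compression_ratio(reference_kbps: float, codec_kbps: float) -> float:
    """How many times smaller the codec stream is than the reference stream."""
    return reference_kbps / codec_kbps


def stream_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Payload size in bytes at a constant bitrate (no container overhead)."""
    return bitrate_kbps * 1000 * seconds / 8


if __name__ == "__main__":
    ratio = compression_ratio(64, 6)
    print(f"MP3 64 kbps vs EnCodec 6 kbps: {ratio:.1f}x")  # prints 10.7x

    # Assumed example: a three-minute track (180 seconds).
    mp3_bytes = stream_size_bytes(64, 180)
    encodec_bytes = stream_size_bytes(6, 180)
    print(f"MP3: {mp3_bytes / 1e6:.2f} MB, EnCodec: {encodec_bytes / 1e6:.3f} MB")
```

The ratio 64 / 6 is about 10.7, which is consistent with the approximate 10x claim.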

"This data can then be decoded using a neural network. We achieve an approximate 10x compression rate compared with MP3 at 64 kbps, without a loss of quality. While these techniques have been explored before for speech, we are the first to make them work for 48 kHz sampled stereo audio (i.e., CD quality), which is the standard for music distribution."

The codec is built on a neural network with a "transformer" architecture and consists of four components: an encoder, a quantizer, a decoder, and a discriminator:

  • The encoder extracts parameters from the audio data and converts them into a packed stream with a lower frame rate.
  • The quantizer (RVQ, Residual Vector Quantizer) converts the encoder's output stream into sets of packets, compressing the information according to the selected bitrate. The quantizer's output is a compressed representation of the data, suitable for transmission over a network or storage on disk.
  • The decoder decodes the compressed representation and reconstructs the original sound wave.
  • The discriminator improves the quality of the generated samples, taking into account a model of human auditory perception.
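The residual vector quantization step described above can be illustrated with a toy example. This is a minimal sketch of the general RVQ idea, not EnCodec's implementation: the two tiny hand-written codebooks in `CODEBOOKS` and all function names are assumptions made for illustration, whereas a real codec learns its codebooks during training.

```python
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)],
    [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)],
]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual of the previous one."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Rebuild the vector by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    codes = rvq_encode((0.8, -1.3), CODEBOOKS)
    print(codes)                               # indices, one per stage
    print(rvq_decode(codes, CODEBOOKS))        # reconstruction using both stages
    print(rvq_decode(codes[:1], CODEBOOKS))    # coarser reconstruction, first stage only
```

Only the list of indices needs to be transmitted or stored. Decoding with fewer stages sends fewer indices at the cost of a coarser reconstruction, which is the general mechanism by which a residual quantizer can trade bitrate against precision.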

Regardless of the quality level and bitrate, the models used for encoding and decoding have modest resource requirements (the computations needed for real-time operation run on a single CPU core).

Finally, for those who are interested, the reference implementation of EnCodec is written in Python using the PyTorch framework and is licensed under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial) for non-commercial use only.

If you would like to learn more about it, you can find the details at the following link.

