StyleGAN3, NVIDIA's machine learning system for facial synthesis

NVIDIA recently released the source code for StyleGAN3, a machine learning system based on generative adversarial networks (GANs) that synthesizes realistic images of human faces.

Ready-to-use trained models are available for download. They were trained on the Flickr-Faces-HQ (FFHQ) collection, which includes 70 thousand high-quality PNG images of human faces (1024 × 1024). There are also models trained on the AFHQv2 (photographs of animal faces) and MetFaces (faces from classical painted portraits) collections.

About StyleGAN3

The design focuses on faces, but the system can be trained to generate any type of object, such as landscapes and cars. Tools are also provided for training the neural network on your own image collections. Training requires one or more NVIDIA graphics cards (Tesla V100 or A100 GPUs recommended) with at least 12 GB of memory, PyTorch 1.9, and CUDA Toolkit 11.1 or later. A dedicated detector is being developed to identify generated faces as artificial.

The system can synthesize a new face by interpolating the features of several faces and combining their characteristics. It can also adapt the final image to a required age, gender, hair length, smile, nose shape, skin color, glasses, and camera angle.
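The interpolation idea can be illustrated with a minimal sketch that does not use StyleGAN3 itself. It assumes each face corresponds to a latent vector (the size of 512 and the helper names `lerp` and `interpolation_path` are illustrative assumptions) and simply blends two such vectors linearly. In a real pipeline, each intermediate vector would be passed to the trained generator to produce an image.

```python
import random


def lerp(z_a, z_b, t):
    """Linearly blend two latent vectors: t=0 returns z_a, t=1 returns z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent vectors from z_a to z_b inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


# Assumption: latent vectors are drawn from a standard normal distribution.
LATENT_DIM = 512  # illustrative size, not read from any model file
rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# Five latent vectors: the two "source faces" plus three blends in between.
path = interpolation_path(z_a, z_b, steps=5)
```

The middle entry of `path` is the equal-weight blend of the two source vectors; moving `t` toward 0 or 1 makes one source dominate.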

The generator treats the image as a collection of styles. It automatically separates characteristic details (freckles, hair, glasses) from general high-level attributes (pose, gender, age-related changes) and allows them to be combined arbitrarily, with weighting factors determining which properties dominate. As a result, the generated images are apparently indistinguishable from actual photographs.
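The weighted combination of styles can be sketched as follows. This is a toy illustration under stated assumptions, not the StyleGAN3 code: each face is represented as a list of per-layer style vectors, and the hypothetical helper `mix_styles` blends two faces layer by layer using one weight per layer.

```python
def mix_styles(styles_a, styles_b, weights):
    """Blend per-layer styles of two faces.

    A weight of 1.0 keeps face A's style at that layer, 0.0 keeps face B's,
    and values in between produce a weighted blend.
    """
    if not (len(styles_a) == len(styles_b) == len(weights)):
        raise ValueError("styles_a, styles_b and weights must have equal length")
    return [
        [w * a + (1.0 - w) * b for a, b in zip(layer_a, layer_b)]
        for layer_a, layer_b, w in zip(styles_a, styles_b, weights)
    ]


# Toy data: 4 layers with 3 style values each (sizes are illustrative only).
styles_a = [[1.0, 1.0, 1.0] for _ in range(4)]
styles_b = [[0.0, 0.0, 0.0] for _ in range(4)]

# Face A dominates the first two layers, face B the last, with a blend between.
weights = [1.0, 1.0, 0.5, 0.0]
mixed = mix_styles(styles_a, styles_b, weights)
```

Using per-layer weights rather than a single global weight is what lets one face contribute high-level attributes while the other contributes fine details.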

The first version of StyleGAN was released in 2019, followed in 2020 by StyleGAN2, which improved image quality and removed some artifacts. The system nevertheless remained static: it did not allow realistic animation or facial movement. The main goal in developing StyleGAN3 was to adapt the technology for use in animation and video.

StyleGAN3 uses a redesigned alias-free image generation architecture, offers new neural network training scenarios, and includes new utilities for interactive visualization (visualizer.py), analysis (avg_spectra.py), and video generation (gen_video.py). The implementation also reduces memory consumption and speeds up training.

A key feature of the StyleGAN3 architecture is that all signals in the neural network are interpreted as continuous processes. This makes it possible to position details relative to one another so that they are fixed to the surfaces of the depicted objects rather than tied to the absolute coordinates of individual pixels in the image.

In StyleGAN and StyleGAN2, snapping to pixels during generation caused problems with dynamic rendering. For example, when the image moved, small details such as wrinkles and hairs fell out of step and seemed to move separately from the rest of the face. In StyleGAN3 these problems are solved, and the technology has become well suited to video generation.
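The underlying signal-processing problem, aliasing, can be shown with a small one-dimensional toy example. It is not taken from StyleGAN3 and only illustrates why a pixel grid cannot faithfully carry detail above half its sampling rate: a 9 Hz sine sampled at 10 Hz produces exactly the same samples as a sign-flipped 1 Hz sine. The function name and the chosen frequencies are illustrative assumptions.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample sin(2*pi*f*t) at t = n / sample_rate for n = 0..num_samples-1."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


SAMPLE_RATE = 10.0  # samples per second, so the Nyquist limit is 5 Hz

high = sample_sine(9.0, SAMPLE_RATE, 20)  # above the Nyquist limit
low = sample_sine(1.0, SAMPLE_RATE, 20)   # below the Nyquist limit

# The 9 Hz samples equal the negated 1 Hz samples, so their sum is ~0 everywhere.
max_diff = max(abs(h + l) for h, l in zip(high, low))
```

Once sampled, the fine detail is indistinguishable from a coarser pattern, which is the kind of ambiguity an alias-free design has to rule out by keeping signals band-limited.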

Finally, it is also worth mentioning that NVIDIA and Microsoft announced MT-NLG, described as the largest language model, based on a deep neural network with a Transformer architecture.

The model has 530 billion parameters, and a pool of 4480 GPUs was used for training (560 DGX A100 servers, each with eight 80 GB A100 GPUs). The stated areas of application are natural language processing tasks, such as predicting the completion of an unfinished sentence, answering questions, reading comprehension, drawing inferences in natural language, and word-sense disambiguation.

If you are interested in learning more, you can find the details of StyleGAN3 at the following link.

