AI Art Generation: Diffusion compared to GANs

AIGeneration.blog
4 min read · Nov 12, 2022
A Dream of Haudenosaunee Sky Travelers. Copyright MoniGarr.com All Rights Reserved.

This article is a starting point to assist artists and developers with continuing conversations and further exploration into AI Art Generation trends, possibilities and features.

AI Art Generators are used in a variety of production pipelines for producing 3D, 2D, Audio, Video, Text, Animations, Films, Virtual Reality, Augmented Reality, 360, 180 films, Fashion, Music, Comics, Illustrations, Poetry, Patterns, Game Assets and more. Any and every art form can be generated with the help of Artificial Intelligence solutions.

GAN: Generative Adversarial Network

GAN MODELS:

  • receive random noise (optionally with a class-conditioning variable) as input and output a realistic sample.
  • are used for generating images, video and voice.
  • use 2 neural networks, set against each other, to generate new synthesized data instances that can pass as real-world data.
  • Generator is a neural network that learns to generate data. Its output is connected directly to the Discriminator’s input.
  • Discriminator is a neural network that decides whether each data instance belongs to the training dataset. It penalizes the Generator for producing implausible results. Backpropagation through the Discriminator’s classification provides the signal for updating the Generator’s weights. Its input is connected directly to the Generator’s output.
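
To make the Generator and Discriminator loop above concrete, here is a minimal sketch of adversarial training on one-dimensional data, using only the Python standard library. Everything in it is an illustrative assumption, not the implementation of any particular AI Art Generator: the Generator is a line G(z) = a*z + b, the Discriminator is a logistic classifier D(x) = sigmoid(w*x + c), the "real" data is drawn from a Gaussian centered at 4, and the name train_toy_gan, the learning rate and the step count are made up for this example. The Generator is trained with the commonly used loss -log D(G(z)).

```python
import math
import random


def sigmoid(t):
    # Clamp the logit so math.exp never overflows.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Train a toy 1-D GAN and return (a, b, w, c).

    Generator:     G(z) = a * z + b           (hypothetical, for illustration)
    Discriminator: D(x) = sigmoid(w * x + c)  (probability that x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0
    w, c = 0.1, 0.0
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)   # sample from the assumed "training data"
        z = rng.gauss(0.0, 1.0)        # random noise fed to the Generator
        x_fake = a * z + b             # Generator output goes into the Discriminator

        # Discriminator update: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator update: minimize -log D(G(z)).
        # The gradient flows back through the Discriminator to reach a and b.
        d_fake = sigmoid(w * x_fake + c)
        grad_x_fake = -(1.0 - d_fake) * w
        a -= lr * grad_x_fake * z
        b -= lr * grad_x_fake
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print("generator: scale", a, "offset", b)
    print("discriminator: weight", w, "bias", c)
```

In this toy setup the Generator's offset b tends to drift upward toward the real data, although a GAN this small can oscillate instead of settling. Real image GANs replace both one-line models with deep networks and rely on a framework's automatic differentiation instead of hand-written gradients.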

DIFFUSION MODELS:

  • progressively add Gaussian noise to the training data, removing detail until only pure noise remains.
  • train a neural network to reverse this data corruption process.
  • synthesize data from pure noise by gradually denoising it until a clean data sample is generated.
  • SR3: Super-Resolution via Repeated Refinement outperforms existing GANs as of June 2022, producing strong image super-resolution results in human evaluations.
  • CDM: Cascaded Diffusion Models surpass BigGAN-deep and VQ-VAE-2 on both FID score and Classification Accuracy Score as of June 2022, producing high-fidelity ImageNet samples.
  • DiffWave: a probabilistic model for conditional and unconditional waveform generation that produces high-fidelity audio across different waveform generation tasks, including neural vocoding conditioned on Mel spectrograms, class-conditional generation and unconditional generation. In unconditional generation its results significantly outperform autoregressive and GAN-based waveform models on audio quality and sample diversity, in both automatic and human evaluations.
  • GPT-3: Generative Pre-trained Transformer 3 is an autoregressive language model (not a diffusion model) that uses deep learning to produce human-like text. It is available to developers through the OpenAI APIs and is used to help artists design text prompts for producing new art with AI Art Generators. Architecture details, history and references are online on the GPT-3 Wikipedia page.
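
The noising and denoising steps in the first bullets above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the code of any named model: it assumes the widely used formulation in which data at step t is a weighted mix of the clean sample and Gaussian noise, with a linear noise schedule. The function names make_alpha_bars, add_noise and remove_noise and the schedule values are hypothetical choices for this example.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal kept after each noising step.

    Uses a linear noise schedule; the default values are an assumption
    chosen for illustration.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Corrupt clean data x0 to step t in one jump; return (x_t, eps)."""
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [keep * v + spread * e for v, e in zip(x0, eps)]
    return x_t, eps


def remove_noise(x_t, eps, t, alpha_bars):
    """Invert add_noise given the noise eps.

    A trained network would supply a prediction of eps; here the true
    noise is passed in, so this only shows the algebra being reversed.
    """
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    return [(v - spread * e) / keep for v, e in zip(x_t, eps)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars()
    x0 = [0.2, -0.5, 0.9, 0.0]  # stand-in for a few pixel values
    for t in (0, 250, 999):
        x_t, _ = add_noise(x0, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in x_t])
```

At early steps the sample is nearly unchanged, and at the last step almost none of the original signal remains. A real diffusion model trains a neural network to predict eps from x_t and t, then denoises over many small steps instead of a single inversion.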

GAN ISSUES: November 2022

  • Vanishing Gradients. When the Discriminator becomes too good at telling real data from fake data, the gradients it passes back can become too small for the Generator to keep learning, so Generator training stalls.
  • Mode Collapse. When the Generator produces an especially plausible output, it can learn to produce only that output. The Discriminator’s best strategy is then to always reject that output. But if the next generation of the Discriminator gets stuck in a local minimum and does not find this strategy, it is too easy for the next Generator iteration to find the most plausible output for the current Discriminator.
  • Failure to Converge. This is one of the most common failures when training GANs. It occurs when the Discriminator loss goes to zero, or close to zero, while the Generator loss rises and keeps rising over the same training period. This is usually caused by the Generator outputting garbage images that are easy for the Discriminator to identify.
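
One way to see the Vanishing Gradients issue listed above is to look at the gradient that reaches the Generator when the Discriminator confidently rejects a generated sample. The sketch below assumes a Discriminator of the form D = sigmoid(logit) and compares the original minimax Generator loss log(1 - D) with the commonly used alternative -log(D). The function names are hypothetical and chosen for this example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def saturating_grad(logit):
    """d/dlogit of log(1 - D), the original minimax Generator loss."""
    return -sigmoid(logit)


def non_saturating_grad(logit):
    """d/dlogit of -log(D), a commonly used alternative Generator loss."""
    return -(1.0 - sigmoid(logit))


if __name__ == "__main__":
    # A very negative logit means the Discriminator is sure the sample is fake.
    for logit in (0.0, -5.0, -10.0):
        print(logit, saturating_grad(logit), non_saturating_grad(logit))
```

As the logit becomes more negative, the first gradient shrinks toward zero while the second stays close to -1, which is one reason the alternative loss is often preferred in practice.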

DIFFUSION ISSUES: November 2022

  • Text Prompts without End User Knowledge: Not knowing how to write text prompts for a given AI Art Generator results in garbled visuals, especially of human appendages, eyes and mouths. Each version and brand of AI Art Generator requires specific knowledge of Art Theory, History, Techniques and Text Prompt syntax to produce art with specific styles, colors, layouts and more.
  • Lack of Documentation: New technology, techniques and documentation are still being produced while everything changes at the speed of thought.
  • Human Artists being Harmed: This issue requires a dedicated article about the activities and consequences of web scraping, which results in human artists being extracted from (without any prior, fully informed consent), erased and then replaced by AI Generated Art for the benefit of those who produce AI Art Generation software solutions.
