Generative AI Using Python


Module 1


Generative AI (GenAI) is the latest subtype of AI; it broadly describes Machine Learning (ML) models or algorithms that create new content by learning patterns from existing data.

Difference Between Traditional AI and Generative AI


Traditional AI
  • AI is used to create intelligent systems that can perform tasks which generally require human intelligence.
  • The purpose of AI algorithms or models is to mimic human intelligence across a wide range of applications.

Generative AI
  • It generates new text, audio, video, or any other type of content by learning patterns from existing training data.
  • The purpose of generative AI algorithms or models is to generate new data having characteristics similar to the data in the original dataset.

Overview of Generative Adversarial Network


    GAN stands for Generative Adversarial Network, and it is a class of artificial intelligence algorithms used in machine learning and deep learning for generating data. GANs were introduced by Ian Goodfellow and his colleagues in 2014 and have since become a popular and powerful tool in various applications, including image generation, text generation, and more.




How does a GAN work?

GANs train by having two networks, the Generator (G) and the Discriminator (D), compete and improve together. Here is the step-by-step process:

1. Generator's First Move

The generator starts with a random noise vector (simply a set of random numbers). It uses this noise as a starting point to create a fake data sample, such as a generated image. The generator's internal layers transform this noise into something that looks like real data.

2. Discriminator's Turn

The discriminator receives two types of data:

  • Real samples from the actual training dataset.
  • Fake samples created by the generator.

D's job is to analyze each input and determine whether it is real data or something G cooked up. It outputs a probability score between 0 and 1: a score close to 1 indicates the data is likely real, while a score close to 0 suggests it is fake.

3. Adversarial Learning

  • If the discriminator correctly classifies real and fake data it gets better at its job.
  • If the generator fools the discriminator by creating realistic fake data, it receives a positive update and the discriminator is penalized for making a wrong decision.

4. Generator's Improvement

  • Each time the discriminator mistakes fake data for real, the generator learns from this success.
  • Through many iterations, the generator improves and creates more convincing fake samples.

5. Discriminator's Adaptation

  • The discriminator also learns continuously by updating itself to better spot fake data.
  • This constant back-and-forth makes both networks stronger over time.

6. Training Progression

  • As training continues, the generator becomes highly proficient at producing realistic data.
  • Eventually the discriminator struggles to distinguish real from fake, which shows that the GAN has reached a well-trained state.
  • At this point, the generator can produce high-quality synthetic data that can be used for different applications.

Discriminative vs Generative Models


What are Discriminative Models?

Discriminative models are ML models that concentrate on modeling the decision boundary between several classes of data using probability estimates and maximum likelihood. These models, mainly used for supervised learning, are also known as conditional models.

Discriminative models are not much affected by outliers. Although this robustness often makes them a better choice than generative models, they can still suffer from misclassification problems, which can be a significant drawback.

Popular Discriminative Models

Logistic Regression

Support Vector Machines

K-nearest Neighbor (KNN)

What are Generative Models?

Generative models are ML models that, as the name suggests, aim to capture the underlying distribution of the data and generate new data comparable to the original training data. These models, mainly used for unsupervised learning, are categorized as a class of statistical models capable of generating new data instances.

The only drawback of generative models, when compared to discriminative models, is that they are prone to outliers.

Popular Generative Models

Bayesian Network

Generative Adversarial Network (GAN)

Variational Autoencoders (VAEs)

Autoregressive models, Naive Bayes, Markov random fields, Hidden Markov Models (HMM), and Latent Dirichlet Allocation (LDA) are a few other examples of commonly used generative models.

Difference Between Discriminative and Generative Models


  • Objective − Discriminative models: focus on learning the boundary between different classes directly from the data; their primary objective is to classify input data accurately based on the learned decision boundary. Generative models: aim to understand the underlying data distribution and generate new data points that resemble the training data; they model the process of data generation, allowing them to create synthetic data instances.
  • Probability Distribution − Discriminative models: estimate the conditional probability P(Y|X) directly from the training dataset. Generative models: model the joint distribution of data and labels and calculate the posterior probability P(Y|X) using Bayes' theorem.
  • Handling Outliers − Discriminative models: relatively robust to outliers. Generative models: prone to outliers.
  • Property − Discriminative models: do not possess generative properties. Generative models: possess discriminative properties as well, so they can also be used for classification.
  • Applications − Discriminative models: commonly used in classification tasks, such as image recognition and sentiment analysis. Generative models: commonly used in tasks like data generation, anomaly detection, and data augmentation, beyond traditional classification tasks.
  • Examples − Discriminative models: Logistic Regression, Support Vector Machines, Decision Trees, neural networks, etc. Generative models: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Naive Bayes, etc.
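To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is installed; the synthetic dataset and variable names are illustrative) that trains a discriminative classifier (Logistic Regression) and a generative classifier (Gaussian Naive Bayes) on the same data and compares their test accuracy −

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy classification data
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Discriminative: models P(Y|X) directly via a decision boundary
disc = LogisticRegression().fit(X_train, y_train)

# Generative: models P(X|Y) and P(Y), then applies Bayes' theorem for classification
gen = GaussianNB().fit(X_train, y_train)

print("Logistic Regression accuracy:", disc.score(X_test, y_test))
print("Gaussian Naive Bayes accuracy:", gen.score(X_test, y_test))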


The Role of Probability Distribution in Generative Models


What is Probability Distribution?

Probability Distribution is a mathematical function that represents the probability of different possible values of a random variable within a given range.

A probability distribution is a theoretical representation of a frequency distribution (FD). In statistics, an FD describes the number of occurrences of a variable in a dataset. A probability distribution, on the other hand, assigns probabilities to those occurrences rather than just counting them.

Types of Probability Distributions

There are two types of probability distributions −

  • Discrete Probability Distributions
  • Continuous Probability Distributions

Discrete Probability Distributions

Discrete probability distributions are mathematical functions that describe the probabilities of the different possible values of a discrete or categorical random variable.

A discrete probability distribution includes only values that have a non-zero probability. In simple words, it does not include any value with zero probability. For example, 5.5 is not a possible outcome of a dice roll, so it is not included in the probability distribution of dice rolls.

The total of the probabilities of all possible values in a discrete probability distribution is always one.

Some common discrete probability distributions are listed below −

  • Bernoulli Distribution − Describes the probability of success (1) or failure (0) in a single experiment. Example: the outcome of a single coin flip.
  • Binomial Distribution − Models the number of successes in a fixed number of trials n, each with success probability p. Example: the number of heads when you toss a coin 10 times.
  • Poisson Distribution − Predicts the number of events k occurring in a fixed interval of time or space. Example: the number of email messages received per day.
  • Geometric Distribution − Represents the number of trials needed to achieve the first success in a sequence of trials. Example: the number of times a coin is flipped until it lands on heads.
  • Hypergeometric Distribution − Calculates the probability of drawing a specific number of successes from a finite population. Example: the number of red balls drawn from a bag of mixed colored balls.
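A quick illustration, assuming NumPy is available (the parameters and sample sizes are arbitrary), drawing samples from the discrete distributions listed above −

import numpy as np

rng = np.random.default_rng(seed=0)

bernoulli = rng.binomial(n=1, p=0.5, size=10)        # single coin flips (Bernoulli = Binomial with n=1)
binomial  = rng.binomial(n=10, p=0.5, size=10)       # number of heads in 10 tosses, repeated 10 times
poisson   = rng.poisson(lam=4, size=10)              # e.g. emails received per day, with an average rate of 4
geometric = rng.geometric(p=0.5, size=10)            # flips needed until the first head
hypergeo  = rng.hypergeometric(ngood=5, nbad=15, nsample=8, size=10)  # red balls drawn from a bag of 5 red and 15 other balls

print(bernoulli, binomial, poisson, geometric, hypergeo, sep="\n")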

Continuous Probability Distributions

Continuous probability distributions are mathematical functions that describe the probabilities of different occurrences within a continuous range of values.

This includes an infinite number of possible values. For example, in the interval [4, 5] there are infinite values between 4 and 5.

Some common continuous probability distributions are listed below −

  • Continuous Uniform Distribution − Assigns equal probability to all values within an interval. Example: the height of a person between 5 and 6 feet.
  • Normal (Gaussian) Distribution − Forms a bell-shaped curve; the data cluster around the mean with symmetrical tails. Example: IQ scores.
  • Exponential Distribution − Models the time between events in a Poisson process, where events occur at a constant rate. Example: the time until the next customer arrives.
  • Log-normal Distribution − Represents right-skewed data whose logarithm is normally distributed. Example: stock prices, income distributions, etc.
  • Beta Distribution − Describes random variables constrained to a finite interval; it is often used in Bayesian statistics. Example: the probability of success in a binomial trial.
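Similarly, a short NumPy sketch (again with arbitrary, illustrative parameters) drawing samples from the continuous distributions listed above −

import numpy as np

rng = np.random.default_rng(seed=0)

uniform     = rng.uniform(low=5.0, high=6.0, size=5)      # e.g. heights between 5 and 6 feet
normal      = rng.normal(loc=100, scale=15, size=5)       # e.g. IQ scores (mean 100, standard deviation 15)
exponential = rng.exponential(scale=2.0, size=5)          # e.g. minutes until the next customer arrives
lognormal   = rng.lognormal(mean=0.0, sigma=1.0, size=5)  # right-skewed data such as stock prices
beta        = rng.beta(a=2.0, b=5.0, size=5)              # values constrained to the interval [0, 1]

print(uniform, normal, exponential, lognormal, beta, sep="\n")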

Use of Probability Distributions in Generative Modeling

Probability distributions play a crucial role in generative modeling. 

  • Data Distribution − Generative Models aim to capture the underlying probability distribution of data from which the samples are taken.
  • Generating New Samples − Once the data distribution has been learned, generative models can generate new data comparable to the original dataset (a minimal example of this idea follows this list).
  • Evaluation and Training − Probability distributions are used to evaluate and train generative models. Evaluation metrics such as likelihood, perplexity, and Wasserstein distance are used to evaluate the quality of generated samples compared to the original dataset.
  • Variability and Uncertainty − Probability distributions are used to find the variability and uncertainty present in the data. Generative models can use this information to generate distinct and realistic samples.

Introduction to PyTorch framework for deep learning

PyTorch is an open-source machine learning library for Python. It is used for applications such as natural language processing.

Features

The major features of PyTorch are mentioned below −

Easy Interface − PyTorch offers an easy-to-use API; hence it is considered very simple to operate and runs on Python. Code execution in this framework is quite straightforward.

Python usage − This library is considered Pythonic and integrates smoothly with the Python data science stack. Thus, it can leverage all the services and functionality offered by the Python environment.

Computational graphs − PyTorch provides an excellent platform with dynamic computational graphs, so a user can change them during runtime. This is highly useful when a developer does not know in advance how much memory will be required for creating a neural network model.

PyTorch is known for having three levels of abstraction as given below −

  • Tensor − Imperative n-dimensional array which runs on GPU.
  • Variable − Node in computational graph. This stores data and gradient.
  • Module − Neural network layer which will store state or learnable weights.
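A small sketch illustrating these three abstraction levels (note that in recent PyTorch versions the old Variable API has been merged into Tensor, so a tensor created with requires_grad=True plays that role; the shapes below are arbitrary) −

import torch
import torch.nn as nn

# Tensor: an n-dimensional array (placed on the GPU only if one is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 4, device=device)

# Variable/autograd: a tensor with requires_grad=True records operations for gradient computation
w = torch.randn(4, 2, requires_grad=True, device=device)
y = (x @ w).sum()
y.backward()                     # populates w.grad

# Module: a neural network layer that stores learnable weights
layer = nn.Linear(4, 2).to(device)
out = layer(x)
print(out.shape, w.grad.shape)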

The following are the advantages of PyTorch −

  • It is easy to debug and understand the code.
  • It includes many of the same layers as Torch.
  • It includes a lot of loss functions.
  • It can be considered a NumPy extension for GPUs.
  • It allows building networks whose structure is dependent on computation itself.
PyTorch - Implementing a First Neural Network

In this section, we create a simple neural network with one hidden layer and a single output unit.

Step 1

Import the PyTorch library using the command below −

import torch 
import torch.nn as nn

Step 2

Define all the layers and the batch size to start executing the neural network as shown below −

# Defining input size, hidden layer size, output size and batch size respectively

n_in, n_h, n_out, batch_size = 10, 5, 1, 10

Step 3

As a neural network maps input data to the respective output data, we create dummy input and target tensors as shown below −

# Create dummy input and target tensors (data)
x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])

Step 4

Create a sequential model with the help of in-built functions. Using the below lines of code, create a sequential model −

# Create a model

model = nn.Sequential(nn.Linear(n_in, n_h), nn.ReLU(), nn.Linear(n_h, n_out), nn.Sigmoid())

Step 5

Construct the loss function and the optimizer (Stochastic Gradient Descent in this case) as shown below −

#Construct the loss function

criterion = torch.nn.MSELoss()

# Construct the optimizer (Stochastic Gradient Descent in this case)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Step 6

Implement gradient descent with an iterating loop using the lines of code given below −

# Gradient Descent

for epoch in range(50):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print('epoch: ', epoch, ' loss: ', loss.item())

    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()

    # Perform a backward pass (backpropagation)
    loss.backward()

    # Update the parameters
    optimizer.step()

Step 7

The output generated is as follows −

epoch: 0 loss: 0.2545787990093231
epoch: 1 loss: 0.2545052170753479
epoch: 2 loss: 0.254431813955307
epoch: 3 loss: 0.25435858964920044
epoch: 4 loss: 0.2542854845523834
epoch: 5 loss: 0.25421255826950073
epoch: 6 loss: 0.25413978099823
epoch: 7 loss: 0.25406715273857117
epoch: 8 loss: 0.2539947032928467
epoch: 9 loss: 0.25392240285873413
epoch: 10 loss: 0.25385022163391113
epoch: 11 loss: 0.25377824902534485
....


Module 2

Architecture of GAN

GANs consist of two main models that work together to create realistic synthetic data. They are as follows:

1. Generator Model

The generator is a deep neural network that takes random noise as input to generate realistic data samples like images or text. It learns the underlying data patterns by adjusting its internal parameters during training through backpropagation. Its objective is to produce samples that the discriminator classifies as real.

The Role of Generator in GAN Architecture

The first primary part of GAN architecture is the Generator. 

Generator: Function and Structure

The primary goal of the generator is to produce new data samples that are intended to resemble real data from the dataset. It begins with a random noise vector and transforms it through a series of layers, such as fully connected or convolutional layers, to generate a synthetic data sample.

Generator: Layers and Components

Listed below are the layers and components of the generator neural network −

  • Input Layer − The generator receives a low-dimensional random noise vector as input.
  • Fully Connected Layers − Fully connected layers are used to increase the dimensionality of the input noise vector.
  • Transposed Convolutional Layers − These layers, also known as deconvolutional layers, are used for upsampling, i.e., to generate an output feature map with greater spatial dimensions than the input feature map.
  • Activation Functions − Two commonly used activation functions are Leaky ReLU and Tanh. The Leaky ReLU activation function helps reduce the dying ReLU problem, while the Tanh activation function ensures that the output stays within a specific range.
  • Output Layer − It produces the final data output, such as an image of a certain resolution.
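As a concrete illustration, here is a minimal fully connected generator in PyTorch that follows the structure above (the noise dimension, layer sizes, and flattened 28x28 output are illustrative assumptions, not values prescribed by the text) −

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),   # fully connected layer expands the noise vector
            nn.LeakyReLU(0.2),           # Leaky ReLU reduces the dying-ReLU problem
            nn.Linear(256, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, img_dim),     # output layer produces a flattened image
            nn.Tanh(),                   # Tanh keeps outputs in the range [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

z = torch.randn(16, 100)        # batch of 16 random noise vectors
fake_images = Generator()(z)    # shape: (16, 784)
print(fake_images.shape)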

Generator Loss Function: The generator tries to minimize this loss:

J_G = -(1/m) Σ_{i=1}^{m} log D(G(z_i))

where

  • J_G measures how well the generator is fooling the discriminator.
  • G(z_i) is the generated sample from random noise z_i.
  • D(G(z_i)) is the discriminator's estimated probability that the generated sample is real.

The generator aims to maximize D(G(z_i)), meaning it wants the discriminator to classify its fake data as real (probability close to 1).

The goal of the generator neural network is to create data that the discriminator cannot distinguish from real data. This is achieved by minimizing the generator's loss function.
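A minimal PyTorch sketch of this generator loss: binary cross-entropy against a target of 1 is equivalent to -log D(G(z)) averaged over the batch. The tiny stand-in networks and sizes below are illustrative assumptions, not part of the original text −

import torch
import torch.nn as nn

def generator_loss(D, fake_samples):
    # -log D(G(z)): the generator is rewarded when D assigns its fakes a probability near 1
    bce = nn.BCELoss()
    preds = D(fake_samples)
    return bce(preds, torch.ones_like(preds))

# Tiny stand-in networks, purely for illustration
G = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())

z = torch.randn(8, 100)
loss_G = generator_loss(D, G(z))
print(loss_G.item())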


2. Discriminator Model

The discriminator acts as a binary classifier that distinguishes between real and generated data. It improves its classification ability through training, refining its parameters to detect fake samples more accurately. When dealing with image data, the discriminator typically uses convolutional layers or other relevant architectures to extract features and enhance the model's ability.

The Role of Discriminator in GAN Architecture

The second part of GAN architecture is the Discriminator. 

Discriminator: Function and Structure

The primary goal of the discriminator is to classify the input data as real or generated by the generator. It takes a data sample as input and gives a probability as output that indicates whether the sample is real or fake.

Discriminator: Layers and Components

Listed below are the layers and components of the discriminator neural network −

  • Input Layer − The discriminator receives a data sample from either the real dataset or the generator as input.
  • Convolutional Layers − These are used to downsample the input data and extract relevant features.
  • Fully Connected Layers − Fully connected layers are used to process the extracted features and make the final classification.
  • Activation Functions − The discriminator uses the Leaky ReLU activation function to address the vanishing gradient problem; it also introduces non-linearity.
  • Output Layer − As the name implies, it gives a single probability value between 0 and 1 as output, indicating whether the sample is real or fake.
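A matching minimal discriminator sketch in PyTorch (fully connected for simplicity; for image data the Linear layers would typically be replaced by strided convolutional layers as described above, and the sizes here are illustrative assumptions) −

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 512),     # extract features from the flattened input sample
            nn.LeakyReLU(0.2),           # Leaky ReLU helps against vanishing gradients
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),           # single output unit
            nn.Sigmoid(),                # probability between 0 (fake) and 1 (real)
        )

    def forward(self, x):
        return self.net(x)

sample = torch.randn(16, 28 * 28)       # batch of real or generated samples
print(Discriminator()(sample).shape)    # torch.Size([16, 1])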

Discriminator Loss Function: The discriminator tries to minimize this loss:

J_D = -(1/m) Σ_{i=1}^{m} log D(x_i) - (1/m) Σ_{i=1}^{m} log(1 - D(G(z_i)))

  • J_D measures how well the discriminator classifies real and fake samples.
  • x_i is a real data sample.
  • G(z_i) is a fake sample from the generator.
  • D(x_i) is the discriminator's probability that x_i is real.
  • D(G(z_i)) is the discriminator's probability that the fake sample is real.

The discriminator wants to correctly classify real data as real (maximize log D(x_i)) and fake data as fake (maximize log(1 - D(G(z_i)))).

The goal of the discriminator neural network is to maximize its ability to correctly distinguish real data from generated data. This is achieved by minimizing the discriminator's loss function.
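A minimal sketch of this discriminator loss in PyTorch, using binary cross-entropy with targets of 1 for real samples and 0 for fake samples (the stand-in network and data shapes are illustrative assumptions) −

import torch
import torch.nn as nn

def discriminator_loss(D, real_samples, fake_samples):
    bce = nn.BCELoss()
    real_preds = D(real_samples)
    fake_preds = D(fake_samples)
    # -log D(x) on real data plus -log(1 - D(G(z))) on fake data
    real_loss = bce(real_preds, torch.ones_like(real_preds))
    fake_loss = bce(fake_preds, torch.zeros_like(fake_preds))
    return real_loss + fake_loss

# Tiny stand-in discriminator and data, purely for illustration
D = nn.Sequential(nn.Linear(784, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1), nn.Sigmoid())
real = torch.randn(8, 784)
fake = torch.randn(8, 784)
print(discriminator_loss(D, real, fake).item())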

MinMax Loss

GANs are trained using a MinMax Loss between the generator and discriminator:

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]

where,

  • G is the generator network and D is the discriminator network.
  • p_data(x) is the true data distribution.
  • p_z(z) is the distribution of the random noise (usually normal or uniform).
  • D(x) is the discriminator's estimate that real data x is real.
  • D(G(z)) is the discriminator's estimate that generated data G(z) is real.

The generator tries to minimize this loss (to fool the discriminator) and the discriminator tries to maximize it (to detect fakes accurately).
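Putting the two objectives together, here is a minimal sketch of the alternating MinMax training loop in PyTorch. The network sizes, batch size, learning rates, and the random stand-in for real data are illustrative assumptions; a real setup would draw batches from an actual dataset.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
real_data = torch.randn(64, 784)   # stand-in for a batch of real data

for step in range(100):
    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(64, 100)
    fake = G(z).detach()                       # detach so only D is updated here
    d_real = D(real_data)
    d_fake = D(fake)
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize -log D(G(z)), i.e. try to fool the discriminator
    z = torch.randn(64, 100)
    d_fake = D(G(z))
    loss_G = bce(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

print("final D loss:", loss_D.item(), "final G loss:", loss_G.item())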

Types of GANs

There are several types of GANs, each designed for a different purpose. Here are some important types:

1. Deep Convolutional GAN (DCGAN)

Deep Convolutional GANs (DCGANs) are among the most popular types of GANs used for image generation.

They are important because they:

  • They use Convolutional Neural Networks (CNNs) instead of simple multi-layer perceptrons (MLPs).
  • Max pooling layers are replaced with strided convolutions, which helps make the model more efficient.
  • Fully connected layers are removed, which allows for better spatial understanding of images.

DCGANs are successful because they generate high-quality, realistic images.

Need for DCGANs:

    DCGANs were introduced to reduce the problem of mode collapse. Mode collapse occurs when the generator becomes biased towards a few outputs and cannot produce outputs covering every variation in the dataset. For example, take the case of the MNIST digits dataset (digits from 0 to 9): we want the generator to generate all types of digits, but sometimes the generator becomes biased towards two or three digits and produces only those. Because of that, the discriminator also gets optimized towards those particular digits only, and this state is known as mode collapse. This problem can be overcome by using DCGANs.
    
    The generator of the DCGAN architecture takes a 100-dimensional random noise vector as input. First, it projects and reshapes it to a 4x4x1024 feature map and then performs a fractionally strided convolution 4 times with a stride of 1/2 (this means that every time it is applied, it doubles the spatial dimensions while reducing the number of output channels). The generated output has dimensions of (64, 64, 3). Some architectural changes are made in the generator, such as the removal of all fully connected layers and the use of Batch Normalization, which helps in stabilizing training. The ReLU activation function is used in all layers of the generator, except for the output layer.
    The role of the discriminator here is to determine whether the image comes from the real dataset or from the generator. The discriminator can be designed like a convolutional neural network that performs an image classification task. Instead of fully connected layers, it uses only strided convolutions with LeakyReLU as the activation function; its input is a single image, either from the dataset or generated, and its output is a score that determines whether the image is real or generated.
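A sketch of a DCGAN-style generator in PyTorch following the description above: a 100-dimensional noise vector is reshaped and upsampled by fractionally strided (transposed) convolutions with Batch Normalization and ReLU, ending in a Tanh output of shape (3, 64, 64). The exact channel counts are illustrative assumptions.

import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # Project the noise vector to a 4x4 feature map with 1024 channels
            nn.ConvTranspose2d(noise_dim, 1024, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(1024),
            nn.ReLU(True),
            # Each fractionally strided convolution doubles the spatial size
            nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),   # output in [-1, 1], shape (3, 64, 64)
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

z = torch.randn(4, 100)
print(DCGANGenerator()(z).shape)   # torch.Size([4, 3, 64, 64])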
2. Wasserstein GAN (WGANs):
    Wasserstein Generative Adversarial Network (WGAN) is a variation of the deep learning GAN with a small modification to the algorithm. A Generative Adversarial Network (GAN) is a method for constructing an efficient generative model. Martin Arjovsky, Soumith Chintala, and Léon Bottou developed this network in 2017. It is widely used to produce realistic images.
    The WGAN architecture uses deep neural networks for both the generator and the discriminator (called the critic). The key differences between GANs and WGANs are the loss function and the way the critic is constrained (weight clipping or a gradient penalty). WGANs were introduced as a solution to mode collapse issues.

WGAN architecture


WGANs use the Wasserstein distance, which provides a more meaningful and smoother measure of distance between distributions.

W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ~ γ}[ ||x - y|| ]

  • γ(x, y) denotes how much mass must be transported from x to y in order to transform the distribution P_r into P_g.
  • Π(P_r, P_g) denotes the set of all joint distributions γ(x, y) whose marginals are respectively P_r and P_g.

Benefits of WGAN algorithm over GAN

  • WGAN is more stable due to the Wasserstein distance, which is continuous and differentiable almost everywhere, allowing gradient descent to be performed.
  • It allows the critic to be trained till optimality.
  • There is little to no evidence of mode collapse.
  • It does not get stuck in local minima during gradient descent.
  • WGANs provide more flexibility in the choice of network architecture; with weight clipping in place, the generator architecture can be changed as desired.
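A minimal sketch of a WGAN training step with weight clipping (the clipping value of 0.01, the RMSprop learning rate, and the network sizes follow common practice but are assumptions here). Note that the critic has no Sigmoid: it outputs an unbounded score, and the difference of its mean scores on real and fake data approximates the Wasserstein distance.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
critic = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))  # no Sigmoid

opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_C = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
real = torch.randn(64, 784)   # stand-in for a batch of real data

for step in range(100):
    # Critic step: maximize E[critic(real)] - E[critic(fake)]
    fake = G(torch.randn(64, 100)).detach()
    loss_C = -(critic(real).mean() - critic(fake).mean())
    opt_C.zero_grad()
    loss_C.backward()
    opt_C.step()
    for p in critic.parameters():          # weight clipping enforces the Lipschitz constraint
        p.data.clamp_(-0.01, 0.01)

    # Generator step: maximize E[critic(G(z))]
    loss_G = -critic(G(torch.randn(64, 100))).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

The gradient-penalty variant (WGAN-GP) replaces the clipping loop with a penalty term on the critic's gradient norm, but the overall loop stays the same.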
3. Conditional GAN (CGANs):

Conditional GAN (cGAN) extends the GAN framework by including conditioning information, such as class labels, attributes, or even other data samples, in both the generator and the discriminator networks.

With the help of this conditioning information, Conditional GANs give us control over the characteristics of the generated output.

Architecture of Conditional GANs

Like traditional GANs, the architecture of a Conditional GAN consists of two main components: a generator network and a discriminator network.

The only difference is that in Conditional GANs, both the generator network and the discriminator network receive additional conditioning information y along with their respective inputs.

The Generator Network

The generator network takes two inputs: a random noise vector sampled from a predefined distribution and the conditioning information "y". It transforms them into synthetic data samples. The goal of the generator is to produce data that not only resembles real data but also aligns with the provided conditioning information.

The Discriminator Network

The discriminator network receives both real data samples and fake samples generated by the generator, along with the conditioning information "y".

The goal of the discriminator network is to evaluate the input data and distinguish real data samples from the dataset from fake data samples generated by the generator model, while taking the provided conditioning information into account.

Conditional Information

Conditioning information, often denoted by "y", is additional information provided to both the generator network and the discriminator network to condition the generation process. Depending on the application and the required control over the generated output, conditioning information can take various forms.

Types of Conditional Information

Some of the common types of conditional information are as follows −

  • Class Labels − In image classification tasks, conditional information "y" may represent the class labels corresponding to different categories. For example, in handwritten digits dataset, "y" could indicate the digit class (0-9) that the generator network should produce.
  • Attributes − In image generation tasks, conditional information "y" may represent specific attributes or features of the desired output, such as the color of objects, the style of clothing, or the pose of a person.
  • Textual Descriptions − For text-to-image synthesis tasks, conditional information "y" may consist of textual descriptions or captions describing the desired characteristics of the generated image.
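A minimal sketch of how class-label conditioning can be wired into a generator in PyTorch: the label is embedded and concatenated with the noise vector before the fully connected layers. The embedding size, layer sizes, and 10-class digit setting are illustrative assumptions.

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=100, num_classes=10, img_dim=28 * 28):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, num_classes)   # turn the class label y into a dense vector
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Condition the generation by concatenating the noise with the label embedding
        return self.net(torch.cat([z, self.label_embed(y)], dim=1))

z = torch.randn(16, 100)
y = torch.randint(0, 10, (16,))              # e.g. digit classes 0-9
print(ConditionalGenerator()(z, y).shape)    # torch.Size([16, 784])

The discriminator receives the same label information (for example, concatenated with its input) so that it judges both realism and consistency with "y".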

Applications of Conditional GANs

Listed below are some of the fields where Conditional GANs find their applications −

Image-to-Image Translation

Conditional GANs are well suited for tasks that translate images from one domain to another. Examples include converting satellite images to maps, transforming sketches into realistic images, or converting day-time scenes to night-time scenes.

Semantic Image Synthesis

Conditional GANs can condition on semantic labels, hence they can generate realistic images based on textual descriptions or semantic layouts.

Super-Resolution and Inpainting

Conditional GANs can also be used for image super-resolution tasks in which low-resolution images are transformed into similar high-resolution images. They can also be used for inpainting tasks in which, based on contextual information, missing parts of an image are filled in.

Style Transfer and Editing

Conditional GANs allow us to manipulate specific attributes like color, texture, or artistic style while preserving other aspects of the image.

Challenges in using Conditional GANs

Conditional GANs offer significant advancements in generative modeling, but they also come with some challenges. Let us see what kind of challenges you can face while using Conditional GANs −

Mode Collapse

Like traditional GANs, Conditional GANs can also experience mode collapse. In mode collapse, the generator learns to produce limited varieties of samples and fails to capture the entire data distribution.

Conditioning Information Quality

The effectiveness of Conditional GANs depends on the quality and relevance of the provided conditioning information. Noisy or irrelevant conditioning information can lead to poor generation outputs.

Training Instability

The training instability issues observed in traditional GANs can also affect Conditional GANs. To avoid this, cGANs require careful architecture design and training approaches.

Scalability

With the increased complexity of conditioning information, it becomes difficult to handle Conditional GANs. It then requires more computational resources.
