Stable Diffusion and Unstable Illusion
Stable Diffusion is trendy, isn’t it?
Stable Diffusion itself has been explained in detail in plenty of other articles, so please refer to those.
So, as an homage to Stable Diffusion, I created Unstable Illusion.
Stable Diffusion = stable diffusion
vs.
Unstable Illusion = unstable vision
Anyone can simply run Stable Diffusion as-is, so I wanted to add a little twist of my own.
What I did was use TensorFlow’s DeepDream to over-interpret the images generated by Stable Diffusion, enhancing the patterns the network finds in them.
Dreams feel a lot like hallucinations, and the wordplay worked nicely, so I went with Unstable Illusion.
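In short, the pipeline has two stages (a rough sketch with hypothetical helper names; the actual code appears later in this article):

# Stage 1: Stable Diffusion turns a text prompt into an image.
# Stage 2: DeepDream over-interprets that image, amplifying whatever
#          patterns an ImageNet-trained network sees in it.
sd_image = generate_with_stable_diffusion("some prompt")  # hypothetical helper
dream_image = run_deep_dream(sd_image)                    # hypothetical helper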
“Be Here Now”
Have you ever read Be Here Now?
About 20 cm square, with Indian deities printed on brown straw paper alongside warnings and revelatory messages from Ram Dass. A pioneering work of the Eastern-thought wave that spread in the early 1970s as a counter to the established Western materialism, this novel, hip, and graphical book was immediately embraced by young people around the world and became a bestseller with more than two million copies sold. It is also known as a guide to yoga and meditation for opening up new worlds of perception, and as a gateway to the spiritual world.
That’s how the book is described.
DeepDream, which I mentioned earlier, ties into stories about dreams and hallucinations and has a spiritual air to it, so I turned to “Be Here Now” for reference.
To be honest, being a spiritual book I couldn’t really make sense of, it wasn’t that interesting, but it does contain one suspicious, somewhat surreal line:
“The big ice cream cone in the sky”
In this article, I’m going to take that line, “The big ice cream cone in the sky”, and generate an image from it using Stable Diffusion and TensorFlow’s DeepDream.
What to prepare
- Google Colaboratory GPU environment (*Jupyter Notebook also works; a quick check follows below)
- The distributed source code (*not required if you build everything from scratch)
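Before starting, it’s worth confirming that TensorFlow can actually see a GPU in the runtime:

import tensorflow as tf

# Should list at least one GPU device when the Colab runtime type is GPU.
print(tf.config.list_physical_devices('GPU'))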
Technology used
- Python
- TensorFlow
Generating an image with AI from the “Be Here Now” line “The big ice cream cone in the sky”
Stable Diffusion
Let’s install stable-diffusion-tensorflow.
!pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
!pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
!apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2
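Since libcudnn8 is pinned for CUDA 11.2 above, it can help to print the installed versions and confirm they line up:

import tensorflow as tf
import tensorflow_addons as tfa

print(tf.__version__)   # TensorFlow version
print(tfa.__version__)  # tensorflow_addons version
# CUDA version TF was built against (the key may be absent on CPU-only builds).
print(tf.sysconfig.get_build_info().get("cuda_version"))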
Let’s generate.
from stable_diffusion_tf.stable_diffusion import Text2Image
from PIL import Image
generator = Text2Image(
    img_height=512,
    img_width=512,
    jit_compile=False,
)
img = generator.generate(
    "The big ice cream cone in the sky",
    num_steps=50,                      # number of denoising steps
    unconditional_guidance_scale=7.5,  # how strongly to follow the prompt
    temperature=1,
    batch_size=1,
)
pil_img = Image.fromarray(img[0])
display(pil_img)
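Generation takes a while, so it can be worth saving the result before moving on (the filename is just an example):

# Save the Stable Diffusion output so it can be reloaded later
# without re-running the 50 denoising steps.
pil_img.save("big_ice_cream_cone.png")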
A large ice cream cone floating in the sky is generated. I’m already impressed at this point.
DeepDream
Import everything.
import tensorflow as tf
import numpy as np
import matplotlib as mpl
import IPython.display as display
import PIL.Image
Define helpers to convert the image to an array, de-process it, and display it.
def download(img, max_dim=None):
    # The image is already a PIL image here, so just convert it to a
    # NumPy array (max_dim is kept for interface compatibility but unused).
    return np.array(img)

def deprocess(img):
    # Undo InceptionV3 preprocessing: map values from [-1, 1] back to [0, 255].
    img = 255 * (img + 1.0) / 2.0
    return tf.cast(img, tf.uint8)

def show(img):
    display.display(PIL.Image.fromarray(np.array(img)))
original_img = download(pil_img, max_dim=500)
show(original_img)
Create the base model (InceptionV3 pretrained on ImageNet).
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
Create a DeepDream model.
names = ['mixed3', 'mixed5']
layers = [base_model.get_layer(name).output for name in names]
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
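mixed3 and mixed5 are just one choice: InceptionV3 has eleven mixed blocks (mixed0 through mixed10), where lower layers respond to simple textures and higher ones to more object-like patterns. You can list them and experiment:

# List every 'mixed' concatenation layer in InceptionV3; swapping
# other names into `names` above changes the character of the dream.
print([layer.name for layer in base_model.layers if layer.name.startswith('mixed')])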
Create a loss function.
def calc_loss(img, model):
    # Pass the image through the model and sum the mean activation of each
    # chosen layer; gradient ascent on this loss amplifies the patterns
    # those layers respond to.
    img_batch = tf.expand_dims(img, axis=0)
    layer_activations = model(img_batch)
    if len(layer_activations) == 1:
        layer_activations = [layer_activations]

    losses = []
    for act in layer_activations:
        loss = tf.math.reduce_mean(act)
        losses.append(loss)

    return tf.reduce_sum(losses)
Create a DeepDream class.
class DeepDream(tf.Module):
    def __init__(self, model):
        self.model = model

    @tf.function(
        input_signature=(
            tf.TensorSpec(shape=[None, None, 3], dtype=tf.float32),
            tf.TensorSpec(shape=[], dtype=tf.int32),
            tf.TensorSpec(shape=[], dtype=tf.float32),
        )
    )
    def __call__(self, img, steps, step_size):
        loss = tf.constant(0.0)
        for n in tf.range(steps):
            with tf.GradientTape() as tape:
                # `img` is a plain tensor, not a Variable, so watch it explicitly.
                tape.watch(img)
                loss = calc_loss(img, self.model)

            # Gradient ascent: normalize the gradients, then step in the
            # direction that increases the loss, strengthening the patterns.
            gradients = tape.gradient(loss, img)
            gradients /= tf.math.reduce_std(gradients) + 1e-8

            img = img + gradients * step_size
            img = tf.clip_by_value(img, -1, 1)

        return loss, img
Instantiate the class.
deepdream = DeepDream(dream_model)
Create a DeepDream execution function.
def run_deep_dream_simple(img, steps=100, step_size=0.01):
    # Convert the image to the [-1, 1] range InceptionV3 expects.
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    img = tf.convert_to_tensor(img)
    step_size = tf.convert_to_tensor(step_size)
    steps_remaining = steps
    step = 0
    while steps_remaining:
        # Run at most 100 steps per call so progress is shown along the way.
        if steps_remaining > 100:
            run_steps = tf.constant(100)
        else:
            run_steps = tf.constant(steps_remaining)
        steps_remaining -= run_steps
        step += run_steps

        loss, img = deepdream(img, run_steps, tf.constant(step_size))

        display.clear_output(wait=True)
        show(deprocess(img))
        print("Step {}, loss {}".format(step, loss))

    result = deprocess(img)
    display.clear_output(wait=True)
    show(result)

    return result
Unstable Illusion
Let’s generate.
dream_img = run_deep_dream_simple(
    img=original_img,
    steps=100,
    step_size=0.01
)
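run_deep_dream_simple returns a uint8 tensor, so the result can be saved in the same way as before (again, the filename is just an example):

# Convert the uint8 tensor to a NumPy array for PIL and save it.
PIL.Image.fromarray(np.array(dream_img)).save("unstable_illusion.png")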
Let’s strengthen the patterns by running DeepDream at several scales (octaves): patterns found at small scales get reinforced and detailed at larger ones.
import time
start = time.time()

OCTAVE_SCALE = 1.30

img = tf.constant(np.array(original_img))
base_shape = tf.shape(img)[:-1]
float_base_shape = tf.cast(base_shape, tf.float32)

# Dream at five scales, from smaller to larger.
for n in range(-2, 3):
    new_shape = tf.cast(float_base_shape * (OCTAVE_SCALE ** n), tf.int32)
    img = tf.image.resize(img, new_shape).numpy()
    img = run_deep_dream_simple(img=img, steps=50, step_size=0.01)

display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img / 255.0, dtype=tf.uint8)
show(img)

end = time.time()
end - start
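For denser or sparser hallucinations, the knobs to turn are OCTAVE_SCALE, the octave range, and the steps per octave (the values below are purely illustrative):

# Example: a wider octave range and more steps per octave give
# denser, more detailed patterns, at the cost of runtime.
img = tf.constant(np.array(original_img))
for n in range(-3, 4):  # wider than range(-2, 3) above
    new_shape = tf.cast(float_base_shape * (OCTAVE_SCALE ** n), tf.int32)
    img = tf.image.resize(img, new_shape).numpy()
    img = run_deep_dream_simple(img=img, steps=100, step_size=0.01)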
Conclusion
Spiritual-ish…