Have you heard about text-to-image models like DALL-E 2, Stable Diffusion and MidJourney? These are AI algorithms that take as input a piece of text (the “prompt”) describing the picture you want, and generate that picture as output, drawing on training on billions of images.
An example could be “an astronaut on a bicycle on the moon by Van Gogh”. And this would be one of the results:
I got access to DALL-E 2 in July this year. DALL-E 2 is a closed-source algorithm made by OpenAI. You can sign up to request access. Once you get access you can use it for free for a limited number of runs. After that you have to pay to use it more.
Then, in August, Stable Diffusion, an open-source text-to-image model, was released. You can install it on your own computer and use it completely for free, and since installing it I’ve used it endlessly.
Here is another example of a Stable Diffusion prompt I’ve used: “A lander on the planet Proxima Centauri b, cinematic photo, highly detailed, cinematic lighting, ultra-detailed, ultrarealistic, photorealism, 8k, octane render”:
Also, you can insert an image of your own (512×512 pixels) and ask the algorithm to change it based on your prompt. In the example below I put in, as the source, a real photo of myself in cycling gear with tulip fields in the background.
Combined with the text prompt “A medieval portrait” this was one of the results:
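Since the image-to-image input here needs to be 512×512 pixels, the source photo has to be cropped square and resized first. Here is a minimal sketch of that preparation step using Pillow (the file names are just placeholders, not my actual files):

```python
from PIL import Image

def prepare_init_image(path, out_path, size=512):
    """Center-crop a photo to a square and resize it to size x size."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((size, size), Image.LANCZOS)
    img.save(out_path)
    return img

# prepare_init_image("cycling_photo.jpg", "init_512.png")
```

The center crop keeps the middle of the photo, which worked fine for a portrait shot; for other compositions you may want to crop by hand instead.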
How to run Stable Diffusion
Stable Diffusion is open source, so you can install it on your own computer. That computer does need a good GPU, or an Apple M1 or M2 processor. I happen to have a MacBook Pro with an M1 processor.
There is a lot of Python code to run Stable Diffusion ready to download. At first I followed these instructions on Twitter to install Stable Diffusion. That worked well, except for the image-to-image part; I got some Python errors there that I could not solve.
Then I found ImaginAIry by Bryce Drennan. And that worked quite well for me.
The Death by Powerpoint image
It’s a lot of fun coming up with interesting prompts to set Stable Diffusion to work. But it’s not just for entertainment. The images Stable Diffusion generates are yours to use and don’t have copyright issues. (Although I’m expecting some text-to-image-related lawsuits in the future. But that’s a different story.)
I think Stable Diffusion could be an excellent source of images to use in presentations. For a while I’ve wanted to bring a presentation of mine called “How to sell security/data governance” to YouTube. One example of an image I want to use in it is one that depicts the concept of “Death by Powerpoint”. I had some images like that from comics, and I tried to reach one of the creators to ask if I could use their material or somehow pay for usage. But I never got a response.
So this looks like a nice use case for Stable Diffusion. First I tried just entering “Death by Powerpoint” as a prompt. Well, Stable Diffusion certainly understands the concept of Powerpoint. It just does not convey the “Death by ...” part that well.
The algorithm doesn’t understand text
This is one typical example of what you get when you enter “Death by Powerpoint” as a prompt:
This result shows an important limitation of most text-to-image models: they don’t understand text. Oh sure, they interpret the text in your prompt. But they don’t understand the text they display. They put letters that are “often seen together” next to each other. Even the font will look believable. Any resemblance to real words is purely coincidental, though.
Some people are working to improve this. But I’ll just keep on working with what I have right now.
Moving on. Some “Death by Powerpoint” depictions are from webcomics. What if I use “Death by powerpoint comic style” as a prompt?
Okay. Clearly a bad idea, since lots of comics have text and the algorithm doesn’t understand it. Can we tell it not to use text, then? Let’s try the prompt “Death by powerpoint comic style no text”.
Clearly that “no text” part was ignored. I’ve learned there should be a way to use negative prompts in Stable Diffusion, but that hasn’t been built into ImaginAIry yet.
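As I understand it, a negative prompt plugs into the same classifier-free guidance step that the normal prompt uses: instead of steering the denoiser away from the prediction for an empty prompt, you steer it away from the prediction for the text you don’t want. Here is a toy sketch of just that guidance arithmetic, with plain numbers standing in for the model’s noise predictions (so this is only the formula, not a working sampler):

```python
def guided_prediction(uncond, cond, scale):
    """Classifier-free guidance: push the prediction away from
    `uncond` and towards `cond` by the guidance scale."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Normally `uncond` is the noise prediction for an empty prompt.
empty = [0.0, 0.0, 0.0]
positive = [1.0, 2.0, -1.0]   # made-up prediction for the wanted prompt
print(guided_prediction(empty, positive, scale=7.5))  # [7.5, 15.0, -7.5]

# With a negative prompt, the prediction for the unwanted text
# (e.g. "text, letters, captions") replaces the empty-prompt one,
# so each sampling step is pushed away from images containing it.
negative = [0.5, -0.5, 0.2]
print(guided_prediction(negative, positive, scale=7.5))
```

That substitution is the whole trick, which is why negative prompts are cheap to add once a code base exposes the unconditional embedding.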
But maybe I should try a different approach: describe exactly what I want. So what does “death by powerpoint” look like? Well, I imagine people in a meeting room bored to tears while they listen to a speaker. So let’s ask for that.
Okay, not bad. But the faces are a little off. Actually, in some results the faces can be way off, because these algorithms don’t exactly understand human faces either. Luckily ImaginAIry has a --fix-faces option.
Also, let’s “business-ify” the people in the image to some degree. So we need to put that in the prompt as well.
Like a stock photo?
Then I had a great idea. You know where a lot of business people in images show up? In stock photos! So let’s tell the algorithm to make it more stock photo-like.
But do you know what stock photos also often have? Watermarks! I let Stable Diffusion make 20 images based on the prompt “business people bored to tears in a meeting room attending a presentation, stock photo”. They all had remnants of watermarks. I could even read the watermark: Dreamstime. That is an actual brand of stock photos.
I’m not the first to notice this. It has already appeared on Y Combinator’s Hacker News. I’m not sure how all this ties in with the end results not having copyright issues.
So no stock photos then. Also, the people in the results don’t really seem bored enough. Let’s say that they must be asleep. New prompt: “business people fallen asleep in a meeting room while someone is talking”.
Okay, clearly if you have four-dimensional hands in front of you, no --fix-faces option will save your face.
Can you draw it? Dramatically?
Is a photo really the best option? Or should we ask Stable Diffusion to draw something for us? I regularly look on prompthero.com for inspiration and found this dramatic drawing. And I applied that to my project. Here is “one person is talking in front of a group of business people who have fallen asleep, slumped in their seats in a meeting room. Yoji Shinkawa, ink, black and white”:
Not bad. Not bad at all. Very dramatic. Good for a couple more iterations I’d say.
Why won’t you lie down and sleep?
I’ve done many iterations of prompts with “business people sleeping in a meeting room” now, and I’ve noticed that in many of the resulting images the business people are standing or sitting up. They might look a little drowsy, but they are rarely sleeping.
It’s as if the idea of business people asleep in a meeting room is just... too far-fetched. You want to display a photorealistic image of an astronaut on a horse? An alpine marmot driving a roadster on Mars? A cow-frog hybrid in a clown suit riding a unicycle painted by Rembrandt? Sure. Sleeping business people? Well, that’s just ridiculous.
I mean, do these people look like they’re sleeping to you?
And to be sure: as soon as I drop the business part, there you go: people who are actually sleeping.
Pencil drawings tend to work quite nicely:
I’ve heard that results will get better if you feed the end result back in as the input for an image-to-image run. So I gave that a try: generate 20-30 images, pick the result most to my liking, and put it back in. That works quite well. Here you see how the image changes over 5 iterations.
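The feedback loop itself is simple. Here is a sketch of its structure in Python, where `generate_images` is a stand-in for whatever actually runs Stable Diffusion (in my case ImaginAIry) and `pick_best` is the manual step of choosing your favourite; both are placeholders I made up for this sketch, not real API calls:

```python
def iterate_img2img(prompt, init_image, generate_images, pick_best,
                    iterations=5, batch_size=20):
    """Repeatedly feed the best result back in as the init image."""
    current = init_image
    history = [init_image]
    for _ in range(iterations):
        batch = generate_images(prompt, init_image=current, n=batch_size)
        current = pick_best(batch)   # in practice: a human choosing
        history.append(current)
    return history

# Toy stand-ins so the loop can run without a GPU:
gen = lambda prompt, init_image, n: [f"{init_image}+v{i}" for i in range(n)]
best = lambda batch: batch[0]
print(iterate_img2img("sleeping people, pencil drawing", "seed.png",
                      gen, best, iterations=3, batch_size=4))
```

Keeping the whole history around is what lets you lay the iterations side by side afterwards, like in the series of images here.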
Hard to pick a winner, because I feel I haven’t quite arrived at the result I was looking for. But after 40 runs of Stable Diffusion it’s time to call it quits.
And I guess the image below comes close as a depiction of an audience enduring a long, agonising and boring presentation.