Accessibility and AI-generated art

Looking at the overlap of good accessibility and creating art using Artificial Intelligence.

Vishal Ramawat
UX Collective


Even if you already know about accessibility, there is always more to learn, and it is easy to be awed by the different solutions, in both the digital and physical realms, that make life easier for people with special needs.

The UN Convention on the Rights of Persons with Disabilities recognizes access to the physical environment, to transportation, and to information and communications, including information and communications technologies and systems, including the Web, as a basic human right.

This new James Webb Telescope near and mid-infrared composite image highlights the Cartwheel Galaxy, the result of a high-speed collision that occurred about 440 million years ago, along with two neighboring galaxies.
James Webb captured the Cartwheel Galaxy. NASA

How would you explain this image to someone over a voice call? How does a person who has a visual impairment, who uses a screen reader, or whose browser does not load images (intentionally or unintentionally) know what the image conveys? This is how NASA did it when it tweeted this image with the following Alt text:

A large galaxy on the right, with two much smaller companion galaxies to the left at 10 o’clock and 9 o’clock. The large galaxy resembles a speckled wheel, with an oval outer ring and a small, off-center inner ring. The outer ring contains pink plumes like wheel spokes, with dusty blue regions in between. The pink areas are silicate dust, while the blue areas are pockets of young stars and hydrocarbon dust. The inner ring is smoother, filled in with a more uniform pale pink. This smaller ring is interwoven with thin, orange-pink threads. On the galaxy’s right edge, a bright white star with 8 diffraction spikes shines. The two companion galaxies to the left, one above the other, are about the same size and both spiral galaxies. The galaxy above is a reverse S shape but similar in coloring and texture as the large ring galaxy. The galaxy below is smoother and largely white, with a blue tinge. The background is black and full of more distant, orange-red colored galaxies of various sizes.

Most of us know more or less what ‘Alt text’ (a contraction of ‘Alternative Text’) is, but here is a quick summary. Alt text is a short, concise description of an image that, for various reasons, cannot be ‘seen’ by the end user. The way NASA writes the Alt text for its images makes the images, along with the science behind them, accessible to everyone. Even people who don’t have any visual impairment can learn more by reading the scientific details behind the different shapes and colors in the image.

Alt text not only helps accessibility but also adds to the Search Engine Optimisation of web pages. I will not go into more detail about Alt text here; instead, I will now talk about generating art with the assistance of Artificial Intelligence.

Comparison showing that without Alt Text, screen readers read out an irrelevant image name like img584792.jpg, while with Alt Text, screen readers are able to call out that it is an image of a Cute Cat.
Alt Texts are helpful for Screen readers, Search Engines, Topical Relevance and other scenarios where images fail to load. Source: ahrefs
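
To make the comparison above concrete, here is a minimal Python sketch that scans a page's HTML for images with no alt attribute at all, which is the case where a file name like img584792.jpg is all a screen reader has to work with. It uses only the standard library's html.parser module. The class and function names are hypothetical, chosen for illustration, and an empty alt attribute is deliberately not flagged, on the assumption that it marks a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: alt="" is allowed because it marks a decorative image.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def find_images_missing_alt(html_text):
    """Return the src values of images that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


page = '<img src="img584792.jpg"><img src="cat.jpg" alt="A cute cat">'
print(find_images_missing_alt(page))
```

A check like this only tells you that a description is missing. Whether an existing description is actually good still needs a human reader.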

If you use any social media, personal or professional, you have probably come across posts of people creating amazing images with tools like Midjourney, DALL-E, NightCafe and several others. It is mesmerising to see how A.I. is evolving every day, extending human imagination beyond reality. In simple terms, we now have the ability to materialise our thoughts with the least possible effort in the least possible time. Painting in oils, making a sculpture, or composing digital art: different mediums and techniques let you materialise your thoughts, but they consume a lot of time, material and patience.

Behold, the era of A.I.-generated art is here. Using natural language descriptions, known as prompts, users can now ‘tell’ A.I. to generate art from text. Not only can you generate photorealistic images, you can also restore old family photos, upscale images, colorise black-and-white photos, and generate sounds, voices, dance moves, music and much more, simply by writing prompts or using assistive modes. You can combine different images, styles, artists, concepts and attributes to create different results and effects.

New models such as GANs, VAEs, autoregressive models, CLIP+VQGAN and diffusion models like Stable Diffusion are constantly pushing the boundaries of A.I.-generated art. Google has explained these technologies in detail in a blog post.

Once you start using these tools, you will realise how much it matters to be able to write a good prompt. The more descriptive and contextual your prompt, the closer the generated image looks to what you imagined.

Two ladies in traditional Indian attire standing in front of small cottages in an Indian village, overlooking misty forests. Art style by greg rutkowski, ross tran and fenghua zhong.
Prompt: Kerala village, sharp focus, wide shot, trending on artstation, masterpiece, by greg rutkowski, ross tran and fenghua zhong. octane, soft render, oil on canvas, colorful, cinematic, environmental concept art. Sampling: K_LMS
Two ladies in traditional Indian attire standing in front of small cottages in an Indian village, overlooking misty forests. Art style by greg rutkowski, ross tran and fenghua zhong. This image uses a different sampling method than the previous one.
Prompt: Kerala village, sharp focus, wide shot, trending on artstation, masterpiece, by greg rutkowski, ross tran and fenghua zhong. octane, soft render, oil on canvas, colorful, cinematic, environmental concept art. Sampling: K_EULER_ANCESTRAL

The two images above share the same context, “Kerala Village”, and the same seed value, but use different sampling methods. Notice how similar the two images look, even though minor differences exist.

A group of villagers standing around a bonfire illuminated place beside a tree and a big village cottage that has Japanese architectural influence. Art style by Bryan Hitch, Noriyoshi Ohrai, Vincent Di Fate, Victo Ngai, Arai Yoshimune and Junji Ito
Prompt: Kerala village by Bryan Hitch, Noriyoshi Ohrai, Vincent Di Fate, Victo Ngai, Arai Yoshimune, Junji Ito, hyperdetailed, trending on Artstation, VRay, expansive, 8K resolution, subtractive lighting, romanticism, poster art, epic, surrealism

The image above also has the same context, “Kerala Village”, but different styles, attributes and seed. The way A.I. interprets a prompt is influenced by the datasets on which it has been trained. The same context can be depicted in an enormous number of ways, since it is a matter of the permutations and combinations that one can play around with.
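
The variations in the three images can be sketched as plain data. The Python below is only an illustration under stated assumptions: GenerationRequest is a hypothetical container, not the API of NightCafe or any other tool; the seed values are placeholders, because the real ones are not given here; and the prompt format is simplified to "subject by artists, attributes".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical bundle of the settings varied in the examples above."""
    subject: str
    artists: tuple = ()
    attributes: tuple = ()
    sampler: str = "K_LMS"
    seed: int = 0

    def prompt(self) -> str:
        # Subject first, then the artists, then the remaining attributes.
        head = self.subject
        if self.artists:
            head += " by " + ", ".join(self.artists)
        return ", ".join([head, *self.attributes])


# Placeholder seed: the actual value used for the images is not stated.
base = GenerationRequest(
    subject="Kerala village",
    artists=("greg rutkowski", "ross tran", "fenghua zhong"),
    attributes=("sharp focus", "wide shot", "oil on canvas"),
    sampler="K_LMS",
    seed=1234,
)

# Same prompt and seed, different sampler: the first two images.
resampled = replace(base, sampler="K_EULER_ANCESTRAL")

# Same subject, different artists, attributes and seed: the third image.
restyled = replace(
    base,
    artists=("Bryan Hitch", "Victo Ngai", "Junji Ito"),
    attributes=("hyperdetailed", "poster art", "surrealism"),
    seed=5678,
)

for request in (base, resampled, restyled):
    print(request.sampler, request.seed, request.prompt())
```

Keeping the subject fixed and swapping one field at a time makes it easier to see which setting is responsible for a given change in the result.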

So where do good accessibility and A.I.-generated art overlap? When I submitted my previous story to the publication UX Collective, the editor, Fabricio Teixeira, reviewed it and asked me to add Alt Text to my images (thanks Fabricio!!). As I started writing the Alt Text, I realised that this is exactly what I had been doing while creating my digital artwork with the A.I. art generation tool NightCafe. There, I was describing to the A.I. what I had in mind in the best possible combination of words. Here, while writing the Alt Text, I was doing the same thing: explaining what the image portrays or is trying to convey. This triggered my interest in exploring more about Alt Text and how A.I.-generated art works.

It is really interesting to see how two different tangents of the physical and digital realms cross each other at the intersection of accessibility!
