CSIP: PRELIMS BOOSTER SERIES- 143 SCIENCE AND TECHNOLOGY

News

MULTIMODAL AI

Multimodal AI systems enable users to interact with AI through various modalities, including images, sounds, videos, and text. This approach mirrors how humans understand and interpret information from diverse sources.

How Multimodal Artificial Intelligence Works:

Multimodal AI is trained on combinations of modalities, such as text paired with images or text paired with audio. This equips AI systems to understand and generate content across modalities. For instance, OpenAI’s DALL-E model links text and images so that it can generate visual content from text prompts, while GPT relies on Whisper, OpenAI’s speech-to-text model, to process voice inputs.
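
As a rough illustration of what linking text and images can mean, the sketch below assumes that both modalities have already been converted into numeric feature vectors in a shared space, and then picks the caption whose vector lies closest to an image's vector. The file names, captions, and vectors are made up for illustration, and this is not how DALL-E or any other named model is implemented; real systems learn such vectors from large sets of paired examples.

```python
import math

# Hypothetical, hand-made feature vectors standing in for the outputs of
# trained image and text encoders. The numbers are illustrative only.
IMAGE_FEATURES = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0],
    "photo_of_car.jpg": [0.0, 0.2, 0.9],
}
TEXT_FEATURES = {
    "a dog playing in a park": [0.8, 0.2, 0.1],
    "a car on a highway": [0.1, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_name):
    """Pick the caption whose text vector is most similar to the image vector."""
    image_vec = IMAGE_FEATURES[image_name]
    return max(
        TEXT_FEATURES,
        key=lambda caption: cosine_similarity(image_vec, TEXT_FEATURES[caption]),
    )


if __name__ == "__main__":
    for name in IMAGE_FEATURES:
        print(name, "->", best_caption(name))
```

The point of the sketch is only that once text and images share one vector space, matching a picture to a description reduces to comparing vectors.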

Applications of Multimodal AI:

Multimodal AI has diverse practical applications, including:

  • Automatic image caption generation.
  • Detection of hate speech in memes.
  • Prediction of dialogue in videos.
  • Potential applications in fields like medicine, autonomous driving, and robotics.

In medicine, multimodal AI is valuable for processing complex datasets from sources like CT scans. For speech translation, AI models can perform a wide range of translations, such as text-to-speech, speech-to-text, speech-to-speech, and text-to-text, for multiple languages.

Additional AI Models

GEMINI: Gemini is Google’s next-generation foundation model, succeeding PaLM 2, the current AI model behind Google’s Bard chatbot and other recently announced features.

STABLE DIFFUSION: Stable Diffusion is a latent diffusion model, a type of deep generative artificial neural network used for generating and editing realistic images.

MIDJOURNEY: Midjourney is a generative AI program and service hosted by the independent research lab Midjourney, Inc. It generates images from natural language descriptions, known as “prompts,” similar to OpenAI’s DALL-E and Stability AI’s Stable Diffusion.

WHISPER: Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It’s trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

To know more: https://www.thehindu.com/sci-tech/technology/what-is-multimodal-artificial-intelligence-and-why-is-it-important/article67401139.ece