cross-posted from: https://lemmy.world/post/1134694
KOSMOS-2: Microsoft’s New AI Breakthrough Generating Text, Images, Video & Sound in Real-Time!
Microsoft has unveiled KOSMOS-2, a multimodal large language model (MLLM). It has been described as able to generate text, images, video, and sound in real time[1], though the Microsoft Research publication describes a narrower capability: perceiving object descriptions such as bounding boxes and grounding text to the visual world[2]. The model is grounded in the real world through its ability to understand and analyze image content[4]. It was trained on GrIT, a large-scale dataset of grounded image-text pairs[2].
KOSMOS-2 is described as a significant step forward in AI technology because it works across multiple modalities[6]. It could benefit computer vision applications by improving the efficiency, accuracy, and accessibility of image and video processing[3].
This breakthrough is a testament to Microsoft’s commitment to advancing AI technology and its potential to transform industries across the board. We can’t wait to see what the future holds with KOSMOS-2!
Citations:
[1] https://youtube.com/watch?v=VxsqtoytLsA
[2] https://www.microsoft.com/en-us/research/publication/kosmos-2-grounding-multimodal-large-language-models-to-the-world/
[3] https://azure.microsoft.com/en-us/blog/announcing-a-renaissance-in-computer-vision-ai-with-microsofts-florence-foundation-model/
[4] https://arstechnica.com/information-technology/2023/03/microsoft-unveils-kosmos-1-an-ai-language-model-with-visual-perception-abilities/
[5] https://www.linkedin.com/posts/trishuhl_generativeai-multimodal-ai-activity-7040590986057564160-ImJp
[6] https://www.cjco.com.au/article/news/unleashing-the-power-of-kosmos-2-a-leap-forward-in-ai-tech-with-grounded-multimodal-language-models/
Is there somewhere we can see a demo?
I found this that says it offers an online demo, but the actual demo link wouldn’t work from my computer. Maybe you’ll have success.
502 still, but thanks for the link! This is super interesting. I’m curious to see where it goes.
Are they making this open source as well?