Elon Musk's OpenAI rival xAI has released the first version of Grok that can handle visual data. Grok-1.5V, the company's first-generation multimodal AI model, can process not just text but also "documents, diagrams, charts, screenshots, and photographs." In its launch announcement, xAI offered some examples of real-world applications for these capabilities. For instance, you can ask Grok to write a story based on a drawing, translate a photo of a flowchart into Python code, or even explain a meme you don't understand. Hey, not everyone has the time to read everything that appears on the internet.
Only a few weeks have passed since the company unveiled Grok-1.5. That model was designed to outperform its predecessor in math and coding tasks and to handle longer contexts, allowing it to cross-check information from several sources and gain a deeper understanding of specific queries. According to xAI, Grok-1.5V's capabilities will soon be available to early testers and current users, though it did not say exactly when.
In addition to introducing Grok-1.5V, the company has also released a benchmark dataset it calls RealWorldQA. Any of RealWorldQA's 700 images can be used to evaluate AI models: each item comes with a question and an easily verifiable answer, yet can still stump multimodal models like Grok. xAI claimed its model earned the highest score when the company tested it on RealWorldQA against competitors such as OpenAI's GPT-4V and Google's Gemini Pro 1.5.
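Because the answers are described as easily verifiable, scoring a model on a dataset like this comes down to a simple accuracy loop. The sketch below is only an illustration of that idea, not xAI's actual evaluation harness: the file name, JSON layout, and `model_answer_fn` callback are all hypothetical stand-ins, since the article does not describe RealWorldQA's format or any API.

```python
import json

def evaluate(model_answer_fn, dataset_path="realworldqa.json"):
    """Score a multimodal model on image/question/answer items.

    `dataset_path` and `model_answer_fn` are hypothetical: this is a
    generic exact-match accuracy loop, not xAI's published tooling.
    """
    with open(dataset_path) as f:
        # Assumed layout: [{"image": ..., "question": ..., "answer": ...}, ...]
        items = json.load(f)

    correct = 0
    for item in items:
        prediction = model_answer_fn(item["image"], item["question"])
        # Answers are meant to be easily verifiable, so normalized
        # exact-match comparison is enough for this sketch.
        if prediction.strip().lower() == item["answer"].strip().lower():
            correct += 1

    return correct / len(items)
```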