Elon Musk's company xAI has introduced a new model of the Grok chatbot, capable of processing requests in various formats.
The presentation took place a few weeks after the release of the previous version.
"Grok-1.5V competes with existing multimodal models in a number of areas, from interdisciplinary reasoning to understanding scientific diagrams, graphs and screenshots," the blog says.
The developers provided several examples in the press release demonstrating the new features of the chatbot:
converting a flowchart sketch into Python code;
generating a bedtime story from a child's drawing;
explaining memes;
converting a table to CSV format.
An example of translating a flowchart sketch into Python code. Data: xAI.
Having tested Grok-1.5V against its counterparts GPT-4V, Claude 3 Sonnet, Claude 3 Opus and Gemini Pro 1.5, xAI claims that its multimodal model holds a leading position in many respects.
Comparison of AI models. Data: xAI.
The company's representatives emphasized that Grok-1.5V surpasses its competitors on RealWorldQA, a new benchmark created to assess spatial understanding of the real world.
Examples from RealWorldQA. Data: xAI.
The benchmark consists of more than 700 images, each accompanied by a question and an answer. xAI has made RealWorldQA publicly available under a Creative Commons license.
Grok-1.5V arrived less than a month after xAI released its Grok-1 model as open source.
According to the developers, "significant" updates will be made in the coming months to the chatbot's capabilities for understanding and generating multimodal signals.
Early testers and current users will have access to Grok-1.5V soon.
As a reminder, in December 2023 xAI notified the SEC of its plans to raise $1 billion through a private sale of equity securities.