Gemini 1.5: A New Stage in the Development of Google's Artificial Intelligence
Gemini (Generalized Multimodal Intelligence Network), developed by Google, is a multimodal AI system capable of processing multiple types of data and handling a wide range of tasks simultaneously. Gemini can work with text, images, audio, video, 3D models, and even graphics.
Furthermore, Gemini is not a single model but a network of models, each contributing to the overall performance of the system. The models in the network collaborate, sharing information and learning from one another, which makes Gemini an exceptionally versatile and powerful AI tool.
The Gemini 1.0 Pro and Nano versions were released at the end of last year, and last week Google introduced Gemini 1.0 Ultra along with the latest performance updates to its products. Cloud developers and customers can now get started with Gemini 1.0 Ultra and the Gemini API in AI Studio and Vertex AI. Google also announced the next generation of the model, Gemini 1.5, which delivers significantly higher performance and is built on a new Mixture-of-Experts (MoE) architecture.
The first model of this generation, Gemini 1.5 Pro, is a mid-size multimodal model optimized for a wide range of tasks, with an experimental long-context understanding capability.
Gemini 1.5 Pro ships with a default context window of 128,000 tokens and is available for testing with a context window of up to 1 million tokens via AI Studio and Vertex AI. Google is actively optimizing the model to reduce latency, lower computational requirements, and improve the user experience. The continued development of next-generation models opens up new opportunities for users, developers, and enterprises to adopt artificial intelligence.
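To get a rough feel for what a 128,000-token window means in practice, the sketch below estimates whether a text fits using the common ~4-characters-per-token heuristic. This is only an approximation (Gemini's actual tokenizer counts differently), and the function name is ours:

```python
# Rough sketch: does a text fit in a given context window?
# Uses the common ~4 characters-per-token heuristic, which is only an
# approximation; real tokenizers (including Gemini's) count differently.

def fits_in_context(text: str, window_tokens: int = 128_000,
                    chars_per_token: float = 4.0) -> bool:
    """Estimate the token count of `text` and compare it to the window size."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= window_tokens

short_doc = "hello world " * 1_000       # ~3,000 estimated tokens
long_doc = "hello world " * 600_000      # ~1.8 million estimated tokens

print(fits_in_context(short_doc))                          # True
print(fits_in_context(long_doc))                           # False
print(fits_in_context(long_doc, window_tokens=2_000_000))  # True
```

The same heuristic explains why moving from 128,000 to 1 million tokens is such a qualitative jump: it shifts the model from "a long report" to "an entire codebase or book" in a single prompt.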
What’s new:
- Gemini 1.5 builds on cutting-edge research into the Transformer and Mixture-of-Experts (MoE) architectures. An MoE model is divided into smaller "expert" subnetworks and selectively activates only the experts most relevant to a given input, which makes the model substantially more efficient. Google has long been a pioneer in applying MoE techniques to deep learning.
- Gemini 1.5 Pro has an extended context window that allows it to process huge amounts of information at once. The model performs strongly on text, code, image, audio, and video analysis benchmarks, significantly outpacing previous versions in efficiency. It also demonstrates impressive in-context learning: the ability to pick up new skills from information supplied in the prompt.
- Gemini 1.5 Pro maintains high performance even as the context window grows. The model shows high accuracy on tasks that require finding specific information in long blocks of text, and it handles machine translation into a wide range of languages, including very low-resource ones. Google continues to develop new benchmarks aimed at a deeper and broader evaluation of the model's capabilities.
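To make the MoE routing idea above concrete, here is a minimal, self-contained sketch of top-1 expert routing in NumPy. All weights, dimensions, and names here are illustrative placeholders, not anything from Gemini's actual architecture:

```python
# Minimal sketch of Mixture-of-Experts (MoE) top-1 routing with NumPy.
# Weights and dimensions are illustrative placeholders only.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 8, 4

# Each "expert" is a small linear layer; the router scores experts per token.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token (row of x) through its single highest-scoring expert."""
    scores = x @ router               # (n_tokens, n_experts) routing scores
    choice = scores.argmax(axis=1)    # top-1 expert index for each token
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = x[i] @ experts[e]    # only the chosen expert runs per token
    return out

tokens = rng.standard_normal((5, d_model))
y = moe_layer(tokens)
print(y.shape)  # (5, 8)
```

The efficiency gain comes from the fact that each token exercises only one expert's parameters instead of the full network; production MoE systems typically route to the top-k experts with learned, softmax-normalized gates rather than a hard argmax.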
Google also conducts ethics and safety testing, which includes rigorous model evaluations and the integration of their results into its governance processes. Before the release of 1.5 Pro, thorough assessments of content safety and representational harms were carried out, and new evaluations were developed specifically to account for the model's advanced capabilities.
A limited preview of 1.5 Pro is available to developers and enterprise customers through AI Studio and Vertex AI. The model comes with a default context window of 128,000 tokens; pricing tiers are planned that start at this size and scale up to 1 million tokens.
Early testers can try the 1 million token context window at no cost during the testing period. Significant improvements in request-processing speed are also planned. Developers can sign up to test 1.5 Pro in AI Studio, and enterprise customers can contact the Vertex AI team.
You can learn more about Gemini’s capabilities on the official website.