USA: The potential of ChatGPT is mesmerizing the world. Many believe it signals the end of Google's hegemony in the search market. Microsoft recently revealed its intention to integrate ChatGPT into Bing Search and its Azure cloud service, which has further worried Google.
To address the concerns of investors, Google has published a blog post detailing its research and development in AI and machine learning (ML).
Most discussions of AI currently focus on language models. Large language models have surprised everyone with their ability to produce "coherent, relevant, and natural-sounding responses" and to accomplish a wide variety of tasks, including writing code, creating content, and providing complex answers.
LaMDA, a reportedly "sentient" machine-learning language model being developed by Google, is trained on dialogue.
With LaMDA, the company is investigating how a language model can be used for safe and immersive multi-turn dialogue. ChatGPT has demonstrated how easily it can sustain conversations over multiple turns.
However, some of ChatGPT's answers veer into dangerous territory. In this regard, Google's emphasis on safe and concrete responses could benefit the company in the AI race.
Google is also working on PaLM (Pathways Language Model), a 540-billion-parameter language model built on the company's Pathways software infrastructure.
According to Google, the work on PaLM has shown that large language models trained on "multilingual data and large amounts of source code" can perform a variety of tasks, even if they have not been specifically trained to do so.
One of the biggest problems in AI is multi-step reasoning. Getting an AI system to break a complex problem down into smaller tasks, and then combine the solutions to those tasks to tackle the larger problem, is not as easy as it sounds. Google is developing "chain of thought prompting", which encourages language models to outline the steps needed to reach an answer.
The use of "chain of thought prompting", according to Google, will enable the language model to produce "more structured, organized and accurate responses".
According to the company, this method increases the likelihood that the models will arrive at the correct solutions to complex problems that require multiple stages of inference. This will be particularly useful in tackling challenging scientific and mathematical problems.
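As a rough illustration of the idea, a chain of thought prompt can be assembled by showing the model a worked example whose answer spells out its intermediate steps, followed by the new question. The sketch below only builds the prompt text. The `Exemplar` class, the `build_cot_prompt` function, and the "Q:/A:" layout are hypothetical choices made for this example, not Google's implementation.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example: a question, its intermediate steps, and the answer."""
    question: str
    steps: list[str]
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Build a few-shot prompt whose examples spell out their reasoning."""
    parts = []
    for ex in exemplars:
        reasoning = " ".join(ex.steps)
        parts.append(f"Q: {ex.question}\nA: {reasoning} The answer is {ex.answer}.")
    # The new question ends with an open "A:" so that the model continues
    # with its own step-by-step reasoning before stating an answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical worked example used as the demonstration.
example = Exemplar(
    question="A shelf holds 3 boxes with 4 books each. How many books are there?",
    steps=["There are 3 boxes.", "Each box holds 4 books.", "3 * 4 = 12."],
    answer="12",
)
prompt = build_cot_prompt(
    [example], "A bus has 5 rows of 2 seats. How many seats are there?"
)
print(prompt)
```

The resulting text would then be sent to whichever language model is in use. The demonstration is what nudges the model to write out its steps instead of jumping straight to a final number.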
The computer vision field of AI is developing rapidly. It focuses on emulating the complexity of the human vision system so that computers can recognize and process objects in a similar way to humans.
Google has made significant contributions to this field, pioneering the use of the Transformer architecture in computer vision applications in place of convolutional neural networks.
Google is developing several models for computer vision. MaxViT (Multi-Axis Vision Transformer) combines both local and non-local information in a vision model.
According to Google, this approach has been shown to outperform other models on ImageNet-1k (a core benchmark dataset for computer vision models) classification and on various object detection tasks, at very low computational cost.
With Pix2Seq, the tech giant approaches object detection from a different angle. In contrast to the traditional task-specific approach, Google treats object detection as a language modeling task conditioned on the observed pixel inputs.
The model is trained to describe the location and other attributes of the objects of interest in the image. According to Google, this system has produced competitive results.
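To make "object detection as language modeling" more concrete, the sketch below shows one way bounding boxes could be turned into a sequence of discrete tokens that a language model can learn to emit. The bin count of 1000, the token order (ymin, xmin, ymax, xmax, class), and the placement of class tokens after the coordinate bins are assumptions made for this illustration, not details taken from Google's blog post.

```python
NUM_BINS = 1000  # assumed number of coordinate bins per axis


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    fraction = min(max(value / size, 0.0), 1.0)
    return min(int(fraction * num_bins), num_bins - 1)


def boxes_to_tokens(boxes, width, height, num_bins=NUM_BINS):
    """Flatten (ymin, xmin, ymax, xmax, class_id) boxes into one token sequence."""
    tokens = []
    for ymin, xmin, ymax, xmax, class_id in boxes:
        tokens += [
            quantize(ymin, height, num_bins),
            quantize(xmin, width, num_bins),
            quantize(ymax, height, num_bins),
            quantize(xmax, width, num_bins),
            num_bins + class_id,  # class tokens sit after the coordinate bins
        ]
    return tokens


def tokens_to_boxes(tokens, width, height, num_bins=NUM_BINS):
    """Invert boxes_to_tokens, up to quantization error."""
    boxes = []
    usable = len(tokens) - len(tokens) % 5  # ignore a trailing partial box
    for i in range(0, usable, 5):
        ymin, xmin, ymax, xmax, cls = tokens[i:i + 5]
        boxes.append((
            ymin / num_bins * height,
            xmin / num_bins * width,
            ymax / num_bins * height,
            xmax / num_bins * width,
            cls - num_bins,
        ))
    return boxes


# One object of (hypothetical) class 3 in a 640x480 image.
tokens = boxes_to_tokens([(120, 160, 240, 320, 3)], width=640, height=480)
print(tokens)
print(tokens_to_boxes(tokens, width=640, height=480))
```

Once boxes are expressed this way, a detector can be trained like any sequence model: given the image, predict the next token. No task-specific detection head is needed.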
Understanding the 3D structure of real-world objects from one or a few 2D images is a major challenge in computer vision. With LOLNeRF, Google has made significant progress in tackling this problem.
The model can determine the 3D structure of an object from a single 2D image. This was accomplished by training it on many examples of a specific class of objects.
ML models often focus on a single type of data. In an effort to move forward, Google is investigating multi-modal models, that is, models that can handle multiple modalities.
According to the company, processing each modality separately for a few steps and then combining the features from the different modalities through a bottleneck layer is effective in such scenarios.
After language models, the most common AI models are generative models. The most prominent of these are text-to-image models.
Text-to-image models bring DALL-E and Stable Diffusion to mind. Google's in-house image generation models include Imagen and Parti. The former is based on diffusion, while the latter uses an autoregressive Transformer network.
Creating generative models for video is challenging, primarily because of the additional dimension of time. Google is developing two such models, Phenaki and Imagen Video.
Imagen Video produces high-resolution videos using cascaded diffusion models. The company is still working on the length of the videos that can be produced this way. In contrast, Phenaki is a Transformer-based model.
Google urges responsible AI in its blog post's conclusion. The company stated that leaders in ML and AI "must lead in state-of-the-art approaches to responsibility and implementation, as well as state-of-the-art technologies."
It is unlikely, however, that an appeal to responsible AI will be enough for the company to address concerns that OpenAI is overtaking its competitors in ML and AI.