January 6, 2024 | Posted in News
Google Gemini partially launched this week with claims that a forthcoming high-end version will outperform OpenAI’s GPT-4. But for software developers, choosing an AI coding assistant isn’t just a matter of which foundational model has most recently leapfrogged another.
On the heels of Microsoft adding support for GPT-4 Turbo to Windows 11 and the Bing search engine this week, Google launched a comparable Pro version of Gemini 1.0 for its Bard chatbot and the lower-end Gemini Nano for Pixel 8 Pro phones.
Starting December 13, developers will have access to Gemini Pro in Google AI Studio and Google Cloud Vertex AI. The high-end Gemini Ultra, for which Google published early benchmark results exceeding GPT-4's, will be released early next year.
Google’s AI updates this week also reached deeper layers of the generative AI IT stack. The company released a new version of its tensor processing unit (TPU), which was used to train Gemini, along with a Google Cloud AI Hypercomputer service, a combination of hardware, software and resource management utilities similar to those used to train Gemini. Hypercomputer utilities include a new dynamic resource scheduler to address demand for GPU and TPU capacity in cloud computing.
Until now, Google’s PaLM 2 large language model (LLM) was seen as less mature than OpenAI’s GPT, especially once GPT-4 became available in March. Meanwhile, AWS made strides in its own AI-specific chips with the Trainium and Inferentia product lines, and it formed a partnership with NVIDIA.
With Gemini, Google has arguably surged ahead on all those fronts, at least in terms of theoretical technical specs, said Andy Thurai, an analyst at Constellation Research.
“Google is taking [the lead] on competitors in three major categories – on OpenAI/Microsoft ChatGPT with their Gemini announcement, on AWS/NVIDIA with their infrastructure [and] TPU chips … [and] on IBM/HP/Oracle with their Hypercomputer,” Thurai said.
At least on the surface, Gemini also has some key potential differentiators from OpenAI and Microsoft’s GPT-4 based tools, he said.
“First, it is multimodal from the ground up. Technically, this means this LLM could cross the boundary limitations of modalities” such as text, code, and image data, Thurai said. “Second, they also released three model sizes rather than one size fits all; third, [it will have] a lot of safety guardrails to avoid any toxic content.”
This last point, however, has already been a sticking point for Google Gemini, which was reportedly delayed from an originally planned launch this month in part because of issues with its chat responses in languages other than English. Google representatives, including CEO Sundar Pichai, were careful to emphasize trust and safety in a blog post and video accompanying this week’s launch.
“We’re approaching this work boldly and responsibly,” Pichai wrote. “That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable.”