Amid the generative AI gold rush, Microsoft-backed OpenAI has launched a new model for ChatGPT that has left the tech industry in awe. It is called GPT-4o (the ‘o’ stands for ‘omni’), and it lets the chatbot work across text, audio and images.
“We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural,” says Mira Murati, CTO of OpenAI.
GPT-4o is “much faster” and more efficient than previous versions. It also takes accessibility to another level: OpenAI is making the model available to all ChatGPT users, including non-subscribers. With video and audio chat and the ability to detect users’ emotions, it is closer than ever to science fiction. A post by Sam Altman, CEO of OpenAI, says that the model is “natively multimodal,” which means it can generate content or understand commands in voice, text, or images.
Mark Chen and Barret Zoph, both research leads at OpenAI, showed how the new model can be used in a series of live demonstrations. The most impressive feature was the live conversation. You can interrupt the model while it is answering, and it simply pauses, listens and then pivots to give a more suitable answer. With GPT-4o, the tone of the model's voice can also be changed.
When Chen asked it to read a bedtime story, the model read the story in different tones of voice, from robotic to gentle. Murati then told it to switch from a dramatic tone to a robotic one, and it pivoted smoothly without a break. In addition to emotions, GPT-4o can handle sarcasm, rapid counting, dad jokes and much more.
The live demonstrations also showed that the model can solve a mathematical problem such as 3x + 1 = 4 written on a piece of paper, and walk the user through the solution in the manner of a teacher. “The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”
It also analyzed some computer code, translated between English and Italian, and interpreted the emotions in a selfie of a smiling man. OpenAI also says that the new model can respond to users’ audio prompts “in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.” OpenAI has tried to make the model's real-time interactions as human-like as possible, so that using it feels like talking to a person rather than to a piece of advanced technology.
In 2023, investors put $29.1 billion into 700 generative AI deals. That is 260% more than the year before, and the sector's revenue is estimated to exceed $1 trillion within the next 10 years. According to OpenAI, more than 500 prestigious companies are using ChatGPT. Since its record-breaking launch in 2022, ChatGPT has reached a hundred million weekly active users, and that number appears to keep growing.