Even though OpenAI's GPT-4 is currently the best generative AI tool available, we're already looking ahead. OpenAI CEO Sam Altman frequently teases GPT-5, suggesting a new, improved AI model is coming soon. GPT-5's launch date is unknown, and most of what we know comes from connecting the dots. Regardless of the release date, there are a few key features we want to see in GPT-5.
What Is OpenAI GPT-5?
The successor to OpenAI's GPT-4 model, GPT-5 is expected to be the company's most powerful generative model yet. There is no official release date for GPT-5, but it may arrive in summer 2024. While little is known about the model, several things are certain:
- OpenAI has registered the GPT-5 name with the USPTO.
- Several OpenAI executives have hinted at the model's capabilities.
- OpenAI CEO Sam Altman repeatedly mentioned the model in a March 2024 YouTube interview with Lex Fridman.
All of these point to GPT-5's imminent arrival. Still, much remains speculation, and there are a few things we hope for and expect from the model. Here are some of them.
More Multimodality
Multimodality is one of the most exciting improvements expected in the GPT line of AI models. Multimodality means an AI model can process not just text but also images, audio, and video, and it would be a landmark in the development of GPT models. Since GPT-4 is already good with image inputs and outputs, GPT-5 is a natural starting point for OpenAI to improve audio and video processing.
Google's Gemini AI model is making impressive multimodal progress, and not responding would be uncharacteristic of OpenAI. But don't just take our word for it. In the PDF transcript of the Unconfuse Me podcast, Bill Gates asked OpenAI CEO Sam Altman what milestones he foresaw for the GPT series in the next two years. His first response: processing video.
In GPT-5, we expect to be able to upload videos as prompts, create videos on the go, edit videos with text prompts, extract segments, and find specific scenes within large video files. We expect similar capabilities for audio files. Yes, it's a big ask, but given how fast AI development moves, it's a reasonable one.
A More Effective (And Larger) Context Window
The GPT family of AI models has one of the smallest context windows despite being among the most advanced. Anthropic's Claude 3 features a 200,000-token context window, while Google's Gemini can handle up to 1 million tokens (128,000 for standard usage). GPT-4, by contrast, has a context window of 128,000 tokens, with only 32,000 tokens available in interfaces like ChatGPT.
With advanced multimodality on the way, an improved context window is all but guaranteed. An increase by a factor of two or four would help, but we're hoping for a factor of ten, so GPT-5 can process much more data efficiently. That said, a bigger context window isn't always better. We'd rather see improved efficiency in processing context than a simple increase in window size.
A model with a one-million-token context window (roughly 700,000 words of capacity) can still fail to summarize a 500,000-word book if it can only effectively process part of that context. Being able to read a 500,000-word book doesn't mean you can remember or process all of it.
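The rough arithmetic behind those capacity figures can be sketched in a few lines. The 0.7 words-per-token ratio below is a commonly cited back-of-the-envelope estimate, not an exact figure; real tokenizers vary by text and language.

```python
# Rough conversion from a token budget to an English word count.
# The 0.7 words-per-token ratio is an approximation, not an exact figure.

def tokens_to_words(tokens, words_per_token=0.7):
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * words_per_token)

# A 1,000,000-token window holds roughly 700,000 words,
# while GPT-4's 128,000-token window covers roughly 89,600 words.
print(tokens_to_words(1_000_000))  # 700000
print(tokens_to_words(128_000))    # 89600
```

By this estimate, even Gemini's full one-million-token window falls well short of a 500,000-word book plus a meaningful response, which is why processing efficiency matters as much as raw window size.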
GPT Agents
An exciting possibility for GPT-5 is the debut of GPT Agents. Although "game changer" is an overused term in AI, GPT agents would genuinely be one. Today, AI models like GPT-4 can help you finish a task: they can write an email, a joke, a math solution, or a blog post for you. However, they can only do that single task, not the related tasks needed to finish your whole job.
Imagine being a web developer. Your job requires you to design, code, troubleshoot, and more. At present, AI models can only handle portions of these tasks. You could ask GPT-4 to code the home page, then the contact page, then the About page, and so on, repeating the process for each. And some tasks the model simply can't complete.
Prompting AI models for each specific subtask takes time and effort: the web developer must coordinate and prompt the model one task at a time until all the related tasks are done. GPT Agents promise to change that, with GPT-5 coordinating expert bots that can self-prompt and complete every subtask of a complex task. The key ideas are self-prompting and autonomy.
Instead of asking GPT-5 to "write me code for the homepage," you could ask it to "build a portfolio website for Maxwell Timothy." GPT-5 could then self-prompt, invoking expert AI agents to complete each website-building subtask: one agent to search the web for information on Maxwell Timothy, another to write the page code, another to generate and optimize images, and yet another to deploy the site, all without further human prompting.
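The coordination pattern described above can be sketched in plain Python. Every function here is a stand-in stub, not a real OpenAI API; the agent names and flow are illustrative assumptions only, meant to show how one high-level goal fans out into self-prompted subtasks.

```python
# Hypothetical sketch of the GPT Agents idea: a coordinator decomposes one
# high-level goal into subtasks and hands each to a specialist "agent".
# All agents here are stubs, not real APIs.

def research_agent(subject):
    # Stand-in for an agent that searches the web for background info.
    return f"profile of {subject}"

def code_agent(page, context):
    # Stand-in for an agent that writes the code for a single page.
    return f"<!-- {page} page built from: {context} -->"

def deploy_agent(pages):
    # Stand-in for an agent that deploys the finished site.
    return f"deployed {len(pages)} pages"

def build_portfolio(subject):
    """Coordinator: self-prompts each specialist agent in turn."""
    context = research_agent(subject)        # gather info first
    pages = [code_agent(p, context)          # fan out the page work
             for p in ("home", "about", "contact")]
    return deploy_agent(pages)               # hand off deployment
```

The point of the sketch is the shape of the workflow: a single user request at the top, with research, coding, and deployment handled by separate specialists that the coordinator invokes on its own.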
Less Hallucination
The real test for GPT-5 will be its ability to address hallucinations, which have slowed AI adoption in high-stakes, safety-critical domains like healthcare, aviation, and cybersecurity. These are fields that would benefit greatly from AI, yet it has not been widely adopted there. A hallucination occurs when an AI model generates and confidently presents plausible-sounding but wholly fabricated information. OpenAI has made progress in this area, but there is more to do.
Imagine a diagnostic system that uses GPT-4 to analyze patient symptoms and medical reports. A hallucination could cause the AI to confidently deliver an incorrect diagnosis or recommend a dangerous treatment based on imagined facts and false logic, and medical errors of that kind can have dire consequences.
Aviation, nuclear power, maritime operations, and cybersecurity raise similar concerns. We don't expect GPT-5 to eliminate hallucinations entirely, but it should reduce their frequency. If it does, this highly anticipated AI model could redefine artificial intelligence and usher in a new age of human-machine collaboration and innovation.