The larger multimodal language model, GPT-4, is ready for prime time, although, contrary to reports circulating since Friday, it does not support text-to-video generation.
However, GPT-4 can accept image and text inputs and produce text outputs. OpenAI explains on its website that across a range of domains, including documents containing text with photographs, diagrams, or screenshots, GPT-4 exhibits capabilities similar to those it shows on text-only inputs.
However, that feature is in "research preview" and is not yet publicly available.
OpenAI explained that GPT-4, while less capable than humans in many real-world scenarios, demonstrated human-level performance on various professional and academic benchmarks.
For example, it passed a simulated bar exam with a score in the top 10% of test takers. By contrast, GPT-3.5 scored around the bottom 10%.
Surpasses Past Models
One of the early users of GPT-4 is Casetext, maker of CoCounsel, an AI legal assistant that it says has been able to pass both the multiple-choice and written portions of the Uniform Bar Exam.
“GPT-4 surpasses the power of earlier language models,” Pablo Arredondo, CaseText’s co-founder and chief innovation officer, said in a statement. “The model’s ability not only to generate text, but to interpret it, is nothing less than a new era in the practice of law.”
“Casetext’s CoCounsel is changing how law is practiced by automating critical, time-intensive tasks and freeing up our attorneys to focus on the most impactful aspects of the practice,” Frank Ryan, U.S. president of global law firm DLA Piper, said in a press release.
OpenAI reported that it spent six months aligning GPT-4 using lessons from its adversarial testing program as well as from ChatGPT, resulting in its best-ever results, though far from perfect, on factuality, steerability, and refusing to go outside of guardrails.
It added that the GPT-4 training run was phenomenally stable. It was the company’s first major model capable of making accurate predictions ahead of time about its training performance.
“As we continue to focus on reliable scaling,” it wrote, “we aim to sharpen our methodology to help predict and prepare for future capabilities, something we consider important for safety.”
OpenAI notes that the difference between GPT-3.5 and GPT-4 can be subtle. The difference emerges when the complexity of a task reaches a sufficient threshold, it explained. GPT-4 is more reliable and creative and can handle more nuanced instructions than GPT-3.5.
GPT-4 is also more customizable than its predecessor. OpenAI explained that instead of the classic ChatGPT personality, with a fixed verbosity, tone, and style, developers — and soon ChatGPT users — can now set the style and function of their AI by describing those directions in a “system” message. Within bounds, system messages allow API users to significantly customize their users’ experience.
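As a minimal sketch of what that customization looks like in practice, the snippet below assembles a chat request whose “system” message pins down the assistant’s tone and style before the user’s prompt is sent. It assumes the role/content message format used by chat-style APIs; the persona text and the helper function name are illustrative, not part of any official interface.

```python
def build_chat_request(system_style: str, user_prompt: str) -> dict:
    """Assemble a chat request whose system message fixes tone and style."""
    return {
        "model": "gpt-4",
        "messages": [
            # The system message steers verbosity, tone, and style.
            {"role": "system", "content": system_style},
            # The user message carries the actual task.
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_chat_request(
    system_style="You are a terse assistant. Answer in one sentence.",
    user_prompt="Summarize what a system message does.",
)
print(request["messages"][0]["role"])  # system
```

The key design point is simply ordering: the system message comes first in the list, so the model treats it as standing instructions that frame every subsequent user turn.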
API users will initially have to wait to try that feature, however, as their access to GPT-4 will be restricted by a waiting list.
OpenAI acknowledged that despite its capabilities, GPT-4 has the same limitations as earlier GPT models. Most importantly, it is still not completely reliable. It “hallucinates” facts and makes logical errors.
Great care should be taken when using language model outputs, especially in high-stakes contexts, OpenAI warned.
GPT-4 can also be overconfident in its predictions, failing to double-check its work when a mistake is likely.
Anticipation for a new release of GPT was stoked over the weekend after a Microsoft executive in Germany suggested that text-to-video capability would be part of the final package.
“We will introduce GPT-4 next week, where we have multimodal models that will offer completely different possibilities – for example, video,” Andreas Braun, Microsoft’s chief technology officer in Germany, said at a press event on Friday.
Text-to-video will be very disruptive, said Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore.
“It could dramatically change how movies and TV shows are made, how news programs are formatted by providing a mechanism for user customization,” he told TechNewsWorld.
Enderle said an early use of the technology could be in creating storyboards from script drafts. “As this technology matures, it will move into something closer to a finished product.”
The content created by text-to-video applications is still basic, noted Greg Sterling, co-founder of Near Media, a news, commentary and analysis website.
“But text-to-video has the potential to be disruptive in the sense that we’ll see a lot more video content generated for little or almost no cost,” he told TechNewsWorld.
“The quality and effectiveness of that video is a different matter,” he continued. “But I suspect some of it will be decent.”
Explainers and basic how-to information are good candidates for text-to-video, he said.
“I could imagine that some agencies would use this to create videos for SMBs to use on their sites or YouTube for ranking purposes,” he said.
“It won’t do well – at least at first – on any branded content,” he continued. “Social media content is another use case. You’ll see creators on YouTube use it to drive up the volume to generate views and ad revenue.”
Not Fooled by Deepfakes
As was discovered with ChatGPT, there are potential dangers to a technology like text-to-video.
“The most dangerous use cases, as with all such tools, are garden-variety scams that target people with relatives, or particularly vulnerable individuals or institutions,” said Will Duffield, an analyst with the Cato Institute, a policy think tank in Washington, D.C.
Duffield, however, discounted the idea of using text-to-video to create effective “deepfakes”.
“When we have seen well-resourced attacks, such as the Russian deepfake of Zelenskyy surrendering last year, they fail because there is enough context and expectation in the world to dismiss the fake,” he explained.
“We have very well-defined notions of who public figures are, what they are about, and what we can expect them to do,” he continued. “So, when we see media of them behaving in a way that is unusual, that is not in line with those expectations, we are likely to be very critical or skeptical of it.”