OpenAI has unveiled GPT-4, the latest milestone in its effort to scale up deep learning. The multimodal model accepts both image and text inputs and generates text outputs, a significant improvement on its predecessor, GPT-3.5, and performs at human level on a range of professional and academic benchmarks. GPT-4 is available through the company's API to paying users, with a waitlist for developer access, and through ChatGPT Plus, subject to a usage cap.

Several companies have already started using GPT-4, including Stripe, Duolingo, Morgan Stanley and Khan Academy. Stripe is using GPT-4 to scan business websites and summarise the results for customer support staff. Duolingo has built GPT-4 into its new language-learning subscription tier. Morgan Stanley is creating a GPT-4-powered system to retrieve information from company documents and deliver it to financial analysts. Meanwhile, Khan Academy is leveraging GPT-4 to build an automated tutor.

One of the key features of GPT-4 is its ability to understand images as well as text. OpenAI is testing this capability with Be My Eyes. Powered by GPT-4, Be My Eyes' new Virtual Volunteer feature can answer questions about images sent to it. The feature can not only identify what is in an image but also analyse it: shown the contents of a fridge, for example, it can suggest meals that could be prepared with those ingredients.
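To make the multimodal idea concrete, here is a minimal sketch of how an image-plus-text question might be packaged for a chat-style API. The payload shape (content parts with `text` and `image_url` types) mirrors the format OpenAI later documented for chat requests, but the model name and URL here are illustrative, and image input was only in limited preview at launch.

```python
import json

def build_image_question(image_url: str, question: str) -> dict:
    """Assemble a chat request body pairing an image with a text question."""
    return {
        "model": "gpt-4",  # illustrative model identifier
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Example: the fridge-contents scenario described above.
payload = build_image_question(
    "https://example.com/fridge.jpg",  # hypothetical image URL
    "What meals could I prepare with these ingredients?",
)
print(json.dumps(payload, indent=2))
```

The point of the structure is that a single user turn can mix modalities, so the model sees the question and the image together rather than as separate messages.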

Steerability tooling is another improvement in GPT-4. OpenAI has introduced a new API capability, "system" messages, which allows developers to prescribe the AI's style and task by giving it specific directions. These messages set the tone for subsequent interactions and establish boundaries for the model's behaviour. GPT-4 is still far from perfect, and OpenAI admits that it can "hallucinate" facts and make reasoning errors. Nonetheless, it represents a significant milestone in deep learning.
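A system message is simply the first entry in the conversation, with the role `system`. The sketch below builds such a request, using the chat message format OpenAI documents; the tutoring prompt is a hypothetical example of setting tone and boundaries, not taken from the article.

```python
import json

def build_steered_request(system_prompt: str, user_prompt: str) -> dict:
    """Prepend a system message that fixes the assistant's style and limits."""
    return {
        "model": "gpt-4",  # illustrative model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

# Example: a Socratic-tutor persona that refuses to hand over answers.
request = build_steered_request(
    "You are a Socratic tutor. Never give answers directly; "
    "ask guiding questions instead.",
    "How do I solve 3x + 5 = 14?",
)
print(json.dumps(request, indent=2))
```

Because the system message precedes every user turn, it shapes all of the model's subsequent replies in that conversation rather than just the next one.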