OpenAI releases updates: achieving real-time reasoning across audio, visual, and text inputs
ChainCatcher news: according to Cointelegraph, OpenAI shipped four updates to its models in October to help them hold better conversations and improve their image recognition capabilities.

The first major update is the Realtime API, which lets developers build AI-generated voice applications from a single prompt, enabling natural conversations similar to ChatGPT's Advanced Voice Mode. Previously, developers had to "stitch together" multiple models to create these experiences, and audio typically had to be fully uploaded and processed before a response came back, producing high latency for real-time uses such as voice interaction. With the Realtime API's streaming capabilities, developers can now achieve instant, natural interactions, much like a voice assistant. The API runs on GPT-4o, released in May 2024, which can reason in real time across audio, visual, and text inputs.

Another update adds fine-tuning tools that let developers improve the responses the AI generates from image and text inputs. Vision fine-tuning helps the model understand images better, strengthening visual search and object detection. The process incorporates human feedback: people supply examples of good and bad responses for training.

Beyond the voice and vision updates, OpenAI also introduced "model distillation," which lets smaller models learn from larger ones, and "prompt caching," which cuts development cost and time by reusing already-processed text.

Separately, Reuters reports that OpenAI expects revenue to grow to $11.6 billion next year, up from an estimated $3.7 billion in 2024.
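For a concrete picture of the streaming model, here is a minimal sketch of a Realtime API session in Python. It assumes the third-party `websockets` package and the `gpt-4o-realtime-preview` model name from OpenAI's launch documentation; event names follow the public reference but may change while the API is in beta.

```python
# Minimal Realtime API session sketch: open a WebSocket, request a spoken
# response with a text transcript, and stream the transcript as it arrives.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # Note: the keyword is additional_headers in websockets >= 14
    # (extra_headers in older releases).
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the model for an audio response plus a text transcript.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user in one short sentence.",
            },
        }))
        # Events stream back incrementally; this is what keeps latency low.
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "response.audio_transcript.delta":
                print(event["delta"], end="", flush=True)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```

A production client would also stream microphone audio upward rather than only reading responses; the sketch skips that to stay focused on the low-latency response side.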
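The "good and bad responses" used for fine-tuning are supplied as training records. Below is a sketch of one record in the JSONL chat format OpenAI's fine-tuning guide uses for image inputs; the question, image URL, and label here are placeholders.

```python
# One vision fine-tuning record in JSONL chat format: a user turn mixing
# text and an image, and an assistant turn holding the desired response.
import json

record = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/stop-sign.jpg"},
                },
            ],
        },
        # The assistant turn is the "good response" the model is tuned toward.
        {"role": "assistant", "content": "A stop sign."},
    ]
}

# One JSON object per line; the finished file is uploaded with
# purpose="fine-tune" and referenced when creating the fine-tuning job.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```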
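Prompt caching needs no special calls on OpenAI's platform: per OpenAI's documentation, it applies automatically when repeated requests share a long identical prefix (roughly 1,024 tokens or more). The sketch below, assuming the official `openai` Python client and a hypothetical support-bot prompt, shows the practical pattern of keeping the static text first so it can be reused across calls.

```python
# Prompt-caching-friendly request structure: put the large, unchanging
# system prompt first and the per-request question last, so repeated
# calls share a cacheable prefix.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical long, static instructions (the cacheable part).
STATIC_SYSTEM_PROMPT = "You are a support agent for Example Corp. ..."

def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # shared prefix
            {"role": "user", "content": question},  # varying suffix
        ],
    )
    return response.choices[0].message.content

print(answer("How do I reset my password?"))
```

The ordering is the whole technique: if the varying text came first, no two requests would share a prefix and nothing could be reused.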