ChatGPT can now access and read some of your Mac's desktop apps.

As first reported by Semafor, OpenAI's ChatGPT is starting to hook up with other apps on your computer.
The startup announced Thursday that its ChatGPT desktop app for macOS can now read code in a handful of developer-focused coding apps, including VS Code, Xcode, TextEdit, Terminal and iTerm2.

That means developers will no longer have to paste their code into ChatGPT, which is how the chatbot has usually been used. Instead, when you enable the feature, OpenAI automatically sends the section of code you're working on to its chatbot as context, along with your prompt.

So far, though, unlike Cursor or GitHub Copilot, ChatGPT does not write code directly in those apps for the developer.

Work with Apps, as the feature is called, falls into a different category altogether. OpenAI says the feature is not an AI agent, though the company does say getting ChatGPT to understand other apps is a "key building block toward building agentic systems." Being able to take in nearly everything else on your screen, not just prompts and responses, remains a big challenge for AI agents today.

OpenAI is starting with coding apps, probably because AI coding assistants have proven to be one of the most successful applications of LLMs. The feature goes live today for all Plus and Teams users and will start rolling out to Enterprise and Edu accounts within a few weeks. According to OpenAI, ChatGPT will be integrated with further applications, including text-based ones that can be used for writing tasks.

An OpenAI employee demonstrated the feature to TechCrunch by opening the ChatGPT app alongside an Xcode environment containing a simple project that modeled the solar system but was missing Earth. The employee selected an Xcode tab in ChatGPT, which tells the AI chatbot to focus on that app, and then asked the chatbot to "add the missing planets." The chatbot completed the request by writing a line of code to represent Earth in exactly the same format as the rest of the project. The employee still had to paste ChatGPT's answer back into the environment, though.

According to OpenAI's desktop product lead Alexander Embiricos, OpenAI is mainly relying on the macOS accessibility API, the interface behind the screen reader, to read text from different apps and pass it to ChatGPT. That interface has powered Apple's VoiceOver feature for almost two decades. It's pretty reliable for most common apps, but not for all of them.

For some apps, such as Microsoft's VS Code, the user has to install a special extension before ChatGPT can query the content. And, as the name suggests, Apple's screen reader only reads text; it cannot help ChatGPT understand visual elements such as photos, the orientation of objects, or videos.

For some apps, Work with Apps will add the last 200 lines of code to the chatbot with every request. For the rest, it will feed all the code in your front-most window to the chatbot. You can highlight parts of the code or text to point ChatGPT to the right part of the project, but ChatGPT will also include the surrounding text. That all sounds like it will use a lot of input tokens.
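To make those context rules concrete, here is a rough Python sketch of how such a selection step might look. It is not OpenAI's code: the function names, the `mode` switch, the five lines of padding around a highlighted selection, and the four-characters-per-token rule of thumb are all assumptions made for illustration. Only the 200-line figure comes from the reporting above.

```python
# Hypothetical sketch; none of these names come from OpenAI.
TAIL_LINES = 200        # the "last 200 lines" figure reported above
SURROUNDING_LINES = 5   # assumed padding around a selection; no number was given


def build_context(window_text, mode="tail", selection=None):
    """Pick the text that would be sent to the chatbot alongside the prompt.

    mode="tail": keep only the last TAIL_LINES lines of the window.
    mode="full": keep everything in the front-most window.
    selection: optional (start, end) line range, 0-based, end exclusive.
               When given, the highlighted lines plus SURROUNDING_LINES
               on each side are kept instead.
    """
    lines = window_text.splitlines()
    if selection is not None:
        start, end = selection
        lo = max(0, start - SURROUNDING_LINES)
        hi = min(len(lines), end + SURROUNDING_LINES)
        return "\n".join(lines[lo:hi])
    if mode == "tail":
        return "\n".join(lines[-TAIL_LINES:])
    if mode == "full":
        return "\n".join(lines)
    raise ValueError(f"unknown mode: {mode!r}")


def rough_token_estimate(text):
    # Crude rule of thumb (about 4 characters per token); real tokenizers differ.
    return len(text) // 4


if __name__ == "__main__":
    source = "\n".join(f"line {i}" for i in range(1, 501))
    tail = build_context(source, mode="tail")
    print(len(tail.splitlines()), "lines,", rough_token_estimate(tail), "tokens (rough)")
```

Even in this toy version, the full-window mode sends every line on every request, which is where the input-token concern comes from.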

It is not clear how OpenAI intends to extend this feature to other apps that cannot be accessed via Apple's screen reader. OpenAI's competitor Anthropic has developed an AI model that analyzes screenshots of a user's desktop to understand and use other apps. Frankly, Anthropic's approach leaves much to be desired in its current state: it is slow and makes many mistakes. Still, it is a more general-purpose kind of AI agent, one that doesn't depend on APIs and can do much more than read text in another window.

"This isn't meant to be an agent; it's a way to collaborate with coding tools to start, and there will be more tools coming soon," said Embiricos in a briefing with TechCrunch. "On the side of agents, I think this is really a strong building block. This idea that ChatGPT understands or can work with all of the content that you have so that it can help with it."

This is especially relevant given recent reports that OpenAI is developing a general-purpose AI agent, codenamed "Operator," according to Bloomberg. The software is reportedly due in early 2025 and would be one of the first general-purpose AI agents to reach the market, alongside Anthropic's Computer use and Google's reported "Jarvis" agent.

OpenAI is first launching these features on macOS, just ahead of an Apple integration with ChatGPT coming in December. It's not clear when Work with Apps will arrive on Windows, the operating system created by OpenAI's largest funder, Microsoft.

Blog | 2024-11-16 20:08:37