The Verge - Artificial Intelligence
Beyond the hype: how AI is integrating into our lives and reshaping technology’s role across industries.
In the latest version of the Files by Google app, summoning Gemini while looking at a PDF gives you the option to ask about the file, writes Android Police. You’ll need to be a Gemini Advanced subscriber to use the feature, though, according to Mishaal Rahman, who reported on Friday that it had started rolling out.
If you have the feature, when you summon Gemini while looking at a PDF in the Files app, you’ll see an “Ask about this PDF” button appear. Tapping that lets you ask questions about the file, the same way you might ask ChatGPT about a PDF. Google first announced this screen-aware feature during its I/O developer conference in May.
Rahman posted a screenshot of what it looks like in action:
Other context-aware Gemini features include the ability to ask about web pages and YouTube videos. For apps or file types without Gemini’s context-aware support, the assistant instead offers to answer questions about your screen, using a screenshot it takes when you tap “Ask about this screen.”
Asus has announced the Asus NUC 14 Pro AI, the first Copilot Plus-capable AI mini PC, which crams an Intel Core Ultra 9 processor into a form factor resembling a black M4 Mac Mini. The mini PC was first introduced at IFA in September, and Asus is now providing a little more detail about its specs than it did before, but it still isn’t saying when it will become available or how much it will cost.
The NUC 14 Pro AI will come in five CPU configurations, from the Core Ultra 5 226V processor with 16GB of integrated RAM to a Core Ultra 9 288V processor with 32GB of RAM. The company says it has up to 67 TOPS of GPU performance and 48 TOPS from the NPU, and that its M.2 2280 PCIe Gen 4 x4 slot supports 256GB to 2TB NVMe SSDs.
All of that is packed into a PC that measures 130mm deep and wide and just 34mm tall; comparatively, the Mac Mini is 127mm deep and wide and 50mm tall. Here are some pictures from Asus’ website:
The Asus NUC 14 Pro AI features a fingerprint sensor on top and a Copilot button on the front for speaking voice commands to Microsoft’s AI assistant. Also on the front are two USB-A ports, a Thunderbolt 4 port, a headphone jack, and a power button. Around the back, you’ll find a 2.5Gbps ethernet jack, another Thunderbolt 4 port, two more USB-A ports, and an HDMI port. For connectivity, it features Wi-Fi 7 and Bluetooth 5.4.
Asus still hasn’t said when the NUC 14 Pro AI will be available, nor how much it will cost.
Earlier this year, TCL released a trailer for Next Stop Paris — an AI-animated short film that seems like a Lifetime movie on steroids. The trailer had all the hallmarks of AI: characters that don’t move their mouths when they talk, lifeless expressions, and weird animation that makes it look like scenes are constantly vibrating.
I thought this might be the extent of TCL’s experimentation with AI films, given the healthy dose of criticism it received online. But boy, was I wrong. TCL debuted five new AI-generated short films that are also destined for its TCLtv Plus free streaming platform, and after the Next Stop Paris debacle, I just had to see what else it cooked up.
Though the new films do look a little better than Next Stop Paris, they serve as yet another reminder that AI-generated videos aren’t quite there yet, something we’ve seen with many of the video generation tools cropping up, like OpenAI’s Sora. But in TCL’s case, it’s not just the AI that makes these films bad.
Here are all five of them, ranked from tolerable (5) to “I wish I could unsee this” (1).
5. Sun Day
This futuristic short film basically has the same concept as Ray Bradbury’s short story “All Summer in...
For my last issue of the year, I’m focusing on the AI talent war, which is a theme I’ve been covering since this newsletter launched almost two years ago. And keep reading for the latest from inside Google and Meta this week.
But first, I need your questions for a mailbag issue I’m planning for my first issue of 2025. You can submit questions via this form or leave them in the comments.
“It’s like looking for LeBron James”
This week, Databricks announced the largest known funding round for any private tech company in history. The AI enterprise firm is in the final stretch of raising $10 billion, almost all of which will go toward buying back vested employee stock.
How companies approach compensation is often undercovered in the tech industry, even though the strategies play a crucial role in determining which company gets ahead faster. Nowhere is this dynamic as intense as the war for AI talent, as I’ve covered before.
To better understand what’s driving the state of play going into 2025, this week I spoke with Naveen Rao, VP of AI at Databricks. Rao is one of my favorite people to talk to about the AI industry. He’s deeply technical but also business-minded, having...
For the last day of ship-mas, OpenAI previewed a new set of frontier “reasoning” models dubbed o3 and o3-mini. The Verge first reported that a new reasoning model would be coming during this event.
The company isn’t releasing these models today (and admits final results may evolve with more post-training). However, OpenAI is accepting applications from the research community to test these systems ahead of public release (which it has yet to set a date for). OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or trademark conflicts) with the British telecom company called O2.
The term reasoning has become a common buzzword in the AI industry lately, but it basically means the machine breaks down instructions into smaller tasks that can produce stronger outcomes. These models often show their work for how they got to an answer, rather than just giving a final answer without explanation.
According to the company, o3 surpasses previous performance records across the board. It beats its predecessor in coding tests (called SWE-Bench Verified) by 22.8 percent and outscores OpenAI’s Chief Scientist in competitive programming. The model nearly aced one of the hardest math competitions (called AIME 2024), missing one question, and achieved 87.7 percent on a benchmark for expert-level science problems (called GPQA Diamond). On the toughest math and reasoning challenges that usually stump AI, o3 solved 25.2 percent of problems (where no other model exceeds 2 percent).
The company also announced new research on deliberative alignment, which requires the AI model to process safety decisions step-by-step. So, instead of just giving yes/no rules to the AI model, this paradigm requires it to actively reason about whether a user’s request fits OpenAI’s safety policies. The company claims that when it tested this on o1, it was much better at following safety guidelines than previous models, including GPT-4.
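To illustrate the distinction OpenAI is drawing, here is a minimal sketch in Python contrasting a hard-coded yes/no filter with a deliberative step that asks the model to reason over the written policy before answering. The policy text, keyword list, and `complete` callable are hypothetical stand-ins for illustration, not OpenAI’s actual implementation.

```python
# Illustrative sketch only: contrasts a static rule filter with a
# "deliberative" step where the model reasons over the policy text itself.
# `complete()` is a hypothetical stand-in for any chat-completion call.

SAFETY_POLICY = """Refuse requests for instructions that enable physical harm.
Allow general educational questions about chemistry and medicine."""

BLOCKED_KEYWORDS = {"explosive", "nerve agent"}  # the old-style yes/no rule


def rule_based_check(user_request: str) -> bool:
    """Static filter: no reasoning, just keyword matching."""
    return not any(word in user_request.lower() for word in BLOCKED_KEYWORDS)


def deliberative_check(complete, user_request: str) -> str:
    """Ask the model to reason step by step about whether the request
    fits the written policy before it produces a final verdict."""
    prompt = (
        f"Safety policy:\n{SAFETY_POLICY}\n\n"
        f"User request: {user_request}\n\n"
        "First, reason step by step about whether this request complies "
        "with the policy. Then output either COMPLY or REFUSE on the last line."
    )
    return complete(prompt)  # the model's chain of thought plus its verdict
```

The point of the paradigm, as OpenAI describes it, is that the verdict comes from the model’s own reasoning over the written policy rather than from a fixed keyword list.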
Google is planning to add a new “AI Mode” to its search engine, according to a report from The Information. The company will reportedly display an option to switch to AI Mode from the top of the results page, allowing you to access an interface similar to its Gemini AI chatbot.
The new AI Mode tab would sit to the left of the “All,” “Images,” “Videos,” and “Shopping” tabs, The Information reports. When you receive a response in AI Mode, The Information says Google will display links to related webpages and “a search bar below the conversational answer that prompts users to ‘Ask a follow-up...’”
This tracks with Android Authority’s report from earlier this month, which spotted an AI Mode in a beta version of the Google app. 9to5Google also dug up code suggesting you can use AI Mode to ask questions using your voice. The Verge reached out to Google with a request for comment but didn’t immediately hear back.
With OpenAI rolling out search in ChatGPT for all users, Google is likely under increased pressure to consolidate search and AI. The company already displays AI search summaries for some queries and expanded the feature to dozens more countries in October.
Google has introduced a new AI “reasoning” model capable of answering complex questions while also providing a rundown of its “thoughts,” as reported earlier by TechCrunch. The model, called Gemini 2.0 Flash Thinking, is still experimental and will likely compete with OpenAI’s o1 reasoning model.
In a post on X, Google DeepMind chief scientist Jeff Dean says the model is “trained to use thoughts to strengthen its reasoning,” and also benefits from the speed that comes along with the faster Gemini 2.0 Flash model. The demo shared by Dean shows how Gemini 2.0 Flash Thinking goes about answering a physics problem by “thinking” through a series of steps before offering a solution.
Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning.
— Jeff Dean (@JeffDean) December 19, 2024
This isn’t necessarily “reasoning” in the way humans perform it, but it means the machine breaks down instructions into smaller tasks that can produce stronger outcomes.
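To make that decomposition idea concrete, here is a rough sketch of a decompose-then-solve loop. It is not Google’s actual pipeline, and `ask_model` is a hypothetical stand-in for a call to whatever chat model you have access to.

```python
# Rough sketch of a decompose-then-solve loop; not Google's implementation.
# `ask_model` is a hypothetical wrapper around any chat-completion API.
from typing import Callable, List, Tuple


def solve_with_thoughts(ask_model: Callable[[str], str], question: str) -> Tuple[List[str], str]:
    """Break the question into smaller steps, solve them in order,
    and return the intermediate thoughts along with the final answer."""
    plan = ask_model(f"List the sub-steps needed to answer: {question}")
    steps = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    thoughts: List[str] = []
    context = question
    for step in steps:
        result = ask_model(f"Context so far:\n{context}\n\nWork out this step: {step}")
        thoughts.append(f"{step}: {result}")
        context += f"\n{step}: {result}"

    answer = ask_model(f"Given this worked reasoning:\n{context}\n\nState the final answer.")
    return thoughts, answer
```

The intermediate `thoughts` list is the rough analogue of the “thinking” steps Dean’s demo surfaces alongside the final answer.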
Another example, posted by Google product lead Logan Kilpatrick, shows the model reasoning its way through a problem that involves both visual and textual elements. “This is just the first step in our reasoning journey,” Kilpatrick says. You can try out Gemini 2.0 Flash Thinking on Google’s AI Studio.
There have been quite a few notable updates in the AI space as of late, with Google revealing its upgraded Gemini 2.0 model earlier this month as part of the company’s push into “agentic” AI. Meanwhile, OpenAI made the full version of its o1 reasoning model available to ChatGPT subscribers.
Instagram is planning to introduce a generative AI editing feature next year that will allow users to “change nearly any aspect of your videos.” The tech is powered by Meta’s Movie Gen AI model, according to a teaser posted by Instagram head Adam Mosseri, and aims to give creators more tools to transform their content and bring their ideas to life without extensive video editing or manipulation skills.
Mosseri says the feature can make adjustments using a “simple text prompt.” The announcement video includes previews of early research AI models that change Mosseri’s outfit, background environments, and even his overall appearance — in one scene transforming him into a felt puppet. Other changes are more subtle, such as adding new objects to the existing background or a gold chain around Mosseri’s neck without altering the rest of his clothing.
It’s an impressive preview. The inserted backgrounds and clothing don’t distort unnaturally when Mosseri rapidly moves his arms or face, but the snippets we get to see are barely a second long. The early previews of OpenAI’s Sora video model also looked extremely polished, however, and the results we’ve seen since it became available to the public haven’t lived up to those expectations. We won’t know how good Instagram’s AI video tools truly are by comparison until they launch.
Meta unveiled its Movie Gen AI video generator in October, promising to “preserve human identity and motion” in the videos it creates or edits. The announcement came months after similar models from competitors like OpenAI’s Sora and Adobe’s Firefly Video model, the latter of which already powers beta text-to-video editing tools inside Premiere Pro. Meta hasn’t announced when Movie Gen will be available, but Instagram is the first platform the company has confirmed will use the text-to-video model.
Microsoft is previewing live translation on Intel and AMD-based Copilot Plus PCs. The feature is rolling out now to Windows 11 Insiders in the Dev Channel, allowing users to translate audio from over 44 languages into English subtitles.
Live translation, which initially launched on Qualcomm-powered Copilot Plus PCs, works with any audio played through a Copilot Plus PC, whether it’s coming from a YouTube video, a live video conference, or a recording. If the audio is in a supported language, Windows 11 will display real-time captions in English. The feature can currently translate from Spanish, French, Russian, Chinese, Korean, Arabic, and more.
Microsoft has been gradually bringing more AI features to Intel and AMD-powered Copilot Plus PCs. Earlier this month, Microsoft began testing Recall, which takes snapshots of your activity on a Copilot Plus PC and lets you call up specific memories, on devices with Intel and AMD chips.
Microsoft is also rolling out an update to live translation on Qualcomm-equipped Copilot Plus PCs, as Windows 11 Insiders in the Dev Channel can now translate select languages to Simplified Chinese.
For the 10th day of “ship-mas,” OpenAI rolled out a way to call ChatGPT for up to 15 minutes for free over the phone using 1-800-CHATGPT.
The feature was a project spun up just a few weeks ago, OpenAI’s chief product officer Kevin Weil said on the livestream. Users can now call ChatGPT in the US and message via WhatsApp globally at 1-800-242-8478. The 15-minute limit is per phone number per month, so really, you could spin up a few Google Voice numbers to get as much time with it as you want.
The phone number is built using OpenAI’s Realtime API, and the WhatsApp feature is powered by GPT-4o mini through an integration with the WhatsApp API.
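For developers wondering what building on the Realtime API looks like, here is a minimal sketch of opening a Realtime session over a WebSocket and requesting a spoken response. The endpoint, model name, headers, and event types follow OpenAI’s public beta documentation as of this writing, but treat them as assumptions that may change.

```python
# Minimal sketch of an OpenAI Realtime API session over WebSocket.
# The endpoint, model name, headers, and event types below follow OpenAI's
# public beta docs as of late 2024; treat them as assumptions that may change.
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def main() -> None:
    # Newer websockets releases use `additional_headers`; older ones use `extra_headers`.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the server for a spoken (and text) response to a simple instruction.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the caller and ask how you can help.",
            },
        }))
        # Print server events (transcripts, audio deltas, completion) as they arrive.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break


asyncio.run(main())
```

A real phone integration would also stream the caller’s audio into the session and play back the audio the server returns; this sketch only covers the handshake and a single response request.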
OpenAI sees this feature as an important stepping stone for newcomers to AI, since the service represents a simplified version of ChatGPT compared to its web-based counterpart and offers a “low-cost way to try it out through familiar channels.” The company notes that existing users seeking more comprehensive features, higher usage limits, and personalization options should continue using their regular ChatGPT accounts through traditional channels.
Funnily enough, Google launched a similar tool in 2007 called GOOG-411, which offered free directory assistance by voice. The service was discontinued in 2010 without an official explanation from Google, but some speculate that it was shut down because the company had already achieved its underlying goal: collecting a sufficient database of voice samples to advance its speech recognition technology.
At the time, Google VP Marissa Mayer said it outright: “The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.”
OpenAI spokesperson Taya Christianson said the company won’t be using these calls to train large language models.