The Verge - Artificial Intelligence
Explore how AI is seamlessly integrating into our lives, beyond the hype, reshaping technology's role across various sectors.
In the latest version of the Files by Google app, summoning Gemini while looking at a PDF gives you the option to ask about the file, writes Android Police. You'll need to be a Gemini Advanced subscriber to use the feature, though, according to Mishaal Rahman, who reported on Friday that it had started rolling out.
If you have the feature, when you summon Gemini while looking at a PDF in the Files app, you'll see an "Ask about this PDF" button appear. Tapping that lets you ask questions about the file, the same way you might ask ChatGPT about a PDF. Google first announced this screen-aware feature during its I/O developer conference in May.
Rahman posted a screenshot of what it looks like in action:
Other context-aware Gemini features include the ability to ask about web pages and YouTube videos. For apps or file types without Gemini's context-aware support, the assistant instead offers to answer questions about your screen, using a screenshot it takes when you tap "Ask about this screen."
Asus has announced the Asus NUC 14 Pro AI, the first Copilot Plus-capable AI mini PC that crams an Intel Core Ultra 9 processor into a form factor resembling a black M4 Mac Mini. Asus first introduced the machine at IFA in September, and it's now providing a little more detail about the mini PC's specs than it did before, but it still isn't saying when the PC will become available or how much it will cost.
The NUC 14 Pro AI will come in five CPU configurations, from the Core Ultra 5 226V processor with 16GB of integrated RAM to a Core Ultra 9 288V processor with 32GB of RAM. The company says it offers up to 67 TOPS of GPU performance and 48 TOPS from the NPU, and that its M.2 2280 PCIe Gen 4 x4 slot supports 256GB to 2TB NVMe SSDs.
All of that is packed into a PC that measures 130mm deep and wide and just 34mm tall; comparatively, the Mac Mini is 127mm deep and wide and 50mm tall. Here are some pictures from Asus' website:
The Asus NUC 14 Pro AI features a fingerprint sensor on top and a Copilot button on the front for speaking voice commands to Microsoft's AI assistant. Also on the front are two USB-A ports, a Thunderbolt 4 port, a headphone jack, and a power button. Around the back, you'll find a 2.5Gbps ethernet jack, another Thunderbolt 4 port, two more USB-A ports, and an HDMI port. For connectivity, it features Wi-Fi 7 and Bluetooth 5.4.
Asus still hasn't said when the NUC 14 Pro AI will be available, nor how much it will cost.
Earlier this year, TCL released a trailer for Next Stop Paris, an AI-animated short film that seems like a Lifetime movie on steroids. The trailer had all the hallmarks of AI: characters that don't move their mouths when they talk, lifeless expressions, and weird animation that makes it look like scenes are constantly vibrating.
I thought this might be the extent of TCL's experimentation with AI films, given the healthy dose of criticism it received online. But boy, was I wrong. TCL debuted five new AI-generated short films that are also destined for its TCLtv Plus free streaming platform, and after the Next Stop Paris debacle, I just had to see what else it cooked up.
Though the new films do look a little better than Next Stop Paris, they serve as yet another reminder that AI-generated videos aren't quite there yet, something we've seen with many of the video generation tools cropping up, like OpenAI's Sora. But in TCL's case, it's not just the AI that makes these films bad.
Here are all five of them, ranked from tolerable (5) to "I wish I could unsee this" (1).
5. Sun Day
This futuristic short film basically has the same concept as Ray Bradbury's short story "All Summer in...
For my last issue of the year, I'm focusing on the AI talent war, which is a theme I've been covering since this newsletter launched almost two years ago. And keep reading for the latest from inside Google and Meta this week.
But first, I need your questions for a mailbag issue I'm planning for my first issue of 2025. You can submit questions via this form or leave them in the comments.
"It's like looking for LeBron James"
This week, Databricks announced the largest known funding round for any private tech company in history. The AI enterprise firm is in the final stretch of raising $10 billion, almost all of which will go toward buying back vested employee stock.
How companies approach compensation is often undercovered in the tech industry, even though the strategies play a crucial role in determining which company gets ahead faster. Nowhere is this dynamic as intense as the war for AI talent, as I've covered before.
To better understand what's driving the state of play going into 2025, this week I spoke with Naveen Rao, VP of AI at Databricks. Rao is one of my favorite people to talk to about the AI industry. He's deeply technical but also business-minded, having...
For the last day of ship-mas, OpenAI previewed a new set of frontier "reasoning" models dubbed o3 and o3-mini. The Verge first reported that a new reasoning model would be coming during this event.
The company isn't releasing these models today (and admits final results may evolve with more post-training). However, OpenAI is accepting applications from the research community to test these systems ahead of public release (which it has yet to set a date for). OpenAI launched o1 (codenamed Strawberry) in September and is jumping straight to o3, skipping o2 to avoid confusion (or trademark conflicts) with the British telecom company called O2.
The term reasoning has become a common buzzword in the AI industry lately, but it basically means the machine breaks down instructions into smaller tasks that can produce stronger outcomes. These models often show their work, laying out how they arrived at an answer rather than just giving a final answer without explanation.
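The new models aren't public yet, but the o1 family already works this way, and it's a reasonable assumption that o3 will eventually be exposed through the same interface. As a rough sketch only (using the standard OpenAI Python SDK; the model name is a stand-in, since o3 has no public identifier yet), a request to a reasoning model looks like any other chat request, with the step-by-step work happening on OpenAI's side:

```python
# Minimal sketch, not an official example: querying a reasoning model through
# the OpenAI Python SDK. "o1-mini" is a stand-in; o3 and o3-mini have no
# public model identifiers yet.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves at 3:10PM and arrives at 5:45PM. "
                "How long is the trip? Explain your steps."
            ),
        }
    ],
)

# The model spends hidden reasoning tokens working through the problem before
# producing the visible answer printed here.
print(response.choices[0].message.content)
```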
According to the company, o3 surpasses previous performance records across the board. It beats its predecessor in coding tests (called SWE-Bench Verified) by 22.8 percent and outscores OpenAI's chief scientist in competitive programming. The model nearly aced one of the hardest math competitions (called AIME 2024), missing one question, and achieved 87.7 percent on a benchmark for expert-level science problems (called GPQA Diamond). On the toughest math and reasoning challenges that usually stump AI, o3 solved 25.2 percent of problems (where no other model exceeds 2 percent).
The company also announced new research on deliberative alignment, which requires the AI model to process safety decisions step-by-step. So, instead of just giving yes/no rules to the AI model, this paradigm requires it to actively reason about whether a user's request fits OpenAI's safety policies. The company claims that when it tested this on o1, it was much better at following safety guidelines than previous models, including GPT-4.
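OpenAI hasn't published the implementation details, but the general shape of the idea can be sketched: rather than checking a request against a fixed blocklist, the model is given the policy text and asked to reason about compliance before it answers. The snippet below is purely illustrative; the policy text, model, and prompt are placeholders, not OpenAI's actual approach.

```python
# Purely illustrative sketch of policy-aware reasoning; this is NOT OpenAI's
# deliberative alignment implementation, just the general idea it describes.
from openai import OpenAI

client = OpenAI()

PLACEHOLDER_POLICY = (
    "Refuse requests that would enable physical harm or illegal activity; "
    "otherwise answer helpfully."
)

def answer_with_policy_reasoning(user_request: str) -> str:
    """Ask the model to reason about the policy before responding."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works for this sketch
        messages=[
            {
                "role": "system",
                "content": (
                    "Before answering, reason step by step about whether the "
                    f"request complies with this policy: {PLACEHOLDER_POLICY} "
                    "If it does not comply, refuse and briefly explain why."
                ),
            },
            {"role": "user", "content": user_request},
        ],
    )
    return response.choices[0].message.content

print(answer_with_policy_reasoning("How do I reset a forgotten Wi-Fi password?"))
```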
Google is planning to add a new "AI Mode" to its search engine, according to a report from The Information. The company will reportedly display an option at the top of the results page to switch to AI Mode, letting you access an interface similar to its Gemini AI chatbot.
The new AI Mode tab would live to the left of the "All," "Images," "Videos," and "Shopping" tabs, The Information reports. When you receive a response in AI Mode, The Information says Google will display links to related webpages and "a search bar below the conversational answer that prompts users to 'Ask a follow-up...'"
This tracks with Android Authority's report from earlier this month, which spotted an AI Mode in a beta version of the Google app. 9to5Google also dug up code suggesting you can use AI Mode to ask questions using your voice. The Verge reached out to Google with a request for comment but didn't immediately hear back.
With OpenAI rolling out search in ChatGPT for all users, Google is likely under increased pressure to consolidate search and AI. The company already displays AI search summaries for some queries and expanded the feature to dozens more countries in October.
Google has introduced a new AI "reasoning" model capable of answering complex questions while also providing a rundown of its "thoughts," as reported earlier by TechCrunch. The model, called Gemini 2.0 Flash Thinking, is still experimental and will likely compete with OpenAI's o1 reasoning model.
In a post on X, Google DeepMind chief scientist Jeff Dean says the model is "trained to use thoughts to strengthen its reasoning," and that it also benefits from the speed that comes with the faster Gemini 2.0 Flash model. The demo shared by Dean shows how Gemini 2.0 Flash Thinking goes about answering a physics problem by "thinking" through a series of steps before offering a solution.
Want to see Gemini 2.0 Flash Thinking in action? Check out this demo where the model solves a physics problem and explains its reasoning.
- Jeff Dean (@JeffDean) December 19, 2024
This isn't necessarily "reasoning" in the way humans perform it, but it means the machine breaks down instructions into smaller tasks that can produce stronger outcomes.
Another example, posted by Google product lead Logan Kilpatrick, shows the model reasoning its way through a problem that involves both visual and textual elements. "This is just the first step in our reasoning journey," Kilpatrick says. You can try out Gemini 2.0 Flash Thinking on Google's AI Studio.
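If you'd rather poke at the model programmatically than through the AI Studio interface, a minimal sketch with the google-generativeai Python SDK looks like the following; the model ID is an assumption based on Google's naming for the experimental release and may change.

```python
# Minimal sketch, assuming API access to the experimental model through the
# google-generativeai SDK; the model ID below is an assumption and may change.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key generated in Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")
response = model.generate_content(
    "A ball is dropped from a 20 m ledge. Ignoring air resistance, "
    "how long does it take to reach the ground? Show your steps."
)

# The experimental model surfaces intermediate "thoughts" along with the
# final answer; response.text concatenates the returned parts.
print(response.text)
```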
There have been quite a few notable updates in the AI space as of late, with Google revealing its upgraded Gemini 2.0 model earlier this month as part of the company's push into "agentic" AI. Meanwhile, OpenAI made the full version of its o1 reasoning model available to ChatGPT subscribers.
Instagram is planning to introduce a generative AI editing feature next year that will allow users to "change nearly any aspect of your videos." The tech is powered by Meta's Movie Gen AI model, according to a teaser posted by Instagram head Adam Mosseri, and aims to provide creators with more tools to help transform their content and bring their ideas to life without extensive video editing or manipulation skills.
Mosseri says the feature can make adjustments using a "simple text prompt." The announcement video includes previews of early research AI models that change Mosseri's outfit, background environments, and even his overall appearance, in one scene transforming him into a felt puppet. Other changes are more subtle, such as adding new objects to the existing background or a gold chain around Mosseri's neck without altering the rest of his clothing.
It's an impressive preview. The inserted backgrounds and clothing don't distort unnaturally when Mosseri rapidly moves his arms or face, but the snippets we get to see are barely a second long. The early previews of OpenAI's Sora video model also looked extremely polished, however, and the results we've seen since it became available to the public haven't lived up to those expectations. We won't know how good Instagram's AI video tools truly are by comparison until they launch.
Meta unveiled its Movie Gen AI video generator in October, which promises to "preserve human identity and motion" in the videos it creates or edits. The announcement came months after similar models from competitors like OpenAI's Sora and Adobe's Firefly Video model, the latter of which is already powering beta text-to-video editing tools inside Premiere Pro. Meta hasn't announced when Movie Gen will be available, but Instagram is the first platform the company has confirmed will utilize the text-to-video model.
Microsoft is previewing live translation on Intel and AMD-based Copilot Plus PCs. The feature is rolling out now to Windows 11 Insiders in the Dev Channel, allowing users to translate audio from over 44 languages into English subtitles.
Live translation, which initially launched on Qualcomm-powered Copilot Plus PCs, works with any audio played through a Copilot Plus PC, whether it's coming from a YouTube video, a live video conference, or a recording. If the audio is in a supported language, Windows 11 will display real-time captions in English. The feature can currently translate from Spanish, French, Russian, Chinese, Korean, Arabic, and more.
Microsoft has been gradually bringing more AI features to Intel and AMD-powered Copilot Plus PCs. Earlier this month, Microsoft began testing Recall, which takes snapshots of your activity on a Copilot Plus PC and lets you call up specific memories, on devices with Intel and AMD chips.
Microsoft is also rolling out an update to live translation on Qualcomm-equipped Copilot Plus PCs, as Windows 11 Insiders in the Dev Channel can now translate select languages to Simplified Chinese.
For the 10th day of "ship-mas," OpenAI rolled out a way to call ChatGPT for up to 15 minutes for free over the phone using 1-800-CHATGPT.
The feature was a project spun up just a few weeks ago, OpenAI's chief product officer Kevin Weil said on the livestream. Users can now call ChatGPT in the US and message via WhatsApp globally at 1-800-242-8478. The 15-minute limit is per phone number per month, so really, you could spin up a few Google Voice numbers to get as much time with it as you want.
The phone number is built using OpenAI's Realtime API, and the WhatsApp feature is powered by GPT-4o mini through an integration with the WhatsApp API.
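OpenAI hasn't shared how the integration is wired up, but the model side of the WhatsApp feature amounts to an ordinary GPT-4o mini request. Here's a rough sketch of the kind of handler that could sit behind an incoming message; the function name and the WhatsApp plumbing are hypothetical, and only the OpenAI call reflects a real API.

```python
# Rough sketch only: the WhatsApp Business API plumbing is omitted and the
# handler is hypothetical. The OpenAI call is a standard chat completions
# request to GPT-4o mini.
from openai import OpenAI

client = OpenAI()

def reply_to_whatsapp_message(incoming_text: str) -> str:
    """Turn an incoming WhatsApp message into a ChatGPT-style reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": incoming_text}],
    )
    return response.choices[0].message.content

print(reply_to_whatsapp_message("What's a good pasta recipe for two?"))
```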
OpenAI sees this feature as an important stepping stone for newcomers to AI, since the service represents a simplified version of ChatGPT compared to its web-based counterpart and offers a "low-cost way to try it out through familiar channels." The company notes that existing users seeking more comprehensive features, higher usage limits, and personalization options should continue using their regular ChatGPT accounts through traditional channels.
Funnily enough, Google launched a similar tool in 2007 called GOOG-411, which offered free directory assistance by voice. The service was discontinued in 2010 without an official explanation from Google, but some speculate that it was shut down because the company had already achieved its underlying goal: collecting a sufficient database of voice samples to advance its speech recognition technology.
At the time, Google VP Marissa Mayer said it outright: "The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."
OpenAI spokesperson Taya Christianson said the company won't be using these calls to train large language models.