The AI Briefing
A log of what's changing in the AI landscape — filtered for what actually matters to working professionals and business owners.
Not comprehensive. Not updated on a schedule. Updated when something genuinely changes. Newest entries first. What we leave out is as important as what we include.
For deeper analysis of any development noted here, see our commentary.
Last updated:
Our fortnightly newsletter covers the developments that matter — so you don’t need to check back here every other day.
Claude Sonnet 4.6: The Spreadsheet Wins
Anthropic released Claude Sonnet 4.6 yesterday — the mid-tier model in its range, priced at one-fifth of the flagship Opus. The headline result: on GDPval, a benchmark designed by OpenAI to measure AI against the actual day-to-day work of experienced professionals, Sonnet 4.6 topped the table. Ahead of Opus 4.6. Ahead of GPT-5.2. Ahead of Gemini.
What GDPval measures, and why it matters here. Most AI benchmarks test academic reasoning — graduate exams, olympiad mathematics, PhD-level problems. GDPval took a different approach. Its designers went to the US Bureau of Labor Statistics, identified 44 occupations across nine sectors, recruited professionals with an average of fourteen years’ experience, and asked them to build tasks based on their actual work. Legal briefs. Financial models. Engineering documents. Insurance intake forms. The question wasn’t whether the AI could reason impressively. It was whether the output was good enough to use.
Sonnet 4.6 scored an Elo rating of 1633 on GDPval’s office tasks, against 1606 for Opus 4.6 and 1462 for GPT-5.2. The lead over Opus 4.6 is narrow; the gap to GPT-5.2 is not.
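What does a gap of a few dozen Elo points mean in practice? Here is a minimal sketch that assumes the GDPval figures sit on the standard Elo scale, a base-10 logistic curve with a 400-point spread. The source does not say how GDPval computes its ratings, so treat this as an illustration only. The sketch converts each rating gap into the share of head-to-head comparisons the higher-rated model would be expected to win. The function name `elo_expected_score` is illustrative and is not part of any GDPval tooling.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for model A over model B.

    Assumes the standard Elo scale (base-10 logistic, 400-point spread);
    GDPval's own rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as quoted in the briefing.
SONNET_4_6 = 1633
OPUS_4_6 = 1606
GPT_5_2 = 1462

print(f"Sonnet 4.6 vs Opus 4.6: {elo_expected_score(SONNET_4_6, OPUS_4_6):.0%}")
print(f"Sonnet 4.6 vs GPT-5.2:  {elo_expected_score(SONNET_4_6, GPT_5_2):.0%}")
```

Under that assumption, the 27-point lead over Opus 4.6 means winning roughly 54% of comparisons, close to a coin flip. The 171-point lead over GPT-5.2 means winning roughly 73%.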
The computer use trajectory. Separately, Anthropic’s computer use benchmark — measuring a model’s ability to operate a computer the way a human does, navigating spreadsheets, filling out multi-step web forms, completing workflows — shows a sixteen-month arc worth noting. October 2024: 14.9%. February 2025: 28%. June 2025: 42.2%. October 2025: 61.4%. This week: 72.5%. Anthropic’s own description of what 72.5% means in practice: human-level performance on tasks like navigating a complex spreadsheet or filling out a multi-step web form.
What this means for the two audiences.
For professionals: The tools are getting better at exactly the kind of work the headlines keep saying will be automated — the routine cognitive tasks that sit at the high-exposure end of the Working Picture spectrum. Not the judgement calls. Not the relationship management. The spreadsheet. The form. The brief. The thing that takes three hours of a Wednesday. This doesn’t mean your role disappears. It means the balance of your work shifts — and the shift is accelerating faster than many expected.
For SMEs: The cost point matters. If you’ve been watching frontier models and thinking “interesting but expensive for what we’d use it for,” the calculation has changed. The model that now leads on practical office tasks is the mid-tier one — available on free tiers, costing a fraction of the premium models at API level. The barrier to entry for genuine AI-assisted office work has dropped significantly. If you’re in the first three months of the phased approach (Making It Work, Chapter 7), this is the kind of development worth paying attention to: not a new category of capability, but the same capability becoming dramatically more accessible.
The wider significance. Early feedback on Sonnet 4.6 keeps reaching for the same words: reliable, consistent, less prone to over-engineering, better at following instructions without losing the thread. Users describe it as having “design taste” — building things that need fewer editing passes, drafting documents that feel closer to done. These are not the qualities that make headlines. They are the qualities that make a tool useful in a professional context. The gap between “impressive on an exam” and “useful in an office” turns out to be significant — and this release suggests the industry is starting to close it deliberately.
Status for SMEs: Ready for experimentation. If you’re already using a frontier model for content drafting, summarisation, or document work, Sonnet 4.6 is worth trying — the improvement in practical task completion is genuine. If you haven’t started, the free tier now gives you access to a model that performs at or above flagship level on the tasks most relevant to office work. The barrier has never been lower.
You & AI has no commercial relationship with Anthropic or any AI vendor. This assessment is independent.
Agentic AI — tools that take actions, not just text
The most significant shift in the current landscape. “Agentic” AI refers to systems that don’t just generate text but take actions — booking appointments, filing documents, sending emails, updating records, executing multi-step workflows. Anthropic’s recent agentic tools triggered a stock market selloff as investors realised automation might hit the revenue models of large software companies. Vendor marketing is running far ahead of reliable deployment.
The reliability gap. The capability is real and advancing quickly. The reliability is not yet where it needs to be for unsupervised use in a business context. When an AI assistant drafts an email, you can review it before sending. When an agentic system sends the email on your behalf, the review step disappears — and with it, your safety net.
For professionals: Agentic tools are the mechanism behind the “task automation” predictions from Suleyman, Amodei, and others. Understanding what they can and can’t do reliably is worth your attention, even if you’re not adopting them yet.
Status: Watch this space. Expect this category to mature significantly over the next twelve months, but treat current offerings as experimental.
You & AI has no commercial relationship with any AI vendor. This assessment is independent.
Transcription and summarisation — the quiet success story
The maturity story. AI-powered meeting transcription (Otter, Fireflies, Microsoft Copilot in Teams, Google Gemini in Meet) has matured into one of the most reliable and immediately useful AI applications for any professional or small business. Accuracy is high for clear English in standard meeting formats. Summaries and action item extraction are genuinely time-saving.
The adoption barrier is low. Costs range from free tiers to £10–15/month. Most people understand the value immediately. One thing to establish first: a consent policy. Not everyone is comfortable being recorded, and some contexts (legal, medical, regulated industries) have specific constraints.
Status: Ready for most SMEs. Among the lowest-risk, highest-return AI tools currently available.
Frontier models as general-purpose assistants
The current state of play. ChatGPT, Claude, and Gemini now function as capable general-purpose writing and thinking tools. For most professional tasks (drafting correspondence, summarising documents, restructuring information, brainstorming approaches), any of the three major providers produces usable output: a draft you edit rather than a page you start from scratch.
The learning curve. Genuine but manageable — expect a week of awkward experimentation before the tool becomes useful rather than distracting. Free tiers exist; paid tiers run £15–25/month per user.
What the marketing doesn’t say. Output quality varies enormously by task. These tools are fluent, not accurate. They produce confident text whether or not the underlying claim is correct. Every output needs checking by someone who knows the subject matter. For routine drafting, they save significant time. For anything requiring specialist knowledge, they’re a starting point, not a solution.
Status: Ready for most professionals and SMEs.
AI content platforms — the layer you probably don't need
The pitch. Tools marketed as “AI content platforms” — Jasper, Copy.ai, Writer, and similar — promise to automate marketing content, blog posts, social media, and client communications. Typically built on the same underlying models as the frontier assistants above, with templates and workflow layers added.
The reality for most SMEs. The standalone frontier models do the same work at a fraction of the cost. The value proposition of these platforms is workflow automation and brand consistency at scale — relevant for a marketing team of ten, less so for a firm where one person handles all communications.
Status: Vendor hype exceeds current reality for most SMEs.
Spreadsheet and data analysis — useful, with a significant caveat
The capability. Frontier models can now read spreadsheets, generate formulas, identify patterns, and produce summaries of structured data. For SMEs whose data lives in spreadsheets — which is most of them — this is one of the most immediately practical capabilities.
The caveat. Works well for summarisation, pattern spotting, and formula generation. Works poorly when the data is messy, inconsistent, or spread across multiple files — which is the reality in most small businesses. The tool won’t tell you your data is too disorganised to analyse reliably. You need to know that yourself.
Status: Ready with caveats.
Contract and document review — not yet ready to trust alone
The promise. AI tools for reviewing contracts, policies, and long documents are improving rapidly. Some are embedded in legal-specific platforms; others are capabilities within the frontier models. They can summarise, flag inconsistencies, compare versions, and extract key terms.
The limitation. The tools can miss nuance that matters — unusual clauses, jurisdiction-specific issues, implications that require contextual understanding. Useful as a first pass for professionals who know what they’re looking at. Risky as a substitute for that knowledge.
Status: Promising but premature for most SMEs without legal expertise in-house.
AI chatbots — handle with care if you're small
The improvement. AI-powered chatbots have improved significantly but remain a reputational risk for SMEs if deployed carelessly. They handle routine enquiries competently — opening hours, pricing queries, booking confirmations. They handle nuanced or emotionally charged customer interactions poorly.
The scale problem. A bad chatbot interaction for a firm with 500,000 customers is a statistical inevitability absorbed by volume. For a firm with 200 clients, it’s a relationship-damaging event. Deploy only for genuinely routine queries, with clear escalation to a human, and test extensively before going live.
Status: Ready with caveats — and the caveats matter more for small businesses than large ones.
AI coding assistants — what they mean for non-developers
For everyone else, not developers. GitHub Copilot, Cursor, and similar tools have transformed software development productivity. This entry is for the professionals and business owners who employ developers or rely on their work.
For SMEs: AI-assisted coding means software can be produced faster and by smaller teams, which may reduce the cost of custom development.
For professionals: In adjacent roles (project management, product management, QA), it’s reshaping team structures and expectations. The Stanford finding of a 13% relative decline in employment for early-career workers in the most AI-exposed occupations, software development among them, is driven partly by this category.
Status: The implications matter even if the tools don’t apply to you directly.
AI in bookkeeping — already in your subscription
The quiet arrival. Automated bank reconciliation, receipt processing, and transaction categorisation are maturing. Several accounting platforms widely used in the UK (Xero, QuickBooks, FreeAgent) now include AI features as standard. For SMEs already using these platforms, the AI capability is arriving inside tools they already pay for, with no separate purchase required.
The reality check. The productivity gain is real but modest: hours saved per week on routine bookkeeping tasks. The “AI will replace your accountant” narrative is premature. The judgement, advisory, and compliance dimensions of accounting remain firmly human.
Status: Ready for most SMEs — often already available in existing subscriptions.
Personal AI assistants — compelling promise, significant privacy trade-off
The promise. A growing category of tools (Notion AI, Mem, Rewind/Limitless) that aim to become a persistent AI layer across your working life — capturing, organising, and retrieving information from meetings, documents, and conversations. An AI that knows your context and can surface the right information at the right time.
The trade-off. The privacy implications are significant — these tools work by ingesting large amounts of your professional and sometimes personal data. The value proposition depends on sustained use over months. Early adopters report uneven results. Worth watching for professionals who manage large volumes of information, but not yet a confident recommendation for most.
Status: Promising but premature.
The AI Briefing launched February 2026 with initial entries across ten developments. Entries will be added, updated, and revised as developments warrant. For deeper analysis, see our commentary.