AI-driven bots are rapidly overtaking humans as the primary consumers of online content, creating growing sustainability concerns around energy use, digital efficiency, and the future structure of the open web.
Report
The latest State of the Bots report from AI bot traffic measurement company TollBit shows a marked acceleration in automated web traffic during the second half of 2025, alongside a measurable decline in human visits. It seems that what was once framed primarily as a debate about AI training data has evolved into a broader structural change, with AI systems now reading the live internet at scale to support search, chat, and information retrieval tools.
Rising Bot Traffic And Declining Human Visits
TollBit’s analysis shows that the ratio of AI bot traffic to human traffic has changed rapidly over a short period. For example, in the first quarter of 2025, the average site monitored by TollBit saw one AI bot visit for every 200 human visits. By the end of the year, that ratio had increased to one AI bot visit for every 31 human visits.
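To put those ratios in context, the quick back-of-the-envelope calculation below (a minimal sketch using only the figures above) converts them into the share of total visits made by AI bots on the average monitored site.

```python
# Back-of-the-envelope: convert TollBit's "one AI bot visit per N human visits"
# ratios into the bot share of total visits on the average monitored site.
def bot_share(humans_per_bot_visit: int) -> float:
    """Return AI bot visits as a fraction of all visits (bots plus humans)."""
    return 1 / (1 + humans_per_bot_visit)

q1_share = bot_share(200)  # Q1 2025: one bot visit per 200 human visits
q4_share = bot_share(31)   # end of 2025: one bot visit per 31 human visits

print(f"Q1 2025 bot share: {q1_share:.1%}")   # ~0.5% of all visits
print(f"Q4 2025 bot share: {q4_share:.1%}")   # ~3.1% of all visits
print(f"Change: roughly {q4_share / q1_share:.0f}x in under a year")
```

On those numbers, bots moved from roughly half a per cent of visits to around 3 per cent, a roughly sixfold increase within a single year.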
Over the same period, human web traffic declined. Between the third and fourth quarters of 2025 alone, TollBit recorded a 5 per cent fall in human visits across its partner sites. The report stresses that these figures likely understate the true scale of automated activity, as many modern bots are designed to closely mimic human browsing behaviour.
In its findings, TollBit says “from the tests we ran, many of these web scrapers are indistinguishable from human visitors on sites”, adding that the data should be treated as conservative. This increasing difficulty in separating human and automated traffic complicates both measurement and mitigation efforts.
From Training Crawlers To Live Web Retrieval
Earlier concerns around AI and the web focused largely on large-scale scraping for model training. While training-related crawling continues, TollBit’s data shows it is no longer the dominant driver of AI bot activity.
In fact, training crawler traffic fell by around 15 per cent between the second and fourth quarters of 2025. However, over the same period, traffic from retrieval-augmented generation (RAG) bots increased by 33 per cent.
RAG systems, which fetch live web content to answer user prompts, allow AI tools to provide current answers rather than relying solely on static training data.
This distinction has some important implications. For example, training crawlers typically access content once and store it for offline use. RAG bots, by contrast, return to the same pages repeatedly. TollBit found that in the fourth quarter of 2025, RAG bots made roughly ten page requests for every single page request made by training bots. This repeated access reflects the growing role of AI tools as substitutes for traditional search engines and direct browsing.
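To make the distinction concrete, the sketch below shows a simplified version of the retrieval step a RAG-style system performs for each prompt. It is a conceptual illustration only: the user-agent string, URL, and stubbed generation step are assumptions, not a description of any particular vendor’s pipeline.

```python
"""Minimal sketch of RAG-style live retrieval (illustrative assumptions only)."""
import urllib.request


def fetch_live(url: str) -> str:
    """Fetch a page at answer time. This per-prompt fetching is the repeated
    traffic publishers see from RAG bots, unlike one-off training crawls."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-rag-bot/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="ignore")


def answer(prompt: str, candidate_urls: list[str]) -> str:
    """Fetch candidate pages live, then hand them to a (stubbed) generation step."""
    documents = [fetch_live(url) for url in candidate_urls]
    # Stub: a real system would pass the prompt and documents to a language model.
    return f"Answer to {prompt!r}, grounded in {len(documents)} freshly fetched pages"


if __name__ == "__main__":
    # Every similar prompt can trigger fresh fetches of the same pages, which is
    # why RAG traffic scales with usage rather than with model training runs.
    print(answer("latest streaming release dates", ["https://example.com/"]))
```

Because nothing in that loop is stored for later reuse, the same high-demand pages can be fetched again and again as similar prompts repeat.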
The Role Of AI Search Indexing
Alongside RAG bots, AI search indexing activity is expanding rapidly. Indexing crawlers systematically map the web so that RAG systems can locate relevant pages when responding to prompts. TollBit recorded a 59 per cent increase in AI search indexer traffic between the second and fourth quarters of 2025.
This growth suggests that AI-driven search is building out its own parallel infrastructure to support real-time information retrieval. While indexing has long been a feature of traditional search engines, the combination of indexing and repeated live retrieval increases the volume of automated traffic moving across the web.
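As a rough illustration of what “indexing” involves, the toy example below fetches a page, stores its text against its URL, and later looks pages up by keyword so a retrieval system knows what to fetch live. The seed URL, in-memory index, and naive keyword match are simplifying assumptions; production indexers are far more elaborate.

```python
"""Toy index-building crawl: map page URLs to text so a retrieval system can
later find relevant pages for a prompt. The seed URL, in-memory index, and
naive keyword match are simplifying assumptions for illustration only."""
from html.parser import HTMLParser
import urllib.request


class TextExtractor(HTMLParser):
    """Collect the visible text of a page."""
    def __init__(self):
        super().__init__()
        self.text_parts = []

    def handle_data(self, data):
        self.text_parts.append(data.strip())


def index_page(url: str, index: dict) -> None:
    """Fetch one page (the indexing traffic) and store its text under its URL."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    index[url] = " ".join(part for part in parser.text_parts if part)


def lookup(index: dict, query: str, top_k: int = 3) -> list[str]:
    """Naive keyword match: return URLs whose stored text best matches the query."""
    terms = query.lower().split()
    scored = [(sum(term in text.lower() for term in terms), url) for url, text in index.items()]
    return [url for score, url in sorted(scored, reverse=True)[:top_k] if score > 0]


if __name__ == "__main__":
    index = {}
    index_page("https://example.com/", index)           # mapping step (indexer traffic)
    print(lookup(index, "example domain information"))  # lookup step (feeds RAG retrieval)
```

In the pattern TollBit describes, the two kinds of traffic compound: pages are visited once to be mapped, then repeatedly to be read at answer time.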
Concentration Of Scraping Activity
TollBit’s data also shows that AI scraping activity is unevenly distributed across providers. For example, OpenAI’s ChatGPT-User agent was identified as the most active RAG bot across monitored sites. In the fourth quarter of 2025, it averaged around five times as many scrapes per page as the second most active scraper, attributed to Meta.
Other major contributors include bots operated by Google, Perplexity, Anthropic, and Amazon, each running multiple user agents for training, indexing, and user-triggered retrieval. The combined effect is a background layer of automated traffic that now rivals human browsing in scale on many sites.
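For site owners who want to see this mix in their own logs, one common starting point is to match request user-agent strings against the tokens AI providers publish for their bots, as sketched below. The token list is partial and illustrative and should be checked against each provider’s current documentation, and, as TollBit’s findings suggest, user-agent matching alone will miss bots that present themselves as ordinary browsers.

```python
"""Rough log check: count requests whose User-Agent matches known AI bot tokens.

The token list is illustrative and incomplete (verify current strings against
each provider's own documentation), and bots that spoof browser user agents
will not be caught by this kind of matching."""
import csv

AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User",   # OpenAI: training crawl / user-triggered retrieval
    "ClaudeBot",                # Anthropic
    "PerplexityBot",            # Perplexity
    "Amazonbot",                # Amazon
    "meta-externalagent",       # Meta
    "CCBot",                    # Common Crawl
]


def classify(user_agent: str) -> str:
    """Return the first matching bot token, or 'human/other' if none match."""
    for token in AI_BOT_TOKENS:
        if token.lower() in user_agent.lower():
            return token
    return "human/other"


def summarise(log_path: str) -> dict:
    """Tally requests per category from a CSV log with a 'user_agent' column
    (a hypothetical export format - adapt to your own log schema)."""
    counts = {}
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            label = classify(row["user_agent"])
            counts[label] = counts.get(label, 0) + 1
    return counts
```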
Which Parts Of The Web Are Most Affected?
It should be noted here that not all content categories seem to be affected equally. For example, TollBit reports that B2B and professional sites, national news outlets, and lifestyle content are among the most heavily scraped. Technology and consumer electronics content experienced the fastest growth in scraping activity, increasing by 107 per cent since the second quarter of 2025.
According to TollBit, the most frequently scraped pages tend to relate to time-sensitive topics. In the third quarter of 2025, heavily scraped URLs included coverage of political controversies and live sports. By the fourth quarter, entertainment releases and shopping-related content, such as streaming series and seasonal buying guides, featured more prominently.
This pattern could be said to reflect how users are increasingly turning to AI tools for up-to-date information, prompting RAG bots to revisit high demand pages repeatedly throughout the day.
The Sustainability Cost Of Repeated Access
From a sustainability perspective, the rise of RAG-driven browsing introduces a less visible but growing cost. For example, each automated page request consumes energy across data centres, networks, and supporting infrastructure. When the same content is retrieved repeatedly to support similar prompts, overall energy demand increases significantly.
TollBit, therefore, describes the current environment as inefficient for both publishers and AI developers. AI companies invest heavily in scraping infrastructure, proxy services, and evasion techniques, while publishers spend increasing sums on defensive technologies. This duplication of effort results in higher processing and energy use, alongside increased indirect emissions.
In fact, the report notes that advanced scraping services can charge more than 22 dollars per 1,000 pages retrieved. At the scale required to support popular consumer AI applications, data acquisition costs alone can reach tens of millions of dollars per year. These financial costs sit alongside rising electricity demand in data centres, which sustainability researchers already identify as a growing contributor to global emissions.
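The scale of those figures is easier to see with a quick worked example. Only the 22 dollars per 1,000 pages rate comes from the report; the daily retrieval volume below is an assumption chosen purely for illustration.

```python
# Worked example: data acquisition cost at an assumed request volume.
# Only the per-1,000-page rate comes from the report; the daily volume
# is a hypothetical figure for illustration.
rate_per_1000_pages = 22         # dollars, as cited by TollBit
pages_per_day = 5_000_000        # assumed daily retrieval volume

daily_cost = pages_per_day / 1000 * rate_per_1000_pages
annual_cost = daily_cost * 365

print(f"Daily cost:  ${daily_cost:,.0f}")   # $110,000
print(f"Annual cost: ${annual_cost:,.0f}")  # ~$40 million, i.e. the "tens of millions" the report describes
```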
Robots.txt And Escalating Inefficiency
Existing mechanisms for controlling automated access appear to have proven ineffective. For example, in the fourth quarter of 2025, around 30 per cent of AI bot scrapes recorded by TollBit did not comply with robots.txt permissions. In categories such as deals and shopping, non-permitted scrapes exceeded permitted ones by a factor of four.
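Checking robots.txt permissions is, in principle, straightforward, which makes those non-compliance figures all the more notable. The sketch below uses Python’s standard library robotparser to test whether a given user agent is permitted to fetch a URL; the site and user-agent strings are illustrative examples, and robots.txt remains a voluntary convention, so the check only describes what a compliant bot should do.

```python
"""Check whether a site's robots.txt permits a given user agent to fetch a URL.

The site and user-agent values are illustrative; robots.txt is a voluntary
convention, so this only tells you what a polite bot *should* do."""
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetch and parse the site's robots.txt

for agent in ["ExampleRAGBot", "*"]:
    allowed = parser.can_fetch(agent, "https://example.com/deals/buying-guide")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```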
OpenAI’s ChatGPT-User bot showed the highest rate of non-compliance among major bots, accessing blocked content in 42 per cent of cases. TollBit argues that this environment encourages increasingly sophisticated evasion strategies, including IP rotation, user agent spoofing, and cloud-based headless browsers.
Each layer of evasion and detection adds computational overhead. Bots expend more resources to appear human, while websites consume more resources attempting to identify and block them. From an environmental standpoint, this escalation increases energy use without delivering proportional value to end users.
Low Referral Traffic And Structural Implications
The sustainability issue is closely tied to the economics of online publishing. TollBit reports that referral traffic from AI applications remains extremely low and continues to decline. Average click-through rates from AI tools fell from 0.8 per cent in the second quarter of 2025 to 0.27 per cent by the end of the year.
Even websites with direct licensing agreements saw sharp declines. For example, click-through rates for sites with one-to-one AI deals fell from 8.8 per cent early in 2025 to 1.33 per cent in the fourth quarter. This indicates that licensing arrangements alone are not insulating publishers from reduced human traffic.
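Expressed as referrals rather than percentages, the decline is stark. In the sketch below, only the click-through rates come from the report; the answer volume is an assumption for illustration.

```python
# Referrals implied by the reported click-through rates, at an assumed
# volume of one million AI answers that cite a given site's pages.
answers_citing_site = 1_000_000  # hypothetical volume for illustration

for label, ctr in [("Q2 2025 average", 0.008), ("Q4 2025 average", 0.0027),
                   ("Early 2025, licensed sites", 0.088), ("Q4 2025, licensed sites", 0.0133)]:
    print(f"{label}: {answers_citing_site * ctr:,.0f} referral clicks")
```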
The result, therefore, appears to be a system in which machines read and reuse content at scale, while fewer people visit the original sources. TollBit’s report states that “AI traffic will continue to surge and replace direct human visitors to sites”, pointing to a future in which automated systems become the primary readers of the internet.
The data suggests that this transition is already underway, with some significant implications for sustainability, digital infrastructure, and the long-term viability of the content ecosystem that AI systems depend on.
What Does This Mean For Your Organisation?
The picture emerging from TollBit’s data seems to be one of structural change rather than a short-term disruption, where AI systems are no longer just indexing the web or training on it in the background. In fact, it seems they are now repeatedly consuming live content at scale, with clear consequences for energy use, infrastructure efficiency, and the sustainability of the wider digital ecosystem. Without changes to how AI systems access content, the current pattern risks locking in higher energy demand and escalating inefficiencies across both AI development and online publishing.
For UK businesses, this trend has practical implications on several fronts. For example, organisations increasingly relying on AI tools for research, search, and decision support are indirectly contributing to rising digital energy use and associated emissions. At the same time, UK publishers, professional services firms, and content driven businesses face growing operational costs from defending their websites against automated access, while seeing diminishing human engagement in return. These pressures sit alongside wider regulatory and sustainability expectations, particularly as UK businesses are required to demonstrate progress on energy efficiency, emissions reporting, and responsible technology use.
For AI developers, publishers, regulators, and end users, the data shows that the current scrape-and-block dynamic appears inefficient, costly, and environmentally counterproductive. If AI systems are to become permanent fixtures in how information is accessed, it looks as though the underlying mechanics of content access will need to evolve in a way that supports sustainability, fair value exchange, and long-term viability. Without that recalibration, the growth of AI-driven web consumption risks undermining both the digital economy it depends on and the sustainability goals many organisations are now expected to meet.