The Invisible Data Battle: How AI Became a Cybersec Professional’s Biggest Friend and Foe
According to industry estimates, cybercrime grew by roughly 15% year-on-year from 2021 to 2025. If that trend holds, by the end of 2025 it will cost the world about $10.5 trillion. It should come as no surprise, then, that as the incidence of cybercrime rises, cybersec teams are racing to detect threats ever faster.
Unfortunately, with powerful AI tools becoming ever more accessible, both sides keep getting better at what they do; sometimes criminals and their hunters even use the same tools. In this article, we'll explore this dynamic through the lens of AI: how it is used and abused, and what new techniques security experts have been working on lately.
The Advanced Cybercriminal
For the last 10 years, continuous advancements in machine learning (ML) have enabled cybersecurity professionals to detect meaningful patterns and discrepancies in large and messy datasets that might otherwise go unnoticed. For instance, ML models are good at flagging suspicious login attempts, whether successful or not, based on analysis of a company’s records.
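As an illustration, the kind of login-anomaly signals such a model might learn can be sketched with a few hand-written rules. This is a toy stand-in for a trained ML model; the features, weights and thresholds below are invented for the example:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LoginAttempt:
    user: str
    ip: str
    timestamp: datetime
    success: bool

def suspicion_score(attempt: LoginAttempt, history: list) -> float:
    """Score an attempt from 0 to 1 using simple features a real model might learn."""
    score = 0.0
    user_events = [h for h in history if h.user == attempt.user]
    # Feature 1: an IP address this user has never logged in from before
    if attempt.ip not in {h.ip for h in user_events}:
        score += 0.4
    # Feature 2: a burst of recent failures from the same IP (last 5 minutes)
    recent_fails = [h for h in history
                    if h.ip == attempt.ip and not h.success
                    and (attempt.timestamp - h.timestamp).total_seconds() < 300]
    if len(recent_fails) >= 3:
        score += 0.4
    # Feature 3: off-hours activity (midnight to 5 a.m.)
    if attempt.timestamp.hour < 5:
        score += 0.2
    return min(score, 1.0)
```

A production system would learn such weights from labeled records rather than hard-coding them, but the flagged signals are of the same kind.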
Since much of the emerging malware consists of iterations of the same code, ML systems can be trained on several decades' worth of historical data to flag the latest variant of that code as soon as it appears. AI-powered scrapers are also being used to automatically scan emails and attachments and to visit links, looking for early signs of a phishing scam.
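One simple way to capture the "iterations of the same code" idea is similarity matching over byte n-grams. The sketch below is a deliberately crude stand-in for real malware feature extraction, and the 0.6 threshold is made up for the example:

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Sliding byte n-grams: a crude proxy for real malware features."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of two samples' n-gram sets (1.0 = identical)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def looks_like_variant(sample: bytes, known: bytes, threshold: float = 0.6) -> bool:
    """Flag a sample whose n-gram profile closely matches known malicious code."""
    return jaccard(sample, known) >= threshold
```

Real systems use richer features (opcode sequences, behavioral traces) and learned decision boundaries, but the underlying intuition is the same: new variants share most of their substance with old ones.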
Yet cybercriminals also evolve with the help of AI; they are often professionals with advanced knowledge and malicious intent. One of the most effective ways for them to employ AI is to render cybersec companies' web scrapers ineffective, and there are various techniques for doing so. A common one is to block specific user agents and IP addresses considered likely to belong to cybersec teams.
Cybercriminals may also limit the number of times a request can be sent to their websites from the same IP address. While this is normally used to filter out potentially malicious non-human agents, fraudsters weaponize it against legitimate web scrapers.
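The per-IP request cap described above can be sketched as a small sliding-window rate limiter. The limit and window values are arbitrary examples:

```python
import time
from collections import defaultdict, deque

class IPRateLimiter:
    """Allow at most `limit` requests per `window` seconds from each IP."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Evict timestamps that have slid out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the cap: block, or serve a CAPTCHA instead
        q.append(now)
        return True
```

Legitimate sites use exactly this mechanism to filter abusive bots; the point made above is that fraudsters turn the same knob against defensive scrapers.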
For extra protection, advanced fraudsters deploy JavaScript-based anti-bot features such as CAPTCHA challenge-response tests, which automated systems find hard to deal with. On top of that, they have another trick up their sleeves: regularly changing their sites' HTML and CSS. This often throws web scrapers off, as scrapers rely on detecting patterns in a site's markup to keep their data-fetching scripts working.
AI and ML have made it possible to automate all these tactics at scale and deploy them in real time. AI has also become a way to thwart threat intelligence collection outright, while opening previously unseen opportunities to exploit people and extort data. It is extremely effective for creating honeytraps: scammers use AI tools to generate deepfake videos or convincingly mimic real people's voices, and the traps can be set anywhere from phony dating sites to established social media platforms.
AI to the Rescue
By employing reliable residential proxy networks, cyber teams can bypass many anti-scraping strategies, though only partially. Fortunately, AI is a double-edged sword, and cyber teams increasingly employ AI-powered scraping tools to enhance their web intelligence collection. Scrapers are crucial for cyber threat detection and prevention, as they enable the collection of threat intelligence from both the dark web and seemingly ordinary websites to detect traces of cybercrime, such as leaked data.
ML techniques make cybersec scrapers more resistant to anti-scraping measures. With AI and ML, one can automate various tasks, from generating fingerprints and managing proxies to recognizing responses and maintaining parsing pipelines, all of which take considerable developer time and effort when done manually. Additionally, by monitoring signals such as a site's domain age and URL depth, scrapers can gauge whether a given website is harmless or nothing but a phishing trap.
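A toy version of such website vetting might combine a few of these signals into a score. The weights and keyword list below are invented for illustration, and the domain age is passed in as a parameter (in practice it would come from a WHOIS lookup) so the sketch stays self-contained:

```python
import re
from urllib.parse import urlparse

# Hypothetical bait keywords often seen in phishing URLs
SUSPICIOUS_WORDS = {"login", "verify", "secure", "account", "update"}

def phishing_score(url: str, domain_age_days: int) -> float:
    """Combine simple URL signals into a 0-to-1 suspicion score."""
    parsed = urlparse(url)
    score = 0.0
    if domain_age_days < 30:                              # freshly registered domain
        score += 0.35
    depth = len([p for p in parsed.path.split("/") if p])
    if depth > 3:                                         # unusually deep URL path
        score += 0.2
    if re.fullmatch(r"[\d.]+", parsed.hostname or ""):    # raw IP instead of a hostname
        score += 0.25
    if any(w in url.lower() for w in SUSPICIOUS_WORDS):   # bait keywords in the URL
        score += 0.2
    return min(score, 1.0)
```

Production classifiers would learn these weights from labeled phishing datasets and use many more features, but the signals named in the text (domain age, URL depth) slot in exactly like this.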
The Rise of Smart Scrapers
With all of these boobytraps and stonewalling techniques in mind, cybersec professionals have been working on smart scrapers for years, and they’re finally here. A “smart” or “adaptive” scraper uses natural language processing (NLP) and machine learning to handle dynamic content and intricate website architectures (e.g., nested categories and varied page layouts), bypass IP blocking and rate limiting via rotating proxies, deal with CAPTCHAs, login forms and cookies — and even provide real-time data updates.
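The rotating-proxy piece of such a scraper can be sketched as a simple pool that cycles through addresses and retires any the target site starts blocking. The proxy addresses here are placeholders; a real pool would come from a proxy provider:

```python
class ProxyRotator:
    """Round-robin over a proxy pool, retiring proxies that get blocked."""

    def __init__(self, proxies):
        self.active = list(proxies)
        self._i = 0

    def next_proxy(self) -> str:
        """Hand out the next active proxy in rotation."""
        if not self.active:
            raise RuntimeError("proxy pool exhausted")
        proxy = self.active[self._i % len(self.active)]
        self._i += 1
        return proxy

    def mark_blocked(self, proxy: str) -> None:
        """Drop a proxy the target site has started rejecting."""
        if proxy in self.active:
            self.active.remove(proxy)
```

Spreading requests across many exit addresses is what defeats the per-IP rate limits and blocklists described earlier; the "smart" part comes from AI deciding when a proxy is burned and how to pace its replacements.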
For instance, adaptive scrapers can identify the structure of a web page by analyzing its document object model (DOM) or by following specific patterns, which allows them to adapt dynamically. AI models such as convolutional neural networks (CNNs) can also detect and interact with visual elements on websites, such as buttons. In fact, smart scrapers can even mimic human browsing patterns, with random pauses, mouse movements and realistic navigation sequences that bypass behavioral analysis tools.
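Two of these ideas, DOM-based extraction that survives layout changes and randomized human-like pauses, can be sketched with Python's standard library alone. A real adaptive scraper would be far more elaborate:

```python
import random
import time
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Walk the DOM and collect link targets regardless of surrounding layout.

    Because it keys on the tag itself rather than class names or nesting,
    it keeps working when the site reshuffles its HTML and CSS.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def human_pause(base: float = 1.0, jitter: float = 2.0) -> float:
    """Sleep for a randomized interval to mimic human browsing rhythm."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Keying extraction on the DOM rather than on brittle class names is precisely what blunts the "regularly change the HTML and CSS" tactic described earlier, and randomized pacing is the simplest of the behavioral camouflage techniques mentioned above.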
And that’s not all. AI-powered web scrapers can modify browser configurations to mask telltale signs of automation (such as headless browsers that run without a traditional graphical interface) that anti-bot systems look for. Adaptive scrapers can also understand deliberately obfuscated content — text rendered as images or CSS-based text scrambling — through NLP techniques. The most sophisticated of the bunch continuously learn from detection attempts, automatically adjusting their approach when blocked.
Now that AI is hitting its stride, a new concept has begun to emerge: Predictive scraping. At the most basic level, this refers to a scraper capable of anticipating changes in specific data points and website structures, thereby allowing businesses to preemptively collect the information they need. Predictive scrapers can also search for leaked credentials or personally identifiable information (PII) across the most valuable sections of the dark web and maintain continuous visibility into cybercriminal activities. This can be used to flag anomalies that might indicate emerging threats, such as new exploit discussions, zero-day vulnerabilities, or discussions of an impending attack.
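A minimal version of such anomaly flagging might apply a z-score test to daily mention counts of a keyword scraped from underground forums. The data and threshold below are synthetic, and real predictive systems would use far richer models:

```python
from statistics import mean, stdev

def flag_spike(daily_counts: list, threshold: float = 3.0) -> bool:
    """Flag the latest day's count if it sits far above the historical baseline.

    `daily_counts` would come from scraped forum data; the last entry
    is the day under test, the rest form the baseline.
    """
    history, latest = daily_counts[:-1], daily_counts[-1]
    if len(history) < 2:
        return False  # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat baseline: any deviation is an anomaly
    return (latest - mu) / sigma > threshold
```

A sudden burst of chatter about a product name or CVE identifier is exactly the kind of early-warning signal the text describes.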
AI as a Force Multiplier for Both Sides
In the ongoing technological arms race between cybercriminals and their pursuers, AI acts as a force multiplier for both sides, enabling brand-new approaches to both security operations and criminal tactics, not to mention its role in "democratizing" cybercrime by lowering the barrier to entry.
Arguably, the most significant impact of AI has been the dramatic acceleration of the attack-defense cycle: What once took months now happens in hours as AI systems quickly learn and adapt to emerging hazards. This compression challenges not only traditional security models but also the regulatory frameworks in force today.
It’s hard to say how all of this will shake out in the future, but there’s good reason to believe that it will include cybersecurity shifting more and more towards proactive and predictive approaches. That’s one. Second, it’s important to remember that while AI offers powerful defensive capabilities, it cannot replace human judgment and ethical oversight. Thus, the cybersec future will depend not just on technology but also on governance. I believe we’re likely to see not only more AI-based proactive solutions but also smarter (if not stricter) regulation.