
What 22 Million AI Prompts Reveal About Your Employees' AI Habits

Satya Vegulla · Co-founder, Vloex · March 8, 2026 · 11 min read
22.4M enterprise AI prompts analyzed — here's what they reveal

When Harmonic Security analyzed 22.4 million enterprise AI prompts in their 2025 research report, the findings confirmed what security teams suspected but couldn't prove. The data is now public, and it paints a detailed picture of how employees actually use AI at work — not how policies say they should.

The numbers are uncomfortable. Not because employees are doing anything malicious — they're not. They're trying to do their jobs faster. The problem is that the tools they choose, the accounts they use, and the data they share create risks that most security teams can't see, let alone manage.

The Personal Account Problem Is Worse Than You Think

The single most important finding from the research: 73.8% of ChatGPT usage in enterprises happens on personal accounts. Not corporate accounts with enterprise data protection agreements. Personal accounts — with consumer-grade terms of service, minimal data handling guarantees, and in many cases, training on user input enabled by default.

For Google Gemini, the number is even starker: 94.4% of enterprise usage happens on non-corporate accounts. These aren't outlier employees — this is the dominant usage pattern. The corporate AI account is the exception, not the rule.

Three-quarters of your company's ChatGPT usage is happening on accounts you don't control, can't monitor, and can't audit.

What Employees Are Actually Sharing

The research breaks down sensitive data exposure by type. The distribution reveals which departments and use cases create the most risk:

Source code (most common). Engineers paste code into AI tools for debugging, refactoring, and code review. This includes proprietary algorithms, security-critical logic, infrastructure configuration, and occasionally hardcoded credentials. A single code snippet can expose your architecture, your security posture, and your intellectual property simultaneously.

Customer data. Support teams paste customer conversations for summarization. Sales teams paste CRM records for analysis. Product teams paste user research transcripts. Each interaction potentially exposes PII, account details, and usage patterns to third-party AI providers.

Legal documents. Legal teams use AI to draft contracts, review agreements, and research case law. When the input includes client names, deal terms, or privileged communications, the data exposure extends beyond the organization to its clients and counterparties.

Financial data. Finance teams use AI for modeling, forecasting, and report drafting. Revenue projections, M&A analysis, and compensation data in AI prompts create insider information exposure risk. For public companies, this intersects with SEC disclosure requirements.

The Free-Tier Blind Spot

Here's the number that should reframe your entire AI security strategy: 16.9% of all sensitive data exposure happens on personal free-tier accounts. Not enterprise accounts. Not even personal paid accounts. Free-tier accounts — the ones with the weakest privacy protections and the broadest training-on-input policies.

ChatGPT Free accounts are responsible for 87% of these free-tier exposures. These accounts default to training on user input, meaning your company's proprietary data could influence the model's responses to other users. The employee didn't intend this. They just wanted a quick answer. But the data processing implications are significant.

The most concerning finding: 21.81% of sensitive data went into tools actively training on user inputs. That's not theoretical risk — it's data that has already been processed, learned from, and potentially incorporated into model weights. It cannot be recalled.

Why Blocking Doesn't Work

The natural reaction to these numbers is to block everything. The data shows why this fails: usage of unsanctioned AI tools increased 68% in 2025 despite growing corporate awareness and increasingly restrictive policies. Employees don't route around AI restrictions because they're reckless — they do it because AI genuinely makes them more productive, and the productivity gap between AI users and non-users is widening every quarter.

  • Blocking ChatGPT at the network level pushes usage to mobile devices and personal networks — you lose all visibility
  • Restricting AI tool approvals to a committee creates a 6-8 week delay — employees use personal accounts in the meantime
  • Requiring enterprise accounts without providing them creates the worst outcome: a policy nobody follows and nobody enforces
  • Each new restriction teaches employees which workarounds are effective — you're training them to evade your controls

Usage of unsanctioned AI tools increased 68% in 2025. Your employees will use AI regardless of your policy. The question is whether you'll have visibility into how.

The Monitor-Coach-Enforce Playbook

The 22-million-prompt dataset suggests a different approach: meet employees where they are, not where you wish they were.

Step 1: Deploy passive monitoring. Understand your actual AI usage patterns before writing a single policy. Which tools? Which departments? What data types? Connect your workspace for instant OAuth-connected app discovery. Add browser detection for interaction-level visibility. Gather 2-4 weeks of baseline data.
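For the discovery step, the sketch below shows one way this can look in a Google Workspace tenant: enumerate the OAuth apps users have granted access to, and flag the ones that look like AI tools. It assumes a service account with domain-wide delegation and the googleapis Node client; the keyword list, admin email, and key file path are illustrative placeholders, not how Vloex performs discovery.

```typescript
// Sketch: enumerate OAuth-connected apps per user via the Admin SDK Directory API,
// then flag grants that look like AI tools. Assumes domain-wide delegation.
import { google } from "googleapis";

const AI_KEYWORDS = ["openai", "chatgpt", "gemini", "claude", "copilot"]; // illustrative list

async function discoverAiGrants(adminEmail: string) {
  const auth = new google.auth.JWT({
    keyFile: "service-account.json", // placeholder path
    subject: adminEmail,             // impersonate an admin via domain-wide delegation
    scopes: [
      "https://www.googleapis.com/auth/admin.directory.user.readonly",
      "https://www.googleapis.com/auth/admin.directory.user.security",
    ],
  });
  const directory = google.admin({ version: "directory_v1", auth });

  // 1. List users in the tenant (paginate in production; one page shown here).
  const users = await directory.users.list({ customer: "my_customer", maxResults: 100 });

  // 2. For each user, list the third-party OAuth grants they hold.
  for (const user of users.data.users ?? []) {
    const tokens = await directory.tokens.list({ userKey: user.primaryEmail! });
    for (const token of tokens.data.items ?? []) {
      const appName = (token.displayText ?? "").toLowerCase();
      if (AI_KEYWORDS.some((k) => appName.includes(k))) {
        console.log(`${user.primaryEmail}: ${token.displayText} (${token.scopes?.join(", ")})`);
      }
    }
  }
}

discoverAiGrants("admin@example.com").catch(console.error);
```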

Step 2: In-browser coaching. When an employee is about to paste an API key into ChatGPT, show them a real-time notification: "This looks like a credential. Consider removing it before sending." Don't block — educate. The research shows 73% of employees modify their behavior after a single coaching prompt. The key word is prompt, not block.
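As a rough illustration of what that coaching check can look like, here is a minimal browser-side sketch that scans pasted text for credential-like patterns and warns without blocking. The regexes and the warning copy are illustrative assumptions, not a production detector.

```typescript
// Sketch of an in-browser coaching check: inspect pasted text for credential-like
// patterns and warn the user instead of blocking the paste.
const CREDENTIAL_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "AWS access key", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { label: "OpenAI-style API key", pattern: /\bsk-[A-Za-z0-9_-]{20,}\b/ },
  { label: "private key block", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { label: "bearer token", pattern: /\bBearer\s+[A-Za-z0-9._-]{20,}\b/ },
];

function findCredential(text: string): string | null {
  for (const { label, pattern } of CREDENTIAL_PATTERNS) {
    if (pattern.test(text)) return label;
  }
  return null;
}

// Coach, don't block: the paste still goes through, but the user sees a warning.
document.addEventListener("paste", (event: ClipboardEvent) => {
  const text = event.clipboardData?.getData("text") ?? "";
  const hit = findCredential(text);
  if (hit) {
    console.warn(`This looks like a ${hit}. Consider removing it before sending.`);
    // A real extension would render an inline banner here instead of console output.
  }
});
```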

Step 3: Targeted enforcement. Block only the highest-risk patterns: production credentials in any AI tool, PII entering tools without enterprise agreements, source code entering tools that train on input. This should be less than 5% of interactions. Everything else gets coached, not blocked.
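A policy engine for that split can be quite small. The sketch below encodes the three block rules from this step and coaches everything else; the data types, tool attributes, and rules are illustrative assumptions, not Vloex's actual enforcement logic.

```typescript
// Sketch of a targeted enforcement policy: block only the highest-risk
// combinations, coach the rest.
type DataType = "production_credential" | "pii" | "source_code" | "other";

interface AiTool {
  name: string;
  hasEnterpriseAgreement: boolean;
  trainsOnInput: boolean;
}

type Verdict = "block" | "coach" | "allow";

function evaluate(dataType: DataType, tool: AiTool): Verdict {
  // Production credentials never belong in any AI tool.
  if (dataType === "production_credential") return "block";
  // PII may only go to tools covered by an enterprise agreement.
  if (dataType === "pii" && !tool.hasEnterpriseAgreement) return "block";
  // Source code must not reach tools that train on user input.
  if (dataType === "source_code" && tool.trainsOnInput) return "block";
  // Everything else: allow, but coach when it looks sensitive.
  return dataType === "other" ? "allow" : "coach";
}

// Example: the same source code is blocked on a free-tier account that trains
// on input, but only coached on an enterprise-tier account.
const freeTier: AiTool = { name: "ChatGPT Free", hasEnterpriseAgreement: false, trainsOnInput: true };
const enterprise: AiTool = { name: "ChatGPT Enterprise", hasEnterpriseAgreement: true, trainsOnInput: false };
console.log(evaluate("source_code", freeTier));   // "block"
console.log(evaluate("source_code", enterprise)); // "coach"
```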

Calculating Your Exposure

You can estimate your organization's AI data exposure risk with a simple formula. Multiply your employee count by your estimated AI adoption rate (typically 60-80% for knowledge workers), by the average prompt volume per user (roughly 30 prompts per week), and by the sensitive data rate from the research (roughly 8-10% of prompts contain sensitive data). The result is the number of sensitive data exposures per week you're probably not seeing.

For a 200-person company with 70% AI adoption: 200 x 0.70 x ~30 prompts/week x 0.09 = roughly 378 sensitive data exposures per week. Most of these are low-severity (an email address, a phone number). But some are high-severity (an API key, a customer list, source code). And you're currently seeing zero of them.
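If you want to plug in your own numbers, here is the same back-of-the-envelope calculation as a small function. The default prompt volume and sensitivity rate are the rough figures quoted above, not measurements from your environment.

```typescript
// Back-of-the-envelope estimate of unseen sensitive data exposures per week.
function weeklyExposures(
  employees: number,
  adoptionRate: number,          // share of employees using AI (0.6-0.8 typical)
  promptsPerUserPerWeek = 30,    // rough average from the article
  sensitiveRate = 0.09,          // ~8-10% of prompts contain sensitive data
): number {
  return employees * adoptionRate * promptsPerUserPerWeek * sensitiveRate;
}

// The 200-person example from the article: 200 x 0.70 x 30 x 0.09 = 378 per week.
console.log(weeklyExposures(200, 0.7)); // 378
```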

From Data to Action

The 22-million-prompt dataset isn't an abstract research report. It's a preview of what's happening inside your organization right now. The specific numbers will differ — your company might have higher or lower AI adoption, different data sensitivity profiles, different tool preferences. But the patterns are universal: personal accounts dominate, free-tier usage concentrates risk, and blocking pushes usage underground.

Vloex shows you your own version of this data — in real time, for your organization. See which AI tools your team uses, on which accounts, with what data. Monitor, coach, and enforce without blocking productivity. Get started free.

shadow AI · data exposure · sensitive data · employee AI usage · ChatGPT · enterprise AI

Satya Vegulla

Co-founder, Vloex

Ready to see your AI landscape?

Connect your workspace. Get instant visibility. No agents required.

Get Started Free