Why Did LinkedIn Opt Us All Into AI Training? Understanding the Business Rationale Behind Data Leverage
Recently, one of my most engaging posts on LinkedIn was about how the platform has begun training large language models (LLMs) using LinkedIn data—opting all of us in by default. The post raised a lot of questions, and understandably so. This isn’t about demonizing LinkedIn or any of the companies working with AI. I know plenty of good people at tech companies trying to figure out how to navigate this landscape responsibly. But there’s a bigger picture here. To understand why LinkedIn is doing this, you need to understand the underlying business logic that’s driving such decisions.
Why Are Companies Leveraging Proprietary Data for AI?
At its core, this is about what sets companies apart in the competitive AI landscape. Everyone can access the same base dataset: the public internet. So how do companies make their specific products better? How do they create compelling, differentiated AI tools? The answer lies in proprietary data—the unique, non-public data that these companies hold.
LinkedIn’s platform is a goldmine of user-generated data. By incorporating this data into AI models, LinkedIn can create smarter, more tailored AI-driven products, potentially reshaping how we network, job search, and do business. And they aren’t alone: many companies are racing to do the same. If you want to understand how this all works, I recommend checking out my course.
But as much as this is an opportunity, it’s also a call to action. Both businesses and individuals need to start protecting their proprietary data before it becomes someone else’s competitive advantage.
Businesses: How to Protect Your Data While Still Embracing AI
The excitement around AI is palpable, but it can’t come at the cost of your core differentiator—your data. Here are three things businesses should implement to safeguard their valuable data while still fostering innovation:
Evaluate GenAI services for data security: It’s crucial to carefully vet the GenAI tools employees use. Find services that meet your company’s security standards, but don’t simply block access to all GenAI tools. If you try to limit usage too much, employees may go rogue without realizing it, exposing your data in the process. Instead, identify approved tools and make clear why certain services are off-limits.
Encourage experimentation with clear guidelines: Let your employees experiment with GenAI to organically learn how to integrate it into their workflows. This experimentation can drive innovation, but it needs to happen within a framework of approved tools and processes. Create an open environment for employees to explore AI while ensuring they understand the guardrails.
Audit your publicly accessible content: Companies often have proprietary content that they think is locked down but is actually publicly accessible to AI web crawlers. Here’s a personal example: my team was recently vetting an AI chatbot tool for the University of Pittsburgh. As part of its training stage, it quickly scanned our IT website and uncovered publicly accessible knowledge base pages that were outdated and deactivated. These pages weren’t available through a simple Google search, but the AI crawler still found them. This illustrates how businesses must perform regular audits of their online content—especially content that might be unintentionally exposed to crawlers. Locking down sensitive content is crucial to protect your proprietary data from ending up in future AI models.
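To make the audit idea a bit more concrete, here is a minimal sketch of one small piece of it: checking which paths a site’s robots.txt leaves open to well-known AI crawlers. It uses Python’s standard urllib.robotparser module. The crawler names, the sample robots.txt, and the sample paths (including the audit_robots helper itself) are illustrative assumptions, not a description of any real site or of the tool my team vetted. Keep in mind that robots.txt is only a request that well-behaved crawlers honor; truly sensitive pages still need to sit behind authentication or be taken offline.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI-related crawler tokens; confirm the current names
# in each vendor's own documentation before relying on them.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def audit_robots(robots_txt: str, paths: list[str], agents=AI_CRAWLERS) -> dict[str, list[str]]:
    """Return, for each crawler, the paths that robots.txt does NOT block."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network call is needed.
    parser.parse(robots_txt.splitlines())
    return {
        agent: [path for path in paths if parser.can_fetch(agent, path)]
        for agent in agents
    }


# Hypothetical robots.txt: blocks GPTBot everywhere, and blocks every
# other crawler from the archived knowledge base only.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /kb/archive/
"""

# Hypothetical paths you might want to check.
SAMPLE_PATHS = ["/", "/kb/archive/old-vpn-guide", "/kb/current/password-reset"]

if __name__ == "__main__":
    for agent, open_paths in audit_robots(SAMPLE_ROBOTS, SAMPLE_PATHS).items():
        print(f"{agent} may fetch: {open_paths}")
```

In practice you would feed this the real contents of your own robots.txt and a list of paths pulled from your sitemap or server logs, then review anything that shows up as fetchable but should not be public.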
Individuals: How to Protect Your Data from Becoming AI Fuel
As individuals, we need to recognize that our data is just as valuable as our attention. Documentaries like The Social Dilemma have opened our eyes to how our attention is being monetized, but today it’s our actual data that is going into these LLMs. Here’s what you can do to protect yourself:
Be mindful of what you put out publicly: Every piece of data you share—whether it’s a social media post, blog article, or video—can be used to train AI models. If that content is accessible to web crawlers without any form of authentication, it’s fair game. You may not realize it, but your data is likely being swept up into large AI models that you have no control over. So, think carefully about what you make publicly accessible.
Reevaluate your relationship with “free” platforms: Many social media platforms or free tools may be using your data to fine-tune their AI models or even selling it to third parties. While it’s tempting to use these platforms, you have to ask yourself if it’s worth the trade-off. You might be giving away your personal data in exchange for “free” services.
Take control of your digital footprint: Actively review what you’ve shared online and determine if it’s still content you want publicly available. You may need to remove or restrict access to certain pieces of content that could be scraped by web crawlers and used in ways you didn’t intend. Taking ownership of your digital footprint is critical in the age of AI.
Our Realization at bluefoxinsights.ai
At bluefoxinsights.ai, a few of us wanted to create an AI community to help both individuals and businesses navigate this AI-driven world, including how to protect themselves. However, we also realized that to live out the advice we’re giving, we will eventually need to make parts of our site paid in order to sustain and grow the business. This approach lets us avoid monetizing your data without your direct permission. For now, we’re keeping everything free, including articles like this one, which can be accessed and shared on LinkedIn without needing an account on our website. Over time, however, even the free parts of our website will require users to create an account, log in, and verify that they are not bots. This isn’t about driving growth for its own sake, although we do want the community to grow. It’s about protecting our content, our voices, and our values.
If you’ve made it this far, I hope you see that our goal is to create a safe, secure, and trustworthy AI community where businesses and individuals can thrive. We hope you will create an account and share this community with your friends. We will do our best to help you move ahead in our AI-enabled future.
Author’s Note:
The ideas and content in these articles come directly from me, but I do engage in conversations with AI to get feedback on my thoughts before producing the final pieces. While AI helps me refine the structure and flow, all the insights and ideas are my own. These articles are NOT robo-generated.