A new paper shows that text embeddings are significantly less secure than previously assumed. The researchers developed a method that can translate embeddings between different models without paired data or access to the original encoder.

The security implications are concerning. The method enables attackers to extract sensitive information from embedding vectors alone, without access to the source documents or the model that created them. In their experiments, the researchers recovered email content, medical diagnoses, financial information, and personal details with up to 80% accuracy on certain datasets.

Many organizations currently treat embeddings as privacy-preserving abstractions, assuming that vector representations are inherently "safe" to share or store. This research challenges that assumption. If an attacker gains access to an embedding database through a breach, leak, or legitimate data-sharing arrangement, they may be able to reconstruct sensitive information about the original documents.

The technique exploits universal geometric patterns that appear to exist across different AI architectures, suggesting that semantic meaning has consistent mathematical structure that can be reverse-engineered.

This work suggests that current privacy protections for vector databases may be insufficient. Organizations using embeddings with sensitive data should reassess their security assumptions and consider additional protective measures.

Full paper: https://lnkd.in/dym5AtU4 #AI #security
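To make the risk concrete: even without the paper's translation method, an attacker holding only raw vectors can run simple inference attacks. The toy sketch below (not the paper's technique; all vectors, labels, and the 3-dimensional setup are illustrative assumptions) shows attribute inference by nearest-neighbor matching: the attacker embeds texts with known labels, then matches a leaked vector against them.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical reference corpus: embeddings of texts whose topics the
# attacker already knows (e.g., produced by embedding public documents
# with the same or a translated-to encoder).
reference = [
    ([0.9, 0.1, 0.0], "medical"),
    ([0.1, 0.9, 0.0], "financial"),
    ([0.0, 0.1, 0.9], "personal"),
]

def infer_topic(leaked_vec):
    # Label the leaked embedding with its nearest reference neighbor.
    return max(reference, key=lambda r: cosine(r[0], leaked_vec))[1]

# A vector leaked from a breached database, with no source text attached.
print(infer_topic([0.8, 0.2, 0.1]))  # → medical
```

Real embeddings have hundreds of dimensions, but the geometry is the same: vectors that encode similar meaning sit close together, which is exactly what makes them useful for search and, here, exploitable.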
Reassessing assumptions in data trust
Summary
Reassessing assumptions in data trust means taking a closer look at the beliefs and practices around how we secure, interpret, and share data, especially as new risks and misunderstandings come to light. Data trust relies on more than just technical fixes—it’s about questioning old habits, ensuring transparency, and recognizing that data can be misunderstood or even misused if not carefully managed.
- Question old habits: Regularly scrutinize why you believe your data is safe or accurate, especially as technology and threats evolve.
- Communicate clearly: Make it easy for others to understand where your data comes from, how it’s handled, and what its limitations are.
- Challenge neutrality: Always consider the context, sources, and gaps in your data to avoid false confidence and reduce bias in decision-making.
One of the most popular false statements I hear is "issues with trust in data are mostly technical."

Examining the Statement
- Common Belief: Trust problems vanish if we fix errors in code, tools, or pipelines.
- Key Question: Can flawless data still be mistrusted if people don't understand or believe in its source or purpose?

Rethinking "Trust"
Trust isn't just about accuracy; it's about transparency, context, and credibility. Even perfect data won't be trusted if no one knows where it came from or why it matters. Think of a beautifully wrapped gift from a stranger: without knowing what's inside or who sent it, scepticism persists.

In Practice
- Documentation & Explanation: Show how data is collected, validated, and maintained.
- Open Communication: Invite questions and be honest about limitations.
- Cultural Acceptance: Foster an environment where challenges to data are welcomed and addressed.

Trust in data isn't earned by technical perfection alone. It grows when people understand, relate to, and find meaning in the numbers they rely on. #DataTrust
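The "Documentation & Explanation" practice can be made operational by shipping a small provenance record alongside every dataset. This is a minimal sketch under assumptions: the `DatasetCard` class and all field names and values are hypothetical, not from any specific tool.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetCard:
    # Minimal provenance record distributed alongside a dataset, so
    # consumers can see where it came from and what its limits are.
    name: str
    source: str           # how and where the data was collected
    validated_by: str     # who checked it, and when
    limitations: list = field(default_factory=list)

    def summary(self):
        lims = "; ".join(self.limitations) or "none documented"
        return (f"{self.name} | source: {self.source} | "
                f"validated: {self.validated_by} | limitations: {lims}")

# Hypothetical example record.
card = DatasetCard(
    name="donor_survey_2024",
    source="annual mail-in survey, 1,200 respondents",
    validated_by="data team, 2024-09-01",
    limitations=["self-reported ages", "online-only donors not surveyed"],
)
print(card.summary())
```

The point is cultural as much as technical: stating limitations up front invites the questions and challenges the post recommends, instead of leaving consumers to discover gaps on their own.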
-
Last week at one of my AI workshops, someone asked me, "Are there ways to make AI bias-free with the kind of data we have?" And, of course, I gave a long, nerdy, comfortable-in-the-rabbit-hole response. But here is one key idea from that answer I want to share: we need to move away from the "data is data" mindset.

Raise your hand if you have said this at least once and heard it more than once: "Data is data." That three-word statement treats our data as a neutral entity grounded in objective truth, as if the numbers on a chart, screen, or spreadsheet were absolutes. And that is the problem: data is never just data. In treating that statement as fact, we risk missing that

● a number without context is just a statistic that can be used (or misused) however we choose.
● a survey result without a 'why' can reinforce the wrong assumptions.
● a trend in AI-generated insights is only as unbiased as the dataset it was trained on.

If we take data at face value, we risk replicating the same biases, blind spots, and systemic inequities that got us here in the first place. I am not saying we should not trust what our data says. (I am a data scientist at heart, after all.) I am asking you to do the homework of building that "trust" into what our data says. (I am also a life coach, you see.)

So, how do we go beyond these "absolutes"? Here are some examples:

● Add context to numbers. Example: If your data tells you 60% of your donors are over 55, ask why. What about your outreach, messaging, or history has led to this? Is anything missing from the picture?
● Ask 'why?' more than once. Example: Data says program participation is dropping. Instead of stopping at "engagement is low," ask again: Why? Is it accessibility? Timing? Lack of awareness?
● Identify what is missing. Example: Data only shows what was collected. Who wasn't surveyed? What wasn't asked? What assumptions were baked into this historical data collection?
● Act on the data you collect. Example: Evaluate whether your data collection and your actions are aligned. Are you asking for something and not acting on what you hear? Or are you avoiding questions that might bring back negative reactions or feedback?

If your data constantly confirms the status quo, challenge it. And when (not "if") it reveals something uncomfortable, lean in. To go beyond the "data is data" mindset, we need to go beyond collecting data. Our work, then, is in listening to it, questioning it, making it available to everyone connected to it, and, yes, acting on it. #nonprofits #nonprofitleadership #community
-
From Assumptions to Knowledge: A Journey of Understanding

Recognize Assumptions: Assumptions are beliefs, ideas, or statements accepted as true without proof or evidence. They serve as the foundation for reasoning, decisions, or actions, but can be misleading if left unexamined.
Ask Questions: Adopt a curious mindset; use open-ended questions like why, how, and what if to challenge assumptions.
Gather Evidence: Seek data, expert opinions, and firsthand experiences. Use diverse and reliable sources to validate or refute assumptions.
Test and Experiment: Apply scientific or practical methods (e.g., Six Sigma) to test assumptions in real scenarios; be prepared for unexpected outcomes and adapt accordingly.
Analyze and Reflect: Compare evidence against initial assumptions. Identify patterns and draw insights to form a deeper understanding.
Synthesize Knowledge: Integrate validated information into a cohesive understanding and build frameworks or principles on the new knowledge.
Share and Reassess: Communicate findings for feedback. Continuously review as new evidence emerges so that knowledge keeps evolving.

By transitioning from unverified assumptions to evidence-backed knowledge, individuals and organizations can make informed decisions, innovate effectively, and achieve greater understanding.
-
“𝐘𝐨𝐮𝐫 𝐝𝐚𝐭𝐚 𝐢𝐬 𝐛𝐢𝐚𝐬𝐞𝐝.” – 𝐅𝐢𝐯𝐞 𝐰𝐨𝐫𝐝𝐬 𝐭𝐡𝐚𝐭 𝐜𝐚𝐧 𝐝𝐞𝐫𝐚𝐢𝐥 𝐰𝐞𝐞𝐤𝐬 𝐨𝐟 𝐚𝐧𝐚𝐥𝐲𝐬𝐢𝐬.

You have crunched the numbers, built the dashboards, and presented insights… but the room is quiet. Eyes skeptical. Someone finally says it: can we really trust this data?

👉 𝐇𝐞𝐫𝐞 𝐢𝐬 𝐡𝐨𝐰 𝐭𝐨𝐩 𝐚𝐧𝐚𝐥𝐲𝐬𝐭𝐬 𝐛𝐮𝐢𝐥𝐝 𝐭𝐫𝐮𝐬𝐭 𝐚𝐧𝐝 𝐜𝐨𝐦𝐛𝐚𝐭 𝐛𝐢𝐚𝐬 𝐜𝐨𝐧𝐜𝐞𝐫𝐧𝐬:

✅ Start with Transparency: Document your data sources, assumptions, and limitations upfront. Example: Netflix openly publishes algorithm biases to refine its recommendations.
✅ Diversify Your Data: Relying on a single source? Big risk. Cross-verify with independent datasets. Example: During COVID-19, Johns Hopkins used multi-source data to avoid regional bias.
✅ Involve Stakeholders Early: Co-create metrics with business teams to reduce misalignment. Bias often hides in what is not measured.
✅ Audit Models Regularly: Bias creeps in quietly; periodic validation helps detect drift early.

👍 𝘐𝘧 𝘱𝘦𝘰𝘱𝘭𝘦 𝘥𝘰𝘯’𝘵 𝘵𝘳𝘶𝘴𝘵 𝘺𝘰𝘶𝘳 𝘥𝘢𝘵𝘢, 𝘵𝘩𝘦𝘺 𝘸𝘰𝘯’𝘵 𝘵𝘳𝘶𝘴𝘵 𝘺𝘰𝘶𝘳 𝘥𝘦𝘤𝘪𝘴𝘪𝘰𝘯𝘴. 𝘋𝘢𝘵𝘢 𝘪𝘴 𝘯𝘰𝘵 𝘫𝘶𝘴𝘵 𝘢𝘣𝘰𝘶𝘵 𝘣𝘦𝘪𝘯𝘨 𝘳𝘪𝘨𝘩𝘵. 𝘐𝘵’𝘴 𝘢𝘣𝘰𝘶𝘵 𝘣𝘦𝘪𝘯𝘨 𝘣𝘦𝘭𝘪𝘦𝘷𝘦𝘥.

📌 𝐇𝐨𝐰 𝐝𝐨 𝐲𝐨𝐮 𝐡𝐚𝐧𝐝𝐥𝐞 𝐬𝐤𝐞𝐩𝐭𝐢𝐜𝐢𝐬𝐦 𝐢𝐧 𝐚𝐧𝐚𝐥𝐲𝐭𝐢𝐜𝐬 𝐝𝐢𝐬𝐜𝐮𝐬𝐬𝐢𝐨𝐧𝐬? 𝐏𝐥𝐞𝐚𝐬𝐞 𝐬𝐡𝐚𝐫𝐞 𝐲𝐨𝐮𝐫 𝐭𝐡𝐨𝐮𝐠𝐡𝐭𝐬 𝐚𝐧𝐝 𝐞𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐜𝐞𝐬 𝐢𝐧 𝐭𝐡𝐞 𝐜𝐨𝐦𝐦𝐞𝐧𝐭𝐬!

♻️ Share this post with others if you found it valuable. 🙏 Follow Devendra Kumar for more actionable and insightful posts. #DataTrust #AnalyticsMindset #DataDrivenDecisions
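The "audit models regularly" point can be sketched with the Population Stability Index (PSI), a common rule-of-thumb metric for detecting drift between the distribution a model was trained on and the distribution it sees today. The bins and the 0.25 threshold below are conventional illustrative choices, not taken from the post.

```python
import math

def psi(expected, actual):
    # Population Stability Index between two binned distributions
    # (each a list of bin fractions summing to 1, all nonzero).
    # Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    # > 0.25 major drift worth investigating.
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Hypothetical score distributions: at training time vs. this month.
baseline = [0.25, 0.25, 0.25, 0.25]
current  = [0.10, 0.20, 0.30, 0.40]

print(f"PSI = {psi(baseline, current):.3f}")  # flag for review above ~0.25
```

Running a check like this on a schedule turns "bias creeps in quietly" from a worry into a monitored quantity, so the conversation with skeptical stakeholders starts from numbers rather than hunches.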