Tips for Machine Learning Success

Explore top LinkedIn content from expert professionals.

Summary

Mastering machine learning success involves understanding data quality, aligning AI projects with clear goals, and managing the intricacies of model performance and deployment. It's about avoiding common pitfalls while using structured approaches to achieve impactful results.

  • Understand your data: Ensure your training and testing datasets come from the same distribution by using techniques like adversarial validation to address potential mismatches early on.
  • Identify real business needs: Focus on solving real, recurring problems that align with organizational priorities rather than chasing trends or implementing AI without a clear purpose.
  • Iterate and refine: Continuously evaluate and adjust model parameters, prompts, and components to improve performance and maintain reliability in deployment.
Summarized by AI based on LinkedIn member posts
  • Santiago Valdarrama

    Computer scientist and writer. I teach hard-core Machine Learning at ml.school.

    119,908 followers

    I want to show you a clever trick you may not have seen before. Imagine you have six months' worth of data. You want to build a model, so you take the first five months to train it and use the last month to test it. This is a common approach for building machine learning models. Unfortunately, you may find that your model works well on the training data but sucks on the test data.

    Overfitting is not weird. We've all been there. But often, the worst thing you can do is try to fix it before understanding why it's happening. Ask anyone about this, and they will give you their favorite step-by-step guide to regularizing a model. They will jump right in and try to fix the overfitting. Don't do this. There's a different way. A better way.

    Here is the question I want you to answer before you start racking your brain trying to fix the model: do your training and test data come from the same distribution? When building a model, we assume they do. Unfortunately, this is not always the case. Here is where the trick I promised comes in:

    1. Put your training and test sets together.
    2. Get rid of the target column.
    3. Create a new binary feature, setting every sample from your training set to 0 and every sample from the test set to 1. This feature will be the new target.

    Now, train a simple binary classification model on this new dataset. The goal of this model is to predict whether a sample comes from the training or the test split. The intuition behind this idea is simple: if all your data comes from the same distribution, this model won't work. But if the data comes from different distributions, the model will learn to separate it.

    After you build the model, use the ROC-AUC to evaluate it. If the AUC is close to 0.5, your model can't separate the samples, which means your training and test data come from the same distribution. If the AUC is closer to 1.0, your model learned to differentiate the samples, and your training and test data come from different distributions.

    This technique is called Adversarial Validation. It's a clever, fast way to determine whether two datasets come from the same source. If your splits come from different distributions, you won't get anywhere. You can't out-train bad data.

    But there's more! You can also use Adversarial Validation to identify where the problem is coming from:

    1. Compute the importance of each feature.
    2. Remove the most important one from the data.
    3. Rebuild the adversarial model.
    4. Recompute the ROC-AUC.

    Repeat this process until the ROC-AUC is close to 0.5 and the model can't differentiate between training and test samples. Adversarial Validation is especially useful in production applications for identifying distribution shift. Low investment with a high return.
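    A minimal sketch of the recipe above, assuming scikit-learn and NumPy are available; the synthetic Gaussian data and the logistic model are illustrative stand-ins for your own features and classifier:

```python
# Adversarial validation: train a classifier to distinguish training rows
# from test rows. A cross-validated ROC-AUC near 0.5 means the splits are
# indistinguishable; an AUC near 1.0 signals a distribution mismatch.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def adversarial_auc(X_train, X_test):
    X = np.vstack([X_train, X_test])
    # New binary target: 0 = sample came from train, 1 = came from test
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
    model = LogisticRegression(max_iter=1000)
    # Cross-validate so the AUC reflects generalization, not memorization
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

rng = np.random.default_rng(0)
# Same distribution for both splits -> AUC should sit near 0.5
same = adversarial_auc(rng.normal(0, 1, (500, 5)), rng.normal(0, 1, (500, 5)))
# Test split shifted by two standard deviations -> AUC should approach 1.0
shifted = adversarial_auc(rng.normal(0, 1, (500, 5)), rng.normal(2, 1, (500, 5)))
print(f"same distribution: AUC = {same:.2f}")
print(f"shifted test set:  AUC = {shifted:.2f}")
```

    For the follow-up loop in the post, the fitted model's `coef_` (or a tree model's feature importances) tells you which feature is driving the separation.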

  • Alexander Ratner

    Co-founder and CEO at Snorkel AI

    22,723 followers

    In enterprise AI, '23 was the mad rush to a flashy demo; '24 will be all about getting to real production value. Three key steps for this, in our experience:

    1/ Develop your "micro" benchmarks. "Macro" benchmarks, e.g. public leaderboards, dominate the dialogue, but what matters for your use case is a lot narrower, and must be defined iteratively by business/product and data science together. Building these "unit tests" is step 1.

    2/ Develop your data. Whether via a prompt or fine-tuning/alignment, the key is the data in and how you develop it. Develop = label, select/sample, filter, augment, etc. Simple intuition: would you dump a random pile of books on a student's desk? Data curation is key.

    3/ Tune your entire LLM system, not just the model. AI use cases generally require multi-component LLM systems (e.g. LLM + RAG). These systems have multiple tunable components (e.g. LLM, retrieval model, embeddings), and for complex/high-value use cases, all of them often need tuning.

    4/ For all of these steps, AI data development is at the center of getting good results. Check out how we make this data development programmatic and scalable for real enterprise use cases @SnorkelAI snorkel.ai :)
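    One way to read step 1: a "micro" benchmark is a suite of use-case-specific checks you can rerun on every prompt or model change. A minimal sketch, with a stubbed model function standing in for a real LLM call; the case, names, and predicate are illustrative:

```python
# Micro-benchmark harness: (prompt, predicate) pairs defined jointly by
# product owners and data scientists, scored as a simple pass rate.
def stub_model(prompt):
    # Stand-in for an LLM call; returns a canned answer for the demo
    canned = {"Extract the invoice total from: 'Total due: $120'": "$120"}
    return canned.get(prompt, "")

MICRO_BENCHMARK = [
    # Each predicate encodes what "correct" means for this narrow use case
    ("Extract the invoice total from: 'Total due: $120'",
     lambda out: "120" in out),
]

def run_benchmark(model, cases):
    # Fraction of cases whose output satisfies its predicate
    results = [check(model(prompt)) for prompt, check in cases]
    return sum(results) / len(results)

print(f"pass rate: {run_benchmark(stub_model, MICRO_BENCHMARK):.0%}")
```

    In practice the suite grows iteratively as the business surfaces new failure modes, exactly like a unit-test suite.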

  • Abhishek Rungta

    Tech Partner for Growing Enterprises - AI/GenAI, Data Analytics/BI, Cloud & Cybersecurity, Product Engineering, Managed Services, GCC for 25+ years. Founder & CEO - INT.

    44,054 followers

    AI projects are failing: not loudly, but quietly and often. Last week, I shared some learnings from AI initiatives we've run over the past couple of years. These were not theoretical ideas. These were real projects, built for real businesses, by real teams. Some succeeded. Some taught us what not to do.

    Warren Buffett: "The first rule is: don't lose money." In the AI world, the first rule should be: don't let the project fail.

    🔁 1. Chasing AI without a real business problem. This is the #1 reason AI projects fail. The excitement is real, but the clarity is missing. Too many initiatives start with, "We have to do something in AI. The Board/CEO wants it." When you ask "Why?", the answers get fuzzy. There's often no alignment with a meaningful problem, no defined outcome, and no plan for business value. You must start with a sharp, urgent problem. Ask:
    - Is it real and recurring?
    - Is it costing us time, money, or customers?
    - Is solving it a priority for leadership?
    If the answer is lukewarm, drop it. Don't chase hype; solve pain.

    📉 2. No data, but big ambitions. AI needs fuel, and that fuel is data. Most companies don't even have decent dashboards, but they want AI to "think" for them. You can't train models on instincts or opinions. AI needs history, decisions, edge cases, and volume. Before even thinking about AI, get your data stack in order:
    - Start capturing what matters.
    - Structure and clean it consistently.
    - Build visibility through dashboards.

    🧠 3. Ignoring the role of context. Even the best algorithms are clueless without context. What works in one scenario may totally fail in another, and AI can't figure that out on its own. Think of it like this: if I'm asked to speak at an event, I'll want to know the audience, their challenges, and the format; otherwise, I'll miss the mark. AI is the same. Without business logic, edge conditions, and constraints, its outputs are generic at best and misleading at worst.

    ⚡ 4. Forgetting hidden and ongoing costs. Many leaders assume AI is a one-time build. It's not. Even after a model is trained, there's hosting, fine-tuning, monitoring, guardrails, integrations, and more. And the infrastructure isn't free, especially if you're using GenAI APIs. Today, a lot of this cost is masked by subsidies from big players, but like every other tech cycle, the discounts won't last.

    🧭 So what should companies actually do?
    - Map where time and money are leaking internally.
    - Start capturing data in those areas: every day, every interaction.
    - Use dashboards and analytics before jumping to AI.
    - Identify where automation or decision support can create value.
    - Train your systems not just with data, but with your decision logic.
    And make sure AI is embedded where work happens, not in some separate tab. If your team needs to "go to ChatGPT", they won't. The AI has to come to them, right inside their workflows.

    🚶♂️ Crawl → Walk → Run. The hype will make you want to run. But strong AI systems are built the boring way.

  • Fareed Mosavat

    Visiting Partner, a16z speedrun. Product & Growth Advisor for PLG Companies.

    10,229 followers

    The latest episode of Unsolicited Feedback is an absolute must-listen for anyone building AI at scale. This one goes deep, with a ton of technical insights from Ben Kus, Box CTO. From non-deterministic challenges to the transformative power of AI, Ben shares invaluable insights that can reshape how we approach AI in our businesses.

    Understand AI's Unpredictability 🤖 Building with AI means dealing with its non-deterministic nature, where the same input can yield different outputs each time. As Ben illustrates, "We've gotten to the point where if we add a period at the end of a prompt versus not, it'll change the answer." Developers need to be meticulous, constantly testing and refining models to ensure consistent performance.

    Fine-Tune AI for Better Outcomes 🔍 Managing AI's unpredictability starts with fine-tuning interactions. One easy place to start: try adjusting the "temperature" setting to control response randomness. Temperature 0 gives precise, consistent responses; temperature 1 gives creative but varied outputs. Experimenting with different settings helps find the optimal balance for your use cases, significantly enhancing AI's utility.

    Customize Prompts for Each Model 📌 AI models require tailored prompts for best results. "We have to customize prompts per model, and we have to then manage the version history and control on those prompts," says Ben. This trial-and-error process is essential to identify which prompts work best with specific models.

    Leverage AI Feedback Loops for Continuous Improvement 💬 Mimicking human behavior, Box uses one AI to evaluate another's output: "You get an AI to tell you if another AI did a good job." This iterative process refines results, ensuring higher accuracy and reliability.

    Transform Unstructured Data into Usable Insights 🌐 AI can revolutionize how we handle unstructured data. By processing and structuring documents, images, and videos, AI creates valuable metadata, making it easier to analyze and leverage this data for traditional analytics.

    Practical Tips for Startups Choosing AI Models 💡 Ben's advice for startups focuses on practicality and cost-effectiveness:
    - Start simple: use pre-existing models to save on infrastructure costs.
    - Use cloud providers: leverage GCP, AWS, or Azure to simplify model management.
    - Delay optimization: focus on product-market fit before optimizing infrastructure.

    Navigate AI Model Management with Strategic Grouping 🔄 Categorizing models into 'premium' and 'standard' based on performance and cost helps streamline decision-making. Evaluate key attributes such as hosting platform, safety, and open-source nature to ensure reliability.

    Embrace AI's Future: Continuous Learning and Adaptation 🔮 The true potential of AI lies in its ability to learn and adapt. Ben predicts that fostering a cycle of feedback and refinement will progressively enhance AI's accuracy and usefulness.

    Full episode linked in the comments. This one was full of insights! 👇
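    Under the hood, temperature is just a rescaling of the model's next-token scores before sampling. A stdlib-only sketch of the idea; real providers expose `temperature` as an API parameter rather than raw logits, and the logits below are made up:

```python
# Temperature-scaled sampling: divide logits by the temperature, apply
# softmax, then draw. Temperature 0 degenerates to picking the argmax.
import math
import random

def sample(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    z = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - z) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
rng = random.Random(42)
greedy = {sample(logits, 0, rng) for _ in range(20)}     # always token 0
varied = {sample(logits, 1.0, rng) for _ in range(200)}  # mixes tokens
print("temperature 0:", greedy)
print("temperature 1:", varied)
```

    Lower temperatures sharpen the distribution toward the top token; higher ones flatten it, which is exactly the precise-versus-creative trade-off described above.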

  • Ashiq Rahman

    CAIO | Risk-smart AI Advantage Advisor | Data Science AI ML Technical Leader

    3,243 followers

    𝐀 𝐟𝐫𝐚𝐦𝐞𝐰𝐨𝐫𝐤 𝐟𝐨𝐫 𝐀𝐈 𝐒𝐮𝐜𝐜𝐞𝐬𝐬 𝐟𝐨𝐫 𝐩𝐫𝐨𝐣𝐞𝐜𝐭 𝐦𝐚𝐧𝐚𝐠𝐞𝐫𝐬 𝐚𝐧𝐝 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬

    First, determine how much a failure by the AI costs you and how much a success saves or earns you. AI may still come out ahead, but do the due diligence. Then figure out the technical side: prompting, RAG, and fine-tuning strategies, guided by evaluations and business metrics.

    📚 𝐂𝐨𝐧𝐭𝐞𝐱𝐭: 'Tuning for accuracy can be a never-ending battle with LLMs - they are unlikely to get to 99.999% accuracy using off-the-shelf methods.' Colin Jarvis of OpenAI shared a well-written best-practice document based on their experience deploying LLMs with early customers: https://lnkd.in/g6y7Jpmx It covers both business and technical contexts and risks, including:
    • Knowing how to start optimizing accuracy
    • When to use which optimization method
    • What level of accuracy is good enough for production

    💰 𝐖𝐡𝐲 𝐝𝐨𝐞𝐬 𝐢𝐭 𝐦𝐚𝐭𝐭𝐞𝐫? 'For the business, it can be hard to trust LLMs after the comparative certainties of rules-based or traditional machine learning systems, or indeed humans! A system where failures are open-ended and unpredictable is a difficult circle to square.' On the technical side, prompting, RAG, and fine-tuning can be confusing. This write-up will guide the enterprise toward successful AI deployments.

    #ai #llm #prompt #RAG #finetuning #businessvalue
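    The first step above can be made concrete with a back-of-envelope expected-value check. All numbers here are hypothetical placeholders; plug in your own accuracy and cost estimates:

```python
# Cost/benefit due diligence before investing in an AI feature:
# expected value = P(success) * gain  -  P(failure) * cost.
def net_value_per_case(p_success, gain_on_success, cost_on_failure):
    """Expected value of letting the AI handle one case."""
    return p_success * gain_on_success - (1 - p_success) * cost_on_failure

# e.g. a 92%-accurate triage step that saves $5 per success
# but costs $40 of rework per failure
ev = net_value_per_case(0.92, 5.0, 40.0)
print(f"expected value: ${ev:.2f} per case")  # positive => worth piloting
```

    A positive number says the AI "may still come out ahead" despite its error rate; a negative one says the failure cost dominates and the accuracy bar must rise first.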
