Enterprises today are drowning in multimodal data: text, images, audio, video, time series, and more. Large multimodal LLMs promise to make sense of it all, but in practice, embeddings alone often collapse nuance and context. You get fluency without grounding, answers without reasoning, "black boxes" where transparency matters most.

That's why the new IEEE paper "Building Multimodal Knowledge Graphs: Automation for Enterprise Integration" by Ritvik G, Joey Yip, Revathy Venkataramanan, and Dr. Amit Sheth really resonates with me. Instead of forcing LLMs to carry the entire cognitive burden, their framework shows how automated Multimodal Knowledge Graphs (MMKGs) can bring structure, semantics, and provenance into the picture.

What excites me most is how the authors combine two forces that usually live apart. On one side, bottom-up context extraction: pulling meaning directly from raw multimodal data like text, images, and audio. On the other, top-down schema refinement: bringing in structure, rules, and enterprise-specific ontologies. Together, this creates a feedback loop between emergence and design: the graph learns from the data but also stays grounded in organizational needs.

And this isn't just theoretical elegance. In their Nourich case study, the framework links a food image, an ingredient list, and dietary guidelines into a multimodal knowledge graph that actually reasons about whether a recipe suits a diabetic vegetarian diet, then suggests structured modifications. That's enterprise relevance in action.

To me, this signals a bigger shift: LLMs alone won't carry enterprise AI into the future. The future is neurosymbolic, multimodal, and automated. Enterprises that invest in these hybrid architectures will unlock explainability, scale, and trust in ways current "all-LLM" strategies simply cannot.

Link to the paper -> https://lnkd.in/gv93znbQ

#KnowledgeGraphs #MultimodalAI #NeurosymbolicAI #EnterpriseAI #KnowledgeGraphLifecycle #MMKG #AIResearch #Automation #EnterpriseIntegration
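To make the bottom-up/top-down combination concrete, here is a minimal Python sketch of the idea (my own illustration, not the authors' implementation) using networkx: extraction populates recipe and ingredient nodes from multimodal sources, an enterprise dietary ontology contributes constraint edges, and a simple structural rule checks diet suitability. All node names, properties, and the suitability rule are illustrative assumptions.

```python
# Illustrative sketch only: a tiny multimodal KG linking a recipe,
# its ingredients, and dietary constraints, with a rule check over
# the graph structure. Entities and the rule are assumptions, not
# taken from the paper.
import networkx as nx

kg = nx.MultiDiGraph()

# Bottom-up: entities extracted from raw multimodal data (image + text).
kg.add_node("recipe:pasta_alfredo", type="Recipe", source="image+text")
for ing in ["cream", "parmesan", "butter", "pasta"]:
    kg.add_edge("recipe:pasta_alfredo", f"ingredient:{ing}", rel="hasIngredient")

# Top-down: schema/ontology facts from enterprise dietary guidelines.
kg.add_edge("ingredient:cream", "property:high_glycemic_load", rel="violates")
kg.add_node("diet:diabetic_vegetarian", type="DietProfile",
            forbidden_properties={"high_glycemic_load", "contains_meat"})

def suitable(recipe: str, diet: str):
    """Walk recipe -> ingredient -> property edges; collect violations."""
    forbidden = kg.nodes[diet]["forbidden_properties"]
    violations = [
        (ing, prop)
        for _, ing, d in kg.out_edges(recipe, data=True)
        if d.get("rel") == "hasIngredient"
        for _, prop, d2 in kg.out_edges(ing, data=True)
        if d2.get("rel") == "violates"
        and prop.split(":", 1)[1] in forbidden
    ]
    return (not violations), violations

ok, why = suitable("recipe:pasta_alfredo", "diet:diabetic_vegetarian")
print(ok, why)  # False [('ingredient:cream', 'property:high_glycemic_load')]
```

Because the answer falls out of explicit edges rather than an embedding, the "why" (cream violates the diabetic constraint) comes for free, which is exactly the provenance argument the post makes.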
Why multimodal reasoning builds trust
Summary
Multimodal reasoning describes AI systems that draw on several types of data, such as text, images, audio, and graphs, to make informed decisions. This approach builds trust because it enables transparent, explainable, and context-rich solutions: by combining multiple ways of understanding and reasoning, these systems let users see how conclusions are reached, making AI less of a "black box."
- Show your work: Use reasoning methods that create visible trails of how answers are generated so people can see the evidence and logic behind each decision (see the sketch after this list).
- Blend perspectives: Integrate both human-like imagination and logical structure, such as visualizing ideas or using knowledge graphs, to provide richer, more relatable problem-solving.
- Tailor to needs: Adapt multimodal reasoning for specific industries by combining relevant data types and refining rules to match unique challenges, increasing confidence in AI-powered results.
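To ground the "show your work" point above, here is a minimal, hypothetical sketch of an answer object that carries its own evidence trail; the field names and the example trail are illustrative assumptions, not drawn from any specific framework.

```python
# Hedged sketch of a "visible reasoning trail": every answer carries
# the evidence steps behind it. All names and the sample trail are
# illustrative, not from any particular system.
from dataclasses import dataclass, field

@dataclass
class EvidenceStep:
    claim: str      # intermediate conclusion
    source: str     # where it came from (doc id, KG edge, image region)
    modality: str   # "text" | "image" | "graph" | "audio"

@dataclass
class TracedAnswer:
    answer: str
    trail: list = field(default_factory=list)

    def explain(self) -> str:
        lines = [f"Answer: {self.answer}"] + [
            f"  [{i + 1}] ({s.modality}) {s.claim} <- {s.source}"
            for i, s in enumerate(self.trail)
        ]
        return "\n".join(lines)

ans = TracedAnswer(
    answer="Recipe unsuitable for a diabetic diet",
    trail=[
        EvidenceStep("Dish identified as pasta alfredo", "image classifier", "image"),
        EvidenceStep("Alfredo sauce contains cream", "KG edge: hasIngredient", "graph"),
        EvidenceStep("Cream has a high glycemic load", "dietary guideline (illustrative)", "text"),
    ],
)
print(ans.explain())
```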
📑 Fascinating: with Multimodal Visualization-of-Thought (MVoT), researchers have added the power of visual imagination to AI models to enable spatial reasoning. In "Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (MVoT)", they introduce an approach that allows Multimodal Large Language Models (MLLMs) to visualize their thought processes. By interleaving visual reasoning with traditional verbal reasoning, these models can handle spatial tasks in ways that closely mimic human imagination.

Why this matters:
🔹 Improved spatial reasoning: Many industrial challenges, whether in engineering design, manufacturing, or logistics, demand a deep understanding of spatial relationships. MVoT significantly enhances the ability of AI to tackle these tasks.
🔹 Increased interpretability: By visualizing its "thoughts," the model offers greater transparency, making it easier to understand how decisions are made and fostering trust in AI applications.
🔹 Real-world impact: Think about optimizing manufacturing layouts, designing more efficient supply chains, or simulating engineering systems, all areas where spatial reasoning and visualization are crucial. MVoT's ability to imagine opens doors to smarter and faster problem-solving.

It's fascinating to see that copying concepts from our own thinking also helps the models perform better!

#LLM #Research #NLP #Science
Janine Wagner-Dittrich | Reyhan Merekar | Jiri Kram | Nick Rosa | Dominik Krimpmann, PhD
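Conceptually, MVoT alternates generated visualizations with verbal reasoning steps, and each new step conditions on all earlier text and images. The sketch below captures that loop under an assumed model interface; `model.next_step` and the `ANSWER:` convention are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# Conceptual sketch of MVoT-style interleaved reasoning. The model
# interface here (next_step, the ANSWER: prefix) is a hypothetical
# assumption, not the published implementation.
from typing import NamedTuple, Optional

class Step(NamedTuple):
    text: str               # verbal reasoning for this step
    image: Optional[bytes]  # generated visualization of the spatial state

def mvot_reason(model, task: str, max_steps: int = 8) -> str:
    trace: list = []
    for _ in range(max_steps):
        # The model conditions on the task plus ALL prior text AND images,
        # so later steps can "look at" earlier imagined states.
        step = model.next_step(task, trace)  # hypothetical call
        trace.append(step)
        if step.text.startswith("ANSWER:"):
            return step.text.removeprefix("ANSWER:").strip()
    return trace[-1].text  # fall back to the last verbal step
```

The design point is the interleaving itself: the "imagined" images are part of the reasoning context, not just output, which is what makes the spatial chain of thought inspectable.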
What if we added symbolic logic and multi-agent planning to GraphRAG for retrieving and reasoning over chemistry knowledge? That's MOSES.

GraphRAG pairs LLMs with knowledge graphs (KGs) to improve retrieval and grounding through entities and relationships. But in scientific domains, where relationships are multiscale, logic-heavy, and often implicit, LLMs need additional scaffolding. Two mechanisms help:
🔹 Ontology as a compass: formalize hierarchy, properties, and constraints to enable precise querying and logical inference.
🔹 Multi-agent systems as workers: divide tasks into preprocessing, planning, validation, and refinement steps.

MOSES (Multi-agent Ontology System for Explainable Knowledge Synthesis) combines both for chemistry. Its ontology-based, agentic workflow:
1️⃣ Generates a hypothesis: proposes likely mechanisms, relevant entities, and the structure of a complete answer.
2️⃣ Parses the query: extracts intent and key entities, maps them to ontology classes, and classifies the query type.
3️⃣ Plans and executes: formulates a detailed execution plan and retrieves structured information from the KG.
4️⃣ Validates and iterates: checks and refines results for completeness and logical coherence, then formats the final output.

This is less about producing "better answers" than about producing explainable answers, with a visible trail of what was asked, what was found, how concepts were connected, and where the evidence originates. Based on consistent feedback from scientists, this traceability and explainability are what build trust and make LLMs a credible partner in scientific discovery.

📄 MOSES: combining automated ontology construction with a multi-agent system for explainable chemical knowledge reasoning, ChemRxiv, October 1, 2025
🔗 https://lnkd.in/e89vB6_V
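The four-stage workflow maps naturally onto a control loop. Here is a schematic Python sketch of such a loop; every call on `llm` and `kg` is a hypothetical placeholder for the corresponding agent or store, not the authors' code, and the returned trail is what makes the run auditable.

```python
# Schematic sketch of a MOSES-style agentic loop. Stage names follow
# the post; all llm.* and kg.* calls are hypothetical placeholders.
def answer_chemistry_query(query, kg, ontology, llm, max_rounds: int = 3):
    # 1) Hypothesis: likely mechanisms, entities, and answer structure.
    hypothesis = llm.propose(query, ontology)
    # 2) Parse: map intent and entities to ontology classes.
    parsed = llm.parse(query, ontology)

    result, trail = None, []
    for _ in range(max_rounds):
        # 3) Plan and execute: structured retrieval from the KG.
        plan = llm.plan(parsed, hypothesis)
        result = kg.execute(plan)
        trail.append((plan, result))  # visible trail of what was asked/found
        # 4) Validate and iterate: check completeness and coherence.
        verdict = llm.validate(result, hypothesis)
        if verdict.ok:
            break
        parsed = llm.refine(parsed, verdict)

    # Final answer plus the trace that makes it explainable.
    return llm.format(result), trail
```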