Tired of writing boilerplate code just to wrangle customer data? Chuck Data runs natively in your terminal and uses natural language to build and manage customer data models in Databricks. What used to take days of coding now takes minutes. Built for data engineers, by data engineers. Check it out!
More Relevant Posts
The Modern Data Stack is amazing... and a bit of a mess.

Every week I come across a new “must-have” data tool: a shiny connector here, a new orchestration layer there. Don’t get me wrong, I love the modern data stack. dbt, Snowflake, Airbyte, Dagster have all made our lives as data engineers much better. But at some point you start realizing something:

👉 Simplicity scales better than hype.

We spend weeks evaluating tools that promise “no-code” transformations or “self-healing” pipelines, and then we still end up debugging JSON configs at 2 AM.

Here’s what I’ve learned after a few projects in 2025:
- Focus on data contracts early: broken schemas cause 80% of downstream pain.
- Metadata and lineage aren’t optional anymore; they save hours of blame-game debugging.
- Don’t build a Ferrari when a reliable Toyota (simple ELT to a warehouse) does the job.
- A smaller, well-understood stack always beats a “trendy” one.

The modern data stack isn’t bad, it’s just growing up. And maybe the next evolution isn’t “more tools”… but more discipline.

Curious to hear from others: what’s one “fancy” data tool you stopped using because it overcomplicated things?

#DataEngineering #ModernDataStack #ELT #DataPipelines #AnalyticsEngineering
Big news in the data engineering world today! Everyone’s talking about consolidation, and Matillion is evolving.

I’m not a data engineer, and I just built an end-to-end Snowflake pipeline for retailers to understand their customers. My prompt was: "Build a customer intelligence and product affinity pipeline."

In Fivetran: configure 3 connectors, set up schedules, pay per row. Then, in dbt: hand-code 5+ SQL files, debug them, define dependencies in YAML, and run to execute. With Maia, it’s minutes.

Combining tools doesn’t solve the delivery bottleneck; it just consolidates it. We’re redefining data engineering for what’s coming next.
Our Data Science Agent turns Databricks Assistant into an autonomous partner for data science and analytics. Fully integrated with Databricks Notebooks and the SQL Editor, our Data Science Agent accelerates analysis by exploring data, running and fixing code, training models, and summarizing insights — all from a single prompt. See how to get started: https://lnkd.in/gFBh7VSh
🌟 New Blog Just Published! 🌟
📌 Docker for Data Science: Why It’s a Game-Changer 🚀
✍️ Author: Hiren Dave
📖 In the rapidly evolving landscape of data science, the gap between a prototype that dazzles on a laptop and a production system that runs reliably on a cluster is widening. Docker bridges that gap...
🕒 Published: 2025-10-18
📂 Category: Tech
🔗 Read more: https://lnkd.in/dd_Qkt2j 🚀✨
#dockerdatascience #reproducibledatasc #datasciencecontain
🚀 Master Spark Like a Pro: From DAG to Optimization!

Ever wondered how Spark actually transforms your code into parallel execution magic? Let’s break it down 👇

✨ The driver is the brain: it plans, schedules, and manages.
⚙️ Executors are the muscles: they execute tasks in parallel.
🧩 Each transformation, whether narrow or wide, decides how efficiently your data moves across partitions.

But here’s the real power tip 💡
✅ Avoid shuffles whenever possible.
✅ Optimize partitions with repartition() or coalesce() wisely (see the sketch below).
✅ Keep an eye on skew: one bad partition can tank your performance!

🔍 Remember: a good Spark engineer doesn’t just write transformations, they engineer performance.

I recently explored a detailed Spark guide covering everything from schema inference to Adaptive Query Execution (AQE) and data skew optimization, and it’s a goldmine for anyone serious about data engineering.

#DataEngineering #ApacheSpark #BigData #ETL #PerformanceTuning #PySpark #LearningJourney
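A minimal PySpark sketch of those tips, purely illustrative: the input path, column names, and partition counts are made-up assumptions, not from the post.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Let AQE coalesce shuffle partitions and split skewed ones at runtime
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

df = spark.read.parquet("/data/events")  # hypothetical input path

# Wide transformation: this triggers a shuffle, so do it once and deliberately
daily = df.groupBy("event_date").agg(F.count("*").alias("events"))

# repartition(n, col) does a full shuffle; use it to increase parallelism or rebalance skew
balanced = df.repartition(200, "event_date")

# coalesce(n) avoids a shuffle; use it to shrink partition count before writing
daily.coalesce(8).write.mode("overwrite").parquet("/data/daily_counts")

Rough rule of thumb: repartition when you need more (or better-distributed) partitions, coalesce when you just need fewer files on the way out.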
Quick Spark Cache Tip for Faster Iterations 🔄

Senior data engineers, iterating on ML models? Cache your DataFrame after heavy transforms to avoid recomputing it; it’s a game-changer for dev cycles. Simple win:

df_transformed = df.filter(...).withColumn(...)  # Your ETL magic
df_transformed.cache()   # Persist in memory
df_transformed.count()   # Action triggers the caching

Uncache when done: df_transformed.unpersist(). In Databricks, watch the Spark UI’s Storage tab for eviction stats. Saves hours! ⏱️

Caching saved your day? Share! 👇

#DataEngineering #ApacheSpark #Caching #BigData #Databricks #SeniorEngineer #ETL #Performance #TechTips #Analytics
Ok, this is by far the coolest thing Databricks has ever released. Think Cursor, but purpose-built for data scientists, with governed data context and enterprise control via Unity Catalog. The Data Science Agent in Databricks Assistant plans, writes, runs, and fixes code from a single prompt, orchestrating EDA, modeling, and iterative error repair directly in notebooks and the SQL Editor. For startup data teams, this is everything you want from an agentic IDE, plus your data’s governance and context, compressing hours into minutes for EDA, forecasting, churn modeling, and more. Super exciting! https://lnkd.in/estZepYY
Video: Databricks Assistant Data Science Agent in Action! (YouTube)
Day 30: Metadata-Driven Pipelines – Working Smarter, Not Harder 🚀

In data engineering, the challenge isn’t just moving data; it’s building pipelines that adapt. Metadata-driven pipelines help us achieve exactly that:

- They read metadata (like table names, schemas, and source locations) instead of hardcoding everything.
- This allows scalable, reusable, and flexible pipelines.
- Adding a new table or source? Often no code change is needed; just update the metadata (see the sketch below).

Think of it as giving your pipeline a map and compass instead of writing directions for every single journey. 🗺️

Metadata-driven design = less manual work, fewer errors, faster delivery.

#DataEngineering #Databricks #ETL #MetadataDriven #Automation #Spark
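A minimal sketch of the idea in PySpark. The control table name, its columns, and the target tables are hypothetical assumptions for illustration, not from the post.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata-driven-sketch").getOrCreate()

# Hypothetical control table: one row per source to ingest,
# with columns source_path, file_format, target_table
control_rows = spark.read.table("ops.ingestion_metadata").collect()

for row in control_rows:
    # The loop body never mentions a specific table; the metadata drives it
    df = spark.read.format(row.file_format).load(row.source_path)
    df.write.mode("append").saveAsTable(row.target_table)

# Adding a new source = inserting one row into ops.ingestion_metadata,
# with no pipeline code changes required.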
⚙️ How I Built My First Data Pipeline (and Messed Up Along the Way)

When I built my first real data pipeline, I thought it would be easy: read data → clean it → store it. Yeah… not really. 😅

I used PySpark, S3, and Airflow. I ran it locally first and everything looked perfect… until I moved it to EMR and Airflow. Here’s what I learned the hard way:

💥 One giant output file: Spark took forever to write huge data. Lesson: partition by date or region.
⚠️ No trigger check: the DAG ran daily assuming new data existed. One day, the API didn’t refresh. Lesson: always check before running; trigger files in S3 save headaches.
🧠 Overcomplicating Airflow: multiple branches, conditions, retries… a nightmare to debug. Lesson: keep it simple first.

✅ Key takeaways: build small steps, add sanity checks and alerts, and test with sample data first. (A rough sketch of the first two lessons is below.)

💡 What about you? Have you ever messed up a pipeline or project and learned something valuable? Drop your story in the comments, I’d love to hear!

🔗 Read the full story on Substack: https://lnkd.in/gMqYQKTK

#dataengineer #TechJourney #GrowWithPraneeth #Data
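A hedged sketch of the first two lessons, assuming Airflow 2.4+ with the Amazon provider installed. The bucket name, paths, DAG id, and trigger-file key are all hypothetical.

First, write the Spark output partitioned by date instead of as one giant file:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioned-write-sketch").getOrCreate()
events = spark.read.json("s3://my-bucket/raw/events/")  # hypothetical input
events.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://my-bucket/clean/events/"
)

Second, gate the DAG on a trigger file in S3 so it only runs when fresh data actually landed:

from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

with DAG(
    "events_pipeline",                       # hypothetical DAG id
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    wait_for_trigger = S3KeySensor(
        task_id="wait_for_trigger_file",
        bucket_name="my-bucket",
        bucket_key="raw/events/{{ ds }}/_SUCCESS",  # hypothetical trigger key
        poke_interval=300,                          # check every 5 minutes
        timeout=60 * 60,                            # give up after an hour
    )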
Why 𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀 𝗔𝘂𝘁𝗼 𝗟𝗼𝗮𝗱𝗲𝗿 Is My Go-To for Scalable Ingestion

If you're managing data pipelines with millions of files, stop manually tracking what’s been processed. Seriously. Databricks Auto Loader is a total game-changer. It automatically detects and ingests new files (#CSV, #JSON, #Parquet, Avro, you name it) with minimal configuration. No more brittle scripts or manual file tracking.

🛠️ Features that make my life easier (a hedged sketch follows below):
- ✅ File type filtering: even if you don’t control the source folder, Auto Loader lets you ingest only the formats you care about. Say goodbye to noisy data.
- ✅ Glob pattern directory filtering: dynamically read from multiple subfolders, with no hardcoded paths and no headaches.
- ✅ cloudFiles.cleanSource options: keep your landing zone tidy with flexible cleanup modes:
  - OFF: leave files untouched
  - DELETE: remove files after the retention period
  - MOVE: archive files to another path

💬 Have you used Auto Loader in production? What’s your favorite feature or use case? Let’s trade notes!

#Databricks #AutoLoader #BigData #DataEngineering #CloudComputing #Spark #ETL #DataPipeline #DataEngineer #DataFactory #Big4
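A minimal Auto Loader sketch tying those features together. The paths, target table, and checkpoint location are hypothetical, and the cloudFiles.cleanSource options require a recent Databricks Runtime, so verify the exact option names against your workspace’s documentation.

# Runs inside a Databricks notebook/job where `spark` is already defined
df = (
    spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")          # ingest only this format
        .option("pathGlobFilter", "*.json")            # extra filename filter
        .option("cloudFiles.schemaLocation", "/Volumes/lake/landing/_schemas/orders")
        # Cleanup options are an assumption to verify on your runtime version:
        .option("cloudFiles.cleanSource", "MOVE")      # OFF | DELETE | MOVE
        .option("cloudFiles.cleanSource.moveDestination", "/Volumes/lake/archive/orders")
        .load("/Volumes/lake/landing/orders/*/2025-*/")  # glob over subfolders
)

(
    df.writeStream
      .option("checkpointLocation", "/Volumes/lake/_checkpoints/orders")
      .trigger(availableNow=True)                      # incremental, batch-style run
      .toTable("main.bronze.orders")
)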