Auto-generated transcript below:
Richard Chen (Iterative Ventures)
Good morning, everyone, and thanks for joining Iterative Ventures’ series of AMAs. With Zack from Zing Data today, I’m particularly interested in Zack’s product, because this is a pain point that I experienced myself, working at Facebook in analytics back in 2015. The tool that he’s building is exactly what I’ve always wanted, which is a mobile platform to pull up analytic charts, so that when I go into my meetings, etc., I have evidence to show in a very efficient way.
First of all, thank you for joining us in this podcast. Yeah, absolutely. And, you know, Zack, we’ve known each other, you know, since 20.1. I’d love to dive in, in a little bit of detail, as to how you came up with this idea. How did you get started? What was the inspiration?
Zack Hendlin (Zing Data) 0:55
Yeah, so I think whenever you’re starting a company, you either get frustrated enough with something or excited enough about something that you start building. And so that’s what happened here. I was in a really, really boring hour-plus meeting, where we were trying to figure out where to hire salespeople, where we had free signups or paid signups. I was VP of product at a company at the time. And we all had our phones; none of us had our computers. And so we were basically having this long discussion about where to hire salespeople without the benefit of looking at the data on where signups were and where people converted and all that. And I knew that that data existed, because I had built out the data warehouses and all that stuff, and put in the time with engineers on the team to get that infrastructure in place. But we were having a meeting and making decisions without the benefit of that. And it was like, why isn’t there an easier way to use this? We had Jupyter notebooks, we had Metabase, we had all this other stuff, and none of it actually worked in the lightweight, on-your-phone way that a lot of people actually work today. That was the motivation. And then I started working on it nights and weekends with a friend of mine whom I had met 10 years before at MIT, where we went to grad school together. We started building out the initial version and put it up on Product Hunt. And when we had real companies sign up, not Gmail addresses, some of which are in the Fortune 1000, we were like, okay, there’s something here. And then we went out and jumped into it full time, raised around two and a half million dollars, and started really building out the team and making the product a lot better. And that’s kind of where we are today. We started as this mobile platform where you would deterministically say, hey, I want the count of this thing grouped by that thing, and you tap through it.
And then recently, over the last three months, with the possibilities that OpenAI and ChatGPT and other things have opened up, we said, well, actually, dictating to your phone, where you’re not going to do a lot of typing, is a great interface to ask questions. And so we’ve added in natural language queries. So you can ask a question, we’ll translate it to SQL, run that query on your data, and then show you a graph or a chart or whatever it is. So it makes it way more lightweight to say, hey, who are my top five salespeople, or what orders have happened but not yet been fulfilled, if you’re in a fulfillment center, and do that without needing to know SQL, without needing to be at a computer. And we have use cases that I wouldn’t have even thought of. I thought it’d be used by VPs of product at Series C companies to monitor product analytics, like I wanted. And we have folks in farm settings; we have folks in energy, when they’re deciding, you know, what to extract or store; we have real estate agents; this super interesting mix of use cases. That kind of turned us on to the idea that there are a lot of people who are out in the world doing their work: real estate agents, retail workers, the folks who fix your utility lines if they go down, the folks who fix your roads, the folks who are in logistics and supply chains. And all of those folks actually will be able to do their job better if they have the right data on, hey, where are the outages? Or which parts of the road need work? Or do we have the parts to fix this thing in stock in a warehouse nearby? They all can do their jobs better if they have data. And in order to do that, you need to make it super simple and work well on mobile. And so that’s what we’ve done. And AI is one of the ways we make that, you know, a notch easier than has historically been the case.
Richard Chen (Iterative Ventures)
That’s remarkable. And, you know, the fact that you can find all these other use cases, especially outside the realm of, you know, software-driven companies, etc., right, for people who are on the road, on the farm, etc. It’s fascinating.
Zack Hendlin (Zing Data)
80% of the world’s workforce (there’s some research by Emergence Capital) is deskless, or not primarily at a desk. So especially if you go to, you know, Southeast Asia, a lot of small businesses will actually operate primarily from their mobile phones. We have a chain of retail stores in Argentina that uses Zing, and they actually have never logged in on a desktop. They added a Postgres database from an Android device, and they query and go to different stores and do that all from their phone. They’ve never logged in to Zing from web. And, you know, would I have forecasted that? No, I would have said most people will probably set it up on web and then use it on their phone. But there are actually folks who are literally running their business from their phone. And I think you see, even recently, Stripe, I believe it was, had a thing where you can accept payments by literally tapping a card on an iPhone, and you can run a business without necessarily needing as expensive an infrastructure as you used to. And all those transactions are generating data. The question is, how do you make it useful? You might have a standalone app for Salesforce or a standalone app for Google Analytics. But the fact is, a lot of the value comes from putting those things together into, like, a data warehouse, so you can say, hey, the person who saw this ad purchased, and then purchased again, and then had this customer support question. You put that all in one place, and then you set alerts when things happen, and that makes business much more real time. And so the idea that we’re going for is, instead of a dashboard that you go check and need to spend a bunch of time building, how do you just make it way more lightweight to ask a question? I’m sure when you were at Facebook and someone said, hey, Richard, can you go create a dashboard for this thing? Or, hey, can you go do some analysis on this? Sometimes it was really hard and took your expertise. But other times, you were probably like, hey, if there were really good tools so that people could self-serve this, this would save me a lot of time. So I think we’re solving kind of that easier set of use cases.
Richard Chen (Iterative Ventures)
Yeah, I was just gonna say that, you know, what you’re creating sounds like it’s about to replace my former job. So, well, you know, I think that’s super exciting, that we can have this efficiency boost.
And I’d love to dive into, you know, later on, the areas that you’re currently focusing on and deepening those use cases. But before that, I’d love to dive into your personal experience, because this is a community event. And so I’d love for you to share a little bit about your experience: when you got started, what were some of the learning experiences, the pitfalls, and how you came to be here, basically.
Zack Hendlin (Zing Data)
Yeah, so before I started Zing, let’s see, my first job out of college, I was a management consultant.
So I got very good at formatting slides, and really kind of understanding how businesses work and a lot of the inefficiencies in that, and realized that this desire to build stuff was missing from that work, right? As a consultant, you’re always giving advice; you’re not actually the decision maker, you’re not the person building the thing. You’re the person advising the people who build this stuff. And so I ended up going to business school and all that stuff, and worked at Facebook on speech recognition. I shipped their first work in speech recognition, actually applied to voice clips in Messenger. That’s no longer a live feature, but it now powers captions on News Feed and video understanding for News Feed, which is super valuable. And actually, ads with video captions turned on have like 11% more watch time than videos that don’t have captions. So it turned out, at Facebook scale, that’s a very valuable problem to have solved. So I worked on Facebook’s first work in speech recognition, and then moved to their first, sort of, large format, called Canvas. I think it’s called Instant Experiences now; it’s still there. And those sort of formative experiences in building lightweight interfaces that worked well, right, where you didn’t need to do something hard at a computer, but you could say something and get captions, or record a video and get captions, or we could rank it better because we knew what was in that video, sort of introduced me to this idea that if you can make a consumer-level usable product useful for a business, that’s pretty powerful. And I think, you know, Canva has done a good job of that, and Robinhood in making stock trading a step change easier and work well on mobile. Those are good examples that we looked at when we were thinking through what the right experience would be to use data on mobile. And the interesting thing is, I really looked around for tools that would do that. So I didn’t start off saying, oh, I need to go start a company. I started off saying, hey, I have this problem that nothing out there solves. Power BI didn’t solve it, ThoughtSpot didn’t solve it, Sigma Computing didn’t solve it, Metabase didn’t solve it, Jupyter notebooks didn’t. And what occurred to me was, it’s actually really hard to build consumer-level, easy-to-use interfaces for data. Historically, there hasn’t been much overlap between people who are really smart and understand statistics and the robustness of optimizing a query so that it runs fast, and people who are thinking about making it lightweight and easy to use and delightful, you know, thinking about Instagram-level beauty or delight. That’s not a huge overlap of people. And most of the BI and data tools industry has been selling to data teams who are very, very, very technical. And so as a function of that, they would add more hardcore technical features. And I would liken it to Adobe Premiere or After Effects, right, the super full-featured video editors, versus TikTok or iMovie. If you can create that really easy-to-use thing that you can use on your phone to edit a video, you’re actually opening it up to way more people. And the existing players can’t do that: you can’t take Final Cut Pro and have that work great on a phone, right? There are just too many features; it’s too much stuff to be lightweight.
So you had to start from, what’s the ideal experience on mobile? And so that’s actually what we did. We built mobile before we ever built support for web. And the reason was, we knew that was gonna be harder to solve: long-running queries, or you lose a connection, you have an intermittent internet connection. So we solved all those pieces as part of the initial version of the app. Because if you don’t solve those things, queries just time out. It doesn’t work. You run a query that takes a while, iOS will kill the background process, and then you’re like, oh, well, I guess I can’t analyze that on mobile. If you think about those things from day one, though, you can build this experience that works well. And really thinking it through; my designer probably is tired of me saying this, but we always ask, how can we make this easier? How can we make this fewer taps? How can we make this fewer steps? And, you know, you probably saw that a ton at Facebook, and we thought about that a lot. How do you build a product where a user doesn’t need to read a manual or read the documentation if they don’t want to? That was very ingrained in building products at places like Facebook or LinkedIn, where I also worked, but is less common amongst folks who are building kind of technical tools for data people. And so we tried to bridge those two things.
Richard Chen (Iterative Ventures)
Yeah, you hit on many very good points. Those are all pain points that I’ve experienced myself, right? So, you know, one of the difficulties, which justified my previous job, was that I could translate what business users wanted into a query, and potentially optimize it and pull the data, etc. And so there’s definitely a learning curve there, which it seems like your product is currently solving. So with that said, how did you land on this AI component? Because, you know, when I looked at your app, that’s one of the things that really jumped out. Obviously, we’ve all dealt with the hype of ChatGPT, generative AI, etc. That’s one thing that can bridge the gap with the previous pain point that we talked about. So what was your journey like in discovering that?
Zack Hendlin (Zing Data)
Yeah, so we actually did not start with AI as the product. And I think the reason is, nobody actually wants to buy AI as a product. They want to buy real-time data analytics, they want to buy alerts when something goes wrong, they want to buy, you know, operational improvements or more revenue. That’s the outcome that they care about. And so we started by saying, well, what’s this awesome mobile experience? And what’s a really lightweight way to ask a question that doesn’t require SQL, that doesn’t require a desktop, that doesn’t require an analyst to do it? And I think that was valuable because it grounded us in what problem we were solving for whom, instead of just being like, hey, we’re going to wrap some stuff around AI. And it also gives you a fallback, because sometimes OpenAI or some of the other large language models don’t give you exactly what you need. They don’t give you exactly the SQL code that you would need to run, that would compile, to answer a question. And so we actually didn’t start with AI as a core component until we saw that it was actually getting good enough for us to pull in. And then we integrated with OpenAI to do the English-to-SQL translation. It’s still in beta, but we do have folks using it: we have a steel manufacturing company using it, and we have a chain of convenience stores where they want to look at what products are running out of stock. And the cool thing is, it allows you to actually dictate a question, like, who are my top five salespeople in January? Or what products are out of stock? And you can do that without having to pre-create a dashboard, without being dependent on anybody on a data team to answer those questions. And so you’re radically expanding the number of people at an organization who can use, on a daily basis, all the work that you put into building the data warehouse. And the fascinating thing that we’ve seen is, literally, a chain of convenience stores, I won’t say their name, but you’ve probably heard of them or bought Slurpees from them; in their case, it’s not just a manager who’s looking at aggregated data, it’s actually people throughout the organization who are now able to get their jobs done better, because they’re able to ask questions using natural language. So what we found when we were incorporating AI with OpenAI is that actually calling their APIs is pretty easy. The hard part is, well, what if someone has a ton of tables?
And that’s actually more than you can send to OpenAI? How do you pre-select which ones are going to be relevant if there are 100 tables? Or if someone has timestamps, as an example, and OpenAI gives you back output that groups by timestamp? Well, you, Richard, will probably catch on to this very quickly, but for timestamps and dates, if they’re down to the millisecond and you group by millisecond-level data, that’s actually not a graph that most business people are gonna want to see. So we automatically handle that: convert it, do the right casting, realize that it’s a time series, realize that it should be graphed with time on the x axis, because that’s actually the output that you want in order to know, hey, inventory of this product dropped to zero, or sales of this thing are at a new high. And so there’s a lot of stuff around the API call that you need to do, both before and after, to actually get a product that ends up being useful. And I think that’s where there’s gonna be a ton of value created. Because there are a number of models out there; obviously OpenAI has one, and there are other ones, and there’s some open source stuff that’s trying to tackle a similar problem. But a lot of it comes down to, how do you make it useful for someone? How do you put it into their workflows? If you think about Canva, right, there are other tools. Or let’s take Figma; I think Figma is a really good example. There are other tools you could use. There used to be a thing that was very popular called Sketch; I used to use it at Facebook pretty broadly. And it was, like, single player. So how do you make design multiplayer? Well, that’s the thing Figma did really, really well, and it made design more useful. And so we have, like, @mentions and shared questions and all that kind of fun stuff. So once you ask a question, like, what products are out of stock, you can tag someone and they can go take action to restock it. And so we think a lot about not just AI in a product, but really the job someone is trying to get done. They’re trying to understand something and take action based on that. AI is a way for them to ask that question efficiently in certain cases, but there are all these other things that they need to do once they have that answer, right? Like assigning it to someone to go take action, reordering a thing, dispatching someone to go do maintenance on this building where there’s a critical number of complaints that there’s a problem with the building. So those are all workflows that are outside of the API call to OpenAI, narrowly defined.
Richard Chen (Iterative Ventures) 20:00
That is super exciting. So it sounds like you are combining, you know, from the top of the funnel, right, thinking of it as a funnel, AI to pull up analytics and tell you, you know, what’s needed, what to do, alerts, etc., and then dragging that down to a task level, right, almost like Jira, saying that, hey, you know, you have to do this XYZ because of these graphs, and so on and so forth, these analytics insights. That’s super exciting. So, you also mentioned that your product is in the beta phase, etc. What are some of the challenges with the current AI? What is it that you’d like to see?
Zack Hendlin (Zing Data) 20:41
Yeah, so I think there are a lot of folks who expect that LLMs and ChatGPT and other stuff are immediately going to render whole categories of jobs irrelevant. I have a much more nuanced view on that, which is,
the models are good for, from what we’ve observed, about 40% of the questions that people ask in Zing. So when people ask natural language questions on their own data, and this is their data in the field, not idealized use cases, the actual questions people ask, they get good results, which we define as them getting a graph or a chart or whatever which they save, something like 40% of the time. The other 60% of the time, AI is not able to figure out which table or which field or the join that should happen, or sometimes OpenAI will actually give you stuff that won’t run and won’t compile and all that. So I think there is enthusiasm that is a little bit ahead of where the technology is today. The things I’d like to see as these LLMs evolve, and as text-to-SQL and other engines that are based on those evolve, are much better semantic understanding and much better social understanding. So I’ll talk about those. First, semantic understanding. It’s sort of like, how do you understand sales and revenue? If a user says, hey, show me revenue by region, but you actually have sales and you have state, well, how do you figure out that sales and revenue are actually very close to each other in a vector space, so you can probably disambiguate in that way, and that state could probably be used as region, at least unless a user says, hey, I want to group it in some other way? And so there’s some stuff there that does it. There’s, like, word2vec, and Google has vector-space representations, and a lot of that underlies some of these LLMs. But I think it can be better, and I think it will be better over time, especially on a company’s individual data. So it’s saying, hey, on the internet as a whole, we know sales and revenue are close concepts, great. But at an individual company, the way you define revenue: is it revenue that is booked? Is it revenue that has actually already been paid to your company? Is it revenue over the next 12 months that’s recorded in that field? And a lot of that nuance isn’t understood by these models. And so if you just blindly use them, you can kind of get results that
don’t give you quite the precise answer. So, better semantic understanding. And then the second piece that I think a lot of these models miss out on, and it’s by design, but it’s the thing we think a lot about, is: what is the context from your coworkers and everybody that you interact with, and your history of questions, that could actually make these things work a lot better? So in Facebook world, you’d look at how many of your friends posted content that other friends liked. And if a lot of your other friends liked that content, it was somewhat likely that you would as well, and so it would show up higher in your News Feed. And I think in the world of querying data and using AI to do it, there’s this element of, well, what questions are your colleagues asking? What questions have they saved that are likely going to be relevant for you? And for the most part, LLMs don’t know about that, because they’re trained on these very large, broad, public-type data sets. And I think the nuance will be fine-tuning those, or putting overlays on those, to personalize them more to an individual’s use case. So if you ask a question like, hey, at Iterative Ventures, what were our best performing investments over the last year, based on their current mark-to-market representation or whatever, valuation, it knows what fields those are for you. Whereas if I say, hey, as an angel investor, what were my best performing investments, maybe there’s actually no answer, because I don’t have that data, because I’m not getting investor updates or whatever it is. And so sometimes the models today will give an answer that is confident but wrong. Representing that uncertainty better, and pulling in supplemental data from your social network or your individual company, is going to make it a lot better. If you think about ranking in News Feed, if I just had a bunch of content, but I didn’t know how popular it was, or I didn’t know if any of your friends liked it, that ranking is going to be much less interesting to you than if I actually know your social graph, even if it’s your work social graph, and use that to rank: hey, the data scientist right next to you is looking at
churn, and here’s how they’ve defined it, and here’s a query that you can build off of instead of needing to start from scratch. And I think those types of, I’ll call them social overlays, will end up being a way that, instead of each person asking a question from scratch, where maybe it resolves in different ways and they get different outputs, you’re actually much better able to standardize on, hey, this is how we define this thing; here’s this metric as it’s commonly used across the organization. And the super exciting thing is, once you pull in that social information, you can start saying, what are my colleagues setting up alerts on? What are the things that they think are important? And you can actually proactively tell users, say: other store managers want to know when inventory is about to run out on fast-selling items in their store; you’re a store manager, maybe you want to know the same thing, click here to turn that on. So I think it’s gonna become much more proactive and much more socially aware, if you will; it will pull in data from your social graph, rather than just being this general answer. Right? So you can think about a question like, is it a good financial decision to buy a house? Well, it probably depends on what your savings are, and probably depends on what region you’re looking to buy a house in, and historically how that’s going, and maybe where interest rates are gonna go in the future, and maybe if you have kids, or expect to have kids and need more space. Whatever it is, that answer is different for different people based on their context. Whereas I think the LLMs today are trained on relatively broad corpuses. And I think a lot of fine-tuning is going to make them much more useful.
Richard Chen (Iterative Ventures)
Yeah. So what you’re saying is basically that the current challenge is that it’s not personalized enough, it’s not fit to the context enough. Just like you said, when I was using ChatGPT, it gave out a very general answer. It seems like now that we’ve moved away from, you know, keyword search, Google, etc., and we’ve added this layer where we can have natural language processing, it still misses these, you know, I guess you’d call it the social graph or knowledge graph, with, let’s say, what’s important to a business. For example, where’s your revenue generation? How do you currently evaluate your business? What’s the driver, etc.? And then fitting into, let’s say, internally, right, the working environment: what are your coworkers thinking about? How can their ideas be potentially helpful to you, etc.? So we need to bring in deeper context to fit better with, you know, our organization, and just society in general, right? I’ll open up for questions now from the audience, if there are any. This is such a fascinating talk. Thank you.
Zack Hendlin (Zing Data)
One other thing I’ll share just briefly, that we’ve seen and that is kind of a really cool outgrowth of this, is that once you ask these questions, with OpenAI or by tapping through them visually, you’re then actually, in a certain sense, creating training data. So if you think about it, zooming out, as this system where you’re asking questions, you’re doing things, the more of that you do, the more signals you can actually generate for what a user might proactively want to know about. And so the long game here is not just that everybody has to go ask questions one-off. If there is something that I’m checking every day, a particular query, or like 10 times a day, and then whenever it goes up, I tag someone and ask them why, a good action item to train the system is, well, how can we automate that, right? How can we learn that you’re frequently checking this thing, and instead actually send that to you proactively? And so I think that’s going to be a really cool direction. I think Glen has a question in the chat. So Glen Hastings just asked: it sounds like for the 60% of failed natural language questions, it’s obvious how they fail. How do you deal with answers that look right but aren’t right, the uncanny valley? So what we’ve done, first off, is we’ve very clearly labeled that it’s beta. And in fact, the majority of queries that people do in Zing are not using natural language. They’re actually this more deterministic kind of tapping and holding through stuff and querying that way. What we do is flag what the actual SQL was, so we show, hey, here’s what was run, as a way for people to sanity check. So that’s one way. We’re working on exposing that in an even easier way, so you don’t have to look at the SQL and you can understand visually exactly what was run.
The other thing though, which kind of gets back to the previous point I had on like learning over time and getting labeled that is like, if you ask a question, and then you slightly change how you ask the question, because it didn’t get the result you want. You slightly change it again. And then you save that kind of final version, you say you you liked that had the right result? That actually right now does it feed back into the model? Over time, though, if you if someone else at your company has a very similar question, that should actually be probably the first draft that’s run. And so it kind of goes back to like this, this macro optimization of how you make the system smarter over time. I don’t think the system today is getting that much smarter over time. I think it will, as as things evolve. But it glad it’s a great question and tricky one. You could have a thing run and use the wrong field. And that could lead you to the wrong result. So the way we handle that is we actually let folks turn on and off different tables, and fields. So you’re actually constraining kind of what you can pick between in a way that’s going to be more relevant. So a quick example here would be maybe I have three different definitions of sales, I have booked sales, I have sales that where people have actually paid me I have sales with tax, sale sales without tax. And when I say show me sales by Representative, which of those do I mean? Well, if I only turn on one of those, or a label those very clearly, or I disambiguate, and say, hey, there are three things that not sales are five things that match sales, which one do you want, that’s how we think we can get from, you know, the State of the Union today to, you know, 90 plus percent. But that’s also going to require better ways to interact with some of these MLMs, or Codex type technologies. There’s a bit of q&a and chat GPD. 
It ends up not being super specific, though, to your own data and your own use cases, nor is it aware of what has been done previously at your organization. So feeding those in as inputs would be a potential way that we can improve that.
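The field-disambiguation idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Zing’s implementation: the `Field` class, its `enabled` flag, the `resolve_field` helper, and the field names are all hypothetical, and matching is a plain substring test.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Field:
    """A hypothetical column that an admin can turn on or off."""
    name: str
    enabled: bool = True


def resolve_field(term: str, fields: list[Field]) -> dict:
    """Match a user's term against enabled fields only.

    Returns a resolved field when exactly one enabled field matches,
    otherwise returns the candidates so the user can be asked to pick.
    """
    term = term.lower()
    matches = [f.name for f in fields if f.enabled and term in f.name.lower()]
    if len(matches) == 1:
        return {"status": "resolved", "field": matches[0]}
    if not matches:
        return {"status": "no_match", "candidates": []}
    return {"status": "ambiguous", "candidates": matches}


# Hypothetical definitions of "sales", all enabled at first.
fields = [
    Field("booked_sales"),
    Field("paid_sales"),
    Field("sales_with_tax"),
    Field("sales_without_tax"),
]
ambiguous = resolve_field("sales", fields)

# Turning off all but one definition removes the ambiguity.
for f in fields:
    f.enabled = f.name == "paid_sales"
resolved = resolve_field("sales", fields)
```

With every field enabled, the term "sales" comes back as ambiguous with a list of candidates to show the user; with only one field enabled, the same term resolves directly.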
Richard Chen (Iterative Ventures)
Yeah, I really resonate with that, Zack, especially when I’m trying to use ChatGPT to write some of my prototypes. Some of the code right now still gives you a lot of wrong answers. And the way I look at it is, this is the early machine. It’s prone to breaking and it’s not perfect, so it still needs a lot of hands-on work. But at the same time, I am abstracted away from having to write the query myself, write the code, etc. So that’s where the benefit is: it provides you this rough draft, right? But over time, like you said, bringing in the element of social context, the broader landscape, the company and organizational-level drivers, etc., bringing in that context is how we can refine those answers to exactly what you need. Right? It sounds like
Zack Hendlin (Zing Data)
100%. And I think the questions that someone saves, right, those are an indicator, that’s a positive label, if you think about labeled training data. We don’t use that today. But one of the things we’re thinking about is how we can inform a model based on the questions you’ve saved before that are reliable, that are good, even if you didn’t ask those questions using natural language. Maybe you asked those questions using SQL, or a data scientist on your team did. Or maybe you asked those questions deterministically by tapping through stuff. And so you have now created a saved question, and that’s labeled training data, which we could use to inform natural language queries in the future and improve the likelihood that those return exactly what you want them to.
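As a rough sketch of how saved questions could act as positive labels, the snippet below looks up the most similar saved question and offers its query as a first draft. Zack says this is not used in the product today, so everything here is an assumption for illustration: the saved questions, the SQL strings, the table names, the `first_draft` helper, and the choice of word-overlap (Jaccard) similarity with a 0.5 threshold.

```python
from __future__ import annotations


def tokens(text: str) -> set[str]:
    """Lowercase, whitespace-separated word set."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two questions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical saved questions: (question text, query that was saved).
saved_questions = [
    ("sales by representative last month",
     "SELECT rep, SUM(paid_sales) FROM orders GROUP BY rep"),
    ("signups by country this week",
     "SELECT country, COUNT(*) FROM signups GROUP BY country"),
]


def first_draft(question: str, saved: list[tuple[str, str]],
                threshold: float = 0.5) -> str | None:
    """Return the saved query closest to the question, or None if
    nothing is similar enough to be a sensible starting point."""
    best_sql, best_score = None, 0.0
    for text, sql in saved:
        score = jaccard(question, text)
        if score > best_score:
            best_sql, best_score = sql, score
    return best_sql if best_score >= threshold else None


draft = first_draft("show sales by representative last month", saved_questions)
no_draft = first_draft("average steps per day", saved_questions)
```

A real system would more likely use embeddings or fine-tuning than word overlap; the point of the sketch is only that a saved question is a reusable, trusted example.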
Zack Hendlin (Zing Data)
I think one of the biggest challenges if you use ChatGPT is that it can be confident and wrong. And representing that uncertainty is actually pretty important. My background was in statistics, and statisticians think a lot about uncertainty and how you represent whether a change in the data is a meaningful change. To do that, we have statistical models and all that sort of stuff to say, is this change statistically significant? And I think in the same way, for LLMs, you probably want some way to say, hey, there’s a level of confidence around this answer that is high or low. And maybe that’s derived from how many places it’s sourced from and the authority of those sources, which would then give you a different level of confidence. So for instance, if you had three revenue fields, and I spit out a result, we could say, hey, this is low confidence, because there are three different fields that could all match, so choose a specific one. And your confidence then goes up, because you’ve de-risked that uncertain part.
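The three-revenue-fields example can be turned into a very simple confidence label. This is only a sketch of the idea, assuming confidence is driven by nothing except how many fields could match; the `field_confidence` function, the field names, and the 1/n scoring rule are hypothetical and not something the speakers describe as implemented.

```python
from __future__ import annotations


def field_confidence(candidates: list[str],
                     chosen: str | None = None) -> tuple[float, str]:
    """Score how sure we are about which field a query should use.

    With no explicit choice, the score is 1 / (number of candidates),
    so three plausible fields give about 0.33 and a "low" label.
    Once the user picks a field, the ambiguity is gone and the score is 1.0.
    """
    if chosen is not None:
        if chosen not in candidates:
            raise ValueError(f"{chosen!r} is not one of the candidate fields")
        return 1.0, "high"
    if not candidates:
        return 0.0, "low"
    score = 1.0 / len(candidates)
    return score, "high" if len(candidates) == 1 else "low"


revenue_fields = ["booked_revenue", "paid_revenue", "revenue_with_tax"]

# The model had to guess among three fields: flag as low confidence.
guess_score, guess_label = field_confidence(revenue_fields)

# The user chose a specific field: confidence goes up.
picked_score, picked_label = field_confidence(revenue_fields, chosen="paid_revenue")
```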
Richard Chen (Iterative Ventures)
That’s a really interesting thought. It sounds like you’re basically giving what the model spits out a grading system, right? So instead of saying that this is either right or wrong, you’re saying that it’s kind of right, but not exactly right for the context. So you give this grading system for the model to retrain on over time, right?
Zack Hendlin (Zing Data)
Yeah, there are actually two examples that come to mind. One is PageRank, at Google, which decided that this page is more relevant because there are a lot of high-authority sites on the internet linking to it, so it’s probably a relevant result, right? In a certain sense, more reliable votes decrease uncertainty, or make you more certain that that’s a good result. Twitter’s machine learning team also has some interesting work out where they will look at a post and then look at, I think, folks with views across the political spectrum. And if they realize that it’s been reported as inaccurate in a way that’s, I think, consistent across political groups, then it’s unlikely to be accurate, because these groups disagree on, let’s say, a lot of things, but they both agree that this post is inaccurate or should be reported. And so there are mechanisms, and some of them are kind of fancy and there’s a lot of math behind them, to say whether this thing is likely reliable or not without necessarily needing to judge and know its truthfulness. Instead you look at the data on the people who engage with it and how they report it, given all the other ways they may differ, and if there’s consistency in that, you can use it as a proxy for how confident you are in something. But it’s actually even easier if I literally have three fields with slightly different names and I’m having to guess at one of them to complete this query. Well, that’s a thing I can know the solution for much more deterministically.
Richard Chen (Iterative Ventures)
That’s super interesting. Basically, what I heard from you is that not only should you be considering the utility of ChatGPT or some of these generative AI models, but you should also be thinking about the secondary effects: how does that relate to the other participants in this network, etc. So that’s super fascinating. Any other questions from the audience?
Zack Hendlin (Zing Data)
And I’ll give a quick plug: if anybody wants to try this stuff out, you can try it for free at ZingData.com. You could do something as lightweight as hook it up to a Google Sheet, so you can try using natural language on your own Google Sheet, all the way up to big databases like Trino, Snowflake, BigQuery, Amazon Redshift, all that kind of fun stuff. But we have a lot of folks who have a Google Sheet with, you know, maybe their financial goal for the next year, their plan, or some people who will hook it up to data from their Apple Watch and say, hey, which day did I get the most steps in? Or they’ll track data from their Nest thermostat: show me the days that I used the most energy over the last month. And those are all lightweight ways that you can play with it without needing to have big data sources set up, or even pull it into your business use case.
Richard Chen (Iterative Ventures)
That’s amazing. I’ve been looking for solutions like this.