Is AI trustworthy or are we being sold a bill of goods?

KingofHazor
I'd like to know the opinions of you guys who are much more knowledgeable about AI than I am. This post was prompted by seeing this article and by my own frustrations in trying to use ChatGPT, Grok, and Gemini.

The article:

Attorney in TD Jakes Defamation Case Sanctioned $76K For Using 'AI Hallucinations' In Court Documents - Protestia

My own frustrations in trying to use AI are similar. The AIs listed above frequently make stuff up in response to my questions, ignore the parameters I try to set, and seem incapable of doing the very things that they are supposed to do.

As just one example, I've asked them to provide various types of weather data. I assumed that they could identify the sites containing the raw data and then summarize them for me. But they don't. Instead, they provide something vaguely similar to, but different from, what I requested, and then tell me to go to weather.gov for more info.

Another example: I've tried to use them to help me teach my grandnephews and grandnieces US history. At first, I was amazed. They suggested I upload pictures of the textbook pages covered in each lesson, and then they could provide a student worksheet and a lesson plan. At first glance, those documents were impressive. However, a closer look showed that they included material not in the text at all, as well as material that was simply wrong.

I am finding those sites unreliable and not very valuable. It is clear that they don't "think" or "reason" at all, but simply regurgitate text from the internet based on some complex algorithm. They are only as good as their algorithms, and for me those algorithms seem deeply flawed.

What am I missing or doing wrong?
kyledr04
It's definitely got limits and needs a lot of supervision. I use it a lot, but often I find it's best at mundane, repetitive tasks where you mostly already know the expected output.

But it will make stuff up or draw wrong conclusions while sounding authoritative. I can't trust it for our customer support unless our SMEs review the output first.
LOYAL AG
I run client financial statements through ChatGPT and ask for analysis in several key categories. Then I pull out the things I think are most important and talk about those with the client. It has shown itself to be very good at identifying trends and at finding areas of concern based on baseline information I didn't program in. I don't let it write anything the client sees, but it's very adept at helping me identify where to look.
IrishAg
I think it's going to be trustworthy, but I also think there are a lot of caveats (like any new paradigm-shifting technology in its infancy) that people are forgetting about. Here are a few:

First, this is really in its infancy when it comes to standardization. That means there is going to be a lot of variance from vendor to vendor, and model to model, in how a question gets answered. So people just copying and pasting an answer and expecting it to be a source of truth is as crazy as blindly copying and pasting from Wikipedia and expecting it to be a source of truth.

Second, related to the previous point: how recent is the data the LLMs have access to? A lot of LLMs don't have much up-to-date information because of the nature of the internet (I'll get to that in a bit). Instead, when a request goes outside of what they know, they just search the internet rather than relying on what's in their data sets, which is problematic because of the third point.

Third, companies HATE automation that scrapes their sites and pay lots of money to try to stop it. AI is just a bot when it's searching the internet, so either the company hosting it has paid for API access to ingest the information, or the LLM just uses a bot to scrape it off sites. Either way, you get wild variance in the information that comes back: as a user you might expect the AI to be smart enough to go to known sites for the info, but if those sites are blocking or obfuscating data, the AI falls back to the next site, and so on.
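
For a sense of what that gatekeeping looks like, here's a rough sketch of the check a well-behaved bot runs before fetching anything (the site and user-agent string below are made up for illustration):

```python
# Sketch of a polite crawler's first stop: the site's robots.txt rules.
# The URL and user-agent string are made up for illustration.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # download and parse the site's bot policy

if rp.can_fetch("ExampleAIBot", "https://example.com/weather/raw-data"):
    print("allowed to crawl this page")
else:
    print("blocked -> fall back to the next site, and so on")
```

And that's just the polite path; plenty of sites go further with paywalls, rate limits, and bot-detection services.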

I have no idea what kind of back end deals the different companies have, but if they aren't paying for data from sites when they search, then they are pissing off the site owners (who want humans and not bots on their sites). So even if they get data, I'm sure they get a lot of cease and desist notices.


Long story long, TL;DR: accuracy on recent events/data will be an issue for LLMs for a while, because of how they get to that data and the fact that most companies don't want them scraping their sites/apps for it, and will actively try to stop them (if they aren't paying for access).
fig96
It's really good at some things and not as great at other things that people unfortunately keep trying to use it for.

Really great for analyzing a set of data, writing snippets of code, rewriting things you've written, summarizing paragraphs or meetings, formatting text or documents, and prototyping features for apps or websites. But you need to build in checks and balances and often have to keep correcting things it does. I've used it to help me code fairly simple HTML functions, and it would do great on the first part, then completely leave things off or break them as we added on.

It's not great for things like therapy, self-help, queries where you need accurate information, etc., because it tends to go totally off script and sometimes completely makes up sources. Again, built-in checks and verification are needed to make sure what it's telling you is accurate.

I hate it for creating things like art and design because it's sloppy and gives people the idea that the most important thing is the idea and not the work that goes into creating the thing.

As a whole I think it'll end up being less than it's cracked up to be: effective for a lot of things and for helping people be more efficient, but not able to wholesale replace many employees like a lot of folks believe. I do worry that we're raising a generation that isn't going to be able to think critically and creatively for themselves; personally, I'm going to be very careful about how I allow my kids to use it.
Proposition Joe
I think right now one of its biggest drawbacks is that it's been designed to be too confident in its replies.
Feeder Road
LOYAL AG said:

I run client financial statements through ChatGPT and ask for analysis in several key categories. Then I pull out the things I think are most important and talk about those with the client. It has shown itself to be very good at identifying trends and at finding areas of concern based on baseline information I didn't program in. I don't let it write anything the client sees, but it's very adept at helping me identify where to look.

Just curious, do you disclose this to your clients or have it in your engagement letter? (I have a business with a similar use case and am working through with my lawyer what is and is not appropriate.)
stick95
It depends on what you count as AI. ChatGPT/Grok/etc. is decent at simple, structured tasks but can be wildly inaccurate.

Agentic AI coded with the proper boundaries and able to refine itself can be very powerful and accurate. We've decided to really push boundaries as a software development team from an AI standpoint. Our developers now have a team of AI agents... a BA, technical architect, developer, devops, pull request agent, and code reviewer. Each agent role is focused on its sole task and DOCUMENTS exactly what it is doing, which beats the hallucinations and inaccuracy. The BA writes up a feature with full documentation, the architect writes the tech spec with full documentation (recommending subtasks if the issue is too complex), the developer writes the code and unit tests, the pull request agent makes the pull request with full documentation, and the code reviewer reviews the code. Passing documentation along to each of these steps is key.

The agents also update one another. The architect updates the BA if it finds holes in the requirement so that it will include that next time. The code reviewer updates the architect and developer.

We are still feeling things out, but we are shipping features probably 2-3 times faster, and we're also getting far more out of that work in terms of documentation and process refinement.
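
As a very rough sketch of the documentation-passing idea (the roles are ours, but the `run_agent` helper and the prompts below are placeholders, not our actual tooling):

```python
# Minimal sketch of a documentation-passing agent pipeline. run_agent() is a
# placeholder for a real LLM call; the prompts are illustrative only.
from dataclasses import dataclass, field

@dataclass
class WorkItem:
    request: str
    docs: list[str] = field(default_factory=list)  # documentation accumulates here

def run_agent(role: str, instructions: str, item: WorkItem) -> str:
    # Stub: replace with your LLM API of choice. The key point is that every
    # agent sees all documentation produced by the roles before it.
    return f"({role} output for '{item.request}', given {len(item.docs)} prior docs)"

def pipeline(item: WorkItem) -> WorkItem:
    stages = [
        ("BA", "Write the feature requirement with full documentation."),
        ("Architect", "Write the tech spec; recommend subtasks if too complex."),
        ("Developer", "Write the code and unit tests."),
        ("PR agent", "Draft the pull request with full documentation."),
        ("Code reviewer", "Review the code; flag gaps back to earlier roles."),
    ]
    for role, instructions in stages:
        output = run_agent(role, instructions, item)
        item.docs.append(f"[{role}] {output}")  # each stage documents its work
    return item

print(pipeline(WorkItem("Add export-to-CSV button")).docs)
```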
fig96
We're in very early stages of developing our processes to start designing and building agentic features so this is interesting to read.

What type of product are you working on?
LOYAL AG
Feeder Road said:

LOYAL AG said:

I run client financial statements through ChatGPT and ask for analysis in several key categories. Then I pull out the things I think are most important and talk about those with the client. It has shown itself to be very good at identifying trends and at finding areas of concern based on baseline information I didn't program in. I don't let it write anything the client sees, but it's very adept at helping me identify where to look.

Just curious, do you disclose this to your clients or have it in your engagement letter? (I have a business with a similar use case and am working through with my lawyer what is and is not appropriate.)


The couple of clients I've done it with have consented verbally. The rest of my current clients I'm going to approach beforehand to get their consent. Going forward it'll be in the engagement letter. I do have a paid subscription, which is supposed to keep their data from being used to train the public model. I guess worst case you could remove their name and just run the financials through without anything to identify who they belong to.
fig96
Sorry, are you saying you're currently putting their data into ChatGPT that includes PII?

Cause if so, yikes. As a client I'd expect general anonymity even if you're using a paid subscription.
htxag09
I struggle with it because of the dumb mistakes I've seen it make. Things like asking it for the market caps of six companies and getting numbers that are completely wrong. Then I ask a follow-up question and it corrects the numbers.

Another funny example from this week… a coworker is doing a walking challenge with his family. He asked, "Do you get more steps by walking or running for 70 minutes?" The response was something like: the average person gets 120 steps per minute when walking and 180 when running, so you'll get more steps by walking for 60 minutes. lol what?
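
For the record, by the model's own quoted rates running wins easily; the arithmetic it flubbed is one line:

```python
# Sanity check using the rates the model itself quoted.
minutes = 70
print(120 * minutes, "steps walking vs", 180 * minutes, "steps running")
# -> 8400 steps walking vs 12600 steps running
```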
Feeder Road
fig96 said:

Sorry, are you saying you're currently putting their data into ChatGPT that includes PII?

Cause if so, yikes. As a client I'd expect general anonymity even if you're using a paid subscription.

I don't think what he is talking about is PII. But client data always has confidentiality considerations, and if he's a CPA he has professional standards to abide by. Financial professionals have been running this stuff through software for years to automate tasks and analysis, so this isn't anything new in that sense. For example, a lot of agreements ask clients to authorize "use of cloud-based services" for things people find perfectly acceptable and normal: QuickBooks, Microsoft 365, etc., both of which now have "AI" built in. However, if you ask, "Are you OK with me running payroll or financials through AI to perform analysis?" you'll get both ends of the spectrum of responses.
LOYAL AG
fig96 said:

Sorry, are you saying you're currently putting their data into ChatGPT that includes PII?

Cause if so, yikes. As a client I'd expect general anonymity even if you're using a paid subscription.


Yeah, I didn't explain that well at all. I'm not running data with names on it, so no, there isn't any PII in it. Y'all actually made me go back and look at the couple of sets of data I've run, and they are nameless. In both cases I didn't even mention anything industry-specific in the chat. Good conversation though. The clients were interested in the results, so I'll keep doing it with their permission, but I'll be clear it's anonymized.
fig96
Gotcha, that sounds much better. What kind of analysis are you using it for?
fig96
He clarified, so it all makes sense there.

That said, there's a big difference between cloud-based services and storage (which, to be fair, some people were super concerned about, mostly unwarranted) vs. a service whose technology is literally built around learning from your data.

Also worth noting that many companies turn off the AI services provided by MS and others (mine included).
LOYAL AG
fig96 said:

Gotcha, that sounds much better. What kind of analysis are you using it for?


Business trends primarily. Revenue growth, expenses as related to revenue, months of cash on hand. It's good at seeing a range channel and highlighting where a given period is in relation to that channel. So if you show it three years of raw material costs and tell it you manufacture widgets, it can tell you what your range channel looks like and your average, but not how you compare to your industry.
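
If you wanted to eyeball the same thing yourself, it's a few lines of pandas; the numbers and column names here are made up, and this is just the channel math, nothing industry-specific:

```python
# Rough sketch of the "range channel" idea on monthly raw material costs.
# The data and column names are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "month": pd.date_range("2022-01-01", periods=36, freq="MS"),
    "raw_material_cost": [100 + (i % 12) * 5 for i in range(36)],
})

window = 12  # trailing twelve months
df["channel_low"] = df["raw_material_cost"].rolling(window).min()
df["channel_high"] = df["raw_material_cost"].rolling(window).max()
df["average"] = df["raw_material_cost"].rolling(window).mean()

latest = df.iloc[-1]
print(f"Latest cost {latest.raw_material_cost:.0f} sits in channel "
      f"[{latest.channel_low:.0f}, {latest.channel_high:.0f}], "
      f"avg {latest.average:.0f}")
```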

bthotugigem05
I always say that the current generation of LLMs has the confidence of a drunk guy at a bar with a small business idea ("no man, you don't get it... THIS WILL MAKE MILLIONS"). That said, a lot of it comes down to the prompting. If you give it a general prompt you'll probably get an answer back with a lot of BS in it. If you give it a really specific prompt you'll probably be happier with the results. If you ever have a prompt that you find yourself using over and over (every company should be creating its own prompt library at this point), give the prompt to your LLM and ask how it could be improved to use fewer resources and reduce hallucinations. You'll be pleasantly surprised at how well these LLMs can guide you to make your prompts more efficient.

I work in commercial strategy, so a lot of my LLM use is as a strategic partner, helping me come up with future scenarios and things like that. One of the most fun things I like to do for quarterly business reviews (on a company-approved LLM where we can upload sensitive documents) is upload my PowerPoint presentation, tell the LLM to be my boss (I have a really good summary of his personality that I use in the prompt), and have it give me 10-15 questions it would ask based on the content of the presentation. There have been quarters where it forecasted just about every question my boss asked, and even when it doesn't, I've found just about all of the questions are relevant and better prepare me for the reviews.
bthotugigem05
Another thing I do is have some explicit instructions for the model. I tell it to provide links to all sources referenced in its reply and, if the model itself is coming up with something on its own, to put those lines in italics. It's not always perfect but it definitely guides my eyes to the spots I need to double-check first.
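
If it helps anyone, those instructions boil down to something you can keep as a reusable snippet; the wording below is illustrative, not my exact prompt, shown in the OpenAI-style chat format:

```python
# Illustrative reusable instruction block (OpenAI-style chat messages).
# The wording is an approximation, not an exact production prompt.
SOURCE_RULES = (
    "Provide a link to the source for every claim in your reply. "
    "If a line is your own inference rather than from a source, "
    "wrap it in *italics* so it is easy to spot and double-check."
)

def build_messages(question: str) -> list[dict]:
    return [
        {"role": "system", "content": SOURCE_RULES},
        {"role": "user", "content": question},
    ]

print(build_messages("Summarize Q3 demand trends in our category."))
```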
fig96
Very cool, definitely a good tool for data analysis. The products I work on are doing some related things: looking at pricing, finding places where there are unexpected changes in margin, volume, etc., and surfacing supporting data around those numbers.
SteveA
Quote:

I think right now one of its biggest drawbacks is that it's been designed to be too confident in its replies.

It isn't confident. It's simply using vectors to determine what word should come next in a sequence, for a given prompt. It does not think or reason. It's just math.
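
Stripped down to a toy example, that's all "choosing the next word" is; the vocabulary and scores below are invented, not a real model's output:

```python
# Toy next-token step: invented scores, not a real model.
import math, random

logits = {"sunny": 2.1, "rainy": 1.3, "purple": -3.0}  # raw score per word

# Softmax turns the scores into a probability distribution.
z = sum(math.exp(v) for v in logits.values())
probs = {w: math.exp(v) / z for w, v in logits.items()}

next_word = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "->", next_word)
```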
stick95
fig96 said:

We're in very early stages of developing our processes to start designing and building agentic features so this is interesting to read.

What type of product are you working on?

Healthcare, both a web application and mobile. Send me a PM and I'd be happy to connect and exchange ideas.
LOYAL AG
SteveA said:

Quote:

I think right now one of its biggest drawbacks is that it's been designed to be too confident in its replies.

It isn't confident. It's simply using vectors to determine what word should come next in a sequence, for a given prompt. It does not think or reason. It's just math.


I think we all get that in a generic sense, but the AI tools have been caught making **** up. Why is that? Why, when I ask it how to pay a partner in a partnership, does it tell me to put them on W-2 payroll, and then, when I ask if putting a partner on W-2 payroll is legal, tell me it's not? (It isn't, by the way.) Not too long ago an attorney got in hot water when he submitted what turned out to be fictitious cases to support his position in a case he was arguing. Why is that happening? I think that's where the concern comes from. It does some really incredible stuff, but it also does some really boneheaded things in areas where it should know better.

We've had multiple stories of these agents changing code and ignoring shutdown orders. How is that happening? Things like that just make it feel different from anything we've ever seen before.
G Martin 87
I've listened to a few recent interviews with AI scientists, and the consensus seems to be that nobody understands exactly why AI responses include fabricated and even contradictory statements. Obviously GIGO explains some of the examples, but not all of them.
javajaws
stick95 said:

It depends on what you count as AI. ChatGPT/Grok/etc. is decent at simple, structured tasks but can be wildly inaccurate.

Agentic AI coded with the proper boundaries and able to refine itself can be very powerful and accurate. We've decided to really push boundaries as a software development team from an AI standpoint. Our developers now have a team of AI agents... a BA, technical architect, developer, devops, pull request agent, and code reviewer. Each agent role is focused on its sole task and DOCUMENTS exactly what it is doing, which beats the hallucinations and inaccuracy. The BA writes up a feature with full documentation, the architect writes the tech spec with full documentation (recommending subtasks if the issue is too complex), the developer writes the code and unit tests, the pull request agent makes the pull request with full documentation, and the code reviewer reviews the code. Passing documentation along to each of these steps is key.

The agents also update one another. The architect updates the BA if it finds holes in the requirement so that it will include that next time. The code reviewer updates the architect and developer.

We are still feeling things out, but we are shipping features probably 2-3 times faster, and we're also getting far more out of that work in terms of documentation and process refinement.

Until we get real AGI, AI agents are where you'll get your best results, for sure. Generalized AIs like ChatGPT/Grok are just glorified search engines IMO.

Once you have customized agents trained and tuned on data of your choosing, with the ability to proactively take action and work with other agents... well, there's certainly a lot of good that can come from that compared to simply typing prompts into a generic engine.
nai06
I work in publishing so I have a pretty obvious bias against it.

As far as it replacing authors and writers, it does a pretty terrible job. When people use AI to write stories or books, it is very obvious. One of the big problems is that services like ChatGPT end up just telling you what they think you want to hear. There are several examples on Reddit of ChatGPT saying it was making edits to a manuscript, only for the user to find out it had done nothing.


From a moral and ethical standpoint, I have little respect for people that use AI in this manner. Just about every LLM out there is trained on stolen works/books. And those who seek to use it to write new content for them are lazy hacks IMO.
ramblin_ag02
Currently, LLMs seem great at repetitive, tedious tasks with very low stakes. In my world that would mean things like writing sick notes or generic letters about medical conditions for work or housing use. The sort of thing where I have to write several paragraphs to say one simple thing, and if it messes up, no big deal.

AI is derivative by design and therefore won't be good at any task requiring creativity. It might be able to help synthesize huge amounts of data much faster than a human, and then the human can find creative connections that they might not be able to otherwise.

AI is also really bad at high-stakes tasks. If an AI is doing a simple surgery and kills you, then who gets blamed for that? The AI makers, the hospital that let it operate, some random human "supervisor"? As of now AI has no accountability or liability, and that doesn't work in high-stakes situations like medicine, air travel, car travel, meteorology, or law.

Edit: As I get older I'm learning that a large number of people have jobs that mostly consist of tedious tasks with very low stakes. As far as I'm concerned, these are the people that need to be most worried about AI.
wcb
Quote:

AI is derivative by design and therefore won't be good at any task requiring creativity.


Obviously you missed the thread where TexAgs asked AI to write a rap song about a burrito with a lisp...

(Seriously, look it up. Still cracks me up.)
aggiesed8r
It will end us all.
Astroag
KingofHazor said:

I'd like to know the opinions of you guys who are much more knowledgeable about AI than I am. This post was prompted by seeing this article and by my own frustrations in trying to use ChatGPT, Grok, and Gemini.

The article:

Attorney in TD Jakes Defamation Case Sanctioned $76K For Using 'AI Hallucinations' In Court Documents - Protestia

My own frustrations in trying to use AI are similar. The AIs listed above frequently make stuff up in response to my questions, ignore the parameters I try to set, and seem incapable of doing the very things that they are supposed to do.

As just one example, I've asked them to provide various types of weather data. I assumed that they could identify the sites containing the raw data and then summarize them for me. But they don't. Instead, they provide something vaguely similar to, but different from, what I requested, and then tell me to go to weather.gov for more info.

Another example: I've tried to use them to help me teach my grandnephews and grandnieces US history. At first, I was amazed. They suggested I upload pictures of the textbook pages covered in each lesson, and then they could provide a student worksheet and a lesson plan. At first glance, those documents were impressive. However, a closer look showed that they included material not in the text at all, as well as material that was simply wrong.

I am finding those sites unreliable and not very valuable. It is clear that they don't "think" or "reason" at all, but simply regurgitate text from the internet based on some complex algorithm. They are only as good as their algorithms, and for me those algorithms seem deeply flawed.

What am I missing or doing wrong?

If you haven't, you need to build separate GPT prompts/personas for each of these tasks that give them confined boundaries.

For example, in the history-teacher GPT, you may want to include language that says it shouldn't venture outside of the text or knowledge base provided (I think you can also help with this by turning off web search). Additionally, you can require that it provide cites for where it's getting the information, so that you can validate it (or even confirm what it provides via multiple sources).
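
A bare-bones version of that kind of confined persona, using the OpenAI Python client (the model name and prompt wording are placeholders, not a tested setup):

```python
# Sketch of a confined "history teacher" persona. The model name and the
# system prompt wording are placeholders, not a tested configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "You are a US history teaching assistant. Use ONLY the lesson text the "
    "user provides. If something is not in that text, say so instead of "
    "guessing. Cite the page or paragraph for every fact in your output."
)

def make_worksheet(lesson_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user",
             "content": f"Lesson text:\n{lesson_text}\n\nCreate a student worksheet."},
        ],
    )
    return resp.choices[0].message.content
```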

KidDoc
ramblin_ag02 said:

Currently, LLMs seem great at repetitive, tedious tasks with very low stakes. In my world that would mean things like writing sick notes or generic letters about medical conditions for work or housing use. The sort of thing where I have to write several paragraphs to say one simple thing, and if it messes up, no big deal.

AI is derivative by design and therefore won't be good at any task requiring creativity. It might be able to help synthesize huge amounts of data much faster than a human, and then the human can find creative connections that they might not be able to otherwise.

AI is also really bad at high-stakes tasks. If an AI is doing a simple surgery and kills you, then who gets blamed for that? The AI makers, the hospital that let it operate, some random human "supervisor"? As of now AI has no accountability or liability, and that doesn't work in high-stakes situations like medicine, air travel, car travel, meteorology, or law.

Edit: As I get older I'm learning that a large number of people have jobs that mostly consist of tedious tasks with very low stakes. As far as I'm concerned, these are the people that need to be most worried about AI.

I use it a lot to generate differential diagnoses for rarer and more unusual conditions, or to quickly review treatment protocols for things I only see a few times in my career. Recent examples include: treatment of new-onset hyperthyroidism, differential diagnosis of metabolic acidosis in an FTT child, and evaluation for possible PANDAS.

It also helps generate the bulk of letters of recommendation for residency: I can upload CVs and personal statements, and it writes a letter that I just need to touch up and proofread.

JJxvi
LOYAL AG said:

SteveA said:

Quote:

I think right now one of its biggest drawbacks is that it's been designed to be too confident in its replies.

It isn't confident. It's simply using vectors to determine what word should come next in a sequence, for a given prompt. It does not think or reason. It's just math.


I think we all get that in a generic sense, but the AI tools have been caught making **** up. Why is that? Why, when I ask it how to pay a partner in a partnership, does it tell me to put them on W-2 payroll, and then, when I ask if putting a partner on W-2 payroll is legal, tell me it's not? (It isn't, by the way.) Not too long ago an attorney got in hot water when he submitted what turned out to be fictitious cases to support his position in a case he was arguing. Why is that happening? I think that's where the concern comes from. It does some really incredible stuff, but it also does some really boneheaded things in areas where it should know better.

We've had multiple stories of these agents changing code and ignoring shutdown orders. How is that happening? Things like that just make it feel different from anything we've ever seen before.


It's a misunderstanding of what it tries to do. It in no way has a directive or impetus to find and provide the answer to your question (although it may do some research first as part of a separate process). Its whole purpose is to predict what a human would respond to the prompt. If it knows the answer, it's fine, because it responds as a human would (i.e., it has a solid prediction prepared) and gives the answer. If it doesn't know, it doesn't have the reasoning skills to say "I don't know," and will only do so if not knowing is part of its prediction for the prompt. It just strings words together that sound good, like something a human would say. It's very similar to how a con artist might answer questions.

Astroag
JJxvi said:

LOYAL AG said:

SteveA said:

Quote:

I think right now one of its biggest drawbacks is that it's been designed to be too confident in its replies.

It isn't confident. It's simply using vectors to determine what word should come next in a sequence, for a given prompt. It does not think or reason. It's just math.


I think we all get that in a generic sense, but the AI tools have been caught making **** up. Why is that? Why, when I ask it how to pay a partner in a partnership, does it tell me to put them on W-2 payroll, and then, when I ask if putting a partner on W-2 payroll is legal, tell me it's not? (It isn't, by the way.) Not too long ago an attorney got in hot water when he submitted what turned out to be fictitious cases to support his position in a case he was arguing. Why is that happening? I think that's where the concern comes from. It does some really incredible stuff, but it also does some really boneheaded things in areas where it should know better.

We've had multiple stories of these agents changing code and ignoring shutdown orders. How is that happening? Things like that just make it feel different from anything we've ever seen before.


It's a misunderstanding of what it tries to do. It in no way has a directive or impetus to find and provide the answer to your question (although it may do some research first as part of a separate process). Its whole purpose is to predict what a human would respond to the prompt. If it knows the answer, it's fine, because it responds as a human would (i.e., it has a solid prediction prepared) and gives the answer. If it doesn't know, it doesn't have the reasoning skills to say "I don't know," and will only do so if not knowing is part of its prediction for the prompt. It just strings words together that sound good, like something a human would say. It's very similar to how a con artist might answer questions.




This is a wild misunderstanding and mischaracterization of LLMs.
SteveA
Are you using it in a proprietary system, or the general GPT through a browser? Your use case would be good for a RAG system, but I don't know if I would just ask ChatGPT about any medical conditions.
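
For anyone unfamiliar, RAG just means retrieving vetted text first and making the model answer from that; a bare-bones sketch (the `embed` function and the corpus are placeholders, not a production setup):

```python
# Bare-bones retrieval-augmented generation (RAG) sketch. embed() and the
# corpus are placeholders; real systems use proper embedding models.
import math

def embed(text: str) -> list[float]:
    # Crude letter-count vector, just so the example runs end to end.
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / ((na * nb) or 1.0)

corpus = [
    "First-line treatment for condition X is drug A.",
    "Condition Y typically presents with symptoms P and Q.",
]

def build_prompt(question: str) -> str:
    q = embed(question)
    best = max(corpus, key=lambda doc: cosine(q, embed(doc)))
    # The retrieved passage is stuffed into the prompt so the model answers
    # from vetted text instead of from its memory.
    return f"Answer using ONLY this source:\n{best}\n\nQuestion: {question}"

print(build_prompt("What is the treatment for condition X?"))
```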
KidDoc
SteveA said:

Are you using it in a proprietary system, or the general GPT through a browser? Your use case would be good for a RAG system, but I don't know if I would just ask ChatGPT about any medical conditions.

Grok, over the last few months. The corporate overlords are going to restrict AI access in a few weeks and force us to use Insightli, which is based on Gemini. I have not been as impressed with Insightli; I've let them know, and to their credit they have made adjustments.

They all list their sources, so if I'm worried about inaccuracies it's very easy to see where the data is coming from. It just saves me a ton of time over digging through actual articles or uptodate.com.