a thought experiment about false positives

The future of GPT-likes, a prediction

** Text
Hi everyone, it's been months since my last newsletter, but there's a lot to cover as we're getting back to talking about what's been happening in the world of large language models (LLMs).

So, obviously, a lot of this is going to be about ChatGPT. That's unavoidable. It's the big flashy product that's growing to tens and tens of millions of users, over a hundred million if their hype is to be believed, and it's the main thing that everyone is talking about. It was supposed to change the world, but has it?

When it comes to education, the fear of ChatGPT has certainly already had a large impact. Now that we're reaching the end of the school year here in the US, we're seeing a lot of news stories crop up about "AI detection tools" being used to flunk swaths of students.

The biggest story was about Texas A&M-Commerce: https://www.nbcdfw.com/news/local/tamu-commerce-instructor-accuses-class-of-using-chatgpt-on-final-assignments/3260731/

But I've also been finding stories like this showing up a lot on Reddit: https://www.reddit.com/r/ChatGPT/comments/13qt26p/my_english_teacher_is_defending_gpt_zero_what_do/ (with regard to the comments, caveat lector)

So the Texas A&M story is just wild, right? The professor, who is apparently a 30 y/o who got his PhD just a couple of years ago, so this isn't a tenured boomer on his way out, is pasting answers into ChatGPT and asking "did you write this?", which is an absolutely nonsensical thing to do. ChatGPT has no record of what it generated in other people's sessions; its yes-or-no answer is just more generated text. As I'm fond of quoting Pauli: it's not even wrong.

But the stories involving tools like GPTZero are more interesting, because such tools are, at least in principle, capable of detecting whether text was generated by gpt3/3.5/4. So what does it mean when almost every student in a class is getting accused of plagiarism? Well, this is an example of what is sometimes called the "false positive paradox", which is itself an example of the "base rate fallacy": https://en.wikipedia.org/wiki/Base_rate_fallacy

The idea is pretty simple. Let's imagine, just to make the numbers look nice, that the detector correctly flags 95% of generated essays, and that when it reads an essay a human actually wrote, it wrongly flags it 5% of the time: a false positive. That sounds pretty good, right? But the problem is that we have to worry about how many cheaters there are in the first place! Imagine a class of two hundred students where fully half of them are cheating. Then you're going to get 100*0.95 = 95 cheaters correctly flagged and 100*0.05 = 5 non-cheaters incorrectly flagged. Well, duh, that's what we expect from the 95% accuracy, right?

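The arithmetic above can be sketched in a few lines (the 200/95%/5% numbers are just the made-up ones from the thought experiment, not measurements of any real detector):

```python
# Half-the-class-cheats scenario: 200 students, 100 cheaters,
# a detector that flags 95% of generated essays and wrongly
# flags 5% of human-written ones.
students = 200
cheaters = 100
sensitivity = 0.95          # fraction of cheaters correctly flagged
false_positive_rate = 0.05  # fraction of non-cheaters wrongly flagged

true_flags = cheaters * sensitivity                        # cheaters caught
false_flags = (students - cheaters) * false_positive_rate  # innocents accused

print(true_flags, false_flags)  # 95.0 5.0
```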
Now consider: what if, out of those two hundred, only 10 are cheaters? Then you're going to get, roughly, all ten of those cheaters identified, but also 190*0.05 = 9.5, so roughly another ten students who aren't cheaters flagged as well. Half of the accused are now innocent.

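Running the same made-up detector over the low-cheating version of the class shows how bad an individual accusation gets:

```python
# Same hypothetical detector, but now only 10 of 200 students cheated.
students = 200
cheaters = 10
sensitivity = 0.95
false_positive_rate = 0.05

true_flags = cheaters * sensitivity                        # 9.5
false_flags = (students - cheaters) * false_positive_rate  # 9.5

# Of everyone flagged, what fraction actually cheated?
precision = true_flags / (true_flags + false_flags)
print(precision)  # 0.5 -- a flagged student is literally a coin flip
```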
So the efficacy of a tool like GPTZero depends almost entirely on how prevalent you think cheating is in the first place. But even in the best case scenario, you will be punishing students who are not cheating!

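To see how strongly the verdict depends on that assumed prevalence, here's a quick sweep over hypothetical cheating rates for the same imaginary 95%/5% detector, printing the chance that a flagged essay was actually generated:

```python
# P(actually cheated | flagged), as a function of the base rate of cheating.
sensitivity = 0.95
false_positive_rate = 0.05

for cheat_rate in [0.5, 0.25, 0.10, 0.05, 0.01]:
    tp = cheat_rate * sensitivity              # true positives per student
    fp = (1 - cheat_rate) * false_positive_rate  # false positives per student
    precision = tp / (tp + fp)
    print(f"{cheat_rate:>4.0%} cheaters -> {precision:.0%} of flags are real")
# At a 1% cheating rate, only about 16% of flagged students actually cheated.
```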
Consider, then,