newslettering

clarissa 2023-05-25 17:47:20 -07:00
parent 596bd869d2
commit 19ec01baa3
1 changed file with 25 additions and 1 deletion

@ -31,4 +31,28 @@
+ a thought experiment about false positives
+ The future of GPT-likes, a prediction
** Text
Hi everyone, it's been months since my last newsletter but there's a lot to cover as we're getting back to talking about what's been happening in the world of large language models (LLMs).
So, obviously, a lot of this is going to be about chatGPT. That's unavoidable. It's the big flashy product that's growing to tens and tens of millions of users, over a hundred million if their hype is to be believed, and it's the main thing that everyone is talking about. It was supposed to change the world: but has it?
When it comes to education, certainly the fear of chatGPT has already had a large impact. Now that we're reaching the end of the school year here in the US we're seeing a lot of news stories crop up about "AI detection tools" getting used to flunk swaths of students.
The biggest story was about Texas A&M-Commerce: https://www.nbcdfw.com/news/local/tamu-commerce-instructor-accuses-class-of-using-chatgpt-on-final-assignments/3260731/
But I've also been finding stories like this showing up on reddit a lot: https://www.reddit.com/r/ChatGPT/comments/13qt26p/my_english_teacher_is_defending_gpt_zero_what_do/ (with regard to comments, caveat lector)
So the Texas A&M story is just wild, right? The professor, who apparently is a 30 y/o who just got his PhD a couple of years ago so this isn't a tenured boomer on his way out, is pasting answers into chatGPT and asking "did you write this?", which is an absolutely nonsensical thing to do: chatGPT has no memory of what it has generated and no ability to recognize its own output. As I'm fond of quoting Pauli: it's not even wrong.
But the stories involving tools like gptzero are more interesting because they are, at least in principle, capable of detecting whether text is generated by gpt3/3.5/4. So what does it mean when almost every student in a class is getting accused of plagiarism? Well this is an example of what sometimes is called the "false positive paradox" which itself is an example of "base rate fallacy": https://en.wikipedia.org/wiki/Base_rate_fallacy
The idea is pretty simple: let's imagine, just to make the numbers look nice, that when zerogpt flags an essay as generated text it's right 95% of the time and the other 5% of the time it's a false positive. Now that sounds pretty good, right? But the problem is that we have to worry about how many cheaters there are in the first place! Imagine a class of two hundred students where fully half of them are cheating. Then you're going to get 100*0.95 = 95 cheaters correctly flagged and 100*0.05 = 5 non-cheaters incorrectly flagged. Well, duh, that's what we expect from the 95% accuracy, right?
Now consider: what if out of that two hundred only 10 of them are cheaters? Then you're going to get, roughly, all ten of those cheaters identified, but also 190*0.05 = 9.5, so (rounding up) another ten students who aren't cheating flagged as well. Half the students being accused are innocent!
So the efficacy of a tool like zerogpt is almost entirely dependent on whether you think that cheating is prevalent---but even in the best case scenario you will be punishing students who are not cheating!
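The arithmetic above is easy to sketch out yourself. Here's a small Python toy model of it: the 95%/5% rates and the class sizes are the newsletter's illustrative numbers, not measurements of any real detector, and the function name is just made up for this example.

```python
# Toy model of the false positive paradox for an "AI detector".
# tpr: chance a generated essay is correctly flagged (assumed 0.95)
# fpr: chance a human-written essay is wrongly flagged (assumed 0.05)

def flagged_counts(class_size, cheaters, tpr=0.95, fpr=0.05):
    """Return (correctly flagged cheaters, wrongly flagged non-cheaters)."""
    true_flags = cheaters * tpr
    false_flags = (class_size - cheaters) * fpr
    return true_flags, false_flags

for cheaters in (100, 10):
    tf, ff = flagged_counts(200, cheaters)
    # precision: the chance that a flagged student actually cheated
    precision = tf / (tf + ff)
    print(f"{cheaters} cheaters out of 200: "
          f"{tf:.1f} true flags, {ff:.1f} false flags, "
          f"precision {precision:.0%}")
```

With 100 cheaters the flags are mostly right, but with only 10 cheaters the precision drops to a coin flip: half of the accused students did nothing wrong, exactly the base-rate effect described above.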
Consider, then,