
Draft

Hi everyone, hope the end of the term has been treating you all well. Just because the term is done, though, doesn't mean there's a pause in news about AI. I want to spend the bulk of this issue talking about that report from the federal Department of Education about AI and education, but first we have some stories from the past few weeks to cover.

For example, there's this fun local story about our own non-emergency response here in Portland: https://www.wweek.com/news/city/2023/06/17/ai-call-taker-will-begin-taking-over-police-non-emergency-phone-lines-next-week/

Apparently we're moving to an automatic transcription -> natural language processing pipeline to route non-emergency calls to the police. Yes, this is about non-emergency calls, so it's not likely that people are going to die because of this, but

umm

we understand exactly who the people this will fail are, right? Automatic speech recognition is much more likely to fail for people who speak English with any kind of accent outside of a very narrow subset, and my gut tells me those are exactly the people who are more likely to need help navigating something like a non-emergency services help line in the first place.

So I'm not thrilled about this, especially since it's apparently had serious problems in development that kept it from being rolled out before now. This is, I think, a bad kind of automation: it's not processing information for a person to review; it's trying to make decisions in our place.
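To make concrete what I mean by "making decisions in our place", here's a deliberately toy Python sketch of what a transcription-to-routing pipeline can look like. To be clear, this is my own hypothetical illustration, not anything from the city's actual system: once the audio has been turned into text, routing often comes down to matching the transcript against patterns, so a transcription error silently becomes a routing error with nobody in the loop to catch it.

#+begin_src python
# Toy illustration only: a keyword-based router sitting downstream of a
# speech-to-text step. Not the city's system; just showing how an ASR
# mistake ("my car was stolen" -> "my cart was swollen") turns into a
# silent misroute with no human in the loop.
ROUTES = {
    "stolen": "property-crime queue",
    "noise": "noise-complaint queue",
    "abandoned": "abandoned-vehicle queue",
}

def route(transcript: str) -> str:
    text = transcript.lower()
    for keyword, queue in ROUTES.items():
        if keyword in text:
            return queue
    return "general voicemail"  # the fallback nobody checks promptly

print(route("my car was stolen last night"))    # property-crime queue
print(route("my cart was swollen last night"))  # general voicemail
#+end_src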

Also, speaking of help lines: no matter what, this one can't be as bad as the eating disorder helpline that fired its staff and tried to replace them with a chatbot! Here's a story about them making the change and then here's a story about how it failed and had to be disabled in just a matter of days. Now, as the first story points out, this wasn't an LLM, it was a more classically designed chatbot, but the lessons here are basically the same: for the love of Minsky, stop trying to automate delicate, contextual situations, and if you're going to do it anyway, have an actual human review the work first.

The fact that this eating disorder chatbot failed isn't the interesting part, to me; it's that so little thought is still going into the testing and deployment of these systems. Were I empress of AI deployment, my first decree would be that you have to test any natural language processing system against the machinations of a troll-brained 12 y/o. My second, unfunded, decree would be that you should really just pay actually good QA people to try and break your chatbots and natural language interfaces: treat it like bug-finding in video games, where you try the most asinine things possible to break the system and produce unexpected behavior.

In actual LLM-related news, though, this was definitely food for thought: https://vulcan.io/blog/ai-hallucinations-package-risk

So as backstory for those who aren't programmers: for most programming languages of any significant popularity there are tools that help you find and install libraries other people have written for the language. By "libraries" we basically just mean software that helps simplify some task so you don't have to build a solution from scratch. For example, to run a web server you're going to want libraries that let you send messages over the internet, that implement the HTTP protocol that the web uses, that keep track of cookies or handle user account creation if needed, &c. If every programmer had to build all of that from scratch every time we'd literally never get anything done. Downloading and installing libraries can be kind of a pain though because libraries depend on other libraries which depend on other libraries which dot dot dot and so we have tools like pip in Python, npm for JavaScript, or cabal for Haskell that in principle should handle all of those dependencies for you. You just need to know the name of the library and if it's been uploaded to the central repository for the language then it will be downloaded as well as everything it needs.
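To make the dependency-chain point concrete, here's a tiny Python sketch that asks Python itself which other libraries a given library pulls in. It assumes the popular requests package happens to be installed (any well-known library would do), and the example output in the comments is approximate.

#+begin_src python
# Minimal illustration of "libraries depend on libraries".
# Assumes `requests` is installed; swap in any package you have.
from importlib.metadata import requires, version

print(version("requests"))        # e.g. "2.31.0"
for dep in requires("requests"):  # what `requests` itself needs
    print(dep)                    # e.g. "charset-normalizer<4,>=2", "urllib3<3,>=1.21.1", ...
#+end_src

When you run pip install requests, pip walks that list, and the lists of those packages, recursively until everything needed is present.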

Okay, so now to explain the attack: the idea is that chatGPT can potentially generate code that calls non-existent libraries in response to queries, much like how it can generate citations to papers that don't exist when there isn't actually a high-confidence answer learned from the corpus. So enterprising attackers could try asking code generation questions, note down the fake libraries recommended, then quickly create malware with the names of these non-existent libraries and upload it to pip or npm or whatever. Then, later, when someone performs a similar query and gets the non-existent library recommended to them, they'll try to install it, and rather than the install failing because the package doesn't exist, it will appear to work, and the attacker wins.

I don't really know how likely this is to work in practice, but it's an attack vector that had never even occurred to me before and that's really interesting. Again, I'm not actually sure what the defense against such an attack is. I mostly expect it might fail because I don't know how often you'll actually get the exact same fake library recommended. But does it need to work all the time? In a world where tons of developers are using the same LLM to generate code snippets, maybe it only needs to work 1/10000 times to be worth it.
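I don't know what the systemic defense looks like, but as an individual developer there's at least a cheap sanity check you can run before installing something an LLM suggested: ask the package index whether the name exists at all and when it first showed up. Here's a rough Python sketch against PyPI's JSON API (the endpoint is real; treat the exact response fields as an assumption worth verifying, and the same idea applies to npm's registry). A package with a generic-sounding name whose first upload was last week is exactly the shape this attack would produce.

#+begin_src python
# Rough sketch of a pre-install sanity check for packages an LLM suggested:
# does the name exist on PyPI at all, and how old is it? Standard library
# only; the response field names are my best understanding of the API.
import json
import sys
import urllib.error
import urllib.request

def check_package(name: str) -> None:
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            print(f"{name}: not on PyPI at all (possibly hallucinated)")
            return
        raise
    # Collect the upload timestamps of every released file.
    uploads = [
        f["upload_time_iso_8601"]
        for files in data["releases"].values()
        for f in files
    ]
    first = min(uploads) if uploads else "no uploaded files"
    print(f"{name}: exists, first upload {first}")

if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        check_package(pkg)
#+end_src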

Finally, in the linkdump portion of this piece: https://lcamtuf.substack.com/p/llms-are-better-than-you-think-at

This is a short but interesting little article about how "reinforcement learning from human feedback", the secret sauce that makes chatGPT, Alpaca, Vicuna, and the like so good at taking in our prompts and almost magically doing the right thing with them, is easier to game than it may seem at first.

Okay, let's start talking about that piece from the Department of Education on AI and education. Now, I swear I'm mostly going to have positive things to say about this report, but I am going to start off with a nitpick based on this paragraph:

AI can be defined as “automation based on associations.” When computers automate reasoning based on associations in data (or associations deduced from expert knowledge), two shifts fundamental to AI occur and shift computing beyond conventional edtech: (1) from capturing data to detecting patterns in data and (2) from providing access to instructional resources to automating decisions about instruction and other educational processes. Detecting patterns and automating decisions are leaps in the level of responsibilities that can be delegated to a computer system. The process of developing an AI system may lead to bias in how patterns are detected and unfairness in how decisions are automated. Thus, educational systems must govern their use of AI systems. This report describes opportunities for using AI to improve education, recognizes challenges that will arise, and develops recommendations to guide further policy development.

When they're talking about detecting patterns and automating reasoning based on associations, they're talking specifically about machine learning, not AI broadly. Admittedly, most AI these days is machine learning, but I think it's still important to note that AI is sort of just the general part of computer science that deals with creating adaptive algorithms for situations where exact solutions aren't known. In other words, it's about creating programs that solve problems, rather than programmers solving a problem and implementing the solution to that instance in code.

So, the reason I like this report is that I think they start off on the right track with stuff like this:

pg. 6

Understanding that AI increases automation and allows machines to do some tasks that only people did in the past leads us to a pair of bold, overarching questions:

  1. What is our collective vision of a desirable and achievable educational system that leverages automation to advance learning while protecting and centering human agency?

  2. How and on what timeline will we be ready with necessary guidelines and guardrails, as well as convincing evidence of positive impacts, so that constituents can ethically and equitably implement this vision widely?

I do think you should probably read this for yourself, but I'll say that if you're already educated on some of the issues of AI and ethics, the meat of the matter starts on page 15 with this part:

AI models allow computational processes to make recommendations or plans and also enable them to support forms of interaction that are more natural, such as speaking to an assistant. AI-enabled educational systems will be desirable in part due to their ability to support more natural interactions during teaching and learning. In classic edtech platforms, the ways in which teachers and students interact with edtech are limited. Teachers and students may choose items from a menu or in a multiple-choice question. They may type short answers. They may drag objects on the screen or use touch gestures. The computer provides outputs to students and teachers through text, graphics, and multimedia. Although these forms of inputs and outputs are versatile, no one would mistake this style of interaction with the way two people interact with one another; it is specific to human-computer interaction. With AI, interactions with computers are likely to become more like human-to-human interactions (see Figure 4). A teacher may speak to an AI assistant, and it may speak back. A student may make a drawing, and the computer may highlight a portion of the drawing. A teacher or student may start to write something, and the computer may finish their sentence—as when today's email programs can complete thoughts faster than we can type them.

Additionally, the possibilities for automated actions that can be executed by AI tools are expanding. Current personalization tools may automatically adjust the sequence, pace, hints, or trajectory through learning experiences. Actions in the future might look like an AI system or tool that helps a student with homework or a teaching assistant that reduces a teacher's workload by recommending lesson plans that fit a teacher's needs and are similar to lesson plans a teacher previously liked. Further, an AI-enabled assistant may appear as an additional “partner” in a small group of students who are working together on a collaborative assignment. An AI-enabled tool may also help teachers with complex classroom routines. For example, a tool may help teachers with orchestrating the movement of students from a full class discussion into small groups and making sure each group has the materials needed to start their work.

Like, here we're not talking about the problems of current AI/ML deployment, which I think we've all gotten really familiar with, but rather starting to focus on what could be and how we get there.

For example, for a while now I've been on team "LLMs are more useful as queries to documents than as Thing Knowers", but even just these paragraphs can help us dig into why a little bit.

Imagine that, say, a year or two from now we've got open source language models that can take in even just an entire chapter or two of a textbook as context. That's probably an overestimate of how far away we are, but let's go with the next couple of years.

So if you can paste in an entire chapter as context for the LLM, what kinds of possibilities open up for interactions? Well, as a student you could ask for a study guide, a set of potential quiz questions, vocabulary to memorize, an outline of the chapter, &c. As an instructor you could ask for potential homework questions, quiz ideas, class activities, &c. As an instructor, the results aren't anything mindblowing but they're enough to get your curriculum design brain churning when you might be drawing a blank. As a student, they're useful for giving you different ways to review or practice the material.
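Just as a sketch of what the plumbing for that could look like, here's roughly how you'd do it with the openai Python library as it stands in mid-2023 (this assumes the openai package is installed, an API key in the OPENAI_API_KEY environment variable, and a chapter.txt file; the prompt wording and model name are placeholders, not recommendations of any particular vendor).

#+begin_src python
# Rough sketch: feed a chapter of text to a chat model and ask for a study
# guide. Assumes `pip install openai` (the mid-2023 ChatCompletion API) and
# an API key in the OPENAI_API_KEY environment variable.
import openai

def study_guide(chapter_text: str) -> str:
    messages = [
        {"role": "system",
         "content": "You are a helpful teaching assistant."},
        {"role": "user",
         "content": ("Here is a textbook chapter:\n\n" + chapter_text +
                     "\n\nUsing only the material above, produce a study "
                     "guide: an outline, key vocabulary to memorize, and "
                     "five practice quiz questions.")},
    ]
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("chapter.txt") as f:
        print(study_guide(f.read()))
#+end_src

Swap the last instruction for something like "write five homework problems with starter code" and you've got the instructor side of the same trick.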

For example, I was looking at the CCOG page for a CS class, specifically our CS 201. I pasted the straight-up text from the page into GPT-4 and then started asking for study plans, outlines, self-quizzes, &c. It worked shockingly well, and all it had was the additional CCOG context on top of the information contained in the corpus, which for low-level programming is fairly standard and well-defined material: the kind of patterns that can be learned from Wikipedia, Stack Overflow, university course sites on programming, and such. I was even able to generate a set of practice programming assignments for each topic, with code skeletons and hints in comments. That would have been so useful when I was first learning programming.