The Elephant In The Alignment Room
When ChatGPT exploded onto the scene on November 30, 2022 (hard to believe that was less than 2 years ago!), the idea of AI “alignment” was suddenly a big deal. Here, “alignment” is shorthand for “alignment with human values.”
I thought: “Wouldn’t it be nice if humans were aligned with human values?”
Even values that seem uncontroversial, like “tell the truth,” are ones that most people agree with but nobody actually lives by. Sorry, Jim-Bob: when your sister said “your new glasses are cool,” what she actually meant was “your new glasses make you look like a dork.”