The Allen Institute for Artificial Intelligence announced that its system, Aristo, is now successfully tackling multiple-choice science questions at the eighth-grade level.
By the standards of earlier AI work, this is a remarkable milestone, and one that 170 teams failed to reach when founder Paul Allen launched the Allen AI Science Challenge in 2016 for an $80,000 prize.
The Allen Institute is headed by UW professor Oren Etzioni.
Aristo can now correctly answer 90% of the questions on the eighth-grade New York Regents Science Exam. This is a significant increase from 2016, when no AI system was able to score over 60%.
These findings are significant because science exams explore important aspects of machine intelligence such as language processing and common-sense reasoning. Aristo’s improvement in performance comes from new technology taking the machine learning world by storm as well as from strategic changes by the team designing the system.
Peter Clark, the project lead for Aristo, also worked for 10 years on Aristo’s predecessor. According to Clark, several things have changed this time around. In the previous project, scientific knowledge had to be encoded manually, whereas Aristo retrieves information automatically.
Additionally, Aristo aims to do most of its work through natural language processing, which is a new approach for the team. Aristo also started with elementary-level science instead of beginning with college-level science.
The most significant contributor has been recent progress in natural language processing.
“The big boost in scores this year is largely credited to new tech from natural language processing called language models,” Clark said. “Language models try and predict what the next word or sentence you’re trying to say is from what you’ve said before. It turns out you can adapt these to predict science answers as well.”
This new technology is part of a growing trend toward deep learning, which is expanding what machine learning can do. Deep learning is a machine learning method that uses neural networks, which are loosely modeled on the way the brain works.
The essence of this technology is that these networks can be trained on a set of inputs and learn from examples of appropriate outputs. Because of this, researchers are better able to tackle the complex reasoning problems Aristo must solve in order to answer multiple-choice science questions.
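For readers who want to see the idea in miniature, the sketch below shows one way a next-word predictor could be adapted to pick a multiple-choice answer. It is not Aristo’s method: it is a toy word-pair (bigram) model, and the training sentences, the question template and the function names are all made up for illustration. The model learns which words tend to follow which in a handful of sentences, then chooses the answer option that makes the finished sentence most probable.

```python
import math
from collections import Counter

# Hypothetical toy training text; real language models learn from vastly more.
CORPUS = [
    "roller skating on blacktop is smooth and easy",
    "kids go roller skating on blacktop",
    "walking on gravel is noisy",
    "loose gravel feels rough",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(text, unigrams, bigrams):
    """Log-probability of text under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + text.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen word pairs from scoring zero.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def pick_answer(template, options, unigrams, bigrams):
    """Fill each option into the template and return the most probable one."""
    return max(options, key=lambda opt: log_prob(template.format(opt), unigrams, bigrams))

unigrams, bigrams = train_bigrams(CORPUS)
template = "roller skating on {} is easy"  # hypothetical phrasing of the question
answer = pick_answer(template, ["gravel", "blacktop"], unigrams, bigrams)
print(answer)
```

Because this toy only looks at neighboring words, it picks blacktop simply because "on blacktop" appears more often than "on gravel" in its training text. The neural language models Clark describes capture far richer patterns, but the basic move of scoring each candidate answer and keeping the most likely one is the same in spirit.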
“The hardest thing is that many of the science questions you can’t just look up the answers on the web, you need to pull two or three bits of information together,” Clark said.
Some of these complexities are things a person might take for granted, especially when it comes to common-sense knowledge. Clark gave an example of a simple question: Which surface is more suitable for roller skating, gravel or blacktop?
To address this question, Aristo needs background about what roller skating is, what surfaces are appropriate for roller skating, and what distinguishes surfaces like gravel and blacktop. While a person could understand this question easily, it’s more challenging to develop the required reasoning in a machine.
“I’ve been working in AI for 35 years and I’ve seen more change in the last five years than I have in my whole career,” Clark said. “The thing that stands out for me is I feel Aristo’s success highlights the rapid changes the field has made.”
If you’re curious about Aristo, you can test it with elementary and middle school science questions through a live demo on the Allen Institute website.
Moving forward, the team has big ideas for expanding Aristo’s capabilities. Clark believes the next step for Aristo is to provide complete justifications for its answers so it can explain the science concepts to a student through a dialogue. Eventually, Aristo may also be able to move beyond only multiple-choice questions and simply answer direct questions about science.
“Success is not the end of the road,” Clark said. “Answering multiple-choice questions is only the beginning of what you can do with intelligent machines.”
Reach reporter Rhea John at science@dailyuw.com. Twitter: @rheamjo
