Natural language processing program evaluates middle school science essays

UNIVERSITY PARK, PA — Students may soon have another teacher in their classroom, but from an unexpected source: artificial intelligence (AI). In two recent papers, computer scientists at Penn State examined the effectiveness of a form of AI known as natural language processing for evaluating and providing feedback on students' science essays. They detailed their findings in the proceedings of the International Society of the Learning Sciences Conference (ISLS) and in the proceedings of the International Conference on Artificial Intelligence in Education (AIED).

Natural language processing is a subfield of computer science in which researchers convert written or spoken words into computable data, according to lead researcher Rebecca Passonneau, professor of computer science and engineering at Penn State.

Led by Passonneau, the researchers on the ISLS paper extended the capabilities of an existing natural language processing tool called PyrEval to evaluate the ideas in students' writing against a predefined, computable rubric. They named the new program PyrEval-CR.

“PyrEval-CR can give middle school students immediate feedback on their science essays, which takes much of the assessment burden off the teacher, so more writing assignments can be incorporated into middle school science curricula,” Passonneau said. At the same time, the software generates a summary report of the topics or ideas in the essays from one or more classes, so teachers can quickly determine whether students have genuinely understood a science lesson.

The origins of PyrEval-CR date back to 2004, when Passonneau and collaborators developed the pyramid method, in which researchers manually annotate source documents to reliably rank written ideas in order of importance. Beginning in 2012, Passonneau and her graduate students worked to automate the pyramid method, leading to the fully automated PyrEval, the precursor to PyrEval-CR.

The researchers tested the functionality and reliability of PyrEval-CR on hundreds of real-world middle school science essays from Wisconsin public schools. Sadhana Puntambekar, professor of educational psychology at the University of Wisconsin-Madison and a collaborator on both papers, recruited the science teachers and developed the science curriculum. Her lab also provided the historical student essay data needed to develop PyrEval-CR before it was deployed in classrooms.

“For PyrEval-CR, we created the same kind of model that PyrEval would have created from a few passages by expert writers, but we extended it to align with any reasonable rubric for a given essay prompt,” Passonneau said. “We ran many experiments to fine-tune the software, then confirmed that its assessments correlate closely with assessments from a manual rubric developed and applied by the Puntambekar lab.”

In the AIED paper, the researchers report the technical details of how PyrEval was adapted to create PyrEval-CR. According to Passonneau, most software is designed as a set of modules, or building blocks, each with a distinct function.

One of PyrEval's modules automatically creates the rubric, called a pyramid, from four to five reference texts written to the same prompt given to the students. In PyrEval-CR, the computable rubric is instead generated semi-automatically, before students even receive an essay prompt.
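To make the pyramid idea concrete, here is a minimal Python sketch of pyramid-style rubric weighting, based on the core principle of the published pyramid method: an idea's weight is the number of reference texts that express it. The function name, idea labels, and example texts are illustrative assumptions, not PyrEval's actual API or data.

```python
from collections import Counter

def build_pyramid(reference_ideas):
    """reference_ideas: one set of idea labels per reference text."""
    counts = Counter()
    for ideas in reference_ideas:
        counts.update(set(ideas))  # each reference text counts an idea at most once
    # Higher tiers of the "pyramid" hold ideas expressed by more reference texts.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Four hypothetical reference texts written to the same essay prompt.
references = [
    {"friction converts motion to heat", "rough surfaces increase friction"},
    {"friction converts motion to heat", "lubricants reduce friction"},
    {"friction converts motion to heat", "rough surfaces increase friction"},
    {"rough surfaces increase friction", "lubricants reduce friction"},
]
for idea, weight in build_pyramid(references):
    print(f"weight {weight}: {idea}")
```

An idea shared by all four reference texts lands at the top of the pyramid, so a student essay expressing it earns more credit than one expressing only the rarer ideas.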

“PyrEval-CR makes things easier for teachers in real classrooms, who use rubrics but usually don't have the resources to create their own rubric and test whether different people applying it would arrive at the same assessment of student work,” Passonneau said.

To evaluate essays, students’ writing must first be broken down into individual sentences and then converted into fixed-length sequences of numbers, known as vectors, according to Passonneau. To capture the meaning of the sentences as they are converted into vectors, the software uses an algorithm called weighted textual matrix factorization. Passonneau said the algorithm captured essential similarities in meaning better than the other methods the team tested.
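The following is a minimal sketch of the segment-then-vectorize step, with stand-ins for the actual components: a regex sentence splitter in place of a full parser, and plain truncated SVD of a term-sentence count matrix in place of weighted textual matrix factorization, which additionally down-weights unobserved words. All names and the example essay are illustrative assumptions.

```python
import re
import numpy as np

def split_sentences(essay):
    # Split after sentence-final punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", essay) if s.strip()]

def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def vectorize(sentences, dim=2):
    vocab = sorted({w for s in sentences for w in tokens(s)})
    # Term-sentence count matrix: rows are words, columns are sentences.
    m = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in tokens(s):
            m[vocab.index(w), j] += 1
    # Truncated SVD gives each sentence a fixed-length latent vector.
    u, sing, vt = np.linalg.svd(m, full_matrices=False)
    return vt[:dim].T  # one dim-length vector per sentence

essay = "Friction slows the cart. Rough surfaces increase friction."
vectors = vectorize(split_sentences(essay))
print(vectors.shape)  # (2, 2): two sentences, two latent dimensions
```

Sentences that share meaning end up with nearby vectors, which is what lets the software compare a student's sentence against a rubric idea numerically.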

The researchers adapted another algorithm, known as a weighted maximum independent set, to ensure that PyrEval-CR chooses the best parse for a given sentence.

“There are many ways to break up a sentence, and each part can be a complex clause or a simple phrase,” Passonneau said. “Humans know whether two sentences mean the same thing by reading them. To simulate this human skill, we convert each of the rubric's ideas into vectors, and create a graph where each node represents a match between a student vector and a rubric vector, so the program can find the optimal interpretation of the student's essay.”
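Below is a minimal sketch of selecting a consistent set of matches with a weighted maximum independent set (WMIS), the general technique the paper names. Nodes are candidate matches between a student sentence and a rubric idea, weighted by similarity; edges connect incompatible candidates, such as two matches that reuse the same student sentence. The brute-force solver and the example data are illustrative assumptions, not PyrEval-CR's implementation.

```python
from itertools import combinations

def wmis(nodes, weights, conflicts):
    """Exact WMIS by enumeration; fine for a handful of candidate matches."""
    best, best_w = (), 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            if any((a, b) in conflicts or (b, a) in conflicts
                   for a, b in combinations(subset, 2)):
                continue  # subset contains two conflicting matches
            w = sum(weights[n] for n in subset)
            if w > best_w:
                best, best_w = subset, w
    return best, best_w

# Hypothetical candidate matches: (student sentence, rubric idea) -> similarity.
weights = {("s1", "ideaA"): 0.9, ("s1", "ideaB"): 0.6, ("s2", "ideaB"): 0.8}
nodes = list(weights)
# Assumed constraints: s1 can match only one idea; ideaB is credited only once.
conflicts = {(("s1", "ideaA"), ("s1", "ideaB")),
             (("s1", "ideaB"), ("s2", "ideaB"))}
print(wmis(nodes, weights, conflicts))  # picks (s1, ideaA) and (s2, ideaB), weight ~1.7
```

Even though the (s1, ideaB) match is plausible on its own, the solver skips it because crediting s1 to ideaA and s2 to ideaB yields a higher total weight, mirroring how the program settles on one optimal interpretation of the essay.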

Ultimately, the researchers hope to deploy the assessment software in classrooms to make assigning and evaluating science essays more practical for educators.

“With this research, we hope to support students’ learning in science classes, giving them enough support and feedback and then stepping back so they can learn and achieve on their own,” Passonneau said. “The goal is to make it easy for science, technology, engineering and mathematics (STEM) teachers to incorporate writing assignments into their curricula.”

In addition to Passonneau and Puntambekar, the other contributors to the ISLS paper are Purushartha Singh and ChanMin Kim of Penn State, and Dana Gnesdilow, Samantha Baker, Choiseong Kang and William Goss of the University of Wisconsin-Madison. In addition to Passonneau and Puntambekar, the contributors to the AIED paper are Muhammad Wasih of the Penn State School of Electrical Engineering and Computer Science, along with Singh, Kim and Kang.

The National Science Foundation supported this work.
