According to CMU professor Chris Goranson, faculty lead for the project, the tool was developed by students in his Policy Innovation Lab class, a seven-week course where students are encouraged to come up with tech solutions to challenges encountered in government or government-related jobs. He said the tool, dubbed “GovScan,” was developed by Davis Craig and Tyler Faris, students in CMU’s Master of Science in Public Policy and Management program, along with Aakash Dolas and Eashwari Samant, who are in the Master of Integrated Innovation for Products and Services program.
Goranson said the tool can help public servants comb through PDF reports for the information they need for things like policy and funding proposals. Noting that policy analysts can typically spend three to four hours looking for data points within relevant reports, he said the platform uses a large language model to find that information in a matter of seconds.
“The basic idea was just to improve the usability of government reports,” he said. “That was the idea, was [seeing if] students could come up with a creative solution that would allow people to more efficiently search lots of information spread across lots and lots of different documents.”
Craig said the LLM tool combines a keyword search with a “semantic search.” The semantic search uses a technique called “vector embeddings” to determine the meaning of a question, then scans indexed reports to find the relevant ones, as well as the data points within those reports that are most applicable to a given project. The tool is then able to summarize results and provide citations for the information. Craig likened the tool to “the ‘Control-F’ search function on steroids.”
“We’ve been working on getting it more and more precise so we can be able to find more and more specific pieces of information,” he said. “That’s something that we’re continually working on.”
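For readers curious how a keyword search and a semantic search can be combined, the following is a minimal sketch in Python. It is not GovScan's code. The concept table stands in for a learned embedding model, and the file names, page numbers, passages and the `alpha` weighting are all hypothetical, invented only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical concept table standing in for a learned embedding model:
# words with related meanings are mapped to the same vector dimension.
CONCEPTS = {
    "funding": "money", "subsidy": "money", "subsidies": "money",
    "childcare": "child_care", "daycare": "child_care",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def embed(text):
    """Toy embedding: word counts per concept dimension."""
    return Counter(CONCEPTS.get(tok, tok) for tok in tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def keyword_score(query, passage):
    """Fraction of distinct query words that appear verbatim in the passage."""
    q = set(tokenize(query))
    return len(q & set(tokenize(passage))) / len(q) if q else 0.0

def search(query, passages, alpha=0.5, top_k=3):
    """Rank passages by a weighted mix of keyword and embedding similarity."""
    q_vec = embed(query)
    scored = []
    for source, page, text in passages:
        score = (alpha * keyword_score(query, text)
                 + (1 - alpha) * cosine(q_vec, embed(text)))
        scored.append((score, source, page, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

# Invented sample passages: (source file, page number, text).
PASSAGES = [
    ("state_a_plan.pdf", 12, "Daycare subsidies are paid monthly to eligible families."),
    ("state_b_plan.pdf", 47, "Road maintenance schedules are reviewed each spring."),
    ("state_c_plan.pdf", 3, "Childcare funding is distributed through county offices."),
]

results = search("childcare funding", PASSAGES)
for score, source, page, text in results:
    # Each hit carries its source and page so it can be cited.
    print(f"{score:.2f}  {source} p.{page}: {text}")
```

In this sketch, the passage about daycare subsidies shares no exact words with the query, so a keyword search alone would miss it. The embedding comparison still ranks it above the unrelated passage. A production system would presumably replace the concept table with embeddings produced by a trained model.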
According to a news release emailed to Government Technology, the idea came after students took note of the challenges faced by analysts and researchers like Maya Mechenbier, a project lead for the U.S. Digital Service. In one scenario, students connected with government workers who’d been tasked with reviewing reports on child-care funding in 50 states, with each report containing hundreds of pages. In cases such as these, the release said, policy analysts need to find particular data points within reports to analyze and compare the effectiveness of programs, a task made difficult by the magnitude and variation of the information.
“Whether it’s for Medicaid or the Child Care Development Fund (CCDF) subsidy dollars, states’ plans are typically stored and made public in a PDF form,” Mechenbier said. “Fifty states might do 50 different things with their programs.”
According to the news release, the team has made its work available via a GitHub repository under an open-source license as it works to develop the platform further. The team noted that the application still needs additional testing but said it hopes the tool will help research officers and policy analysts do their jobs more effectively.
“It seems simple on the surface, but I think it could be a real game changer in terms of actually allowing people to be much more efficient at researching data,” Goranson said. “The way that usually happens is we just spend hours and hours and hours parsing through it, or maybe we try that with some [other] existing tool, but I think what the students created was something that really leveraged cutting-edge technologies to approach the problem slightly differently.”