Machine learning analysis of research citations highlights importance of federal funding for basic scientific research

B. Ian Hutchins

Biomedical research aimed at improving human health is particularly reliant on publicly funded basic science, according to a new analysis boosted by artificial intelligence.

“What we found is that even though research funded by the National Institutes of Health makes up 10% of published scientific literature, those published papers account for about 30% of the substantive research — the important contributions supporting even more new scientific findings — cited by further clinical research in the same field,” says B. Ian Hutchins, a professor in the University of Wisconsin–Madison’s Information School, part of the School of Computer, Data & Information Sciences. “That’s a pretty big over-representation.”

Hutchins published the findings recently in the Proceedings of the National Academy of Sciences with co-authors Travis Hoppe, now a data scientist at the Centers for Disease Control and Prevention, and UW–Madison graduate student Salsabil Arabi.

Published research papers typically include lengthy sections citing all the previous work supporting or referenced within the study. “Predicting substantive biomedical citations without full text,” the paper by Hutchins, Hoppe and Arabi that you are reading about right now, cited no fewer than 64 other studies and sources in its “References” section.

Citations represent the transfer of knowledge from one scientist (or group of scientists) to another. Citations are extensively catalogued and tracked to measure the significance of individual studies and of the individuals conducting them, but not all citations included in any given paper make equally important contributions to the research they describe.
