There’s an interesting new paper (Maslov, arXiv:0901.2640v1) up on the arXiv this month about using centrality metrics (in their case a modified PageRank) to analyze citation graphs in academic publishing. I’ll refrain from summarizing the paper, since a related post on the arXiv physics blog has already done a great job of that. But the upshot is that there’s a lot of value in applying these kinds of metrics to citation networks.
This paper fits closely with work I did in the past looking at citation graphs in patent data (the complete set back to the 1970s). In my case I was trying to assess the importance of inventors within a given field of innovation using a betweenness centrality metric (though PageRank/eigenvector centrality would also have been an appropriate choice). As the Maslov paper illustrated, this approach had a very high degree of success in finding key individuals in given fields. As an example, I ran a test on patents issued for technologies related to video games, and the betweenness centrality metric showed Shigeru Miyamoto, the lead designer at Nintendo, as the top innovator in an inventor-to-inventor citation graph. This result appears to be supported by his biography, which includes such honors as being named the “Walt Disney of electronic gaming” by TIME Magazine.
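To make the metric concrete, here is a minimal sketch of betweenness centrality on a small directed citation graph, using Brandes' algorithm. It is an illustration only, not the analysis code from the patent work: the function name `betweenness`, the toy graph, and the choice to treat edges as unweighted and leave the scores unnormalized are all assumptions.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an unweighted directed graph.

    graph: dict mapping each node to a set of nodes it cites (hypothetical format).
    """
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths to every node.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)   # -1 means "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share
        # of the shortest paths that pass through it.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Toy chain: a cites b, b cites c. Only b sits between two other nodes.
toy = {"a": {"b"}, "b": {"c"}}
scores = betweenness(toy)
```

In the toy chain, `b` is the only node that lies on a shortest path between two others, so it is the only one with a nonzero score. On real data a graph library would be the more practical choice; the point here is just to show what the metric rewards.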
One problem not addressed in the Maslov paper, however, is the translation from papers to people. The Maslov approach ranks only papers, though it draws inferences about the rank of the people who wrote them. I considered this in my work with patents and found it problematic. Rather than looking at centrality across a paper-to-paper citation graph, I decided to first derive a person-to-person graph that summed citation edges between inventors across the complete body of each inventor’s work. I was fortunate that the data I was working with had already attempted to disambiguate the inventors (no small feat!), so it was possible to translate between a paper citation graph and a people citation graph with relative ease. In a sense, the person-to-person citation graph forms a sort of tacit social network extracted from the patent data.
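The translation from a document graph to a people graph can be sketched in a few lines. This is a rough illustration under stated assumptions, not the original pipeline: the function name `person_graph`, the input shapes, the toy data, and the decision to drop self-citations are all hypothetical, and it presumes inventor names have already been disambiguated.

```python
from collections import Counter
from itertools import product

def person_graph(citations, inventors):
    """Collapse document-to-document citations into weighted person-to-person edges.

    citations: iterable of (citing_id, cited_id) pairs
    inventors: dict mapping a document id to its list of inventor ids
    Returns a Counter keyed by (citing_person, cited_person).
    """
    edges = Counter()
    for citing, cited in citations:
        # Every inventor on the citing document cites every inventor on the cited one.
        for src, dst in product(inventors.get(citing, []), inventors.get(cited, [])):
            if src != dst:  # assumption: ignore people citing their own work
                edges[(src, dst)] += 1
    return edges

# Hypothetical toy data: three patents, three inventors.
inventors = {"P1": ["alice"], "P2": ["bob", "carol"], "P3": ["alice", "bob"]}
citations = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]
edges = person_graph(citations, inventors)
```

Here `bob` ends up with a weight-2 edge to `alice` because two of his patents cite her work, which is exactly the kind of accumulation across a body of work that a paper-level ranking misses. How to split credit among co-inventors (full count, as here, or fractional) is a design choice this sketch does not settle.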
I’m very much aware of the challenges in doing this with journals so I don’t fault them for not addressing this in their paper. However, it would be a great follow-on study to explore the difference in rankings using these approaches. I believe that there’s a need to continue thinking about the value of tacit social networks derived from sources like journals and patents, particularly in cases where the data is used to generate sociometric values like impact and importance.