10 October 2019

Research pick: "Parallel PageRank algorithm quicker to spot spam"

Nilay Khare and Hema Dubey of the Maulana Azad National Institute of Technology in Bhopal, India, discuss how Google’s “PageRank” system can be used to detect spam web pages, that is, pages created for nefarious purposes that attempt to gain a higher position in the search engine results pages (SERPs) by misrepresenting their value and relevance to the person carrying out a search.

PageRank was developed by Google’s founders Larry Page and Sergey Brin back in 1996 at Stanford University, building on the foundations of other ranking algorithms developed from the 1970s onwards. PageRank works by counting the number and quality of links to a page to produce a rough estimate of how important that page is. The underlying assumption is that more important websites are likely to receive more links from other websites.
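
For readers curious about the mechanics, the sketch below shows the classic PageRank calculation in miniature on a tiny, made-up link graph. It is purely illustrative and is not the authors’ code: a fraction of each page’s rank is repeatedly passed along its outgoing links, so pages that attract many links from well-ranked pages end up with high scores.

import numpy as np

def pagerank(links, damping=0.85, iterations=100):
    """links[i] is the list of pages that page i links out to."""
    n = len(links)
    ranks = np.full(n, 1.0 / n)               # every page starts with equal rank
    for _ in range(iterations):
        new_ranks = np.full(n, (1.0 - damping) / n)
        for page, outlinks in enumerate(links):
            if outlinks:                       # share this page's rank among its out-links
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
            else:                              # dangling page: spread its rank evenly
                new_ranks += damping * ranks[page] / n
        ranks = new_ranks
    return ranks

# Tiny example: page 3 is linked to by every other page, so it ends up ranked highest.
print(pagerank([[3], [0, 3], [0, 3], [1]]))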

Of course, the notion of “quality”, good or bad, is rather nebulous, and in the years since the rise of Google there has been an ongoing struggle between webmasters, who wish their sites to sit high in the SERPs and so be more visible, and Google, which endeavours to preclude spammy tactics that might game its system and allow webmasters of lower-quality sites to achieve unwarranted high status in the rankings.

Khare and Dubey have developed a faster, more efficient parallel PageRank algorithm that harnesses the power of a computer’s graphics processing units (GPUs). Their results show that PageRank, and so the detection of spam pages, can be computed up to 1.7 times faster than with the conventional parallel PageRank algorithm. The team even suggests in its conclusion that the approach is “immune” to spammy websites.
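
To see why graphics hardware helps at all, note that each round of the PageRank calculation boils down to one large matrix-vector multiplication in which every page’s new score can be computed independently, which is exactly the kind of work a GPU’s many cores handle well. The sketch below, again only an illustration and not the authors’ method, shows that structure on the CPU using SciPy; a GPU implementation would run the same product across thousands of threads.

import numpy as np
from scipy import sparse

def pagerank_matvec(adjacency, damping=0.85, iterations=100):
    """adjacency[i, j] = 1 if page j links to page i."""
    n = adjacency.shape[0]
    out_degree = np.asarray(adjacency.sum(axis=0)).ravel()
    out_degree[out_degree == 0] = 1.0          # guard against dividing by zero
    # Scale each column by the linking page's out-degree to build the transition matrix.
    transition = (adjacency @ sparse.diags(1.0 / out_degree)).tocsr()
    ranks = np.full(n, 1.0 / n)
    for _ in range(iterations):
        # This matrix-vector product is the bulk of the work and parallelises page by page.
        ranks = damping * transition.dot(ranks) + (1.0 - damping) / n
    return ranks

# The same four-page graph as above, written as an adjacency matrix.
A = sparse.csr_matrix(np.array([
    [0, 1, 1, 0],   # page 0 is linked to by pages 1 and 2
    [0, 0, 0, 1],   # page 1 is linked to by page 3
    [0, 0, 0, 0],   # page 2 has no incoming links
    [1, 1, 1, 0],   # page 3 is linked to by pages 0, 1 and 2
]))
print(pagerank_matvec(A))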

Khare, N. and Dubey, H. (2019) ‘Fast parallel PageRank technique for detecting spam web pages’, Int. J. Data Mining, Modelling and Management, Vol. 11, No. 4, pp.350–365
