Block-level shingle analysis for determining duplicate content



Please enter up to 10 URLs (on separate lines):

Choose your desired shingle size (number of words):

This tool spiders each page, extracts the indexable text block by block, creates 'shingles' (groups of consecutive words), and compares them with the shingles from the other URLs you have specified. The results can help determine how similar different pages are, or how unique the content on a page actually is.
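As a rough illustration of the idea, here is a minimal Python sketch of block-level shingling. It is not this tool's actual code: the function names, the word tokenizer, and the use of Jaccard similarity as the comparison measure are all assumptions made for the example, and the input is taken to be text blocks that have already been extracted from each page.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles found in one block of text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return set()  # block too short to form a single shingle
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def page_shingles(blocks, k=3):
    """Shingle each block separately so no shingle spans two blocks."""
    result = set()
    for block in blocks:
        result |= shingles(block, k)
    return result

def jaccard(a, b):
    """Assumed similarity measure: shared shingles / all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical pages, each given as a list of extracted text blocks.
page_a = ["the quick brown fox jumps", "over the lazy dog"]
page_b = ["the quick brown fox sleeps", "over the lazy dog"]

similarity = jaccard(page_shingles(page_a), page_shingles(page_b))
print(f"similarity: {similarity:.2f}")  # 4 shared of 6 distinct shingles
```

A larger shingle size makes matches stricter, since longer runs of identical words are needed before two pages share a shingle.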

It can make sense to run several URLs from the same site through this tool to determine how much content is duplicated within the site. It can also make sense to run several related URLs through this tool to determine how closely related their indexable content is.
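Comparing several URLs at once amounts to scoring every pair of pages. The sketch below shows one way that could look; the URLs and page text are made up, the helper names are hypothetical, and Jaccard similarity is again an assumed measure rather than the one this tool necessarily uses. For brevity each page is treated as a single block of text.

```python
import re
from itertools import combinations

def shingle_set(text, k=3):
    """Set of k-word shingles (as tuples) for one page's text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def pairwise_similarity(pages, k=3):
    """Map each (url_a, url_b) pair to an assumed Jaccard score."""
    sets = {url: shingle_set(text, k) for url, text in pages.items()}
    scores = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        scores[(a, b)] = len(sets[a] & sets[b]) / len(union) if union else 0.0
    return scores

# Hypothetical pages from one site, keyed by URL.
pages = {
    "example.com/a": "widgets for sale in blue and red",
    "example.com/b": "widgets for sale in green and red",
    "example.com/c": "contact our support team today",
}

scores = pairwise_similarity(pages)
for (a, b), score in scores.items():
    print(f"{a} vs {b}: {score:.2f}")
```

Pairs with a high score are candidates for in-site duplication, while a score near zero suggests the pages share little indexable text.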

The "k-shingle" algorithm is sometimes referenced in patents issued to search engines.
