Online cache of web site

Undervalued · 29 January 2023 at 10:38AM

I wonder if anybody can help, please.

I am trying to find out when a particular page on a publicly accessible website changed (in the last couple of months). The site has been cached on the Wayback Machine around twenty times in the last seven years, at seemingly random intervals. The most recent capture is from late November last year, and the ones before that are from two consecutive days in June. The November cache is useful up to a point, in that it is very different to the information that appears now. However, I really need to know more accurately when the page changed, which may of course have happened in stages on various days between then and now.

Can anybody suggest anywhere else I can try? Also, it would be interesting to know how these cache dates are picked. Are they triggered by a change?

Thanks

MattMattMattUK · 29 January 2023 at 10:50AM

For Wayback, the dates are picked somewhat randomly. The only way to get the data as accurately as you wish would be from the change logs on the server itself, or from Google's servers if it was data that they retained, but they do not retain old copies of most sites.

Neil_Jones · 29 January 2023 at 7:32PM

The likes of the Wayback Machine respect (where available) what's called the robots meta tag.

See https://yoast.com/robots-meta-tags/ for more information.

What this basically means is that, back in the day, if the website creator said "don't come back until so many days have passed", then the web spiders wouldn't. These days that's largely ignored anyway, but it's still relevant for older archive copies if the tag existed. Otherwise the spider will just visit the site in the next major crawl.

There is a pattern to how the Wayback Machine works; how often a website gets crawled isn't totally random. There are so-called "Worldwide Web Crawls" that happen on occasion (they may be broken down into other crawls that happen at the same time), and these determine how often a website gets archived. It can look as if it's random, but which dates you end up with is more by luck than anything else.
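If it helps to see every capture in one go rather than clicking through the calendar, the Wayback Machine also exposes its capture index through a CDX query interface. The following is only a rough Python sketch, assuming that interface behaves as I understand it (endpoint web.archive.org/cdx/search/cdx, with the parameters url, from, to, output and fl); the function names and the example.com address are made up for illustration. It lists captures in a date range and flags those whose content digest differs from the previous capture.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine CDX index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, start, end):
    """Build a query for captures of page_url between two dates (YYYYMMDD)."""
    params = {
        "url": page_url,
        "from": start,
        "to": end,
        "output": "json",
        "fl": "timestamp,digest",  # only the two fields used below
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def content_changes(rows):
    """Return timestamps of captures whose digest differs from the one before.

    rows is the decoded JSON response: a header row followed by
    [timestamp, digest] rows in date order.
    """
    changes = []
    previous = None
    for timestamp, digest in rows[1:]:  # skip the header row
        if previous is not None and digest != previous:
            changes.append(timestamp)
        previous = digest
    return changes


def fetch_captures(page_url, start, end):
    """Download and decode the capture list (needs network access)."""
    with urlopen(build_cdx_url(page_url, start, end), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


if __name__ == "__main__":
    # Placeholder address; substitute the page you are interested in.
    rows = fetch_captures("example.com/some-page", "20220601", "20230129")
    print(content_changes(rows))
```

This can only narrow a change down to the gap between two existing captures, so it will not give anything finer than the November capture already does if no later ones exist. A differing digest also only means the stored page differed in some way, which on pages with rotating adverts or timestamps may happen at every capture.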
