
Online cache of web site

Undervalued
edited 29 January 2023 at 10:39AM in Techie Stuff
I wonder if anybody can help, please.

I am trying to find out when a particular page on a publicly accessible website changed (within the last couple of months). The Wayback Machine has captured the site around twenty times over the last seven years, at seemingly random intervals. The most recent capture is from late November last year; before that there are two on consecutive days in June. The November capture is useful up to a point, because its content is very different from what appears now. However, I really need to know more precisely when it changed, which may of course have happened in stages on various days between then and now.

Can anybody suggest anywhere else I can try? Also, it would be interesting to know how these capture dates are picked. Are they triggered by a change?

Thanks
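One place to look beyond the calendar view is the Wayback Machine's CDX search API (`https://web.archive.org/cdx/search/cdx`), which lists every capture of a URL together with a digest of the captured content. The sketch below is a minimal illustration only, assuming Python 3.9+ and the CDX server's `output=json` format (the first row holds the field names); the function names are hypothetical. Where the digest differs between two consecutive captures, the page changed somewhere in that window. It cannot narrow the window beyond the capture dates, and a digest can also change because of trivial dynamic content such as a date stamp.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_captures(page_url: str) -> list[dict[str, str]]:
    """Ask the CDX API for every capture of one URL (hypothetical helper).

    Returns a list of dicts with 'timestamp', 'digest' and 'statuscode' keys.
    """
    query = urllib.parse.urlencode({
        "url": page_url,
        "output": "json",
        "fl": "timestamp,digest,statuscode",
    })
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *data = rows  # first JSON row names the fields requested in 'fl'
    return [dict(zip(header, row)) for row in data]


def change_windows(captures: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (before, after) timestamp pairs where the content digest differs
    between consecutive captures, i.e. the page changed somewhere in between."""
    ordered = sorted(captures, key=lambda c: c["timestamp"])
    windows = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev["digest"] != curr["digest"]:
            windows.append((prev["timestamp"], curr["timestamp"]))
    return windows
```

For example, `change_windows(fetch_captures("example.com/some-page"))` (a placeholder URL) would return pairs of capture timestamps, in `YYYYMMDDhhmmss` form, that bracket each detected change.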

Comments

  • For the Wayback Machine, capture dates are somewhat random. The only way to pin down the change as precisely as you want would be from the change logs on the web server itself, or from Google's servers if Google had retained the data, but Google does not keep old copies of most sites.
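Short of the server's own logs, one thing a visitor can check from outside is the HTTP `Last-Modified` response header, which some servers send. This is a minimal sketch using only the Python standard library; the function names are hypothetical. Treat the result with caution: many dynamically generated pages omit the header or set it to the time of the request, so it is a hint rather than a reliable change date.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_last_modified(header_value: Optional[str]) -> Optional[datetime]:
    """Turn a Last-Modified header value into a datetime, or None if absent/invalid."""
    if not header_value:
        return None
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None


def fetch_last_modified(page_url: str) -> Optional[datetime]:
    """Send a HEAD request and report the server's claimed modification time."""
    req = urllib.request.Request(page_url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_last_modified(resp.headers.get("Last-Modified"))
```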
  • Neil_Jones
    Services like the Wayback Machine respect (where present) what's called the robots meta tag.

    What this basically means is that, back in the day, if the website's creator said "don't come back for so many days", the web spiders wouldn't (these days that is largely ignored anyway, but it is still relevant to older archive copies where the tag existed). Otherwise the site just gets visited in the next major crawl.
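To see whether a particular page carries such tags, you can read them out of its HTML. This is a minimal sketch using Python's standard `html.parser`; it assumes the "don't come back after so many days" tag described above is the `revisit-after` meta tag, and it also picks up the `robots` meta tag (which can carry directives such as `noarchive`). The class and function names are hypothetical. Finding these tags only tells you what the site asked for, not what any particular crawler actually did.

```python
from html.parser import HTMLParser


class CrawlerMetaParser(HTMLParser):
    """Collect <meta name="robots"> and <meta name="revisit-after"> values."""

    WATCHED = {"robots", "revisit-after"}

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        # Self-closing <meta ... /> tags are routed here too by default.
        if tag != "meta":
            return
        attr_map = {k: (v or "") for k, v in attrs}
        name = attr_map.get("name", "").lower()
        if name in self.WATCHED:
            self.found[name] = attr_map.get("content", "").strip()


def crawler_meta(html_text: str) -> dict[str, str]:
    """Return the crawler-related meta tags found in an HTML document."""
    parser = CrawlerMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.found
```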

    There is a pattern to how the Wayback Machine works; how often a website gets crawled isn't totally random. There are so-called "Worldwide Web Crawls" that happen from time to time (these may be broken down into smaller crawls running at the same time), and these determine how often a website gets archived. So although the capture dates can look random, whether a given page was captured on a given day is more down to luck than anything else.
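If you have a list of capture timestamps (for example from the Wayback Machine's CDX search API), one quick way to see whether captures cluster around particular crawls is to look at the gaps between them and the number of captures per month. A minimal sketch, assuming timestamps in the Wayback Machine's 14-digit `YYYYMMDDhhmmss` form; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback timestamp such as '20221125120000'."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def capture_gaps(timestamps: list[str]) -> list[int]:
    """Whole days between each pair of consecutive captures, oldest first."""
    times = sorted(parse_ts(t) for t in timestamps)
    return [(later - earlier).days for earlier, later in zip(times, times[1:])]


def captures_per_month(timestamps: list[str]) -> Counter:
    """Count captures per 'YYYYMM' month to spot bursts from large crawls."""
    return Counter(ts[:6] for ts in timestamps)
```

Long gaps punctuated by bursts of captures on nearby days would fit the "periodic big crawl" explanation better than captures triggered by page changes.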