Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Can you explain further, I only see dates prior to the archive date. What links show October?
piforhire1- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
@BB
Do we at least agree that the Wayback Machine creates, or "replays" as they call it, the page on retrieval? Could it be that, because the 30/04 folder (at that exact time) holds data from later dates, it has created something different? How do we know that part of the replayed file wasn't created on 30/04, and that this part was McCann.htm?
HKP- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
OK, so just tell me the file name of one of the other files, that was stored with a 30 April archive date, which was actually as you say a file created later?
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
It's a pity WBM have not given a full definitive statement on why they changed their original stance. I appreciate all the analysis, but we need a full technical disclosure from them which as far as I am aware we have not had.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
A bit of an aside but this is interesting in terms of showing how it works....
The only CDX records for the madeleine_01.jpg image are below, (and can be found [You must be registered and logged in to see this link.]):
- Code:
uk,gov,ceop)/madeleine_01.jpg 20070606200249 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18296
uk,gov,ceop)/madeleine_01.jpg 20070606200249 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18296
uk,gov,ceop)/madeleine_01.jpg 20070703080810 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070703080810 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070708201654 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070708201654 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070809213612 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 RIQCKVKUYFSV4BPLJG2RIPMQGGRBJRYX 18319
uk,gov,ceop)/madeleine_01.jpg 20110520141240 http://www.ceop.gov.uk/madeleine_01.jpg text/html 301 LKMVBZPE66B2OIJ6JAOWSIDE73IGL524 548
uk,gov,ceop)/madeleine_01.jpg 20110812142113 http://www.ceop.gov.uk/madeleine_01.jpg text/html 301 JR3HRCTNHTNZHKT4C3XFUUAW7DVKDJJS 551
If you retrieve the 12th May 2007 18:59:49 index.asp page, the rewritten URL in the source is [You must be registered and logged in to see this link.], which returns an HTTP 302 (a temporary redirect, not an error) and redirects to the 6th June version above. You can see this just by clicking that link and then looking at the URL you end up on (or via F12 debug).
In other words it does look forward as well as back. Some experiments changing the date to different values seem to show it will take the nearest hit, whether that is forwards or backwards.
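To make the "nearest hit" behaviour concrete, here is a minimal Python sketch. It is not the Wayback Machine's own code; it only mimics what the experiments above seem to show. The field order assumed for each CDX line (URL key, timestamp, original URL, MIME type, status, digest, length) and the function names are my assumptions for illustration.

```python
from datetime import datetime

# CDX records copied from the post above (repeated lines left out).
# Assumed field order: urlkey, timestamp, original URL, MIME type,
# HTTP status, digest, length.
CDX_LINES = """
uk,gov,ceop)/madeleine_01.jpg 20070606200249 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18296
uk,gov,ceop)/madeleine_01.jpg 20070703080810 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070708201654 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 2P6MO765MAYC5Y6J6JYCOTASO2J5USWY 18299
uk,gov,ceop)/madeleine_01.jpg 20070809213612 http://www.ceop.gov.uk/madeleine_01.jpg image/jpeg 200 RIQCKVKUYFSV4BPLJG2RIPMQGGRBJRYX 18319
uk,gov,ceop)/madeleine_01.jpg 20110520141240 http://www.ceop.gov.uk/madeleine_01.jpg text/html 301 LKMVBZPE66B2OIJ6JAOWSIDE73IGL524 548
uk,gov,ceop)/madeleine_01.jpg 20110812142113 http://www.ceop.gov.uk/madeleine_01.jpg text/html 301 JR3HRCTNHTNZHKT4C3XFUUAW7DVKDJJS 551
""".strip().splitlines()


def parse_ts(ts):
    """Turn a 14-digit capture timestamp (YYYYMMDDhhmmss) into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def nearest_capture(lines, requested):
    """Return the capture timestamp closest to `requested`, forwards or backwards."""
    target = parse_ts(requested)
    timestamps = sorted({line.split()[1] for line in lines})
    return min(timestamps, key=lambda ts: abs(parse_ts(ts) - target))


# The 12th May 2007 page asks for the image at its own timestamp; the
# closest capture in the index is the later 6th June one.
print(nearest_capture(CDX_LINES, "20070512185949"))  # 20070606200249
```

With a request date of 20th July 2007 the same function picks the earlier 8th July capture, so the lookup goes in both directions, which is consistent with the redirect described above.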
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
portia question
Sorry, wrong thread; should be with the techno stuff elsewhere. See how clumsy I am!
Hello, from someone who doesn't know IT from the sun on her head:
Can someone kindly explain how information from October 2007 could end up in, and/or be hauled back to, a folder (?) dated 30 April 2007,
if that folder did not exist on April 30th 2007?
Or if such a folder with that date was not created by someone at some moment?
So is not the question: who created that folder?
And who has been fiddling with it once we got onto its trail?
Guest- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
I had not seen this link
[You must be registered and logged in to see this link.]
Can rustyjames just explain exactly what this query is doing and what the returned data reveals?
I am quite happy to be shown there is a bug in their system, if one exists. I just need to be shown it.
thanks.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
'Resistor' in the other place has made this caustic observation:
++++++++++++++
QUOTE
So let's go with "crawls from October are time stamped as April" for the moment (I'm still not convinced but let's go with it for the moment).
That's the whole bottom fallen out of Wayback's reputation, right there
They will be able to tell all the owners of the other sites affected, won't they?
They will be able to provide other examples
They will have to issue a major bug fix and update, because they can't risk it happening again, can they?
And they will have to make details of that bug fix public on their site so people know it's been fixed
Let's just wait and see...
____________________
Dr Martin Roberts: "The evidence is that these are the pyjamas Madeleine wore on holiday in Praia da Luz. They were photographed and the photo handed to a press agency, who released it on 8 May, as the search for Madeleine continued. The McCanns held up these same pyjamas at two press conferences on 5 & 7 June 2007. How could Madeleine have been abducted?"
Amelie McCann (aged 2): "Maddie's jammies!".
Tony Bennett- Investigator
- Posts : 16926
Activity : 24792
Likes received : 3749
Join date : 2009-11-25
Age : 77
Location : Shropshire
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Richard D. Hall wrote:I had not seen this link
[You must be registered and logged in to see this link.]
Can rustyjames just explain exactly what this query is doing and what the returned data reveals?
thanks.
My understanding is that it queries the index entries that exist for [You must be registered and logged in to see this link.] for 30th April 2007. What seems odd is that those entries include news articles dated in the future, which matches BlueBag's versions of the page, and that seems to suggest there is a problem with that index. (There are some longer descriptions a few pages back.)
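For anyone who wants to try an index query of this kind themselves, here is a rough Python sketch that only builds a query URL for the Wayback CDX server. The link in the post is hidden, so this is not necessarily the same query. The endpoint and the `url`, `matchType`, `from`, `to` and `limit` parameters are taken from the public CDX server documentation as I understand it, and the `ceop.gov.uk/` prefix and the helper name are my assumptions.

```python
from urllib.parse import urlencode

# Public CDX endpoint (assumed here; check the CDX server documentation).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, day, match_type="prefix", limit=50):
    """Build a CDX query URL for captures of `url` dated on `day` (YYYYMMDD)."""
    params = {
        "url": url,               # URL or URL prefix to look up
        "matchType": match_type,  # "prefix" also matches pages beneath the URL
        "from": day,              # earliest capture date wanted
        "to": day,                # latest capture date wanted
        "limit": limit,           # cap on the number of index lines returned
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


query = build_cdx_query("ceop.gov.uk/", "20070430")
print(query)
```

Pasting the printed URL into a browser should list one index line per capture, in the same layout as the madeleine_01.jpg records quoted earlier in the thread.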
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Re: TB quoting Resistor:
If that's the case, what would be the American Bar of Attorneys take on this matter?
They would be beside themselves with glee, wouldn't they, especially the ones who saw their clients incarcerated based on WBM evidence!
Guest- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
OK I just ran a test.
One of the files listed in the results as a "later" file is
news_items/article_20070625_ceop.htm
I tried searching for this file, and the date in the source code is
FILE ARCHIVED ON 16:42:45 Jun 20, 2007 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 22:31:07 Jun 21, 2015.
JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE
Sorry this is my error it should read
FILE ARCHIVED ON 16:43:44 Jun 30, 2007 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 22:47:32 Jun 21, 2015.
So even though the files listed in this query seem to have a 30 April tag, they still return a later (correct) archive date.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Another news file from this list
news_items/article_20071116_ceop.htm
FILE ARCHIVED ON 18:20:49 Dec 30, 2007 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 22:38:43 Jun 21, 2015.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
I get:
- Code:
<!--
FILE ARCHIVED ON 16:43:44 Jun 30, 2007 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 22:37:33 Jun 21, 2015.
JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
SECTION 108(a)(3)).
-->
which is correct ignoring the 30th April.
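If you want to run this check on many pages, the archive date can be pulled out of that comment automatically. A small Python sketch, assuming the comment always has the wording shown in the posts above (the function name and the regular expression are mine, not anything from the Wayback Machine):

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches e.g. "FILE ARCHIVED ON 16:43:44 Jun 30, 2007"
ARCHIVED_RE = re.compile(
    r"FILE ARCHIVED ON (\d{1,2}):(\d{2}):(\d{2}) ([A-Z][a-z]{2}) (\d{1,2}), (\d{4})")


def archived_on(page_source):
    """Return the archive date found in the page source, or None if absent."""
    match = ARCHIVED_RE.search(page_source)
    if match is None:
        return None
    hour, minute, second, month, day, year = match.groups()
    return datetime(int(year), MONTHS[month], int(day),
                    int(hour), int(minute), int(second))


sample = """<!--
     FILE ARCHIVED ON 16:43:44 Jun 30, 2007 AND RETRIEVED FROM THE
     INTERNET ARCHIVE ON 22:37:33 Jun 21, 2015.
-->"""
print(archived_on(sample))  # 2007-06-30 16:43:44
```

Only the first date (the archive date) is read; the second date in the comment is just when the page was retrieved.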
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
My apologies but I've got a busy couple of days at work coming up, and am then away with little/no Internet from Wednesday to Monday, so as much as I don't want to, I'm going to have to stop following this thread until I return, especially with regard to posting.
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Sorry my mistake, you are right,
news_items/article_20070625_ceop.htm
FILE ARCHIVED ON 16:43:44 Jun 30, 2007 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 22:47:32 Jun 21, 2015.
The system looks like it is working fine. Even though some of these later pages appear with the 30 April index in rustyjames's query, they still return the correct archive date.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Like I said, I am happy to be shown there is a bug in the system, but I can't see one. Their system seems to return accurate archive dates. If somebody can give me the name of a file which can be shown to have been on the CEOP server on a particular date (e.g. a date-encoded news file) and which then gives an OLDER archive date in the "FILE ARCHIVED" section of the returned WBM file, I will believe there is a bug in their system.
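The test proposed above can be written down as a few lines of Python. This is only a sketch: it assumes the eight digits in a name like article_20070625_ceop.htm are the date the article was published, and the function names are made up for illustration. The two examples use the archive dates reported earlier in the thread.

```python
import re
from datetime import date

# Assumption: "article_YYYYMMDD_" in the file name encodes the publication date.
NAME_DATE_RE = re.compile(r"article_(\d{4})(\d{2})(\d{2})_")


def name_date(filename):
    """Return the date encoded in the file name, or None if there is none."""
    match = NAME_DATE_RE.search(filename)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def archived_before_published(filename, archived):
    """True if the archive date is OLDER than the date in the file name."""
    published = name_date(filename)
    return published is not None and archived < published


checks = [
    ("news_items/article_20070625_ceop.htm", date(2007, 6, 30)),
    ("news_items/article_20071116_ceop.htm", date(2007, 12, 30)),
]
for name, archived in checks:
    # Both files were archived after the date in their name, so no anomaly.
    print(name, archived_before_published(name, archived))
```

A file that printed True here would be the kind of example being asked for: something archived before it could have existed.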
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Here's another test
[You must be registered and logged in to see this link.]
FILE ARCHIVED ON 0:53:43 Feb 6, 2008 AND RETRIEVED FROM THE
INTERNET ARCHIVE ON 23:02:07 Jun 21, 2015.
JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Can somebody give me the name of the "October" file so I can try this out and see what archive date is returned?
Thanks
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
I've had a look at the source code for the Latest News for Jan, Feb, April and Oct, and a few more (June, July, Aug, Sept) not shown. There was no March crawler date. The May 20 is the new 30th April. I noticed that on the crawler date it mentions the previous and next crawler dates. I saved the full pages but edited the bits together.
It's not showing even though I put the Word doc through Notepad.
So on the 7th April it does mention the 30th April with the 11.58.03 time stamp.
whatsupdoc- Posts : 601
Activity : 953
Likes received : 320
Join date : 2011-08-04
aiyoyo- Posts : 9610
Activity : 10084
Likes received : 326
Join date : 2009-11-28
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
If I can use this link, Aiyoyo, to explain my point... this 12th Oct page shows dates from 9th Oct to 6th Feb.
whatsupdoc- Posts : 601
Activity : 953
Likes received : 320
Join date : 2011-08-04
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Why should that be useful?
It has an archive date of
FILE ARCHIVED ON 5:18:50 Oct 12, 2007 AND RETRIEVED FROM THE
Seems fine to me
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
rustyjames pointed out that the WBM reconstructs pages (replays them on retrieval) rather than snapshotting them. If so, will it reconstruct within the folder? (I would have thought so.) If it doesn't find the exact data it uses the closest to it (so it says in the FAQs), so... is McCann.htm the master in 20070430115803, with WBM having constructed the page around the files in the folder? Any thoughts?
HKP- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Like I said earlier, I don't know exactly how their crawler works because I have not got the source code. Even with the source code it would take days to figure out how somebody else's software functions. I was incorrect to refer to the string 20070430115803 as a "folder", but when it is expressed as [You must be registered and logged in to see this link.] it looks like a folder. But to use a more generic language, when a file is archived, it becomes associated with a string of numbers which is derived from the real time clock the first time the file is identified. This string of digits remains associated with that file always. In the case of mccann.html, the string 20070430115803 is associated with it. People have been trying to work out exactly how the files are related to these indexes, and how their system functions. Every file that has been queried as far as I have seen returns an archive date ON or AFTER the date the file came into existence.
Some of the screenshots showing the date of October news items on a 30 Apr archived page are curious. If the news items are not dynamic, then this is strange. Am I right in saying that I cannot now produce this screenshot myself because their system has been changed? If so, two things come to mind. Are the screenshots genuine? - I would imagine so because we have two independent posters - secondly are we saying that in the history of WBM the only time it screws up is around the exact date of one of the most controversial news stories of all time? Also that it screwed up for a very short window of time, then started functioning perfectly again up till now?
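For readers unsure how to read the string 20070430115803 discussed above, it is simply a date and time written as YYYYMMDDhhmmss. A minimal Python sketch (the function name is mine, and I am assuming the time is UTC):

```python
from datetime import datetime


def decode_wayback_timestamp(ts):
    """Decode a 14-digit YYYYMMDDhhmmss string into a datetime (assumed UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


print(decode_wayback_timestamp("20070430115803"))  # 2007-04-30 11:58:03
```

So the string itself says 30 April 2007 at 11:58:03, which matches the 11.58.03 time stamp mentioned earlier in the thread.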
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
sallypelt wrote:[You must be registered and logged in to see this image.] Mimi Today at 9:13 pm
At about 2:00 mins the founder of Waybackmachine says they respond to people wanting to take stuff off.
[You must be registered and logged in to see this link.]
Mimi posted this on the other forum a few minutes ago. I hope she doesn't mind me posting it here. Apologies to Mimi for hijacking her post
So he will take pages off at request and then goes on to emphasize the importance of keeping past web pages. It seems they already had a huge credibility problem in 2011.
____________________
"And if Madeleine had hurt herself inside the apartment, why would that be our fault?" Gerry
[You must be registered and logged in to see this link.]
[You must be registered and logged in to see this link.]
lj- Posts : 3329
Activity : 3590
Likes received : 208
Join date : 2009-12-01
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Just some info. on how WBM works and lots of info. on it.
Where is the rest of the archived site? Why am I getting broken or gray images on a site?
Broken images (when there is a small red "x" where the image should be) occur when the images are not available on our servers. Usually this means that we did not archive them. Gray images are the result of robots.txt exclusions. The site in question may have blocked robot access to their images directory.
You can tell if the link you are looking for is in the Wayback Machine by entering the url into the Wayback Machine search box at archive.org (http://www.archive.org/web/web.php ). Whatever archives we have are viewable in the Wayback Machine.
The archived webpages are meant to be a "snap shot" of past Internet sites. Please note that while we try to archive an entire site, this is not always possible. That is why some images or links might be missing. Additionally some sites do not archive well and we cannot fix that. There is a list of common problems that make a site difficult to archive: [You must be registered and logged in to see this link.]
If you see a box with a red X or a broken image icon that means that we unfortunately do not have the images. Files over 10MB are not archived in this "snap shot" of the website.
The best way to see all the files we have archived of the site is: [You must be registered and logged in to see this link.]
Please note that there is a 6 - 14 month lag time between the date a site is crawled and the date it appears in the Wayback Machine.
Can I link to old pages on the Wayback Machine?
Yes! The Wayback Machine is built so that it can be used and referenced. If you find an archived page that you would like to reference on your Web page or in an article, you can copy the URL. You can even use fuzzy URL matching and date specification... but that's a bit more advanced.
Why isn't the site I'm looking for in the archive?
Some sites may not be included because the automated crawlers were unaware of their existence at the time of the crawl. It's also possible that some sites were not archived because they were password protected, blocked by robots.txt, or otherwise inaccessible to our automated systems. Siteowners might have also requested that their sites be excluded from the Wayback Machine. When this has occurred, you will see a "blocked site error" message. When a site is excluded because of robots.txt you will see a "robots.txt query exclusion error" message.
What does it mean when a site's archive data has been "updated"?
When our automated systems crawl the web every few months or so, we find that only about 50% of all pages on the web have changed from our previous visit. This means that much of the content in our archive is duplicate material. If you don't see "*" next to an archived document, then the content on the archived page is identical to the previously archived copy.
How do you archive dynamic pages?
There are many different kinds of dynamic pages, some of which are easily stored in an archive and some of which fall apart completely. When a dynamic page renders standard html, the archive works beautifully. When a dynamic page contains forms, JavaScript, or other elements that require interaction with the originating host, the archive will not contain the original site's functionality.
Some sites are not available because of robots.txt or other exclusions. What does that mean?
The Internet Archive follows the [You must be registered and logged in to see this link.] for Managing Removal Requests And Preserving Archival Integrity
The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with robots.txt is strictly voluntary. In fact most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions anyway. However, Alexa Internet, the company that crawls the web for the Internet Archive, does respect robots.txt instructions, and even does so retroactively. If a web site owner decides he / she prefers not to have a web crawler visiting his / her files and sets up robots.txt on the site, the Alexa crawlers will stop visiting those files and will make unavailable all files previously gathered from that site. This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due to robots.txt (you will see a "robots.txt query exclusion error" message). Sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site, and we endeavor to comply with these requests. When you come across a "blocked site error" message, that means that a siteowner has made such a request and it has been honored.
Currently there is no way to exclude only a portion of a site, or to exclude archiving a site for a particular time period only.
When a URL has been excluded at direct owner request from being archived, that exclusion is retroactive and permanent.
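To show what such robots.txt rules look like in practice, here is a short Python sketch using the standard library's robots.txt parser. The rules, the example.org addresses and the "otherbot" name are made up for illustration, and the user agent name "ia_archiver" for the Alexa crawler is my assumption, not something stated in the FAQ.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: one crawler is kept out of /images/,
# every other crawler may fetch anything.
RULES = """
User-agent: ia_archiver
Disallow: /images/

User-agent: *
Disallow:
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(RULES)

# The named crawler is blocked from the images directory only.
print(parser.can_fetch("ia_archiver", "http://example.org/images/photo.jpg"))  # False
print(parser.can_fetch("ia_archiver", "http://example.org/index.html"))        # True
# A different crawler falls under the catch-all rule and is allowed.
print(parser.can_fetch("otherbot", "http://example.org/images/photo.jpg"))     # True
```

A rule like the first one would fit the "gray images" case described in the FAQ, where a site has blocked robot access to its images directory.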
Why are some sites harder to archive than others?
If you look at our collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren't archived at all. Here are some things that make it difficult to archive a web site:
- Robots.txt -- We respect robot exclusion headers.
- Javascript -- Javascript elements are often hard to archive, but especially if they generate links without having the full name in the page. Plus, if javascript needs to contact the originating server in order to work, it will fail when archived.
- Server side image maps -- Like any functionality on the web, if it needs to contact the originating server in order to work, it will fail when archived.
- Unknown sites -- The archive contains crawls of the Web completed by Alexa Internet. If Alexa doesn't know about your site, it won't be archived. Use the Alexa Toolbar (available at [You must be registered and logged in to see this link.]), and it will know about your page. Or you can visit Alexa's Archive Your Site page at [You must be registered and logged in to see this link.].
- Orphan pages -- If there are no links to your pages, the robot won't find them (the robots don't enter queries in search boxes).
As a general rule of thumb, simple html is the easiest to archive.
What type of machinery is used in this Internet Archive?
A few highlights from the Petabox storage system:
As of December 1, 2014 -
Density: 1.4 PetaBytes / rack
Power consumption: 3 KW / PetaByte
No Air Conditioning, instead use excess heat to help heat the building.
Raw Numbers as of August 2014:
• 4 data centers, 550 nodes, 20,000 spinning disks
• Wayback Machine: 9.6 PetaBytes
• Books/Music/Video Collections: 9.8 PetaBytes
• Unique data: 20 PetaBytes
• Total used storage: 50 PetaBytes
For more information go to [You must be registered and logged in to see this link.].
Do you collect all the sites on the Web?
No, we collect only publicly accessible Web pages. We do not archive pages that require a password to access, pages tagged for "robot exclusion" by their owners, pages that are only accessible when a person types into and sends a form, or pages on secure servers. If a site owner properly requests removal of a Web site through [You must be registered and logged in to see this link.], we will exclude that site from the Wayback Machine.
Joss- Posts : 1960
Activity : 2154
Likes received : 196
Join date : 2011-09-19
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
'Resistor' in the other place is still, well, resisting...
=====================================
Resistor Today at 1:00 am
Err... I have just come home! Believe it or not, I do have a life outside this forum!
I really wish I could waltz in and give you all a sparkling explanation that explains it all to everyone's satisfaction, but I can't, sorry. I have zero explanation for stuff being dated October in a file ostensibly created in April, but if it's not a result of some dynamic content, then yes, it is an anomaly and has to be explained.
We know that something was created on 30 April 2007 when the CEOP site was crawled, because pages were saved and an index (folder) created for it. I have no reason to doubt that date at all. 30 April 2007 at 11:58:03. For reasons I have already given (about 30 pages back now!) I trust server timers unless there is a very good reason for them to be "wrong".
And when they do go wrong, they go spectacularly wrong. It would be saying something like 1 January 1900 if it was really, really wrong.
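As a rough illustration of the point about timestamps, the capture time quoted above corresponds to the 14-digit form (YYYYMMDDhhmmss) that appears in Wayback URLs. The sketch below converts that string to a date and applies a crude sanity check. Treating the stamp as UTC and using 1996 as the earliest plausible year are assumptions made for this example, and the function names are invented here.

```python
from datetime import datetime, timezone

# Assumed lower bound for a believable capture date (illustrative only).
EARLIEST = datetime(1996, 1, 1, tzinfo=timezone.utc)


def parse_wayback_timestamp(stamp: str) -> datetime:
    """Convert a 14-digit YYYYMMDDhhmmss stamp to a datetime (assumed UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def is_plausible(capture: datetime) -> bool:
    """A 'spectacularly wrong' clock (e.g. 1 January 1900) fails this check."""
    return EARLIEST <= capture <= datetime.now(timezone.utc)


capture = parse_wayback_timestamp("20070430115803")
print(capture.isoformat())   # 2007-04-30T11:58:03+00:00
print(is_plausible(capture))
print(is_plausible(parse_wayback_timestamp("19000101000000")))
```

A check like this only catches a clock that is wildly off. It says nothing about whether content was filed under the wrong capture date.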
So now the school of thought seems to be that the site was crawled again sometime after late October 2007. In actual fact the first date after 27/10/07 is 6/2/08. The Javascript they automatically append confirms this. That seems an awfully long gap, looking at the other captures in the calendar, so is there a missing capture in the period from 12/10/07 to 6/2/08? If so, that capture wasn't saved in its own folder during that period, it was diverted - by presumably some sort of software pointer error - to the April folder, where it went in with the April files that were already there.
(ETA - why April 30, and not Oct 12, the nearest one to it?)
If you try, in Windows, to move a file into a folder, and there is already a file there of the same name, it asks you what you want to do. If you choose to save the newer one, it overwrites the old one. I don't know what sort of server Wayback sits on, but I do have some experience of UNIX and APACHE and if you try to replace a file, it just overwrites it, with no warnings or dialogs. I can well imagine that the October-to-February version was saved to the April index and just overwrote the April version, but why it would do that in the first place, I have no idea.
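The silent-overwrite behaviour described above is easy to demonstrate. This is only a sketch of ordinary file replacement using Python's `os.replace`, with made-up file names in a temporary folder; it is not a claim about how the Wayback Machine actually stores captures.

```python
import os
import tempfile
from pathlib import Path


def replace_silently(src: Path, dst: Path) -> None:
    """Move src onto dst; an existing dst is overwritten with no prompt."""
    os.replace(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    april = folder / "index.html"       # hypothetical file already in the folder
    later = folder / "incoming.html"    # hypothetical newer file arriving
    april.write_text("April version")
    later.write_text("October version")

    replace_silently(later, april)

    surviving = april.read_text()       # what is left under the old name
    source_gone = not later.exists()    # the incoming file has been moved

print(surviving)  # October version
```

Nothing in the folder records that an older file of the same name ever existed, which is the point being made about a later capture landing on top of an earlier one.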
So this now leaves us with when mccann.html was actually created and crawled. Was it there in April and went in the folder originally? Or was it only there in October-to-February and went in with the April stuff at the same time as the later homepage?
The bit that is gnawing at me is the lack of the second photograph, Madeleine_02.jpg, in the April version. It should have been there on the April page, but it wasn't, because the webpage showed a broken link. So if an October-to-February page was saved into the April index, the whole thing was not saved, because otherwise we would not be minus a photo. Later versions have both photos. So clearly the saving process was not exactly the same in all cases.
Now a couple of nights ago, HKP very helpfully found some stuff in Wayback's own FAQ that tells us how the pages are replayed when they have only saved part of them. They try to reconstruct it as best they can from the next nearest version. For mccann.html that would have been May 13, which has both photos, but not the little flags that appear in even later versions. The May version tells us in the appended Javascript that the previous capture was April 30.
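The "next nearest version" idea can be sketched as a simple closest-timestamp lookup. The list of captures below is hypothetical (only the 30 April 11:58:03 stamp comes from this thread; the other times are placeholders), and the function is an illustration of the principle, not Wayback's actual replay code.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"


def nearest_capture(requested: str, available: list[str]) -> str:
    """Pick the stored capture whose timestamp is closest to the requested one."""
    target = datetime.strptime(requested, STAMP_FORMAT)
    return min(
        available,
        key=lambda stamp: abs(datetime.strptime(stamp, STAMP_FORMAT) - target),
    )


# Hypothetical captures of one embedded file (say, an image); none on 30 April.
image_captures = ["20070513000000", "20080206000000"]

# A page replayed "as of" 30 April would pull in the closest copy it can find.
chosen = nearest_capture("20070430115803", image_captures)
print(chosen)  # 20070513000000
```

If replay works along these lines, a page shown under one date can contain pieces that were really captured on another date.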
If there wasn't a version of mccann.html in the folder on 30 April (as it was only added at some later point October-February) then how did a file created on 13 May manage to find it, to add it in as a previous version? Because the 13 May capture should have been the first one.
Sorry, it's late now and I am probably not expressing myself very clearly. I also need to be up early in the morning. I'll give this some more thought over the next couple of days, and email Wayback again, as so far they have not responded to any of my queries. I suspect it's the same for us all.
=====================================
____________________
Dr Martin Roberts: "The evidence is that these are the pyjamas Madeleine wore on holiday in Praia da Luz. They were photographed and the photo handed to a press agency, who released it on 8 May, as the search for Madeleine continued. The McCanns held up these same pyjamas at two press conferences on 5 & 7 June 2007. How could Madeleine have been abducted?"
Amelie McCann (aged 2): "Maddie's jammies!".
Tony Bennett- Investigator
- Posts : 16926
Activity : 24792
Likes received : 3749
Join date : 2009-11-25
Age : 77
Location : Shropshire
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Portia wrote: Re: TB quoting Resistor:
If that's the case, what would be the American Bar of Attorneys take on this matter?
They would be beside themselves with glee, wouldn't they, especially the ones who saw their clients incarcerated based on WBM evidence!
There are TWO types of dates in the WBM.
1) A WBM assigned archived time stamp... this looks sometimes broken to me.
2) Dates in the content of the actual archived web page itself... forum post date/time, newspaper article dates, latest news dates etc... These are self certifying.
It is the second category that would still be used in court because the first can be argued as unreliable.
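The difference between the two kinds of date can be checked mechanically: pull the dates out of the page text and compare them with the timestamp the archive assigned. The sketch below does that for dates written like "12 October 2007". The sample text is invented for illustration and the function name is hypothetical; it is not the actual CEOP page content.

```python
import re
from datetime import datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}
DATE_PATTERN = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def content_dates_after_capture(page_text: str, capture_stamp: str) -> list[str]:
    """List dates found in the page text that fall after the archive timestamp."""
    capture = datetime.strptime(capture_stamp, "%Y%m%d%H%M%S")
    later = []
    for day, month, year in DATE_PATTERN.findall(page_text):
        try:
            found = datetime(int(year), MONTHS[month], int(day))
        except ValueError:
            continue  # skip impossible dates such as "31 February 2007"
        if found > capture:
            later.append(found.date().isoformat())
    return later


# Invented sample text standing in for an archived page.
sample = "Latest news: 12 October 2007. Site last reviewed 2 April 2007."
suspect = content_dates_after_capture(sample, "20070430115803")
print(suspect)  # ['2007-10-12']
```

Any date in the list is one the page could not have carried at the stated capture time, so the assigned timestamp, the content, or both must be off.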
Guest- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Richard D. Hall wrote: Like I said I am happy to be shown there is a bug in the system, but I can't see one.
The bug was pointed out on page 2 of this thread, Richard.
There was absolutely an October 2007 page indexed as 30th April 2007 11:58:03
There are screenshots of it before the 30th April folder was removed.
Many people saw it.
It doesn't matter what the WBM machine is doing now... it's what it was doing in 2007 that matters.
Guest- Guest
Re: Claim by 'Stevo' - "CEOP show Maddie is missing on 30th April 2007"
Also from the second page of this thread.
whatsupdoc wrote:
macdonut wrote: This is a red herring I think guys. If you look at the full ceop page as allegedly archived on 30th April:
[You must be registered and logged in to see this link.]
You'll see quite a number of news stories and links that are, in fact, dated in October 2007.
While I don't profess to understand how the web archive works, it clearly isn't accurate, at least on this occasion.
Agreed, macdonut. So we have two versions for the 30th April. The version I found did have references to October 2007 so at least one entry on 30th April was incorrect if not both.
I noticed the html code in Doug D post was a comment and the date could have been edited in and captured.
The link to the page is given and many people tried it to verify it.
Guest- Guest