Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
The Complete Mystery of Madeleine McCann™ :: Reference :: WaybackMachine / CEOP shows Maddie missing on 30 April
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
HKP wrote:
Well just when we thought it was safe to go back in the water, splash!!!!!!
2007 certainly was a busy year for Wayback/CEOP: 65% of all captures since 2005 were made in 2007.
Anybody notice anything with these monthly captures? What happened in September I wonder!!!!!!
Jan 731
Feb 526
Mar 238
Apr 3936
May 461
Jun 698
July 689
Aug 1101
Sept 3657
Oct 896
Nov 40
Dec 163
What was that saying again about coincidences Kate?
If someone would do me the honour of posting on CMOMM I'd be mighty grateful.
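For anyone who wants to put those monthly figures in proportion, here is a minimal sketch in Python. The counts are copied from the list above; the names `captures_2007`, `share` and `ranked` are just illustrative. It only does the arithmetic and says nothing about why the crawler was busier in some months than others.

```python
# Monthly capture counts for 2007, copied from the list in the post above.
captures_2007 = {
    "Jan": 731, "Feb": 526, "Mar": 238, "Apr": 3936,
    "May": 461, "Jun": 698, "Jul": 689, "Aug": 1101,
    "Sep": 3657, "Oct": 896, "Nov": 40, "Dec": 163,
}

# Total number of captures across the twelve months.
total = sum(captures_2007.values())


def share(month: str) -> float:
    """Return the given month's percentage of the 2007 total."""
    return 100.0 * captures_2007[month] / total


# Months ranked from busiest to quietest.
ranked = sorted(captures_2007, key=captures_2007.get, reverse=True)

if __name__ == "__main__":
    print(f"2007 total: {total}")
    for month in ranked:
        print(f"{month}: {captures_2007[month]:>5} ({share(month):.1f}%)")
```

Run as a script, it prints the months in descending order with each month's share of the year.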
Mo- Posts : 76
Activity : 82
Likes received : 2
Join date : 2014-07-25
Age : 69
Location : Nottinghamshire
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
whodunit wrote:
CEOP homepage capture April 27, 2007
[image]
mccann.html capture dated May 13, 2007
[image]
CEOP homepage capture May 14, 2007 (image above)
I'm not really sure what goes on with the index, but if I had to guess (perfectly permissible since Nuala is also guessing) I'd say that WBM tampering with and 're-indexing' the captures on this specific date over the last couple of weeks has caused it to go haywire.
What is not at issue, and what has not changed since this whole thing started, is the embedded coding. If you dig around Steve Marsden's posts on FB you can find his downloaded coding for the original April 30 page that ignited this controversy. At the moment I cannot find it, but within that coding, under the 'Next/Previous Capture' heading, you will find this text: "You are Here: 11:58:03 April 30, 2007".
Now look at the caps I made of the same codes embedded in the pages that remain after the great re-shuffling. Both for the CEOP homepage and for mccann.html, April 30, 2007 is sitting right where you would expect to find it if the capture is true and correct. It is indeed the NEXT homepage capture after April 27 and the PREVIOUS capture to May 13. As for mccann.html, we find April 30, 2007 as the capture PREVIOUS to the extant May 14 capture.
If the April 30 capture is out of place in the contiguous sequences of captures you would expect to find evidence of this in the coding. We do not.
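The 'next/previous' check described above amounts to asking where one timestamp sits in the sorted list of captures for a single URL. A minimal sketch in Python, assuming the 14-digit Wayback timestamp form (YYYYMMDDhhmmss). Only the 30 April time of 11:58:03 comes from this thread; the times of day for the other two captures are placeholders, and `neighbours` is a hypothetical helper name. A check like this shows only that the list is internally ordered; it cannot by itself show whether a recorded date is right.

```python
from bisect import bisect_left
from datetime import datetime

WAYBACK_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamp: YYYYMMDDhhmmss


def neighbours(timestamps, target):
    """Return (previous, next) timestamps around `target` in sorted order.

    Either element is None when `target` is the first or last capture.
    Raises ValueError if `target` is not in the list.
    """
    # Fixed-width digit strings sort the same way as the dates they encode.
    ordered = sorted(timestamps)
    i = bisect_left(ordered, target)
    if i == len(ordered) or ordered[i] != target:
        raise ValueError(f"{target} is not among the captures")
    previous = ordered[i - 1] if i > 0 else None
    following = ordered[i + 1] if i + 1 < len(ordered) else None
    return previous, following


# Homepage captures named in the post. Times of day other than 11:58:03
# are ASSUMED placeholders (midnight), not values taken from the archive.
homepage_captures = ["20070513000000", "20070427000000", "20070430115803"]

if __name__ == "__main__":
    prev_ts, next_ts = neighbours(homepage_captures, "20070430115803")
    for label, ts in (("previous", prev_ts), ("next", next_ts)):
        print(label, datetime.strptime(ts, WAYBACK_FORMAT).date())
```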
whodunit- Posts : 467
Activity : 913
Likes received : 448
Join date : 2015-02-08
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Angelique wrote: I know absolutely nothing about IT, I can't even run my Mac without help, so this is only my opinion having read all the threads and "onlyinamerica" and the posts by Dr. Roberts, who I have to say convinces me that something is wrong, but we don't know for sure what is wrong.
Is it possible that all the articles dated incorrectly were placed with the CEOP page to obscure/hide/camouflage it?
As far as I can tell once captured by the WBM it's there forever but if China can hack the Pentagon 14 times then anything can be hacked.
ETA Just to clarify I mean "as in if all those captures are wrong then so is the CEOP page for Missing Madeleine"
jozi wrote: I too feel like you so am following and not contributing as I am not techie either. What I would like to know though: if it is a glitch then why has it captured it on the 30th April? (How can it put it into a file before it was ever on the net in the first place?)
What if somebody put the wrong date on the file knowing it would cause this, or am I being really blonde here?
Angelique wrote: Hi jozi. No I don't believe you are "being blonde" - I think this is also a possibility. In fact, I think this is what Textusa thinks too!
I have said a number of times that I feel this could very well have been intended as a huge distraction - and it worked. Took us away from Amaral's fund and appeal and whatever else.
Team McCann are well known for using distractions / red herrings / confusion.
HelenMeg- Posts : 1782
Activity : 2081
Likes received : 213
Join date : 2014-01-08
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@HelenMeg - Neither Steve Marsden nor I am connected to Team McCann. Steve made this discovery when he was researching something else. To suggest otherwise is, in Clarrie's words, ludicrous and unhelpful. I am sure Jill Havern will vouch for Steve as she has done elsewhere on this thread.
cloak'ndagger- Posts : 118
Activity : 133
Likes received : 3
Join date : 2014-08-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
And if you 'thought' a computer/crawly thing could 'capture' an 'event/page', say, on 29 October, 2013, and 'think' 'now, WHERE will i store/archive this'?
"Ah, i know, there's a folder/archive, dated 5th March, 2001, i'll 'stick' it THERE'!
Then THINK again, people!
Computers DO, what they are TOLD, programmed, to DO!
If a 'capture' is made on 7th September, 2009, then 'that' is put in the 7th September, 2009 'archive'
WHERE, we can 'find' it.
ps: I 'think'
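The idea that a capture is filed under the moment it was made can be sketched in a few lines of Python. This assumes the publicly visible Wayback address pattern, `https://web.archive.org/web/<14-digit timestamp>/<original URL>`; the function names and the example.com address are purely illustrative. As far as I know, asking for a timestamp that has no exact capture sends you to the nearest one, so the address typed is not always the address shown.

```python
from datetime import datetime


def wayback_timestamp(captured_at: datetime) -> str:
    """Format a capture time as the 14-digit form YYYYMMDDhhmmss."""
    return captured_at.strftime("%Y%m%d%H%M%S")


def wayback_url(captured_at: datetime, original_url: str) -> str:
    """Build the archive address under which a capture would be filed."""
    return (
        "https://web.archive.org/web/"
        f"{wayback_timestamp(captured_at)}/{original_url}"
    )


# A capture made at noon on 7 September 2009 of a hypothetical page.
example = wayback_url(datetime(2009, 9, 7, 12, 0, 0), "http://example.com/")

if __name__ == "__main__":
    print(example)
```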
jeanmonroe- Posts : 5818
Activity : 7756
Likes received : 1674
Join date : 2013-02-07
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@HelenMeg: "I have said a number of times that I feel this could very well have been intended as a huge distraction - and it worked. Took us away from Amaral's fund and appeal and whatever else.
Team McCann are well known for using distractions / red herrings / confusion"
I can't agree. Any discussion of evidence which supports an alternate view of the crime, one which severely contradicts the official pronouncements of TM, can only help Amaral. Discussions like these push the consensus among the general population to become more aligned with Mr. Amaral.
whodunit- Posts : 467
Activity : 913
Likes received : 448
Join date : 2015-02-08
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Cloak'ndagger, are you a friend of, acquainted with, or connected to S. Marsden? You did say recently that he would be posting here to explain a few things. Still no sign of him though.
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@jeanmonroe: "If a 'capture' is made on 7th September, 2009, then 'that' is put in the 7th September, 2009 'archive'
WHERE, we can 'find' it. ps: I 'think'"
Of course it does! Otherwise what is the point of this archive?
whodunit- Posts : 467
Activity : 913
Likes received : 448
Join date : 2015-02-08
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@Ladyinred - Yes, I am co-admin with Steve. He has been too busy to post much. As far as he is concerned it is Q.E.D. as regards the WBM capturing the CEOP on April 30th. I did think he would have something further to add but so far he does not appear to have.
cloak'ndagger- Posts : 118
Activity : 133
Likes received : 3
Join date : 2014-08-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Is it over-simplifying the matter to suggest that the original search of the WBM was truthful, i.e. there was a CEOP page captured on 30th April that showed MBM's disappearance, and that since then WBM has been busily (without any transparent explanation to suggest otherwise) obfuscating that record?
Or am I wrong?
suzysu- Posts : 52
Activity : 83
Likes received : 25
Join date : 2014-10-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Nuala wrote: @ Tony Bennett
As a non-techie, what arguments from Whodunit persuaded you that these captures were also correct:
[link]
Because if the mccann.html capture was correct, then those are correct as well, along with the thousands of other CEOP website examples also given the same 30 Apr 2007 date and time.
The above examples are only a tiny sample of the masses of news articles given a date of 30 Apr 2007, when the said articles hadn't even been published on that date. Note that the date of the articles is the date CEOP gave them when they published them, so 20070810 isn't a date from Wayback, it's a date from CEOP.
CEOP dated them 20070810 and when Wayback archived them it gave them a date of 30 Apr 2007.
I think you would agree that it's impossible for an article dated 20070810 and therefore not even in existence on 30 Apr 2007 to have been crawled by Wayback and correctly dated on 30 Apr 2007.
I think even a non-techie can see that.
So can you tell me what persuaded you that those news articles are in fact correctly dated as being in existence on 30 Apr 2007?
Tony Bennett wrote: I follow your argument - and, speaking as a non-tecchie, if neither whodunit nor anyone else can supply a good answer to your point, I would declare: 'Advantage Nuala'
SixMillionQuid wrote: Sorry but why do those four links quoted take you directly to the ceop page? Where are the Wayback archived versions?
Original Press Release dated and uploaded by CEOP on 20070810:
[link]
[image]
Here are the 4 Wayback Source Directory links claiming a 20070430 archive date that Nuala was talking about:
[link]
[link]
[image]
Here they are in the Wayback Calendar - 4 entries archived between 27th August and February 9th 2008:
[link]
[image]
All present and correct, and NONE of them archived on 30th April 2007 despite what the WB source directory claims.
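Nuala's point about the press release can be written as a simple consistency check: a page cannot have been captured before the date its publisher gave it. A minimal sketch in Python, using the two 8-digit dates quoted in the thread (20070810 from CEOP, 20070430 from the Wayback source directory) plus the 9 February 2008 calendar date; `is_plausible` is a hypothetical helper name, and the check assumes the publisher's own date is itself correct.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: str) -> date:
    """Parse an 8-digit date such as 20070810."""
    return datetime.strptime(value, "%Y%m%d").date()


def is_plausible(published: str, captured: str) -> bool:
    """True when the capture date is on or after the publication date."""
    return parse_yyyymmdd(captured) >= parse_yyyymmdd(published)


# (label, publisher's date, recorded capture date)
records = [
    ("press release, source directory date", "20070810", "20070430"),
    ("press release, calendar date", "20070810", "20080209"),
]

if __name__ == "__main__":
    for label, published, captured in records:
        verdict = "plausible" if is_plausible(published, captured) else "impossible"
        print(f"{label}: published {published}, captured {captured} -> {verdict}")
```

Any record flagged as impossible has a wrong date somewhere; the check alone does not say which of the two dates is the wrong one.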
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@Syn: "...All present and correct and NONE of them archived on 30th April 2007 despite what the WB source directory claims."
I'm sure it's maddening to you, as a techie, to have to deal with non-tech questions, but don't you find it ODD that an organisation as otherwise-credible as WBM hasn't come out publicly with an explanation?
suzysu- Posts : 52
Activity : 83
Likes received : 25
Join date : 2014-10-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
No, not maddening at all suzysu :) And no, I don't find it odd at all. They are a non-profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the URLs in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Thank you Syn for not finding my question maddening :)
I accept that they're a non-profit organisation, but even then, they are HUGE and, as we have been told, their data is (or has been) relied upon in court.
In order to preserve their integrity, wouldn't one expect (and have a right to expect) that if they have 'archived incorrectly' they have a duty to explain the error? If they don't, how can their data ever be relied upon in the future?
This is making a mockery of their entire raison d'être.
suzysu- Posts : 52
Activity : 83
Likes received : 25
Join date : 2014-10-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Please excuse my ignorance when I ask what the debate is with regards to this Wayback site? I only joined this site yesterday and had minimal time to look through it. But, I noticed there seems to be quite a stir caused by the fact the Madeleine McCann case ended up on this Wayback site. What exactly is the issue, if any? What is Wayback? Can someone please enlighten this newcomer?
Suspicious Mind- Posts : 10
Activity : 10
Likes received : 0
Join date : 2015-07-07
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
REPLY: Wayback Machine is a vast archive which preserves for posterity snapshots of websites, and of individual pages on those websites, as they appeared when its crawler visited them.
A Brit living in the U.S. called Steve Marsden said he found a record of CEOP - the Child Exploitation and Online Protection Centre - having created a page about Madeleine McCann before 11.58am on 30 April 2007. This, if true, would suggest that something bad happened to Madeleine before then and not on 3 May 2007 as the McCanns claim.
The view espoused especially by posters Nuala and Syn on this forum is that this was an unfortunate (but as yet unspecified) 'glitch' in Wayback's system, i.e. a mistake.
____________________
Dr Martin Roberts: "The evidence is that these are the pyjamas Madeleine wore on holiday in Praia da Luz. They were photographed and the photo handed to a press agency, who released it on 8 May, as the search for Madeleine continued. The McCanns held up these same pyjamas at two press conferences on 5 & 7 June 2007. How could Madeleine have been abducted?"
Amelie McCann (aged 2): "Maddie's jammies!".
Tony Bennett- Researcher
- Posts : 16906
Activity : 24770
Likes received : 3749
Join date : 2009-11-25
Age : 76
Location : Shropshire
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
suzysu wrote: I accept that they're a non-profit organisation, but even then, they are HUGE and, as we have been told, their data is (or has been) relied upon in court.
In order to preserve their integrity, wouldn't one expect (and have a right to expect) that if they have 'archived incorrectly' they have a duty to explain the error? If they don't, how can their data ever be relied upon in the future?
This is making a mockery of their entire raison d'être.
@suzysu - they are not huge. According to Wikipedia there are about 200 employees, a large proportion of whom are involved in book scanning. Their raison d'être is to create a digital library of cultural artifacts to try to prevent this current era becoming a digital dark age. See below from their "About" page:
Why the Archive is Building an 'Internet Library'
Libraries exist to preserve society's cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it's essential for them to extend those functions into the digital world.
Many early movies were recycled to recover the silver in the film. The Library of Alexandria - an ancient center of learning containing a copy of every book in the world - was eventually burned to the ground. Even now, at the turn of the 21st century, no comprehensive archives of television or radio programs exist.
But without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures. And paradoxically, with the explosion of the Internet, we live in what Danny Hillis has referred to as our "digital dark age."
The Internet Archive is working to prevent the Internet - a new medium with major historical significance - and other "born-digital" materials from disappearing into the past. Collaborating with institutions including the Library of Congress and the Smithsonian, we are working to preserve a record for generations to come.
Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages.
The Internet Archive is opening its collections to researchers, historians, and scholars. The Archive has no vested interest in the discoveries of the users of its collections, nor is it a grant-making organization.
At present, the size of our Web collection is such that using it requires programming skills. However, we are hopeful about the development of tools and methods that will give the general public easy and meaningful access to our collective history. In addition to developing our own collections, we are working to promote the formation of other Internet libraries in the United States and elsewhere.
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Syn wrote: No not maddening at all suzysu :) And no I don't find it odd at all. They are a non profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the urls in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
Syn - I generally agree with most of your posts, but can you explain what you consider a "subset issue" to be? My view is that the continued use of the term "subset" is a case of Chinese whispers.
For reference, my take on it, in response to Tony where he quoted the manual section you'd highlighted, is here: [link]
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Posted with HKP's permission from MMM:
+++++++++++++++++
QUOTE
Here's a dilemma: in looking at the captures something else jumps out. Please read the extract from Wikipedia, which hopefully will be self-explanatory. Note the highlighting is mine.
QUOTE WIKIPEDIA
Robots Exclusion Standard
The robots exclusion standard, also known as the robots exclusion protocol or robots.txt protocol, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies the instruction format to be used to inform the robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. Not all robots cooperate with the standard including email harvesters, spambots and malware robots that scan for security vulnerabilities. The standard is different from, but can be used in conjunction with Sitemaps, a robot inclusion standard for websites.
When a site owner wishes to give instructions to web robots they place a text file called robots.txt in the root of the web site hierarchy (e.g. [link]). This text file contains the instructions in a specific format (see examples below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the web site. If this file doesn't exist, web robots assume that the web owner wishes to provide no specific instructions, and crawl the entire site.
A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.
UNQUOTE WIKIPEDIA
When applying this standard to the CEOP captures the results are very interesting. In April 07 there were 102 robots.txt URLs captured, at least one for every day: some days had more (75 on the 25th for some reason) and others were singular, like the 26th and 28th. It should be noted that not every day was crawled (15 in total, including the 30th). Now, given what we have read above, the 30th needs to be looked at.
30/04/07: no robots.txt URLs captured.
Given the above statement that 'if this file doesn't exist, web robots assume that the web owner wishes to provide no specific instructions, and crawl the entire site', is this what happened, with the entire site crawled, picking up mccann.html and the madeleine 01 & 02 jpgs? Obviously there are still question marks around captures with future dates that still need to be explained.
I'd appreciate Resistor's opinion and a post onto CMOMM.
UNQUOTE
[Post re-formatted, and edited for clarity, by a Mod]
Hongkong Phooey
Posts: 192
Join date: 2014-08-30
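The behaviour in the Wikipedia extract can be tried out with Python's standard `urllib.robotparser` module. This is a minimal sketch: the rules, the `ExampleBot` user agent and the example.com addresses are all made up for illustration and are not CEOP's actual robots.txt. It shows how a compliant crawler reads the file, and that an empty set of rules leaves every path open. It cannot show what any particular crawler did on a given day, and a missing robots.txt capture in an archive is not the same thing as the file being missing from the site.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks all robots to stay out of /private/.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

# Parser loaded with the example rules (no network access needed).
rules = RobotFileParser()
rules.parse(EXAMPLE_RULES.splitlines())

# Parser loaded with no rules at all, i.e. no specific instructions given.
no_rules = RobotFileParser()
no_rules.parse([])

AGENT = "ExampleBot"  # hypothetical crawler name

if __name__ == "__main__":
    for path in ("/private/page.html", "/mccann.html"):
        url = "http://example.com" + path
        print(path, "with rules:", rules.can_fetch(AGENT, url))
        print(path, "without rules:", no_rules.can_fetch(AGENT, url))
```

With the example rules, only the path under /private/ is refused; with no rules, both paths are allowed.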
+++++++++++++++++
QUOTE
Here’s a dilemma, in looking at the captures something else jumps out, please read the extract from Wikipedia this hopefully will be self explanatory, note the highlighting is mine.
QUOTE WIKIPEDIA
Robots Exclusion Standard
The robots exclusion standard, also known as the robots exclusion protocol or robots.txt protocol, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies the instruction format to be used to inform the robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. Not all robots cooperate with the standard including email harvesters, spambots and malware robots that scan for security vulnerabilities. The standard is different from, but can be used in conjunction with Sitemaps, a robot inclusion standard for websites.
When a site owner wishes to give instructions to web robots they place a text file called robots.txt in the root of the web site hierarchy (e.g. [You must be registered and logged in to see this link.] This text file contains the instructions in a specific format (see examples below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the web site. If this file doesn't exist, web robots assume that the web owner wishes to provide no specific instructions, and crawl the entire site.
A robots.txt file on a website will function as a request that specified robots ignore specified files or directories when crawling a site. This might be, for example, out of a preference for privacy from search engine results, or the belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or out of a desire that an application only operate on certain data. Links to pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.
UNQUOTE WIKIPEDIA
When applying this standard to the CEOP captures the results are very interesting. In April 07 there were 102 robots.txt URLs captured, at least one for every day (obviously some days had more, 75 on the 25th for some reason, and others had a single capture, like the 26th & 28th). It should be noted that not every day was crawled: 15 in total, including the 30th. Now given what we have read above, the 30th needs to be looked at.
30/04/07: no robots.txt URLs captured.
Given the above statement that ‘if this file doesn't exist, web robots assume that the web owner wishes to provide no specific instructions, and crawl the entire site’, is this what happened: was the entire site crawled, picking up mccann.html and the madeleine 01 & 02 jpgs? Obviously there are still question marks around captures with future dates that still need to be explained.
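For anyone who wants to repeat this kind of per-day count, here is a rough Python sketch. It assumes you already have a list of capture timestamps in the 14-digit YYYYMMDDhhmmss form that Wayback uses; the timestamps and the function names below are made up for illustration and are not real CEOP captures.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical capture timestamps (YYYYMMDDhhmmss); illustration only,
# these are NOT real captures of any site.
SAMPLE_TIMESTAMPS = [
    "20070425090000",
    "20070425101500",
    "20070426120000",
    "20070428083000",
    "20070429120000",
]


def captures_per_day(timestamps):
    """Count captures per calendar day from 14-digit timestamps."""
    days = (datetime.strptime(ts, "%Y%m%d%H%M%S").date() for ts in timestamps)
    return Counter(days)


def days_without_captures(counts, first, last):
    """List every day from first to last (inclusive) with no capture."""
    span = (last - first).days + 1
    all_days = [first + timedelta(days=i) for i in range(span)]
    return [day for day in all_days if counts[day] == 0]


counts = captures_per_day(SAMPLE_TIMESTAMPS)
missing = days_without_captures(counts, date(2007, 4, 25), date(2007, 4, 30))
for day in missing:
    print(day.isoformat(), "has no capture in the sample list")
```

Bear in mind that a day with no capture in such a list only tells you that nothing was archived for that URL on that day. It says nothing about whether the file existed on the site.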
I'd appreciate Resistor's opinion and a post onto CMOMM.
UNQUOTE
[Post re-formatted, and edited for clarity, by a Mod]
Hongkong Phooey
Posts: 192
Join date: 2014-08-30
Rufus T- Posts : 269
Activity : 312
Likes received : 3
Join date : 2013-06-18
Location : Glasgow
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Suspicious Mind wrote:
Please excuse my ignorance when I ask what the debate is with regards to this Wayback site? I only joined this site yesterday and had minimal time to look through it. But, I noticed there seems to be quite a stir caused by the fact the Madeleine McCann case ended up on this Wayback site. What exactly is the issue, if any? What is Wayback? Can someone please enlighten this newcomer?
Tony Bennett wrote:
REPLY: Wayback Machine is a vast archive which preserves for posterity actions which take place on the internet - such as the creation and alteration of websites, and individual pages on those websites.
A Brit living in the U.S. called Steve Marsden said he found a record of CEOP - the Child Exploitation and Online Protection Centre - having created a page about Madeleine McCann before 11.58am on 30 April 2007. This, if true, would suggest that something bad happened to Madeleine before then and not on 3 May 2007 as the McCanns claim.
The view espoused especially by posters Nuala and Syn on this forum is that this was an unfortunate (but as yet unspecified) 'glitch' in Wayback's system, i.e. a mistake.
Thanks for the reply Tony!
That machine must take some looking into as I would guess there are a lot of alterations on a daily basis? Even so, if it is true that this guy found such a page, it is a pretty scary thought, unless it was a glitch as thought by the posters you mentioned. If it is true, it makes you wonder what the hell is going on with regards to this family. It's a bit like the Jane Standley news bulletin from New York where she reports WTC 7 having gone down yet it is still standing in the background as she reports live on the BBC. Maybe there are a lot of people out there having premonitions.
Suspicious Mind- Posts : 10
Activity : 10
Likes received : 0
Join date : 2015-07-07
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
To give a crude and basic example of how computers can seem to do odd things:
When I'm logged out of this forum the timings on posts are in the 12 hour format. I guess the server/computer the forum sits on has its clock set in 12 hour format.
When I log in the posts change to 24 hour format. I guess the forum then takes the time format from my forum profile settings or from my PC clock setting (24h).
Of course this Wayback issue is vastly more complex, but I think ultimately it probably is an error. I think what makes a lot of people here angrier is that Wayback aren't obligated to explain the issue. Maybe they would have to reveal flaws/limitations in their crawler program to do so?
They wouldn't really want to do that if they can help it. I'm sure the code or 'recipe' of the crawler program is a closely guarded secret like the Coca Cola recipe.
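To show what I mean about the 12 hour / 24 hour thing, here is a tiny Python sketch. It is only an illustration of one stored time being displayed two ways; it is not how this forum or Wayback is actually coded, and the date used is just an example value.

```python
from datetime import datetime

# One arbitrary instant; the value itself is only an example.
moment = datetime(2007, 4, 29, 14, 15, 59)

as_24h = moment.strftime("%H:%M")     # 24-hour clock
as_12h = moment.strftime("%I:%M %p")  # 12-hour clock with AM/PM marker

print(as_24h)
print(as_12h)
```

The stored time never changes, only the way it is shown, which is why a display difference on its own doesn't prove the underlying data is different.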
TheTruthWillOut- Posts : 733
Activity : 754
Likes received : 19
Join date : 2011-09-26
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ HKP via Rufus T
Given the above statement of ‘if this file doesn't exist web robots assume the web owner wishes to provide no specific instructions, and crawls the entire site’ is this what happened and the entire site was crawled picking up mccann.html & madeleine 01 & 02 jpgs???
This is the robots.txt file for the CEOP website as archived on 29th April at 14:15:59:
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
Nothing about excluding mccann.html there.
Also, just because the robots.txt wasn't crawled on 30 Apr 2007 doesn't mean it wasn't there. It would have been there on 30 Apr 2007, just not crawled on that date.
Note also:
1) robots.txt exclusion requests are just that, only requests. A robots.txt doesn't actually stop a crawler from crawling certain things, it is just a request that they don't, so anyone wanting to hide anything wouldn't upload it and use a robots.txt to exclude it from crawlers.
2) robots.txt files are public. Anyone can see them; all they have to do is enter the URL [You must be registered and logged in to see this link.] to view the file. So a robots.txt would not be used to "hide" mccann.html because it wouldn't actually hide it.
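If anyone wants to check the rules for themselves, here is a minimal sketch using Python's standard urllib.robotparser module with the three Disallow lines quoted above. The image path is hypothetical (I am not saying that is where the jpgs lived), and the empty rule set simply stands in for a site that gives no instructions.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above from the 29 April capture.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
no_rules = build_parser("")  # stands in for a site with no instructions

# /mccann.html is the page discussed in the thread.
# /images/example.jpg is a hypothetical path used only for illustration.
print(rules.can_fetch("*", "/mccann.html"))
print(rules.can_fetch("*", "/images/example.jpg"))
print(no_rules.can_fetch("*", "/images/example.jpg"))
```

Remember this only tells you what a well-behaved crawler is asked to do. It doesn't physically block anything.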
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
suzysu wrote:
@Syn ".....All present and correct and NONE of them archived on 30th April 2007 despite what the WB source directory claims."
I'm sure it's maddening to you, as a techie, to have to deal with non-tech questions, but don't you find it ODD that an organisation as otherwise-credible as WBM hasn't come out publicly with an explanation?
Syn wrote:
No not maddening at all suzysu :) And no I don't find it odd at all. They are a non profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the urls in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
suzysu wrote:
Thank you Syn for not finding my question maddening :)
I accept that they're a non profit organisation, but even then, they are HUGE and, as we have been told, their data is (or has been) relied upon in court.
In order to preserve their integrity, wouldn't one expect (and have a right to expect) that if they have 'archived incorrectly' they have a duty to explain the error? If they don't, how can their data ever be relied upon in the future?
This is making a mockery of their entire raison d'etre.
Very welcome suzysu :) Always happy to try and answer any questions no matter how non techie :) There is a lot about this subject that I do not fully understand myself too :)
In answer to your other questions, I see RustyJames has already kindly responded and explained archive.org's raison d'etre better than I could :)
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
suzysu wrote:
@Syn ".....All present and correct and NONE of them archived on 30th April 2007 despite what the WB source directory claims."
I'm sure it's maddening to you, as a techie, to have to deal with non-tech questions, but don't you find it ODD that an organisation as otherwise-credible as WBM hasn't come out publicly with an explanation?
Syn wrote:
No not maddening at all suzysu :) And no I don't find it odd at all. They are a non profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the urls in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
rustyjames wrote:
Syn - I generally agree with most of your posts, but can you explain what you consider a "subset issue" to be as my view is the continued use of the terminology "subset" is a case of Chinese whispers.
For reference my take on it in response to Tony where he quoted the manual section you'd highlighted is here - [You must be registered and logged in to see this link.]
I think you are right that what I posted was to do with replay mode. I have recently written to archive.org (not mentioning the CEOP pages at all) to ask about something I found which suggests that timestamp issues have been encountered when repackaging subsets of ARC data into (W)ARC files and, in some cases, back into ARC. I am hoping that they reply.
This is what led me down the route of asking the question which may or may not lead to anything.
It is lengthy I'm afraid, but I have a feeling that it is something you would understand, as I am sure you have mentioned ARC/WARC previously.
WARC spec clarification on transformed WARCs
[You must be registered and logged in to see this link.]
14/01/2009
hi WARC Tools,
can you please clarify the WARC spec with regard to the
WARC-Date field (part 1) and warcinfo records in WARCs
transformed from ARCs (part 2) for us? these issues came up when
comparing Heritrix (2.0.2) and warc tools (r242) arc2warc
output.
----------------------------------------------------------------
part 1
----------------------------------------------------------------
according to the WARC spec[1] ISO/DIS 28500 (v0.18):
5.4 WARC-Date
"The timestamp shall represent the instant that
data capture for record creation began."
this may mean that the creation date of the WARC file itself
(from an original ARC) would not be captured. also, WARC files
converted from ARCs which predate the WARC format might have a
WARC-Date field which predates the WARC format.
is this what we want?
this issue came up when comparing the output of:
1) Heritrix's Arc2Warc.java class, and
2) WARC Tools' arc2warc
given an ARC file whose date is:
2008-12-19 23:22:43
converting the ARC to a WARC with Heritrix gives:
WARC-Date: 2009-01-05T22:25:39Z
in the first record (a warcinfo record), which is the creation
date.
while converting to a WARC with warc tools gives:
WARC-Date: 2008-12-19T23:22:43Z
in the first record (which is a response record - see part 2).
so, do we want the WARC-Date field in the warcinfo record
to be the date of the first record, or the creation date
of the WARC file itself?
attachments:
arc2warc-arc.txt: head of Original ARC file
arc2warc-h2.txt : head of WARC from Heritrix's Arc2Warc.java
arc2warc-wt.txt : head of WARC from WARC tools arc2warc
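To make the two dates in part 1 easier to compare, here is a small Python sketch. It assumes the ARC date is in UTC (the message treats the two values as equivalent, but that is my assumption), and the function names are my own, not anything from Heritrix or WARC Tools.

```python
from datetime import datetime, timezone

ARC_DATE = "2008-12-19 23:22:43"            # ARC file date from the message
HERITRIX_WARCINFO = "2009-01-05T22:25:39Z"  # WARC-Date written by Heritrix
WARC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def arc_to_warc_date(arc_date):
    """Reformat an ARC-style date as a WARC-Date string, assuming UTC."""
    parsed = datetime.strptime(arc_date, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).strftime(WARC_FORMAT)


def parse_warc_date(value):
    """Turn a WARC-Date string back into a timezone-aware datetime."""
    return datetime.strptime(value, WARC_FORMAT).replace(tzinfo=timezone.utc)


original = arc_to_warc_date(ARC_DATE)
gap = parse_warc_date(HERITRIX_WARCINFO) - parse_warc_date(original)

print(original)  # same instant as the ARC date, in WARC-Date form
print(gap)       # time between the original capture and the conversion
```

So depending on which tool did the conversion, the first record of the same data can carry either the capture date or the conversion date, more than two weeks apart in this example.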
----------------------------------------------------------------
part 2:
----------------------------------------------------------------
even more conspicuously, the warc tools transformed WARC gives
the first record as type:
WARC-Type: response
with a target URI of:
WARC-Target-URI: [You must be registered and logged in to see this link.]
which yields a significantly different record than the Heritrix
transformed WARC, which gives a 'warcinfo' record as the initial
record of the transformed WARC file. (see attachments)
furthermore, the WARC spec states in section "4 File and record
model":
All 'warcinfo' 'request', 'metadata' and 'revisit'
records shall not have a payload.
but Heritrix's Arc2Warc class outputs a warcinfo record that has
a "Filedesc:" payload.
please let us know what you think of these differences so we can
determine how best to converge.
thanks,
/st...@archive.org
[1] [You must be registered and logged in to see this link.]
Gordon Paynter
19/01/2009
Hi Steve:
While I cannot answer your questions myself, I did send them to Clement
at BNF, who made the following response (which I hope he will not mind
my sharing). I hope you find it useful.
Gordon
Hi Gordon,
I send you few comments on the questions on WARC (Part 2 precedes Part
1)
Part 2:
As far as I know, the Warcinfo record has been designed to play the
role of the "filedesc" of the ARC format.
However, the Warcinfo record of a migrated WARC file shall describe the
migration process (and it is not possible to have two warcinfo records
within the same WARC file).
On the other hand, an ARC filedesc record can't be considered as a real
"response", so it shall not be migrated in a WARC "response" record.
A solution may be to create a Warcinfo record describing a migration
process,
AND
to create a metadata record containing the content of the ARC filedesc
record.
On the question of the payload:
The payload in the WARC standard is defined as a "Data object referred
to, or contained by a WARC record as a meaningful subset of the content
block" (p. 3).
Defining a "meaningful subset" is useful, because one could want to
check data integrity of the payload (that is the file harvested on the
Net, without http responses), or identify its format.
In the Warcinfo record given as an example of the output of Heritrix's
ARC2WARC class, the text written after the headers seems to be only the
block of the record, so there is no inconsistency with the standard.
Part 1:
It seems to be a very critical issue.
To my opinion, a WARC response record migrated from a ARC record shall
have the same date than the previous ARC record.
That is:
a ARC record whose date is 2008-12-19 23:22:43
shall be migrated in a response record with WARC-Date:
2008-12-19T23:22:43Z
On the other hand, the migrated WARC response record should be linked
to the Warcinfo record describing the migration process, whose date
should be WARC-Date: 2009-01-05T22:25:39Z
The date of the metadata record containing the "filedesc" shall also be
2009-01-05T22:25:39Z, but it will be necessary to put the original date
of the ARC filedesc record somewhere else in the WARC metadata record.
This solution allows to record:
- the original harvest date
- the migration date
- and it seems a good solution for access tools such as Wayback
Machine
It has three shortcomings:
- this solution is not formally written in the standard (but the
standard gives no rule to manage migrated WARC files)
- the WARC response record dates predate the WARC format (but it is not
a real problem, to my opinion)
- it is not very consistent with the way we shall treat conversion
records (they shall have the WARC date of their creation, not of the
creation of the original WARC record, see the example in the standard p.
24).
-... but it seems to me the best solution!
I hope these few ideas will be useful, please say me what are your
opinion on these topics.
Clément
- - - - - - - - - -
Clément Oury
Digital Curator
Digital Legal Deposit
Bibliothèque nationale de France
Quai François-Mauriac
75706 Paris Cedex 13
tel. 33 (0)1 53 79 46 93
siznax
28/01/2009
Gordon and Clement,
thanks for your thoughtful response.
your suggestions sound perfectly reasonable. i'll try
to restate them below so that you can confirm that we
have reached a consensus.
given the following WARC states, the following conditions
should apply:
1) original WARC
warcinfo record should serve as ARC "filedesc" record,
with optional WARC generation
2) migrated WARC (ARC->WARC)
a) warcinfo record should serve as migration description,
warcinfo/WARC-Date should be migrated WARC creation date
b) metadata record should contain content of ARC "filedesc"
record, metadata/WARC-Date should be migrated WARC creation
date, ARC "filedesc" date should also be in this record,
and possibly the WARC generation could be indicated here
c) response records should have the same date as each
corresponding ARC record
3) second-generation WARC (WARC->ARC->WARC)
a) same conditions as (2), and
b) warcinfo record should indicate WARC generation
i believe we would need to agree then on the form of the
fields for:
2b) original ARC "filedesc" date in migrated WARC metadata
record, e.g. metadata/"ARC-Filedesc-Date" with ISO8601 date.
1,2b,3b) WARC generation specified in warcinfo record,
e.g. warcinfo/"WARC-Generation" with integer value
indicating; 0=original WARC, 1=migrated WARC,
2=second-generation WARC, etc.
i'm not sure if "WARC-Generation" is necessary, but it seems
potentially useful.
thanks so much,
/st...@archive.org
[You must be registered and logged in to see this link.]
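As far as I can follow it, point 2 of that proposed consensus (the migrated ARC->WARC case) could be sketched roughly like this in Python. This is only my reading of the emails: the Record class and migrate function are hypothetical, and "ARC-Filedesc-Date" and "WARC-Generation" are the field names floated in the message as examples, not fields I know to be in the WARC standard.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for the headers of one WARC record."""
    warc_type: str
    warc_date: str
    extra: dict = field(default_factory=dict)


def migrate(arc_filedesc_date, arc_record_dates, migration_date, generation=1):
    """Build record headers for a migrated WARC following points 2a to 2c."""
    records = [
        # 2a) warcinfo describes the migration and carries the migration date
        Record("warcinfo", migration_date, {"WARC-Generation": generation}),
        # 2b) metadata is dated at migration but keeps the ARC filedesc date
        Record("metadata", migration_date,
               {"ARC-Filedesc-Date": arc_filedesc_date}),
    ]
    # 2c) every response record keeps the date of its original ARC record
    records += [Record("response", d) for d in arc_record_dates]
    return records


# Dates taken from the example in the emails above.
records = migrate(
    arc_filedesc_date="2008-12-19T23:22:43Z",
    arc_record_dates=["2008-12-19T23:22:43Z"],
    migration_date="2009-01-05T22:25:39Z",
)
for record in records:
    print(record.warc_type, record.warc_date, record.extra)
```

The point being that, under this scheme, the original harvest date and the migration date both survive, but they sit in different records.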
There is a myriad of info on Heritrix and on ARC -> WARC -> ARC etc. here, but you have to sign up: https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=4865
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
I've registered so I can post on this thread.
Nuala wrote:
@ HKP via Rufus T
Given the above statement of ‘if this file doesn't exist web robots assume the web owner wishes to provide no specific instructions, and crawls the entire site’ is this what happened and the entire site was crawled picking up mccann.html & madeleine 01 & 02 jpgs???
This is the robots.txt file for the CEOP website as archived on 29th April at 14:15:59:
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
Nothing about excluding mccann.html there.
Also, just because the robots.txt wasn't crawled on 30 Apr 2007 doesn't mean it wasn't there. It would have been there on 30 Apr 2007, just not crawled on that date.
Note also:
1) robots.txt exclusion requests are just that, only requests. A robots.txt doesn't actually stop a crawler from crawling certain things, it is just a request that they don't, so anyone wanting to hide anything wouldn't upload it and use a robots.txt to exclude it from crawlers.
2) robots.txt files are public. Anyone can see them; all they have to do is enter the URL [You must be registered and logged in to see this link.] to view the file. So a robots.txt would not be used to "hide" mccann.html because it wouldn't actually hide it.
Can you show us all the robots.txt for 30/04 rather than the one for 29/04?
Guest- Guest