Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Here is a crude and basic example of how computers can seem to do odd things.
When I'm logged out of this forum, the timings on posts are in 12-hour format. I guess the server/computer the forum sits on has its clock set to 12-hour format.
When I log in, the posts change to 24-hour format. I guess the forum then takes the time format from my forum profile settings or from my PC clock setting (24h).
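The logged-in/logged-out difference described above can be sketched as follows. This is only an illustration, not how the forum software actually works: the function name, the `use_24h` flag and the post time are all made up for the example. The point is that one stored instant can be displayed two ways without the underlying data changing.

```python
from datetime import datetime


def render_post_time(posted: datetime, use_24h: bool) -> str:
    """Format the same instant for a 12-hour or a 24-hour display."""
    if use_24h:
        return f"{posted.hour:02d}:{posted.minute:02d}"
    hour12 = posted.hour % 12 or 12  # 0 -> 12, 13 -> 1, and so on
    suffix = "am" if posted.hour < 12 else "pm"
    return f"{hour12}:{posted.minute:02d} {suffix}"


# Hypothetical post time; the stored value is identical in both cases.
posted = datetime(2015, 6, 20, 15, 30)
print(render_post_time(posted, use_24h=False))  # 3:30 pm
print(render_post_time(posted, use_24h=True))   # 15:30
```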
Of course this Wayback issue is vastly more complex, but I think ultimately it probably is an error. I think what makes a lot of people here more angry is that Wayback aren't obliged to explain the issue. Maybe they would have to reveal flaws/limitations in their crawler program to do so?
They wouldn't really want to do that if they can help it. I'm sure the code or 'recipe' of the crawler program is a closely guarded secret, like the Coca-Cola recipe.
TheTruthWillOut- Posts : 733
Activity : 754
Likes received : 19
Join date : 2011-09-26
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ HKP via Rufus T
Given the above statement of ‘if this file doesn't exist web robots assume the web owner wishes to provide no specific instructions, and crawls the entire site’ is this what happened and the entire site was crawled picking up mccann.html & madeleine 01 & 02 jpgs???
This is the robots.txt file for the CEOP website as archived on 29th April at 14:15:59:
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
Nothing about excluding mccann.html there.
Also, just because the robots.txt wasn't crawled on 30 Apr 2007 doesn't mean it wasn't there. It would have been there on 30 Apr 2007, just not crawled on that date.
Note also:
1) robots.txt exclusion requests are just that, only requests. A robots.txt doesn't actually stop a crawler from crawling certain things, it is just a request that they don't, so anyone wanting to hide something wouldn't upload it and then use a robots.txt to exclude it from crawlers.
2) robots.txt files are public. Anyone can see them; all they have to do is enter the robots.txt URL to view the file. So a robots.txt would not be used to "hide" mccann.html, because it wouldn't actually hide it.
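For anyone who wants to check this for themselves, here is a minimal sketch using Python's standard `urllib.robotparser` and the three Disallow rules quoted above. The crawler name `example-crawler` and the image path are made up for illustration (the real location of the jpgs is not stated in this thread), and a polite crawler is assumed; as point 1 says, nothing forces a crawler to obey.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted above from the 29 April capture.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
"""


def is_allowed(path: str, robots_txt: str = ROBOTS_TXT,
               agent: str = "example-crawler") -> bool:
    """Return True if a rule-abiding crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(is_allowed("/mccann.html"))            # True: no rule excludes it
print(is_allowed("/images/example.jpg"))     # False: under /images/ (hypothetical path)
print(is_allowed("/images/example.jpg", robots_txt=""))  # True: no rules at all
```

So with these rules mccann.html is crawlable whether the robots.txt is present or absent; only paths under the three listed directories are affected.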
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
suzysu wrote: Thank you Syn for not finding my question maddening :)
Syn wrote: No not maddening at all suzysu :) And no I don't find it odd at all. They are a non profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the urls in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
suzysu wrote: @Syn ".....All present and correct and NONE of them archived on 30th April 2007 despite what the WB source directory claims."
I'm sure it's maddening to you, as a techie, to have to deal with non-tech questions, but don't you find it ODD that an organisation as otherwise-credible as WBM hasn't come out publicly with an explanation?
I accept that they're a non profit organisation, but even then, they are HUGE and, as we have been told, their data is (or has been) relied upon in court.
In order to preserve their integrity, wouldn't one expect (and have a right to expect) that if they have 'archived incorrectly' they have a duty to explain the error? If they don't, how can their data ever be relied upon in the future?
This is making a mockery of their entire raison d'etre.
Very welcome suzysu :) Always happy to try and answer any questions, no matter how non-techie :) There is a lot about this subject that I do not fully understand myself too :)
In answer to your other questions, I see RustyJames has already kindly responded and explained archive.org's raison d'etre better than I could :)
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
rustyjames wrote:
Syn wrote: No not maddening at all suzysu :) And no I don't find it odd at all. They are a non profit organisation who do not have to explain anything to anyone. They have had the decency to let us know that after further investigation the urls in question were archived incorrectly due to a subset issue which they are trying to resolve. They did not actually have to tell us anything.
suzysu wrote: @Syn ".....All present and correct and NONE of them archived on 30th April 2007 despite what the WB source directory claims."
I'm sure it's maddening to you, as a techie, to have to deal with non-tech questions, but don't you find it ODD that an organisation as otherwise-credible as WBM hasn't come out publicly with an explanation?
Syn - I generally agree with most of your posts, but can you explain what you consider a "subset issue" to be as my view is the continued use of the terminology "subset" is a case of Chinese whispers.
For reference my take on it in response to Tony where he quoted the manual section you'd highlighted is here - [You must be registered and logged in to see this link.]
I think you are right that what I posted was to do with replay mode. I have recently written to archive.org (not mentioning the CEOP pages at all) to ask about something I found which suggests that timestamp issues have been encountered when repackaging subsets of ARC data into (W)ARC files and, in some cases, back to ARC. I am hoping that they reply.
This is what led me down the route of asking the question which may or may not lead to anything.
It is lengthy, I'm afraid, but I have a feeling that it is something you would understand, as I am sure you have mentioned ARC/WARC previously.
WARC spec clarification on transformed WARCs
14/01/2009
hi WARC Tools,
can you please clarify the WARC spec with regard to the
WARC-Date field (part 1) and warcinfo records in WARCs
transformed from ARCs (part 2) for us? these issues came up when
comparing Heritrix (2.0.2) and warc tools (r242) arc2warc
output.
----------------------------------------------------------------
part 1
----------------------------------------------------------------
according to the WARC spec[1] ISO/DIS 28500 (v0.18):
5.4 WARC-Date
"The timestamp shall represent the instant that
data capture for record creation began."
this may mean that the creation date of the WARC file itself
(from an original ARC) would not be captured. also, WARC files
converted from ARCs which predate the WARC format might have a
WARC-Date field which predates the WARC format.
is this what we want?
this issue came up when comparing the output of:
1) Heritrix's Arc2Warc.java class, and
2) WARC Tools' arc2warc
given an ARC file whose date is:
2008-12-19 23:22:43
converting the ARC to a WARC with Heritrix gives:
WARC-Date: 2009-01-05T22:25:39Z
in the first record (a warcinfo record), which is the creation
date.
while converting to a WARC with warc tools gives:
WARC-Date: 2008-12-19T23:22:43Z
in the first record (which is a response record - see part 2).
so, do we want the WARC-Date field in the warcinfo record
to be the date of the first record, or the creation date
of the WARC file itself?
attachments:
arc2warc-arc.txt: head of Original ARC file
arc2warc-h2.txt : head of WARC from Heritrix's Arc2Warc.java
arc2warc-wt.txt : head of WARC from WARC tools arc2warc
----------------------------------------------------------------
part 2:
----------------------------------------------------------------
even more conspicuously, the warc tools transformed WARC gives
the first record as type:
WARC-Type: response
with a target URI of:
WARC-Target-URI: [You must be registered and logged in to see this link.]
which yields a significantly different record than the Heritrix
transformed WARC, which gives a 'warcinfo' record as the initial
record of the transformed WARC file. (see attachments)
furthermore, the WARC spec states in section "4 File and record
model":
All 'warcinfo' 'request', 'metadata' and 'revisit'
records shall not have a payload.
but Heritrix's Arc2Warc class outputs a warcinfo record that has
a "Filedesc:" payload.
please let us know what you think of these differences so we can
determine how best to converge.
thanks,
/st...@archive.org
[1] [You must be registered and logged in to see this link.]
Gordon Paynter
19/01/2009
Hi Steve:
While I cannot answer your questions myself, I did send them to Clement
at BNF, who made the following response (which I hope he will not mind
my sharing). I hope you find it useful.
Gordon
Hi Gordon,
I send you few comments on the questions on WARC (Part 2 precedes Part
1)
Part 2:
As far as I know, the Warcinfo record has been designed to play the
role of the "filedesc" of the ARC format.
However, the Warcinfo record of a migrated WARC file shall describe the
migration process (and it is not possible to have two warcinfo records
within the same WARC file).
On the other hand, an ARC filedesc record can't be considered as a real
"response", so it shall not be migrated in a WARC "response" record.
A solution may be to create a Warcinfo record describing a migration
process,
AND
to create a metadata record containing the content of the ARC filedesc
record.
On the question of the payload:
The payload in the WARC standard is defined as a "Data object referred
to, or contained by a WARC record as a meaningful subset of the content
block" (p. 3).
Defining a "meaningful subset" is useful, because one could want to
check data integrity of the payload (that is the file harvested on the
Net, without http responses), or identify its format.
In the Warcinfo record given as an example of the output of Heritrix's
ARC2WARC class, the text written after the headers seems to be only the
block of the record, so there is no inconsistency with the standard.
Part 1:
It seems to be a very critical issue.
To my opinion, a WARC response record migrated from a ARC record shall
have the same date than the previous ARC record.
That is:
a ARC record whose date is 2008-12-19 23:22:43
shall be migrated in a response record with WARC-Date:
2008-12-19T23:22:43Z
On the other hand, the migrated WARC response record should be linked
to the Warcinfo record describing the migration process, whose date
should be WARC-Date: 2009-01-05T22:25:39Z
The date of the metadata record containing the "filedesc" shall also be
2009-01-05T22:25:39Z, but it will be necessary to put the original date
of the ARC filedesc record somewhere else in the WARC metadata record.
This solution allows to record:
- the original harvest date
- the migration date
- and it seems a good solution for access tools such as Wayback
Machine
It has three shortcomings:
- this solution is not formally written in the standard (but the
standard gives no rule to manage migrated WARC files)
- the WARC response record dates predate the WARC format (but it is not
a real problem, to my opinion)
- it is not very consistent with the way we shall treat conversion
records (they shall have the WARC date of their creation, not of the
creation of the original WARC record, see the example in the standard p.
24).
-... but it seems to me the best solution!
I hope these few ideas will be useful, please say me what are your
opinion on these topics.
Clément
- - - - - - - - - -
Clément Oury
Digital Curator
Digital Legal Deposit
Bibliothèque nationale de France
Quai François-Mauriac
75706 Paris Cedex 13
tel. 33 (0)1 53 79 46 93
>>> "st...@archive.org"
siznax
28/01/2009
Gordon and Clement,
thanks for your thoughtful response.
your suggestions sound perfectly reasonable. i'll try
to restate them below so that you can confirm that we
have reached a consensus.
given the following WARC states, the following conditions
should apply:
1) original WARC
warcinfo record should serve as ARC "filedesc" record,
with optional WARC generation
2) migrated WARC (ARC->WARC)
a) warcinfo record should serve as migration description,
warcinfo/WARC-Date should be migrated WARC creation date
b) metadata record should contain content of ARC "filedesc"
record, metadata/WARC-Date should be migrated WARC creation
date, ARC "filedesc" date should also be in this record,
and possibly the WARC generation could be indicated here
c) response records should have the same date as each
corresponding ARC record
3) second-generation WARC (WARC->ARC->WARC)
a) same conditions as (2), and
b) warcinfo record should indicate WARC generation
i believe we would need to agree then on the form of the
fields for:
2b) original ARC "filedesc" date in migrated WARC metadata
record, e.g. metadata/"ARC-Filedesc-Date" with ISO8601 date.
1,2b,3b) WARC generation specified in warcinfo record,
e.g. warcinfo/"WARC-Generation" with integer value
indicating; 0=original WARC, 1=migrated WARC,
2=second-generation WARC, etc.
i'm not sure if "WARC-Generation" is necessary, but it seems
potentially useful.
thanks so much,
/st...@archive.org
There is a myriad of info on Heritrix and on ARC -> WARC -> ARC conversion here, but you have to sign up: https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=4865
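To make the consensus in the quoted exchange a bit more concrete (the warcinfo record carries the migration date, while each response record keeps its original ARC date), here is a rough sketch. It is not the Heritrix or WARC Tools code: the function names are invented for the example, the ARC date is parsed in the form shown in the email, and treating that date as UTC is an assumption.

```python
from datetime import datetime, timezone

WARC_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def arc_date_to_warc_date(arc_date: str) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to WARC-Date form."""
    parsed = datetime.strptime(arc_date, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime(WARC_DATE_FORMAT)


def migrated_headers(arc_record_dates, migration_time: datetime):
    """Build header dicts for a migrated WARC (hypothetical helper).

    warcinfo gets the migration (creation) date; every response
    record keeps the date of its corresponding ARC record.
    """
    created = migration_time.astimezone(timezone.utc).strftime(WARC_DATE_FORMAT)
    headers = [{"WARC-Type": "warcinfo", "WARC-Date": created}]
    for arc_date in arc_record_dates:
        headers.append({"WARC-Type": "response",
                        "WARC-Date": arc_date_to_warc_date(arc_date)})
    return headers


# The two dates used in the email exchange above.
migration = datetime(2009, 1, 5, 22, 25, 39, tzinfo=timezone.utc)
for header in migrated_headers(["2008-12-19 23:22:43"], migration):
    print(header["WARC-Type"], header["WARC-Date"])
# warcinfo 2009-01-05T22:25:39Z
# response 2008-12-19T23:22:43Z
```

The two tools in the email differed on exactly this choice, which is why one converted file started with the 2009 date and the other with the 2008 date.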
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
I've registered so I can post on this thread.
Nuala wrote: @ HKP via Rufus T
Can you show us all the robots.txt for 30/04 rather than 29/04?
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Good to see you HKP.
Rufus T- Posts : 269
Activity : 312
Likes received : 3
Join date : 2013-06-18
Location : Glasgow
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Interesting Syn. Yes I've said a few times I'd love to see analysis of the original .arc files - I would think they'd answer a lot of questions.
I've also wondered if they'd been migrated to .warc and whether there could be issues in that migration, but I would have thought they had that mapping of dates etc well defined prior to a migration.
It's a shame that warc wasn't used in 2007 as it records a lot of extra information and metadata.
rustyjames- Posts : 293
Activity : 314
Likes received : 3
Join date : 2013-10-16
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Thanks Rufus T (for your help earlier as well).
Rufus T wrote: Good to see you HKP.
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
What part of "they have taken all the erroneous 30/04/2007 URLs out of the WB archive whilst they try to resolve this issue" do you not understand? Ergo, neither Nuala nor anyone else can provide what you ask, but it is safe to say it will be EXACTLY the same as it was for 29/04/2007.
HKP wrote: I've registered so I can post on this thread.
Nuala wrote: @ HKP via Rufus T
Can you show us all the robot.txt for 30/04 rather than the 29/04
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
I think it may be possible to look at the arc files for dates in and around 30/04/2007 via the Atlassian Jira website I posted earlier. Am looking into it.
rustyjames wrote: Interesting Syn. Yes I've said a few times I'd love to see analysis of the original .arc files - I would think they'd answer a lot of questions.
I've also wondered if they'd been migrated to .warc and whether there could be issues in that migration, but I would have thought they had that mapping of dates etc well defined prior to a migration.
It's a shame that warc wasn't used in 2007 as it records a lot of extra information and metadata.
I agree, one would have thought that the data mapping would have been well defined, but the convo on the Google Groups link suggests otherwise.
Yes, re 2007 and WARC: they took the timestamp to 17 digits and captured a lot more info, so if they then repackaged back to ARC with 14 digits, could that be where the errors occurred, I wonder?
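As a rough way of thinking about the 17-digit versus 14-digit question, here is a small sketch. It assumes the 17-digit form is simply the 14-digit YYYYMMDDhhmmss form plus three millisecond digits; that layout, the function names and the example timestamp are my assumptions for illustration, not something confirmed in this thread.

```python
from datetime import datetime


def parse_capture_timestamp(ts: str) -> datetime:
    """Parse a 14-digit or (assumed) 17-digit capture timestamp."""
    if len(ts) not in (14, 17) or not ts.isdigit():
        raise ValueError(f"expected 14 or 17 digits, got {ts!r}")
    parsed = datetime.strptime(ts[:14], "%Y%m%d%H%M%S")
    if len(ts) == 17:
        # Last three digits taken to be milliseconds (assumption).
        parsed = parsed.replace(microsecond=int(ts[14:]) * 1000)
    return parsed


def truncate_to_14_digits(ts: str) -> str:
    """Drop the millisecond digits, keeping YYYYMMDDhhmmss."""
    return ts[:14]


long_ts = "20070430120000123"  # hypothetical value, not a real capture
short_ts = truncate_to_14_digits(long_ts)
same_day = (parse_capture_timestamp(long_ts).date()
            == parse_capture_timestamp(short_ts).date())
print(short_ts, same_day)
```

Under that assumption, plain truncation only loses the milliseconds and leaves the date untouched, so if repackaging did cause wrong dates it would have to be through something other than simply chopping off digits.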
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
As a follow-up to my last question (robots.txt for 30/04/07): you would have thought that, by capturing so many URLs (3876), it would have captured it at least once, but alas it captured McCann.html instead.
Nuala wrote: @ HKP via Rufus T
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Whoops, there goes that assumption again. It is not safe to say it will be EXACTLY the same, because in reality you don't know. Since you're playing the assumption card, let's assume that it didn't pick up a robots.txt and carried out a more rigorous sweep, picking up all sorts, maybe even mccann.html.
Syn wrote: What part of "they have taken all the erroneous 30/04/2007 URLs out of the WB archive whilst they try to resolve this issue" do you not understand? Ergo, neither Nuala nor anyone else can provide what you ask, but it is safe to say it will be EXACTLY the same as it was for 29/04/2007.
HKP wrote: I've registered so I can post on this thread.
Nuala wrote: @ HKP via Rufus T
Can you show us all the robot.txt for 30/04 rather than the 29/04
What is safe to say is that the records show 28/04 was nothing like 30/04
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ HKP
Can you show us all the robot.txt for 30/04 rather than the 29/04
The robots.txt file for 30 Apr isn't available.
But anyway it's irrelevant, because the point you made was that CEOP might have had a robots.txt that excluded mccann.html, which wasn't there on 30 Apr allowing mccann.html to be crawled.
As we can see on 29 Apr 2007 there was no exclusion request for mccann.html in the robots.txt anyway so the robots.txt not existing on 30 Apr would have made no difference.
As a follow up to my last question (robots txt for 30/04/07) you would have thought that by capturing so many URLs (3876) that it would have at least captured it once, but alas it captured McCann.html instead
As the Wayback data for 30 Apr 2007 is screwed up, it might be that nothing was actually captured on 30 Apr 2007.
BTW, I note the big grin, and just to say this might be a game to you, but it isn't a game to me. We're talking here about the disappearance of a little girl and I'm not interested in people trying to score points.
Anyone really wanting to get to the truth of what happened to Madeleine McCann would debate it rationally and maturely, I would hope.
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Well said Nuala. Ditto here too.
Nuala wrote: @ HKP
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Quoting @ Syn
safe to say it will be EXACTLY the same as it was for 29/04/2007
Of course it will. And the idea that someone uploaded mccann.html on 30 Apr, and also uploaded a new robots.txt to exclude that page is ridiculous.
If they wanted to keep mccann.html secret they just wouldn't have uploaded it.
You don't upload a page you want to keep secret and then try and keep it secret with a robots.txt that is public (anyone can view it) and the exclusion request might be ignored by any crawler anyway.
Crazy idea.
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
HKP wrote: Whoops, there goes that assumption again. It is not safe to say it will be EXACTLY the same, because in reality you don't know. Since you're playing the assumption card, let's assume that it didn't pick up a robots.txt and carried out a more rigorous sweep, picking up all sorts, maybe even mccann.html.
Syn wrote: What part of "they have taken all the erroneous 30/04/2007 URLs out of the WB archive whilst they try to resolve this issue" do you not understand? Ergo, neither Nuala nor anyone else can provide what you ask, but it is safe to say it will be EXACTLY the same as it was for 29/04/2007.
HKP wrote: I've registered so I can post on this thread.
Nuala wrote: @ HKP via Rufus T
Given the above statement of ‘if this file doesn't exist web robots assume the web owner wishes to provide no specific instructions, and crawls the entire site’ is this what happened and the entire site was crawled picking up mccann.html & madeleine 01 & 02 jpgs???
This is the robots.txt file for the CEOP website as archived on 29th April at 14:15:59:
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
Nothing about excluding mccann.html there.
Also, just because the robots.txt wasn't crawled on 30 Apr 2007 doesn't mean it wasn't there. It would have been there on 30 Apr 2007, just not crawled on that date.
Note also:
1) robots.txt exclusion requests are just that, only requests. A robots.txt doesn't actually stop a crawler from crawling certain things; it is just a request that they don't. So anyone wanting to hide something wouldn't upload it and then use a robots.txt to exclude it from crawlers.
2) robots.txt files are public. Anyone can see them; all they have to do is enter the file's URL to view it. So a robots.txt would not be used to "hide" mccann.html, because it wouldn't actually hide it.
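For anyone who wants to check the robots.txt point for themselves, here is a minimal sketch using Python's standard urllib.robotparser module. It feeds in the four lines of the 29 April robots.txt quoted above and asks whether a well-behaved crawler would be permitted to fetch a given URL. The host name and both paths (mccann.html at the site root, and a made-up file under /images/) are assumptions for illustration only, as the thread does not give the exact URLs. It also only models a compliant crawler, since the rules are requests and nothing enforces them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules as quoted in the post above (archived 29 April).
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /pdfs/
Disallow: /role_profiles/
"""

# Assumed URLs, for illustration only: the exact locations are not given
# in the thread, and example.jpg is a hypothetical file name.
PAGE_URL = "http://www.ceop.gov.uk/mccann.html"
IMAGE_URL = "http://www.ceop.gov.uk/images/example.jpg"


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # With the quoted robots.txt in place.
    print("page, with robots.txt:    ", is_allowed(ROBOTS_TXT, PAGE_URL))
    print("image, with robots.txt:   ", is_allowed(ROBOTS_TXT, IMAGE_URL))
    # With no rules at all (an empty file stands in for a missing one).
    print("page, without robots.txt: ", is_allowed("", PAGE_URL))
    print("image, without robots.txt:", is_allowed("", IMAGE_URL))
```

On these assumptions the page is allowed whether the rules are present or not, because none of the three Disallow lines covers it. Only files under the excluded folders would be treated differently by a compliant crawler if the robots.txt were missing.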
Can you show us all the robots.txt for 30/04 rather than the 29/04?
What is safe to say is that the records show 28/04 was nothing like 30/04
Read and digest what Nuala has just posted. You are straw clutching to the extreme because you really want this to be some big setup by CEOP but it isn't. I concur that there is more to the McCann case than meets the eye, but you are in tinfoil hat territory with your surmises. You are struggling to understand basic concepts re robots.txt. You do Madeleine no favours whatsoever in clutching onto such way-out and non-provable theories. If you want justice for her, you will have to think a lot more rationally and logically than you are currently doing.
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Nail on proverbial head there :)
Nuala wrote: Quoting @ Syn
safe to say it will be EXACTLY the same as it was for 29/04/2007
Of course it will. And the idea that someone uploaded mccann.html on 30 Apr, and also uploaded a new robots.txt to exclude that page is ridiculous.
If they wanted to keep mccann.html secret they just wouldn't have uploaded it.
You don't upload a page you want to keep secret and then try and keep it secret with a robots.txt that is public (anyone can view it) and the exclusion request might be ignored by any crawler anyway.
Crazy idea.
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ Syn
You are straw clutching to the extreme because you really want this to be some big setup by CEOP but it isn't. I concur that there is more to the McCann case than meets the eye, but you are in tinfoil hat territory with your surmises. You are struggling to understand basic concepts re robots.txt. You do Madeleine no favours whatsoever in clutching onto such way-out and non-provable theories. If you want justice for her, you will have to think a lot more rationally and logically than you are currently doing.
Well said. I agree with all of that.
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
It's a bit like a Big Daddy and Giant Haystacks tag team around here sometimes. You both have concentrated on robots.txts being exclusions; what I actually posted was...
Nuala wrote: @ Syn
You are straw clutching to the extreme because you really want this to be some big setup by CEOP but it isn't. I concur that there is more to the McCann case than meets the eye, but you are in tinfoil hat territory with your surmises. You are struggling to understand basic concepts re robots.txt. You do Madeleine no favours whatsoever in clutching onto such way-out and non-provable theories. If you want justice for her, you will have to think a lot more rationally and logically than you are currently doing.
Well said. I agree with all of that.
"Given the above statement of ‘if this file doesn't exist web robots assume the web owner wishes to provide no specific instructions, and crawls the entire site’ is this what happened and the entire site was crawled picking up mccann.html & madeleine 01 & 02 jpgs??? Obviously there are still question marks around captures with future dates that still need to be explained."
What part of that statement did I say the McCann.html was on the robots.txt list? I suggested that for some reason the robots.txt file was not present and a more in depth sweep was conducted (the high number of captures would indicate a bigger sweep) which picked up McCann.html
As for the lecture, I've been around the Maddie forums since the 3As & Mirror I don't need you nit picking and espousing forum etiquette.
ETA Your non provable theory's statement. Pot...kettle...black
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ HKP
I suggested that for some reason the robots.txt file was not present
I understood that. You must have missed me saying:
so the robots.txt not existing on 30 Apr would have made no difference.
Nuala- Posts : 130
Activity : 130
Likes received : 0
Join date : 2015-06-19
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
You seem to be missing something, 30/04 was like no other, something happened (do you even agree with that?) and an extraordinary amount of repetition and captures was conducted. Can you categorically state that a robots txt file from any previous crawl or current available (on that day) was 'read' and utilised? You are jumping to conclusions when stating that the robots txt on 30/04 makes no difference when you are struggling to understand (like us all) what was captured and what was not. I pointed something out and asked the question, your answer does not satisfactorily answer it.
Nuala wrote: @ HKP
I suggested that for some reason the robots.txt file was not present
I understood that. You must have missed me saying:
so the robots.txt not existing on 30 Apr would have made no difference.
ETA and neither does your mate's
Guest- Guest
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Some people, in their endeavour to score points for some bizarre reason, fail to see the wood for the trees. They forget that we are on the same side in wanting justice for Madeleine. Everything has to be a conspiracy no matter what the evidence to the contrary and they wonder why 'antis' - hate that description as I prefer pro Madeleine, are labelled 'conspiraloons'.
Nuala wrote: @ HKP
I suggested that for some reason the robots.txt file was not present
I understood that. You must have missed me saying:
so the robots.txt not existing on 30 Apr would have made no difference.
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
HKP wrote: You seem to be missing something, 30/04 was like no other, something happened (do you even agree with that?) and an extraordinary amount of repetition and captures was conducted. Can you categorically state that a robots txt file from any previous crawl or current available (on that day) was 'read' and utilised? You are jumping to conclusions when stating that the robots txt on 30/04 makes no difference when you are struggling to understand (like us all) what was captured and what was not. I pointed something out and asked the question, your answer does not satisfactorily answer it.
Nuala wrote: @ HKP
I suggested that for some reason the robots.txt file was not present
I understood that. You must have missed me saying:
so the robots.txt not existing on 30 Apr would have made no difference.
ETA and neither does your mate's
Sorry to be pedantic, and normally it does not bother me as I subscribe to the James Joyce ideology that lack of grammar and punctuation matter not but you not knowing when to use was and were is irking some what :) That is aside from your bullish assertion that a 30/04 capture really actually even existed. Twice now I and others have shown you that 30/04/2007 entries in the WB Source Directory do not correlate with the captured entries on the Wayback calendar but you have yet to counter what is fact. Why is that?
By the way, I have no idea who Nuala is, our only encounter has been on here re this subject, on which we agree. You assume we are mates and according to Resistor on your mainly frequented forum that we are some kind of tag team. Again, incorrect. I would, if given the opportunity welcome Nuala as a mate/friend :) We are two individuals who have very similar views re this 30/04/07 anomaly. I think Nuala makes a lot of sense and you really should read and digest her posts before you really make a fool of yourself. I have friends who are pro McCann which irks some of my fellow 'anti's' but we can have a civilised debate and I like them as people. You and I disagree re this CEOP issue but I bet we agree a lot re many other aspects of the McCann case. That's just the way the cookie crumbles. It would be very boring and unrealistic to be like the pros who all agree re absolutely everything :)
Syn- Posts : 109
Activity : 110
Likes received : 1
Join date : 2015-06-20
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Syn wrote:
HKP wrote: You seem to be missing something, 30/04 was like no other, something happened (do you even agree with that?) and an extraordinary amount of repetition and captures was conducted. Can you categorically state that a robots txt file from any previous crawl or current available (on that day) was 'read' and utilised? You are jumping to conclusions when stating that the robots txt on 30/04 makes no difference when you are struggling to understand (like us all) what was captured and what was not. I pointed something out and asked the question, your answer does not satisfactorily answer it.
Nuala wrote: @ HKP
I suggested that for some reason the robots.txt file was not present
I understood that. You must have missed me saying:
so the robots.txt not existing on 30 Apr would have made no difference.
ETA and neither does your mate's
Sorry to be pedantic, and normally it does not bother me as I subscribe to the James Joyce ideology that lack of grammar and punctuation matter not but you not knowing when to use was and were is irking some what :) That is aside from your bullish assertion that a 30/04 capture really actually even existed. Twice now I and others have shown you that 30/04/2007 entries in the WB Source Directory do not correlate with the captured entries on the Wayback calendar but you have yet to counter what is fact. Why is that?
By the way, I have no idea who Nuala is, our only encounter has been on here re this subject, on which we agree. You assume we are mates and according to Resistor on your mainly frequented forum that we are some kind of tag team. Again, incorrect. I would, if given the opportunity welcome Nuala as a mate/friend :) We are two individuals who have very similar views re this 30/04/07 anomaly. I think Nuala makes a lot of sense and you really should read and digest her posts before you really make a fool of yourself. I have friends who are pro McCann which irks some of my fellow 'anti's' but we can have a civilised debate and I like them as people. You and I disagree re this CEOP issue but I bet we agree a lot re many other aspects of the McCann case. That's just the way the cookie crumbles. It would be very boring and unrealistic to be like the pros who all agree re absolutely everything :)
I'll address a couple of your posts if I may.
@ 12:36 am you stated “ Some people, in their endeavour to score points for some bizarre reason fail to see the wood for the trees”
I’m not here to score points, I don’t resort to adding urls to Wayback (and that sort of thing) in an effort to prove a point (which it didn’t).
You then bring up that antis are referred to as ‘conspiraloons’. Does this have any relevance to this thread? If you want to call yourself that, fair enough, but please don’t tar anybody else who is looking for answers with that brush.
@ 01:04 am you start off with an English grammar lesson. This forum (as with others) is about the mystery surrounding Madeleine McCann, not the mystery surrounding posters’ grasp of grammar. Your point once again is irrelevant, and there are perhaps many posters whose grasp of the English language and grammar is not as good as yours. Does that make their points not as worthy?
Your statement of a “bullish assertion that a 30/04 capture really actually even existed” is countered by the fact that nobody has proven otherwise. There may be question marks around some of the content (look at my post re. robots.txt where I specifically say “obviously there are still question marks around captures with future dates that still need to be looked at” in the last line). For information, 30/04 was a crawl date in the years 2006, 2007, 2008 & 2013 according to the records.
As for the twice-shown entries of future-dated news that you are using as evidence, I don’t see anybody particularly arguing that you are wrong. The records show entries for news items that could not have been in existence; that needs an explanation from the Wayback guys, not me. What it does show is that some URLs are there in error. It does not show they are ALL in error, which seems to be your argument (feel free to correct me on that matter).
In your final paragraph you show some sort of ‘concern’ that I am making a ‘fool of myself’. Although it is rather touching that you care, I wouldn’t bother if I were you! You seem to somehow subscribe to the idea that people are not allowed to look at this issue and question it. I have been looking at the dataset and some technical information, digested other posters’ views, and put forward some scenarios (mostly backed up with information or data). You, on the other hand, seem to want to stifle my debate for some unknown reason.
Guest- Guest
WOOOOSSH - 30 April 2007 never happened and to prove it, it's gone!
Just checked the IA 10 mins ago and noticed that the 30 April 2007 calendar dates for all the ceop.gov.uk related pages have been officially wooosshed. All have gone and all records now show one less trawl for each url/page. Haven't checked out the source directory - assume that would have to be altered as well.
Wonder if the IA now feel in a position to release an explanation?! Probably not. :please:
skyrocket- Posts : 755
Activity : 1537
Likes received : 732
Join date : 2015-06-18
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
I should just add - I'm still on the fence despite the fact that the IA are obviously backing up their second response (error). If it was any other subject matter my scepticism would be satisfied but under the circumstances I'm not 100% convinced. Errors obviously occurred and pages were dated 30 April by mistake - the only critical thing is whether there was a ceop trawl of any size on that date or not. I don't think the 'public' will ever know - perhaps it's time to move on?
skyrocket- Posts : 755
Activity : 1537
Likes received : 732
Join date : 2015-06-18
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
To use Al Gore's memorable turn of phrase in his film about climate change: 'An Inconvenient Truth'? - airbrushed, or deleted out of existence.
skyrocket wrote: Just checked the IA 10 mins ago and noticed that the 30 April 2007 calendar dates for all the ceop.gov.uk related pages have been officially wooosshed.
We need a qualified and experienced neutral to calmly assess the entire history of this - and give us an expert opinion (sorry to all the many professed experts on this thread)
____________________
Dr Martin Roberts: "The evidence is that these are the pyjamas Madeleine wore on holiday in Praia da Luz. They were photographed and the photo handed to a press agency, who released it on 8 May, as the search for Madeleine continued. The McCanns held up these same pyjamas at two press conferences on 5 & 7 June 2007. How could Madeleine have been abducted?"
Amelie McCann (aged 2): "Maddie's jammies!".
Tony Bennett- Investigator
- Posts : 16926
Activity : 24792
Likes received : 3749
Join date : 2009-11-25
Age : 77
Location : Shropshire
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
Like many others I find this subject a little hard to follow, but I have stuck with it and am managing to follow the ongoing debate as best I can. The one thing that is beyond my understanding is people's inability to be civil. Debate is good, discussion is good, but sniping and bitching is not. It may be cheesy and twee but it really is nice to be nice.
Rufus T- Posts : 269
Activity : 312
Likes received : 3
Join date : 2013-06-18
Location : Glasgow
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
@ Syn "Sorry to be pedantic, and normally it does not bother me as I subscribe to the James Joyce ideology that lack of grammar and punctuation matter not but you not knowing when to use was and were is irking some what :) "
I would stick to your James Joyce ideology then, as coming on here and criticising someone's grammar in order to denigrate them IS .... irking ......................... somewhat !
Richard IV- Posts : 552
Activity : 825
Likes received : 265
Join date : 2015-03-06
Re: Steve Marsden's WBM screenshot: The CEOP Home page for April 30, 2007 also refers to Missing Madeleine.
The wooshing was done a while ago, not long after they realised (told) that there was an issue. We can't be sure that anything we are picking up is the true reflection unless a capture by Stevo at the time was conducted and made available.
Tony Bennett wrote: To use Al Gore's memorable turn of phrase in his film about climate change: 'An Inconvenient Truth'? - airbrushed, or deleted out of existence.
skyrocket wrote: Just checked the IA 10 mins ago and noticed that the 30 April 2007 calendar dates for all the ceop.gov.uk related pages have been officially wooosshed.
We need a qualified and experienced neutral to calmly assess the entire history of this - and give us an expert opinion (sorry to all the many professed experts on this thread)
As for a qualified and experienced neutral, I can't help there as I fit none of those three categories.
Guest- Guest