Further Light and Knowledge


All times are UTC.




This topic is locked. [ 4 posts ]
Posted: Wed Jul 11, 2007 3:16 pm
Election Made Sure

Joined: Wed Aug 23, 2006 5:32 pm
Posts: 2046
Location: Scotland
[ EDIT: after five frustrating hours trying to overcome Google's broken image search, I finally found a good spidering program that will do the job properly. Hoorah! But it's a shame to waste a good rant, so here it is anyway: ]


I just wanted to rant about Google's 1000 search limit.

Google, Yahoo, etc. simply won't serve more than 1000 results for any query. This is usually (but not always) enough for text searches, since you can refine them with boolean operators. But images are a different story.

Image recognition software is extremely unreliable, so search engines rely on keywords in the surrounding text. Keywords are often useless, especially if you're looking for background images (e.g. what is the person in the picture standing in front of? What is on their T-shirt?), or images that show a common object in an unusual way. For these purposes, image searching is grunt work where you have to scan images manually, a hundred at a time, to find what you want. In effect, the 1000 results become the raw material and YOU do the searching. Having a 1000-result limit is like having a search engine that only indexes 1000 web sites.

The 1000-image limit is like a car that starts, accelerates for five seconds, then stops, and the driver gets out announcing he has arrived and nobody could possibly want to go any further. Or it's like getting a little peek into the candy shop before the door is slammed shut, the shutters are pulled down and the search engine hangs a big "closed" sign in the window.

I have several examples of images that I know are almost certainly out there. I even know which domain they are on, and they have a low file size and are out of copyright. But the site is very large (too large to load every page manually), and finding obscure images is not its purpose, so the webmaster won't help. Meanwhile the site uses a JavaScript method that has defeated every spidering program I have tried so far. Google knows where the images are, but it's not telling. As a search engine it would make a great security guard.

There. I feel better for that rant. Thanks for listening.

_________________
P.S.
I agree with everything ever posted by Hellmut, Philo and Susan D.
...
answers to life's biggest questions


Posted: Wed Jul 11, 2007 4:54 pm
Election Made Sure

Joined: Mon Sep 18, 2006 10:15 pm
Posts: 4832
Location: Saarbrücken
Sorry about this, Chris. What program did the trick?

_________________
Love before loyalty, people before the organization, and principle before the tribe.
Main Street Plaza


Posted: Wed Jul 11, 2007 5:25 pm
Election Made Sure

Joined: Wed Aug 23, 2006 5:24 pm
Posts: 3747
Location: 30 minutes from 5 temples.
Hellmut wrote:
Sorry about this, Chris. What program did the trick?


Yeah. What program, and what was the JavaScript method that was in your way?

_________________
As to marriage or celibacy, let a man take which course he will, he will be sure to repent.
-- Socrates


Posted: Wed Jul 11, 2007 8:58 pm
Election Made Sure

Joined: Wed Aug 23, 2006 5:32 pm
Posts: 2046
Location: Scotland
Well, I'm only guessing it was JavaScript. I downloaded about six or eight different image spiders, and most either did nothing or mangled the URL and then did nothing. E.g. www.domain.com/dir/page.html#105 became www.domain.com/dir/page.html#105page.html#105page.html#105 or something similar. I've seen similar behavior on pages that deliberately hide their images to prevent hotlinking, so I figured that was the most likely explanation. Or maybe the spiders just weren't smart enough to cope with the hash (#) fragments?
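The cause above is only a guess. One plausible reading of that mangled URL (an assumption, not a diagnosis of any particular program) is a spider that glues a relative link onto the current page URL as a plain string instead of resolving it, and never strips the `#` fragment, which only names a position within a page and is never sent to the server. Here is a minimal Python sketch of the correct handling using the standard library's `urljoin` and `urldefrag`; `resolve_link` is a hypothetical helper name and `www.domain.com` is the placeholder from the post.

```python
from urllib.parse import urldefrag, urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Turn a link found on page_url into an absolute URL without a #fragment."""
    # urljoin resolves relative links the way a browser does:
    # a relative href replaces the last path segment of the base URL.
    absolute = urljoin(page_url, href)
    # The fragment (#105) only marks a spot within the page, so drop it
    # before fetching or checking whether the page was already visited.
    url, _fragment = urldefrag(absolute)
    return url


page = "http://www.domain.com/dir/page.html#105"

# Naive string concatenation: the URL grows on every pass, as described above.
print(page + "page.html#105")  # http://www.domain.com/dir/page.html#105page.html#105

# Proper resolution: both links point at the same page.
print(resolve_link(page, "page.html#105"))  # http://www.domain.com/dir/page.html
print(resolve_link(page, "#106"))           # http://www.domain.com/dir/page.html
```

With the fragment stripped, `page.html#105` and `page.html#106` collapse to the same URL, so a spider fetches the page once instead of looping on ever-longer addresses.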

Either way, Abacre Photo Download seems to work perfectly. (Except that I'm using a keyboard with a built-in mousepad that left-clicks when you don't want it to, which means I just had a minor disaster, but I can't blame Abacre for that!) You just point it at a domain and it downloads all the images.
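For anyone curious what "point it at a domain and it downloads all the images" involves, here is a minimal sketch of the core step of such a spider, using only Python's standard library: parse one HTML page, collect `<img src>` URLs, and collect same-host `<a href>` links to visit next. This is not how Abacre Photo Download works (its internals aren't described here), and `ImageLinkParser` is a hypothetical name. It only sees images written into the HTML itself; images inserted by JavaScript at run time would be missed, which fits the trouble described above.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class ImageLinkParser(HTMLParser):
    """Collect image URLs and same-host page links from one HTML page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.images: set[str] = set()
        self.pages: set[str] = set()

    def _resolve(self, link: str) -> str:
        # Make the link absolute relative to this page, then drop any #fragment.
        return urldefrag(urljoin(self.page_url, link))[0]

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.images.add(self._resolve(attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            url = self._resolve(attributes["href"])
            # Only follow links that stay on the same host as the starting page.
            if urlparse(url).netloc == self.host:
                self.pages.add(url)


# Usage with a tiny inline page (hypothetical content):
html = (
    '<a href="page2.html#top">next</a>'
    '<a href="http://other.org/">elsewhere</a>'
    '<img src="img/street.jpg">'
)
parser = ImageLinkParser("http://www.domain.com/dir/page.html")
parser.feed(html)
parser.close()
print(sorted(parser.images))  # ['http://www.domain.com/dir/img/street.jpg']
print(sorted(parser.pages))   # ['http://www.domain.com/dir/page2.html']
```

A complete spider would also keep a queue of pages still to visit, fetch each one (for example with `urllib.request.urlopen`), download the collected image URLs, and pause between requests so as not to hog the site's bandwidth.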

For the record, I've been looking for Victorian images. Some stock photo sites carry a limited range, but they charge a fortune: I've seen sites that charge GBP 60 for a single image! Which is crazy, considering the image itself is public domain. What I really need is generic Victorian backgrounds, so I don't much care what the picture is of; I'm more interested in the walls, doors, rooftops, etc. To cut a long story short, Project Gutenberg has a wonderful collection of images scattered around its books, but there is no central way to find them. Google Images can find some real beauties, but as I noted, you have to look through around 100 images to find one good one. And no matter what search term you use, over half of the images seem to be the same. In short, I know there are over ten thousand copyright-free images out there, but I can only ever see the same one thousand. Which is where Abacre comes in.

I realize that leeching all the images is a bit of a bandwidth hog, but no search engine will help, so it's the only method I can find. And I'll only be doing it once, and most of the pics are around 50 KB or smaller. And finally (continuing to justify my greed), the whole reason for doing this is to make a video game that promotes the exact same books that Gutenberg promotes, so I kind of think we're on the same side.




Protected by Anti-Spam ACP Powered by phpBB® © thefoyer.org, 2011