Tag Archive for 'search'

Where’s the case for data retention?

Google’s CEO Eric Schmidt announced today that he thinks the greatest danger to people’s privacy comes not from leaks of people’s data, as happened earlier this week to AOL users, but from government snooping.

I have always worried the query stream is a fertile ground for governments to snoop on the people.

This is a valid argument, and it is clearly in Google’s economic interest to ensure that no one can access its massive databases of saved searches. The same cannot be said for Irish ISPs and telcos, who are being tasked with keeping three years of log files on all their customers. They have almost no incentive to secure this data: it is nothing but a dead cost for them, and one they wish would go away. This data will more than likely be leaked and sold time and again by everyone from crooked Gardaí (the Irish police) to minimum-wage call centre employees.

Having said that, no lock is uncrackable, and if someone wants to get at Google’s databases badly enough, they will find a way. The easiest way to thwart this is not to retain the data!

The myth of privacy

You do know that every search term you type into a search engine is saved by the search engine, don’t you? That time you searched for porn, or an ex boy/girlfriend, or information about an illness you thought you might have - all saved by the search engine.

This practice was brought sharply into focus when AOL purposefully posted 3 months of search data on the Internet. Usernames were replaced with numbers but it was still possible to identify some of the searchers. The New York Times runs a story today about a Ms Thelma Arnold, a 62-year-old living in Lilburn, Georgia. Ms Arnold was searcher number 4417749 in AOL’s records but was readily identifiable based on her searches for “numb fingers”, “60 single men”, “dog that urinates on everything”, “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”
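To see why swapping usernames for numbers protects so little, here is a minimal Python sketch. It is not AOL's actual process, just an illustration under my own assumptions: the usernames, the second searcher and the `pseudonymize` function are made up, and only three of the queries come from the story above. The point is that every query from one person still ends up under one ID, so the clues pile up.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Swap each username for a stable number, keeping each person's queries together."""
    ids = {}                      # username -> numeric ID (never released)
    next_id = count(1)
    by_id = defaultdict(list)     # numeric ID -> queries (this is what gets released)
    for username, query in log:
        if username not in ids:
            ids[username] = next(next_id)
        by_id[ids[username]].append(query)
    return dict(by_id)


# Hypothetical usernames; "weather tomorrow" is an invented filler query.
log = [
    ("user_a", "numb fingers"),
    ("user_b", "weather tomorrow"),
    ("user_a", "landscapers in Lilburn, Ga"),
    ("user_a", "homes sold in shadow lake subdivision gwinnett county georgia"),
]

released = pseudonymize(log)
for searcher_id, queries in released.items():
    print(searcher_id, queries)
```

The username is gone from the released data, but searcher 1 still reads as one person with a town, a subdivision and a health worry attached.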

Marketers are going to have a ball with all this info!

Someone has helpfully taken a copy of the data and put a web interface on it to make it easier to query!

Michael Geist said it best:

The article provides a powerful illustration not only of the severity of the AOL mistake (which remains online for all to see), but of why search companies simply should not be retaining this data for any significant period of time. The public privacy risks, whether self-inflicted, from hackers, or via law enforcement fishing expeditions, outweigh the private commercial benefits.

Ms Arnold, meanwhile, is quoted in the New York Times article as saying:

My goodness, it’s my whole personal life, I had no idea somebody was looking over my shoulder… We all have a right to privacy, Nobody should have found this all out.

You haven’t searched for anything you wouldn’t want people to know about recently, have you?

Salim Ismail interview coming up

I will be interviewing Salim Ismail, chairman and co-founder of PubSub, in the next couple of days. PubSub is a blog search engine or, as Salim likes to say, a “matching engine”.

I was amazed to learn, from talking to Salim here at the les Blogs 2.0 conference, that he lived and worked in Cork for around a year and is another fan of Murphy’s stout!

If you have any questions you’d like me to ask Salim, please leave them in the comments.

Microsoft follow Google into Book Search

I had a much longer post prepared about this but I lost it when I had a server crash (due to my playing around with my .htaccess file!).

Anyway, according to the BBC, Microsoft are following Google’s lead into the Book Search arena.

MSNBC’s report states that Microsoft are teaming up with Yahoo! and the Open Content Alliance and they hope to:

sidestep hot-button copyright issues for now by initially focusing mainly on books, academic materials and other publications that are in the public domain… to let users search about 150,000 pieces of published material. A test version of the product is promised for next year.

Google’s Print project, on the other hand, promises to index millions of books and to remove from the index any book whose author requests its removal.

In terms of usefulness, a search index of millions of books will be orders of magnitude better than one with a mere 150,000 books - now if only Google can overcome the silly legal objections.

Video search via RSS

Yahoo! have posted a nice instruction set on their blog, detailing how to subscribe your copy of iTunes to video searches of interest so that you are constantly fed relevant updated video casts!

Basically, you use the RSS URL generator to generate the RSS feed for your video search, add that feed to iTunes and watch the videos as they arrive in iTunes (or if you have one of the new Video iPods, you can watch them on that!).
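If you are curious what a feed reader actually picks up from a feed like this, here is a rough Python sketch. It is not how iTunes or Yahoo! do it internally; it only assumes the feed is ordinary RSS 2.0 with the video attached to each item as an enclosure element. The sample feed, its example.com URLs and the `video_enclosures` function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up feed standing in for the output of a video search; not a real Yahoo! feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Video search: example</title>
    <item>
      <title>First clip</title>
      <enclosure url="http://example.com/clip1.mp4" type="video/mp4" length="1024"/>
    </item>
    <item>
      <title>Text-only item</title>
    </item>
    <item>
      <title>Second clip</title>
      <enclosure url="http://example.com/clip2.mov" type="video/quicktime" length="2048"/>
    </item>
  </channel>
</rss>"""


def video_enclosures(feed_xml):
    """Return (title, url) pairs for every item that carries a video enclosure."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # plain text item, nothing to download
        if enclosure.get("type", "").startswith("video/"):
            results.append((item.findtext("title", default=""), enclosure.get("url")))
    return results


for title, url in video_enclosures(SAMPLE_FEED):
    print(title, url)
```

Each time the feed is refreshed, any new item with a video enclosure is a new clip to fetch, which is all the "subscription" really amounts to.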

I will be talking about other uses for RSS tomorrow evening at the IT@Cork RSS event - hope to see you there!



