Searching Disconnected

Marco D'souza | 01 July 2006

In a day and age when everyone is trumpeting the virtues of broadband Internet and “always-on” connectivity, it’s interesting to come across a company that goes against the flow and builds a product around being disconnected from the Internet. Even more interesting is that it is headed by a veteran of the dotcom era: Rakesh Mathur, who graduated from IIT Mumbai and started several technology companies in the 90s, all of which were sold to large players that continue to use their technologies in products today. When Rakesh was in Mumbai to promote his newest venture, Webaroo, I had the opportunity to meet him and find out why he finds the disconnected world so fascinating and profitable.

CHIP: Tell us a bit about the companies you’ve started and been associated with?
RM: I’ve been an entrepreneur since ’94 and Webaroo is my fourth company. The first was a company called Armedia, which was the first to have chip designs for video processing. This eventually became part of the standard that we now know as the MPEG-2 video format used in DVD movies. This company was acquired by Broadcom in 1999. The next was a company called Junglee, which was a virtual database that enabled comparison shopping on the Internet. It was eventually sold to Amazon in 1998. We then started a company in late 1999 called PurpleYogi (now called Stratify) that provides enterprise solutions for managing unstructured data. Webaroo was started about two years ago and I actually started writing software for it about a year-and-a-half ago. So this is both my hobby and passion!

CHIP: Could you step us through Webaroo and what it’s all about?
RM: Webaroo is primarily an offline search tool. The idea was spawned by a few trends that have shown up over the years. There are about 2 billion mobile users around the world, with about 800 million cell phones shipped every year. How many people do you think search the Internet? The world over, this number stands at about a billion. Whittling it down further, the number of people who search the Internet from a mobile device is about one million. This is an extremely small number.

Now as a case in point, even now with my BlackBerry device, I find it difficult to search the Internet and receive relevant information. Data connectivity from a mobile device is itself an issue these days. And this is exactly why people don’t surf the Internet with mobile devices: global carriers are still not adept at dealing with data. This is no different from the experience I had five years ago. Even in a city like Seattle, I could not effectively use data-based services when I needed to. A couple of years ago, the city of Philadelphia announced that it was going to make Wi-Fi freely available: it is still a paid service today. It is the same story with the city of Sacramento. The basic problem today is that there is a severe underestimation of what it takes to provide a high quality of service in mobile data transmission. Forget data, even voice services are nowhere near perfect: you drive around Silicon Valley and mobile phone calls often drop.

So I figured that I should start a company that would enable searches on mobile devices. When we looked at the problem of searching, we realized that there wasn’t going to be a solution from the carriers anytime soon.

This also tied into the growing trend of storage and memory doubling in capacity every year. Today, a 4 GB flash card costs about $80 and a 2 GB card about $20. Ten years ago, the capacity was 1 MB and 10 years before that it was a kilobyte. Ten years from now, it’ll be a terabyte.

So we began by asking ourselves the question: how small can the Internet be without losing its information character? To get an idea of what we were up against, the total number of web pages out there stands at about 20 billion. At 50 kilobytes a page, that’s about a million gigabytes. Over the past year-and-a-half, we figured out a way to take those million gigabytes and compress them to about 40 gigabytes, and still be able to satisfy a search.

Basically, search engines throw up results in the millions while people consume them in the tens. All this led us to investigate methods where we could use intelligent algorithms to select the most relevant pages, the ones that would deliver high-quality search results on almost anything. And that’s what we’ve been able to do through Webaroo.

To begin with, we’ve created modules called “Web Packs” that could include topics such as World Soccer, News, or Wikipedia. Our version of Wikipedia would fit onto a single DVD, for example. This is a pretty compelling concept. Our objective with this is to ensure that users never have the experience of not being able to receive the information they need. These being the initial stages, we’re delighted with the number of users who already have Webaroo without any of our bundling partnerships.

CHIP: How do you manage to fit so much information into such a small space?
RM: Our compression algorithms work on the principle of relevance: they intelligently decide which pages are more important and save only those pages, discarding the less relevant ones. In a statistical sense, they look through about 25,000 pages and pick just one. And they retain all the text and images in those pages as well. Even after throwing away irrelevant pages at that level, we still have enough contextual information to satisfy almost any search. We’ve found that this works 99.99 percent of the time, though not all the time. Anyway, this is better than receiving a “Page not found” error!
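
The figures quoted in this interview can be checked against each other with a little arithmetic. The sketch below is only a back-of-the-envelope check, not Webaroo's selection method: it takes the 20 billion pages, the 50 kilobytes per page and the one-in-25,000 selection rate mentioned in the answers, and assumes decimal units (1 GB = 1,000,000 KB) and that retained pages are stored at their original size. All names in it are illustrative.

```python
# Figures quoted in the interview.
PAGES_ON_WEB = 20_000_000_000   # about 20 billion web pages
AVG_PAGE_KB = 50                # about 50 kilobytes per page
KEEP_ONE_IN = 25_000            # roughly one page kept per 25,000 examined

# Assumption: decimal units, so 1 GB = 1,000,000 KB.
KB_PER_GB = 1_000_000


def size_gb(pages: int, kb_per_page: int = AVG_PAGE_KB) -> float:
    """Total size in gigabytes of `pages` pages at `kb_per_page` KB each."""
    return pages * kb_per_page / KB_PER_GB


full_web_gb = size_gb(PAGES_ON_WEB)          # size of the whole web
pages_kept = PAGES_ON_WEB // KEEP_ONE_IN     # pages that survive selection
kept_gb = size_gb(pages_kept)                # size of the retained pages

print(f"Whole web:  {full_web_gb:,.0f} GB")
print(f"Pages kept: {pages_kept:,}")
print(f"Kept size:  {kept_gb:,.0f} GB")
```

Under these assumptions the whole web comes to about a million gigabytes, and keeping one page in 25,000 leaves 800,000 pages, or about 40 gigabytes, which matches the figures given in the interview.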

CHIP: How do you manage to ensure that the information on Webaroo is relevant and updated?
RM: If you don’t have a connection to the Internet, not only are you unable to get updated information, you will not receive any kind of information. With Webaroo, you can get a large part of the information you need. With an Internet connection, you can also update the Web Packs which can be set up to refresh themselves automatically. And all of this is free—we do not charge anything for the Webaroo Packs or for the updates.

CHIP: How many Webaroo Web Packs are there at present?
RM: Webaroo is right now a two-month-old baby and there are about 30 packs available for download. These span everything from news to cities to interests and hobbies. Work is in progress to support a set of user interests at a granular level. Right now, I cannot comment on whether these packs can be user-created or not.

CHIP: Doesn’t Webaroo infringe on any information copyrights by displaying pages from other companies?
RM: No. What Webaroo does is cache pages. There are hundreds of other applications that cache pages. Also, by retaining the advertising and branding on the pages from these sites, we are not infringing on any of their intellectual property rights. Before we started the company, we had this looked at from a legal standpoint and we are happy to state that we are completely compliant with the Digital Millennium Copyright Act.

CHIP: So what are the devices we’ll be seeing Webaroo on?
RM: You’ll see Webaroo pre-installed on Acer notebooks in about six months, for example. But we are also looking at bundling our packs on other devices, web sites, media, etc. There are several other deals we are working on at the moment and they’ll be announced in time. With the deals we have on the table, we’re looking at about 10 million devices a year that will be carrying our software. And all this for a two-month-old baby!


