Andreas Weigend @ SG

Hello, Time Magazine’s Person of the Year!

Andreas, Meow, Time Person of the Year

We were at NTUC 7th floor, in a meeting organized by TDM, thanks to Su Yuen‘s heads up. Too bad we didn’t see each other there as she had to watch TV in M’sia.

Andreas S. Weigend is one of the most outspoken gurus in the field of data mining on the Internet. He is a technology strategist, so we were a little disappointed that he wasn’t able to give any tips on the uniqueness of Singapore, at least not yet, maybe after his work with SIA. People kept thinking I know him (I do, he just doesn’t know me, “yet” too) as he was Chief Scientist at Amazon.com when I was there.

But as usual, his upbeat and high momentum talk kept all of us catching our breath, some furiously taking down notes on the next big thing. Actually, these are my notes, hope you can share some of yours with me if you were there too.

Chern Jie went too – I think he’ll make the next big thing here 😉

Chern Jie and I

Ok, the rest of the post is going to turn really geeky. If you like pictures, look at TDM’s flickr. Or maybe you’re more a worldly person, perhaps follow Stop Genocide Now at Darfur.

.

The whole talk is anchored on the idea of turning from just e-business (Web 1.0, Amazon.com) to me-business (Web 2.0, del.icio.us and others). What I got out of it was the concept of moving from the static web, to an intention-seeking web, and then to the attention-seeking web (though many sites claim otherwise). Intention works like this: Google gives you this wonderful and free search engine in exchange for knowing your intention; Amazon tracks your behavior on the site so that they know that when you click on certain things, your intention eventually translates to a conversion (i.e. sales). Working backwards, we can then infer your future behavior based on your past actions on the website. That’s really the intention-based business. People pay big bucks to learn about your intention, for your participation on the internet.

The attention web, however, is more than that. You can think of it as a newly formalized currency on the internet that arises when you pay attention to something on the internet (instead of just blindly providing it information). The example given was having the user choose between paying 99 cents to download an MP3 on, say, iTunes, or watching a 90 second advertisement on some other ad-driven website like SpiralFrog.

Let’s see, what else is moving?

Structured Participation

We’re moving from Britannica to Wikipedia. User generated content trumps expert data when there is cohesion, and the cohesion comes from the algorithm and the social element presented on the website. Of particular interest to someone like Andreas is the metadata created. Del.icio.us and Flickr, for example, allow users to tag content, and those tags naturally become more powerful metadata than what the owner used to boast in their own tags. In fact, some claim that it’s more powerful than the referral hyperlinks (everybody remember how a Google Bomb works?) which are the basis of how Google’s PageRank algorithm works. Ming Yeow quickly illustrated this with his search for logos. Compare these:

Google search for “logos” and Del.icio.us search for “logos”

It’s true that both are “people tell you which are the sites about logos”, but the Google one is marred by commercial entities with sophisticated SEO techniques, and no one really knows if they are good or not. Sometimes people link due to affiliate rewards. Del.icio.us, however, is people’s personal bookmarks! You don’t lie in your personal bookmarks about which site is the one you go to to get the logos!
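For the geeks, here is a minimal sketch of what "ranking by personal bookmarks" could look like. This is my own illustration, not how Del.icio.us actually ranks anything: the function name `rank_by_bookmarks`, the tuple layout and the example.com style URLs are all made up. It simply counts how many distinct users saved each URL under a tag.

```python
from collections import defaultdict

def rank_by_bookmarks(bookmarks, tag):
    """Rank URLs by the number of distinct users who bookmarked them with `tag`.

    `bookmarks` is a list of (user, url, tags) tuples, where tags is a set.
    This layout is an assumption made for the illustration.
    """
    savers = defaultdict(set)
    for user, url, tags in bookmarks:
        if tag in tags:
            savers[url].add(user)  # a set, so a user counts once per URL
    # Most-saved first; ties broken alphabetically so the order is stable.
    return sorted(savers, key=lambda url: (-len(savers[url]), url))

# Hypothetical data, invented for the example.
bookmarks = [
    ("ann", "http://a.example/logos", {"logos", "design"}),
    ("ben", "http://a.example/logos", {"logos"}),
    ("cat", "http://b.example/brand", {"logos", "branding"}),
    ("ann", "http://c.example/fonts", {"fonts"}),
    ("ben", "http://a.example/logos", {"logos", "inspiration"}),
]
ranking = rank_by_bookmarks(bookmarks, "logos")
```

Counting distinct users (instead of raw bookmark rows) is the design choice that makes it harder for one enthusiastic account to push a site up.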

Google Base is one of the examples of the big boys moving in the same direction (collecting attributes from users instead of relying on its own metadata crunching).

Here’s another cool one. We’re moving from algorithmic search à la Google to social search à la Eurekster and Illumio. While still in its infancy, it demonstrates the concept of “finding the right expert” very well. Traditionally, if you want to ask an expert a question (Google is not an expert, the results it presents are “average” responses and require parsing), you turn to a free/pay service like Yahoo! Answers or Google Answers (retired). Very much the same way we grew up reading 十万个为什么 (100,000 Why’s – an encyclopedia for teenagers). Illumio takes this one step further by indexing your entire hard drive (using Google Desktop too!) and forming an idea of what you’re an expert in. So if your hard drive contains stuff pertaining to a particular niche, when people ask questions about it, you’ll be matched, and a pop-up will ask whether you want to answer, and thus start socializing with this other person. Let’s see how far this can go!
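To make the "finding the right expert" idea concrete, here is a toy sketch. I have no idea how Illumio really does its matching, so treat everything here as an assumption: the names `tokenize`, `build_profile` and `best_expert` are mine, and the scoring is plain word overlap between the question and the words found in each person's documents.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_profile(documents):
    """Count the words across one person's documents (their 'hard drive')."""
    profile = Counter()
    for doc in documents:
        profile.update(tokenize(doc))
    return profile

def best_expert(question, profiles):
    """Return the user whose documents mention the question's words most often.

    The score is the share of the user's words that appear in the question.
    Returns None when nobody matches at all.
    """
    words = set(tokenize(question))
    scores = {}
    for user, profile in profiles.items():
        total = sum(profile.values())
        scores[user] = sum(profile[w] for w in words) / total if total else 0.0
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical users and documents, invented for the example.
profiles = {
    "alice": build_profile(["orchid care and orchid repotting",
                            "watering orchid roots"]),
    "bob": build_profile(["python web framework", "deploying python apps"]),
}
match = best_expert("How often should I water an orchid?", profiles)
```

A real system would need stemming, stop words and some privacy controls, but the gist is the same: the question gets routed to the person, not to a page.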

Ok it’s getting a bit long winded, let’s move on quickly to the commerce space. I think both of us, coming from Amazon.com, still constantly have this idea that Amazon.com itself was and still is a great idea. A lot of Andreas’s presentation is still about Amazon.com even though he has moved on to consult for so many different teams.

When I first came to Amazon, HR and Jeff himself did a good job of communicating the vision and mission of the company:

  1. To be the place where people come to Find, Discover and Buy everything in the world online.
  2. To be the world’s most customer-centric company.

These are accompanied by a bunch of other taglines like, To be the shopper’s first resort (so many people just go there now to check out prices and recommendations), To be the Killer web platform for all Killer web applications (notable in the huge investment in infrastructure and web services – shareholders are not happy hehe) etc.

Among the more important innovations that always seem to get talked about, and that were mentioned during the talk:

A/B test

The most innocuous, but representative of a generation of data mining history. To launch a new feature, half the customers (A) become the control group and get the status quo, while the other half (B) get the new feature. Decide on what you want to measure (Andreas stressed this very strongly – if you don’t know what you want to measure, you’re wasting your time), in this example say click-through or conversion rate, and then run the test for a week or two. If B yields better results, then officially announce that the new feature is good and launch it for everybody. Otherwise, axe the project and go back to the drawing board. Here are other more in-depth explanations including multi-variable ones, as well as products.
This is done for like everything we launch on the websites. Compare these:

Amazon checkout on the left vs. Amazon checkout on the right

Which one will drive more sales? Answer – who knows…
In fact if you know the right questions to ask, you would be asking even more:

  • How long does it take for the users to find the checkout button (a sign that they will continue to checkout)?
  • How much of the upselling (i.e. selling you more things that are related, based on your purchase history as well as the purchasing pattern of everyone else who bought the product you just added to your shopping cart) actually takes place?
  • How useful are the “warnings”, for example the super saver shipping above, as well as the VISA card offer?

Therefore, many teams will set themselves up to collect this data, and just do what 42 million customers (dunno what the figure is now, that’s 2004) tell you every day. According to Andreas, putting the checkout button on the right side increases sales by 1%. Depending on the situation, sometimes it’s important to look at even 1%. For bigger sites like Amazon.com where customer behavior has already matured, 1% means quite a lot. Can you imagine that in 1998, when the Single Detail Page (SDP, meaning one page with all the information about a single product) launched, it drove sales up 20% over a matter of weeks?
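If you want to play with the A/B idea yourself, here is a rough sketch of how one might decide whether B really beat A. This is a standard two-proportion z-test from any statistics textbook, not Amazon’s internal tooling, and the function name `ab_test` and all the numbers are made up for illustration.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    conv_x is the number of conversions, n_x the number of visitors.
    Returns (relative_lift, z_score, two_sided_p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B are actually the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lift = (p_b - p_a) / p_a
    return lift, z, p_value

# Hypothetical traffic: 10,000 visitors per arm.
lift, z, p_value = ab_test(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
launch = p_value < 0.05 and lift > 0
```

Note how the sample size matters: a 20% relative lift shows up clearly with 10,000 visitors per arm, but a 1% lift would need far more traffic before the p-value says anything, which is one reason a site with millions of customers can chase such small wins.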

Recommendations

As portrayed by Andreas, recommendations have evolved from Expert recommendations (which he called Web 0.0) to Statistical recommendations (Web 1.0, Amazon.com) to Individual recommendations (Web 2.0, thisnext.com).

Amazon’s famous recommendation is, by now, straightforward data mining: finding the cosine of the angle between vectors in a huge sparse utility matrix. Don’t fret – it’s a university course level thing (see Stanford’s lecture notes), and if you would allow me to essentialize it, it’s simply finding all the pairs of purchases in the history and recommending the other half of the pair. If you are looking at buying A, Amazon will recommend that you buy B because some 50,000 people who bought A also bought B. Nevertheless, Netflix tried the same thing, sort of flopped, and is now looking for help.
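Here is the "people who bought A also bought B" idea as a small sketch. It is a textbook item-to-item cosine similarity over yes/no purchase data, written by me for illustration; I am not claiming this is Amazon’s production algorithm. The names `item_cosine` and `recommend` and the tiny purchase history are invented.

```python
import math
from collections import defaultdict

def item_cosine(purchases):
    """Cosine similarity between items, treating each item as a 0/1 vector
    over customers. `purchases` maps customer -> set of items bought."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = {}
    names = sorted(buyers)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            both = len(buyers[a] & buyers[b])  # customers who bought both
            if both:
                sims[(a, b)] = both / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return sims

def recommend(item, sims, top_n=3):
    """Items most similar to `item`, best first."""
    scored = []
    for (a, b), score in sims.items():
        if a == item:
            scored.append((score, b))
        elif b == item:
            scored.append((score, a))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]

# Hypothetical purchase history.
purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"B"},
}
sims = item_cosine(purchases)
suggestions = recommend("A", sims)
```

The division by the square root is what separates cosine from a plain co-purchase count: an item that everybody buys does not automatically win just because it appears in every basket.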

ThisNext, however, goes beyond those statistics, and lets people do their social networking style recommendations. What’s Next? ThisNext! Keke.. it’s too catchy for me 😉 Having invented the word shopcasting, ThisNext relies on personal recommendation instead of the aggregated, cold blooded statistical recommendation. Once you get hooked, you might find some people’s recommendations to be very close to yours, thus forming a shopping club of sorts.

The lesson is simple – don’t discount the power of one person. When one person dies, that person’s name will be all over the news; when a few thousand die, they will be reported as statistics. One of the most powerful but underexploited parts of the Internet that has become apparent this year is social networking, and how it just manifests itself everywhere. Any site missing that element simply dies a natural death coz when people contribute content, they want others to see it, especially people they know.

Over the past 20 years, as the barrier of entry to the Internet has dropped (I the geek started “blogging” in 1998 when the word blog wasn’t even invented yet), more and more Ali, Ah Kou and Muthu have been able to participate. As portrayed by Andreas:

  • 1986 – The Target Web
    • Broadcast emails and newsletters
    • Push advertising that’s supply driven
    • People pay for space online
  • 1996 – The Search Web
    • Read-only Web for Consumer and User that increased the barrier of exit
    • Static, Taxonomy heavy i.e. controlled, thus requiring lots of pretest and validation
    • Discovery of the web based on hyperlinks, giving the author full control, and thus these special authors pay for presentment
  • 2006 – The Discover Web
    • Read-write Web for Producers that lowered the barrier of entry
    • Temporal
      • This is an important point to linger for a while.
      • In the past 10 years, it was more important to make users stay on your site for a long time. This would increase the chances of them doing something that would benefit you.
      • In the present day web (the live web at least) it’s more important to make it easy for users to jump around and exit. There’s no need to force them to stay as long as they come back for more later. Every time they come back, they come back to give you more data, get more information etc.
      • [my own interpretation]
        • You can think of this as a spouse vs. a friend.
        • In the 90s, every site you visit wants to be your husband or wife. They will entice you with 10,000 things to propose to you. They want you to put everything you have with them (e.g. you buy everything from Amazon.com) But really, you don’t get much out of one person.
        • The 21st century web is more “promiscuous”: you can have many friends, and like your normal friends, you don’t hang out with every one of them all the time. You can spread out your attention to many different parties, but the most popular website will be the one with the most friends. In return, you earn less from each person, but if everyone on the street buys you dinner, you don’t have to worry about food for the rest of your life.
    • Tagsonomy (root word Tags) heavy i.e. uncontrolled: users submit tags with no restrictions and these form the metadata (see above), thus requiring Launch and Learn (you don’t know what your site will turn out to become until you launch it and let the masses do their thing)
    • Discovery based on social relations (trust, reputation) etc. RSS is mentioned.
    • Pull discovery: demand driven, therefore people get paid for clicks and action.
  • 2016 – ??? (answer me correctly now and get a million dollars VC funds from me right away)

Behavioral Analysis

It should be no surprise that most companies still don’t see this as a crucial part of their business, as data mining is still treated as a kind of reporting, a cost centre. Andreas suggests that we rename it to Behavioral Analysis and predictive modelling, immediately changing this boring task into the lifeline of the company, a profit centre (I don’t really see how here, ah well…). It still requires a huge amount of infrastructure setup in order to do it effectively, although arguably, smaller companies can utilize the goodwill on the web like Urchin a.k.a. Google Analytics or the myriad other tools to help mine some data.

One interesting thing about data provided by the masses is how accurate it is. People are weird – they always tell the truth!! Take Amazon’s recently axed (can someone tell me why?) A9 Yellow Pages. After driving all over the US taking pictures along the streets, they still had a big problem selecting the best image to represent a particular shop. Therefore, they harnessed the users’ power again, by presenting a series of images along the street where the shop’s address is and letting the users choose the best image. That image gradually became the main picture. Voila!

When Amazon Web Services was launched some time ago, people built really wonderful stuff with the data they got their hands on. The one Andreas mentioned was Valdis Krebs’ political polarization of books in the market (I think he used more than just Amazon’s data, although the original one I saw from someone else has exactly the same book in the middle).

Data got ummph!

Storage

This brings us to one of those ideas that PAP would love: Everything can and will become data. Norwich Union started pay-as-you-go car insurance, using GPS to track your car and base your insurance premium on your own driving habits instead of others’ (tell me about it.. why is it that Honda Civic car owners in Singapore who are 26 and below are So Risky as to warrant an insurance premium more than double that of a Toyota Altis driver of the same age?). This is only possible with data from your own driving. So, soon, you’ll have data on movement, brain activity (fMRI), identity of a person (DNA), and more and more data to play with.

We didn’t talk much about RFID. But we did talk about how storage is now 100x cheaper than 10 years ago. Implicit data collection is going to explode.

More Social Networking

There was also mention of OpenBC (renamed Xing since…) which is a German competitor to LinkedIn. Generally, it’s agreed that Xing does a better job in social networking (pictures, forums) than LinkedIn, which is lacking in that aspect. It’s arguable which is better, since in the physical world, business networking tends to be prim and proper, with execs wanting a simpler interface and lots of endorsements. One big thing about OpenBC is its initiatives in the physical world with offline meetings. Also, OpenBC allows you to see who has recently “checked you out”, great to know that, isn’t it?

Social networking needs the sense of touch too 🙂

We also talked about Jambo Networks, which is using the mobile network to create a mobile social network. Using a PDA in a conference, you can find nearby friends, remember strangers’ names etc. and since you prepublish your profiles (from linkedin, spaces, facebook, whatever) people can find matching personalities and make friends etc. According to the video on the front page, the coverage is like 8 blocks (1 block is like 100 meters) but it all depends on the networking setup. So you can even use it to find people on the plane as it can set up its own ad hoc network.

With the mention of Makansutra, Andreas brought Socialight to our attention. It provides geo-tagged sticky notes, which the audience concluded are good for recommending food stalls in hawker centres. To me it’s more than that, having had to hunt for some location aware service for patients in hospitals (so far everything we saw, e.g. ekahau, is good for asset tracking). This is a good example of taking a more or less global technology (networking with the world) and increasingly bringing it local (networking with only those around you), as certain things matter more locally (what you need to eat for your next meal).

Other cool stuff

Andreas also very briefly mentioned other companies who are doing great things. Unfortunately, some of them didn’t stick (I took minimal notes on a Christmas greeting card) so I guess I’ll just enumerate them here, in no particular order.

Attensa – Enterprise RSS feed engine.

CleverSet – Recommendation engine.

Woot – The grand-daddy of one-day-one-item website. Yes, the whole site only sells one item a day.

Zazzle – On-demand, customizable products, like your T-shirt la, mug la etc. The total opposite to woot I guess.

Tchibo – Actually just a coffee chain, but they decided to create a brand new experience for their stores every week by selling non-coffee items, tying in online campaigns as well, and they manage to drive people back to the store to get another cup of coffee and check out what’s new.

Nugg.ad – Focused advertising (99% of the ads online are not clicked on, nugg.ad said) – self-learning ad-campaigns anyone?

PowerSet – NLP web search. Coming soon.

CircleUp – Took me a while to get this, but it’s a way to ask one question to many many people at one time and get back the answers in a coherent fashion. I wouldn’t call it a survey tool (isn’t that what surveys are all about), but maybe an interesting new way of organizing these answers and avoiding the 10,001 email and IM responses.

Baidu – China’s largest search engine. My MP3 engine ahaha.

Zagat – Consumer survey based information about dining, travel and leisure. They also have geo-location queries (what is nice here). Makansutra needs to buck up.

Zara – Fashion. I forgot, sigh…

Ending

Going to read Chris’s book The Long Tail. Andreas also recommended a book by David Holtzman called Privacy Lost which I’ll read next.

Andreas is known for his catch phrase: I search, therefore I am (echoing the famous philosopher Descartes: I think, therefore I am, Cogito Ergo Sum). He mentioned yet another in his deck: You are what you tag, and more importantly, You are what you are tagged as / who you are tagged by. There, don’t linger in the past, just move on.

Tail End

Anatomy of the Long Tail (from WIRED)

References:

http://thedigitalmovement.wikispaces.com/Andreas+Weigend+-+TDM%21 (the discussion after the talk)
http://www.weigend.com/WeigendFall2006.pdf (the slides for the talk ++)


7 Responses

  1. Jiin Joo!!! I didn’t go not because I was WATCHING TV in Malaysia la!! There was a big flood in Johor and I had to fly to Hong Kong 2 days after that! 😛 but seriously, thanks for attending! ^___^ and also thanks a lot for writing such a comprehensive report on it!! At least I get to learn from your blog post even though I did not attend it hahaha. Anyway! Hope to see ya in Singapore some time soon!
