Making content relevant to me: SXSW
Panelists:
- Liz Gannes
- Douglas Merrill
- David Maher
- Christopher Dixon
- Geoffrey Roberts
#contentme is the hashtag for this session on Twitter.
- Personalization has been around for 15 years, so why are websites so impersonal? Answer: it’s hard.
- This article describes the end of anonymity on the net and how companies can pinpoint individuals based on their data exhaust: http://33bits.org/2009/03/19/de-anonymizing-social-networks/
- Abstract of the paper discussed in that article:
Operators of online social networks are increasingly
sharing potentially sensitive information about users and
their relationships with advertisers, application developers,
and data-mining researchers. Privacy is typically protected
by anonymization, i.e., removing names, addresses, etc.
We present a framework for analyzing privacy and
anonymity in social networks and develop a new
re-identification algorithm targeting anonymized social-network
graphs. To demonstrate its effectiveness on real-world
networks, we show that a third of the users who
can be verified to have accounts on both Twitter, a popular
microblogging service, and Flickr, an online photo-sharing
site, can be re-identified in the anonymous Twitter graph
with only a 12% error rate.
Our de-anonymization algorithm is based purely on the
network topology, does not require creation of a large
number of dummy “sybil” nodes, is robust to noise and all
existing defenses, and works even when the overlap between
the target network and the adversary’s auxiliary information
is small.
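
To make the idea of topology-only re-identification concrete, here is a toy sketch in Python. It is not the algorithm from the paper, only one simple way to extend a few known matches using graph structure alone. The function name `propagate`, the seed mapping, and the `min_margin` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def propagate(anon_graph, aux_graph, seeds, min_margin=1):
    """Extend a seed mapping from anonymized nodes to known identities.

    anon_graph / aux_graph: dict mapping node -> set of neighbour nodes.
    seeds: dict mapping a few anonymized nodes to known auxiliary nodes.
    Only the link structure is used; no names or attributes are compared.
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        used = set(mapping.values())
        for node in sorted(anon_graph):
            if node in mapping:
                continue
            # Score each unused candidate by how many already-mapped
            # neighbours of `node` it is connected to in the auxiliary graph.
            scores = defaultdict(int)
            for nb in anon_graph[node]:
                if nb in mapping:
                    for cand in aux_graph[mapping[nb]]:
                        if cand not in used:
                            scores[cand] += 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Accept only a clear winner; ties stay unmapped.
            if best_score - runner_up >= min_margin:
                mapping[node] = best
                used.add(best)
                changed = True
    return mapping

# Hypothetical data: the same five-person network, once with names
# (auxiliary information) and once with numeric pseudonyms (anonymized).
aux = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol", "erin"},
    "erin": {"dave"},
}
anon = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

result = propagate(anon, aux, seeds={1: "alice", 2: "bob"})
print(result)
```

Starting from two known accounts, the sketch recovers the remaining three pseudonyms in this tiny example. Real networks are noisy and only partly overlapping, which is where the paper's actual method does the hard work.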
- Google uses over 200 signals (your data exhaust) to rank ads and searches.
- Hunch: makes recommendations to you based on your personal context and profile.
- The Filter: a B2B recommendation and relevance engine for digital content (sounds like mostly music and video).
- Hunch guy: non-obvious recommendations, such as what car you’d like to buy, what vacation you should go on, or whether you should have another kid. Decision support for over 5000 topics. Users answer 140 questions about themselves. One correlation: if you like to dance, you should switch to a Mac.
- The Filter: crop prices and the weather; recommendations to help people manage their lives. Gigfinder recommends gigs based on your music preferences and your location.
- The Filter looks for connections between the content itself, rather than connections based on titles, and they feel they can prove that it works based on consumption: a 40% increase in time spent on site from their better correlation engines.
- Roberts of The Filter says: ‘We make sure our recommender not only remembers, but forgets as well. Inputs should have decay.’
- Netflix was going to use more demographic data, which sometimes gives better relevance on recommendations, but they backed down because of lawsuits.
- Douglas: De-anonymization of data is a terrifying thing. Had this problem with Google search data. It’s not that hard to work back from a generic set of signals to an individual person.
- Important not to be paralyzed by fear. Removing the name from the database is different from providing aggregates. Sharing correlations without tying them to a particular person’s usage is not dangerous in the same league as releasing merely anonymized data sets.
- Interoperable recommendation sets coming?
- Location is very important as a key variable for interoperability.
- People can have taste profiles, and they can follow others based on similar tastes, and that’s all volitional.
- hard to scale the data across more domains. Portability is very difficult for generic relevance engines.
- Analysis of data that has been created by the corporation is seen as corporate owned, but it would be great for people to share in order to create a cross-domain content recommendation engine.
- It’s worth exploring changing the UI based on consumption patterns of the content: adaptive UI changes based on who you are and how you use the site. Not completely thought through yet.
- Somebody help make relevance out of all the content we’re seeing…
- Something that helps me may spawn some corporate application (e.g., for an insurance company), so users need to be able to keep their data anonymized.
- location based advertising old world example: billboards or street teams.
- Time and aging: if my preferences are curated from my usage several years ago, how useful is that? I pass through life stages that you can’t predict.
- No recommendation engine has been around long enough.
- Let the filter ‘forget’ over time. Necessary part of the engine over time. Weight now/recent vs. older data exhaust.
- Better click-through on female-focused sites/content than male-focused. “Women are browsers, men are searchers.”
- It’s very computationally hard to provide a truly relevant website. Therefore most sites don’t do it, or don’t do it well.
- Can make good inferences about personal attributes like gender from a single data set, such as a few movie photos.
- Join with another data set, and you can predict actual user identity, place on map with a photo.
- Hard to scale expert knowledge. Case in point: Zimbabwean music is rarely recommended, because there are not enough experts to scale their recommendations.
- If you have ten million tracks, the 80/20 rule will apply: 20% will do 80% of sales. So the long tail may never appeal. There must be other ways to pull out less expected content from the long tail that you might like.
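
The notes above mention twice that a recommender should forget as well as remember, weighting recent data exhaust over older data. One common way to do that is exponential decay with a half-life, sketched below in Python. This is an illustration only, not what The Filter actually does: the function name `decayed_score`, the `(day, weight)` event format, and the 180-day half-life are all assumptions.

```python
def decayed_score(events, now, half_life_days=180.0):
    """Sum event weights, halving each weight every `half_life_days`.

    events: iterable of (day, weight) pairs, where `day` is when the
    event happened, measured in days on the same clock as `now`.
    """
    total = 0.0
    for day, weight in events:
        age = now - day
        total += weight * 0.5 ** (age / half_life_days)
    return total

# Hypothetical listening history for one user.
jazz = [(0, 1.0)] * 10      # ten plays on day 0
indie = [(1080, 1.0)] * 5   # five plays on day 1080, about three years later
now = 1080

print("jazz:", decayed_score(jazz, now))
print("indie:", decayed_score(indie, now))
```

Although the user played more jazz overall, those plays are six half-lives old, so the five recent indie plays dominate the score. A shorter half-life makes the engine forget faster; a longer one keeps old life stages in the profile for longer.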