This past weekend I attended the O’Reilly Social Graph FOO Camp and got to meet a bunch of folks who I’ve only “known” via their blogs or news stories about them. My favorite moment was talking to Mark Zuckerberg about stuff I think is wrong with Facebook: he stopped for a second while I was telling him the story of naked pictures in my Facebook news feed, then said “Dare? I read your blog”. Besides that, my favorite part of the experience was learning new things from folks with perspectives and technical backgrounds different from mine. Whether it was hearing different perspectives on the social graph problem from folks like Joseph Smarr and Blaine Cook, getting schooled on the various real-world issues around using OpenID/OAuth in practice from John Panzer and Eran Hammer-Lahav, or getting to Q&A Brad Fitzpatrick about the Google Social Graph API, it was a great learning experience all around.
There have been some ideas tumbling around in my head all week and I wanted to wait a few days before blogging to make sure I’d let the ideas fully marinate. Below are a few of the more important ideas I took away from the conference.
One of the most startling realizations I made during the conference was that a lot of my assumptions about why developers of social applications are interested in what has been mistakenly called “social graph portability” were incorrect. I had assumed that a lot of social networking sites that utilize the password anti-pattern to screen scrape a user’s Hotmail/Y! Mail/Gmail/Facebook address book were doing so to get a list of the user’s friends to spam with invitations to join the service. However, a lot of the folks I met at SG FOO Camp made me realize what a bad idea that would actually be. Sending out a lot of spam would lead to negativity being associated with their service and brand (Plaxo is still dealing with a lot of the bad karma they generated from their spammy days).
Instead, the way social applications often use the contacts from a person’s email address book is to satisfy the scenario in Brad Fitzpatrick’s blog post URLs are People, Too, where he wrote:
So you've just built a totally sweet new social app and you can't wait for people to start using it, but there's a problem: when people join they don't have any friends on your site. They're lonely, and the experience isn't good because they can't use the app with people they know.
I then thought of my first time using Twitter and Facebook, and how I didn’t consider them of much use until I started interacting with people I already knew that used those services. More than once someone has told me, “I didn’t really get why people like Facebook until I got over a dozen friends on the site”.
So the issue isn’t really about “portability”. After all, my “social graph” of Hotmail or Gmail contacts isn’t very useful on Twitter if none of my friends use the service. Instead it is about “discovery”.
Why is this distinction important? Let’s go back to the complaint that Facebook doesn’t expose email addresses in its API. The site actually hides all contact information from its API, which is understandable. However, since email addresses are also the only global identifiers we can rely on for uniquely identifying users on the Web, they are useful as a way of figuring out whether Carnage4Life on Twitter is actually Dare Obasanjo on Facebook, since you can just check if both accounts are backed by the same email address.
I talked to both John Panzer and Brad Fitzpatrick about how we could bridge this gap, and Brad pointed out something really obvious which he takes advantage of in the Google Social Graph API. We can just share email addresses using foaf:mbox_sha1sum (i.e. cryptographic one-way hashes of email addresses). That way we all have a shared globally unique identifier for a user, but services don’t have to reveal their users’ email addresses.
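A minimal Python sketch of the idea, assuming the FOAF convention that foaf:mbox_sha1sum is the SHA-1 hex digest of the full mailto: URI. The trimming and lowercasing in normalize_email and the same_person helper are illustrative assumptions, not part of any site's API; FOAF itself does not mandate case normalization.

```python
import hashlib


def normalize_email(email: str) -> str:
    # Assumption: trim whitespace and lowercase so trivially different
    # spellings of the same address hash identically. FOAF does not require this.
    return email.strip().lower()


def mbox_sha1sum(email: str) -> str:
    # foaf:mbox_sha1sum is the SHA-1 of the mailto: URI, as a hex string.
    uri = "mailto:" + normalize_email(email)
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def same_person(hash_a: str, hash_b: str) -> bool:
    # Hypothetical helper: two services compare published hashes
    # without ever exchanging the raw email addresses.
    return hash_a == hash_b


if __name__ == "__main__":
    twitter_side = mbox_sha1sum("Dare@example.com")
    facebook_side = mbox_sha1sum(" dare@example.com ")
    print(same_person(twitter_side, facebook_side))
```

Because the hash is one-way, a service can publish it without revealing the address, although anyone who already knows a candidate address can compute the same hash and check for a match.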
I wonder how we can convince the folks working on the Facebook platform to consider adding this as one of the properties returned by Users.getInfo?
In a post entitled A proposal: email to URL mapping, Brad Fitzpatrick wrote:
People have different identifiers, of different security, that they give out depending on how much they trust you. Examples might include:
- Homepage URL (very public)
- Email address (little bit more secret)
- Mobile phone number (perhaps pretty secretive)
When I think back to Robert Scoble getting kicked off of Facebook for screen scraping his friends’ email addresses and dates of birth into Plaxo, I wonder how many of his Facebook friends are comfortable with their personal contact information, including email addresses, cell phone numbers and home addresses, being utilized by Robert in this manner. A lot of people argued at SG FOO Camp that “If you’ve already agreed to share your contact info with me, why should you care whether I write it down on paper or download it into some social networking site?”.
That’s an interesting question.
I realized that one of my answers is that I actually don’t even want to share this info with the majority of the people in my Facebook friends list in the first place [as Brad points out]. The problem is that Facebook makes this a somewhat binary decision. Either I’m your “friend” and you get all my private personal details or I’ve faceslammed you by ignoring your friend request or only giving you access to my Limited Profile. I once tried to friend Andrew ‘Boz’ Bosworth (a former Microsoft employee who works at Facebook) and he told me he doesn’t accept friend requests from people he didn’t know personally so he ignored the friend request. I thought it was fucking rude even though objectively I realize it makes sense since it would mean I could view all his personal wall posts as well as his contact info. Funny enough, I always thought that it was a flaw in the site’s design that we had to have such an awkward social interaction.
I think the underlying problem again points to Facebook’s poor handling of multiple social contexts. In the real world, I separate my interactions with co-workers from those with my close friends or my family. For an application that wants to be the operating system underlying my social interactions, Facebook doesn’t do a good job of handling this fundamental reality of adult life.
Now playing: D12 - Revelation
These are my notes from the talk Using MapReduce on Large Geographic Datasets by Barry Brummit.
Most of this talk was a repetition of the material in the previous talk by Jeff Dean, including reusing a lot of the same slides. My notes primarily contain material I felt was unique to this talk.
A common pattern across a lot of Google services is creating index files that point to data and loading them into memory to make lookups fast. The Google Maps team, which has to handle massive amounts of data (e.g. there are over a hundred million roads in North America), does this as well.
Below are examples of the kinds of problems the Google Maps team has used MapReduce to solve.
| Input | Map | Shuffle | Reduce | Output |
|---|---|---|---|---|
| List of roads and intersections | Create pairs of connected points such as {road, intersection} or {road, road} pairs | Sort by key | Get list of pairs with the same key | A list of all the points that connect to a particular road |
| Geographic feature list | Emit each feature on a set of overlapping lat/long rectangles | Sort by key | Emit tile using data for all enclosed features | Rendered tiles |
| Graph describing node network with all gas stations marked | Search five-mile radius of each gas station and mark distance to each node | Sort by key | For each node, emit path and gas station with the shortest distance | Graph marked with nearest gas station to each node |
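To make the first row of the table concrete, here is a single-process Python sketch of its map, shuffle (sort by key) and reduce steps. The intersection IDs and road names are made up, and a real MapReduce job would spread each phase across many machines; this only illustrates the data flow.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: each intersection and the roads that meet there.
INTERSECTIONS = {
    "I1": ["Main St", "1st Ave"],
    "I2": ["Main St", "2nd Ave"],
    "I3": ["1st Ave", "2nd Ave"],
}


def map_fn(intersection_id, roads):
    # Emit a {road, intersection} pair for every road touching the intersection.
    for road in roads:
        yield road, intersection_id


def reduce_fn(road, intersection_ids):
    # All pairs sharing a key arrive together; collect the connected points.
    return road, sorted(intersection_ids)


def run_job(records):
    pairs = [kv for key, value in records.items() for kv in map_fn(key, value)]
    pairs.sort(key=itemgetter(0))  # shuffle phase: sort by key
    return dict(
        reduce_fn(road, [iid for _, iid in group])
        for road, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    for road, points in run_job(INTERSECTIONS).items():
        print(road, points)
```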
These are my notes from the talk Lessons in Building Scalable Systems by Reza Behforooz.
The Google Talk team have produced multiple versions of their application. There is
The team has had to deal with a significant set of challenges since the service launched, including:
- Supporting the display of online presence and the sending of messages for millions of users. Peak traffic is in the hundreds of thousands of queries per second, with a daily average of billions of messages handled by the system.
- Applying routing and application logic to each message according to the preferences of each user while keeping latency under 100ms.
- Handling surges of traffic from integration with Orkut and GMail.
- Ensuring in-order delivery of messages.
- Needing an extensible architecture which could support a variety of clients.
The most important lesson the Google Talk team learned is that you have to measure the right things. Questions like "how many active users do you have" and "how many IM messages does the system carry a day" may be good for evaluating market share but are not good questions from an engineering perspective if one is trying to get insight into how the system is performing.
Specifically, the biggest strain on the system actually turns out to be displaying presence information. The formula for determining how many presence notifications they send out is
total_number_of_connected_users * avg_buddy_list_size * avg_number_of_state_changes
Sometimes there are drastic jumps in these numbers. For example, integrating with Orkut increased the average buddy list size since people usually have more friends in a social networking service than they have IM buddies.
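A back-of-the-envelope Python sketch of the presence formula. All numbers are invented for illustration, not Google Talk's actual figures; the point is that the count grows multiplicatively, so raising the average buddy list size alone (as the Orkut integration did) raises notification volume by the same factor.

```python
def presence_notifications(connected_users: int,
                           avg_buddy_list_size: float,
                           avg_state_changes: float) -> float:
    # Each state change by a connected user fans out to every buddy on their list.
    return connected_users * avg_buddy_list_size * avg_state_changes


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the multiplicative effect.
    before = presence_notifications(1_000_000, 20, 10)
    after = presence_notifications(1_000_000, 60, 10)  # buddy lists triple
    print(f"{before:,.0f} -> {after:,.0f} ({after / before:.0f}x)")
```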
Other lessons learned were:
Slowly Ramp Up High Traffic Partners: To see what real-world usage patterns would look like when Google Talk was integrated with Orkut and GMail, both services added code to the pages that displayed a user's contacts to fetch online presence from the Google Talk servers, without adding any UI integration. This way the feature could be tested under real load, and if there were capacity problems users would not be aware of them. In addition, the feature was rolled out to small groups of users at first (around 1%).
Dynamic Repartitioning: In general, it is a good idea to divide user data across various servers (aka partitioning or sharding) to reduce bottlenecks and spread out the load. However, the infrastructure should support redistributing these partitions/shards without having to cause any downtime.
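The talk did not say how Google Talk repartitions its shards. One common technique for this, shown here as a swapped-in illustration rather than Google's method, is consistent hashing: adding a server moves only the keys that now fall into its slice of the hash ring, so most users stay where they are while data migrates. A minimal Python sketch; the server names and the virtual-node count are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a point on the ring (SHA-1 used only for spreading keys).
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping user IDs to servers."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server even out the load
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # New points only claim keys between them and their predecessors,
        # so assignments elsewhere on the ring are untouched.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    users = [f"user{n}" for n in range(10_000)]
    ring = HashRing(["talk-a", "talk-b", "talk-c"])
    before = {u: ring.server_for(u) for u in users}
    ring.add_server("talk-d")  # repartition: one more server joins
    moved = sum(before[u] != ring.server_for(u) for u in users)
    print(f"{moved / len(users):.0%} of users moved")
```

By contrast, naive `hash(user) % number_of_servers` placement would reassign most users whenever a server is added.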
Add Abstractions that Hide System Complexity: Partner services such as Orkut and GMail don't know which data centers contain the Google Talk servers or how many servers are in the Google Talk cluster, and are oblivious to when or how load balancing, repartitioning or failover occurs in the Google Talk service.
Understand Semantics of Low Level Libraries: Sometimes low-level details can stick it to you. The Google Talk developers found that using epoll worked better than a poll/select loop because they have lots of open TCP connections but only a relatively small number of them are active at any time.
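A small Python illustration of the readiness-based model, using the standard selectors module (which picks epoll on Linux and kqueue on BSD/macOS). It opens many idle connections, makes only two of them active, and shows that the selector reports just those two; local socket pairs stand in for real client connections, and the connection counts are arbitrary.

```python
import selectors
import socket


def active_connections(total: int = 50, active=(3, 17)):
    sel = selectors.DefaultSelector()  # epoll on Linux when available
    pairs = [socket.socketpair() for _ in range(total)]
    for conn_id, (server_side, _client_side) in enumerate(pairs):
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=conn_id)

    # Only a few "clients" actually send anything.
    for conn_id in active:
        pairs[conn_id][1].send(b"hello")

    # The selector returns only the ready sockets, not all `total` of them.
    ready = sorted(key.data for key, _events in sel.select(timeout=1.0))

    sel.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return ready


if __name__ == "__main__":
    print(active_connections())
```

With poll/select, every call hands the kernel the full list of descriptors; epoll keeps the interest set in the kernel and returns only ready descriptors, which is why it suits many mostly idle connections.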
Protect Against Operational Problems: Review logs and endeavor to smooth out spikes in activity graphs. Limit cascading problems by having logic to back off from using busy or sick servers.
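One way to implement the back-off idea is sketched below in Python. It is an illustration under assumptions, not Google Talk's code: the hypothetical BackoffPool class, its doubling policy and the 1 to 60 second bounds are all invented. A server that fails is skipped for an exponentially growing interval, so traffic drains away from sick machines instead of piling onto them.

```python
import time


class BackoffPool:
    """Round-robin over servers, skipping ones that recently failed."""

    def __init__(self, servers, base_delay=1.0, max_delay=60.0, clock=time.monotonic):
        self.servers = list(servers)
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock  # injectable so tests can control time
        self._failures = {s: 0 for s in self.servers}
        self._retry_at = {s: 0.0 for s in self.servers}
        self._next = 0

    def pick(self):
        now = self.clock()
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if now >= self._retry_at[server]:
                return server
        return None  # every server is currently backing off

    def report_failure(self, server):
        # Double the back-off window on each consecutive failure, up to a cap.
        self._failures[server] += 1
        delay = min(self.max_delay, self.base_delay * 2 ** (self._failures[server] - 1))
        self._retry_at[server] = self.clock() + delay

    def report_success(self, server):
        self._failures[server] = 0
        self._retry_at[server] = 0.0
```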
Any Scalable System is a Distributed System: Apply the lessons from the fallacies of distributed computing. Add fault tolerance to all your components. Add profiling to live services and follow transactions as they flow through the system (preferably in a non-intrusive manner). Collect metrics from services for monitoring both for real time diagnosis and offline generation of reports.
Compatibility is very important, so making sure deployed binaries are backward and forward compatible is always a good idea. Giving developers access to live servers (ideally public beta servers, not main production servers) will encourage them to test and try out ideas quickly. It also gives them a sense of empowerment. Developers end up making their systems easier to deploy, configure, monitor, debug and maintain when they have a better idea of the end-to-end process.
Building an experimentation platform which allows you to empirically test the results of various changes to the service is also recommended.
These are my notes from the talk Scaling Google for Every User by Marissa Mayer.
Google search has lots of different users who vary in age, sex, location, education, expertise and a lot of other factors. After lots of research, it seems the only factor that really influences how different users view search relevance is their location.
One thing that does distinguish users is the difference between a novice search user and an expert user of search. Novice users typically type queries in natural language while expert users use keyword searches.
Example Novice and Expert Search User Queries
NOVICE QUERY: Why doesn't anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington
NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area
On average, it takes a new Google user 1 month to go from typing novice queries to being a search expert. This means that there is little payoff in optimizing the site to help novices since they become search experts in such a short time frame.
Design Philosophy
In general, when it comes to the PC user experience, the more features available, the better the user experience. However, when it comes to handheld devices the graph is a bell curve and there comes a point where adding extra features makes the user experience worse. At Google, they believe their experience is more like the latter and tend to hide features on the main page, showing them only when necessary (e.g. after the user has performed a search). This is in contrast to the portal strategy of the 1990s, when sites would list their entire product line on the front page.
When tasked with taking over the user interface for Google search, Marissa Mayer fell back on her AI background and focused on applying mathematical reasoning to the problem. Like Amazon, they decided to use split A/B testing to test the changes they planned to make to the user interface and see which got the best reaction from their users. One example of the kind of experiments they've run is when the founders asked whether they should switch from displaying 10 search results by default, because Yahoo! was displaying 20 results. They'd only picked 10 results arbitrarily because that's what AltaVista did. They held some focus groups, and the majority of users said they'd like to see more than 10 results per page. So they ran an experiment with 20, 25 and 30 results and were surprised at the outcome. After 6 weeks, 25% of the people who were getting 30 results used Google search less, while 20% of the people getting 20 results used the site less. The initial suspicion was that people weren't having to click the "next" button as much because they were getting more results, but further investigation showed that people rarely click that link anyway. Then the Google researchers realized that while it took 0.4 seconds on average to render 10 results, it took 0.9 seconds on average to render 25 results. This seemingly imperceptible lag was still enough to sour the experience of users to the point that they reduced their usage of the service.
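Split A/B testing needs each user to land in the same experiment arm on every visit. A common way to do that, shown here as a hedged Python sketch rather than Google's actual system, is to hash a stable user ID together with the experiment name and map the hash onto the arms. The experiment name, arm labels and the 70/10/10/10 split are hypothetical, loosely echoing the results-per-page experiment.

```python
import hashlib


def assign_arm(user_id: str, experiment: str, arms: list[tuple[str, int]]) -> str:
    """Deterministically map a user to an arm; weights are percentages summing to 100."""
    if sum(weight for _, weight in arms) != 100:
        raise ValueError("arm weights must sum to 100")
    # Salting with the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    upper = 0
    for arm, weight in arms:
        upper += weight
        if bucket < upper:
            return arm
    raise AssertionError("unreachable when weights sum to 100")


if __name__ == "__main__":
    arms = [("10 results", 70), ("20 results", 10), ("25 results", 10), ("30 results", 10)]
    for uid in ["alice", "bob", "carol"]:
        print(uid, assign_arm(uid, "results-per-page", arms))
```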
There are a number of factors that determine whether a user will find a set of search results relevant, including the query, the individual user's tastes, the task at hand and the user's locale. Locale is especially important because a query such as "GM" is likely to be a search for General Motors, but a query such as "GM foods" is most likely seeking information about genetically modified foods. Given a large enough corpus of data, statistical inference can seem almost like artificial intelligence. Another example is that a search like b&b ab looks for bed and breakfasts in Alberta while ramstein ab locates Ramstein Air Base. This is because b&b typically means bed and breakfast, so for a search like "b&b ab" it is assumed that the term after "b&b" is a place name, based on statistical inference over millions of such queries.
At Google they want to get even better at knowing what you mean instead of just looking at what you say. Here are some examples of user queries which Google will transform to other queries based on statistical inference [in future versions of the search engine]
| User query | Inferred query |
|---|---|
| unchanged lyrics van halen | lyrics to unchained by van halen |
| how much does it cost for an exhaust system | cost exhaust system |
| overhead view of bellagio pool | bellagio pool pictures |
| distance from zurich switzerland to lake como italy | train milan italy zurich switzerland |
Performing query inference in this manner is a very large-scale, ill-defined problem. Another effort Google is pursuing is cross-language information retrieval. Specifically, if I perform a query in one language, it will be translated to a foreign language and the results will then be translated back to my language. This may not be particularly interesting for English speakers since most of the Web is in English, but it will be valuable for speakers of other languages (e.g. an Arabic speaker interested in reviews of New York City restaurants).
Google Universal Search was a revamp of the core engine to show results other than text-based URLs and website summaries in the search results (e.g. search for nosferatu). There were a number of challenges in building this functionality such as
At Google, the belief is that the next big revolution is a search engine that understands what you want because it knows you. This means personalization is the next big frontier. A couple of years ago, the tech media was full of reports that a bunch of Stanford students had figured out how to make Google five times faster. This was actually incorrect. The students had figured out how to make PageRank calculations faster which doesn't really affect the speed of obtaining search results since PageRank is calculated offline. However this was still interesting to Google and the students' company was purchased. It turns out that making PageRank faster means that they can now calculate multiple PageRanks in the time it used to take to calculate a single PageRank (e.g. country specific PageRank, personal PageRank for a given user, etc). The aforementioned Stanford students now work on Google's personalized search efforts.
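For readers unfamiliar with the computation, below is a small power-iteration sketch of PageRank in Python. Passing a non-uniform teleport vector gives a personalized (or, say, country-specific) PageRank, which is why making each run cheaper lets you afford many of them. The four-page graph, the 0.85 damping factor and the iteration count are illustrative assumptions; the talk did not describe Google's actual algorithm or the students' speedup.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [outgoing links]}."""
    pages = sorted(links)
    n = len(pages)
    if teleport is None:
        teleport = {p: 1.0 / n for p in pages}  # uniform = classic PageRank
    rank = dict(teleport)
    for _ in range(iterations):
        # Random jumps land according to the (possibly personalized) teleport vector.
        new_rank = {p: (1 - damping) * teleport.get(p, 0.0) for p in pages}
        for page in pages:
            outgoing = links[page]
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: redistribute its rank via the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    global_rank = pagerank(web)
    # Hypothetical user whose interests are concentrated on page "d".
    personal = pagerank(web, teleport={"a": 0.0, "b": 0.0, "c": 0.0, "d": 1.0})
    print({p: round(r, 3) for p, r in global_rank.items()})
    print({p: round(r, 3) for p, r in personal.items()})
```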
Speaking of personalization, iGoogle has become their fastest-growing product of all time. Allowing users to create a personalized page and then opening up the platform to developers such as Caleb to build gadgets lets them learn more about their users. Caleb's collection of gadgets garners about 30 million daily page views on various personalized homepages.
Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However, they do focus on it for their voice search product, and they do believe that it is unfortunate that users have to adapt to Google's keyword-based search style.
Q: How do the observations that are data mined about users' search habits get back into the core engine?
A: Most of it happens offline, not automatically. Personalized search is an exception, and this data is uploaded periodically into the main engine to improve the results specific to that user.
Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.
Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.
Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in the search interface than their average user.
Q: Why did they switch to showing Google Finance before Yahoo! Finance when showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics, but since Google Finance shipped they decided to show their service first. This is now standard policy for Google search results that contain links to other services.
Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various servers to make sure a bad one isn't causing problems. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results.
Q: How do they deal with spam?
A: There are lots of definitions of spam: bad queries, bad results and email spam. To keep out bad results they do automated link analysis (e.g. examining an excessive number of links to a URL from a single domain or set of domains) and they use multiple user agents to detect cloaking.
Q: What percent of the Web is crawled?
A: They try to crawl most of it except that which is behind sign-ins and product databases. For product databases they now have Google Base and encourage people to upload their data there so it is accessible to Google.
Q: When will I be able to search using input other than text (e.g. find this tune or find the face in this photograph)?
A: We are still a long way from this. In academia, there are now experiments that show 50%-60% accuracy, but that's a far cry from being a viable end-user product. Customers don't want a search engine that gives relevant results half the time.
These are my notes from the keynote session MapReduce, BigTable, and Other Distributed System Abstractions for Handling Large Datasets by Jeff Dean.
The talk was about the three pillars of Google's data storage and processing platform: GFS, BigTable and MapReduce.
The developers at Google decided to build their own custom distributed file system because they felt that they had unique requirements. These requirements included
One benefit the developers of GFS had was that since it was an in-house application they could control the environment, the client applications and the libraries a lot better than in the off-the-shelf case.
GFS Server Architecture
There are two server types in the GFS system.
There are currently over 200 GFS clusters at Google, some of which have over 5000 machines. They now have pools of tens of thousands of machines retrieving data from GFS clusters that run as large as 5 petabytes of storage with read/write throughput of over 40 gigabytes/second across the cluster.
At Google they do a lot of processing of very large amounts of data. In the old days, developers had to write their own code to partition large data sets, checkpoint and save intermediate results, handle failover in case of server crashes, and so on, in addition to writing the business logic for the data processing they actually wanted to do, which could be something as straightforward as counting the occurrences of words in various Web pages or grouping documents by content checksums. To reduce this duplication of effort and the complexity of performing data processing tasks, the decision was made to build a platform technology that everyone at Google could use to handle all the generic tasks of working on very large data sets. So MapReduce was born.
MapReduce is an application programming interface for processing very large data sets. Application developers feed in key/value pairs (e.g. {URL, HTML content} pairs), then use the map function to extract relevant information from each record, producing a set of intermediate key/value pairs (e.g. a {word, 1} pair each time a word is encountered). Finally, the reduce function merges the intermediate values associated with the same key to produce the final output (e.g. {word, total count of occurrences} pairs).
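The word-count example described above, written as a single-process Python sketch. The map_fn, reduce_fn and map_reduce names and the in-memory dictionary standing in for the shuffle are hypothetical; Google's implementation distributes these phases across many machines, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def map_fn(url, html):
    # Emit a {word, 1} pair for every word encountered in the page.
    for word in re.findall(r"[a-z']+", html.lower()):
        yield word, 1


def reduce_fn(word, counts):
    # Merge all intermediate values that share the same key.
    return word, sum(counts)


def map_reduce(records):
    intermediate = defaultdict(list)  # stands in for the shuffle phase
    for url, html in records:
        for word, count in map_fn(url, html):
            intermediate[word].append(count)
    return dict(reduce_fn(w, c) for w, c in sorted(intermediate.items()))


if __name__ == "__main__":
    pages = [
        ("http://example.com/a", "the cat sat on the mat"),
        ("http://example.com/b", "The dog sat"),
    ]
    print(map_reduce(pages))
```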
A developer only has to write the map and reduce operations specific to their data sets, which can be as short as 25-50 lines of code, while the MapReduce infrastructure deals with parallelizing the task and distributing it across different machines, handling machine failures and error conditions in the data, optimizations such as moving computation close to the data to reduce I/O bandwidth consumed, providing system monitoring and making the service scalable across hundreds to thousands of machines.
Currently, almost every major product at Google uses MapReduce in some way. There are 6000 MapReduce applications checked into the Google source tree, with hundreds of new applications that utilize it being written each month. To illustrate its ease of use, a graph of new MapReduce applications checked into the Google source tree over time shows a spike every summer as interns show up and create a flood of new MapReduce applications.
MapReduce Server Architecture
There are three server types in the MapReduce system.
map operation on them then writes the results to intermediate files
reduce operation on them.
One of the main issues they have to deal with in the MapReduce system is the problem of stragglers. Stragglers are servers that run slower than expected for one reason or another. Sometimes stragglers may be due to hardware issues (e.g. a bad hard drive controller causes reduced I/O throughput), or they may just be from the server running too many complex jobs which utilize too much CPU. To counter the effects of stragglers, they now assign multiple servers the same jobs, which counterintuitively ends up making tasks finish quicker. Another clever optimization is that all data transferred between map and reduce servers is compressed, since the servers usually aren't CPU bound, so compression/decompression costs are a small price to pay for bandwidth and I/O savings.
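The straggler countermeasure (running duplicate copies of the same task and keeping whichever finishes first) can be illustrated with Python's concurrent.futures. The sleep-based task, the machine names and the delays are invented to simulate one slow machine and one healthy one; this is a sketch of the idea, not Google's scheduler.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulated_task(machine: str, delay: float) -> str:
    # Identical work on every machine; only the machine's speed differs.
    time.sleep(delay)
    return machine


def run_with_backup(delays: dict) -> str:
    """Launch the same task on several machines and keep the first result."""
    pool = ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(simulated_task, m, d) for m, d in delays.items()]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    winner = next(iter(done)).result()
    # Return without waiting for the straggler; a real system would kill its copy.
    pool.shutdown(wait=False, cancel_futures=True)
    return winner


if __name__ == "__main__":
    # "slow-box" stands in for a machine with, say, a bad disk controller.
    print(run_with_backup({"slow-box": 0.5, "backup-box": 0.05}))
```

The cancel_futures argument requires Python 3.9 or later.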
After the creation of GFS, the need for structured and semi-structured storage that went beyond opaque files became clear. Examples of situations that could benefit from this included
The system required would need to be able to scale to storing billions of URLs, hundreds of terabytes of satellite imagery, preference data associated with hundreds of millions of users and more. It was immediately obvious that this wasn't a task for an off-the-shelf commercial database system due to the scale requirements and the fact that such a system would be prohibitively expensive even if it did exist. In addition, an off-the-shelf system would not be able to make optimizations based on the underlying GFS file system. Thus BigTable was born.
BigTable is not a relational database. It does not support joins nor does it support rich SQL-like queries. Instead it is more like a multi-level map data structure. It is a large scale, fault tolerant, self managing system with terabytes of memory and petabytes of storage space which can handle millions of reads/writes per second. BigTable is now used by over sixty Google products and projects as the platform for storing and retrieving structured data.
The BigTable data model is fairly straightforward: each data item is stored in a cell which can be accessed using its {row key, column key, timestamp}. The need for a timestamp came about because it was discovered that many Google services store and compare the same data over time (e.g. HTML content for a URL). The data for each row is stored in one or more tablets, which are actually a sequence of 64KB blocks in a data format called SSTable.
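A toy Python model of the {row key, column key, timestamp} addressing described above, treating BigTable as a multi-level map with versioned cells. The ToyTable class and the reversed-URL row names are hypothetical; the sketch says nothing about tablets, SSTables or the real client API.

```python
from collections import defaultdict


class ToyTable:
    """row key -> column key -> {timestamp: value}; newest version wins by default."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._rows[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp=None):
        versions = self._rows.get(row, {}).get(column, {})
        # Only consider versions written at or before the requested time.
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row: str, end_row: str):
        # BigTable keeps rows sorted by key; this toy sorts on demand to mimic that.
        return [r for r in sorted(self._rows) if start_row <= r < end_row]


if __name__ == "__main__":
    t = ToyTable()
    t.put("com.example.www", "contents:", 1, "<html>v1</html>")
    t.put("com.example.www", "contents:", 2, "<html>v2</html>")
    print(t.get("com.example.www", "contents:"))     # latest version
    print(t.get("com.example.www", "contents:", 1))  # version as of timestamp 1
```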
BigTable Server Architecture
There are three primary server types of interest in the BigTable system.
There are a number of optimizations which applications can take advantage of in BigTable. One example is the concept of locality groups. For example, some of the simple metadata associated with a particular URL which is typically accessed together (e.g. language, PageRank™, etc.) can be physically stored together by placing it in one locality group, while other columns (e.g. content) are placed in a separate locality group. In addition, tablets are usually kept in memory until the machine is running out of memory, at which point their data is written to GFS as an SSTable and a new in-memory table is created. This process is called compaction. There are other types of compactions in which in-memory tables are merged with SSTables on disk to create an entirely new SSTable, which is then stored in GFS.
Although Google's infrastructure works well at the single cluster level, there are a number of areas with room for improvement including
[The conference was part recruiting event so some of the speakers ended their talks with a recruiting spiel - Dare]
Having access to lots of data and computing power is a geek playground. You can build cool, seemingly trivial apps on top of the data which turn out to be really useful, such as Google Trends and catching misspellings of "britney spears". Another example of the kinds of apps you can build when you have enough data is treating the problem of language translation as a statistical modeling problem, which turns out to be one of the most successful methods around.
Google hires smart people and lets them work in small teams of 3 to 5 people. They can get away with teams being that small because they have the benefit of an infrastructure that takes care of all the hard problems so devs can focus on building interesting, innovative apps.
I really got into Nigerian hip hop and R&B music while I was there over the past few weeks. Below are links to my favorite songs from my trip, many of which are fairly old but were new to me.
Tongolo by D'Banj: A club banger done in a mix of pidgin English and Yoruba
Raise the Roof by Jazzman Olofin: Don't be fooled by the English title; this song is mostly in Yoruba. The song is a general exhortation to dance, which is a fairly popular topic for Yoruba hit music.
Iya Basira by Styl-Plus: A humorous song about a guy who gets so hooked on food from Iya Basira's (i.e. Basira's Mom) restaurant that he thinks she is using jazz (i.e. magic, voodoo, juju, etc.) to make the food taste so good.
Nfana Ibaga (No Problem) by 2Face Idibia: The opening rap is beyond wack but the song itself is quite good. He scored an international hit with a song called African Queen which I really didn't feel that much.
Imagine That by Styl-Plus: This is a fairly crappy video but I love the song. The chorus is a mix of Yoruba and English. Roughly translated it goes "Imagine That! She says she doesn't want us to do this anymore. Imagine That! After everything I've done for her. Imagine That! What does she expect to become of me if she goes. Imagine That! If she goes".
Linking to Niall Kennedy's blog reminded me that I owed him an email response to a question he asked about a month ago. The question asked what I thought about the diversity of speakers at the Widgets Live conference given my comments on the topic in my blog post entitled Who Attends 'Web 2.0' Conferences.
After thinking about it off and on for a month, I realize that I liked the conference primarily because of its content and focus. The speakers weren't the usual suspects you see at Web conferences, nor were they homogeneous in gender and ethnic background. I assume the latter is a consequence of the fact that the conference was about concrete technical topics as opposed to a gathering to gab with the hip Web 2.0 crowd, which meant that the people who actually build stuff were there...and guess what, they aren't all Caucasian males in their 20s to 30s, regardless of how much conferences like The Future of Web Apps and Office 2.0 pretend otherwise.
This is one of the reasons I decided to pass on the Web 2.0 conference this year. It seems I may have made the right choice given John Battelle's comments on the fact that a bunch of the corporate VP types that spoke at the conference ended up losing their jobs the next week. ;)
These are my notes from the session on Success Story: MeeboMe.
Meebo started as a way for the founders to stay in touch with each other when they were at places where they couldn't install their IM client of choice. They realized that Instant Messaging hadn't really met the potential of the Web and decided to create a startup to bring IM to the Web. Today they have grown to a site with 1 million logins daily, 4 million unique users a month and 64 million messages sent a day.
MeeboMe is an embeddable IM window you can drop on any webpage. People can see your online status. Even cooler is that it allows the Meebo user to see the people who are viewing that page and send them an IM in real time while they are viewing the page. That is fucking cool. I'm so blown away that I've decided to figure out a way to get MeeboMe on Windows Live Spaces and will start looking into how to make that happen when I get back to work. There are three main reasons they built the MeeboMe widget: it meets their core mission of bringing IM to the Web, it drives use of Meebo.com, and their users asked for it. :)
Their design principles have been quite straightforward. They have used Flash and protocols like Jabber/XMPP that already exist and that they are familiar with to ease development. They try to keep features to a minimum and focus on making Meebo.com act like the traditional IM experience. They have had to deal with performance issues around sending/receiving messages and showing changes to a user's online presence without significant lag. They are also very driven by user feedback, and the Meebo blog is embedded in the Meebo web experience when users sign in. User feedback is how they determined that being able to show emoticons in instant messages was more important to users than being able to add IM buddies from Meebo.
MeeboMe is used in a lot of places. In education, high school teachers and college professors use it as a way for students to contact them. Librarians have also used it to let patrons ask the librarian questions by placing the MeeboMe widget on the front page of the library's website. There is a radio DJ who takes requests from the MeeboMe widget on his site. There are also retail sites that use MeeboMe for customer support. One trend they didn't expect is that people place different MeeboMe widgets on different pages of their site so they can have a different buddy list entry for each page.
During the Q&A someone asked if MeeboMe drove account creation on Meebo.com and the answer was "Yes". They had their largest number of new account sign-ups to date when they launched the widget.
These are my notes from the session on Success Story: PhotoBucket.
PhotoBucket is a video and image hosting site that sees 7 million photos and 30,000 videos uploaded daily. They serve over 3 billion pieces of media a day. The site has 15 million unique users in the U.S. (20 million worldwide) and has 80,000 new accounts created daily. There is now a staff of 55 people whose job it is to moderate content submissions to ensure they meet their guidelines.
The top sites linking to their images used to be eBay and LiveJournal, but now the key drivers of traffic are social networking sites such as MySpace and Xanga. There is a 30% to 40% overlap between their user base and the users of social networking websites.
There was some general advice about widgets, such as being careful about hosting costs, which can pile up quickly if your widgets become popular, and about trying to monetize users via your widgets, because some sites such as eBay frown upon that behavior. However, well-designed and compelling widgets can drive a lot of traffic back to your site; the best example of this to date is YouTube.
The speaker then gave a timeline of notable occurrences in the MySpace widgets world, ranging from MySpace blocking Revver and YouTube to the recent explosion of new widgets in the past few months, from MeeboMe to a number of photo slideshow widgets from the major image hosting services.
Pete Cashmore over at Mashable.com has compiled some statistics on the most popular widgets on MySpace which shows the relative popularity of PhotoBucket's widgets in comparison to other services.
So what's in a name? They've renamed the feature from 'BucketFeatures' to 'Widgets' and now to 'Slide Shows', because none of their non-Silicon Valley users knew what widgets were. After the rename from 'Widgets' to 'Slide Shows', usage of the feature almost doubled within a month.
They've also designed the JWidget, which allows people to log in to their PhotoBucket account to access their videos and images. Users can upload images and videos through it, so partner sites can outsource both image upload and content moderation to PhotoBucket. They now have 16,000,000 logins a month via the JWidget from about 500 partner sites. It is named JWidget because the developer's name begins with 'J'. :)
During the Q&A someone asked whether they support tagging and open APIs. The response was that they don't do tagging and their user base has never asked for it. Out of 2,500 support tickets a day, none have ever been about tagging. Also, since it is just an image hosting service, tagging is probably more appropriate for the blog post or profile the image appears in than for the hosted image. They don't have an API primarily due to resource constraints; there are only 40 people at the company working on it.
These are my notes from the session on Fox Interactive Media by Dan Strauss.
Fox Interactive Media (FIM) is the parent company of MySpace. It also owns IGN, Fox.com, FoxSports.com, AskMen.com, Rotten Tomatoes and Gamespy. They have 120 million visitors across all the sites.
They are buying small dev teams like Sidereus and Newroo as well as big companies like MySpace and IGN. They created FIM Labs so that some of the small dev teams can continue to be innovative. FIM Labs focuses on incubation of new technologies, product development and technology evangelization to FIM properties. At the session they announced a new platform named Spring Widgets, which the folks from Sidereus worked on.
Why widgets? They have a goal of cross-pollinating users across the various FIM properties and of creating a platform that can tie their businesses together. Widgets have been gaining traction and seemed like the right vehicle for furthering their goals.
Sidereus had a desktop background and researched Konfabulator, Dashboard and Vista gadgets. They also looked at Web widgets, specifically the AJAX and Flash widgets being used by MySpace users. They want users to be able to add widgets for FIM websites to their MySpace profiles and their desktop. From the Spring platform site a user can find a widget and then add it to their MySpace profile. No more cutting and pasting code; the experience is similar to Windows Live Gallery, but for MySpace. Users can also drag and drop widgets from the Web onto the desktop. Only Windows desktop widgets are supported for now, but Mac support is on the way.
The Spring Widgets platform is 100% Flash. Adding a desktop widget requires installing the Spring Widgets runtime in addition to having Flash installed. This runtime is less than 2MB. There is an SDK that gives widget developers APIs to tell whether the user is online or offline, store some persistent state, query certain UI conditions such as the widget's window size, and more. There is also a Web simulation tool so developers can test their widgets without having to upload them to a website.
The talk was followed by a demo showing how easy it is to build a Spring widget using WYSIWYG Flash development tools. They also announced a partnership with FeedBurner.
There were several questions during the Q&A that resulted in an answer of "we're still figuring things out". It was clear that although the technology may be ready, a number of policy questions are still left to be answered, such as whether the Spring Widgets site will be integrated into the MySpace UI (similar to how Windows Live Gallery is integrated into Windows Live Spaces) or what the certification process will be for getting 3rd party widgets hosted on the Spring Widgets site.
Despite the open questions, this is definitely a very bold move on the part of Fox Interactive Media. It does raise a question, though. If every widget platform has its own gallery of certified widgets built on its own technology (e.g. Flash in the case of Spring Widgets, DHTML and XML in the case of Windows Live Gallery, and proprietary markup in the case of the Yahoo! Widget Engine), then either there will have to be some standardization or it may become winner-takes-all, with widget developers targeting one or two major widget platforms because they don't have the resources to support every homebrew Flash or AJAX platform out there.
These are my notes from the session on Konfabulator by Arlo Rose.
He started by answering the question: why name them 'widgets'? At Apple, a UI control was called a widget. He thought the name should mean something more and has always wanted to build widgets that do more.
He was the creator of Kaleidoscope, which was one of the key customization and theming applications on the Macintosh. The application was so popular that the CEO of Nokia mentioned it as the inspiration for customization in cell phones.
When Apple announced Mac OS X, he became nervous that this would spell the end of Kaleidoscope, and it did: they couldn't make the transition to Cocoa, so they killed the product. Arlo then looked around for a new kind of application to build and came across the Control Strip and Control Strip Modules on the Mac, which he thought were useful but had a bad user experience. He had also discovered an MP3 player for the Mac named Audion, which used cool UI effects to create little UI components on the desktop that seemed transparent. Arlo thought it would be a great idea to build a better Control Strip using an Audion-like UI. He talked to his partner from Kaleidoscope, but he wasn't interested in the idea. He also talked to the developers of Audion, but they weren't interested either. So Arlo gave up on the idea and wandered from startup to startup until he ended up at Sun Microsystems.
At Sun, he was assigned to a project related to the Cobalt Qube, which was eventually cancelled. He then had time to work on a side project, so he resurrected his idea for building a better Control Strip with an Audion-like user interface. He originally wanted to develop the project using Perl and XML, but he soon got feedback that creative types on the Web are more familiar with JavaScript. So in 2002 he started on Konfabulator and released version 1.0 the following year. They also created a widget gallery that enabled developers to upload widgets they'd built to share with other users. However, they didn't get many submissions, so they talked to developers and got a lot of feedback on features to add to their platform, such as drag and drop, mouse events, keyboard events and so on. Once they did that, they started getting dozens and dozens of developer submissions.
After they got so much praise for the Mac version, they decided to work on a Windows version. While working on the Windows version, he got a call from a friend at Apple who said that at a design meeting he had heard "We need to steamroll Konfabulator". He started calling all his friends at Apple, and eventually it turned out that the Apple product intended to steamroll Konfabulator was Dashboard. The products are different: Dashboard uses standard DHTML while Konfabulator uses proprietary markup. Arlo stated that their use of proprietary technologies gave them advantages over using straight DHTML.
Unfortunately, even though they got millions of downloads of the Windows version, not a lot of people paid for the software. There were a number of reasons for this. The first was that in general there is less of a culture of paying for shareware in the Windows world than in the Mac world. Secondly, free alternatives to their product had sprung up on Windows while there was only a Mac version. In looking for revenue, they sought out partnerships and formed one with Yahoo!. He also talked to people at Microsoft in Redmond who let him know that they were planning to add gadgets to Longhorn (Windows Vista). Microsoft made him an offer to come work on Windows Vista, but he turned it down. Later on, he was pinged by a separate group at MSN that expressed an interest in buying Konfabulator. Once this happened, Arlo contacted Google and Yahoo! to see if they'd make counter offers, and Yahoo! won the bidding war.
They started working on the Yahoo! Widget Engine, and the goal was to make it a platform for accessing Yahoo! APIs as part of the Yahoo! Developer Network (YDN). However, consumers still wanted a consumer product like Konfabulator, so eventually they left YDN and moved to the Connected Life group at Yahoo!, which works on non-Web consumer applications such as desktop and mobile applications.
There are now 4,000 3rd party widgets in the Yahoo! widget gallery, and theirs is the only major widget platform that is cross-platform. They are also the only widget platform that has total access to Yahoo! data.
Q & A
Q: What's next?
A: The next question is to see how far widgets can scale as mini-applications. Can a picture frame widget become something more, but not a full replacement for Flickr or Photoshop?
Q: What do you think of the Apollo project from Adobe?
A: Doesn't know what it is.
Q: Did he ever figure out a business model for widgets?
A: He planned to make deals with companies like J.Crew, Staples, and Time Warner for movie tie-ins.
Q: Why move from YDN to Connected Life?
A: They were 3 people and they couldn't do both the developer side and the consumer application. Also, it turned out that the Yahoo! Developer Network didn't have the clout they thought it would, in that Yahoo! applications would refuse to provide APIs that could be accessed by 3rd party developers but would create special APIs for writing Konfabulator widgets.