After looking through my raw files, the audio for the Database Normalization session at the MySQL UC2006 was missing. In addition I never managed a recording of the Sakila session due to the laptop crash.
All is not lost, interested viewers can see the Normalization session from PHP Quebec 2005 and my Rough Version of the Sakila session as subsitiutes.
On top of everything else I saw at the MySQL UC, I saw enough snazzy digital SLRs to make me green with envy (curse you Stewart, Mike, George, Colin, Jeremy and Julian!)
So, I decided to whip up a way out-there camera wishlist to rule them all! I may not actually have any of this, but when some generous millionaire buys it all for me I shall have the mightiest setup of all at the next uc!
Ok so it is just an excuse to play with the wishlist plugin for wordpress.
This post recorded mid-day on Friday.
The MySQL User conference has ended and I find myself waiting for my flight home — a good time to gather thoughts.
I personally think the conference was a great success. We had over 1500 attendees and the feedback I heard on the conference was very positive. I did hear a few asking for the sessions to be longer than 45 minutes and I do agree that 45 minutes can be a bit cramped, especially for a speaker who is re-using a session previously delivered in a longer block, but overall the response was great.
I had three sessions this year, and those sessions seemed well received by the attendees I spoke to. I am *very* glad that I backed up my slides to a keychain drive and ghosted my drives before leaving as it made the crashing of my laptop an annoyance rather than an unmitigated disaster. I worry that the distraction of a laptop crash may have left me at less than 100% for my sessions on Tuesday, but hopefully the sessions were of value to the attendees.
Wednesday had my laptop working again and made for a great session in the Hierarchical Data talk. It was the first time I had a standing-room only crowd and I found myself approaching the talk with so much energy that I subsequently lost my voice later in the day. I think a three hour tutorial might be worth submitting at the next user conference, covering everything in-depth including Adjacency Lists, Nested Sets and Materialized Path. It would allow for code samples and sample application walkthroughs.
I love the amount of coverage that was available at http://www.planetmysql.org this year, and I got to meet many of the PlanetMySQL bloggers including Giuseppe Maxia, Roland Bouman, Ronald Bradford, Beat Vontobel, Markus Popp, Mike Kruckenberg, Sheeri Kritzer and Beat Vontobel. I wish I had managed a picture with all of them, but maybe next year (Sheeri suggested a PlanetMySQL BOF and I think it is a great idea). I got to read notes from many sessions that I could not attend, and the sheer number of bloggers meant I did not have to worry about not blogging during my computer troubles.
In addition to meeting the bloggers I got to meet with old friends like Zak Greant, Jeremy Cole and Laura Thomson and new friends like George Schlossnagle, Wez Furlong and Sterling Hughes. However, my favorite part is the chance to meet the users. Meeting the users is wonderful — I love to hear how they use MySQL and find it very gratifying to advise and help them in person and share new features and ideas with them. Helping people by IRC and email is good, but nothing beats hearing a user share their challenges and be able to share a previously unknown solution with them so they can walk away from the conference thinking it was well worth the cost.
It was also great meeting with some of the Japanese customers who attended the conference. I can tell that 8 years away from Japan has kicked my language skills down a few notches from the days when I could translate for speakers at meetings, but I was pleased that I could still communicate with a bit of effort and a lot of ‘could you say that again slowly?’ There was a good Japanese contingent this year and with the efforts of the new Japanese subsidiary I can only see it getting larger.
Regardless of where the users are from I still find meeting them to be very motivating. Hearing their stories and the ways they use MySQL is very gratifying. Hearing feedback from the end users is extremely valuable because it comes from those who use what we produce directly. Sharing plans and projects with them produces an infectious excitement that I can feed off of for the next year. Sitting down with the users makes me want to get home and give them a better manual and for that I thank all of the users that I managed to speak with. They are the reason we do what we do and at the end of the day the people on the conference floor are the ones we ultimately answer to.
So, at the end of a fast-paced week I find myself simultaneously buzzed and tired, but very gratified. I hope the rest of the attendees feel the same.
See you next year!
P.S. Don’t forget to check SwagReport.com for the low-down on all the swag at this year’s conference that I could get my hands on. with more coming soon as I sort and photograph it all.
Update: High Res Versions at http://www.openwin.org/mike/wordpress/wp-gallery2.php?g2_view=core.ShowItem&g2_itemId=1198
This year there was a nice little bonus, Julian Cash was on hand to take photos of the staff with a proper studio setup. I’ve seen some of his work before and liked it, and was looking forward to a session. When the photo sessions were announced, we were asked to bring a prop that represented ourselves or our work or personality. I actually forgot about the prop requirement but went for a session on the way to my Hierarchy talk and walked in carrying a stack of books to give away. Julian took one look and pronounced ‘Great Prop!’ upon seeing my stack of books. It was not planned as a prop, but seeing as I am a technical writer and author it was in fact quite appropriate.
The resulting photo is not online yet, but here is a sample I managed to snap from his book the next day:
Enlarged:
It replaces this and I consider it quite an improvement.
Mixi is a massive social networking site in Japan, nice user interface demo.
Began in late 2003 with 10,000 users and 600K pageviews a day. As it became clear that the site was viral, the question became how to scale out before getting overwhelmed.
In the first year they went from 600 users to 210,000 users.
2,000,000 users in the second year.
Now 3.7 million users, with 15,000 new users per day. Japan has 86.7 million users.
Site has 70% active users (logging in at least every 72 hours). Average user spends 3 hours 20 minutes on mixi a week. 35th on Alexa worldwide, 3rd in Japan.
Mixi now gets more traffic than Amazon Japan.
Mixi uses LAMP (perl), memcache, squid. Fedora for Linux.
Requests go mod_proxy to mod_perl, then either memcache or a set of MySQL servers divided by blogs, messages and other.
Using more than 100 servers, 10 new servers per month. All done with non-persistent connections, lots of InnoDB use.
Mixi does their own partitioning.
First architecture was one master and multiple slaves, adding more slaves as load increased. Eventually they hit a wall where the master was overloaded with writes, which pushed down to slave load.
Blog and Forum tables are 75-85% reads, 15-25% writes, replication was not keeping up with the writes.
Eventually the system was split into different replication sets based on purpose. They also considered splitting replication by user sets but they found the option unwieldy.
By splitting by purpose (blogs/forums/etc) it made the implementation easier.
Approach was transitioned by first having the code write to the old and new databases, with INSERT IGNORE used to move data from the old server to the new server for code that was not yet transitioned. As data became complete, reads were moved to the new DB servers.
After purpose split servers became overloaded, they looked at moving to partitioning by user id or message id. User ID splitting was chosen because the use case is generally a read of messages for a given user.
For the partitioning there is a manager DB that determines where a given row should be written to, with perl then writing to the chosen node.
The alternative is to determine storage node by algorithm. When using an algorithm balancing and adding nodes can be difficult.
Mixi rarely hits the DB because most goes into distributed memory caching.
memcached is used for memory cacheing. They have 39 machines with 2gb ram each.
Image serving
Currently using 8tb of storage, MySQL stores metadata.
Images are frequent access and rare access. The frequent access ones are several hundres megs, spread with FTP and squid.
Rarely accessed images are a few TBs, newer files get more access, cache hit ratio is bad, they distribute directly from storage.
They user a cluster of image storage/delivery boxes, with a manager app that determines where the images are stored and routes to the appropriate storage.
When a user views a rarely accessed image, perl needs to know where the picture is located, it asks a manager DB and then constructs a URL to get the image from the relevant storage box, with the uRL incorporated into the display page.
To Do
Evaluate Cluster.
Implement a better partitioning algorithm.
Add a third level of partitioning (split by timestamp).
Questions
How do you backup? They have master/master pairs, one can go offline for snapshots.
How many mod_perl boxes? 50 with 100 DB boxes.
Federated was found to be slower than performing two queries, one on each machine.
I’ve finished my last of three sessions, finally getting through my hierarchies session. With the laptop failure, speaking just was not as enjoyable this year as in previous years when there were no such concerns to distract me from giving the audience a good session.
The Hierarchies session seemed well received, with the room packed and audience members up against the back wall. I do wish I had not been opposite the Second Life session, as I was quite interested in how they scale.
Materials
Managing Hierarchies
Sakila Sample Database - Download Sakila Sample Database Here
Introduction to Normalization
As I mentioned previously, I’ll be reporting on the conference swag along with the sessions at this year’s MySQL User Conference.
The first candidate? The SCO Giant Pen
Check it out!
I love our IT guys. One call to the Cupertino office and one of the IT guys drove to the conference center with the discs needed to reinstall Windows and Office.
I’m now getting everything patched and ready to deliver my session on Managing Hierarchical Data. Sadly there is no recording of the Sakila session because I had to deliver it on a conference-provided PC.
Lessons learned:
Do not sit near high voltage lines and work on your laptop. The only explanation I have for the issues is that I sat near a thick power cable for the lights during the keynote, perhaps it was not well shielded and corrupted files.
Keep your slides in at least three places. When I left I copied my slides from my desktop to my laptop and also to a keychain drive. When my laptop failed I was annoyed but did not panic because I was able to simply insert my keychain into the dekstop provided in the session room and deliver my presentation.
Backup before you leave. Just before I shut down my laptop to head to the airport I had the feeling I should Ghost it. I acted on that feeling and I am very glad I did.
Anyway, it will be good to deliver my session from my laptop.
Just a warning to everyone that I will not be posting many blog entries from the UC today, my laptop is suffering from an unknown issue that I need to diagnose and repair before I can do any work from it.
On the bright side there is a great blogger contingent here this year and there is plenty of content to go around.
Session attendees: I will get content online as soon as I can.
Session by Laura Thomson, OmniTI. Laura is author of a number of popular PHP and MySQL books.
Session will be on security at the developer level and is written from a programmer’s perspective.
Many developers plug along without any knowledge that they have security problems, security awareness is a relatively new thing. This session is not about guru-level knowledge, it is about developer-level practices.
MySQL Security Basics
Do not run your mysqld as the unix root user. Run it as a purpose-created user.Do not use the purpose-created user for anything else. MySQL root user is not related to the Unix root user.
Set permissions on DB data files directory so that only mysqld user can access them.
Disable symlinks to tables with –skip-symbolic-links unless you need them.
Disable access to port 3306 except to trusted hosts.
Accounts and Privileges
Make sure all MySQL user accounts need a password, especially the root user.
Parts of your security policy should relate to your MySQL installation.
The first two thing you do on a new install is delete the anonymous user and set a root password.
Grant people the fewest privileges required to do their job.
Only the root user should have access to the mysql system database.
Keep FILE, PROCESS and SUPER for administrative users. FILE enables file creation, PROCESS allows users to see what others are doing, SUPER can do things like terminate the connections of others.
Avoid wildcards in the hostname portions of GRANTS.
Use IP addresses in your GRANTS instead of hostnames if you do not trust your DNS.
Using Encryption
Don’t store application passwords in plaintext in the database.
Use one-way hashing for password information.
Require database connections to be via SSH or tunneled through SSH.
Block the old passwords using –secure-auth, do not use –old-passwords.
The most secure data is the data you do not store. Do not store Credit Card numbers if you can at all avoid it.
PHP Security Basics
Consider illegitimate uses of your application.
Educate yourself.
Filter all external data.
http://phpsec.org/projects/guide
External Data
External data cannot be trusted.
External data is form data, $_GET, $_POST, $_REQUEST, cookie data, some server variables (SERVER_NAME), query results, web services data, files.
Filter input and escape outout.
Filter input with whitelisting where possible.
Escape output according to where it’s going.
It’s good to be paranoid.
All data is external data.
Attacks
register_globals and why it changed
XSS (cross site scripting)
SQL/Command Injection
Exposed source code
Session fixation
Session hikacking
Cross site request forgeries (CSRF)
Code injection
Attack #1 - Register Globals
Before PHP 4.2, register_globals was on by default.
The change broke everyone’s code and made things harder for new developers.
Why was it done?
Say code checks if ($valid_user) then show stuff…
http://www.foo.com/bar.php?valid_user=true
And you are seeing what you should not.
This issue is about uninitialized variables than register_globals.
The extract() function returns things back to before register_globals was turned off. You can also turn on register_globals in php.ini.
The solution to this problem is to educate developers.
You can turn up the error_reporting level to help educate developers.
Attack #2 - XSS
An attempt to enter data into a web app with client-side code such as javascript. For example, malicious javascript in a guestbook. People can create annoying popups, do meta-refreshes, create dubious forms, steal cookies, and even create AJAX-based attacks.
Stolen cookies with XSS can lead to session hijacking.
You prevent this with output filtering to the browser using htmlentities().
Google for the XSS cheatsheet, you can copy and paste it into your forms as a testbed for XSS attacks.
Attack #3 - SQL Injection
SQL injection is adding your own queries to a string to be executed against MySQL/anyDB. By passing things like quotes and comment marks you can modify the query that the application tries to execute.
Another variation is command injection, if the PHP script tries to run system() or exec(), a user may be able to inject into it and get a command run on the server itself.
Prevent with mysql_real_escape_string, do not use magic quotes.
Alternatively prevent with prepared statements.
Attack #4 - Exposed Source Code
Occurs when code is not interpreted by PHP.
Happens when .inc files are in the web document tree.
Browser requests for these files will often wind up being shown plaintext, passwords and all.
Fixes:
Use standard .php extensions.
Put libraries and templates outside the document tree.
Disabblow serving of .inc files, but don’t rely on it.
Use SetEnv to set usernames/passwords as server variables.
Attack #5 - Session Fixation
Session security works on the basis that the session ID is hard to guess. Not having to guess it makes thinsg easier.
PHP can accept session IDs through a cookie or URL.
Stealing sessions allows you to take over a logged-in user.
Session fixation means having a user follow a link with a provided session id in the URL. If they then login, you could connect with the session ID you provided and use their credentials.
Use session_regenerate_id() whenever a user logs in or changes their privilege level so an attacker cannot go back in and use the session id the provided to the target.
Attack #6 - Session hijacking
Similar to session fixation but we somehow obtain the user’s session ID.
Done with XSS and stealing cookies through javascript
Session IDs can be sniffed or obtained throug proxy servers.
Best to use SSL for anything that requires a logged-in user.
Attack #7 - CSRF
Cross Site Request Forgeries
A request for a page that looks as though it was initiated by a site’s trusted user, but was not (deliberately).
Example: <img src=”http://example.com/single_click_to_buy.php?item=12345″ />
In the example, a user browsing your page winds up buying an item without knowing it.
Do not use GET parameters to perform an action.
Make sure that users come through your forms, and each for submission matches a form you sent out.
Generate a one-time token and embed it in the form, save in the session, check on submission.
Not trivial to protect against this.
The famous MySpace attack used XSS and CSRF to determine who was looking at a user’s page, then used CSRF to send a friend request from the viewer to the page owner, thus building a huge friends list. As well, the friends of those targets also became his friends. The sheer size of the friends list crashed the system.
CSRFs are even used to defeat captchas by taking a captcha and showing it to a user on another site, then pushing it back when a unwitting user reads the captcha.
Attack #8 - Code Injection
Similar to SQL injection and command injection. This involves executing arbitrary code on the target server, usually though file inclusion.
Example script parses user input to generate a filename in a theming system. The sample attack had the theme load an external file from across the net.
Code injection is prevented by filtering user input. Only allow expected values. Do not allow http:// in the filename argument.
In PHP.ini, allow_url_fopen should be disabled if possible. This blocks require/include/fopen on remote files.
Techniques for Prevention
Preventing SQL Injection and Designing for Security
Use mysql_real_escape_sting().
Make sure that the data coming in matches your expectations. If you expect a string, make sure it is a string. If you expect a number, make sure it is a number.
Use Prepared Statements and bind variables.
Laura recommends prepared statements. They speed query execution and someone else handles the filtering.
When used as a coding standard, prepared statements help limit problems cause by novice and naive developers.
Laura then provides some prepared statement examples, showing bindings for input and output.
By using prepared statements you do not need to filter data, the statement preparation does it for you.
After showing a mysqli example, Laura shows an example using PDO, a new common database abstraction library.
Designing for Security
Basic Principles
Don’t rely on server configuration to protect you. No magic quotes.
Design your app with security from the ground up. Have a single point of data cleaning.
Have external code review.
Seek advice from experts.
Keep up to date on the latest security related developments.
Dispatch architecture and MVC help improve security.
Use the error reporting, use E_ALL & E_STRICT to see all errors.
On aproduction server, set display_errors = off.
Attackers can use error messages to learn about your system and facilitate an attack.
Education and Code Review
Problems come from uneducated developers.
Code review should be peer to peer and more importantly more experienced to less experienced.
As well as formal code review, mentor the junior staff. Read commit emails.
Developer education should happen during security audits.
Security Audits
Look at external consultants who are up to date on the latest security issues.
Audits should be accompanied by security education for developers so the mistakes found do not come back.
Listen to the criticism.
Responsiveness
One of the most important things.
Respond to reported exploits, code patches, feedback.
Look at
http://www.owasp.org
http://phpsec.org/projects/guide
http://www.hardened-php.net
Essential Security from O’Reilly
Overall a good overview for developer-level security consideration. I’ll be sure to hit Laura’s other session at the conference.