» tagged pages
» logout

sorted by: recent | see : popular
Content Tagged with MySQL + databases

Drizzle, Google Protocol Buffers, Low Hanging Fruit

We have been looking for a new standard method for serializing objects from the server. Schemas, Replication Events, Tables, etc... all for one reason or another have to be serialized at some juncture at the moment. Each type has its own method/language which makes it a bit of a pain to deal with in the server.

Worse?

Few of the formats are upgradable. You end up with all sorts of incompatibilities that over time have become a real PIA to deal with.

Introducing Google's Protocol Buffers:
http://code.google.com/apis/protocolbuffers/

In short, this buys us an upgradable format and a lot less code. It also makes it much simple to write trivial little tools to pull apart objects from the server.

I pushed the first bit of this into Drizzle last night, and picked a low hanging piece of fruit, the schema, to do first:

message Schema {
   required string name = 1;
   optional string characterset = 2;
   optional string collation = 3;
}


The above was all I needed for schemas. Now out db.opt files can be upgraded as time moves on without us having to worry about the internal formats. I even wrote a 14 line piece of C++ code that allowed me to dump the files :)

It was only 14 lines because I did it in C++, I could have done it in fewer with Python :)

If you are interested in something a bit more advanced jump on the mailing list (https://launchpad.net/~drizzle-discuss/), and comment on this:
http://drizzle.wikia.com/wiki/Table_Proto_Definition

That is the definition we are putting together for describing tables (which will give us the ability to ditch FRM and move to a discovery system).

Brian-Aker: Brian "Krow" Aker's Idle Thoughts

Drizzle, Google Protocol Buffers, Low Hanging Fruit

We have been looking for a new standard method for serializing objects from the server. Schemas, Replication Events, Tables, etc... all for one reason or another have to be serialized at some juncture at the moment. Each type has its own method/language which makes it a bit of a pain to deal with in the server.

Worse?

Few of the formats are upgradable. You end up with all sorts of incompatibilities that over time have become a real PIA to deal with.

Introducing Google's Protocol Buffers:
http://code.google.com/apis/protocolbuffers/

In short, this buys us an upgradable format and a lot less code. It also makes it much simple to write trivial little tools to pull apart objects from the server.

I pushed the first bit of this into Drizzle last night, and picked a low hanging piece of fruit, the schema, to do first:

message Schema {
   required string name = 1;
   optional string characterset = 2;
   optional string collation = 3;
}


The above was all I needed for schemas. Now out db.opt files can be upgraded as time moves on without us having to worry about the internal formats. I even wrote a 14 line piece of C++ code that allowed me to dump the files :)

It was only 14 lines because I did it in C++, I could have done it in fewer with Python :)

If you are interested in something a bit more advanced jump on the mailing list (https://launchpad.net/~drizzle-discuss/), and comment on this:
http://drizzle.wikia.com/wiki/Table_Proto_Definition

That is the definition we are putting together for describing tables (which will give us the ability to ditch FRM and move to a discovery system).

MySQL: Planet MySQL

Using Ruby to Migrate Databases


If you deal with databases for a living, eventually you’ll come across cases where you’ll need to migrate a lot of data from one schema to another. I am not just talking about migrating from one different type of database to another, like from Oracle to MySQL, but from, for instance, a badly-designed schema to one more expertly crafted.

If there are minor differences between the source and target schema, this is a trivial affair. On the other hand, if the schema is completely different, this can be quite a challenge. Moreover, the database being migrated might represent a high-demand website that will need to be done with little or no downtime, with lots of planning and preparation to boot. You may be interacting with the application developers, the systems crew, and juggling tight deadlines as well.

Well, as you may have guessed, I have described some of the roles I now play at a leading social networking company. We are indeed in the midst of creating the “NextGen” product — a complete rewrite and redesign. The new system is designed with modularity and scalability in mind. The old system we are transitioning from was created when the company was much smaller and had 2 orders of magnitude or more less demand. Suffice it to say, it has all the appearances of being crafted by a bunch of “juniors” that just quickly browsed through “PHP for Dummies”, “Database Design for Idiots”, and the like the night prior. That the aging application still works at all is seen as the “8th Wonder of the World”, but to it’s credit it brings in millions in revenue despite all of its faults.

I am an “old veteran” when it comes to software development. In my “advance age”, I’ve decided to do databases as something that I’ve not done before in my 30-year career as software developer. The nice thing is that I find much I’ve learned about algorithms and data structures can also be applied to schema design. It also helps with interacting with the applications development team as I can relate to what their needs are and “bridge the gap”, as it were between the code and the database.

I have chosen Ruby out of all the languages I know — Python, Perl, PHP, C++,Java, etc. — because of it’s expressive power and meta-programming capabilities that most of the other languages don’t either do well, or lack  a clean syntax to accomplish the same.

First, let me speak of my general approach to data migration. You have your source and destination databases. Of your source databases, you will obviously have the main database containing the enterprise’s lifeblood information. Some of that data will relate directly to customer/account activity; some may relate to configuration of how that data is handled; other data may serve as a reference, such as a zip-code database.

Similarly, you will also have target databases, with the same type of data, but organized differently — hopefully more efficiently. Also, what may have been denormalized in the source database you might choose to normalize it in the target, or vice-versa. Perhaps password for user accounts were in plaintext in the source and now you need to md5 them in the target.  Perhaps there were a fixed number of columns in the source tables representing some resource that you wish to store as separate rows in the target for added flexibility and expandability. Again, if you are only dealing with a couple of tables, it’s trivial to do the migration. If, on the other hand, you are dealing with dozens of tables, the problem explodes in complexity.

Since I want to illustrate doing a migration, I don’t want to bog you down with a complex schema; instead, I will take a simple example. Suppose you have a picture display site where each picture was represented by a column in the users table, and you need to migrate this to a more flexible system that will allow any number of pictures per user. If you have 10 million users in this table, doing a ALTER TABLE every  time needed to expand on the number of pictures would be just plain silly.

1
2
3
4
5
6
7
8
9
10
CREATE TABLE old_accounts (
  id INT auto_increment primary key,
  name varchar(100) not null,
  email varchar(100),
  picture1 varchar(100),
  picture2  varchar(100),
  picture3  varchar(100),
  picture4  varchar(100),
  picture5  varchar(100)
) ENGINE=MyISAM;

And here is the new schema we wish to migrate this to:

1
2
3
4
5
6
7
8
9
10
11
12
13
CREATE TABLE new_account(
  userID INT auto_increment primary key,
  given varchar(50) not null,
  sur varchar(50) not null,
  email varchar(100)
  ) ENGINE=InnoDB;
 
CREATE TABLE pictures (
  pictureID int not null auto_increment,
  userID INT not null,
  url varchar(100) not null,
  unique index(userID, url)
) ENGINE = InnoDB;

I have deliberately left out the foreign key specifications for clarity — and some would argue it would be a nasty performance hit under some circumstances, though I’ve not run into that problem personally.

I have written a complete Ruby framework specifically for migration, but as of the time of this writing, that code is proprietary and not yet released to open-source, though eventually I may do that if I get clearance. But basically, I use Ruby classes to represent a “unit” of migration — normally a single source table to one or more target tables. So, using my Migration framework, here’s what this migration would look like in Ruby:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class UserMigration < Migration
    def migrate_map
        @src_table = {
             :old_accounts => {:PK => :id}
        }
        @dest_table = {
            :new_account = {
                :PK => :userID,
                :id => :userID,
                :name => :given,
                :email => :email,
            },
 
            :pictures => {
                :PK => :pictureID,
                :FK => {:new_account => {:userID => :userID}}
                :picture1 => :url,
            },
 
           :pictures => {
                :PK => :pictureID,
                :FK => {:new_account => {:userID => :userID}}
                :picture2 => :url,
            }, ... 
        }
    end
end

Well, that’s it — almost, and there’s a problem in the Ruby code that you will catch right off the bat if you know Ruby — and I think that if you look at it for a bit, you can figure out what’s going on here. So I’ll leave that as an exercise for you to mull over. You don’t really need to know Ruby at all to understand what’s going on here, and that’s the bit I like about Ruby. You can use it as a type of “meta-language” if you know what you’re doing.

MySQL: Planet MySQL

Drizzle - Lightweight SQL Database for Cloud and Web

"This project is about building a building a database designed for the typical Web usage patterns for a database. It is being designed for massive concurrency on modern multi-cpu/core architecture. The code is originally derived from MySQL."

opensource: del.icio.us tag/opensource

What if (Monty Widenius)

"Why is it that the people working on Drizzle are extremely enthusiastic about Drizzle? This is because Drizzle solves many of the problems that MySQL's development has had for years."

opensource: del.icio.us tag/opensource

Drizzle, Clouds, "What If?"

"What if"

Have you ever wanted to know what would happen if you had taken a different direction?

A number of months ago I was on the phone with the Rackspace CTO talking about Memcached and Gearman, and the work I am doing there. He had asked me if I had ever thought about creating a slimmed down version of MySQL to work with them.

The answer?

Of course!

This is something I get asked with some regularity. People will want MySQL done in some fashion other then what it does by default (aka what serves most of the user base). Taking the code and cutting out the one thing the requester dislikes is a pretty boring task. I keep a notebook of these requests. I consider it valuable feedback, not as single items, but as a whole.

After I got off the phone the idea stuck in the back of my mind. The next day I woke up and started playing with the idea of seeing what a cut down version would look like. One day turned into months. Somewhere along the way I decided to bring in a few people and see what they thought.

The reaction was more ideas, and code followed. We went about asking the question of "What if".

Stored Procedures, Views, Triggers, Query Cache, and Prepared Statements are gone for now. The field types have been simplified and there is an open debate about the SHOW commands (I am falling into the camp that think they may just belong in the client application but not in the server).

Will any of this go back in? It is hard to say. The goal right now is to target a certain class of applications/developers and see if this is useful. As an example:

1) Web based apps.
2) Cloud components.
3) Databases without business logic (aka stored procedures).
4) Multi-Core architecture.

What are the bigger differences in philosophies?

Adopt external libraries, use open source mechanics for contributions, keep to an open style of communication, and remove the line between internal and external development. Essentially do what I have been referring to as "Organic Open Source". We have focused on C99, POSIX and autotools based systems. We have taken to a very micro-kernel design where code is being removed from the center and pushed out to the edges via interfaces. We are taking a Linux/Apache tightly coupled design for modules.

Can you download it and use it?

Nope.

Not there yet, and we do not have a target date of when we will be ready for production use. If you want to work on this right now you are going to have to roll up your sleeves (hint, "bzr branch lp:drizzle"). I would really encourage you to go read the mailing list (http://launchpad.net/drizzle). Keep an open mind. We are not looking to be 100% compatible with MySQL, so this is certainly not a replacement for MySQL.

More questions?

The FAQ covers things like how to get involved, code license, etc.

Linuxcast has an podcast this week with me talking about Drizzle:
http://www.linuxworld.com/podcasts/linux/2008/071808-linuxcast.html


PS. Launchpad has turned out to be pretty awesome. You should be thinking about using it. Do not get caught up in the git vs bzr vs hg debate. It is not about the tool, think iPod/iTunes. It is about the infrastructure around it.
scan00088.jpg

MySQL: Planet MySQL

Drizzle, Clouds, "What If?"

"What if"

Have you ever wanted to know what would happen if you had taken a different direction?

A number of months ago I was on the phone with the Rackspace CTO talking about Memcached and Gearman, and the work I am doing there. He had asked me if I had ever thought about creating a slimmed down version of MySQL to work with them.

The answer?

Of course!

This is something I get asked with some regularity. People will want MySQL done in some fashion other then what it does by default (aka what serves most of the user base). Taking the code and cutting out the one thing the requester dislikes is a pretty boring task. I keep a notebook of these requests. I consider it valuable feedback, not as single items, but as a whole.

After I got off the phone the idea stuck in the back of my mind. The next day I woke up and started playing with the idea of seeing what a cut down version would look like. One day turned into months. Somewhere along the way I decided to bring in a few people and see what they thought.

The reaction was more ideas, and code followed. We went about asking the question of "What if".

Stored Procedures, Views, Triggers, Query Cache, and Prepared Statements are gone for now. The field types have been simplified and there is an open debate about the SHOW commands (I am falling into the camp that think they may just belong in the client application but not in the server).

Will any of this go back in? It is hard to say. The goal right now is to target a certain class of applications/developers and see if this is useful. As an example:

1) Web based apps.
2) Cloud components.
3) Databases without business logic (aka stored procedures).
4) Multi-Core architecture.

What are the bigger differences in philosophies?

Adopt external libraries, use open source mechanics for contributions, keep to an open style of communication, and remove the line between internal and external development. Essentially do what I have been referring to as "Organic Open Source". We have focused on C99, POSIX and autotools based systems. We have taken to a very micro-kernel design where code is being removed from the center and pushed out to the edges via interfaces. We are taking a Linux/Apache tightly coupled design for modules.

Can you download it and use it?

Nope.

Not there yet, and we do not have a target date of when we will be ready for production use. If you want to work on this right now you are going to have to roll up your sleeves (hint, "bzr branch lp:drizzle"). I would really encourage you to go read the mailing list (http://launchpad.net/drizzle). Keep an open mind. We are not looking to be 100% compatible with MySQL, so this is certainly not a replacement for MySQL.

More questions?

The FAQ covers things like how to get involved, code license, etc.

Linuxcast has an podcast this week with me talking about Drizzle:
http://www.linuxworld.com/podcasts/linux/2008/071808-linuxcast.html


PS. Launchpad has turned out to be pretty awesome. You should be thinking about using it. Do not get caught up in the git vs bzr vs hg debate. It is not about the tool, think iPod/iTunes. It is about the infrastructure around it.
scan00088.jpg

Brian-Aker: Brian "Krow" Aker's Idle Thoughts

Found an Ideal I/O Scheduler for my MySQL boxes

Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller) and it required huge multi-thread I/O-bound read load. Basically it was a set of parallel full-scan reads from a 300Gb compressed innodb table (yes, we use innodb plugin). Looking at the iostat I saw pretty expected results: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with linux I/O schedulers and try to increase disk subsystem throughput. Here are the results:

(more…)

MySQL: Planet MySQL

Fwd: Scorching hot Startup Needs Scalability Sorcerer and Optimization Freak

Question: Do you think you have what it takes to take a service from a few hundred thousand users to tens of millions of users in 1 year flat? If you do read on and perhaps become the next beloved scalability rockstar of our age.

We are looking for a data charmer. A mysql magician. A code hack. A funny man. A mad man. A passionate man. Or perhaps a woman who does all these things and more.

Here’s what you gotta do:

  • Pro-active and reactive performance analysis, monitoring and general database plumbing of all leaky issues.
  • Work with others on the team to help maintain/improve and support the infrastructure for a high traffic, high growth site
  • Optimize and tune the database day to day
  • Algorithmic bent. Develop algos to quicken search times, response times, find shortest paths between various connections on site.
  • Have solid low level networking/protocol/computer security skills
  • Log everything. Usage stats, search stats, user behaviour stats. Draw conclusions. Constantly refine and tinker.
  • Help with periodic large storage migrations
  • Work intimately with operations, development, and strategy team to ensure smooth deployments of new iterations, high availability of database services.
  • Understand capacity planning. Always thinking 10 steps ahead. (Whether it means looking at distributed systems services, cloud computing options, evaluating HA models used in other industries etc)
  • Have a pulse on the state of the web, social media, social networking, different scalability architectures, benefits/negatives of each.
  • Interest in high concurrency, distributed systems architectures.
  • General low level hacking/scripting/optimizations in perl/python.
  • Evaluate changing conditions in the archi
  • Think creatively. No dogmatists.

Ideal skillset:

  • BS in Comp Sci or equivalent
  • 5+ years experiene with Linux/Unix systems
  • 3+ years with MySQL in production environment
  • Knowledge and experience with partitioned architectures and a database sharding techniques
  • Capacity planning/high growth planning/emergency planning experience
  • Passion, bordering on paranoia, for hunting bottlenecks, and optimizing IO operations
  • Experience with MySQL replication
  • Deep experience with MySQL internals
  • Experience with performance analysis tools, storage engines, backup methodologies for MySQL
  • Great perl/shell scripting experience
  • Team player, self motivated, able to handle high stress situations while maintaining a calm disposition
  • Great communication skills, attention for detail, and an interest in the business side of the equation of systems/scale planning
  • Eat/sleep/breathe the web, startups, and the landscape of the social web
  • Insomniac

We’re ready to offer an aggressive salary with tremendous upside by way of stock options, commensurate with your experience, your drive and your results.

Apply directly to:

net ‘dot’ startup ‘at’ googles mail service dot com

by sending us a CV/resume, and optionally, a link to your blog or Linkedin profile.

MySQL: Planet MySQL

Ivan Needs Your Help

Please help save Ivan, son of Andrii Nikitin (MySQL Support Engineer), who needs a bone marrow transplant. Andrii’s message is below:

“My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.

“8 months of heavy and expensive immune suppression brought some positive results so we hoped that recovering is just question of time.

“Ivan is very brave boy - not every human meets so much suffering during whole life, like Ivan already met in his 2,5 years. But long road is still in front of us to get full recover - we are ready to come it through.

“Ukrainian clinics have no technical possibility to do such complex operation, so we need 150-250K EUR for Israel or European or US clinic. The final decision will be made considering amount we able to find. Perhaps my family is able to get ~60% of that by selling the flat where parents leave and some other goods, but we still require external help.”

– Andrii Nikitin, MySQL Engineer

For donation: Donation can be made through PayPal (via MySQL/Sun website)

MySQL: Planet MySQL

MySQL - Get GlassFish & MySQL Unlimited and Save

Combined, GlassFish and MySQL are a perfect combination that can cut total cost of ownership, enable you to accelerate&lt;sep/&gt;

GlassFish: del.icio.us/tag/glassfish

MySQL

MySQL is the world's most popular open source database software, with over 100 million copies of its software downloaded or distributed throughout its history.

opensource: del.icio.us tag/opensource

MySQL - Get GlassFish & MySQL Unlimited and Save

As much Glassfish (and mysql, blah) as you like, with support, for a set fee.

GlassFish: del.icio.us/tag/glassfish

Giving control of your data to the cloud

I’ve been doing some research and evaluation of more cloud computing. Specifically my focus has been on data store, and considering how to augment an existing operation using a popular database such as MySQL.

I’ve been looking first at Google App Engine and now I have my SimpleDB Beta today will be looking here next.

Some observations I’ve struggled with are:

  • No Native CLI, say for basic data setup. You can do some programmatic input for SELECT statements in a Query object in a SQL like syntax called GQL, but you can’t do DML
  • No simple data viewer. In production you would not do this, but I’m in evaluation and still looking at functionality, verification of results etc. A phpmyadmin clone is what I’m seeking for example. I suspect this would have been a good Google Summer of Code project.
  • Python only. While this was great for me to need to spend a 1/2 day to learn about Python syntax, it was just another small starting hurdle. If your organization doesn’t do or use Python, this is another skill or resource needed.

But the biggest concern and hurdle I’m understanding from my traditional principles is loss of control. Loss of control for monitoring and instrumentation, performance, availability and backup and recovery.

Recent issues in performance and unavailability have highlighted that App Engine is not suitable for mission critical web sites, not as your primary focus. I see huge benefits in augmenting access to information, perhaps more historical for example, a well define API within your application could easily support options to consider cloud storage as a secondary storage and primary retrieval of less important data.

My focus when revisiting here will be looking at means of object translate between tables and the Data Store and maybe an API for data transfer etc.

I suspect that when I get to evaluating EC2/S3 more I will have much more support by being able to leverage existing tools and techniques.

MySQL: Planet MySQL

Handling Disaster 101

I’ve had to accept the “practice what you preach” pill recently due to a disaster at my hosting provider. See Learning from a Disaster.

While it was my own personal site on a dedicated server in question and not a business generating review I found that my MySQL Backup Strategy was incomplete ( It is also based on code 4 years old). I found that I had not tested my Disaster Recovery Plan adequately. I have used my backup and recovery approach in the past for various controlled situations and testing successfully.

So what mistakes did I make. There were two.

1. I was using a cold backup approach. That is specifically copying the entire MySQL Database in a controlled manner at the file system level. These were also copied to an alternate shared hosting server for storage. This works fine when you backup server supports a means of restoring data in this format, however if your backup shared hosting facility does not give you access to the MySQL data area, then this does not work. Not wanting to pay for two dedicated hosts this backup solution is impractical for my present hosting. Time to consider alternatives, such as being prepared with an EC2 image.

2. Recently I moved to using two MySQL instances, both 5.0 GA and 5.1 RC. The problem is I didn’t adjust my backup scripts appropriately to reflect two instances. Of course when my server was unavailable for 43 hours I was completely screwed in at least I could only throw an Site Unavailable page rather then my website. Combined with my hosting provider totally screwing DNS and admin access to manage DNS for 3-4 days more then almost 2 days of total server unavailability I also had to change support for my DNS. Would have been easier if I had a DNS DR plan.

The lesson is, no matter how trivial the information or website, if you actually want the content, make sure you test your recovery strategy ahead of a disaster.

MySQL: Planet MySQL

Pondering MapReduce

I spent this past weekend writing a Paper for a project I’ve been playing with. It is a simplified distributed processing system loosely based on Google’s MapReduce, except rather than focusing on larger batch jobs, it prototypes out some common database application uses. The model is currently very basic, but I plan on exploring this further (possibly with a performance-enhanced implementation in C). I’ve also been reading up on other interesting projects like Hadoop, HyperTable, Amazon’s SimpleDB, and of course the DB interface for Google’s AppEngine. I’m wondering how these distributed models can apply to relational databases (the Pig project has some relational constructs).

MySQL: Planet MySQL

Best MySQL Server Under $10K?

Server picture I want to get opinions from outside of my daily circle of people on the best server hardware to use for MySQL. I remember from the conference somebody (Pipes?) mentioning a particular Dell server with multiple disk RAID10 that could supposedly be had for about $6k but I completely misplaced the model number (Frank, did you get my email?).

I know that a multi-disk RAID array with a bunch of fast disks (15k RPM?) is probably the most important method of improving performance, followed by the amount of RAM, so I'm trying to find the best combination/balance of the two. However, server prices on the Internet range so much that I don't even know where to begin to tell a good deal from a bad one. I don't think SSDs can play a role here, because we need at least 200GB of usable space per machine. For comparison, we currently use the following spec: Dual quadcore Intel, 16GB RAM, 200GB RAID1 + 1TB RAID1 using SATA drives (eww?) in a 2U rack (a bit too chunky, isn't it?) made by Zantaz. It performs quite nicely but I think it chokes on the amount of writes way too early.

So, what does everyone think the best configuration is under $10,000? Bonus points if you can provide a link to the vendor site or at least a model number!

Spasibo.

Similar Posts:

MySQL: Planet MySQL

Corruption using MySQL AES_[EN|DE]NCRYPT functions

I was contacted this week by a previous client regarding a failure of processing data. This was a Contact, Financial and Media Management system I developed for a non-for-profit organization a LAMJ stack, and I’ve had to do nothing since deployment in the past 3 years, no bug fixes, no feature enhancements. The only thing lacking is additional management reporting, and data is extracted for this option now.

It runs under commodity Hardware, Linux and MySQL and it’s only unscheduled downtime was an power failure before UPS power was installed. However this all changed last week. Processing of regular scheduled encrypted data simply failed unexpectedly.

A summary of the environment.

  • Data is inserted with AES_ENCRYPT(data,key);
  • Data is retrieved with AES_DECRYPT(data,key);
  • Data is never updated.
  • New data is always added, and historical data always remains.
  • The application has no end user access to modify data.
  • The application has no function anywhere to modify the data, so no rouge happening could have occured.
  • An AUTO_INCREMENT column and TIMESTAMP columns provide a level of auditing for data forensics.
  • Backup copies of data exist up to 3 years for reference.
  • The seed key has not changed.

The problem

Selecting the first 10 rows saved in the table (By AUTO_INCREMENT Key and confirmed by dates), 8 of 10 are now corrupt. Select the last 10 rows inserted, zero are corrupt. Across 20,000 records 75% are now corrupt.

A lot of analysis has been performed to identify and track the data that was recorded, a certain amount of data forensics, and it was confirmed information was successfully processed last month for example. As this performs financial transactions, there is a lot more auditing available and being reviewed however it is simply a mystery that I can’t solve.

  • What options remain? is this a Hardware problem, Disk or even Memory.
  • What other data maybe corrupt?
  • How can more investigation occur to track the cause of the problem.
mysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.10a-standard |
+------------------+
1 row in set (0.00 sec)

MySQL: Planet MySQL

Updated Website and Professional Blog

For those that have my existing blog bookmarked, or use any RSS/Atom feeds please update your information now.

My new blog can be found at http://ronaldbradford.com/blog/

RSS 2 is http://ronaldbradford.com/blog/feed/rss2, Atom is http://ronaldbradford.com/blog/feed/atom

Ok, so before you read on please change now. While all old links will redirect I encourage you to take a few moments in your browser and RSS reader.

So what’s new. Well as Baron stated in I moved this blog to pairLite with zero downtime, and it was easy I moved my old blog site blog.arabx.com.au and my temporary staging site ronaldbradford.com and transitioning my passworded beta site all with no downtime. Even with all my testing over several weeks, and several test users, my production migration had one hitch with some internal url re-directs not working correctly, but my overall site main links were not affected. You would have had to been quick at 1:30am on Saturday to catch it.

Other features is a consolidation of information from various sites now all under my brand ronaldbradford.com and allowing me now to expand on that. I’ve also split my professional blog from my personal views, photos and comments as well as new SEO/SEM optimizations. More on these experiences later. My theme was actually inspired by some business cards I had made in Australia while home last week, quality 300GSM color cards that were designed and printed in a just a few hours and are very good quality (which was very impressive).

I’m now free to work on so many other projects including many MySQL draft posts now Version 1.0 of my new site is out. As with all release early, release often sites, expect 1.1 soon!

MySQL: Planet MySQL

Everything open source from Sun

In the recent interview Missed Twitter Questions from Jonathan Schwartz Interview at Web 2.0 Expo Sun CEO Jonathan Schwartz is quoted as saying “Everything Sun delivers will be freely available, via a free and open license (either GPL, LGPL or Mozilla/CDDL), to the community.”

Presently not everything is under one of these licenses. Java getting fully Open Source highlights that Java is getting there, but still contains closed source components. Open Solaris, NetBeans, Glassfish, Virtual Box and Open Office are. Even the mainline Solaris 10 is. Star Office is one that is not.

MySQL is also not there and presently has at least three different licenses. You have GPL for the MySQL Community Server. You have a subscription model for MySQL Enterprise which includes additional bug fixes not available in the Community Server. The subscription model also includes the MySQL Enterprise Monitor, software that will fail to operate if your subscription lapses. Additional upcoming MySQL Proxy features will also be subscription or have been termed “closed source” only. You also have the different model used by MySQL Workbench for licenses.

I know many that work on these products at MySQL do not agree with the various license policies. Indeed recent comments from Rich Green of Sun have also indicated we are not going to change anything at MySQL.

I was initially interested if Sun would move MySQL to the CDDL, however this question was recently raised and Marten Mickos stated there were no plans to move away from the GPL v2.

I can only hope that as Sun continues to promote itself as the largest open source company, these differences subside and disappear, and don’t continue to evolve and change.

MySQL: Planet MySQL

Migrating my blog & updating WordPress

I’m migrating my existing WordPress run blog site at blog.arabx.com.au to a my new site ronaldbradford.com (which is not yet publically available)

As part of this process I’ll be doing a number of upgrades/changes including:

  1. Update blog software to 2.5.1 from 2.0.2 (I’d previously done a 2.0.2 upgrade to 2.3.2, but not deployed)
  2. Migrate to new domain
  3. Upgrade existing MySQL 5.1 version from 5.1.11 to 5.1.24
  4. Migrate database to using MySQL 5.1.24, from 5.0.22 (my server runs 5.0 and 5.1 instances)
  5. Split my blog into Professional & Personal

Upgrading
The upgrade is straightforward, backup database, download latest wordpress software. I run full revision to older versions via directories + symlinks so my installation is more complicated, but fully recoverable. Install, and run upgrade script. That all works, but my site breaks. Suspecting is my heavily customized them, by disabling that my site is up. One to add to the TODO list.

Migrate to new domain
Dump + Reload data into new schema. Copy Wordpress install. I had to make two data changes to correctly use the new domain.


update wp_options set option_value='http://ronaldbradford.com/blog' where option_name='siteurl';
update wp_options set option_value='http://ronaldbradford.com/blog' where option_name='home';

Upgrade MySQL 5.1 version
That was also relatively straightforward. I was surprised I was running such an old 5.1.11, but I remember originally using 5.1.6 in production use before that.

Migrate database to 5.1 from 5.0
This is where my problems have begun. Wordpress does not appear to like host+port stuff. I can confirm access via MySQL client. At 9:30pm Sunday night, this may have to remain in the unresolved bin for a few days.

MySQL: Planet MySQL

Give MySQL a break please

In a unique display of mass hysteria, one blogger after the other and even slashdot (no, I’m not going to link) managed to take the completely innocent message that certain new enterprise features might get released as closed source only and turn it into an ongoing bad press onslaught about “MySQL closing down source code”.

Why don’t you all give MySQL a break here please?  The rule is always the same for everybody: the one that writes the code gets to pick the license.  Listen, I 100% believe in open source and I consider myself to be a big advocate, but commercial open source companies like MySQL (and Pentaho) are commercial entities.  At lease try to put yourself in their position for a second.  For example, if a customer asks you to NOT to release a piece of software they paid for, you don’t release it, it’s that simple.

In the end, what MySQL is doing is simple: they are experimenting with a commercial open source  (COS) model.  Why are they experimenting?  Because the concept of COS is very new and there are no clear guidelines.  It simply hasn’t been done before.  How do you keep growing?  How do you keep paying more open source developers?  How do you pay for the millions of web hits each day?  How do you pay for the millions of downloads, the Tera bytes of internet traffic?  How do you guarantee your long term survival?  How do you strike a balance between commercial success and widespread open source adoption?  How do you keep your investors happy as well as your community?

I guess we learned one thing the past week : it’s easier to spout criticism than to give answers to these tough questions.

Matt

MySQL: Planet MySQL

Making business decisions for the community and the enterprise

I was prompted following a few key words by Marten Mickos at the Sun Dinner on Wednesday evening, and subsequent one on one discussion with Marten, to post my thoughts of some significant news this week announced at the MySQL Conference. The decision to provide as it’s been termed is “Enterprise only features”. It is unfortunate this was not discussed in Marten’s opening keynote, having been exposed the evening before in the Partner’s meeting and hitting the blog sphere before the conference officially started.

MySQL, past, present and future as an Open Source company requires a functional business model to succeed. This includes the funding of resources and the technology progression. It is also necessary in this business climate to build a successful business quickly. How do you do this? Well that’s probably the difference between a successful CEO and an unsuccessful one, and what Marten Mickos has produced is clearly very successful.

I may not necessarily agree with the decisions made, more specifically not understanding at this time the rationale of which features are free and which are likely to be commercial, but I respect the decision made by Marten Mickos. These new feature considerations are in a future release of MySQL, they are also I’m sure not yet set in stone, however MySQL can not be all things to all people, no software can. It reminds me of the Homer Simpson car, designed to do everything Homer wanted in a car, but it bankrupted the previously successful company due the views of one individual to solve all their own needs, but not the needs of the majority. Who is affected by this decision, who will benefit, again it’s too early to tell.

Monty indicates this is a MySQL decision, not a Sun decision. This indicates the transition of MySQL to being under the Sun banner of the largest open source company is well, still in transition. Today it was again confirmed to me, that the MySQL database will always be GPL and MySQL will not never revoke the functionality that powers the world’s largest websites as free software for the database server.

As an advocate for the MySQL community, I’d like to consider myself one of the pulses, a thought in the MySQL conscious and even a vocal lobbyist. I am however not interested in being a disruptor in the MySQL ecosystem. A Communications Lesson on Slashdot I believe correctly states “Sun to Begin Close Sourcing MySQL.” The headline is wrong.

A number of people have posted their comments, let’s stop bickering about it, and let’s see something positive happen for the benefit of the community. For example, as I’ve mentioned previously regarding the lack of differentiation for the Community version, again mentioned by Mark Callaghan in A better (community) HEAP engine where a worthwhile patch can’t be of benefit to the community in a binary release for the lay person.

As new COO of PrimeBase Technologies and in charge of the Business Solutions division, I am responsible for contributing to the success of the company, to play a significant role in the functional business model, to ensure funding for resources and technology progression. In other terms, how can revenue generation be achieved to fund prominently salaries of staff, including my own. How am I going to do this? This will be the difference between my huge success or not.

MySQL: Planet MySQL

Continued confusion in MySQL/Sun release policy

In review of some list posts today, I came across the Falcon Preview 6.0.5 downloads available from the MySQL Forge (even that is unclear, but the directory indicates this on the forge). The Forge Wiki Documentation indicates the 6.0.5 release features (without a download link), however the official MySQL downloads for 6.0, and the directory structure for all MySQL releases only describes 6.0.4.

Nothing in the MySQL News and Events of this new release. The documentation for 6.0.5 is also unclear as it lists this version as Not Yet Released.

This comes on the heels of “Pending General Availability/near-final release candidate” (See here and here), and the Press Releases for Sun Celebrates Third-Party MySQL Storage Engines only to be negated by Sun Microsystems Announces MySQL 5.1 which broke the plugins as they were not compatible.

I don’t wish to create confusion, but it’s unclear what is happening here, not withstanding the new preview only has 5 builds and excludes Mac OS/X which means I can’t natively test.

MySQL: Planet MySQL

Media Blob Streaming getting a Google boost

The 2008 Google Summer of Code MySQL Projects are now available. MySQL has 14 listed projects, one of the ~190 different Open Source products listed. Unfortunately there is no summary to see the total number of projects being sponsored across all products.

Media Blob Streaming actually has the luxury of two approved projects, so we will have plenty of mentoring work at PrimeBase Technologies.

Raj Kissu Rajandran will be working on BLOB Streaming Support for phpMyAdmin and KishoreKumar Bairi on Streaming Enabled MySQL Driver for PHP. Welcome to our team for your respective projects.

MySQL: Planet MySQL

Back from MySQL User Conference

I thought I had already conquered the jet lag last night when I fell asleep at 22:00 (that's 10 pm...). But then I woke up somewhere around 2:30, read a book for a while until I decided to give up and have a nice breakfast. I kind of like jet lag though, it is a nice feeling to get out of bed 5:30, when no one else is awake, and birds are singing outside.

There would be a lot to tell about the conference, but you kind of find summaries from many live bloggers on Planet MySQL. One interesting aspect of the conference of course was to meet so many interesting people, many of whom I work with of course, but meeting them in flesh is still great. And Santa Clara being in Silicon Valley adds another funny revelational feeling to it all. For a European Yahoo, Google, Digg and others are Internet companies and seeing that they actually do have tangible offices in Silicon Valley was a surprisingly unreal revelation to me. Oh yes, I also saw the Transmeta offices, you know, where Linus went to work after graduating with the M.Sc. work titled "Linux: A portable operating system" (What did YOU do for master's thesis? I know I did multiple choice quizzes, I'm not kidding...). And when strolling in Santa Clara I was also amused to find out that apparently Freedom also begins there, just a few blocks away from where the conference was held!

Road sign says "Begin Freedom CIR"

read more

MySQL: Planet MySQL

The database frontier

Jay’s opening lines regarding the final MySQL Conference keynote speaker was: “I work with a lot of data. I think peta-bytes, maybe exa-bytes”. This was relating to Jacek Becla from the Stanford Linear Accelerator Center, giving his presentation on “The Science and Fiction of Petascale Analytics”.

The goal of the Large Synoptics Survey Telescope (LSST) is the storage of 50+ PB of images and 20+ PB data.
Let’s just clarify the size. 20 PB of data = 20 years of HD Movies = 2000 years of 128kb MP3

The next database frontier is obviously building huge databases. What part will MySQL or other relational databases play? Some interesting facts were.

  • The Digital Universe Created 161 Exabytes of data last year.
  • Google, processes 20 petabytes of data per day.

The Operational plan for LSST Project Timeline is 10 years, only starting in 2014. The timeline:

  • 2009 Choosing Technology
  • 2010-2014 constructions
  • 2014-2023 production

The primary goals are: Scale, parallelize, fault tolerant.

MySQL: Planet MySQL

MySQL Conference Liveblogging: Optimizing MySQL For High Volume Data Logging Applications (Thursday 2:50PM)

  • http://en.oreilly.com/mysql2008/public/schedule/detail/874
  • presented by Charles Lee of Hyperic
  • Hyperic has the best performance with MySQL out of MySQL, Oracle, and Postgres in their application
  • I suddenly remember hyperic was highly recommended above nagios in MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)
  • performance bottleneck
    • the database
      • CPU
      • memory
    • IO
      • disk latency
      • network latency
    • slow queries
  • media size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per service)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
    • agent collects data and sends reports to server with multiple data points
    • server batch inserts metric data points
    • if network connection fails, agent continues to collect but server "backfills" unavailable
    • when agent reconnects, spooled data overwrite backfilled data points (why not use REPLACE for all inserts?)
  • things are very basic so far
  • batch insert
    • INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
    • using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
    • batch inserts are limited by 'max_allowed_packet'
  • other options for increasing insert speed
    • set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
    • set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
    • Hyperic doesn't use the 2 above
  • INSERT … ON DUPLICATE KEY UPDATE
    • when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
    • it's much slower but it allows
  • this is all basic, where are the performance tweaks?!
  • batch aggregate inserter
    • queue metric data from separate agent reports
      • minimize number of inserts, connections, CPU load
      • maximize workload efficiency
    • optimal configuration for 700 agents
      • 3 workers
      • 2000 batch size seems to work best
      • queue size of 4,000,000
    • this seems to peak at 2.2mil metric data inserts per minute
  • data consolidation
    • inspired by rrdtool
    • lower resolution tables track min, avg, and max
    • data compression runs hourly
    • size limit 2 days
    • every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
    • this is is a good approach if you don't care about each data point
  • I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry Charles )
  • software partitioning
    • measurement data split into 18 tables, representing 9 days (2 per day)
    • they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
    • oddly, Charles didn't actually use the word 'shard' once
    • tables truncated, rather than deleting rows => huge performance boost
    • truncation vs deletion
      • deletion causes contention on rows
      • truncation doesn't produce fragmentation
      • truncation just drops and recreates the table - single DDL operation
  • indexes
    • every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
    • advantages
      • selects faster - row data is on the same page where the index search leads
      • inserts in (timestamp) order - avoid page splits and fragmentation
    • shows comparison between non-clustered index and clustered index (see slides)
  • still no mention of configuration tweaks
  • UNION ALL works better than inner SELECTS because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
  • recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after
Similar Posts:

MySQL: Planet MySQL

MySQL Conference Liveblogging: MySQL Hidden Treasures (Thursday 11:55PM)

  • Damien Seguy of Nexen Services presents
  • easiest session of all (phew, that's a relief)
  • clever SQL recipes
  • tweaking SQL queries
  • shows an example where SELECT is ORDERED by a column that is actually an enum.
    • an enum is both a string and a number
    • sorted by number
    • displayed as string
    • can be sorted by string if it's cast as string
  • compact column
    • compacts storage
    • faster to search
    • if (var)char is turned into enum, some space can be saved, shows example
  • random order
    • order by rand(1) - obviously
    • the integer parameter is actually a seed
    • really slow, also obviously, especially for larger tables because it has to order first, then apply rand() to the list
    • another solution is to add an extra column, put random values into it, and add index, then
      • UPDATE tbl SET chaos=RAND();
    • random extraction
      • SELECT id, col FROM tbl JOIN
          (SELECT CEIL(RAND() *
            (SELECT MAX(id) FROM tbl)) AS r)
        AS r2 ON id=r;
    • for multiple rows, Damien offers a solution involving an integer table (see his own slides). This solution is supposed to be really fast.
  • integer table
    • generate a table with only 10 integers
    • do cross joins to get 100, 10,000, etc integers
    • 10mil rows takes only 13 seconds (on his laptop), then either store it in another table or just use the result set
    • yet again, code is shows, so see slides
  • generating alphabet
    • shows a handy query
  • generating text
    • offers a long query that uses ELT(), CONCAT(), and RAND() (see slides)
    • a SELECT that builds words uses GROUP_CONCAT()
    • looks pretty useful
  • ASCII art
    • SELECT version, REPEAT('*', percentage * 2) AS bars FROM php_versions;
    • shows a nifty result set with stars as fillers in percentage bars
    • shows another result set, similar to the one above, with spaces as fillers and stars at the end
    • shows another example that looks like some sort of a spaceship, from mysql forge
  • GROUP_CONCAT (can I get a woot? I love this function)
    • CONCAT() and CONCAT_WS() now for groups
    • ORDER BY
    • SEPARATOR
    • limited to 1kb by default, change group_concat_max_len
    • shows example (see slides)
  • how to get a second to last value?
    • uses GROUP_CONCAT, looks horribly long and inefficient, there has to be a better way
    • somebody suggests using ORDER BY with LIMIT
  • rankings
    • uses MySQL vars
    • not as many people as I thought seem to know about it, judging by the low amount of raised hands
    • SET @var:=3
    • SELECT @var;
    • variables are not shared between connections, destroyed at the end of the connection
    • uses LEAST()
    • see slides
  • agile loading
    • SET @i:=0
    • LOAD DATA INFILE '/tmp/stats.txt'
      INTO TABLE statPHPload
        (@date, @php, @country, @continent)
      SET
        id = 0,
        period = date(STR_TO_DATE(@date, '%d-%b-%y')),
        rank = (@i := @i + 1),
        php = CAST(REPLACE(@php, ',', '.') AS DECIMAL),
        country = @country;
Similar Posts: