scale

Content Tagged scale

[from bushwald] Anycast

"Anycast is a network addressing and routing scheme whereby data is routed to the 'nearest' or 'best' destination as viewed by the routing topology."

User:jeyrb: del.icio.us/network/jey

On Clouds, the Sun and the Moon

The main value proposition of cloud computing is better economics: it’s cheaper to rent hardware, software platforms and applications (via a per-usage or subscription model) than it is to buy, build and maintain them in the corporate data center. But if we expect that cloud computing is here to stay, and not just a passing fad, it must be feasible for the cloud providers themselves. So how do they do it?

They do it by leveraging economies of scale. Put simply, the idea is that one very large organization can build and operate its infrastructure more efficiently than many small firms can on their own. To better understand this, let’s break down some of the financial advantages leveraged in cloud computing:

Specialization: Specialization is also known as division of labor, a concept popularized by the father of modern economics, Adam Smith. A company for whom running a large-scale data center is a core part of its business will do so much more cost-effectively than a company for whom it’s merely one aspect. The former will hire the best experts in the world, and will have the management attention required to continuously innovate, optimize and improve operations. And the overhead costs associated with doing so will be spread thinly across massive usage. Case in point: Since it needed to use hundreds of thousands of servers, it was worthwhile for Google to build its own, homegrown devices to fit its exact power supply and fault-tolerance needs.

Although in software, anyone can build anything with enough people, time and money (as my old boss used to say, “It’s all ones and zeros”), it makes no sense for individual companies to develop capabilities such as dynamic provisioning, linear scalability and in-memory data partitioning when they’re readily available from off-the-shelf products.

Purchasing Power: Large organizations buy in bulk, which they can leverage to negotiate lower prices. So presumably the cloud provider can acquire servers, networks, operating systems and virtualization software at lower cost. Furthermore, it can negotiate better interest rates, insurance premiums and other contracts.

Utilization: This is perhaps the most important one and what I like to call the Kindergarten Principle, or “sharing is good.” In computing, tremendous savings can be achieved by having multiple companies share the same IT infrastructure.

Experts estimate average data center utilization rates range from 15 percent to 20 percent. If you include the processing, memory and storage capacity available on company-owned laptops and desktops as well, utilization rates may be as low as 5 percent. That’s a lot of waste. Imagine if this were the case in the hospitality industry. In most cases, a hotel with even 50 percent average occupancy rates would quickly go out of business.

So why is this happening with corporate IT?

Application loads are volatile; they experience peaks and troughs based on time of day, day of the week or month, seasons and so on. To avoid hitting the “scalability wall,” companies need to overprovision. So if a company expects a certain daily peak volume (for example, the opening of the trading day for an e-trading application), it will provision enough hardware so that utilization rates at the peak reach no more than 70 percent (leaving some room for unexpected loads – hey, Steve Jobs may announce the next iPhone today). But at other times utilization rates could go as low as 10 percent, with the average somewhere in between.

So the difference between peak loads and average loads drives overprovisioning and a high rate of unused computing capacity. But if we aggregate the activities of several companies, we will not face such volatility in application loads. Let’s see why.

Follow the Sun: In many cases, peaks and troughs in application volumes can largely be attributed to the time of day. Human-facing applications are active during daytime and face very low activity during the night. When New York experiences the opening bell trading spike, London is in the midday lull and Tokyo is going to bed. Same goes for e-commerce sites, social networking sites, gaming sites and others, though these types of applications might experience peaks after business hours as well.

If companies around the globe and in different industries share the same resources on the cloud, higher utilization rates will be achieved by the cloud provider, lowering its costs – savings that it can turn around and pass on to its customers. This model of shared resources even addresses the need to overprovision for unexpected peaks, as it is unlikely that all the cloud users, in all geographical regions and all industries will face peaks at the same time. This is similar to the notion of a bank not having all of the cash reserves necessary to handle the cash commitments to all customers at the same time (is there an equivalent to a bank run in cloud computing?).
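
The pooling argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: the cosine-shaped load curve, the three regions and their UTC offsets are assumptions made up for the example, and only the 70 percent peak target comes from the text above.

```python
import math

PEAK_TARGET = 0.70  # from the article: size hardware so the peak uses at most 70%


def daily_load(utc_offset, peak=100.0, trough=10.0):
    """Hypothetical hourly load (UTC hours 0-23) peaking at 14:00 local time."""
    loads = []
    for utc_hour in range(24):
        local_hour = (utc_hour + utc_offset) % 24
        # Cosine day/night cycle: +1 at 14:00 local, -1 at 02:00 local.
        cycle = math.cos(2 * math.pi * (local_hour - 14) / 24)
        loads.append(trough + (peak - trough) * (cycle + 1) / 2)
    return loads


def capacity_for(loads):
    """Capacity needed so that the busiest hour stays at PEAK_TARGET."""
    return max(loads) / PEAK_TARGET


# Illustrative regions and UTC offsets (assumptions, daylight saving ignored).
regions = {"New York": -5, "London": 0, "Tokyo": 9}
curves = [daily_load(offset) for offset in regions.values()]

average_demand = sum(sum(curve) for curve in curves) / 24

# Each company provisions for its own peak.
separate_capacity = sum(capacity_for(curve) for curve in curves)
separate_util = average_demand / separate_capacity

# One provider provisions for the peak of the combined load.
pooled_load = [sum(hour) for hour in zip(*curves)]
pooled_capacity = capacity_for(pooled_load)
pooled_util = average_demand / pooled_capacity

print(f"separate: capacity {separate_capacity:.0f}, utilization {separate_util:.1%}")
print(f"pooled:   capacity {pooled_capacity:.0f}, utilization {pooled_util:.1%}")
```

With these made-up curves, three companies provisioning on their own average about 38 percent utilization, while a shared pool serving the same demand lands near 60 percent on roughly a third less hardware. Real workloads are spikier than a cosine, so the numbers are only directional.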

Follow the Moon: And with so much focus on energy costs, data center power consumption and cooling (not to mention the environment), there’s also a cloud computing approach known as Follow the Moon. It posits that a cloud provider with physical data centers in several different geographical locations can run the applications that are active from the day side of the world in centers on the night side of the world, taking advantage of lower power and cooling costs.

Cloud computing, therefore, is an economically feasible strategy. Over time, the cost savings will be too compelling for all but the very largest companies to ignore.

Geva Perry is the chief marketing officer of GigaSpaces

Technology-News: GigaOm

Scala - Musical Scales/Tuning Experimenter's Tools

Scala is a powerful software tool for experimentation with musical tunings, such as just intonation scales

scala: del.icio.us/tag/scala

MySQL Conference Liveblogging: Optimizing MySQL For High Volume Data Logging Applications (Thursday 2:50PM)

  • http://en.oreilly.com/mysql2008/public/schedule/detail/874
  • presented by Charles Lee of Hyperic
  • Hyperic's application performs best on MySQL, compared with Oracle and Postgres
  • I suddenly remember Hyperic was highly recommended over Nagios in "MySQL Conference Liveblogging: Monitoring Tools (Wednesday 5:15PM)"
  • performance bottleneck
    • the database
      • CPU
      • memory
    • IO
      • disk latency
      • network latency
    • slow queries
  • medium-size deployment example
    • 300 platforms (300 remote agents collecting data)
    • 2,100 servers
    • 21,000 services (10 services per server), sounds feasible
    • 468,000 metrics (20 metrics per resource: the total matches 23,400 platforms, servers and services × 20, not 21,000 services × 20)
    • 28,800,000 metric data rows per day
    • larger deployments have a lot more of these (sounds crazy)
  • data
    • measurement_id
    • timestamp
    • value
    • primary key (timestamp, measurement_id)
  • data flow
    • agent collects data and sends reports to server with multiple data points
    • server batch inserts metric data points
    • if the network connection fails, the agent continues to collect and spool data, while the server "backfills" the missing data points as unavailable
    • when the agent reconnects, the spooled data overwrites the backfilled data points (why not use REPLACE for all inserts?)
  • things are very basic so far
  • batch insert
    • INSERT INTO TABLE (a,b,c) VALUES (0,0,0), (1,1,1),…
    • using MySQL batch insert statements vs prepared statements with multiple queries in other databases seems to improve overall performance by 30%
    • batch inserts are limited by 'max_allowed_packet'
  • other options for increasing insert speed
    • set unique_checks=0, insert, set unique_checks=1 (definitely need to make sure data is valid first)
    • set foreign_key_checks=0, insert, set foreign_key_checks=1 (same concerns as above)
    • Hyperic doesn't use the 2 above
  • INSERT … ON DUPLICATE KEY UPDATE
    • when regular INSERT fails, retry batch with INSERT ON DUPLICATE KEY syntax
    • it's much slower but it allows existing rows to be overwritten instead of failing the batch
  • this is all basic, where are the performance tweaks?!
  • batch aggregate inserter
    • queue metric data from separate agent reports
      • minimize number of inserts, connections, CPU load
      • maximize workload efficiency
    • optimal configuration for 700 agents
      • 3 workers
      • 2000 batch size seems to work best
      • queue size of 4,000,000
    • this seems to peak at 2.2 million metric data inserts per minute
  • data consolidation
    • inspired by rrdtool
    • lower resolution tables track min, avg, and max
    • data compression runs hourly
    • size limit 2 days
    • every hour, data is rolled up into another table that holds hourly aggregated values with size limit 14 days, then that one gets rolled up into a monthly table, etc
    • this is a good approach if you don't care about each data point
  • I'm overwhelmed by the amount of "you know"s from the speaker. Parasite words, ahh! Sorry, Charles :)
  • software partitioning
    • measurement data split into 18 tables, representing 9 days (2 per day)
    • they didn't want to do more than 2 SELECTs to get data per day, hence such sharding
    • oddly, Charles didn't actually use the word 'shard' once
    • tables truncated, rather than deleting rows => huge performance boost
    • truncation vs deletion
      • deletion causes contention on rows
      • truncation doesn't produce fragmentation
      • truncation just drops and recreates the table - single DDL operation
  • indexes
    • every InnoDB table has a special index called the clustered index (based on primary key) where the physical data for the rows is stored
    • advantages
      • selects faster - row data is on the same page where the index search leads
      • inserts in (timestamp) order - avoid page splits and fragmentation
    • shows comparison between non-clustered index and clustered index (see slides)
  • still no mention of configuration tweaks
  • UNION ALL works better than inner SELECTs because the optimizer didn't optimize them enough (at least in the version these guys are using, not sure which)
  • recommended server options are on the very last slide, I was waiting for those the most! I guess I'll look up the slides after
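
To make the software partitioning notes above more concrete, here is a rough sketch of how 18 half-day tables covering 9 days could be addressed. The table prefix, the UTC alignment of the slots and the modulo rotation are my own assumptions for illustration, not Hyperic's actual scheme, which the talk did not spell out.

```python
from datetime import datetime, timedelta, timezone

SLOT_SECONDS = 12 * 60 * 60          # two tables per day, as in the talk
NUM_TABLES = 18                      # 18 slots cover 9 days before one is reused
TABLE_PREFIX = "metric_data_slice_"  # hypothetical naming scheme


def slot_index(ts: datetime) -> int:
    """Map a timezone-aware timestamp to one of the 18 rotating slots."""
    return (int(ts.timestamp()) // SLOT_SECONDS) % NUM_TABLES


def table_for(ts: datetime) -> str:
    """Table that rows with this timestamp are written to."""
    return f"{TABLE_PREFIX}{slot_index(ts):02d}"


def tables_for_day(day: datetime) -> list[str]:
    """The two tables holding one UTC day, so a daily read needs at most 2 SELECTs."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [table_for(start), table_for(start + timedelta(hours=12))]


def truncate_next_slot(now: datetime) -> str:
    """Statement that empties the slot written next; it holds the oldest data."""
    next_slot = (slot_index(now) + 1) % NUM_TABLES
    return f"TRUNCATE TABLE {TABLE_PREFIX}{next_slot:02d}"


# Arbitrary example timestamp, chosen only for the demonstration.
example = datetime(2008, 4, 17, 14, 50, tzinfo=timezone.utc)
print(table_for(example))
print(tables_for_day(example))
print(truncate_next_slot(example))
```

Purging old data then becomes one TRUNCATE on the slot that is about to be reused, instead of a large DELETE competing with the inserts for rows.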

MySQL: Planet MySQL
