informix

Content Tagged Informix

Log Buffer #101: a Carnival of the Vanities for DBAs

Welcome to the 101st edition of Log Buffer, the weekly review of database blogs.

This edition was originally claimed by Ward Pond for his SQL Server Blog. Unfortunately, Ward is, in his own words, “dealing with the aftermath of a burst appendix,” which is a very good reason not to spend your time at the computer. Ward, heal up soon! We’ll see you on LB before too long.

In lieu of the normal Log Buffer, I throw it open to our readers. Please leave a comment mentioning your favourite database blog items from the week that was, and anything else you care to say about them.

LB will be back to normal next Friday. See you then!

MySQL: Planet MySQL

Generating XML from IDS 9.x

XML: del.icio.us/tag/xml

Open Source ETL tools.

The other day I was looking for an open source, feature-rich, high-performance ETL tool to use in an enterprise environment, and I was disappointed that nothing really seemed to match my requirements. Have I overlooked something, or is this really a niche where there aren’t any viable projects? After looking in the usual places like sourceforge.net and doing a bunch of Google searches, I could not find any products that fit the bill. Here are (some of) my criteria:

  • Fast. The candidate tool has to be able to move huge amounts of information between the source and target databases quickly.
  • Flexible error handling. Data errors occur all the time, and when they are encountered, we should be able to stop processing, log the error to a file, or push the record into a violations table for subsequent processing. There are probably other popular strategies for handling errors, such as changing the offending data and trying to insert it again. Errors like these often arise from conflicts in serial number or time stamp columns.
  • Multi-database and multi-platform support. Multi-database really means all the “biggies”: Oracle, DB2, Informix, MySQL, MS SQL Server, PostgreSQL and Sybase. Multi-platform in this case basically means every flavor of Unix and Linux. I would be willing to consider using MS SQL Server as a data source and a target with the ETL job running remotely on a Unix/Linux machine. Isn’t that what ODBC and JDBC are all about?
  • Easy to deploy and administer. Does the product have complicated dependencies? If it’s written in C, does it need specific versions of unixODBC or of the vendor CLI or ODBC libraries? If it’s written in Java, does it need a specific version of the JDBC driver and the Java runtime?
  • Effective monitoring tools. It would be nice to see how much data has moved and to get an estimated completion time for the ETL job. Monitoring repetitive jobs and alerting when the run time is above or below a certain threshold would also be nice.
  • Data manipulation features.
    • Data scrubbing/data cleansing. In some cases the data being pulled from a transactional system may have invalid values. There should be a mechanism to establish “business rules” (note the “scare quotes”), and the data should be scrubbed to comply with them. Also, error logs should be maintained for the erroneous data, and you’ll need log analysis tools. A huge batch of errors during the ETL process is a certain clue that something has changed (or gone wrong) in the transactional system, and someone should be looking into that problem.
    • Column duplication. During the transform phase of the ETL process, we should be able to replace values in the destination row with other values from the source record. For example, when loading a record with two time stamp columns, you might want to have one of the time stamps “copied” to the other column, replacing that data.
    • Literal value or sequence injection. In this case we want to force a literal value into some column without depending on the database having the correct default set up in the table definition. This can also be used to force a valid unique value into a serial column in cases where there are duplicate serial numbers.
    • Derived columns; reference data. There may be a requirement to add a time stamp to each record to indicate when it was added to the table, which could serve as an audit trail. Also, if the destination table has a column that represents the sum of other columns, it would be a nice feature to be able to populate it “on the fly” as the data is being loaded into the target.
  • Security. Depending on your requirements, you might want the data to move over the network via SSL or another encrypted mechanism. Also, the ETL suite should run as its own user and probably in its own group, so it’s easy to give it an appropriate database role. Views can also be defined on the source tables to facilitate extraction of the appropriate data.
  • Scheduled execution. A cron-style scheduler with more capabilities than just kicking off the job: it should support job chains and conditional execution of jobs when errors occur. It would also be terrific if it had a decent UI, since editing the crontab in vi and maintaining a massive pile of shell scripts is not the most productive way to get things done.
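Several of the transform and monitoring features in the list above (scrubbing against business rules, a violations table, column duplication, literal and sequence injection, derived columns, an estimated completion time) can be illustrated with a small, database-free Python sketch. It is not any particular ETL product's API. Rows are plain dicts, the column names (`id`, `created`, `updated`, `subtotal`, `tax`, `status`, `loaded_at`, `total`) and the `amounts_non_negative` rule are hypothetical, and an in-memory list stands in for the violations table.

```python
from datetime import datetime, timezone
from itertools import count
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]
# A "business rule" returns an error message, or None when the row passes.
Rule = Callable[[Row], Optional[str]]


def amounts_non_negative(row: Row) -> Optional[str]:
    """Hypothetical rule: subtotal and tax must be numbers >= 0."""
    for col in ("subtotal", "tax"):
        val = row.get(col)
        if not isinstance(val, (int, float)) or val < 0:
            return f"bad {col}: {val!r}"
    return None


def transform(rows: Iterable[Row], rules: List[Rule], first_serial: int,
              loaded_at: datetime, stop_on_error: bool = False
              ) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Apply rules and column transforms; return (clean rows, violations)."""
    serials = count(first_serial)
    clean: List[Row] = []
    violations: List[Tuple[Row, str]] = []  # stands in for a violations table
    for src in rows:
        errors = [m for m in (rule(src) for rule in rules) if m is not None]
        if errors:
            if stop_on_error:  # strategy 1: halt the whole job
                raise ValueError(f"row {src!r}: {'; '.join(errors)}")
            violations.append((src, "; ".join(errors)))  # strategy 2: set aside
            continue
        out = dict(src)
        out["updated"] = out["created"]              # column duplication
        out["id"] = next(serials)                    # sequence injection (replaces possibly duplicate serials)
        out["status"] = "LOADED"                     # literal value injection
        out["loaded_at"] = loaded_at                 # derived column: audit time stamp
        out["total"] = out["subtotal"] + out["tax"]  # derived column: sum of other columns
        clean.append(out)
    return clean, violations


def estimated_seconds_left(rows_done: int, rows_total: int, elapsed_s: float) -> float:
    """Linear ETA: assume the remaining rows move at the average rate so far."""
    if rows_done <= 0:
        return float("inf")
    return elapsed_s * (rows_total - rows_done) / rows_done


if __name__ == "__main__":
    ts = datetime(2000, 1, 1, tzinfo=timezone.utc)
    source = [
        {"id": 7, "created": "2000-01-01", "updated": None, "subtotal": 10.0, "tax": 0.5},
        {"id": 7, "created": "2000-01-02", "updated": None, "subtotal": -3.0, "tax": 0.0},
    ]
    loaded, rejected = transform(source, [amounts_non_negative], 1000, ts)
    print(loaded)    # the first row, with id 1000 and total 10.5
    print(rejected)  # the row with subtotal -3.0, plus its error message
    print(estimated_seconds_left(250, 1000, 60.0))  # 180.0
```

The ETA here is a plain linear extrapolation from the average rate so far. A real monitor would probably smooth the rate over a recent window, since load speed can change during a run.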

Even though the list above is only a partial list of my requirements, I have not found any open source software moving in this direction. Suggestions are welcome!

MySQL: Planet MySQL