03.08.07

HA JDBC – High Availability JDBC

Posted in General at 2:07 pm by tom

Some time ago I worked on a project which was in need of a way to cluster databases. For those of you who don’t exactly know what database clustering is: database clustering is a way to have multiple databases work together to act like a single database. A cluster of databases typically has the following benefits:

  • A cluster has a query throughput that is much higher than just a single database.
  • A database in the cluster can fail without losing access to the data.

The result is that you now have access to a ‘database’ that is never down and can handle a lot more queries than a single database could. We didn’t care much about the higher query throughput but were very interested in a database that would always be available.

After some research I found out that there were a lot of products available to accomplish this.

We decided to use HA-JDBC (= High Availability Java DataBase Connectivity), an open source Java framework that offers database clustering. In this blog entry, I will tell you about my experiences with this relatively unknown framework, and hopefully share experiences with people who also use it. But first, I will try to explain how this framework does its job.

HOW HA-JDBC WORKS

Normally when you connect to a database, you use a JDBC connection. HA-JDBC wraps one or more connections and acts like a proxy in your javacode. This means that your javacode interacts with the proxy and is transparently communicating with multiple databases.

In order for HA-JDBC to know which connections to proxy, it needs an XML configuration file. This file defines how the cluster is configured. It defines the information needed to connect to the databases, but also the behaviour of the cluster itself.Below you can read about the experiences I’ve had with a few different aspects of HA-JDBC.

SYNCHRONIZATION

One aspect of HA-JDBC is synchronization. Synchronization in HA-JDBC is the process of making an out-of-date database up-to-date again by comparing its data with other databases in the cluster. A database can get out-of-date when it gets turned off for some reason. It could have died on its own or someone could have turned it off on purpose, but in any case: it needs synchronization to have its data up-to-date again. When a database goes down, users of the application will not notice. They can continue using the application like nothing is going on because other databases of the cluster are still available.

When a database is down, it will rapidly get out-of-date because it will not process any updates anymore. When the database is started again, HA-JDBC will pick this up and executes a synchronization strategy that is configured in the XML configuration file. The downside to this is that the built-in strategies of HA-JDBC are not very efficient. These are the ones packaged with HA-JDBC:

  • FullSynchronizationStrategy: Deletes the content of all tables of the database that is being updated and fills them again with data from a different (up-to-date) database in the cluster.
  • DifferentialSynchronizationStrategy: Compares all rows of the out-of-date database with a different (up-to-date) database in the cluster to find out which rows need to be updated, inserted or deleted.
  • PassiveSynchronizationStrategy: There also is a strategy that assumes no updates have taken place during the down-time and is therefore doing nothing.

A nice thing about HA-JDBC is that you can implement your own synchronization strategy. Since the above mentioned strategies will take hours to complete on large tables we decided to write a strategy ourselves. This strategy requires tables to have a timestamp for versioning, does not support deletes but turned out to be a lot faster than the built-in strategies of HA-JDBC.

 

 

ID GENERATION

When using a single database you can let the database itself generate IDs for records you insert. When using multiple databases with HA-JDBC you can still do that, but there is no guarantee that all databases in the cluster will generate the same ID. When a different ID is generated in each database, this will leave your cluster in an invalid state because now all the databases in this cluster contain different data.

Of course there is a solution to this problem but this isn’t pretty. When using an ORM tool like Hibernate, you can specify a generator for the ID field. By default Hibernate makes the database responsible for generating IDs which is not what we want. When using HA-JDBC you should use one of the following generators:

  • UUID-generator
  • HiLo-generator.

These two generators both don’t depend on an individual database, which is just what we need, but they do not produce normal IDs. For example the UUID generator generates IDs like ‘4028828d-0dc7f2a2-010d-c7f2a4d3-0013’. This value is based on the current timestamp and the IP address of the machine the application is on. The HiLo generator generates normal numbers but they don’t increase like you’re used to. It is possible that the first number it generates is 432 and the next one is 33200, which you wouldn’t expect from IDs.

NON-INTRUSIVE?

When I started using HA-JDBC, I expected it to be non-intrusive to our project. Because the only thing we needed was to change our JDBC driver and write a simple XML configuration file for it. But as you have read in this blog entry, you first of all need to write your own strategy for synchronization, and second of all, you probably have to switch to UUIDs. This requires a lot of refactoring all over your code because you are now switching from Long typed IDs to String typed IDs. So in fact it does influence your project more than you would expect.

CONCLUSION

In conclusion, HA-JDBC is very easy to set up and has well written documentation on its website. It performs quite well, especially when writing a customized strategy for synchronization. Since it delegates calls to underlying JDBC drivers directly, it is fast and has full JDBC support. You also don’t need anything else then just your database servers and your application servers. But there are a few issues with HA-JDBC that are a bit annoying, you will probably end up with having UUIDs for records in your database and having to write your own synchronization strategy.

I was wondering if there were any other people that have some experience with this database clustering approach and would like to share their experiences. So if you have any experience with HA-JDBC, don’t hesitate to leave a comment!

21 Comments »

  1. opensourcereader said,

    March 8, 2007 at 5:35 pm

    Thanks for sharing this, have you looked at sequoia ( http://sequoia.continuent.org/ ) when choosing ha-jdbc ?
    Happy coding !

  2. Anonymous coward said,

    March 8, 2007 at 8:15 pm

    With Hibernate, using UUIDs for identifiers is a good idea anyway, since it makes implementing correct hashCode() / equals() semantics in your persistent POJOs vastly simpler. See http://jroller.com/page/jcarreira?entry=overcoming_the_hashcode_object_identity for more details.

  3. tom said,

    March 9, 2007 at 11:48 am

    Yes I did look at Sequoia before I chose HA-JDBC. There actually is a comparison to Sequoia on the HA-JDBC website:
    http://ha-jdbc.sourceforge.net/faq.html#faq-N10139

  4. tom said,

    March 9, 2007 at 11:54 am

    I didn’t realize using UUIDs could have these kind of benefits. But I do know that the downside of using it is that if you create dummy data (sql scripts) or manually change stuff in the database, it is a pain in the ass to have to type in a 32-character long string instead of just a number.

  5. Hanson Char said,

    March 22, 2007 at 7:18 pm

    Hi Tom,

    Thanks for the interesting blog.

    Have you considered using “sequence ID interleaving” ? For example, sequence A of db1 generates id’s like 1,11,21… and sequence A in db2 generates 2,12,22, etc. Both db1 and db2 are in the same cluster.

    Now the allocation of a sequenc id (via a select query) can come from either db1 or db2 but will never collide. An inserted record uses the allocated id and would be the same and unique across all clustered db’s (up to 10 in this case.)

    Any chance you can share the code of the time based synchronization stragtegy mentioned in the blog ?

    Cheers

  6. tom said,

    March 23, 2007 at 12:51 pm

    We never considered using sequence ID interleaving, but I think it is a very good idea to use for HA-JDBC. Unfortunately we are using Mysql 4 which doesnt support sequences. So at the moment this solution is not suitable for us.

    About the sync. strategy: I am allowed to post the code and I will do this as soon as possible. So stay tuned!

  7. Hanson Char said,

    April 6, 2007 at 5:34 pm

    http://hansonchar.blogspot.com/2007/04/oracle-table-differential.html

  8. tom said,

    April 12, 2007 at 9:46 am

    Sorry for the late response, but here it is. http://home.planet.nl/~zumme005/jteam/sync-strategy.rar

    I included all the required dependencies as well as a JUnit test using in memory databases to see the strategy in action. If you you use IntelliJ as your IDE you can open the project file that is included to be able to quickly run the test.

  9. Brent said,

    June 12, 2007 at 6:58 pm

    Tom,

    I need some help on getting started. I don\’t understand how ha-jdbc is started. Is this a separate controller service similar to a controller in sequoia? I\’m a bit confused… Or does this work only inside a container like an mbean? Can you use it outside of a container? Any links, docs, etc.. would be helpful. Or just send me an email with info to brent.ryan[at]gmail.com.

    Thanks,
    Brent

  10. mike said,

    June 28, 2007 at 4:28 am

    I am getting this error on JBoss 4.0.5 when deploying with ha-jdbc:
    08:57:17,446 INFO [DatabaseClusterFactory] Initializing HA-JDBC 1.1.11 from file:/D:/jboss-4.0.5.GA/server/all/conf/ha-jdbc.xml
    09:03:48,221 ERROR [DatabaseClusterFactory] Failed to load HA-JDBC configuration from file:/D:/jboss-4.0.5.GA/server/all/conf/ha-jdbc.xml
    Error reading from stream
    java.io.IOException: Read error
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:194)
    at org.jibx.runtime.impl.InputStreamWrapper.fillBuffer(InputStreamWrapper.java:156)
    at org.jibx.runtime.impl.InputStreamWrapper.require(InputStreamWrapper.java:183)

    I put the ha-jndi.xml in /config dir, but jibx is unable to load it. Any idea?

  11. tom said,

    July 5, 2007 at 1:39 pm

    Hey Mike,

    It looks like your problem has nothing to do with HA-JDBC. It has something to do with reading a file, the exception gets thrown by a native method. So check if this file can actually be read by HA-JDBC.
    - Is the file URL correct?
    - Does the process have the rights to read the file?
    - Is the file corrupt?
    - etc…

  12. cricri said,

    July 9, 2007 at 2:44 pm

    Hello,
    Can anyone help me because I am a newbie and I am a little bit lost?
    I am getting the following error:

    31 [http-8084-Processor23] INFO net.sf.hajdbc.DatabaseClusterFactory - Initialisation de HA-JDBC 1.1.12 de file://localhost/D:/ha-jdbc-cluster.xml
    63 [http-8084-Processor23] ERROR net.sf.hajdbc.DatabaseClusterFactory - null
    java.lang.ExceptionInInitializerError
    at net.sf.hajdbc.JiBX_bindingFactory.createUnmarshallingContext()
    at net.sf.hajdbc.DatabaseClusterFactory.createDatabaseClusterFactory(DatabaseClusterFactory.java:161)
    at net.sf.hajdbc.DatabaseClusterFactory.getInstance(DatabaseClusterFactory.java:128)
    at net.sf.hajdbc.sql.Driver.getDatabaseCluster(Driver.java:152)
    at net.sf.hajdbc.sql.Driver.acceptsURL(Driver.java:68)
    at java.sql.DriverManager.getDriver(DriverManager.java:232)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:773)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.getConnection(BasicDataSource.java:540)
    at org.hibernate.connection.DatasourceConnectionProvider.getConnection(DatasourceConnectionProvider.java:69)
    at org.hibernate.cfg.SettingsFactory.buildSettings(SettingsFactory.java:76)
    at org.hibernate.cfg.Configuration.buildSettings(Configuration.java:1933)
    at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1216)
    at com.kpn.nge.work.HibernateUtil.(HibernateUtil.java:41)
    at com.kpn.nge.web.HibernateSessionFilter.doBeforeProcessing(HibernateSessionFilter.java:45)
    at com.kpn.nge.web.HibernateSessionFilter.doFilter(HibernateSessionFilter.java:68)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:432)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)
    Caused by: java.lang.RuntimeException: Specified parser factory class org.jibx.runtime.impl.StAXReaderFactory does not implement IXMLReaderFactory interface
    at org.jibx.runtime.impl.UnmarshallingContext.createReaderFactory(UnmarshallingContext.java:181)
    at org.jibx.runtime.impl.UnmarshallingContext.(UnmarshallingContext.java:73)

    Am i supposed to do something with JDNI parameters?
    If it is the case, could anyone tell me how to do?

    Thanks in advance.

  13. Caleb said,

    April 15, 2008 at 5:16 pm

    An aside… for uuid based objects you need not use strings, one can use a 2 part key consisting of 2 longs (BIGINT). Consider this if row size is an issue

    @MappedSuperclass
    @Table(
    uniqueConstraints={@UniqueConstraint(columnNames={”_uuid1_”, “_uuid2_”})}
    )
    abstract class baseName

    …SNIP…

    private UUID uuid = java.util.UUID.randomUUID();
    … SNIP…

    @Transient
    public UUID getUUID() {
    return uuid;
    }

    @Column(name=”_uuid1_”,nullable=false,updatable=false)
    protected long getUuid1() {
    return uuid.getLeastSignificantBits();
    }

    /** For use by the persistent layer only! */
    @SuppressWarnings(”unused”)
    private void setUuid1(long part1) {
    long part2 = uuid.getMostSignificantBits();
    uuid = new UUID(part2,part1);
    }

    @Column(name=”_uuid2_”,nullable=false,updatable=false)
    protected long getUuid2() {
    return uuid.getMostSignificantBits();
    }

    /** For use by the persistent layer only! */
    @SuppressWarnings(”unused”)
    private void setUuid2(long part2) {
    long part1 = uuid.getLeastSignificantBits();
    uuid = new UUID(part2,part1);
    }

    Have fun….
    Cheers

  14. Wen said,

    April 18, 2008 at 10:41 am

    I want to use the ha-jdbc with hibernate in the spring framework, but how to inject the bean of ha-jdbc in the spring container? Also how to configure a datasource of ha-jdbc for the org.springframework.orm.hibernate3.LocalSessionFactoryBean which spring provides for supportting hibernate?

  15. andrey said,

    September 12, 2008 at 11:52 am

    Unfortunately we are using Mysql 4 which doesnt support sequences.

  16. kolia said,

    September 12, 2008 at 11:53 am

    Is the file URL correct?

  17. Davide said,

    December 24, 2008 at 7:50 pm

    Hi,
    usually when i need to read/write to database i prefer to use an ORM solution.
    My preferred implementation is Cayenne… but also Hibernate do this job very well.
    Based on your experience, is possibile to use this kind of persistence framework in conjunction with HA-JDBC?
    There is some documentation available? I haven’t found it :-(

    Thanks
    Davide

  18. yassola said,

    April 22, 2009 at 10:59 am

    hi,

    anybody can help, I want to use LEXST Database Cluster, because I assume my table will expand too big.
    But it need knowledge of JSP. it’s a pity for me.

    Can I use Ha-JDBC, even I don’t know how to write JSP ? (I can only use PHP or ASP).

    thank you, your help appreciated.

  19. tom said,

    April 29, 2009 at 2:25 pm

    Again, to clarify, using HA-JDBC is nothing more than using a normal JDBC approach. So for you guys who are wondering whether you can use ORM like Hibernate: yes you can! In fact, you can take an existing Hibernate application, replace the JDBC driver you’re using with HA-JDBC and you’re done!

    And yassola, you were asking about JSP … I don’t really know what you mean, since JSP has nothing to do with JDBC

  20. Mou said,

    June 9, 2009 at 11:51 am

    Sorry for a newbie question, I configured ha-jdbc, but when the connection is created the thread start working in the background and the application flow don’t move further.

    hibernate.cfg.xml

    jdbc:ha-jdbc:cluster1
    org.hibernate.dialect.MySQLDialect
    hajdbcdb
    root
    root
    net.sf.hajdbc.sql.Driver

    ha-jdbc-cluster1.xml

    com.mysql.jdbc.Driver
    jdbc:mysql://localhost:3306/hajdbcdb?autoReconnect=true&useUnicode=true&characterEncoding=utf8&characterSetResults=utf8
    root
    root

    System.out.println(”Loading Session Factory”);
    SessionFactory sessionFactory = new Configuration().configure(
    “hibernate.cfg.xml”).buildSessionFactory();
    session = sessionFactory.openSession();
    Transaction tx = session.beginTransaction();
    System.out.println(”Inserting Record”);
    Idtest test = new Idtest();
    test.setName(”Test”);
    session.beginTransaction();
    session.save(test);
    tx.commit();
    System.out.println(”Done”);

    Application only prints “Loading Session Factory”!!

    any idea what we are doing wrong?

  21. Bin said,

    March 24, 2010 at 12:42 pm

    Can we use HAJDBC in GDAF framework?

Leave a Comment