The Future of the Database
New Groups of Users
Within the last few years, relational database management systems (RDBMS), such as Oracle and Sybase, has seen a whole new class of users. Up until recently, an RDBMS was primarily a tool for the big corporations. They’d use it to manage their payroll, for production management, for financial management, etc. In short, mission-critical applications where data integrity and security is far more important than flexibility or cost. Joe Nerd had no interest in using a database for his own stuff. There’s no way that he could ever create enough data all by himself, to warrant the huge cost and pain of installing and maintaining Oracle.
Enter the web, and the picture changes. Now, Joe Nerd wants to run a web site. Any interesting web site includes a way for the users of the web site to collaborate. Joe Nerd may own and run it, but the users of the web site share their comments, talk to each other, and in general contribute to the contents of the web site. To do that, you need to store your data somewhere and be able to retrieve them dynamically, as a browser requests a web page.
Databases are extremely well suited to this type of application. They take care of all the concurrency issues and does transaction management. Since only the RDBMS touches the files, the risk of corrupting the data files are low. And the query optimizer will make your data retrieval life easy. In short, you can’t really build an interesting web site without having a database sitting behind it.
This changes the user base for databases. The corporate MIS department developer gets company from Joe Nerd in the programmer user space. In the end user space, the user of the final application, the corporate staff gets company from Jane websurfer. As happened with the PC and with the Internet, when a technology is exposed to a new group of users, radical changes happen. But what changes?
Open Source Databases
The new users will be the type of people disposed for contributing to an open source project. These people haven’t really had any use for databases before, but they do now. Since open source programmers primarily program for their own needs, they’ll start hacking on relation database management systems. There is already one RDBMS that’s open source from the ground up, namely <a href=”http://www.postgresql.org/”>PostgreSQL. And <a href=”http://www.interbase.com/”>InterBase, previously a secret-source project, has had its source opened, although not under GPL. The secret-source database vendors, such as Oracle and Sybase, will begin to feel the pressure from open source competition within the next few years.
But the new types of users also has different demands than the old ones did. The applications used in banks don’t change much over time: Their primary concern is data integrity and security over flexibility. The programmers working for banks are used to old-fashioned development tools. The users of banking applications are used to arcane interaction experiences. And banks are wealthy enough that they don’t mind having to spend a lot of resources on maintaining a database server. Needless to say, this is not the case for Joe Nerd and his web application.
Flexible Data Models
On the web, the name of the game is change. A web application is constantly evolving in response to user needs. Often times, you even want the users of your web site to be able to define and evolve their own data models, a’ la a web-based MS Access application.
Current RDBMS software like Oracle and Sybase don’t support this scenario very well. They are designed for static data models. They’re very narrowminded as to what types of changes you’re allowed to make. More importantly, changing the data model isn’t transactional, so you can’t group two or three changes together and make sure that either all or none of them are executed. This is very critical, as there may be a programming error between the second and the third change, so only the two first are executed. These first changes have to somehow be undone, or we’ll have them lying useless around forever.
With current RDBMS software, the developer can chose to bite the bullet, do the data model changes very carefully, and be prepared to manually untangle the mess that will ineviably arise from time to time, when the changes break. Or he can chose to re-implement part of what the RDBMS was supposed to do on top of the RDBMS, which will be slower, less powerful and more work than having the RDBMS do its job.
The market will demand flexible data models. The RDBMS will have to acommodate it by making data definition more flexible and, most importantly, transactional. We need it, if we are to deliver high-quality applications to the end users, and it makes a lot more sense to put it into the database layer, rather than to have every application invent its own ad-hoc solution on top.
Modern Programming Environment
I’ve worked a lot with Oracle lately, and it’s just soooo seventies. Oracle’s proprietary PL/SQL language lack modern language features and doesn’t support compound data types. And the development environment, i.e. getting your code into the RDBMS and maintaining it once its in there, is very cumbersome.
Developers coming from a background in hacking Perl, Tcl, Java using modern development tools will have a hard time getting used to that. They won’t understand why they can’t write their code in Perl, Tcl, Java or C, just because it runs inside the database server. And indeed, with PostgreSQL, you can actually write your stored procedures in Tcl. And there are ways to get Java to run inside Oracle, though the process of getting it in there and maintaining your code in the database is even more cumbersome than for stored procedures written in PL/SQL.
Even more, with databases being used for the web, the data we’re storing in the database is more and more strings. Consequently, the RDBMS will have to get better at managing and manipulating strings. Currently, Oracle has some strange limitations, where you have to completely change your syntax if a string might exceed 4000 characters. And string manipulation in PL/SQL is horrible. This will have to change.
Supporting Explorative Use
The undo function of most desktop applications is one of the most important functionsand one of the hardest to implement on top of an RDBMS. The reason it is so important is that it encourages exploration by the user. The user can learn the application by trying, confident that he can undo any changes that he regrets.
Databases already have a wonderful tool for this in the transaction: You can make lots of changes to the database, but until you decide that you’re done and commit the changes, you still have the option of undoing everything by issuing a rollback command. But you can only undo what you’ve done so far in this transaction. The instant you’ve committed, you can’t change your mind in any other way than by manually and carefully reversing your actions.
I’d like to see a RDBMS that’ll let me pick an old (but not too old) transaction and have that undone. This will make it easier for applications to offer a general undo function to users, and users will be happier. To implement an undo feature in a database application today is very hard work.
The tricky thing here is the interaction among multiple users. What if I post a question on a discussion forum, some other user answers my question, and then I want to undo? Well, basically, I can’t. That’s the hard fact of life. If you say something out loud and someone else acts on it, it’s too late to take it back. But if noone’s yet acted on what the user just did, we might as well let her take it back.
I’m not familiar enough with the details of database theory to judge whether it’s possible or not, but I’d guess that it would be safe to undo an action if no later transaction has read the data the user just touched and made other changes after having read it. The RDBMS knows whether that’s the case or not.
Ease of Maintenance
Joe Nerd doesn’t have spare time or money for installing and maintaing a database installation. The RDBMS has to take on more of the job of maintaining itself and running smoothly, so the developer can focus on adding the features that matters to the users of his web site. I haven’t tried installing and maintaining too many different RDBMS’, but I do know that Oracle is quite a beast to deal with, and that both Interbase and PostgreSQL are much better.
The current trouble falls in two parts: Instaling it and making sure it keeps running smoothly. Installation should be completely straight-forward, and have reasonable defaults, so you don’t even have to change any parameters before installing it. It can surely be done.
Databases often get slower and slower as they get bigger. The RDBMS should be able to automatically defragment as needed, grow as needed, and in general do a reasonable job of running smoothly without human intervention. I don’t have first-hand experience with this, but others have told me that Interbase does an excellent job hereproof that it can be done.
In summary, I believe that the twin effect of new users starting to use databases for new things will have a profound effect on the database market. Within the next three years, I expect to see open source databases have a great impact, and I expect to see them pioneering a number of innovations in RDBMS software.