Fundamentals: How Programmers Use Databases

Target audience for this article:  Beginning programmers who are ready to start working with databases

Deciding on a Database

Clearly you will need to know which database you are using, and how to decide will be extensively covered on this site in other articles.  This article applies to any of them and will go over concepts.

If you are just starting out, and are writing a simple non-web application, SQLite is probably a great choice!  For a web application, MySQL (or one of its forks) is the de-facto standard and included in nearly any web hosting plan.  It also has a lot of newbie-friendly information available.

The following sections go over the parts of code that work with databases.  You should cross reference this with other tutorials and the documentation for your database and your programming language.

Loading Database Libraries

Before you can work with a database, your programming language needs to understand the commands to work with it.  This exact process varies by language, but here are some pointers.

PHP as distributed by Linux distributions (by far the most common use) requires database drivers to be loaded as modules, which are typically installed through your Linux distribution's package manager, which should be strongly preferred.  (SQLite support is built in through the PDO data abstraction layer.)  If a module for your database is not available via your package manager, you can install it via the 'pecl' command.  Once the proper extension module is loaded, your code does not need to do anything else.  It can assume the module its there, and if it isn't your script will fail saying the function is not known.

Python, Perl, and Ruby also use loadable modules for database support.  Like PHP, these should also be installed from your package manager if possible.  If not available there, you can install them via easy_install (Python), CPAN (perl), or gem (Ruby).  These languages do require you to import the module in your program source code before using it.  For example, in Python, this would import the "psycopg2" PostgreSQL driver:

import psycopg2

Java requires that a JDBC driver for your SQL database be installed.  Other drivers are likely different (fixme: this author not a Java expert!)

C/C++:  Writing a database application in C or C++ is not for the faint of heart, as it requires quite a bit of boilerplate code for each operation, and any perceived speed advantage of using C/C++ will be negated by the fact that your code will have to wait for the relatively slow database.  So doing this is uncommon; nevertheless it can be done!  You'll need to first be sure that development packages are installed for the database - if you installed the DB via your Linux distribution package manager, you'll want to install the -dev package (Debian/Ubuntu) or -devel package (Red Hat and clones).  Then you will need to #include the proper header file, and be sure that the linker will know to link the shared library, which contains the actual code to talk to the database, with your program.

What the Database Libraries Actually Do

For non-embedded databases, the libraries provide the code to talk to the database server.  Your program will call the functions in the library and it takes care of all the low-level communication stuff.

For embedded databases such as SQLite, the library contains the actual database code itself!  No server is required, so the entire code actually lives with your program.

Connecting to a Database

So the library is loaded and you're ready to connect.  This is generally accomplished by calling some kind of connect function.  It typically requires these parameters - first, for non-embedded databases:

Host:  Where the database server resides.  It can often by omitted to signify that the database server is located on the same computer as your application (localhost).  The host can generally be specified as an IP address or a DNS name.

Port:  Which TCP/IP port we connect to.  It can usually be omitted to use the default port (each database has one).

User/Password:  Authentication information.  Some databases do not require this, but most do.

Database Name:  Most database servers can have many databases, so you need to specify which one to connect to.

For embedded databases:  Often the filename of the database is sufficient!

Sometimes, connecting to a database server, authentication, and selecting a database are separate steps.  See a tutorial for your specific language/DB for details.

The connect command will generally return a handle to the database connection.  You can use this handle to perform other operations on the database.  You usually only need to connect once per run of the program.  The handle can be re-used for many queries and other operations.

Sending a Query

And now it gets interesting!  You get to send a query to fetch the data you want.  With SQL databases you put the command into a string, and call your database library's query function.  For example (psuedocode):

s = "SELECT id, name FROM employees WHERE start_date > '2010-01-01'
results = db_handle->query(s)

So here we are selecting the name and the ID number of all employees that began work after the first of the year 2010.  This query is simply assigned to a string variable s, and the next line simply passes it to the query function of the object that represents the DB handle.  Of course there are more complicated cases.  You can programatically construct the SQL string based on user criteria.  For example if the user would like to search by salary, or by start date, or by manager, you could append clauses to the WHERE section, and then pass in the constructed query.

CAUTION: Never trust data from the outside world!  Since SQL starts and ends strings inside the query with a single quote, you must be extremely careful with any fields from the outside that you use to construct your query.  If they contain a single quote (or other special characters), those characters must be escaped for security reasons.  If you don't, then you allow what is called an SQL injection attack, whereby the attacker can carefully craft input in order to do unpleasant things with your database!  Most SQL language libraries include appropriate functions to do this escaping -- be sure to pass all string user input through it before sending it to the database.  This is very important!!!

A better way to handle user input is with prepared queries.  Your query can include "?" marks as placeholders, and from your code you can set values for these placeholders directly with the database driver immediately before sending the query.  Not only is this safer, but it is also faster, especially if the query is prepared, compiled, and planned once on the server side so that those steps do not need to happen every time the query is executed.

NoSQL queries are a bit different.  In MongoDB for example, queries are basically JSON objects that can be constructed as normal objects in your langauge of choice and sent to the find command.  Others require queries to be set up as map/reduce jobs on the server (beginners are unlikely to get into this for a while!).  Something like SQL injection is not a problem in these databases that I have heard of.

After a program sends a query, it gets back a handle to the results (called 'results' above).  It can use this handle to loop through the returned data, or ask questions like how many rows are in the result set.

Processing Results

If the program asked for a set of data, there will often be a loop to consume each record individually.  The fields from the record are often displayed on a web page, or used in a calculation.  Of course, the details are highly application specific.