Glossary

As you work with libre databases, you'll run into these terms.

ACID:  Short for Atomicity, Consistency, Isolation, Durability; this is a strong set of guarantees that a database system provides to applications using it that data will be handled in a robust manner.  See individual term definitions.

API:  Application Programming Interface - a set of types, functions, objects, and/or methods which are presented by a service for an application programmer to utilize.  For example a database will offer API facilities to insert and query data.  A cloud will provide API services to provision resources, etc.

Atomicity:  A guarantee that an application can attempt to run an arbitrary number of commands, whereby they all succeed or the effects of all of them will be rolled back.  The classic example is a financial transaction - one account must be credited and the other must be debited.  If the second operation fails, the first must not be allowed to succeed either!

Availability: The property of a computing system whereby a user can expect to be able to perform his or her task at any given time.

B-Tree Index:  An index whereby the data is arranged in a binary tree.  The database server can arrive at the required datum by jumping to a point in the index and asking "is this value greater than, less than, or equal to what I am looking for?" and jumping accordingly.  This is by far the most common type of index, and facilitates querying for a range of values between two points.

CAP Theorem:  In a distributed computing system, you can pick only two of Consistency, Availability, or Partition Tolerance.  Having all three is theoretically impossible.  See individual definitions.

Cloud:  A large pool of computing resources set up in a way that resources can be allocated to a given user or project by making an API call.  This allows infrastructure to be defined by code.

Collection:  In MongoDB, a set of documents that, according to your application, are related to each other.  This is rather like a table in SQL-land but without any enforced structure.

Consistency: The property where a database system contains data that does not violate any constraints, is in a coherent state (especially when two or more tables are updated together), and where different nodes of a distributed system will give the same answer when asked the same question.  Related to ACID, consistency ensures that before a transaction is committed, any violations of constraint checks are rectified (if not, the transaction will abort).  Related to the CAP Theorem, consistency ensures that every node has the same data.  (Getting this right in real time is a hard problem, so most NoSQL databases go the route of eventual consistency, see definition.)

Constraint: Limits on what kind of value a field can hold.  For example, it can be "between 0 and 100" or, more commonly, it can be a foreign key (see definition).

Covering Index:  An index that itself contains all the information needed by a given query.  This greatly speeds up the query, because it does not need to reference the actual table.

CRUD: Create, Read, Update, Delete.  A simple acronym to refer to the most basic of data operations in a database.

Database:  A collection of related information.  Most database servers can manage multiple databases simultaneously.  Generally you will want to use a database per project.  If fields from tables (or collections) are related to each other, they should probably be in the same database.  For multi-tenant SQL servers, different customers would almost certainly have different databases.

DDL:  Data Definition Language.  These are commands sent to an SQL database to modify the data schema.  Examples are CREATE TABLE, ALTER TABLE, DROP TABLE, etc.

DML:  Data Modification Language.  These are commands sent to an SQL database which modify data in a table without changing the schema.  Examples are UPDATE, INSERT, and DELETE.

Driver:  Software "glue" that contains the software needed to talk to a database and presents its API to a given programming language.  (Example: a Python driver for PostgreSQL.)

Durability: The property of a database that guarantees that once a successful write operation is acknowledged to the calling application, it has been safely written to disk and will persist even in the event of a system crash.

Eventual Consistency:  A property of a replicated database system whereby replicas may serve stale data for a (hopefully small) window of time after the data is updated on the master.  For example:  A user updates his preferences, then accesses another site page which pulls preferences from a replica.  It is possible that the update may not have yet reached the replica, so the old settings may apply to this request.

Field:  A single datum value.  Many of these will be combined to form data types representing entities (such as tables or collections).

Foreign Key:  A declaration stating that a certain field in the table must be equal to a value of any field in a given column in another table.  For example you may have a table 'states', with a row for each state and a column 'code' containing the corresponding two letter state code.  A 'cities' table could have a column 'state', which would be a foreign key to the 'code' column in the 'states' table.  If someone tried to insert a city with an invalid state code, an error would be thrown and the insert would not succeed.

fsync:  A UNIX/Linux/POSIX system call that ensures that data is durably written to a disk before it returns to the program.  Some database systems can be configured to fsync on every transaction or not.  Enabling this will slow things down somewhat, but is necessary for durability guarantees.

GIS:  Geographic Information System.  A database supporting this can handle things like points, lines, polygons, etc; and can handle computations to find intersections and other geometric calculations.

Hash Index: An method of indexing that calls a function to compute a digest value for a given input.  Since the same input always generates the same digest, the system can use this to quickly locate where in the index the value resides.

Index:  Data from one or more fields of a table/collection, stored in a way that lets the database find information quickly.  It then contains a pointer to the relevant row in the table proper.  It is generally transparent to the application developer, with every update to the table/collection automatically updating the index as well.  Note that this can slow down write operations, so it is important to have only indexes that will actually be used for time-sensitive queries.  See B-Tree and Hash indexes, the most common types.

Isolation: The property of a transaction that allows it to see its own unique, consistent view of the data, possibly different than what is seen by other transactions.

Map/reduce:  A method of querying data whereby all the data is sent through a "map" function, which emits data points for further processing by a "reduce" function to aggregate results.  Typically used on huge data sets that span many physical computers.

OnLine Transaction Processing (OLTP):  A data system that processes events as they occur and gives immediate responses to requests from users.  Availability is important here.

Partition Tolerance: The ability of a distributed system to function even if a network outage splits the system into two or more parts unable to communicate with each other.

Primary Key: A column or group of columns which is guaranteed to have a unique value for each row in a table.  An index is created on the key, making it the main method of uniquely distinguishing a given row.

Query:  A request to a database system to fetch and return a specific set of data.  In some database systems they can be arbitrarily complex, causing the database server to do a fair amount of work to collect the results.  The server must determine which data is being requested and the most efficient way to collect the data, using a query plan.

REST:  "Representational State Transfer."  A way of manipulating entities using HTTP (HyperText Transfer Protocol), the exact same protocol used to fetch Web pages.  It is easy to learn and can be used via the command line program curl.  It is the native protocol of CouchDB.

Row:  One single record in a table.  It consists of one or more fields, which work together to describe the entity represented by the row.

Table:  In SQL, a collection of rows of information, each row of which "is" the same thing and contains the same fields.  For example a table may contain all employees with their basic information, but would not also include customer orders.

Transaction:  A group of statements executing together, normally requiring ACID compliance.