Big Data Management
Oracle NoSQL Database facilitates efficient storage of
massive amounts of data in a simple, flexible format.
Following the announcement of the availability of Oracle NoSQL Database,
Rich Schwerin, Oracle Magazine contributor,
sat down with Dave Segleau, director of
product management at Oracle, to talk about
the new offering for big data management.
The following is an excerpt from that
interview. Download the full podcast at
Oracle Magazine: Let’s start at the beginning.
What is a NoSQL database?
Segleau: NoSQL means not only SQL, and it
encompasses a set of database technologies
that have been under development for the
past 12 years. NoSQL databases in general
try to address some of the data management requirements of what’s been called big
data in the industry. In very general terms, a
NoSQL database is a nonrelational database
that can manage data over a distributed set
of storage servers, is designed to be highly
available and highly scalable, and supports
a variable data schema and data formats.
NoSQL databases often avoid ACID [atomic,
consistent, isolated, and durable] transactions and table joins in order to achieve
faster throughput. There are several different
kinds of NoSQL databases, and each implementation tends to have its own particular
set of technical features and behavior. The
tough part about defining what a NoSQL
database is, is that there are no standards for
NoSQL today. There are literally hundreds of
products claiming to be NoSQL databases or
having NoSQL capabilities.
Oracle Magazine: When would a developer
choose a NoSQL database?
Segleau: The most common use cases involve
Web or internet-centric applications—what
we like to call Web-scale applications or
Web services, in the broadest sense. These
applications are providing either data capture
or data services over the Web. Data capture
is the ability to monitor, capture, and query
incoming data from a multitude of data
points, such as network monitoring, sensor
networks in factory automation, and mobile
device management. Data services are Web-scale, high-performance, customer-oriented
Web services, like Amazon, LinkedIn, or
Facebook. Often, it’s both data capture and data services.
Oracle Magazine: What are some of the pros
and cons associated with NoSQL databases?
Segleau: The pros include the ability to scale
out compute and storage capacity horizontally over a wide range of hardware resources,
simple and fast queries, and a flexible and
simple approach to schema management.
The cons include a lack of support for complex
queries, a lack of support for multitable joins,
limited transaction support, and having to
learn a new database technology approach.
Oracle Magazine: You mentioned several different kinds of NoSQL databases. What kind
of database is Oracle NoSQL Database?
Segleau: Oracle NoSQL Database is a distributed key-value database, like the ones
currently used at LinkedIn and Amazon.com.
The key might be the user or membership
ID, and the value contains some information
about that user—for example, basic profile
information including address, picture, and
other vital information. Other records associated with that key might contain the user IDs
or e-mail addresses of friends and the products that the user has recently purchased.
If you’re an RDBMS person, you can think
of a key-value database as the simplest form
of a two-column relational table: the first
column is the key, and the second column is
the value. Keys and values can be very simple
values or complex structures. Oracle NoSQL
Database stores records that contain a key-value pair and retrieves records based on the
requested key. Oracle NoSQL Database distributes those key-value records, based on the
hashed value of the key, across any number of
servers that we call storage nodes. The database is designed to scale out to many systems
as your data management needs grow and
provides many of the features common to
other NoSQL database implementations, as
well as providing several key features that are
not available in other NoSQL products.
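To make the hashed-key distribution described above concrete, here is a minimal Python sketch. It is an illustration only, not Oracle NoSQL Database's API or its actual partitioning scheme. The class names StorageNode and KeyValueStore, the SHA-256-modulo placement rule, and the sample keys and values are all assumptions introduced for this example.

```python
import hashlib


class StorageNode:
    """Hypothetical in-memory stand-in for one storage node."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # key -> value, like a two-column table


class KeyValueStore:
    """Toy key-value store that spreads records across nodes by hashed key."""

    def __init__(self, node_count):
        self.nodes = [StorageNode(f"node-{i}") for i in range(node_count)]

    def node_for(self, key):
        # Hash the key, then map the hash onto one of the nodes.
        # Modulo placement is an assumption chosen for brevity.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self.node_for(key).records[key] = value

    def get(self, key):
        # Lookup is by key only: no joins and no multi-record queries.
        return self.node_for(key).records.get(key)


# Usage: the key is a user ID, the value is that user's information.
store = KeyValueStore(node_count=3)
store.put("user/1001/profile", {"name": "Pat", "city": "Austin"})
store.put("user/1001/friends", ["user/1002", "user/1003"])

profile = store.get("user/1001/profile")
friends = store.get("user/1001/friends")
missing = store.get("user/9999/profile")
```

The same key always hashes to the same node, so a read goes straight to the node that holds the record. One caveat about this simplified placement rule: changing the node count remaps most keys, which is why production systems generally use fixed partitions or consistent hashing instead of a plain modulo.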
Oracle Magazine: What are some of those key features?
Segleau: There are several key features that
I’d like to highlight, but what it boils down
to is that Oracle NoSQL Database is general
purpose, as well as simple to use and deploy.
Lots of the existing NoSQL products are specially tuned for specific kinds of problems.
The issue is that this approach doesn’t adapt
well to other types of problems. For example,
Dynamo—Amazon’s NoSQL database—is
very good for Amazon’s requirements
because Amazon wrote it. But most customers are not Amazon, and what they want
is a more general-purpose solution that will
address their NoSQL database needs.
A common complaint is that many of the
existing NoSQL products discard fundamental