|
||||
Have you ever found it tedious to mix SQL and other languages, or been
reluctant to write the same four lines of code again to do a simple
database query? This article can help you eliminate the drudgery
involved in database access, and make your programming time more
efficient, by wrapping simple transactions in friendly native Python
syntax. By making the database emulate regular Python objects, you can
remove a source of friction and frustration from your development
process. Your time spent programming will be more efficient and
productive when you can focus on the task at hand, without being
constantly sidetracked by unimportant details like where the cursor
object is, or whether you need to escape-protect the data in the next
query.
Another benefit of using native syntax is better portability. This approach makes it easy for you to change databases without having to rewrite any of your application code. Simply modify a few lines in the database wrapper, and you can support an entirely new database. Though this article focuses on MySQL, the code should work easily with PostgreSQL or even SQLite with only minor modifications.
The approach described here is useful any time you want to write a quick program to access a database. You may find it convenient if you want to generate a handful of new reports quickly, or perform basic modifications to your data.
Another benefit of using native syntax is better portability. This approach makes it easy for you to change databases without having to rewrite any of your application code. Simply modify a few lines in the database wrapper, and you can support an entirely new database. Though this article focuses on MySQL, the code should work easily with PostgreSQL or even SQLite with only minor modifications.
The approach described here is useful any time you want to write a quick program to access a database. You may find it convenient if you want to generate a handful of new reports quickly, or perform basic modifications to your data.
|
||||
To begin, you'll want to convert the most frequently-used operations.
The most basic operation is reading or selecting data, so I will focus
on that first.
Applications frequently need to do things such as "retrieve record 87", or "print results 30-40." This calls for an array or list structure, which is easy to do in Python. Normally, to show record 87, you would need to execute something such as the following:
Applications frequently need to do things such as "retrieve record 87", or "print results 30-40." This calls for an array or list structure, which is easy to do in Python. Normally, to show record 87, you would need to execute something such as the following:
c = db.cursor()
c.execute("select * from book limit 87,1")
row = c.fetchone()
display(row)
Four lines is needlessly long for what is essentially an array lookup.
It barely begins to utilize the full power of SQL, yet this type of
thing is quite common. So, what if you could use simpler syntax to do
the same thing? The following would be much easier:
display(books[87])
That sort of easy syntax is supported in Python, using its special
class methods. Any object can emulate an array by implementing special
functions. In this case, you want to create a Table class with a
method called __getitem__()
.
class Table:
def __init__(self, db, name):
self.db = db
self.name = name
self.dbc = self.db.cursor()
def __getitem__(self, item):
self.dbc.execute("select * from %s limit %s, 1" \
% (self.name, item))
return self.dbc.fetchone()
Using this class, you can easily access a database table as if it were
a native Python
list. It merely provides read-only access to single rows, but it's
a good start. The following short program demonstrates its use:
import MySQLdb
db = MySQLdb.connect(user="user", passwd="passwd", db="library")
books = Table(db, "book")
for i in xrange(80, 100): # (xrange type)
print "Book %s: %s" % (i, books[i])
Condensing the database lookup into a single short expression greatly
increases its expressive power. When you must write code to keep track
of a cursor, execute queries, and fetch rows from the result, the
operation just doesn't fit into as many places. Using a word or two
for the same concept allows you to incorporate such condensed
operations into other ideas more clearly and easily.
|
||||
Something important was missing from the code mentioned so far. Many
programs want to know how much data needs to be processed. Most
search programs return this with the results, such as "results 80 to
100 of 487." That 487 is important to many search engine users.
Luckily, the length operator in Python is easy to
implement. Simply add the following to the Table class:
def __len__(self):
self.dbc.execute("select count(*) from %s" % (self.name))
r = int(self.dbc.fetchone()[0])
return r
This will make the function len(books)
return the number
of items in the table, so you can loop over the entire data set or at
least tell the user how much data exists:
for i in xrange(len(books)):
print books[i]
|
||||
The loop in the previous example was better, but it could have been
simpler still. Python supports iterators, which would have been
a little easier. It should be possible to process every item in a
table with syntax such as:
Putting the concepts of an idea, such as database access, into simpler terms enables still more and better innovations. For example, a car engine is a very complicated device with countless parts and design considerations. It takes a lot of work to fully describe an engine, and to attach it to wheels to make a car. But once you fully understand the concept of an engine, you no longer need to describe it in detail. It becomes sufficient to say "an engine with wheels." Then, after building such a car, you can easily create new concepts by reusing the same old engine ideas. Perhaps after the car, you decide to attach a blade and build a lawn mower. And then, attach wings and a propeller to make a plane. Creating these new things would be vastly more difficult if you had to re-create the concept of an engine each time. Or, in this case, if you had to keep writing the same code over and over, each time you needed to perform an operation you've done before.
for book in books:
print book
This is simple to do, but first it would be a good idea to refactor a
little. The Table class will be doing a lot of database queries, and
pumping all those queries through one function will help make
debugging easier. First, add a self.debug = 1
line to the
Table's __init__()
function, and then add a method for
queries:
def _query(self, q):
if self.debug: print "Query: %s" % (q)
self.dbc.execute(q)
To add iteration, simply add two methods to the Table class:
__iter__()
and next()
. The class does not
need to be derived from a built-in iterator type, it merely needs to
implement the same policies. This "Duck
Typing" technique is part of the "policy over mechanism" theme
common in Python. It provides most of the same benefits as more strict
languages, without losing flexibility. Adding the following code turns
the Table class into an iterable object:
def __iter__(self):
"""creates a data set,
and returns an iterator (self)"""
q = "select * from %s" % (self.name)
self._query(q)
# an Iterator is an object with a next() method
return self
def next(self):
"""returns the next item in the data set,
or tells Python to stop"""
r = self.dbc.fetchone()
if not r:
raise StopIteration
return r
By adding this code, the Table class now acts like an iterator,
allowing you to use the easy syntax at the beginning of this section.
Putting the concepts of an idea, such as database access, into simpler terms enables still more and better innovations. For example, a car engine is a very complicated device with countless parts and design considerations. It takes a lot of work to fully describe an engine, and to attach it to wheels to make a car. But once you fully understand the concept of an engine, you no longer need to describe it in detail. It becomes sufficient to say "an engine with wheels." Then, after building such a car, you can easily create new concepts by reusing the same old engine ideas. Perhaps after the car, you decide to attach a blade and build a lawn mower. And then, attach wings and a propeller to make a plane. Creating these new things would be vastly more difficult if you had to re-create the concept of an engine each time. Or, in this case, if you had to keep writing the same code over and over, each time you needed to perform an operation you've done before.
|
||||
Databases tend to be large, and accessing an entire data set can be
very slow. If you have 50,000 records in a table, and only need twenty
of them right now, grabbing the whole set would be very inefficient.
However, taking only one record at a time is also inefficient. To
solve this, use slices. Python provides slicing for list-like objects,
which can select a range of records quickly and easily.
The previous examples executed one database query for each row accessed in the table. Pulling tens or hundreds of records at a time would be more efficient though. The following demonstrates the syntax for taking a slice of a list, or a list-like object:
Different databases use different syntax for selecting data slices, which can become confusing if you work with more than one. It can be difficult to remember if the
The previous examples executed one database query for each row accessed in the table. Pulling tens or hundreds of records at a time would be more efficient though. The following demonstrates the syntax for taking a slice of a list, or a list-like object:
# create a list of all items to go on the third page
page3 = books[20:30]
# or, pull ten records into a list, and loop over them
for book in books[20:30]:
print book
To implement slice support in the Table class, modify the
__getitem__()
method. Both individual lookups and slice
lookups are implemented with this method, and determining which type
of result to return is simply a matter of checking whether the
parameter is an integer or a slice. The built-in
isinstance()
function, along with the types module
can help you decide what to do.
def __getitem__(self, item):
q = "select * from %s" % (self.name)
# isinstance() is recommended over direct type(a) == type(b)
# comparisons, to accomodate derived classes.
# TODO: However, duck typing would be even better.
if isinstance(item, types.SliceType):
q = q + " limit %s, %s" \
% (item.start, item.stop - item.start)
self._query(q)
return self.dbc.fetchall()
elif isinstance(item, types.IntType):
q = q + " limit %s, 1" % (item)
self._query(q)
return self.dbc.fetchone()
else:
raise IndexError, "unsupported index type"
The Slice object contains three useful attributes: start
,
stop
, and step
. The step
part
will probably not be useful, because SQL databases don't generally
support step sizes other than one. However, start
and
stop
are useful, and translate easily into arguments to
MySQL's LIMIT
parameter. The start
attribute
is used directly as the first parameter. Subtracting
start
from stop
produces the second
parameter -- length.
Different databases use different syntax for selecting data slices, which can become confusing if you work with more than one. It can be difficult to remember if the
LIMIT
clause takes
first, last
as arguments, or if it uses first,
length
, or if it expects something else. This tends to cause
silent errors in programs because the syntax is still correct --
accidentally getting too many records, or too few, or even none at
all. Hopefully, using the easy slice syntax will help you avoid
that type of error.
|
||||
So far, this article has ignored two of the most important aspects of
databases. Searching and sorting, or finding and arranging data, let you
narrow and organize results automatically. If you were writing an
address book application, the user would often ask questions such as
"Who do I know named 'Bob'?" and expect to get a sorted list. The
following code is a simple and feasible way to implement such a query:
Some simple tricks include sorting by several columns, or searching with more than one condition. A more complicated trick is to turn the Table into something like a view, by connecting it to more than one database table. The following code demonstrates all three techniques:
contacts.search("firstname='bob'")
contacts.sort("lastname")
for c in contacts:
print "%s %s: %s" % (c[FNAME], c[LNAME], c[PHONE])
The output would print a list of "Bob" names and phone numbers, sorted
by last name:
Bob Barker: 123-4567Python does not provide operators for searching or sorting, so a class method must suffice. These methods are very simple, because SQL implements them as parameters to
Bob Dobbs: 234-5678
Bob Monroe: 345-6789
Bob Zemeckis: 456-7890
SELECT
instead of
allocating separate commands for them. The only work the search and
sort methods need to do is save data for later use:
def search(self, method):
self._search = ""
if method: self._search = "where %s" % (method)
def sort(self, method):
self._sort = ""
if method: self._sort = "order by %s" % (method)
Making these functions actually do something requires modifying the
other class methods. The self._search
and
self._sort
attributes must be defined, and inserted into
queries where appropriate. The following methods should be modified:
Table.__init__()
: Setself._search = ""
andself._sort = ""
to avoid accessing them before they are defined.Table.__len__()
: Change the query to:self._query("select count(*) from %s %s" % (self.name, self._search))
Table.__iter__()
: Change the query to: q = "select * from %s %s %s" % (self.name, self._search, self._sort)Table.__getitem__()
: Change the first line to:q = "select * from %s %s %s" % (self.name, self._search, self._sort)
Table.search()
and Table.sort()
are exactly
what you would write in a WHERE
or ORDER BY
clause in a normal query. The table's name is also placed directly
into the query. This affords some interesting opportunities, or
tricks.
Some simple tricks include sorting by several columns, or searching with more than one condition. A more complicated trick is to turn the Table into something like a view, by connecting it to more than one database table. The following code demonstrates all three techniques:
books = Table(db, "book, series")
books.search("book.series=series.id AND book.publish_year>1995")
books.sort("series.name, book.publish_year")
SERIES_NAME, BOOK_TITLE, BOOK_YEAR = 6, 1, 2
for b in books:
print "%s: %s (%s)" % (b[SERIES_NAME],
b[BOOK_TITLE], b[BOOK_YEAR])
The preceding code tells Python to get rows from two tables, book
and series. The search method then adds two additional
instructions—join the tables by using the series ID, and ignore
anything from 1995 or earlier. Then sort the results first by the name
of the series, and then by the year each book was published. This has
the effect of grouping related items together and showing them in
chronological order. The code then chooses three interesting columns
from the data, and prints the information in a human-readable form.
|
||||
Reading data is useful, especially if you need to build reports or
otherwise sort through data. But you probably will want to write to the
database too. The simplest way to write is to insert new rows, which you
can achieve with the following syntax:
contacts.insert('', "Stanley", "Spudowski", "1965-01-01")
This is similar to a direct SQL query, but even simpler. And, because
of the way the Python database
spec works, you won't even have to worry about escaping data to
make it safe. The spec requires automatic escaping for anything in the
second parameter of the cursor's execute()
function. If
you ever plan to insert data supplied by a user, it is a good idea to
use that protection, because it will prevent possible exploits in your
program. Doing this requires a minor modification to the
Table._query()
method:
def _query(self, q, data=None):
if self.debug: print "Query: %s (%s)" % (q, data)
self.dbc.execute(q, data)
With that modification in place, you can now write an
insert
function. All it needs to do is pass the data to
your database in a format compatible with the DB API spec. However,
because it needs to support tables of any size and shape, the format
string should be created dynamically. The following creates a format
string such as "%s,%s,%s,%s"
according to the length of
the row it is given, and then executes the query.
def insert(self, *row):
fmt = ("%s," * len(row))[:-1] # repeat; remove final comma
q = "insert into %s values (%s)" % (self.name, fmt)
self._query(q, row)
As an additional usability feature, it avoids the need to pass a tuple
object, by collecting all the function's arguments into a tuple
automatically. The *row
syntax does this. Its effect,
basically, is to remove one set of parenthesis, allowing you to run
insert(a,b,c)
instead of insert((a,b,c))
.
|
||||
Removing rows is also likely to be useful. And, in most SQL databases,
it is not as simple as just telling the DB to remove row number 87.
The difficulty is a matter of telling the database to identify which
row to remove. One approach is to identify the row by specifying every
value in its cells. For example,
A less error-prone approach is to use row IDs. Row IDs are supported by most databases, though the implementation differs. In MySQL, the query would look more like this:
DELETE FROM movies WHERE
title='Harvey' AND year='1950' AND genre=37;
. The problem with
this approach is that you might have two rows with identical data. In
that case, you would accidentally delete both rows instead of just
one.
A less error-prone approach is to use row IDs. Row IDs are supported by most databases, though the implementation differs. In MySQL, the query would look more like this:
DELETE FROM movies WHERE
_rowid=58;
. That query will remove one and only one row,
regardless of the content of the table. Even if every row is
identical, each one will have a unique row ID. As an added bonus, the
query is smaller, simpler, and quite similar to the array-like syntax
Python uses to delete list items:
del movies[58]
To make your Table class respond to the del
command, add
the following method to it:
def __delitem__(self, item):
q = "select %s from %s %s %s limit %s, 1" % \
("_rowid", self.name, self._search, self._sort, item)
self._query(q)
rid = self.dbc.fetchone()[0]
q = "delete from %s where %s=%s" % (self.name, "_rowid", rid)
self._query(q)
This function performs two queries. First, it is necessary to obtain the
row ID. Then it deletes the row identified by that ID. You can eliminate
the need for the first query by always including the row ID in your
query results, but for the sake of brevity, that is not demonstrated
here. The provided example code includes row IDs, though, along with
several other enhancements.
|
||||
The data-access approach described above should help you use your time
more efficiently and help you write data-processing scripts more
quickly. You could extend and refine the approach into a much more
sophisticated library, and make it more appropriate for larger
applications. For example, you could modify the code to use dict-style
cursors, instead of tuples. Or, you could add a Row class, with methods
to modify individual cells. Another useful extension would be to
formally support joined tables instead of using the somewhat tricky
method described earlier. Many other enhancements could be added too,
such as speeding up iteration by fetching and caching many rows, or
making deletion more powerful or efficient. Implementing row
assignment would also make many programs simpler.
This "bottom-up" approach should help you build up the Python language to better deal with the tasks you commonly perform. By adding a few special functions, you can turn Python into a richer, more appropriate language for whatever you need. And hopefully, by letting the interpreter do more of the work for you, you can build faster, cleaner solutions.
While the approach described here demonstrates some of the things you can do with Python's syntax overloading, you may want to look into established tools for real database use. The SQLAlchemy library is very powerful, or I can also recommend SQLObject. Both are far more complicated than what is presented here, but they are also a great deal more powerful and mature. There is also an add-on to SQLAlchemy which makes it easier to use, called SqlSoup. It reads database structure automatically and maps it onto classes for you, so you don't have to.
This "bottom-up" approach should help you build up the Python language to better deal with the tasks you commonly perform. By adding a few special functions, you can turn Python into a richer, more appropriate language for whatever you need. And hopefully, by letting the interpreter do more of the work for you, you can build faster, cleaner solutions.
While the approach described here demonstrates some of the things you can do with Python's syntax overloading, you may want to look into established tools for real database use. The SQLAlchemy library is very powerful, or I can also recommend SQLObject. Both are far more complicated than what is presented here, but they are also a great deal more powerful and mature. There is also an add-on to SQLAlchemy which makes it easier to use, called SqlSoup. It reads database structure automatically and maps it onto classes for you, so you don't have to.
Definitions |
||
From the glossary
of the Python tutorial:
Pythonic programming style that determines an object's type by
inspection of its method or attribute signature rather than by
explicit relationship to some type object ("If it looks like a duck
and quacks like a duck, it must be a duck.") By emphasizing
interfaces rather than specific types, well-designed code improves
its flexibility by allowing polymorphic substitution. Duck-typing
avoids tests using
type()
or isinstance()
.
Instead, it typically employs hasattr()
tests or EAFP
programming.
Easier to ask for forgiveness than permission. This common Python
coding style assumes the existence of valid keys or attributes and
catches exceptions if the assumption proves false. This clean and
fast style is characterized by the presence of many try and except
statements. The technique contrasts with the LBYL style that is
common in many other languages such as C.