技术控

    今日:67| 主题:49369
收藏本版 (1)
最新软件应用技术尽在掌握

[其他] Working with SQLite Databases using Python and Pandas

[复制链接]
安靜做個小孩 发表于 2016-10-4 11:57:51
91 2

立即注册CoLaBug.com会员,免费获得投稿人的专业资料,享用更多功能,玩转个人品牌!

您需要 登录 才可以下载或查看,没有帐号?立即注册

x
SQLite is a database engine that makes it simple to store and work with relational data. Much like the csv format, SQLite stores data in a single file that can be easily shared with others. Most programming languages and environments have good support for working with SQLite databases. Python is no exception, and a library to access SQLite databases, called sqlite3 , has been included with Python since version 2.5 . In this post, we’ll walk through how to use sqlite3 to create, query, and update databases. We’ll also cover how to simplify working with SQLite databases using the pandas package. We’ll be using Python 3.5 , but this same approach should work with Python 2 .
  Before we get started, let’s take a quick look at the data we’ll be working with. We’ll be looking at airline flight data, which contains information on airlines, airports, and routes between airports. Each route represents a repeated flight that an airline flies between a source and a destination airport.
   All of the data is in a SQLite database called flights.db , which contains three tables – airports , airlines , and routes . You can download the data here .
   Here are two rows from the airlines table:
                   id     name     alias     iata     icao     callsign     country     active                   10     11     4D Air     \N     NaN     QRT     QUARTET     Thailand     N             11     12     611897 Alberta Limited     \N     NaN     THD     DONUT     Canada     N            As you can see above, each row is a different airline, and each column is a property of that airline, such as name , and country . Each airline also has a unique id , so we can easily look it up when we need to.
   Here are two rows from the airports table:
                   id     name     city     country     code     icao     latitude     longitude     altitude     offset     dst     timezone                   0     1     Goroka     Goroka     Papua New Guinea     GKA     AYGA     -6.081689     145.391881     5282     10     U     Pacific/Port_Moresby             1     2     Madang     Madang     Papua New Guinea     MAG     AYMD     -5.207083     145.7887     20     10     U     Pacific/Port_Moresby            As you can see, each row corresponds to an airport, and contains information on the location of the airport. Each airport also has a unique id , so we can easily look it up.
   Here are two rows from the routes table:
                   airline     airline_id     source     source_id     dest     dest_id     codeshare     stops     equipment                   0     2B     410     AER     2965     KZN     2990     NaN     0     CR2             1     2B     410     ASF     2966     KZN     2990     NaN     0     CR2            Each route contains an airline_id , which the id of the airline that flies the route, as well as source_id , which is the id of the airport that the route originates from, and dest_id , which is the id of the destination airport for the flight.
  Now that we know what kind of data we’re working with, let’s start by connecting to the database and running a query.
  Querying database rows in Python

   In order to work with a SQLite database from Python, we first have to connect to it. We can do that using the connect function, which returns a Connection object:
  [code]import sqlite3

conn = sqlite3.connect("flights.db")[/code]   Once we have a Connection object, we can then create a Cursor object. Cursors allow us to execute SQL queries against a database:
  [code]cur = conn.cursor()[/code]   Once we have a Cursor object, we can use it to execute a query against the database with the aptly named execute method. The below code will fetch the first 5 rows from the airlines table:
  [code]cur.execute("select * from airlines limit 5;")[/code]   You may have noticed that we didn’t assign the result of the above query to a variable. This is because we need to run another command to actually fetch the results. We can use the fetchall method to fetch all of the results of a query:
  [code]results = cur.fetchall()
print(results)[/code]  [code][(0, '1', 'Private flight', '\\N', '-', None, None, None, 'Y'),
(1, '2', '135 Airways', '\\N', None, 'GNL', 'GENERAL', 'United States', 'N'),
(2, '3', '1Time Airline', '\\N', '1T', 'RNX', 'NEXTIME', 'South Africa', 'Y'),
(3, '4', '2 Sqn No 1 Elementary Flying Training School', '\\N', None, 'WYT', None, 'United Kingdom', 'N'),
(4, '5', '213 Flight Unit', '\\N', None, 'TFU', None, 'Russia', 'N')][/code]   As you can see, the results are formatted as a list of tuples . Each tuple corresponds to a row in the database that we accessed. Dealing with data this way is fairly painful. We’d need to manually add column heads, and manually parse the data. Luckily, the pandas library has an easier way, which we’ll look at in the next section.
  Before we move on, it’s good practice to close Connection objects and Cursor objects that are open. This prevents the SQLite database from being locked. When a SQLite database is locked, you may be unable to update the database, and may get errors. We can close the Cursor and the Connection like this:
  [code]cur.close()
conn.close()[/code]  Mapping airports

  Using our newfound knowledge of queries, we can create a plot that shows where all the airports in the world are. First, we query latitudes and longitudes:
  [code]import sqlite3

conn = sqlite3.connect("flights.db")
cur = conn.cursor()
coords = cur.execute("""
  select cast(longitude as float),
  cast(latitude as float)
  from airports;"""
).fetchall()[/code]   The above query will retrieve the latitude and longitude columns from airports , and convert both of them to floats. We then call the fetchall method to retrieve them.
   We then need to setup our plotting by importing matplotlib , the primary plotting library for Python. Combined with the basemap package, this allows us to create maps only using Python.
  We first need to import the libraries:
  [code]from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt[/code]  Then, we setup our map, and draw the continents and coastlines that will form the background of our map:
  [code]m = Basemap(
  projection='merc',
  llcrnrlat=-80,
  urcrnrlat=80,
  llcrnrlon=-180,
  urcrnrlon=180,
  lat_ts=20,
  resolution='c'
)

m.drawcoastlines()
m.drawmapboundary()[/code]  Finally, we plot the coordinates of each airport onto the map. We retrieved a list of tuples from the SQLite database. The first element in each tuple is the longitude of the airport, and the second is the latitude. We’ll convert the longitudes and latitudes into their own lists, and then plot them on the map:
  [code]x, y = m(
  [l[0] for l in coords],
  [l[1] for l in coords]
)

m.scatter(
  x,
  y,
  1,
  marker='o',
  color='red'
)[/code]  We end up with a map that shows every airport in the world:
     
Working with SQLite Databases using Python and Pandas-1 (represents,databases,repeated,included,version)
    As you may have noticed, working with data from the database is a bit painful. We needed to remember which position in each tuple corresponded to what database column, and manually parse out individual lists for each column. Luckily, the pandas library gives us an easier way to work with the results of SQL queries.
  Reading results into a pandas DataFrame

   We can use the pandas read_sql_query function to read the results of a SQL query directly into a pandas DataFrame. The below code will execute the same query that we just did, but it will return a DataFrame. It has several advantages over the query we did above:
  
       
  • It doesn’t require us to create a Cursor object or call fetchall at the end.   
  • It automatically reads in the names of the headers from the table.   
  • It creates a DataFrame, so we can quickly explore the data.  
  [code]cur = conn.cursor()0[/code]                   index     id     name     alias     iata     icao     callsign     country     active                   0     0     1     Private flight     \N     -     None     None     None     Y             1     1     2     135 Airways     \N     None     GNL     GENERAL     United States     N             2     2     3     1Time Airline     \N     1T     RNX     NEXTIME     South Africa     Y             3     3     4     2 Sqn No 1 Elementary Flying Training School     \N     None     WYT     None     United Kingdom     N             4     4     5     213 Flight Unit     \N     None     TFU     None     Russia     N           As you can see, we get a nicely formatted DataFrame as the result. We could easily manipulate the columns:
  [code]cur = conn.cursor()1[/code]  [code]cur = conn.cursor()2[/code]   It’s highly recommended to use the read_sql_query function when possible.
  Mapping routes

  Now that we know how to read queries into pandas DataFrames, we can create a map of every airline route in the world. We first start by querying the data. The below query will:
  
       
  • Get the latitude and longitude for the source airport for each route.   
  • Get the latitude and longitude for the destination airport for each route.   
  • Convert all the coordinate values to floats.   
  • Read the results into a DataFrame, and store them to the variable routes .  
  [code]cur = conn.cursor()3[/code]  We then setup our map:
  [code]cur = conn.cursor()4[/code]   We iterate through the first 3000 rows, and draw them. The below code will:
  
       
  • Loop through the first 3000 rows in routes .   
  • Figure out if the route is too long.   
  • If the route isn’t too long:
           
    • Draw a circle between the origin and the destination.   
       
  [code]cur = conn.cursor()5[/code]  We end up with the following map:

Working with SQLite Databases using Python and Pandas-2 (represents,databases,repeated,included,version)
     The above is much more efficient when we use pandas to turn the results of the SQL query into a DataFrame, instead of working with the raw results from sqlite3 .
  Now that we know how to query database rows, let’s move on to modifying them.
     Enjoying this post? Learn data science with Dataquest!

   
  
       
  • Learn from the comfort of your browser.   
  • Work with real-life data sets.   
  • Build a portfolio of projects.   
    Start for Free    Modifying database rows

   We can use the sqlite3 package to modify a SQLite database by inserting, updating, or deleting rows. Creating the Connection is the same for this as it is when you’re querying a table, so we’ll skip that part.
  Inserting rows with Python

   To insert a row, we need to write an INSERT query. The below code will add a new row to the airlines table. We specify 9 values to insert, one for each column in airlines . This will add a new row to the table.
  [code]cur = conn.cursor()6[/code]   If you try to query the table now, you actually won’t see the new row yet. Instead you’ll see that a file was created called flights.db-journal . flights.db-journal is storing the new row until you’re ready to commit it to the main database, flights.db .
   SQLite doesn’t write to the database until you commit a transaction . A transaction consists of 1 or more queries that all make changes to the database at once. This is designed to make it easier to recover from accidental changes, or errors. Transactions allow you to run several queries, then finally alter the database with the results of all of them. This ensures that if one of the queries fails, the database isn’t partially updated.
   A good example is if you have two tables, one of which contains charges made to people’s bank accounts ( charges ), and another which contains the dollar amount in the bank accounts ( balances ). Let’s say a bank customer, Roberto, wants to send $50 to his sister, Luisa. In order to make this work, the bank would need to:
  
       
  • Create a row in charges that says $50 is being taken from Roberto’s account and sent to Luisa.   
  • Update Roberto’s row in the balances table and remove $50.   
  • Update Luisa’s row in the balances table and add $50.  
  These will require three separate SQL queries to update all of the tables. If a query fails, we’ll be stuck with bad data in our database. For example, if the first two queries work, then the third fails, Roberto will lose his money, but Luisa won’t get it. Transactions mean that the main database isn’t updated unless all the queries succeed. This prevents the system from getting into a bad state, where customers lose their money.
   By default, sqlite3 opens a transaction when you do any query that modifies the database. You can read more about it here . We can commit the transaction, and add our new row to the airlines table, using the commit method:
  [code]cur = conn.cursor()7[/code]   Now, when we query flights.db , we’ll see the extra row that contains our test flight:
  [code]cur = conn.cursor()8[/code]                   index     id     name     alias     iata     icao     callsign     country     active                   0     1     19846     Test flight               None     None     None     Y           Passing parameters into a query

  In the last query, we hardcoded the values we wanted to insert into the database. Most of the time, when you insert data into a database, it won’t be hardcoded, it will be dynamic values you want to pass in. These dynamic values might come from downloaded data, or might come from user input.
  When working with dynamic data, it might be tempting to insert values using Python string formatting:
  [code]cur = conn.cursor()9[/code]   You want to avoid doing this! Inserting values with Python string formatting makes your program vulnerable to SQL Injection attacks. Luckily, sqlite3 has a straightforward way to inject dynamic values without relying on string formatting:
  [code]cur.execute("select * from airlines limit 5;")0[/code]   Any ? value in the query will be replaced by a value in values . The first ? will be replaced by the first item in values , the second by the second, and so on. This works for any type of query. This created a SQLite parameterized query , which avoids SQL injection issues.
  Updating rows

   We can modify rows in a SQLite table using the execute method:
  [code]cur.execute("select * from airlines limit 5;")1[/code]  We can then verify that the update happened:
  [code]cur.execute("select * from airlines limit 5;")2[/code]                   index     id     name     alias     iata     icao     callsign     country     active                   0     6049     19847     Test Flight               None     None     USA     Y           Deleting rows

   Finally, we can delete the rows in a database using the execute method:
  [code]cur.execute("select * from airlines limit 5;")3[/code]  We can then verify that the deletion happened, by making sure no rows match our query:
  [code]cur.execute("select * from airlines limit 5;")2[/code]                   index     id     name     alias     iata     icao     callsign     country     active           Creating tables

  We can create tables by executing a SQL query. We can create a table to represent each daily flight on a route, with the following columns:
  
       
  • id – integer   
  • departure – date, when the flight left the airport   
  • arrival – date, when the flight arrived at the destination   
  • number – text, the flight number   
  • route_id – integer, the id of the route the flight was flying  
  [code]cur.execute("select * from airlines limit 5;")5[/code]  Once we create a table, we can insert data into it normally:
  [code]cur.execute("select * from airlines limit 5;")6[/code]  When we query the table, we’ll now see the row:
  [code]cur.execute("select * from airlines limit 5;")7[/code]                   id     departure     arrival     number     route_id                   0     1     2016-09-28 0:00     2016-09-28 12:00     T1     1           Creating tables with pandas

  The pandas package gives us a much faster way to create tables. We just have to create a DataFrame first, then export it to a SQL table. First, we’ll create a DataFrame:
  [code]cur.execute("select * from airlines limit 5;")8[/code]   Then, we’ll be able to call the to_sql method to convert df to a table in a database. We set the keep_exists parameter to replace to delete and replace any existing tables named daily_flights :
  [code]cur.execute("select * from airlines limit 5;")9[/code]  We can then verify that everything worked by querying the database:
  [code]cur.execute("select * from airlines limit 5;")7[/code]                   index     id     departure     arrival     number     route_id                   0     0     1     2016-09-29 00:00:00     2016-09-29 12:00:00     T1     1           Altering tables with Pandas

   One of the hardest parts of working with real-world data science is that the data you have per record changes often. Using our airline example, we may decide to add an airplanes field to the airlines table that indicates how many airplanes each airline owns. Luckily, there’s a way to alter a table to add columns in SQLite:
  [code]results = cur.fetchall()
print(results)1[/code]   Note that we don’t need to call commit – alter table queries are immediately executed, and aren’t placed into a transaction. We can now query and see the extra column:
  [code]results = cur.fetchall()
print(results)2[/code]                   index     id     name     alias     iata     icao     callsign     country     active     airplanes                   0     0     1     Private flight     \N     -     None     None     None     Y     None            Note that all the columns are set to null in SQLite (which translates to None in Python) because there aren’t any values for the column yet.
  Altering tables with Pandas

  It’s also possible to use Pandas to alter tables by exporting the table to a DataFrame, making modifications to the DataFrame, then exporting the DataFrame to a table:
  [code]results = cur.fetchall()
print(results)3[/code]   The above code will add a column called delay_minutes to the daily_flights table.
  Further reading

  You should now have a good grasp on how to work with data in a SQLite database using Python and pandas. We covered querying databases, updating rows, inserting rows, deleting rows, creating tables, and altering tables. This covers all of the major SQL operations, and almost everything you’d work with on a day to day basis.
  Here are some supplemental resources if you want to dive deeper:
  
       
  • sqlite3 documentation   
  • Comparing pandas and SQL   
  • Dataquest SQL Course   
  • sqlite3 operations guide  
   If you want to keep practicing, you can download the file we used in this blog post, flights.db , here .
友荐云推荐




上一篇:Voice from CERN: Apache Spark 2.0 Performance Improvements Investigated With Fla
下一篇:Guide to building modern JavaScript applications — mobile, native, hybrid, an ..
酷辣虫提示酷辣虫禁止发表任何与中华人民共和国法律有抵触的内容!所有内容由用户发布,并不代表酷辣虫的观点,酷辣虫无法对用户发布内容真实性提供任何的保证,请自行验证并承担风险与后果。如您有版权、违规等问题,请通过"联系我们"或"违规举报"告知我们处理。

爱若只如初见 发表于 2016-10-5 11:50:48
禽兽不如应该是说不回帖的吧?
回复 支持 反对

使用道具 举报

雷雪莹 发表于 2016-10-5 12:30:18
有些失望是无可避免的,但大部分的失望,都是因为你高估了自己。
回复 支持 反对

使用道具 举报

*滑动验证:
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

我要投稿

推荐阅读

扫码访问 @iTTTTT瑞翔 的微博
回页顶回复上一篇下一篇回列表手机版
手机版/CoLaBug.com ( 粤ICP备05003221号 | 文网文[2010]257号 )|网站地图 酷辣虫

© 2001-2016 Comsenz Inc. Design: Dean. DiscuzFans.

返回顶部 返回列表