
[Other] A Dramatic Tour through Python’s Data Visualization Landscape (including ggplot

Posted by 南方北方是远方 on 2016-10-4 03:11:59
Why Even Try, Man?

  I recently came upon Brian Granger and Jake VanderPlas’s Altair, a promising young visualization library. Altair seems well-suited to addressing Python’s ggplot envy, and its tie-in with JavaScript’s Vega-Lite grammar means that as the latter develops new functionality (e.g., tooltips and zooming), Altair benefits — seemingly for free!
   Indeed, I was so impressed by Altair that the original thesis of my post was going to be: “Yo, use Altair.”
   But then I began ruminating on my own Pythonic visualization habits and, in a painful moment of self-reflection, realized I’m all over the place: I use a hodgepodge of tools and disjointed techniques depending on the task at hand (usually whichever library I first used to accomplish that task [1]).
   This is no good. As the old saying goes: “The unexamined plot is not worth exporting to a PNG.”
  Thus, I’m using my discovery of Altair as an opportunity to step back — to investigate how Python’s visualization options hang together. I hope this investigation proves helpful for you as well.
   How’s This Gonna Go?

   The conceit of this post will be: “You need to do Thing X. How would you do Thing X in matplotlib? pandas? Seaborn? ggplot? Altair?”   By doing many different Thing X’s, we’ll develop a reasonable list of pros, cons, and takeaways — or at least a whole bunch of code that might be somehow useful.
  (Warning: this all may happen in the form of a two-act play.)
   The Options (in ~Descending Order of Subjective Complexity)

   First, let’s welcome our friends [2]:
    matplotlib
  The 800-pound gorilla — and like most 800-pound gorillas, this one should probably be avoided unless you genuinely need its power, e.g., to make a really custom plot or produce a publication-ready graphic
    pandas
   “Come for the DataFrames; stay for the plotting convenience functions that are arguably more pleasant than the matplotlib code they supplant.” — rejected pandas taglines
  (Bonus tidbit: the pandas team must include a few visualization nerds, as the library includes things like RadViz plots and Andrews Curves that I haven’t seen elsewhere.)
    Seaborn
  Seaborn has long been my go-to library for statistical visualization; it summarizes itself thusly:
  “If matplotlib ‘tries to make easy things easy and hard things possible,’ seaborn tries to make a well-defined set of hard things easy too”
    yhat’s ggplot
  A Python implementation of the grammar of graphics. This isn’t a “feature-for-feature port of ggplot2,” but there’s strong feature overlap. (And speaking as a part-time R user, the main geoms seem to be in place.)
    Altair
  The new guy, Altair is a “declarative statistical visualization library” with an exceedingly pleasant API.
  Wonderful. Now that our guests have arrived and checked their coats, let’s settle in for our very awkward dinner conversation. Our show is entitled…
   Little Shop of Python Visualization Libraries (starring all libraries as themselves)  

   ACT I: LINES AND DOTS

  (In Scene 1, we’ll be dealing with a tidy data set named “ts.” It consists of three columns: a “dt” column (for dates); a “value” column (for values); and a “kind” column, which has four unique levels: A, B, C, and D. Here’s a preview…)
          dt kind     value
0 2000-01-01    A  1.442521
1 2000-01-02    A  1.981290
2 2000-01-03    A  1.586494
3 2000-01-04    A  1.378969
4 2000-01-05    A -0.277937

   Scene 1: How would you plot multiple time series on the same graph?
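  (Stage note: the post never shows how “ts” was generated. For anyone who wants to run the scenes below, here is a minimal sketch of a stand-in with the same shape. The `make_ts` helper, the random-walk values, the seed, and the number of days are all assumptions made for illustration, not the author's data.)

```python
import numpy as np
import pandas as pd


def make_ts(n_days=100, kinds=("A", "B", "C", "D"), seed=0):
    """Build a tidy stand-in for `ts`: one row per (date, kind) pair.

    Hypothetical helper; the original post's generator is not shown.
    """
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2000-01-01", periods=n_days, freq="D")
    frames = []
    for kind in kinds:
        # Assumption: each series is a simple cumulative random walk.
        values = rng.standard_normal(n_days).cumsum()
        frames.append(pd.DataFrame({"dt": dates, "kind": kind, "value": values}))
    # Stack the per-kind frames into a single long (tidy) frame.
    return pd.concat(frames, ignore_index=True)


ts = make_ts()
print(ts.head())
```

  The columns come out in the same order as the preview (dt, kind, value), so the code in the scenes should run against this frame unchanged, though the numbers will differ from the ones printed above.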

    matplotlib: Ha! Haha! Beyond simple. While I could and would accomplish this task in any number of complex ways, I know your feeble brains would crumble under the weight of their ingenuity. Hence, I dumb it down, showing you two simple methods. In the first, I loop through your trumped-up matrix (I believe you peons call it a “Data” “Frame”) and subset it to the relevant time series. Next, I invoke my “plot” method and pass in the relevant columns from that subset.
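  [code]# MATPLOTLIB
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1,
                       figsize=(7.5, 5))

# draw one line per level of "kind", each from its own subset of ts
for k in ts.kind.unique():
    tmp = ts[ts.kind == k]
    ax.plot(tmp.dt, tmp.value, label=k)

ax.set(xlabel='Date',
       ylabel='Value',
       title='Random Timeseries')

# loc=2 places the legend in the upper-left corner
ax.legend(loc=2)
# slant the date tick labels so they don't overlap
fig.autofmt_xdate()[/code]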
[Figure 1: the plot produced by the matplotlib code above]

    MPL: Next, I enlist this chump (*motions to pandas*), and have him pivot this “Data” “Frame” so that it looks like this…
  [code]# in matplotlib-land, the notion of a tidy
# dataframe matters not
dfp = ts.pivot(index='dt', columns='kind', values='value')
dfp.head()[/code]

kind               A         B         C         D
dt
2000-01-01  1.442521  1.808741  0.437415  0.096980
2000-01-02  1.981290  2.277020  0.706127 -1.523108
2000-01-03  1.586494  3.474392  1.358063 -3.100735
2000-01-04  1.378969  2.906132  0.262223 -2.660599
2000-01-05 -0.277937  3.489553  0.796743 -3.417402

    MPL: By transforming the data into a date-indexed frame with four columns (one for each line I want to plot), I can do the whole thing in one fell swoop (i.e., a single call of my “plot” function).
  [code]# MATPLOTLIB
fig, ax = plt.subplots(1, 1,
                       figsize=(7.5, 5))

ax.plot(dfp)

ax.set(xlabel='Date',
       ylabel='Value',
       title='Random Timeseries')

ax.legend(dfp.columns, loc=2)
fig.autofmt_xdate()[/code]

[Figure 2: the plot produced by the matplotlib code above]
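  (Aside for the audience: the pivot in this scene trades the tidy layout, with one row per observation, for a wide one, with one column per series. The sketch below walks a four-row toy frame through that trip and back again with `melt`. The toy values and the names `tidy`, `wide`, and `back` are made up for illustration; only `pivot` and its arguments come from the scene.)

```python
import pandas as pd

# A tiny tidy frame: one row per (date, kind) observation.
tidy = pd.DataFrame({
    "dt": pd.to_datetime(["2000-01-01", "2000-01-01",
                          "2000-01-02", "2000-01-02"]),
    "kind": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Tidy -> wide: dates become the index, each kind becomes a column.
wide = tidy.pivot(index="dt", columns="kind", values="value")
print(wide)

# Wide -> tidy: melt stacks the kind columns back into rows.
back = (
    wide.reset_index()
        .melt(id_vars="dt", var_name="kind", value_name="value")
        .sort_values(["dt", "kind"])
        .reset_index(drop=True)
)
print(back)
```

  The wide form is what lets a single `ax.plot(dfp)` call draw every line at once, since matplotlib plots each column of a two-dimensional input as its own series.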

    pandas (*looking timid*): That was great, Mat. Really great. Thanks for including me. I do the same thing, hopefully as well? (*smiles weakly*)
     [code]# PANDAS
fig, ax = plt.subplots(1, 1,
                       figsize=(7.5, 5))

dfp.plot(ax=ax)

ax.set(xlabel='Date',
       ylabel='Value',
       title='Random Timeseries')

ax.legend(loc=2)
fig.autofmt_xdate()[/code]
    pandas: It looks exactly the same, so I just won’t show it.
    Seaborn (*smoking a cigarette and adjusting her beret*): Hmmm. Seems like an awful lot of data manipulation for a silly line graph. I mean, for loops and pivoting? This isn’t the 90’s or Microsoft Excel. I have this thing called a FacetGrid I picked up when I went abroad. You’ve probably never heard of it…
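  [code]# SEABORN
import matplotlib.pyplot as plt
import seaborn as sns

# "height" replaced the older "size" argument in seaborn 0.9
g = sns.FacetGrid(ts, hue='kind', height=5, aspect=1.5)
g.map(plt.plot, 'dt', 'value').add_legend()
g.ax.set(xlabel='Date',
         ylabel='Value',
         title='Random Timeseries')
g.fig.autofmt_xdate()[/code]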