[Other] 7 Time Series Datasets for Machine Learning

马上有钱 posted on 2016-11-30 02:28:26
Machine learning can be applied to time series datasets.
  These are problems where a numeric or categorical value must be predicted, but the rows of data are ordered by time.
  A problem when getting started in time series forecasting with machine learning is finding good quality standard datasets on which to practice.
  In this post, you will discover 7 standard time series datasets that you can use to get started and practice time series forecasting with machine learning.
  After reading this post, you will know:
  
       
  • 4 univariate time series datasets.   
  • 3 multivariate time series datasets.   
  • Websites that you can use to search and download more datasets.  
  Let’s get started.
  Univariate Time Series Datasets

  Time series datasets that only have one variable are called univariate datasets.
  These datasets are a great place to get started because:
  
       
  • They are simple and easy to understand.   
  • You can plot them easily in Excel or your favorite plotting tool.   
  • You can easily plot the predictions compared to the expected results.   
  • You can quickly try and evaluate a suite of traditional and newer methods.  
   The website Data Market provides access to a large number of time series datasets, in particular the Time Series Data Library created by Rob Hyndman, Professor of Statistics at Monash University, Australia.
  Below are 4 univariate time series datasets that you can download for free from Data Market from a range of fields such as Sales, Meteorology, Physics and Demography.
  Shampoo Sales Dataset

  This dataset describes the monthly number of sales of shampoo over a 3 year period.
  The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright and Hyndman (1998).
  Below is a sample of the first 5 rows of data including the header row.
  "Month","Sales of shampoo over a three year period"
  "1-01",266.0
  "1-02",145.9
  "1-03",183.1
  "1-04",119.3
  "1-05",180.3
Below is a plot of the entire dataset taken from Data Market.
     

   [Figure: Shampoo Sales Dataset]
    The dataset shows an increasing trend and possibly some seasonal component.
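As a rough sketch of how the sample above might be loaded for practice, the snippet below parses it with pandas. The helper names `parse_month` and `load_shampoo` are hypothetical, and so is the reading of a label such as "1-01" as "year 1, month 01"; the base year of 1900 is an arbitrary choice made only so that the labels become valid calendar months.

```python
import io

import pandas as pd

# The sample rows shown above, embedded so the sketch runs on its own.
SAMPLE = '''"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
'''


def parse_month(label, base_year=1900):
    """Turn a label such as "1-01" into a monthly pandas Period.

    Assumption: the number before the dash is a year offset and the number
    after it is the month. base_year is arbitrary.
    """
    year_offset, month = label.split("-")
    return pd.Period(year=base_year + int(year_offset), month=int(month), freq="M")


def load_shampoo(source):
    """Read the CSV and return sales as a Series indexed by month."""
    # header=0 discards the long original column names in favour of short ones.
    df = pd.read_csv(source, header=0, names=["month", "sales"])
    df["month"] = df["month"].map(parse_month)
    return df.set_index("month")["sales"]


series = load_shampoo(io.StringIO(SAMPLE))
print(series)
```

To work with the full dataset, the same function should accept a path to the downloaded CSV file in place of the `io.StringIO` object, assuming the file keeps the layout of the sample.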
  
       
  Minimum Daily Temperatures Dataset

  This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.
  The units are in degrees Celsius and there are 3650 observations. The source of the data is credited as the Australian Bureau of Meteorology.
  Below is a sample of the first 5 rows of data including the header row.
  "Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
  "1981-01-01",20.7
  "1981-01-02",17.9
  "1981-01-03",18.8
  "1981-01-04",14.6
  "1981-01-05",15.8
Below is a plot of the entire dataset taken from Data Market.
     

   [Figure: Minimum Daily Temperatures]
    The dataset shows a strong seasonality component and has nice fine-grained detail to work with.
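One simple way to look at a seasonal pattern like this is to average the observations by calendar month. The sketch below does that with pandas on the five sample rows shown above; `load_daily` and `monthly_profile` are hypothetical helper names, and with only a few January rows the output is a single value, so treat it as a template for the full file rather than as a result.

```python
import io

import pandas as pd

# The sample rows shown above, embedded so the sketch runs on its own.
SAMPLE = '''"Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
"1981-01-01",20.7
"1981-01-02",17.9
"1981-01-03",18.8
"1981-01-04",14.6
"1981-01-05",15.8
'''


def load_daily(source):
    """Read the CSV and return temperatures as a Series indexed by date."""
    df = pd.read_csv(source, header=0, names=["date", "temp"], parse_dates=["date"])
    return df.set_index("date")["temp"]


def monthly_profile(series):
    """Mean value per calendar month (1 to 12), pooled over all years."""
    return series.groupby(series.index.month).mean()


temps = load_daily(io.StringIO(SAMPLE))
profile = monthly_profile(temps)
print(profile)
```

On the full ten years of data, `profile` would hold twelve values, one per calendar month, which makes the yearly cycle easy to plot or inspect.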
  
       
  Monthly Sunspot Dataset

  This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).
  The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).
  Below is a sample of the first 5 rows of data including the header row.
  "Month","Zuerich monthly sunspot numbers 1749-1983"
  "1749-01",58.0
  "1749-02",62.6
  "1749-03",70.0
  "1749-04",55.7
  "1749-05",85.0
Below is a plot of the entire dataset taken from Data Market.
     

   [Figure: Monthly Sunspot Dataset]
    The dataset shows seasonality with large differences between seasons.
  
       
  Daily Female Births Dataset

  This dataset describes the number of daily female births in California in 1959.
  The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).
  Below is a sample of the first 5 rows of data including the header row.
  "Date","Daily total female births in California, 1959"
  "1959-01-01",35
  "1959-01-02",32
  "1959-01-03",30
  "1959-01-04",31
  "1959-01-05",44
Below is a plot of the entire dataset taken from Data Market.
     

   [Figure: Daily Female Births Dataset]
   
       
  Multivariate Time Series Datasets

  Multivariate datasets are generally more challenging and are the sweet spot for machine learning methods.
   A great source of multivariate time series data is the UCI Machine Learning Repository. At the time of writing, there are 63 time series datasets that you can download for free and work with.
  Below is a selection of 3 recommended multivariate time series datasets from Meteorology, Medicine and Monitoring domains.
  EEG Eye State Dataset

  This dataset describes EEG data for an individual and whether their eyes were open or closed. The objective of the problem is to predict whether eyes are open or closed given EEG data alone.
  This is a classification predictive modeling problem and there are a total of 14,980 observations and 15 attributes (14 EEG input variables plus the class value). The class value of ‘1’ indicates the eye-closed and ‘0’ the eye-open state. Data is ordered by time and observations were recorded over a period of 117 seconds.
  Below is a sample of the first 5 rows with no header row.
  4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
  4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
  4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
  4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
  4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
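Because the rows have no header and are ordered by time, a sketch of loading them might separate the inputs from the class column and split train and test sets chronologically rather than at random. The helper names `load_eeg` and `chronological_split` and the 80/20 split fraction are assumptions made for illustration, not part of the dataset's documentation.

```python
import io

import pandas as pd

# The five sample rows shown above: 14 EEG readings followed by the class.
SAMPLE = """4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,4393.85,0
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82,4384.1,0
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72,4389.23,0
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,4396.41,0
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4398.46,0
"""


def load_eeg(source):
    """Read headerless rows; the last column is treated as the class."""
    df = pd.read_csv(source, header=None)
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1].astype(int)
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Keep the earliest rows for training and the latest for testing."""
    cut = int(len(X) * train_fraction)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


X, y = load_eeg(io.StringIO(SAMPLE))
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X.shape, len(X_train), len(X_test))
```

Splitting by position keeps every test row later in time than every training row, which avoids evaluating a model on observations that precede its training data.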

       
  Occupancy Detection Dataset

  This dataset describes measurements of a room and the objective is to predict whether or not the room is occupied.
  There are 20,560 one-minute observations taken over the period of a few weeks. This is a classification prediction problem. There are 7 attributes including various light and climate properties of the room.
  The source for the data is credited to Luis Candanedo from UMONS.
  Below is a sample of the first 6 rows of data, preceded by the header row.
  "date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
  "1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
  "2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
  "3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
  "4","2015-02-04 17:54:00",23.15,27.2,426,708.25,0.00477150882608175,1
  "5","2015-02-04 17:55:00",23.1,27.2,426,704.5,0.00475699293331518,1
  "6","2015-02-04 17:55:59",23.1,27.2,419,701,0.00475699293331518,1
The data is provided in 3 files that suggest the splits that may be used for training and testing a model.
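One detail of the sample above is worth noting when loading it: the header names 7 columns, but each data row carries 8 fields because of the leading row number. When data rows have one more field than the header, pandas uses the first field as the row index, so a plain `read_csv` call lines the columns up correctly. The sketch below shows this on the first three sample rows; `load_occupancy` is a hypothetical helper name.

```python
import io

import pandas as pd

# Header plus the first three data rows from the sample shown above.
SAMPLE = '''"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
'''


def load_occupancy(source):
    """Read the file; the unnamed leading row number becomes the index."""
    df = pd.read_csv(source)
    df["date"] = pd.to_datetime(df["date"])
    return df


df = load_occupancy(io.StringIO(SAMPLE))

# Separate the sensor readings from the timestamp and the class label.
features = df.drop(columns=["date", "Occupancy"])
target = df["Occupancy"]
print(features.head())
print(target.tolist())
```

The same function could be applied to each of the provided files in turn, assuming they all share the layout of the sample.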
  
       
  Ozone Level Detection Dataset

  This dataset describes 6 years of ground ozone concentration observations and the objective is to predict whether it is an “ozone day” or not.
  The dataset contains 2,536 observations and 73 attributes. This is a classification prediction problem and the final attribute indicates the class value as “1” for an ozone day and “0” for a normal day.
  Two versions of the data are provided, an eight-hour peak set and a one-hour peak set. I would suggest using the one-hour peak set for now.
  Below is a sample of the first 6 rows with no header row.
  1/1/1998,0.8,1.8,2.4,2.1,2,2.1,1.5,1.7,1.9,2.3,3.7,5.5,5.1,5.4,5.4,4.7,4.3,3.5,3.5,2.9,3.2,3.2,2.8,2.6,5.5,3.1,5.2,6.1,6.1,6.1,6.1,5.6,5.2,5.4,7.2,10.6,14.5,17.2,18.3,18.9,19.1,18.9,18.3,17.3,16.8,16.1,15.4,14.9,14.8,15,19.1,12.5,6.7,0.11,3.83,0.14,1612,-2.3,0.3,7.18,0.12,3178.5,-15.5,0.15,10.67,-1.56,5795,-12.1,17.9,10330,-55,0,0.
  1/2/1998,2.8,3.2,3.3,2.7,3.3,3.2,2.9,2.8,3.1,3.4,4.2,4.5,4.5,4.3,5.5,5.1,3.8,3,2.6,3,2.2,2.3,2.5,2.8,5.5,3.4,15.1,15.3,15.6,15.6,15.9,16.2,16.2,16.2,16.6,17.8,19.4,20.6,21.2,21.8,22.4,22.1,20.8,19.1,18.1,17.2,16.5,16.1,16,16.2,22.4,17.8,9,0.25,-0.41,9.53,1594.5,-2.2,0.96,8.24,7.3,3172,-14.5,0.48,8.39,3.84,5805,14.05,29,10275,-55,0,0.
  1/3/1998,2.9,2.8,2.6,2.1,2.2,2.5,2.5,2.7,2.2,2.5,3.1,4,4.4,4.6,5.6,5.4,5.2,4.4,3.5,2.7,2.9,3.9,4.1,4.6,5.6,3.5,16.6,16.7,16.7,16.8,16.8,16.8,16.9,16.9,17.1,17.6,19.1,21.3,21.8,22,22.1,22.2,21.3,19.8,18.6,18,18,18.2,18.3,18.4,22.2,18.7,9,0.56,0.89,10.17,1568.5,0.9,0.54,3.8,4.42,3160,-15.9,0.6,6.94,9.8,5790,17.9,41.3,10235,-40,0,0.
  1/4/1998,4.7,3.8,3.7,3.8,2.9,3.1,2.8,2.5,2.4,3.1,3.3,3.1,2.3,2.1,2.2,3.8,2.8,2.4,1.9,3.2,4.1,3.9,4.5,4.3,4.7,3.2,18.3,18.2,18.3,18.4,18.6,18.6,18.5,18.7,18.6,18.8,19,19,19.3,19.4,19.6,19.2,18.9,18.8,18.6,18.5,18.3,18.5,18.8,18.9,19.6,18.7,9.9,0.89,-0.34,8.58,1546.5,3,0.77,4.17,8.11,3145.5,-16.8,0.49,8.73,10.54,5775,31.15,51.7,10195,-40,2.08,0.
  1/5/1998,2.6,2.1,1.6,1.4,0.9,1.5,1.2,1.4,1.3,1.4,2.2,2,3,3,3.1,3.1,2.7,3,2.4,2.8,2.5,2.5,3.7,3.4,3.7,2.3,18.8,18.6,18.5,18.5,18.6,18.9,19.2,19.4,19.8,20.5,21.1,21.9,23.8,25.1,25.8,26,25.6,24.2,22.9,21.6,20,19.5,19.1,19.1,26,21.1,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,0.58,0.
  1/6/1998,3.1,3.5,3.3,2.5,1.6,1.7,1.6,1.6,2.3,1.8,2.5,3.9,3.4,2.7,3.4,2.5,2.2,4.4,4.3,3.2,6.2,6.8,5.1,4,6.8,3.2,18.9,19.5,19.6,19.5,19.5,19.5,19.4,19.2,19.1,19.5,19.6,18.6,18.6,18.9,19.2,19.3,19.2,18.8,17.6,16.9,15.6,15.4,15.9,15.8,19.6,18.5,14.4,0.68,1.52,8.62,1499.5,4.3,0.61,9.04,10.81,3111,-11.8,0.09,11.98,11.28,5770,27.95,46.25,10120,?,5.84,0.

       
  Summary

  In this post, you discovered a suite of standard time series forecast datasets that you can use to get started and practice time series forecasting with machine learning methods.
  Specifically, you learned about:
  
       
  • 4 univariate time series forecasting datasets.   
  • 3 multivariate time series forecasting datasets.   
  • Two websites where you can download many more datasets.  
  Did you use one of the above datasets in your own project?
  Share your findings in the comments below.



音乐AS篮球 posted on 2016-11-30 04:02:09
Front row, leaving my name!

与我无关统统 posted on 2016-11-30 05:29:56
Thanks to the OP's mercy, the first-reply spot is mine...

a303464904 posted on 2016-11-30 05:55:31
Wherever you fall, just lie down right there!

青筠 posted on 2016-12-3 21:25:47
There are two ways to pollute a place: garbage, or money.

雁卉 posted on 2016-12-11 04:48:51
Support, thumbs up.