Example 10.1: Read a file byte by byte

General Technology 2012-09-10



More and more makers of electronic devices use standard storage media to record data. Sometimes this is central to the device's function, as in a camera, so that the data must be easy to recover. Other times, it's effectively incidental, and the device maker may not provide easy access to the stored data.

For example, I was recently prescribed a continuous positive airway pressure (CPAP) machine for sleep apnea. My machine, made by Philips, records data onto an SD card. The card's file system is readable, but Philips provides neither software to read the files nor a data dictionary to explain what they contain. (I believe Philips sells software that does read it, for ludicrous prices, to physicians who prescribe their machines. Nice racket.)

If you open the files on the card as ASCII files, you get a bunch of gobbledygook. But the data they contain is mine, in the most fundamental sense! I want to be able to read it. Fortunately, some folks have done a fair amount of work to reverse engineer the files. Through them, I was able to find some guidance for data from a related machine. Now I know what's in the file, more or less: a header of 25 bytes, 1200 bytes of data (representing 4 minutes of recording), and a one-byte footer, repeated ad nauseam (ad somnum? somno contingit?).
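
The layout arithmetic can be spelled out in a few lines of R. This is only a sketch of the block structure as described here; the variable names are illustrative and not part of any file specification, and the sampling rate assumes one measurement per data byte.

```r
# Block layout as described in the text (names are illustrative assumptions)
header_bytes  <- 25     # header length in bytes
data_bytes    <- 1200   # data bytes per block
footer_bytes  <- 1      # footer length in bytes
block_minutes <- 4      # recording time covered by one block

block_len  <- header_bytes + data_bytes + footer_bytes   # bytes per block
per_minute <- data_bytes / block_minutes                 # measurements per minute
per_second <- per_minute / 60                            # measurements per second

c(block_len = block_len, per_minute = per_minute, per_second = per_second)
```

Under these assumptions each block is 1226 bytes long and holds 300 measurements per minute, or 5 per second.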

Today we show how to read bytes stored in a file as signed integers. For this file (download), we also trim out the headers and footers and make a simple line plot of the recorded data, which appear to be some function of the variable pressure with which the CPAP machine outputs air. Next time we'll make a more useful plot.

SAS

In SAS, we can use the infile statement to read in the data (section 1.1.2).


data test;
infile "c:kencpap000000007.005" recfm=n;
input byte ib1. @@;
run;

The recfm=n option tells SAS (for Windows; this may differ under other operating systems) to read the file as a binary stream with no record boundaries. The ib1. informat tells SAS to read each byte as a one-byte signed integer in the native format. (We cover reading in various formats in sections 1.1.3 and A.6.4, and in several examples.) The @@ tells SAS to hold the current line of input, rather than skipping to a new line, after each value is read (see Example 8.1). SAS will read bytes until there are no more to read.

Now I have a data set with 128,680 observations, each being a signed integer. Some of these are actually nonsense, since the header and footer contain data stored in a variety of formats. To get rid of the headers and footers, we'll use the _n_ implied variable (section 1.4.15), which is effectively the observation number, in conjunction with the mod function. While we're processing the data set anyhow, we'll also figure out the total elapsed time, which will be useful for plotting.


data cycles;
set test;
if mod(_n_,1226) ge 25 and mod(_n_,1226) lt 1225;
/* otherwise it's a header or the footer */
time_min = (4 * int(_n_/1226)) + (mod(_n_, 1226) - 24 )/(300);
/* 4 minutes for each header-data-footer block +
number of measurements in this data block / 300
(measurements per minute). */
run;
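
The same bookkeeping can be checked with a little integer arithmetic. The sketch below is written in R and uses only the numbers quoted in the text (128,680 bytes read, 1226 bytes per block, 300 measurements per minute); the variable names are illustrative.

```r
n_bytes   <- 128680   # observations read from the file, per the text
block_len <- 1226     # 25-byte header + 1200 data bytes + 1-byte footer

full_blocks <- n_bytes %/% block_len   # complete header-data-footer blocks
leftover    <- n_bytes %%  block_len   # bytes in a trailing partial block

# Same test as the SAS subsetting if: keep positions 25 through 1224 of each block
pos    <- seq_len(n_bytes) %% block_len
keep   <- pos >= 25 & pos < 1225
n_kept <- sum(keep)

c(full_blocks = full_blocks, leftover = leftover,
  n_kept = n_kept, minutes = n_kept / 300)
```

With these numbers the file holds 104 complete blocks plus a partial one of 1176 bytes, and the 125,952 retained measurements span roughly 420 minutes, or about 7 hours.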

Now it's easy to plot the data. A simple connected line plot across time makes sense, and can be made using the symbol statement with the i=j (j for join) syntax. The result is shown above.


symbol i = j v = none c = black;
proc gplot data = cycles;
where time_min le 4;
plot byte * time_min;
run;
quit;

R

In R, we'll use the readBin() function to actually read the file, but we need to do a little prep first. The readBin() function requires that we supply the number of data elements to read. An overestimate is OK, but we can easily find the exact length of the file using the file.info() function; the resulting object has a size component with the number of bytes. We'll also need a "connection" to the file, which is established in a call to the file() function.


finfo = file.info("0000000007.005")
toread = file("0000000007.005", "rb")   # open a binary read connection
alldata = readBin(toread, integer(), size=1, n = finfo$size, endian="little")
close(toread)   # release the connection once the bytes are read

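The size option is the length of each element, in bytes, and the endian option specifies the byte order within multi-byte elements; with size=1 it has no effect, but it does no harm to state it. By default, readBin() treats one-byte integers as signed.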
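
To see what reading a byte as a signed integer means, readBin() can also be pointed at a raw vector instead of a file connection. The four byte values below are made up for illustration and do not come from the CPAP file.

```r
# Four hypothetical bytes: 0x00, 0x7F, 0x80, 0xFF
bytes <- as.raw(c(0x00, 0x7f, 0x80, 0xff))

# One-byte integers are read as signed unless signed = FALSE is given
signed_vals   <- readBin(bytes, integer(), n = 4, size = 1)
unsigned_vals <- readBin(bytes, integer(), n = 4, size = 1, signed = FALSE)

signed_vals     # 0 127 -128 -1
unsigned_vals   # 0 127 128 255

# Byte order only matters for multi-byte elements, so with size = 1
# the endian setting does not change the result
same_either_way <- identical(
  readBin(bytes, integer(), n = 4, size = 1, endian = "little"),
  readBin(bytes, integer(), n = 4, size = 1, endian = "big")
)
same_either_way
```

Bytes with the high bit set (0x80 and above) come back as negative numbers in the signed reading, which is why the plotted values can fall below zero.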

Analogous to SAS, the alldata vector has 128,680 integers. All that remains is to remove the headers and footers. We'll do that by making a logical test with the %% (modulo) operator, saving the result as a vector, and using this logical vector to select the data from among the headers and footers. Then we plot the data, replicating the SAS plot of the first 1200 observations (4 minutes).


keep = 1:finfo$size %% 1226 > 24 & 1:finfo$size %% 1226 < 1225
cycles = alldata[keep]
# cycles gets only the elements of alldata when the corresponding
# element of keep is TRUE

plot(1:1200, cycles[1:1200], type = "l")

The result is shown below. The first four minutes of my night's sleep were apparently characterized by generally lengthening breaths that became increasingly shallow.
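
The R plot uses the observation index on the x-axis rather than elapsed time. A time axis like the SAS time_min variable could be built along the following lines. This is a sketch that assumes the same 1226-byte blocks and 300 measurements per minute, and it uses a made-up two-block file length so that it runs without the data file.

```r
block_len <- 1226
n_bytes   <- 2 * block_len          # hypothetical file length: two blocks

idx  <- seq_len(n_bytes)            # 1-based byte position in the file
pos  <- idx %% block_len            # position within the current block
keep <- pos > 24 & pos < 1225       # same test as used to build cycles

# 4 minutes per completed block, plus the offset within the current block
time_min <- (4 * (idx %/% block_len) + (pos - 24) / 300)[keep]

range(time_min)
```

With the real data, using finfo$size in place of n_bytes should give a vector the same length as cycles, so that plot(time_min[1:1200], cycles[1:1200], type = "l") puts minutes on the x-axis.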



An unrelated note about aggregators
We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Content by: SAS and R (source link). Thank you for your support!
