AUSTRALIAN LITERATURE

Published by

  • expression écrite
Australian Literature 1 AUSTRALIAN LITERATURE 3. letnik - 2000/2001
  • australia terra australis incognita
  • convicts
  • early post-colonial literatures
  • colonial period
  • social system
  • point of view
  • point view
  • australia
  • land
Published : Tuesday, March 27, 2012
Reading/s : 49
Origin : cs.nyu.edu
Number of pages: 274
See more See less

High Performance Data Mining in Time Series:
Techniques and Case Studies
by
Yunyue Zhu
A dissertation submitted in partial fulflllment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science
New York University
January 2004
Dennis Shasha°c Yunyue Zhu
All Rights Reserved, 2004To my parents and Amy, for many wonderful things in life.
iiiAcknowledgments
This dissertation would never have materialized without the contribution of
many individuals to whom I have the pleasure of expressing my appreciation
and gratitude.
First of all, I gratefully acknowledge the persistent support and encourage-
ment from my advisor, Professor Dennis Shasha. He provided constant aca-
demic guidance and inspired many of the ideas presented here. Dennis is a
superb teacher and a great friend.
I wish to express my deep gratitude to Professor Ernest Davis and Profes-
sor Chee Yap for serving on my proposal and dissertation committees. Their
comments on this thesis are precious. I also thank the other members of my
dissertation committee, Professor Richard Cole, Dr. Flip Korn and Professor
Arthur Goldberg, for their interest in this dissertation and for their feedback.
Rich interactions with colleagues improve research and make it enjoyable.
Professor Allen Mincer has both introduced me to high-energy physics and
arrangedtheaccesstoMilagrodataandsoftware. StuartLewishashelpedwith
many exciting ideas and promising introductions to the Magnetic Resonance
Imagery community. Within the database group, Tony Corso, Hsiao-Lan Hsu,
Alberto Lerner, Nicolas Levi, David Rothman, David Tanzer, Aris Tsirigos,
Zhihua Wang, Xiaojian Zhao have lent both voices and helpful suggestions in
ivthe course of this work. This is certainly not a complete list. I am thankful for
many friends with whom I share more than just an academic relationship.
Rosemary Amico, Anina Karmen and Maria L. Petagna performed the ad-
ministrativeworkrequiredforthisresearch. Theywerevitalinmakingmystay
at NYU enjoyable.
Finally and most importantly, I would like to thank my parents for their
efiorts to provide me with the best possible education.
vAbstract
As extremely large time series data sets grow more prevalent in a wide vari-
ety of settings, we face the signiflcant challenge of developing e–cient analysis
methods. This dissertation addresses the problem in designing fast, scalable
algorithms for the analysis of time series.
The flrst part of this dissertation describes the framework for high perfor-
mance time series data mining based on important primitives. Data reduction
transform such as the Discrete Fourier Transform, the Discrete Wavelet Trans-
form, Singular Value Decomposition and Random Projection, can reduce the
size of the data without substantial loss of information, therefore provides a
synopsis of the data. Indexing methods organize data so that the time series
data can be retrieved e–ciently. Transformation on time series, such as shift-
ing, scaling, time shifting, time scaling and dynamic time warping, facilitates
the discovery of exible patterns from time series.
The second part of this dissertation integrates the above primitives into
useful applications ranging from music to physics to flnance to medicine.
StatStream StatStream is a system based on fast algorithms for flnding
the most highly correlated pairs of time series from among thousands of time
series streams and doing so in a moving window fashion. It can be used to flnd
correlations in time series in flnance and in scientiflc applications.
viHumFinderMostpeoplehumratherpoorly. Nevertheless,somehowpeople
have some idea what we are humming when we hum. The goal of the query by
humming program, HumFinder, is to make a computer do what a person can
do. Using pitch translation, time dilation, and dynamic time warping, one can
match an inaccurate hum to a melody remarkably accurately.
OmniBurstBurstdetectionistheactivityofflndingabnormalaggregatesin
datastreams. Oursoftware,OmniBurst,candetectburstsofvaryingdurations.
Our example applications are monitoring gamma rays and stock market price
volatility. The software makes use of a shifted wavelet structure to create a
linear time fllter that can guarantee that no bursts will be missed at the same
time that it guarantees (under a reasonable statistical model) that the fllter
eliminates nearly all false positives.
viiContents
Dedication iii
Acknowledgments iv
Abstract vi
List of Figures xi
List of Tables xviii
I Review of Techniques 1
1 Time Series Preliminaries 2
1.1 High Performance Time Series Analysis . . . . . . . . . . . . . . 8
2 Data Reduction Techniques 11
2.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.3 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . 71
2.4 Sketches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.5 Comparison of Data Reduction Techniques . . . . . . . . . . . . 92
viii3 Indexing Methods 97
3.1 B-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.2 KD-B-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.3 R-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4 Grid Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4 Transformations on Time Series 114
4.1 GEMINI Framework . . . . . . . . . . . . . . . . . . . . . . . . 117
4.2 Shifting and Scaling. . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3 Time Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Local Dynamic Time Warping . . . . . . . . . . . . . . . . . . . 129
II Case Studies 135
5 StatStream 136
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.2 Data And Queries . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.3 Statistics Over Sliding Windows . . . . . . . . . . . . . . . . . . 142
5.4 StatStream System . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.5 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
6 Query by Humming 173
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.3 Architecture of the HumFinder System . . . . . . . . . . . . . . 179
ix6.4 Indexing Scheme for Dynamic Time Warping . . . . . . . . . . . 185
6.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
7 Elastic Burst Detection 206
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
7.2 Data Structure and Algorithm . . . . . . . . . . . . . . . . . . . 211
7.3 Empirical Results of the OmniBurst System . . . . . . . . . . . 225
7.4 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.5 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . 238
8 A Call to Exploration 239
Bibliography 241
x

Be the first to leave a comment!!

12/1000 maximum characters.