Classification and regression with functional data: a mathematical optimization approach.
License: Creative Commons (by-nc-nd)
Author(s): Jiménez, María
The goal of this PhD dissertation is to develop new approaches for supervised classification and regression in Functional Data Analysis. In particular, the mathematical optimization tools analyzed in this thesis exploit the functional nature of the data, leading to novel strategies which may outperform the standard methodologies and link mathematics with real-life applications. Chapter 1 presents the main ideas, the challenges and the notation used in this thesis. Chapter 2 addresses the problem of selecting a finite set of time instants which best classify multivariate functional data into two predefined classes. Using not only the information provided by the function itself but also its high-order derivatives is crucial to improving accuracy. To do this, a continuous bilevel optimization problem is solved. This problem combines solving the well-known SVM (Support Vector Machine) problem with maximizing the correlation between the class label and the score. Chapter 3 also focuses on the binary classification problem using SVM. However, instead of finding the most important time instants, a functional bandwidth is defined in the so-called kernel function. In this way, accuracy may be improved and the most relevant intervals of the function's domain, according to their classification ability, are identified, enhancing interpretability. A bilevel optimization problem is formulated and solved by means of an alternating procedure. Chapter 4 focuses on classifying so-called hybrid functional data, i.e., data formed by functional and static (constant over time) covariates. The goal is to select the features, functional or static, which best classify the data. An anisotropic kernel which associates a scalar bandwidth with each feature is defined. As in previous chapters, an alternating approach is proposed to solve a bilevel optimization problem.
Chapter 5 generalizes the variable selection problem presented in Chapter 2 to regression. The solution approach combines the SVR (Support Vector Regression) problem with the minimization of the sum of squared residuals between the actual and predicted responses. An alternating heuristic is developed to handle this model. All the methodologies presented throughout this dissertation are tested on synthetic and real data sets, showing their applicability.
[2019]