Simulated trait and spectroscopy data to support retrieval of forest biophysical parameters from spaceborne imaging spectroscopy
Authors | |
---|---|
Year of publication | 2024 |
Type | Conference abstract |
Citation | |
Description | Retrieving forest variables from spaceborne imaging spectroscopy data is challenging because of natural variability in species composition, 3D canopy structure, and phenology. Developing robust, reliable, and fully operational retrievals of high-quality vegetation products from future hyperspectral satellite missions (e.g., CHIME, SBG) requires field or simulated forest trait data and spectral signatures that capture the potential variability of natural forests. We present a simulated dataset, in the form of look-up tables (LUTs), for Central European temperate broadleaf forests and demonstrate its potential for machine learning approaches. The dataset was simulated using the 3D Discrete Anisotropic Radiative Transfer (DART) model. Detailed virtual forest scenes of 30 by 30 meters, resolved down to the individual leaf, were generated from terrestrial laser scans of real trees. Leaf-level trait variation was represented by 2000 sets of leaf optical properties simulated with PROSPECT-PRO. Canopy reflectance simulations for three canopy covers, eight LAI levels, nine sun zenith angles, and twelve azimuth geometries were conducted in DART-Lux version 5.10.0, resulting in approximately 3.5M unique combinations. The resulting images were processed into two databases: one containing the reflectance of the entire forest scene and the other containing only the reflectance of sunlit pixels. The dataset will be made available to the research community for testing and to support the development of high-level vegetation products from spaceborne imaging spectroscopy data. The optimal amount of training data for machine learning models is not clearly established, but these methods generally benefit from large data volumes. A common guideline is to have at least ten times as many training data points as features. Deep learning typically requires even more data. Establishing a scalable data collection pipeline is therefore essential. 
For tasks such as predicting biophysical parameters of vegetation, high-quality data representative of true vegetation conditions is crucial. We explore the quality of the LUTs and their potential to augment or substitute for in-situ measurements. We examine the data characteristics and models that yield the highest prediction accuracy, including preprocessing steps (e.g., normalization, data space transformation) and hyperparameter selection. We evaluate three training inputs: 1) a limited set of in-situ data (<100 data points; not scalable), 2) a subset of the simulated data that closely resembles the in-situ data (1,000 to 10,000 data points), selected using domain expertise and similarity metrics, and 3) the entire simulated dataset (>3M data points). We assess which method performs best and provide recommendations for including LUTs in a training pipeline. |
Related projects: |