A formula is derived for the exact computation of Bagging classifiers when the base model adopted is k-Nearest Neighbour (k-NN). The formula, which holds in any dimension and does not require the extraction of bootstrap replicates, proves that Bagging cannot improve 1-Nearest Neighbour. It also proves that, for k > 1, Bagging has a smoothing effect on k-NN. Convergence of empirically bagged k-NN predictors to the exact formula is also considered. Efficient approximations to the exact formula are derived, and their applicability to practical cases is illustrated.
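One way to build intuition for the 1-NN result is to compute the probability that the i-th nearest training point becomes the nearest neighbour inside a bootstrap replicate. The sketch below assumes the standard bootstrap (n draws with replacement from n points) and no ties in distance. It is an illustration only, not necessarily the formula derived in the paper, and the function name is hypothetical.

```python
def bootstrap_nn_weights(n):
    """Probability that the i-th nearest training point (i = 1..n) is the
    nearest neighbour within a bootstrap replicate of size n.

    The i-th point is nearest in the replicate when none of the i-1 closer
    points were drawn but the i-th one was:
        (1 - (i-1)/n)^n - (1 - i/n)^n
    """
    return [(1 - (i - 1) / n) ** n - (1 - i / n) ** n for i in range(1, n + 1)]


weights = bootstrap_nn_weights(50)
print(f"weight of the true nearest neighbour: {weights[0]:.3f}")
print(f"sum of all weights: {sum(weights):.3f}")
```

The first weight equals 1 - (1 - 1/n)^n, which stays above 1/2 for every n. Under these assumptions the true nearest neighbour therefore decides the majority vote over many replicates, which is consistent with the statement that Bagging cannot improve 1-NN.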
The AdaBoost algorithm is one of the most successful classification methods in use. While the algorithm largely preserves its general and practical applicability, theoretical and experimental work shows that AdaBoost can overfit when it is applied to noisy data. In this paper, a procedure is proposed for bias-variance control when the AdaBoost algorithm is employed in classification tasks. The method is based on an earlier notion of easy and hard training patterns, which emerges from analysis of the dynamical evolution of AdaBoost weights. More specifically, the procedure consists in sorting data points by hardness and progressively eliminating the hardest among them from the data set. The effectiveness of the method is tested and discussed on synthetic as well as natural data.
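The sort-and-eliminate step can be sketched as follows. The hardness scores are assumed to be given, since the abstract does not spell out how they are computed from the AdaBoost weight dynamics. The function name and the `drop_per_step` parameter are hypothetical.

```python
def pruning_schedule(points, hardness, steps, drop_per_step):
    """Yield progressively smaller training sets, removing the hardest
    points first. `hardness[i]` is an assumed, precomputed score for
    `points[i]` (larger means harder)."""
    # Indices ordered from easiest to hardest.
    kept = sorted(range(len(points)), key=lambda i: hardness[i])
    for _ in range(steps):
        if drop_per_step <= 0 or len(kept) <= drop_per_step:
            break
        kept = kept[:-drop_per_step]          # drop the current hardest points
        yield [points[i] for i in sorted(kept)]  # preserve original order


data = ["a", "b", "c", "d", "e"]
scores = [0.1, 0.9, 0.2, 0.8, 0.3]
for reduced in pruning_schedule(data, scores, steps=2, drop_per_step=1):
    print(reduced)
```

Each reduced set would then be used to retrain the classifier. How the final set is selected (for instance by validation error) is not stated in the abstract and is left open here.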
The Geographic Resources Analysis Support System (GRASS), a general-purpose GIS originally developed by the U.S. Army Corps of Engineers Laboratory, has grown into one of the main components of the Open Source and Free Software geospatial computational infrastructure. Current developments, led by an international team of programmers, focus on improving the 2D and 3D raster and vector data processing and analysis tools and the 3D visualization capabilities, following the release of the code under the GPL in 1999. Applications in epidemiology, coastal management and water flow modelling provide a snapshot of these capabilities.
In epidemiological modeling, survey data are usually collected at sampling sites and then regionalized within Geographical Information Systems (GIS). To enhance the data density, continuous field data such as land surface temperatures (LST), snow coverage and vegetation indices are commonly derived from satellite data. The recent launches of the new satellite systems Terra and Aqua significantly improve data availability for scientific purposes and for epidemiological studies and predictions. The most interesting sensor onboard is MODIS, which delivers two global coverages daily at 250 m (Red, NIR), 500 m (MIR) and 1000 m (TIR) resolution. The paper focuses on two of the numerous MODIS data products: Land Surface Temperatures (LST) and vegetation index 16-day composites.
The integration of MODIS satellite data into a GIS requires several pre-processing steps, such as the reprojection from the MODIS-ISIN or MODIS-SIN projections to another, more common projection (UTM, national coordinate systems etc.). The resulting maps are filtered pixelwise by applying the related quality maps, which are provided along with the data products. Due to limitations in the official cloud detection algorithm used to create these land surface temperature quality maps, an additional outlier detection step has been implemented. Based on the scene statistics, this outlier filter aims at removing all pixels which contain cloud temperatures instead of the desired land surface temperatures.
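A statistics-based outlier filter of this kind might look like the sketch below. The text does not give the actual rule. The lower fence at Q1 - 1.5 × IQR is an assumption chosen for illustration, motivated by cloud tops being colder than the land surface. The function name and the parameter `k` are hypothetical.

```python
import statistics


def filter_cold_outliers(lst_values, k=1.5):
    """Replace suspiciously cold LST pixels with None (no data).

    `lst_values` is a flat list of temperatures for one scene, with None
    marking pixels already masked by the quality map. The threshold
    Q1 - k * IQR is an assumed rule, not the published one.
    """
    valid = [v for v in lst_values if v is not None]
    q1, _, q3 = statistics.quantiles(valid, n=4)   # scene quartiles
    lower_fence = q1 - k * (q3 - q1)
    return [v if (v is not None and v >= lower_fence) else None
            for v in lst_values]


scene = [290, 291, 292, 293, 294, 295, 296, 240, None]  # kelvin, one cloud pixel
print(filter_cold_outliers(scene))
```

Only the cold tail is cut, because warm outliers are not the failure mode described in the text. A scene with fewer than two valid pixels would need separate handling.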
Another set of MODIS time series data are the NDVI and EVI vegetation indices. They can be incorporated into epidemiological models to introduce vegetation dynamics. The 16-day composite product minimizes cloud cover and reflects the current vegetation status at a sufficient temporal resolution.
The integration of MODIS data into epidemiological research enhances the spatio-temporal resolution of climatological data, in particular in mountainous regions. The study area, a region of approximately 20,000 km², has complex terrain, with elevation ranging from nearly sea level to 3800 meters, and a varying density of meteorological stations.
The recent implementation of general time series processing for GRASS raster maps supports univariate statistics for a series of MODIS scenes. By selecting various time ranges and operators, a number of indicators can be calculated. The comparison of LST with ground truth time series from climatic stations showed that the LST values match ground temperatures quite well. Although surface and air temperatures differ by definition, surface temperatures can be transformed to air temperatures with a regression model. Results and comparisons will be presented in the paper.
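Both ideas, per-pixel statistics over a stack of scenes and a regression from surface to air temperature, can be sketched in plain Python. This is not the GRASS implementation. The function names are hypothetical, rasters are represented as nested lists with None as no-data, and a simple linear model is assumed for the regression.

```python
import statistics


def series_stat(maps, op=statistics.mean):
    """Apply `op` per pixel across a list of equally sized 2D rasters,
    ignoring None (no-data) cells."""
    rows, cols = len(maps[0]), len(maps[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [m[r][c] for m in maps if m[r][c] is not None]
            out_row.append(op(values) if values else None)
        out.append(out_row)
    return out


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Two tiny illustrative scenes (made-up numbers, not measured data).
scenes = [[[10.0, None]], [[14.0, None]]]
print(series_stat(scenes))        # per-pixel mean
print(series_stat(scenes, max))   # per-pixel maximum

# Made-up pairs of (LST, station air temperature) for the regression.
slope, intercept = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(slope, intercept)
```

Passing the operator as a function mirrors the "time ranges and operators" idea in the text: the time range is chosen by selecting which scenes go into `maps`.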
In this paper we present a suite of new image processing tools for GRASS. These new programs provide support for image geocoding and image fusion. Moreover, multi- and hyperspectral image analysis has been implemented to derive landuse/landcover maps at subpixel resolution.
PART I
The module 'i.linespoints' allows for image registration by defining ground control points as well as corresponding lines. The integration of lines into the registration procedure speeds up and simplifies the search for corresponding structures in the source and target images. The resulting table of ground control points is provided as input to the new rectification tool 'i.homography'.
A new module 'i.coregister' provides an alternative, semiautomated approach to find corresponding points in two overlapping images. To obtain good registration accuracy, two regions are first roughly indicated on screen, with only very general requirements on image dimensions and on the characteristics of the overlapping zone. Given the matching region, the algorithm defines dynamic search windows and computes the cross-correlation function within subwindows. The cross-correlation is computed with the Fast Fourier Transform, and the location of its maximum delivers the positions of the GCPs, which are saved into the common POINTS structure for later use with 'i.rectify'. The list of GCPs created by the above modules can optionally be converted into the POINTS structure of 'i.ortho.photo' by a new script 'i.points2orthophoto.sh'.
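The idea of locating a control point at the correlation maximum can be illustrated with a small sketch. To stay dependency-free it evaluates the normalised cross-correlation directly in the spatial domain instead of using the FFT, so it shows the matching criterion rather than the module's actual algorithm. All names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalised cross-correlation of two equally long flat sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0   # flat windows carry no information


def match_template(image, template):
    """Return ((row, col), score) of the best match of `template`
    inside `image`; both are 2D lists of numbers."""
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best_pos, best_score = (0, 0), -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = ncc(window, flat_t)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# A 5x5 search window with one distinctive 2x2 structure at row 2, column 1.
image = [[0] * 5 for _ in range(5)]
image[2][1], image[2][2], image[3][1], image[3][2] = 1, 2, 3, 4
print(match_template(image, [[1, 2], [3, 4]]))
```

The direct search costs on the order of (window size × template size) operations per subwindow, which is why an FFT-based evaluation, as described in the text, is attractive for larger windows.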
A new application of the 'i.ortho.photo' algorithm is proposed for the registration of oblique imagery as produced by hand-held digital cameras. The underlying idea is to improve the visual perception of perspective rendering based on orthophotos. While oblique rendering using a digital elevation model and orthophotos usually suffers from perspective displacements, we show that even digital photos taken with an inexpensive digital camera can be geocoded and used to improve the visual impression.
PART II
In the next part of this paper, we present two methods related to multi- and hyperspectral cameras. Spectral angle mapping has been implemented in the new module 'i.spectral.sam'. For a set of bands, the algorithm calculates the angles to a set of object spectra read from a spectral library.
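The spectral angle itself is the angle between the pixel vector and a reference spectrum, arccos(p·r / (|p||r|)). A minimal sketch of this computation follows. It is not the module's code, and the function names and the toy library are hypothetical.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum
    (one value per band; both must be non-zero vectors)."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(r * r for r in reference)))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return math.acos(cosine)


def closest_spectrum(pixel, library):
    """Name of the library spectrum with the smallest angle to `pixel`."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Toy three-band library with made-up reflectances.
library = {"vegetation": [0.05, 0.08, 0.50], "soil": [0.20, 0.25, 0.30]}
print(closest_spectrum([0.10, 0.15, 0.90], library))
```

Because only the direction of the vectors matters, the angle is insensitive to a uniform scaling of brightness: a pixel and a doubled copy of it give an angle of zero.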
Spectral unmixing for landuse/landcover mapping at subpixel precision has been implemented in the module 'i.spectral.unmix'. Multi- and hyperspectral data can be analysed against a spectral library. Instead of the single resulting map obtained from common classification algorithms, as many abundance maps as object spectra are generated.
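Under the usual linear mixing assumption, a pixel spectrum is modelled as a weighted sum of the library spectra, and the weights are the abundances. The sketch below estimates them by unconstrained least squares via the normal equations. The solver actually used by the module is not stated in the text, so this is an illustrative stand-in with hypothetical names, and it assumes linearly independent library spectra.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting (a is assumed non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def unmix(pixel, endmembers):
    """Least-squares abundances, one per endmember spectrum."""
    k = len(endmembers)
    # Normal equations: (E E^T) a = E p, rows of E are endmember spectra.
    gram = [[sum(x * y for x, y in zip(endmembers[i], endmembers[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(endmembers[i], pixel)) for i in range(k)]
    return solve(gram, rhs)


# Made-up spectra: the pixel is 30% of the first and 70% of the second.
endmembers = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(unmix([0.3, 0.7, 0.7], endmembers))
```

Applying `unmix` to every pixel yields one abundance value per library spectrum, hence one abundance map per spectrum. Practical implementations often add non-negativity and sum-to-one constraints, which this sketch omits.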
A new script 'i.fusion.brovey' has been written to support PAN sharpening of multispectral satellite imagery from platforms such as LANDSAT-7, QuickBird and SPOT. The algorithm performs Brovey transform image fusion of the high resolution panchromatic channel with the multispectral channels at lower resolution.
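Per pixel, the Brovey transform scales each multispectral band by the ratio of the panchromatic value to the sum of the bands. A minimal sketch follows, assuming the multispectral bands have already been resampled to the panchromatic grid. The function name is hypothetical and this is not the script's code.

```python
def brovey(bands, pan):
    """Brovey fusion for one pixel.

    `bands` holds the multispectral values of the pixel (typically three
    bands), `pan` the co-located panchromatic value.
    """
    total = sum(bands)
    if total == 0:
        return [0.0] * len(bands)   # avoid division by zero on dark pixels
    return [b * pan / total for b in bands]


print(brovey([10, 20, 30], 120))
```

The fused bands keep the ratios between the original bands while their sum equals the panchromatic value, which is how the spatial detail of the PAN channel is injected.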
Finally, we will show a high performance solution for image classification in GRASS at meso-scale and high spatial resolution. A script-based approach to run standard GRASS on an openMosix cluster (20 PCs, 40 CPUs) has been implemented to classify multispectral color orthophotos with the SMAP algorithm. The study area covers approximately 6200 square kilometers, and the resolution of the orthophotos is one meter per pixel. In tests, the time required to analyse 280 orthophotos at the given resolution was reduced from an estimated 118 days on a single CPU to 5 days on the openMosix cluster.
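The figures quoted above imply the following back-of-the-envelope numbers. The calculation uses only values stated in the text and treats the 118-day single-CPU figure as the estimate it is.

```python
single_cpu_days = 118      # estimated runtime on one CPU (from the text)
cluster_days = 5           # measured runtime on the cluster (from the text)
cpus = 40                  # 20 PCs with 40 CPUs in total
area_km2 = 6200            # approximate study area
pixels_per_km2 = 1000 * 1000   # one meter per pixel

speedup = single_cpu_days / cluster_days
efficiency = speedup / cpus
total_pixels = area_km2 * pixels_per_km2

print(f"speedup: {speedup:.1f}x")
print(f"parallel efficiency: {efficiency:.0%}")
print(f"pixels to classify: {total_pixels:,}")
```

The speedup of roughly 24 on 40 CPUs corresponds to a parallel efficiency of about 59%, for a data volume of about 6.2 billion pixels.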
Since the first edition of Open Source GIS: A GRASS GIS Approach was published in 2002, GRASS has undergone major improvements. This second edition includes numerous updates related to the new development; its text is based on the GRASS 5.3 version from December 2003. Besides changes related to GRASS 5.3 enhancements, the introductory chapters have been re-organized, providing more extensive information on import of external data. Most of the improvements in technical accuracy and clarity were based on valuable feedback from readers. Open Source GIS: A GRASS GIS Approach, Second Edition, provides updated information about the use of GRASS, including geospatial modeling with raster, vector and site data, image processing, visualization, and coupling with other open source tools for geostatistical analysis and web applications. A brief introduction to programming within GRASS encourages new development. The sample data set used throughout the book has been updated and is available on the GRASS web site. This book also includes links to sites where the GRASS software and on-line reference manuals can be downloaded and additional applications can be viewed. Open Source GIS: A GRASS GIS Approach, Second Edition is designed for a professional audience, composed of researchers and practitioners in government and industry. This book is also suitable as a secondary text for graduate-level students in geomatics, computer science and geosciences.
The tick Ixodes ricinus has been recorded in most Italian regions, especially in thermo-mesophilous woods and shrubby habitats where the relative humidity allows the tick to complete its 3-year developmental cycle, as predicted for the European climatic ranges. This tick acts both as vector and reservoir for a series of wildlife zoonotic pathogens, especially the agents of Lyme disease, tick-borne encephalitis (TBE) and Human Granulocytic Ehrlichiosis, which are emerging in most of Europe. To assess the spatial distribution of these pathogens and the infection risk for humans and animals within the territory of the Province of Trento, we carried out a long-term study using a combination of eco-epidemiological surveys and mathematical modelling. An extensive tick collection with a GIS-based habitat suitability analysis allowed us to identify the areas where the tick occurs at various densities. To identify the areas with higher infection risk, we estimated the values of R0 for Borrelia burgdorferi s.l., TBE virus and Anaplasma phagocytophila under different ecological conditions. We assessed the infection prevalence in the vector and in the wildlife reservoir species that play a central role in the persistence of these infections, i.e. the small mammals A. flavicollis and C. glareolus. We also considered the double effect of roe deer (Capreolus capreolus), which act as a reservoir for A. phagocytophila but are incompetent hosts for B. burgdorferi and TBE virus, thus reducing the infection prevalence of these last two pathogens in ticks. Infection prevalence with B. burgdorferi and A. phagocytophila in the vector was assessed by PCR screening of 1212 I. ricinus nymphs collected by dragging in six main study areas during 2002. The mean infection prevalence recorded was 1.32% for B. burgdorferi s.l. and 9.84% for A. phagocytophila. Infection prevalence in nymphs with TBE virus, as assessed in a previous study, was 0.03%.
Infection prevalence in rodents was assessed by screening (with ELISA and PCR) tissue and blood samples collected from 367 rodents trapped extensively during 2002 within 6 main study areas. A. flavicollis (N=238) was found to be infected with all three pathogens investigated, with infection prevalence ranging from 3.3% for TBE virus to 11.7% for A. phagocytophila and 16.6% for B. burgdorferi s.l. C. glareolus (N=108) showed an infection prevalence of 6.5% with A. phagocytophila and 12.7% with B. burgdorferi s.l., while no individuals were infected with TBE virus. We also screened 98 spleen samples collected from roe deer with PCR, resulting in a mean prevalence of infection with A. phagocytophila of 19.8%. Using a deterministic model, we explored the conditions for disease persistence under different rodent and roe deer densities. R0 values were well above 1 for B. burgdorferi s.l. in the vast majority of the areas classified as suitable for I. ricinus occurrence in Trentino, while the conditions for TBE persistence appeared to be more restricted by a combination of climatic conditions and host densities.
Recent developments in communication technologies have opened a new dimension for Geographical Information Systems and Geoinformation Technologies. This new dimension is mobility. It simplifies data gathering, processing and presentation, independent of the area of application. A new branch, Mobile Geoinformation Technologies, is based on wireless communication systems, mobile personal computers, positioning systems and GIS. Some proprietary GIS software solutions for mobile or handheld devices are available on the market, but they focus more on data logging tasks than on providing full-powered GIS support or data processing functions. In this paper, we propose a mobile implementation of the free and easily expandable GRASS GIS software in combination with the GNU/Linux operating system running on handheld devices. This approach supports real-time computations in the field, data processing and the cooperation of several active mobile clients using wireless networking.