From: beard
Subject: RESEARCH: Data Quality Visualization, NCGIA, US EPA, Soil Conservation Svc., Am Statistical Assn
Date: Tue, 31 Dec 1992 15:53:45 LCL
Message-ID: <9212291414.AA14092@grouse.umesve.maine.edu>

Crossposted from comp.infosystems.gis

CALL FOR PARTICIPATION

VISUALIZATION OF SPATIAL DATA QUALITY CHALLENGE

sponsored by

National Center for Geographic Information and Analysis
U. S. Environmental Protection Agency, Center for Environmental Statistics
USDA Soil Conservation Service
Statistical Graphics Section of the American Statistical Association

This announces an open invitation to participate in a challenge to develop techniques for visualizing spatial data quality. The challenge is sponsored by the National Center for Geographic Information and Analysis (NCGIA) along with the U. S. Environmental Protection Agency, Center for Environmental Statistics; the USDA Soil Conservation Service (SCS); and the Statistical Graphics Section of the American Statistical Association (ASA).

The intent of the Challenge is to provide a catalyst for experimental research on effective ways of managing and communicating the quality of spatial data to users of geographic information systems. As geographic information systems (GIS) are now widely used to analyze data and make policy decisions, techniques to help understand and communicate data quality have become important. This challenge will provide an opportunity for exchange among researchers from the disciplines of geography and cartography, statistics, computer science, engineering, and the scientific visualization community.

The challenge will run for approximately one year. Tentatively, the annual GIS conference, GIS/LIS '93, will serve as the concluding event. Challenge participants are invited to submit papers and posters on their visualization techniques (other display media, such as interactive demos or videos, are also possible) at the concluding conference.
Best Data Quality Visualization awards, in the form of certificates of recognition, will be presented at the conference. In addition, a set of these visualization techniques will be selected for publication in a special journal issue.

The challenge is open to any interested parties, and there are several ways to participate. These include:

1. Develop a prototype visualization technique which communicates one or more aspects of geographic data quality (see References). This should indicate what data quality components were modeled and how they were linked to a visual model.

2. Implement a visualization technique which communicates one or more aspects of geographic data quality (see References). The implementation may be stand-alone software, software which links to an existing GIS, or software which links to another existing software package.

Challenge awards will be presented for:

* Best Overall Data Quality Visualization
* Best Student Entry

Challenge Rules and Specifications

Two data sets consisting of original observations and metadata (information on collection techniques, instrumentation, and processing or compilation procedures) will be made available.

Environmental Protection Agency data set

The US EPA Chesapeake Bay Program Office will supply this data set. The data set includes the concentrations of dissolved inorganic nitrogen (DIN) measured at 49 stations in the main stem of the Chesapeake Bay. Since nitrogen is an essential phytoplankton nutrient, dissolved inorganic nitrogen concentration is often used to evaluate the condition of nitrogen limitation on algal growth. The sample collection period is October 1985 through September 1991. Data are collected 20 times each year (from 1991 to the present, data are collected 18 times a year): biweekly in March through October, and monthly thereafter. Two or four water samples are collected at fixed depths at each station for DIN analysis.
The complete data record includes the latitude and longitude of the field station, date, depth of sampling, and the concentration of dissolved inorganic nitrogen.

If each observation is a vector V = (x, y, z, t, DIN), where x, y are the horizontal coordinates in latitude and longitude, z is depth in the water column, t is date and time, and DIN is the response variable, one may choose to display the uncertainty in DIN in:

(1) Time
(2) Space
(3) An unknown space, time, and/or depth, using a model specification to interpolate, forecast, or predict DIN

or any combination thereof. Alternatively, one could display the uncertainty in the thematic component (i.e., DIN) calculated using univariate descriptive statistics.

Ancillary data include digital boundary files of the Chesapeake Bay, descriptions of the data formats, and associated metadata. The complete data package is available from the US EPA as a four-diskette set in ASCII DOS format or as a set of files available on the Internet via anonymous ftp from ipc1.was.epa.gov (134.67.240.16) in directory pub/chesapeake_bay. To request the data on diskette or to ask questions regarding the data, please contact Judy Calem at the US EPA Center for Environmental Statistics at calem.judy@epamail.epa.gov or at (202) 260-8638.

Soil Conservation Service data set

This data set includes one 7.5 minute quad sheet (State College) of soil data from Centre County (central Pennsylvania). The soils data take the form of irregularly shaped polygons which correspond to different soil types in the field. The field mapping for these data was carried out between 1965 and 1974, and soil names and descriptions were approved in 1975. The soil survey was published in 1981 at a scale of 1:20,000 on 1977 orthophotography. The soil maps were converted to digital form (SSURGO, the Soil Survey Geographic Database) in the mid-1980s. The digital files were prepared by manually digitizing 1:20,000 scale scribe coats.
(Scribe coats are traditional negative preparation products used to separate the soil boundary delineations from the half-tone image of the orthophotography.)

Uncertainty issues associated with this data set include the locational uncertainty of soil map unit boundaries, the differential uncertainty of boundaries (slope breaks vs. profile differences), and the uncertainty of the soil descriptions associated with each soil map unit. Additional uncertainty can be generated by the digital conversion process and subsequent GIS processing and analysis.

Ancillary data: USGS digital elevation data (DEM) are available from USGS for this same quadrangle.

These data can be requested from SCS in DLG, GRASS, or ARC/Info export format. Distribution will be by DOS HD/2S diskettes unless otherwise arranged with SCS. For further information regarding these data sets, participants should contact Sharon Waltman at the National Soil Survey Center, USDA Soil Conservation Service, 100 Centennial Mall North, Room 152, Lincoln, NE, Internet: sgis@calmit.unl.edu.

Participants are required to use these data sets in the development and presentation of their visualization concepts or implementations. The primary emphasis will be on displaying the quality of the data or the quality/reliability of products generated from the data. The data sets will be available as of November 1, 1992. The agencies supplying the data sets (EPA and SCS) request that all visualization concepts and executable code be made available for agency use. All software rights, however, remain with the developers unless they choose to make them public.

Instructions for challenge participants

To enter the challenge, participants should:

* Obtain and preview the challenge data set(s).

* Register (be put on the Challenge mailing list) and submit a one-page proposal of the visualization project.
The proposal should include a general description, objective, and scope of the proposed implementation, such as anticipated software development environment, hardware platform, and integration with any supporting software. Proposals should be submitted by February 5, 1993. A list of participants will be distributed by February 15, 1993.

* Submit final project reports by July 19, 1993. Reports should not exceed 12 pages, should describe the data quality visualization implementation, and should include appropriate graphics.

* Challenge winners will be notified by mid-August 1993.

* Final challenge projects will be displayed and challenge awards presented at GIS/LIS, November 1993.

Implementation proposals and final papers should be sent to:

Kate Beard
National Center for Geographic Information and Analysis
University of Maine
Orono, ME 04469
Phone: (207) 581-2147
Fax: (207) 581-2206
email: beard@mecan1.maine.edu

A panel of judges from the Soil Conservation Service, the Environmental Protection Agency, and from the areas of Geography, Cartography, Statistics, and Computer Science will review and judge the entries.

Suggested problem areas

The following paragraphs describe a number of possible contexts for data quality visualization in addition to the more specific quality issues described above for each data set. These are offered purely as suggestions, and participants need not feel constrained by them.

* Visualization of lineage information. This could cover ways of accessing and visualizing the narrative information describing a data set, such as how the data were collected, over what time period, using what instrumentation, by whom, and how they were compiled and updated.

* Visualization of data processing errors. GIS processes can introduce errors. For example, fuzzy polygon overlay can introduce changes in the positional accuracy of the data, and generalization or aggregation can introduce attribute errors or changes in resolution.
Visualization methods could be developed to track and document errors generated by specific GIS processes.

* Visualization of product quality. This would include methods for documenting the quality of final products generated by a GIS. This could include visualization techniques for documenting quality on hardcopy products.

For additional information on the topic, participants may request a copy of NCGIA Technical Report 91-26, Report on the Specialist Meeting for I7 - Visualization of Data Quality. This report is available via anonymous ftp from ncgia.ucsb.edu in directory pub/tech-reports/postscript.

Additional references

Beard, M. K. and B. Buttenfield. 1992. Spatial, Statistical, and Graphical Dimensions of Data Quality. Proceedings Interface '92.

Chrisman, N. R. 1983. The Role of Quality Information in the Long-term Functioning of a Geographic Information System. Cartographica 21(2/3): 79-87.

Clapham, S. and M. K. Beard. 1991. The Development of an Initial Framework for the Visualization of Spatial Data Quality. Proceedings of ACSM 2: 73-82.

Defanti, T. A., M. D. Brown and B. H. McCormick. 1989. Visualization: Expanding the Scientific and Engineering Research Opportunities. Computer 22(6): 27-38.

Goodchild, M. and S. Gopal. 1989. Accuracy of Spatial Databases. New York: Taylor and Francis.

Lanter, D. P. 1991. Design of a Lineage-Based Meta-Data Base for GIS. Cartography and Geographic Information Systems 18(4): 255-261.

Lanter, D. P. and H. Veregin. 1992. A Research Paradigm for Propagating Error in Layer Based GIS. Photogrammetric Engineering and Remote Sensing 58(6): 825-833.

Moellering, H. 1988. The Proposed Standard for Digital Cartographic Data: Report of the Digital Cartographic Data Standards Task Force. The American Cartographer 15(1) (entire issue).

Veregin, H. 1989. A Taxonomy of Error in Spatial Databases. 89-12. National Center for Geographic Information and Analysis.