Reading Formatted Data

Introduction

WebWinds can read data from formatted text files, provided that you give it a description of the file format. This section describes two file types: ASCII and Table. ASCII data must be on a regular grid, while Table data can be randomly spaced. Table data can be represented either as gridded data or as individual points in the form of an Overlay.

Schema

To specify the characteristics of ASCII and Table data, WebWinds uses a construct called a schema when creating metadata. A schema is really just a block of text that describes the arrangement of data within a file. The structure of a schema block in the datamanager.txt configuration file is:

            schema

                  statements
                    ...

            endschema

When a schema is specified in the Data Source Wizard or FormatDisplay, omit the keywords schema and endschema, as well as any quotes. The types of statements that are allowed depend on whether the schema describes ASCII or Table data (HDF-EOS Point files also use schemas).

ASCII data

An example of a file containing ASCII data might be the printer output from a FORTRAN program diverted to a file. The data need to be ordered as if an array were being printed.

Valid statements for ASCII data are labels, edit descriptors, and goto. Edit descriptors look very much like Fortran FORMAT statements. They are made up of lists of field descriptors. Each field descriptor is followed by a field width on the right and can be preceded by a repeat count on the left, so 25F3 means 25 floating-point fields, each three characters wide. The current valid field descriptors are F and N. F denotes floating-point fields, and N denotes integers.

Skip descriptors can also be included. The characters "x" or "X" indicate a blank space. The "/" character causes a line to be skipped. These descriptors can also be preceded by repeat counts.

Edit descriptors can be nested indefinitely within parentheses. Nested edit lists can also be preceded by repeat counts. Commas are allowed as delimiters between descriptors. They are not required, but they can make the meaning of an edit descriptor clearer.
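To make the repeat-count and nesting rules concrete, here is a minimal Python sketch that expands an edit descriptor into a flat list of operations. It is not WebWinds code: the function name expand and the token pattern are assumptions made for illustration, based only on the rules described above (F and N fields, x and / skips, optional commas, nested parentheses with repeat counts).

```python
import re

# One token: a comma, or an optional repeat count followed by "(", a skip
# descriptor (x, X, /) or a field descriptor (F or N plus a width), or ")".
_TOKEN = re.compile(r"\s*(?:,|(\d*)\s*(\(|[xX/]|[fFnN]\d+(?:\.\d+)?)|(\)))")


def expand(descriptor):
    """Expand an edit descriptor into a flat list of operations.

    Each operation is "x" (skip one character), "/" (skip to the next
    line), or a field such as "F3". Hypothetical helper, for illustration.
    """
    pos = 0

    def parse_list(depth):
        nonlocal pos
        ops = []
        while pos < len(descriptor):
            m = _TOKEN.match(descriptor, pos)
            if m is None:
                if descriptor[pos:].strip() == "":
                    pos = len(descriptor)
                    break
                raise ValueError(f"unexpected text at position {pos}")
            pos = m.end()
            if m.group(3):                  # closing parenthesis
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return ops
            if m.group(2) is None:          # a comma delimiter
                continue
            count = int(m.group(1)) if m.group(1) else 1
            item = m.group(2)
            if item == "(":                 # nested list: expand, then repeat
                ops.extend(parse_list(depth + 1) * count)
            elif item[0] in "fFnN":
                ops.extend([item.upper()] * count)
            else:
                ops.extend([item.lower()] * count)
        if depth > 0:
            raise ValueError("unbalanced '('")
        return ops

    return parse_list(0)
```

For instance, expand("2(x,2(N2))") yields x, N2, N2, x, N2, N2: the inner list is repeated twice, and then the whole outer list is repeated twice.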

Input data sources can be described in the file datamanager.txt. For example:

DataSource "N7t"
  file  ga840101.n7t
  format "Ascii"
  schema
        "3/"
        "Label1:"
        "11(x,25F3/)"
        "x,13F3/"
        "goto Label1"
  endschema
.
.
After the schema keyword, the first line is "3/". This tells WebWinds to skip three lines.

Next comes a label. "Label1:" is like a statement label in any programming language. It denotes a place that can be jumped to from a goto statement.

The next two statements are edit descriptors. The first one, "11(x,25F3/)", tells WebWinds to read eleven lines, each containing a superfluous character followed by 25 floating-point numbers, each three characters wide. The "/" is necessary to make WebWinds move to the next line for input. The next descriptor, "x,13F3/", reads a single line, skipping the first character and reading thirteen three-character floats.

Note that the two edit descriptors also work if they're combined into one descriptor, like this:

  schema
        "3/"
        "Label1:"
        "11(x,25F3/),x,13F3/"
        "goto Label1"
  endschema
The line "goto Label1" tells the reader to jump back to Label1 and apply the edit descriptors again to the next block of input.
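The following Python sketch mimics what this schema asks the reader to do, as an aid to understanding the layout. It is not how WebWinds is implemented. The function name read_n7t_blocks and its parameters are hypothetical, and the sketch assumes every line is complete and every field holds a number.

```python
import io


def read_n7t_blocks(stream, header_lines=3, full_lines=11, per_line=25,
                    last_count=13, width=3):
    """Read blocks laid out as "3/", "11(x,25F3/)", "x,13F3/", "goto Label1"."""
    for _ in range(header_lines):              # "3/": skip the header lines
        stream.readline()
    line_counts = [per_line] * full_lines + [last_count]
    block_size = sum(line_counts)              # 11 * 25 + 13 = 288 values
    blocks = []
    while True:                                # "Label1:" ... "goto Label1"
        values = []
        for count in line_counts:
            line = stream.readline()
            if not line:                       # end of file
                break
            body = line.rstrip("\n")[1:]       # "x": skip the first character
            for i in range(count):             # fixed-width F3 fields
                values.append(float(body[i * width:(i + 1) * width]))
        if len(values) < block_size:
            break                              # no further complete block
        blocks.append(values)
    return blocks


if __name__ == "__main__":
    row = " " + "  1" * 25 + "\n"
    sample = "header\n" * 3 + row * 11 + " " + "  2" * 13 + "\n"
    blocks = read_n7t_blocks(io.StringIO(sample))
    print(len(blocks), len(blocks[0]))         # prints "1 288"
```

Each pass through the loop corresponds to one trip from Label1 to the goto statement and yields 288 values.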
 

Table data

A table is a set of data arranged into rows and columns.  For instance, one might have a set of temperatures, scattered over a region on the face of the earth.  They might be arranged into a sequence of ordered triplets of longitude, latitude, and temperature, as in the fictitious example below.

 lon     lat    temperature (C)
-130.123  10.2   24.2
-132.304   2.1   27.0
-150.777   2.2   26.8
    .       .      .
    .       .      .
    .       .      .

As you can see, the data are arranged into columns and rows. The columns are separated by white space; either blanks or tab characters are allowed. The table reader expects every column to be full. If there are any empty cells, the data that follow will be read into the wrong columns.

Although both types of files contain formatted data, Tables are structurally quite different from ASCII files. In general, only certain columns of a Table are to be read in. Each column is treated as a distinct entity in WebWinds, and the user specifies, via the schema statements, which columns are to be read and, of those columns, which are to be treated as metadata and which one (or ones) as data.

As an example, let's say we have an ASCII table that has 3 columns. The first two columns are, respectively, longitude and latitude, and the third column is temperature data. The following schema will cause the data to be read:

  schema
        "columns 3 1 2"
  endschema

This instructs the table reader to read columns 3, 1 and 2. It also instructs it to make the data in column 3 fit onto a grid generated by the data in columns 1 and 2. (If the last two columns are reversed, then the axes will be reversed when an image is created from the data.) As was the case for ASCII data, the "/" descriptor causes a line to be skipped.
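As a rough illustration of the column selection, the Python sketch below pulls columns 3, 1 and 2 out of the fictitious temperature table shown earlier. The function name read_table_columns and its skip parameter are hypothetical. Skipping blank lines is an assumption of this sketch, not documented WebWinds behavior.

```python
def read_table_columns(lines, columns, skip=0):
    """Return one list of floats per requested column.

    columns uses 1-based numbers, as in the schema statement; skip is the
    number of leading lines to ignore (the effect of "/" descriptors).
    """
    selected = [[] for _ in columns]
    for line in lines[skip:]:
        cells = line.split()                # blanks or tabs separate columns
        if not cells:                       # assumption: ignore blank lines
            continue
        for out, col in zip(selected, columns):
            out.append(float(cells[col - 1]))
    return selected


table = [
    " lon     lat    temperature (C)",
    "-130.123  10.2   24.2",
    "-132.304   2.1   27.0",
    "-150.777   2.2   26.8",
]
# "columns 3 1 2": the data column first, then the two axis columns.
temperature, lon, lat = read_table_columns(table, [3, 1, 2], skip=1)
```

Note that an empty cell would shift every later cell on that line one position to the left, which is why the table reader requires all columns to be full.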

The above example would produce a 2-D data set. If more than one column contains data, the other columns can be added in to produce a 3-D data set. To do this, use the concat descriptor.

Let's say that in the above example column 4 is data at some other altitude, depth, or other third-dimension element. The concat keyword can be used to tell the table reader that more than one column is data. If the schema is like this:

  schema
        "columns concat ( 3 4 ) 1 2"
  endschema

the table reader will recognize that columns 3 and 4 are data. When the data are dropped into an Image, two slices will be rendered: the first slice represents the data in column 3 and the second the data in column 4. If you want these reversed, reverse the numbers in the schema. Note that this approach allows different types of data slices to be stacked into one data cube.

The table reader can also generate a 3-D data cube in which all three axes are specified by columns of the table. For example, it will read four columns, treating three of them as the coordinates for the X, Y, and Z axes, respectively. When the data are gridded, they will be gridded in three dimensions, and the gridding algorithm will try to preserve aspect ratios that make sense. If column 4 holds the third-dimension coordinate and column 5 holds the data, use the following schema:

  schema
        "2/"
        "columns 5 2 1 4"
  endschema

Here, columns 2, 1 and 4 will be treated as metadata for the X, Y, and Z axes respectively. The data in column 5 will be gridded into a cube generated with the data in columns 2, 1 and 4 by the gridding algorithm.
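The three forms of the columns statement shown in this section can be summarized with a small Python sketch that splits a statement into data columns and axis columns. The function name parse_columns is hypothetical, and the sketch assumes only the syntax seen in the examples above.

```python
def parse_columns(statement):
    """Split a "columns" statement into (data_columns, axis_columns)."""
    tokens = statement.replace("(", " ( ").replace(")", " ) ").split()
    if not tokens or tokens[0] != "columns":
        raise ValueError("statement must start with 'columns'")
    tokens = tokens[1:]
    if tokens and tokens[0] == "concat":
        # concat ( a b ... ): several data columns, stacked as slices
        if len(tokens) < 2 or tokens[1] != "(":
            raise ValueError("expected '(' after concat")
        close = tokens.index(")")           # raises ValueError if missing
        data = [int(t) for t in tokens[2:close]]
        axes = [int(t) for t in tokens[close + 1:]]
    else:
        # The first number is the data column; the rest are the axes.
        data = [int(tokens[0])]
        axes = [int(t) for t in tokens[1:]]
    return data, axes
```

With this sketch, "columns 3 1 2" gives data [3] and axes [1, 2], "columns concat ( 3 4 ) 1 2" gives data [3, 4] and axes [1, 2], and "columns 5 2 1 4" gives data [5] and axes [2, 1, 4].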

Regridding

Most display tools in WebWinds expect data on uniform grids. The Table data module computes a grid from the incoming axis data and rebins the field data into that grid.
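This page does not describe the rebinning algorithm itself. As a rough picture of what rebinning scattered points onto a uniform grid involves, the Python sketch below averages all values that fall into the same grid cell. The cell-averaging rule, the function name rebin, and the use of None for empty cells are all assumptions made for illustration, not a description of what WebWinds does.

```python
def rebin(xs, ys, values, nx, ny):
    """Average scattered (x, y, value) points onto an nx-by-ny uniform grid.

    The grid spans the range of the axis data. Cells that receive no
    points are returned as None.
    """
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    sums = [[0.0] * nx for _ in range(ny)]
    counts = [[0] * nx for _ in range(ny)]
    for x, y, v in zip(xs, ys, values):
        # Map each coordinate to a cell index, clamping the upper edge.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1) if x1 > x0 else 0
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1) if y1 > y0 else 0
        sums[j][i] += v
        counts[j][i] += 1
    return [[s / c if c else None for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]
```

A coarser grid leaves fewer empty cells but blurs the data. A finer grid keeps detail but leaves gaps wherever the original points are sparse.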

Overlays

Once you have created a Table data object, you can create an overlay representation by pressing the "Overlay" button. This will allow you to alter the size, color and shape of the overlay.

Tricks and Gotchas

Format

Subsetting and subsampling ASCII data is not possible at this time.

Speeding up the I/O

When reading, WebWinds tries to allocate a large buffer to minimize internal data transfers, but not one so large that all other activity in WebWinds appears to grind to a halt while I/O is being serviced. When data are explicitly formatted, WebWinds computes a required buffer size from the repeat counts of your schema statements. You can help WebWinds optimize its buffer by clever manipulation of these statements.

For example, consider the following schema:

schema
    "start:"
    "3f24/"
    "goto start"
endschema
Each line in the data file holds three floating-point numbers. The ASCII reader does not try to be excessively clever about how much data could be read in. It processes only one schema statement at a time, so it simply allocates a buffer with only three elements.

This example was actually used on a file containing only a million double-precision numbers, and the resulting number of interrupts for internal data transfers more than tripled the time to read the file. The entire file took over 10 minutes to read.

However, the buffer computation tallies repeat counts across nested descriptors, so for a simple format such as this, you could use a statement such as "37(11(3f24))". The buffer is now 37*11*3 = 1221 elements long. In this case, the numbers were chosen to be consistent with the data dimensions. This change reduced the time to read this file to less than three minutes.
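The arithmetic behind the buffer size can be reproduced with a short Python sketch that multiplies repeat counts through the nesting levels and counts the resulting data fields. This only illustrates the tally described above. The function name buffer_elements is hypothetical, and whether WebWinds counts anything besides F and N fields is not stated here, so the sketch ignores skip descriptors.

```python
import re


def buffer_elements(descriptor):
    """Count the data fields that one edit descriptor expands to."""
    tokens = re.findall(r"\d+|[()]|[fFnN]\d+(?:\.\d+)?|[xX/,]", descriptor)
    totals = [0]        # field count for each open nesting level
    multipliers = []    # repeat count attached to each open parenthesis
    repeat = 1
    for tok in tokens:
        if tok.isdigit():
            repeat = int(tok)               # applies to the next item
        elif tok == "(":
            multipliers.append(repeat)
            totals.append(0)
            repeat = 1
        elif tok == ")":
            if not multipliers:
                raise ValueError("unbalanced ')'")
            inner = totals.pop()
            totals[-1] += inner * multipliers.pop()
            repeat = 1
        elif tok[0] in "fFnN":
            totals[-1] += repeat            # "3f24" contributes 3 fields
            repeat = 1
        else:                               # x, /, or a comma: no data
            repeat = 1
    if multipliers:
        raise ValueError("unbalanced '('")
    return totals[0]
```

Under these assumptions the single statement "3f24/" gives 3 elements, while nesting it inside repeat counts of 11 and 37 gives 37*11*3 = 1221.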

Standard Fortran defines the syntax for the F descriptor as Fw.m, where w is the full width of the field and m is the number of digits in the fractional part. WebWinds accepts this syntax, but currently does not process m. The text-to-double converter that WebWinds uses keys on the decimal point, if one is present, and so processes as many significant digits as it can. Otherwise, it behaves as if m were equal to zero.
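The practical consequence shows up only for fields written without a decimal point. The Python sketch below contrasts the standard Fortran input rule, in which the rightmost m digits of such a field become the fractional part, with a conversion that ignores m as described above. Both function names are hypothetical, and the second function is only a stand-in for the WebWinds converter, whose code is not shown in this document.

```python
def read_fortran_style(field, m):
    """Standard Fortran F input: with no decimal point in the field,
    the rightmost m digits are taken as the fractional part."""
    text = field.strip()
    if "." in text:
        return float(text)                  # an explicit point overrides m
    return int(text) / 10 ** m


def read_ignoring_m(field):
    """Conversion that keys on the decimal point and otherwise acts as if
    m were zero (a stand-in for the behavior described above)."""
    return float(field.strip())


print(read_fortran_style("12345", 2))       # 123.45
print(read_ignoring_m("12345"))             # 12345.0
print(read_ignoring_m(" 12.5"))             # 12.5
```

If your data file was written with explicit decimal points, the two rules agree. If it relies on an implied decimal point, you will need to rescale the values after reading.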
 


WebWinds Home / Oct 5, 2001