2D Scatterplot

2D Scatterplot plots one data set against another. The output is a 256x256 scatterplot: the data values of each input database are binned to 256 (or fewer) levels, and a pixel in the scatterplot is filled in if at least one data location has the pixel's horizontal value in the first data set and its vertical value in the second.
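
The binning and fill step described above can be sketched roughly as follows. This is only an illustration in Python, not the application's actual code. The names bin_value and occupancy, the linear binning over each data set's own minimum-to-maximum range, and the flat-list inputs are all assumptions made for the example.

```python
LEVELS = 256  # the scatterplot is 256x256


def bin_value(v, lo, hi, levels=LEVELS):
    """Map v in [lo, hi] to an integer bin 0..levels-1 (assumed linear binning)."""
    if hi == lo:
        return 0  # a constant data set collapses into a single bin
    b = int((v - lo) / (hi - lo) * levels)
    return min(max(b, 0), levels - 1)  # clamp so v == hi lands in the top bin


def occupancy(xs, ys, levels=LEVELS):
    """Return a levels x levels grid of booleans.

    grid[row][col] is True if some data location has its first-set value
    in horizontal bin `col` and its second-set value in vertical bin `row`.
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[False] * levels for _ in range(levels)]
    for x, y in zip(xs, ys):
        col = bin_value(x, x_lo, x_hi, levels)
        row = bin_value(y, y_lo, y_hi, levels)
        grid[row][col] = True
    return grid
```

Here xs and ys hold the values of the two data sets at the same locations, in the same order.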

The color of a filled pixel is normally derived from the position of the last matching data location found. The databases are scanned bottom-to-top, then left-to-right. Green represents the lower left corner of the data sets, red the upper left, cyan the lower right, and magenta the upper right. To see the color legend, press and hold the Color button on the application.
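
One way to picture the position coloring is the sketch below. It is a guess at the scheme, not the application's code: it assumes the four corner colors are blended bilinearly across the data set, and that "bottom-to-top, then left-to-right" means each column is scanned upward before moving one column to the right. The names position_color and color_scatterplot are hypothetical.

```python
def position_color(col, row, width, height):
    """RGB color for a data location, blending the four corner colors.

    Assumed bilinear blend: green (lower left), red (upper left),
    cyan (lower right), magenta (upper right).
    """
    u = col / (width - 1) if width > 1 else 0.0    # 0 at left edge, 1 at right
    v = row / (height - 1) if height > 1 else 0.0  # 0 at bottom edge, 1 at top
    return (round(255 * v), round(255 * (1 - v)), round(255 * u))


def color_scatterplot(x_bins, y_bins):
    """Map each filled scatterplot pixel to the color of the last location found.

    x_bins[row][col] and y_bins[row][col] hold the binned values of the two
    data sets at each location, with row 0 at the bottom (an assumption).
    """
    height, width = len(x_bins), len(x_bins[0])
    colors = {}
    for col in range(width):        # left to right
        for row in range(height):   # bottom to top within each column
            pixel = (x_bins[row][col], y_bins[row][col])
            # A later location overwrites an earlier one in the same pixel.
            colors[pixel] = position_color(col, row, width, height)
    return colors
```

With this blend, the red channel grows from bottom to top, the green channel shrinks from bottom to top, and the blue channel grows from left to right, which reproduces the four corner colors listed above.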

Alternatively, the scatterplot can display a color based on the number of data values fitting into the bin. To get this type of display, toggle the Frequency/Position button.
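
The frequency display amounts to counting how many data locations fall into each scatterplot pixel. A minimal sketch follows, assuming the values have already been binned into (horizontal, vertical) pairs. The linear scaling of counts to a 256-entry color table is an assumption, and both function names are hypothetical.

```python
from collections import Counter


def frequency_counts(binned_pairs):
    """Count how many data locations fall into each (x_bin, y_bin) pixel."""
    return Counter(binned_pairs)


def to_color_index(count, max_count, levels=256):
    """Scale a count linearly to a color-table index (assumed scaling)."""
    if max_count <= 0:
        return 0
    return int(count / max_count * (levels - 1))
```

For example, frequency_counts([(1, 2), (1, 2), (3, 4)]) reports two locations in pixel (1, 2) and one in pixel (3, 4); pixels that were never hit report zero.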

Scatterplot also calculates statistical information about the data sets. Several sets of statistics are available; see the Stats menu under Menus and the definitions in the appendix.

Controls

Buttons
The Color button needs to be held down to display the color legend. Release it to return to the normal display.

Save Stats stores all the statistical information computed in a file (stats.dat).

Frequency is a toggle between the default display in which pixels are colored by original position within the data sets and a histogram-like frequency coloring.

Menus
The Stats menu allows one to choose between several sets of statistics displays.

"None" is the fastest; no statistics are computed or displayed. This is most useful for interactive scatterplot work.

"Basic" stats are the default. They include the mean and standard deviation of each data set and the correlation coefficient (R) between the two data sets.

"X moments" shows the mean, standard deviation, average deviation, skew, and kurtosis for the horizontal data set.

"Y moments" does the same for the vertical data set.

"Linear Regression" does a least-squares fit to the data and displays the correlation coefficient, the slope and y-intercept of the fitted line (with error bars), the number of points in the sample, and the sum of the squares of the residuals (SSR), which is useful for doing one's own statistics.

See the appendix for definitions of these statistics.

The main menu allows one to control a bounding box in the application and to control how sliders interact with the application. The bounding box controls are similar to those in Image except that no crosshair exists.

"Freeze Horizontal Slice" and "Freeze Vertical Slice" are switches that, when on, prevent a database's slice selection from changing. This is most useful when scattering a database against itself; one need only link in one slider, freeze an axis at the desired level, then change the other one.

Applications to Connect to This Application

Any Data Object may be connected via the "drag 'n drop" button to this application.

Any application with a bounding box (e.g. Image) will allow 2D Scatterplot to limit its scope to a portion of a data set. These objects must be connected via the "drag 'n drop" button to scatterplot.

Color tool will change the colors of the display in Frequency mode, as will Histogram.

Sliders will control which slices are used to compute the scatterplot. If one slider is linked in, that slider will control each data set's level. If two are linked in, they will control the two data set levels independently.
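
The interaction between linked sliders and the two freeze switches can be summarized in a small sketch. This only illustrates the rules as described in this document. The function name select_slices is hypothetical, and the assumption that the first of two sliders drives the horizontal data set is made only for the example.

```python
def select_slices(current, sliders, freeze_h=False, freeze_v=False):
    """Return the new (horizontal_slice, vertical_slice) pair.

    current  -- the slices shown now, as (horizontal, vertical)
    sliders  -- positions of the linked sliders: zero, one, or two values
    freeze_h -- True when "Freeze Horizontal Slice" is on
    freeze_v -- True when "Freeze Vertical Slice" is on
    """
    h, v = current
    if len(sliders) == 1:
        new_h = new_v = sliders[0]   # one slider drives both data sets
    elif len(sliders) == 2:
        new_h, new_v = sliders       # two sliders act independently
    else:
        new_h, new_v = h, v          # nothing linked: no change
    if not freeze_h:
        h = new_h
    if not freeze_v:
        v = new_v
    return h, v
```

For instance, with one slider at level 5 and the horizontal slice frozen at level 3, select_slices((3, 0), [5], freeze_h=True) gives (3, 5). This is the case of scattering a database against itself at two different levels.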

Applications to Connect This Application to

Image has a special function when a 2D Scatterplot is linked in. It will draw a yellow highlight over any data that contributes to scatterplot pixels within Scatterplot's bounding box. Value View will do the same thing.
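
The highlight can be thought of as a reverse lookup: find every data location whose pair of binned values lands inside Scatterplot's bounding box. A minimal sketch follows, assuming flat lists of already-binned values and an inclusive box given in scatterplot pixel coordinates. The function name and the box layout are hypothetical.

```python
def contributing_locations(x_bins, y_bins, box):
    """Return indices of data locations that fall inside the bounding box.

    x_bins, y_bins -- binned values of the two data sets, one per location
    box            -- (x_min, x_max, y_min, y_max), inclusive pixel bounds
    """
    x_min, x_max, y_min, y_max = box
    return [
        i
        for i, (xb, yb) in enumerate(zip(x_bins, y_bins))
        if x_min <= xb <= x_max and y_min <= yb <= y_max
    ]
```

The returned indices are the locations that would be drawn with the yellow highlight.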

Tricks and Gotchas

Be careful when saving statistics. Save Stats always writes to the same file (stats.dat), so copy that file if you want to keep more than one set of statistical data.
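
One way to keep several sets of statistics is to copy the file under a new name after each save. The helper below is only a suggestion and is not part of the application. The name keep_stats_copy is hypothetical, and it assumes you know where stats.dat was written, which this document does not specify.

```python
import shutil
import time
from pathlib import Path


def keep_stats_copy(stats_path="stats.dat", label=None):
    """Copy the stats file next to itself under a labelled name.

    With no label, a timestamp such as 20011005-143000 is used.
    Returns the path of the copy; the original file is left in place.
    """
    src = Path(stats_path)
    if label is None:
        label = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}-{label}{src.suffix}")
    shutil.copy2(src, dst)
    return dst
```

Calling keep_stats_copy(label="run1") in the directory that holds stats.dat would leave a copy named stats-run1.dat beside it.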

Appendix: Definitions of Statistics

Let {x_i} be the points in set X.
Let n_x be the number of points in set X.
Mean(X) = Sum_i(x_i) / n_x
Define dx_i = x_i - Mean(X).
Define SSR(X) = Sum_i(dx_i^2).
The corresponding quantities for set Y (y_i, n_y, dy_i, SSR(Y)) are defined the same way.
Basic Stats:
Sigma = Sqrt(SSR(X) / (n_x - 1))
Correlation coefficient R = Sum_i(dx_i * dy_i) / Sqrt(SSR(X) * SSR(Y))
Moments:
1st: Mean
2nd: Sigma
3rd: skew = Sum_i(dx_i^3) / (n_x * Sigma^3)
4th: kurtosis = Sum_i(dx_i^4) / (n_x * Sigma^4) - 3
Ave. Deviation = Sum_i(Abs(dx_i)) / n_x
Linear Regression:
slope = Sum_i(dx_i * dy_i) / SSR(X)
y-intercept = Mean(Y) - slope * Mean(X)
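
For readers who want to check values from stats.dat, or to do their own statistics, the definitions above translate directly into code. The following is a plain Python transcription of those formulas, not the application's source. The function names are chosen for the example. It assumes each data set has at least two points and is not constant; otherwise some divisions fail.

```python
import math


def mean(values):
    return sum(values) / len(values)


def deviations(values):
    """d_i = value_i - Mean, for every point."""
    m = mean(values)
    return [v - m for v in values]


def ssr(values):
    """Sum of squared deviations from the mean, SSR(X) above."""
    return sum(d * d for d in deviations(values))


def sigma(values):
    return math.sqrt(ssr(values) / (len(values) - 1))


def correlation(xs, ys):
    dx, dy = deviations(xs), deviations(ys)
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(ssr(xs) * ssr(ys))


def skew(values):
    n, s = len(values), sigma(values)
    return sum(d ** 3 for d in deviations(values)) / (n * s ** 3)


def kurtosis(values):
    n, s = len(values), sigma(values)
    return sum(d ** 4 for d in deviations(values)) / (n * s ** 4) - 3


def average_deviation(values):
    return sum(abs(d) for d in deviations(values)) / len(values)


def linear_fit(xs, ys):
    """Least-squares slope and y-intercept, as defined above."""
    dx, dy = deviations(xs), deviations(ys)
    slope = sum(a * b for a, b in zip(dx, dy)) / ssr(xs)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept
```

As a quick check, points lying exactly on the line y = 2x + 1 should give a slope of 2, a y-intercept of 1, and a correlation coefficient of 1, up to floating-point rounding.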

WebWinds Home / Oct 5, 2001