4 Nov 2009    mcx q 1.008, 09-308

1. NAME
2. SYNOPSIS
3. DESCRIPTION
4. OPTIONS
5. SEE ALSO

NAME

mcxquery — compute simple graph statistics

SYNOPSIS

mcxquery is not in fact a program in its own right. This manual page documents the behaviour and options of the mcx program when invoked in mode q. The options -h, --apropos, --version, -set, --nop, and -progress <num> are accessible in all mcx modes; they are described in the mcx manual page.

mcxquery [-imx <fname> (specify matrix input)] [-o <fname> (output file name)] [-vary-threshold <start,end,step,scale> (analyze graphs at similarity cutoffs)] [--vary-correlation (analyze graphs at correlation cutoffs)] [-div <num> (cluster size separating value)] [-report-scale <num> (edge weight/threshold scaling)] [--dim (report native format and dimensions)] [-h (print synopsis, exit)] [--apropos (print synopsis, exit)] [--version (print version, exit)]

DESCRIPTION

The main use of mcxquery is to analyze a graph at different similarity cutoffs. Typically this is done on a graph constructed using a very permissive threshold. For example, one can create a graph from array expression data using mcxarray with a very low Pearson correlation cutoff such as 0.2 or 0.3. mcxquery can then be used to analyze the graph at increasingly stringent thresholds of 0.25, 0.30, 0.35 .. 0.95.

The attributes reported for each threshold are the number of connected components, statistics (median, average, iqr) on node degrees and edge weights, and the R^2 value of the relationship of log(k) versus the logarithm of the number of nodes of degree at least k (for the graph at that threshold). A high R^2 value is characteristic of scale-free networks. It should be noted, however, that in many applications graphs will not be scale-free. Additionally, for the purpose of clustering, scale-free networks are to be avoided or transformed, as their highly connected nodes obfuscate cluster structure.
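The R^2 statistic described above can be illustrated with a minimal sketch. This is not the mcx implementation: the function name scale_free_r2 is hypothetical, and the choice to evaluate only the distinct degrees that occur in the graph (rather than every k up to the maximum degree) is an assumption.

```python
import math

def scale_free_r2(degrees):
    """R^2 of the least-squares line through the points (log k, log N(k)),
    where N(k) is the number of nodes with degree at least k.

    Assumption: only distinct, positive degrees present in the graph are
    used as values of k. mcx may select k differently.
    """
    ks = sorted({d for d in degrees if d > 0})
    if len(ks) < 2:
        raise ValueError("need at least two distinct positive degrees")

    xs = [math.log(k) for k in ks]
    ys = [math.log(sum(1 for d in degrees if d >= k)) for k in ks]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)
```

For a degree list such as [4, 2, 1, 1], the counts N(1) = 4, N(2) = 2, N(4) = 1 fall exactly on a line in log-log space, so the value is 1 up to floating-point error.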

OPTIONS

-imx <fname> (input matrix)

The file name for input that is in mcl native matrix format.

 
-o <fname> (output file name)

Set the name of the file to which output should be written.

 
--dim (report native format and dimensions)

This will report the matrix format (either interchange or binary) and the matrix dimensions. For a graph the two reported dimensions should be equal.

 
-vary-threshold <start,end,step,scale> (analyze graphs at similarity cutoffs)

All of start, end, step and scale must be integers. From these a list of thresholds is constructed: start / scale, (start + step) / scale, (start + 2 step) / scale, and so on, until a value larger than or equal to end / scale is reached.
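The construction of the threshold list can be sketched as follows. The function name threshold_list is hypothetical. The sketch assumes that the terminating value (the first one at or above end / scale) is itself excluded from the list. That reading is consistent with the correlation list ending at 0.95 if that list corresponds to 20,100,5,100, which is also an assumption.

```python
def threshold_list(start, end, step, scale):
    """Thresholds start/scale, (start+step)/scale, ... below end/scale.

    Assumption: the first value >= end/scale stops the list and is not
    included. All four arguments are integers, as the option requires.
    """
    if step <= 0 or scale <= 0:
        raise ValueError("step and scale must be positive integers")
    # Iterate over the integer numerators, then divide once per value.
    return [t / scale for t in range(start, end, step)]

# A start of 0.10 with increments of 0.02, as in -vary-threshold 10,100,2,100.
fine_grained = threshold_list(10, 100, 2, 100)
```

Working with integer numerators and dividing once per value avoids the rounding drift that repeated addition of a fractional step would accumulate.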

 
--vary-correlation (analyze graphs at correlation cutoffs)

This instructs mcxquery to use a threshold list suitable for graphs in which the edge weights are correlations. The list starts at 0.2 and ends at 0.95, with increments of 0.05. If a different start or increment is required, it can be obtained with the -vary-threshold option. For example, a start of 0.10 and an increment of 0.02 are obtained by issuing -vary-threshold 10,100,2,100.

 
-report-scale <num> (edge weight / threshold scaling)

The edge weight median, average, and inter-quartile range, as well as the different threshold steps, are all rescaled in the reported output to avoid printing fractional parts. If -vary-threshold was supplied, the scaling factor specified in its argument is used. With --vary-correlation a scaling factor of 100 is used. Either can be overridden by using the present option.
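As a small illustration of the rescaling, the sketch below multiplies a value by the scaling factor and rounds to an integer. The helper name rescale is hypothetical, and rounding to the nearest integer is an assumption; the manual does not state how mcx handles any remaining fraction.

```python
def rescale(value, scale):
    """Scale a threshold or edge weight statistic for integer output.

    Assumption: rounding to the nearest integer. mcx may truncate instead.
    """
    return round(value * scale)

# With the scaling factor of 100 used for correlation thresholds,
# a threshold of 0.35 would be reported as 35.
reported = rescale(0.35, 100)
```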

 
-div <num> (cluster size separating value)

When analyzing graphs at different thresholds with one of the options above, mcxquery reports the percentage of nodes contained in clusters not exceeding a specified size, by default 3. This number can be changed using the -div option.
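The reported percentage can be sketched as below. Several points are assumptions rather than documented behaviour: that "clusters" here means the connected components of the thresholded graph, that edges with weight greater than or equal to the threshold are retained, and that nodes are numbered from 0. The function name small_cluster_percentage is hypothetical.

```python
def small_cluster_percentage(n_nodes, edges, threshold, div=3):
    """Return (number of components, percentage of nodes in components
    of size <= div) for the graph restricted to edges >= threshold.

    edges is a list of (u, v, weight) tuples with 0 <= u, v < n_nodes.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, weight in edges:
        if weight >= threshold:  # assumption: cutoff is inclusive
            parent[find(u)] = find(v)

    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1

    small = sum(size for size in sizes.values() if size <= div)
    return len(sizes), 100.0 * small / n_nodes

example_edges = [(0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7), (3, 4, 0.3), (4, 5, 0.6)]
```

At a threshold of 0.5 the example graph splits into one component of four nodes and one of two, so with the default size of 3 only the two-node component counts towards the percentage.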

SEE ALSO

mcxio, and mclfamily for an overview of all the documentation and the utilities in the mcl family.