4 Nov 2009 mcx q 1.008, 09-308
mcxquery — compute simple graph statistics
mcxquery is not in actual fact a program. This manual page documents the behaviour and options of the mcx program when invoked in mode q. The options -h, --apropos, --version, -set, --nop and -progress <num> are accessible in all mcx modes; they are described in the mcx manual page.
mcxquery [-imx <fname> (specify matrix input)] [-o <fname> (output file name)] [-vary-threshold <start,end,step,scale> (analyze graphs at similarity cutoffs)] [--vary-correlation (analyze graphs at correlation cutoffs)] [-div <num> (cluster size separating value)] [-report-scale <num> (edge weight/threshold scaling)] [--dim (report native format and dimensions)] [-h (print synopsis, exit)] [--apropos (print synopsis, exit)] [--version (print version, exit)]
The main use of mcxquery is to analyze a graph at different similarity cutoffs. Typically the graph is constructed using a very permissive threshold. For example, one can create a graph from array expression data using mcxarray with a very low Pearson correlation cutoff such as 0.2 or 0.3, and then use mcxquery to analyze the graph at increasingly stringent thresholds of 0.25, 0.30, 0.35, ..., 0.95. For each threshold mcxquery reports the number of connected components, statistics (median, average, inter-quartile range) on node degrees and edge weights, and the R^2 value of the relationship of log(k) versus the logarithm of the number of nodes of degree at least k. Scale-free networks, whose degree distribution follows a power law, yield a high R^2 value for this relationship. Note, however, that in many applications graphs are not scale-free. Moreover, for the purpose of clustering, scale-free networks should be avoided or transformed, as their highly connected nodes obscure cluster structure.
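The R^2 statistic described above can be illustrated with a small sketch. It is not mcxquery's code; it assumes that k ranges over the distinct nonzero degrees present in the graph and that the fit is an ordinary least-squares line through the points (log k, log N(k)), where N(k) is the number of nodes of degree at least k. The logarithm base does not affect R^2. The function name degree_r_squared is hypothetical.

```python
import math


def degree_r_squared(degrees):
    """R^2 of a least-squares line through (log k, log N(k)).

    N(k) is the number of nodes with degree >= k. Assumption: k ranges
    over the distinct nonzero degrees present (not necessarily what
    mcxquery does).
    """
    ks = sorted({d for d in degrees if d > 0})
    if len(ks) < 2:
        raise ValueError("need at least two distinct nonzero degrees")
    xs = [math.log(k) for k in ks]
    # N(k) strictly decreases over distinct degrees, so ys is never constant.
    ys = [math.log(sum(1 for d in degrees if d >= k)) for k in ks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # For simple linear regression, R^2 is the squared Pearson correlation.
    return sxy * sxy / (sxx * syy)
```

For the degree list [1, 1, 2, 4], N(k) halves each time k doubles, so the points lie exactly on a line and R^2 is 1 up to floating-point rounding.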
-imx <fname> (specify matrix input)
The name of the input file, which must be in mcl native matrix format.
-o <fname> (output file name)
Set the name of the file to which output is written.
--dim (report native format and dimensions)
This reports the matrix format (either interchange or binary) and the matrix dimensions. For a graph the two reported dimensions should be equal.
-vary-threshold <start,end,step,scale> (analyze graphs at similarity cutoffs)
All of start, end, step and scale must be integers. From these a list of thresholds is constructed: start/scale, (start + step)/scale, (start + 2 step)/scale, and so on, until a value greater than or equal to end/scale is reached.
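A minimal sketch of how such a threshold list can be generated. Stepping in integers and dividing by scale only at the end avoids accumulated floating-point drift. Whether mcxquery includes a final value that overshoots end/scale is not stated here; this sketch assumes the list stops at the last value not exceeding end/scale, which is consistent with the documented correlation list ending at 0.95. The function name threshold_list is hypothetical.

```python
def threshold_list(start, end, step, scale):
    """Thresholds start/scale, (start+step)/scale, ... up to end/scale.

    Assumption: a value equal to end/scale is included, and the first
    value exceeding it is not.
    """
    if step <= 0 or scale <= 0:
        raise ValueError("step and scale must be positive")
    values = []
    t = start
    while t <= end:  # integer comparison, no rounding issues
        values.append(t / scale)
        t += step
    return values
```

Under this assumption, threshold_list(20, 95, 5, 100) reproduces the 16 values 0.20, 0.25, ..., 0.95 of the correlation list, and threshold_list(10, 100, 2, 100) gives 46 values from 0.10 to 1.00.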
--vary-correlation (analyze graphs at correlation cutoffs)
This instructs mcxquery to use a threshold list suitable for graphs in which the edge weights are correlations. The list starts at 0.2 and ends at 0.95, with increments of 0.05. A different start or increment can be obtained with the -vary-threshold option. For example, a start of 0.10 and an increment of 0.02 are obtained by issuing -vary-threshold 10,100,2,100.
-report-scale <num> (edge weight/threshold scaling)
The edge weight median, average, and inter-quartile range, as well as the threshold steps, are all rescaled in the reported output to avoid printing fractional parts. If -vary-threshold was supplied, the scaling factor specified in its argument is used. With --vary-correlation a scaling factor of 100 is used. Either can be overridden with this option.
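The rescaling can be pictured with the following sketch, which computes the median, average and inter-quartile range of a list of edge weights and multiplies them by the scaling factor. Two details are assumptions not taken from mcxquery: values are rounded to the nearest integer, and quartiles use the "inclusive" method of Python's statistics module (several quartile conventions exist and may give slightly different IQRs). The function name scaled_weight_stats is hypothetical; it needs at least two weights.

```python
import statistics


def scaled_weight_stats(weights, scale=100):
    """Median, average and IQR of edge weights, scaled and rounded.

    Assumptions: round-to-nearest and the 'inclusive' quartile method;
    mcxquery's own conventions may differ.
    """
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    return {
        "median": round(statistics.median(weights) * scale),
        "average": round(statistics.fmean(weights) * scale),
        "iqr": round((q3 - q1) * scale),
    }
```

With the default factor of 100, the weights 0.2, 0.4, 0.6, 0.8, 1.0 are reported as median 60, average 60 and IQR 40.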
-div <num> (cluster size separating value)
When analyzing graphs at different thresholds with -vary-threshold or --vary-correlation, mcxquery reports the percentage of nodes contained in clusters whose size does not exceed a given value, by default 3. This value can be changed with the -div option.
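The per-threshold component count and the small-cluster percentage can be sketched with a union-find structure. This is an illustration under stated assumptions, not mcxquery's implementation: the "clusters" are taken to be the connected components of the graph after thresholding, an edge is kept when its weight is greater than or equal to the threshold, isolated nodes count as singleton components, and the input is a plain edge list of (u, v, weight) triples rather than an mcl native matrix. The function name component_stats is hypothetical.

```python
def component_stats(n_nodes, edges, threshold, div=3):
    """Return (number of components, % of nodes in components of size <= div).

    Assumptions: edges with weight >= threshold are kept, 'clusters' are
    connected components, and isolated nodes are singleton components.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    sizes = {}
    for x in range(n_nodes):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    small = sum(s for s in sizes.values() if s <= div)
    return len(sizes), 100.0 * small / n_nodes
```

For example, with six nodes and edges (0, 1, 0.9), (1, 2, 0.8), (2, 3, 0.7), (4, 5, 0.3), a threshold of 0.5 leaves one component of four nodes and two singletons, so three components are reported and one third of the nodes lie in components of size at most 3. Raising the threshold to 0.85 splits the graph into five components, all of them small.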
See also mcxio, and mclfamily for an overview of all the documentation and the utilities in the mcl family.