p3-signature-families

Compute Family Signatures

p3-signature-families --gs1=FileOfGenomeIds
                      --gs2=FileOfGenomeIds
                      [--min=MinGs1Frac]
                      [--max=MaxGs2Frac]
                      [--col=GenomeIdColumn]
                      [--verbose]
   > family.signatures

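This script writes one line per signature family, that is, a family that occurs in at least the --min fraction of Gs1 genomes and at most the --max fraction of Gs2 genomes. The first field is the number of hits against Gs1 (the Gs1 genomes that contain the family), the second is the number of hits against Gs2, and the last field is the signature family itself.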
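The output format above can be read back with a short helper. This is a minimal sketch, assuming the fields are tab-delimited and any header line has already been skipped; the names SignatureLine, parse_signature_line, and hit_fractions are hypothetical and not part of the script. It also shows how the hit counts relate to the fractions that --min and --max are compared against.

```python
from typing import NamedTuple, Tuple


class SignatureLine(NamedTuple):
    gs1_hits: int
    gs2_hits: int
    family: str


def parse_signature_line(line: str) -> SignatureLine:
    """Split one output line (assumed tab-delimited): first field = Gs1 hits,
    second field = Gs2 hits, last field = the signature family."""
    fields = line.rstrip("\n").split("\t")
    return SignatureLine(int(fields[0]), int(fields[1]), fields[-1])


def hit_fractions(sig: SignatureLine, gs1_size: int, gs2_size: int) -> Tuple[float, float]:
    """Convert hit counts into fractions of the Gs1 and Gs2 genome sets."""
    return sig.gs1_hits / gs1_size, sig.gs2_hits / gs2_size
```

Because the family is taken from the last field rather than a fixed position, the parser keeps working if extra fields appear between the counts and the family.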

Parameters

col

Specifies the (1-based) column index or name of the genome ID column in the two genome input files. The default is 0, which selects the last column.
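The column rule can be illustrated with a small reader. This is a minimal sketch, assuming each input file starts with a header line (needed to resolve a column name); read_genome_ids is a hypothetical name, not the script's own code. A positive number is a 1-based index, 0 means the last column, and anything else is looked up in the header.

```python
import csv
import io


def read_genome_ids(handle, col="0"):
    """Return genome IDs from a tab-delimited file whose first line is a header.
    col: a 1-based column index, "0" for the last column, or a column name."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if col.isdigit():
        n = int(col)
        index = len(header) - 1 if n == 0 else n - 1  # 0 selects the last column
    else:
        index = header.index(col)  # raises ValueError if the name is absent
    return [row[index] for row in reader if row]  # skip blank lines
```

For example, read_genome_ids(open("gs1.tbl")) would take IDs from the last column, while passing col="1" would take them from the first (the file name here is illustrative only).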

gs1

A tab-delimited file of genomes that have the property of interest (e.g., membership in a certain species or resistance to a particular antibiotic). If omitted, the standard input is used. By default, the genome IDs are taken from the last column (see col).

gs2

A tab-delimited file of genomes that do not have the property of interest. If omitted, the standard input is used. By default, the genome IDs are taken from the last column (see col). Any genome that also appears in the gs1 set is automatically removed from this set.
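Removing Gs1 genomes from Gs2 keeps the two sets disjoint, so no genome can count toward both the Gs1 and the Gs2 fraction. A minimal sketch of that step, where exclude_gs1 is a hypothetical helper name:

```python
def exclude_gs1(gs1_ids, gs2_ids):
    """Drop from Gs2 every genome that also appears in Gs1, preserving Gs2 order."""
    gs1 = set(gs1_ids)  # set membership keeps the filter linear in the input size
    return [g for g in gs2_ids if g not in gs1]
```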

min

Minimum fraction of Gs1 genomes that must contain a family for it to be reported as a signature family (default 0.8).

max

Maximum fraction of Gs2 genomes that may contain a family for it to be reported as a signature family (default 0.2).
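The two thresholds combine into a single selection test. The sketch below is illustrative only: it assumes the family memberships are already available as a mapping from family ID to the set of genomes containing it (how the script actually obtains them is not described here), that both bounds are inclusive, and that results are sorted by family ID. The function name signature_families and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def signature_families(
    families: Dict[str, Set[str]],
    gs1: Iterable[str],
    gs2: Iterable[str],
    min_frac: float = 0.8,
    max_frac: float = 0.2,
) -> List[Tuple[int, int, str]]:
    """Return (gs1_hits, gs2_hits, family) for each family found in at least
    min_frac of Gs1 genomes and at most max_frac of Gs2 genomes (bounds inclusive)."""
    gs1_set = set(gs1)
    gs2_set = set(gs2) - gs1_set  # Gs1 genomes are removed from Gs2
    results = []
    for family, genomes in sorted(families.items()):
        in1 = len(genomes & gs1_set)
        in2 = len(genomes & gs2_set)
        # Multiplying instead of dividing avoids a zero division when a set is empty.
        if in1 >= min_frac * len(gs1_set) and in2 <= max_frac * len(gs2_set):
            results.append((in1, in2, family))
    return results
```

With the defaults and five genomes in each set, a family must occur in at least four Gs1 genomes and at most one Gs2 genome to be selected.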

verbose

Write progress messages to STDERR.