API Reference
sample_sheet module
-
class sample_sheet.ReadStructure(structure: str)
Bases: object
An object describing the order, number, and type of bases in a read.
A read structure is a sequence of tokens in the form <number><operator>, where <operator> can describe template, skip, index, or UMI bases.

Operator  Description
T         Template base (e.g. experimental DNA, RNA)
S         Bases to be skipped or ignored
B         Bases to be used as an index to identify the sample
M         Bases to be used as an index to identify the molecule

Parameters: structure – Read structure string representation.

Examples
>>> rs = ReadStructure("10M141T8B")
>>> rs.is_paired_end
False
>>> rs.has_umi
True
>>> rs.tokens
['10M', '141T', '8B']
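The token layout shown in the example can be reproduced with a short regular expression. The sketch below is not the library's implementation: `tokenize_read_structure` is a hypothetical helper that only illustrates the `<number><operator>` grammar with the four operators from the table.

```python
import re
from typing import List

# One token is an integer followed by one of the operators T, S, B, or M.
_TOKEN = re.compile(r"\d+[TSBM]")


def tokenize_read_structure(structure: str) -> List[str]:
    """Split a read structure string into its <number><operator> tokens.

    Hypothetical helper for illustration. Raises ValueError when the string
    contains anything other than well-formed tokens.
    """
    tokens = _TOKEN.findall(structure)
    if not tokens or "".join(tokens) != structure:
        raise ValueError(f"Not a valid read structure: {structure!r}")
    return tokens


print(tokenize_read_structure("10M141T8B"))  # ['10M', '141T', '8B']
```

Rejoining the tokens and comparing against the input is a cheap way to reject strings with stray characters, such as a `+` length.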
Note
This class does not currently support read structures where the last operator has ambiguous length by using <+> preceding the <operator>.
Definitions of common read structure uses can be found at the following location:
- Discussion on the topic of read structure use in hts-specs
-
_sum_cycles_from_tokens(tokens: List[str]) → int
Sum the total number of cycles over a list of tokens.
-
has_indexes
Return whether this read structure has any index operators.
-
has_skips
Return whether this read structure has any skip operators.
-
has_umi
Return whether this read structure has any UMI operators.
-
index_cycles
The number of cycles dedicated to indexes.
-
index_tokens
Return a list of all index tokens in the read structure.
-
is_dual_indexed
Return whether this read structure is dual indexed.
-
is_indexed
Return whether this read structure has sample indexes.
-
is_paired_end
Return whether this read structure is paired-end.
-
is_single_end
Return whether this read structure is single-end.
-
is_single_indexed
Return whether this read structure is single indexed.
-
skip_cycles
The number of cycles dedicated to skips.
-
skip_tokens
Return a list of all skip tokens in the read structure.
-
template_cycles
The number of cycles dedicated to template.
-
template_tokens
Return a list of all template tokens in the read structure.
-
tokens
Return a list of all tokens in the read structure.
-
total_cycles
The total number of cycles in the structure.
-
umi_cycles
The number of cycles dedicated to UMI.
-
umi_tokens
Return a list of all UMI tokens in the read structure.
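To make the relationship between tokens, cycle counts, and the boolean properties concrete, here is a minimal sketch. It is not the library's code: the helper names are hypothetical, and it assumes that paired-end means exactly two template segments and that dual indexed means exactly two index segments.

```python
import re
from typing import List


def tokens_for(structure: str, operator: str) -> List[str]:
    """Return the tokens of one operator type (hypothetical helper)."""
    return re.findall(rf"\d+{operator}", structure)


def sum_cycles(tokens: List[str]) -> int:
    """Sum the cycle counts over tokens such as ['10M', '8B']."""
    return sum(int(token[:-1]) for token in tokens)


structure = "151T8B8B151T"
template = tokens_for(structure, "T")
index = tokens_for(structure, "B")
umi = tokens_for(structure, "M")

template_cycles = sum_cycles(template)  # 151 + 151
index_cycles = sum_cycles(index)        # 8 + 8
total_cycles = sum_cycles(re.findall(r"\d+[TSBM]", structure))

# Assumed definitions, not confirmed by the reference text.
is_paired_end = len(template) == 2
is_dual_indexed = len(index) == 2
has_umi = len(umi) > 0

print(template_cycles, index_cycles, total_cycles)  # 302 16 318
print(is_paired_end, is_dual_indexed, has_umi)      # True True False
```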
-
-
class sample_sheet.Sample(data: Optional[Mapping] = None, **kwargs)
Bases: requests.structures.CaseInsensitiveDict
A single sample for a sample sheet.
This class is built with the keys and values in the "[Data]" section of the sample sheet. As specified by Illumina, the only required key is:
- "Sample_ID"
However, this library recommends that you define the following column names:
- "Sample_ID"
- "Sample_Name"
- "index"
If the key "Read_Structure" is provided, then its value is promoted to the class ReadStructure and additional functionality is enabled.
Parameters:
- data – Mapping of key-value pairs describing this sample.
- kwargs – Key-value pairs describing this sample.
Examples
>>> mapping = {"Sample_ID": "87", "Sample_Name": "3T", "index": "A"}
>>> sample = Sample(mapping)
>>> sample
Sample({'Sample_ID': '87', 'Sample_Name': '3T', 'index': 'A'})
>>> sample = Sample({'Read_Structure': '151T'})
>>> sample.Read_Structure
ReadStructure(structure='151T')
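Because Sample derives from a case-insensitive dictionary, key lookups are expected to ignore case. The sketch below shows that idea with the standard library only. `CaseInsensitiveRecord` is a hypothetical stand-in, not the requests or sample_sheet implementation, and it does not model the Read_Structure promotion.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, Mapping, Optional, Tuple


class CaseInsensitiveRecord(MutableMapping):
    """Hypothetical stand-in for a case-insensitive sample record."""

    def __init__(self, data: Optional[Mapping] = None, **kwargs: Any) -> None:
        # Map the lower-cased key to the (original key, value) pair.
        self._store: Dict[str, Tuple[str, Any]] = {}
        self.update(data or {}, **kwargs)

    def __setitem__(self, key: str, value: Any) -> None:
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key: str) -> Any:
        return self._store[key.lower()][1]

    def __delitem__(self, key: str) -> None:
        del self._store[key.lower()]

    def __iter__(self) -> Iterator[str]:
        # Iterate over keys with their original casing.
        return (original for original, _ in self._store.values())

    def __len__(self) -> int:
        return len(self._store)


record = CaseInsensitiveRecord({"Sample_ID": "87", "Sample_Name": "3T"}, index="A")
print(record["sample_id"])  # 87
print(list(record))         # ['Sample_ID', 'Sample_Name', 'index']
```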
-
class sample_sheet.SampleSheet(path: Union[pathlib.Path, str, TextIO, None] = None)
Bases: object
A representation of an Illumina sample sheet.
A sample sheet document almost conforms to the .ini standard, but not quite, so a custom parser is needed. Sample sheets are stored in plain text with comma-separated values and string quoting around any field which contains a comma. The sample sheet is composed of four sections, each marked by a header.

Title name   Description
[Header]     .ini convention
[<Other>]    .ini convention (optional, multiple, user-defined)
[Settings]   .ini convention
[Reads]      .ini convention as a vertical array of items
[Data]       table with header

Parameters: path – Any path supported by pathlib.Path and/or smart_open.smart_open when smart_open is installed.
add_sample(sample: sample_sheet.Sample) → None
Add a Sample to this SampleSheet.
All samples are validated against the first sample added to the sample sheet to ensure there are no ID collisions or incompatible read structures (if supplied). All samples are also validated against the "[Reads]" section of the sample sheet if it has been defined.
The following validation is performed when adding a sample:
- Read_Structure is identical in all samples, if supplied.
- Read_Structure is compatible with "[Reads]", if supplied.
- Samples on the same "Lane" cannot have the same "Sample_ID" and "Library_ID".
- Samples cannot have the same "Sample_ID" if no "Lane" has been defined.
- The same "index" or "index2" combination cannot exist per flowcell or per lane if lanes have been defined.
- All samples have the same index design ("index", "index2") per flowcell or per lane if lanes have been defined.
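The ID and index collision rules in the list above can be sketched as follows. This is only an illustration of the rules as worded here, not the library's validation code: `check_new_sample` is a hypothetical helper, samples are plain dictionaries, and it assumes that an index collision means the ("index", "index2") pair matches.

```python
from typing import Dict, List


def check_new_sample(existing: List[Dict[str, str]], new: Dict[str, str]) -> None:
    """Raise ValueError if `new` collides with a sample in `existing`."""
    lane = new.get("Lane")
    for other in existing:
        # Samples on different lanes are never compared. When no lane is
        # defined, both values are None and the whole flowcell is compared.
        if other.get("Lane") != lane:
            continue

        # Without lanes only Sample_ID must be unique. With lanes, the
        # (Sample_ID, Library_ID) pair must be unique within the lane.
        id_keys = ("Sample_ID",) if lane is None else ("Sample_ID", "Library_ID")
        if all(other.get(key) == new.get(key) for key in id_keys):
            raise ValueError(f"ID collision with sample {other.get('Sample_ID')!r}")

        # Assumption: a collision means the same (index, index2) pair.
        same_index = all(other.get(k) == new.get(k) for k in ("index", "index2"))
        if new.get("index") is not None and same_index:
            raise ValueError(f"Index collision with sample {other.get('Sample_ID')!r}")


samples = [{"Sample_ID": "87", "Lane": "1", "index": "ACGT"}]
check_new_sample(samples, {"Sample_ID": "88", "Lane": "1", "index": "TTGA"})  # passes
check_new_sample(samples, {"Sample_ID": "88", "Lane": "2", "index": "ACGT"})  # other lane
try:
    check_new_sample(samples, {"Sample_ID": "88", "Lane": "1", "index": "ACGT"})
except ValueError as error:
    print(error)
```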
Parameters: sample – Sample to add to this SampleSheet.
Note
It is unclear if the Illumina specification truly allows for equivalent samples to exist on the same sample sheet. To mitigate the warnings in this library when you encounter such a case, use a code pattern like the following:
>>> import warnings
>>> warnings.simplefilter("ignore")
>>> from sample_sheet import SampleSheet
>>> SampleSheet('tests/resources/single-end-colliding-sample-ids.csv');
SampleSheet('tests/resources/single-end-colliding-sample-ids.csv')
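If silencing warnings for the whole process is too broad, the standard library's `warnings.catch_warnings` context manager limits the filter to one block. The sketch below uses a hypothetical `load_with_duplicates` function in place of the real call, and it assumes the warning is a `UserWarning`.

```python
import warnings


def load_with_duplicates() -> str:
    """Hypothetical stand-in for a call that warns about equivalent samples."""
    warnings.warn("Two equivalent samples exist", UserWarning)
    return "loaded"


with warnings.catch_warnings():
    # The filter change is undone when the block exits.
    warnings.simplefilter("ignore", category=UserWarning)
    result = load_with_duplicates()

print(result)  # loaded
```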
-
add_samples(samples: Iterable[sample_sheet.Sample]) → None
Add samples in an iterable to this SampleSheet.
-
add_section(section_name: str) → None
Add a section to the SampleSheet.
-
all_sample_keys
Return the unique keys of all samples in this SampleSheet.
The keys are discovered first by the order of samples and second by the order of keys within each sample.
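The discovery order described for all_sample_keys amounts to a first-seen, order-preserving union of keys. A minimal sketch, using plain dictionaries as samples and a hypothetical helper name:

```python
from typing import Dict, Iterable, List


def ordered_unique_keys(samples: Iterable[Dict[str, str]]) -> List[str]:
    """Collect keys in first-seen order: by sample, then by key within a sample."""
    seen: Dict[str, None] = {}  # dicts preserve insertion order
    for sample in samples:
        for key in sample:
            seen.setdefault(key, None)
    return list(seen)


samples = [
    {"Sample_ID": "87", "index": "ACGT"},
    {"Sample_ID": "88", "Sample_Name": "3T", "index": "TTGA"},
]
print(ordered_unique_keys(samples))  # ['Sample_ID', 'index', 'Sample_Name']
```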
-
experimental_design
Return a markdown summary of the samples on this sample sheet.
This property displays rendered markdown only when running within an IPython interpreter. Otherwise, a nicely formatted ASCII table is printed.
Returns: A visual table of IDs and names for all samples.
Return type: Markdown, str
-
is_paired_end
Return whether the samples are paired-end.
-
is_single_end
Return whether the samples are single-end.
-
samples
Return the samples present in this SampleSheet.
-
to_json(**kwargs) → str
Write this SampleSheet to JSON.
Returns: The JSON dump of all entries in this sample sheet.
Return type: str
-
to_picard_basecalling_params(directory: Union[str, pathlib.Path], bam_prefix: Union[str, pathlib.Path], lanes: Union[int, List[int]]) → None
Write sample and library information to a set of files for a given set of lanes.
BARCODE PARAMETERS FILES: Store information regarding the sample index sequences, sample index names, and, optionally, the library name. These files are used by Picard’s CollectIlluminaBasecallingMetrics and Picard’s ExtractIlluminaBarcodes. The output tab-separated files are written to:
<directory>/barcode_params.<lane>.txt
LIBRARY PARAMETERS FILES: Store information regarding the sample index sequences, sample index names, and, optionally, sample library and descriptions. A path to the resulting demultiplexed BAM file is also stored, which is used by Picard’s IlluminaBasecallsToSam. The output tab-separated files are written to:
<directory>/library_params.<lane>.txt
The BAM file output paths in the library parameter files are formatted as:
<bam_prefix>/<Sample_Name>.<Sample_Library>/<Sample_Name>.<index><index2>.<lane>.bam
Two files will be written to directory for each of the lanes specified. If the path to directory does not exist, it will be created.
Parameters:
- directory – File path to the directory to write the parameter files.
- bam_prefix – Where the demultiplexed BAMs should be written.
- lanes – The lanes to write basecalling parameters for.
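The path templates above can be expanded with pathlib. The sketch below only builds the paths and writes nothing. Both helper names are hypothetical, the sample values are made up, and treating a missing "index2" as an empty string is an assumption.

```python
from pathlib import Path
from typing import Dict


def bam_path(bam_prefix: Path, sample: Dict[str, str], lane: int) -> Path:
    """Build the demultiplexed BAM path for one sample on one lane."""
    name = sample["Sample_Name"]
    library = sample["Sample_Library"]
    # Assumption: a missing "index2" contributes nothing to the file name.
    barcode = sample["index"] + sample.get("index2", "")
    return bam_prefix / f"{name}.{library}" / f"{name}.{barcode}.{lane}.bam"


def params_paths(directory: Path, lane: int) -> Dict[str, Path]:
    """Return the two parameter file paths for one lane."""
    return {
        "barcode": directory / f"barcode_params.{lane}.txt",
        "library": directory / f"library_params.{lane}.txt",
    }


sample = {
    "Sample_Name": "3T",
    "Sample_Library": "libA",
    "index": "ACGT",
    "index2": "TTGA",
}
print(bam_path(Path("bams"), sample, 1).as_posix())
# bams/3T.libA/3T.ACGTTTGA.1.bam
print(params_paths(Path("out"), 1)["library"].as_posix())
# out/library_params.1.txt
```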
-
write(handle: TextIO, blank_lines: int = 1) → None
Write this SampleSheet to a file-like object.
Parameters:
- handle – Object to wrap by csv.writer.
- blank_lines – Number of blank lines to write between sections.
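As a rough illustration of wrapping a handle with csv.writer and separating sections with blank lines, consider the sketch below. It is not the library's write method: `write_sections` is a hypothetical helper, and the section contents are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_sections(
    handle: TextIO,
    sections: Dict[str, List[List[str]]],
    blank_lines: int = 1,
) -> None:
    """Write each section title and its rows, with blank lines in between."""
    writer = csv.writer(handle, lineterminator="\n")
    for position, (title, rows) in enumerate(sections.items()):
        if position > 0:
            # An empty row is written as an empty line.
            for _ in range(blank_lines):
                writer.writerow([])
        writer.writerow([f"[{title}]"])
        writer.writerows(rows)


buffer = io.StringIO()
write_sections(
    buffer,
    {
        "Header": [["Investigator Name", "Doe, Jane"]],
        "Data": [["Sample_ID", "index"], ["87", "ACGT"]],
    },
)
print(buffer.getvalue())
```

The field containing a comma is quoted automatically by csv.writer, which matches the quoting convention described for sample sheets.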
-