API Reference

`sample_sheet` module

class sample_sheet.ReadStructure(structure: str)

Bases: object

An object describing the order, number, and type of bases in a read.

A read structure is a sequence of tokens in the form <number><operator> where <operator> can describe template, skip, index, or UMI bases.

Operator	Description
T	Template base (e.g. experimental DNA, RNA)
S	Bases to be skipped or ignored
B	Bases to be used as an index to identify the sample
M	Bases to be used as an index to identify the molecule

Parameters:	structure – Read structure string representation.

Examples

>>> rs = ReadStructure("10M141T8B")
>>> rs.is_paired_end
False
>>> rs.has_umi
True
>>> rs.tokens
['10M', '141T', '8B']
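The <number><operator> grammar can be made concrete with a small tokenizer. The sketch below is not the parser used by `ReadStructure`. The function `tokenize_read_structure` and its regular expression are hypothetical illustrations. Like the class, the sketch rejects the `+` ambiguous-length form.

```python
import re
from typing import List

# Hypothetical pattern, not part of sample_sheet: a positive length
# followed by one of the operators T, S, B, or M.
TOKEN_PATTERN = re.compile(r"(\d+)([TSBM])")


def tokenize_read_structure(structure: str) -> List[str]:
    """Split a read structure string into <number><operator> tokens."""
    tokens = []
    position = 0
    while position < len(structure):
        match = TOKEN_PATTERN.match(structure, position)
        if match is None:
            # Anything else, including the unsupported "+" length, is rejected.
            raise ValueError(f"Invalid read structure: {structure!r}")
        tokens.append(match.group(0))
        position = match.end()
    if not tokens:
        raise ValueError("A read structure must contain at least one token")
    return tokens


print(tokenize_read_structure("10M141T8B"))  # ['10M', '141T', '8B']
```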

Note

This class does not currently support read structures whose last segment has an ambiguous length, indicated by a <+> preceding the <operator>.

Definitions of common read structure uses can be found at the following location:

https://github.com/nh13/read-structure-examples

Discussion on the topic of read structure use in hts-specs:

https://github.com/samtools/hts-specs/issues/270

_sum_cycles_from_tokens(tokens: List[str]) → int: Sum the total number of cycles over a list of tokens.

copy() → sample_sheet.ReadStructure: Return a deep copy of this read structure.

has_indexes: Return whether this read structure has any index operators.

has_skips: Return whether this read structure has any skip operators.

has_umi: Return whether this read structure has any UMI operators.

index_cycles: The number of cycles dedicated to indexes.

index_tokens: Return a list of all index tokens in the read structure.

is_dual_indexed: Return whether this read structure is dual-indexed.

is_indexed: Return whether this read structure has sample indexes.

is_paired_end: Return whether this read structure is paired-end.

is_single_end: Return whether this read structure is single-end.

is_single_indexed: Return whether this read structure is single-indexed.

skip_cycles: The number of cycles dedicated to skips.

skip_tokens: Return a list of all skip tokens in the read structure.

template_cycles: The number of cycles dedicated to template bases.

template_tokens: Return a list of all template tokens in the read structure.

tokens: Return a list of all tokens in the read structure.

total_cycles: The total number of cycles in the structure.

umi_cycles: The number of cycles dedicated to UMIs.

umi_tokens: Return a list of all UMI tokens in the read structure.
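This sketch shows roughly how the cycle counts and predicates above relate to the tokens. The hypothetical helpers `sum_cycles` and `summarize` add up token lengths per operator, working on a plain token list rather than a `ReadStructure`. The sketch assumes that two template tokens mean paired-end and two index tokens mean dual-indexed. Consult the library source for its exact definitions.

```python
from typing import Dict, List, Union


def sum_cycles(tokens: List[str]) -> int:
    """Sum the lengths of <number><operator> tokens, e.g. ['8B', '8B'] -> 16."""
    # Every token ends in a one-character operator; the rest is the length.
    return sum(int(token[:-1]) for token in tokens)


def summarize(tokens: List[str]) -> Dict[str, Union[int, bool]]:
    """Derive cycle counts and simple predicates from a token list."""
    # Bucket tokens by their trailing operator character.
    by_operator = {op: [t for t in tokens if t.endswith(op)] for op in "TSBM"}
    return {
        "total_cycles": sum_cycles(tokens),
        "template_cycles": sum_cycles(by_operator["T"]),
        "skip_cycles": sum_cycles(by_operator["S"]),
        "index_cycles": sum_cycles(by_operator["B"]),
        "umi_cycles": sum_cycles(by_operator["M"]),
        "has_umi": bool(by_operator["M"]),
        # Assumption: two template segments mean paired-end,
        # two index segments mean dual-indexed.
        "is_paired_end": len(by_operator["T"]) == 2,
        "is_dual_indexed": len(by_operator["B"]) == 2,
    }


summary = summarize(["10M", "141T", "8B"])
print(summary["total_cycles"], summary["has_umi"], summary["is_paired_end"])  # 159 True False
```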

class sample_sheet.Sample(data: Optional[Mapping] = None, **kwargs)

Bases: requests.structures.CaseInsensitiveDict

A single sample for a sample sheet.

This class is built from the keys and values in the "[Data]" section of the sample sheet. As specified by Illumina, the only required key is:

"Sample_ID"

However, this library recommends defining the following column names:

"Sample_ID"

"Sample_Name"

"index"

If the key "Read_Structure" is provided, its value is promoted to a ReadStructure object, which enables additional functionality.

Parameters:	data – Mapping of key-value pairs describing this sample. kwargs – Key-value pairs describing this sample.

Examples

>>> mapping = {"Sample_ID": "87", "Sample_Name": "3T", "index": "A"}
>>> sample = Sample(mapping)
>>> sample
Sample({'Sample_ID': '87', 'Sample_Name': '3T', 'index': 'A'})
>>> sample = Sample({'Read_Structure': '151T'})
>>> sample.Read_Structure
ReadStructure(structure='151T')

to_json() → Mapping: Return the properties of this Sample as a JSON-serializable mapping.

class sample_sheet.SampleSheet(path: Union[pathlib.Path, str, TextIO, None] = None)

Bases: object

A representation of an Illumina sample sheet.

A sample sheet document almost conforms to the .ini standard but deviates from it, so a custom parser is needed. Sample sheets are stored as plain text with comma-separated values, and any field that contains a comma is quoted. A sample sheet is composed of four standard sections, plus optional user-defined sections, each marked by a header.

Title name	Description
`[Header]`	`.ini` convention
`[<Other>]`	`.ini` convention (optional, multiple, user-defined)
`[Settings]`	`.ini` convention
`[Reads]`	`.ini` convention as a vertical array of items
`[Data]`	table with header
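The format is not strict .ini, so parsing can be pictured this way: read every line with Python's `csv` module, which handles the quoted, comma-containing fields, and group rows under the most recent `[Section]` header. `split_sections` and the sample text are hypothetical. The library's parser also interprets the contents of each section, and this sketch does not.

```python
import csv
import io
from typing import Dict, List

# Illustrative sample sheet text (hypothetical values).
SAMPLE_SHEET_TEXT = """[Header]
IEMFileVersion,4
Experiment Name,"Run 1, flowcell A"

[Reads]
151
151

[Data]
Sample_ID,Sample_Name,index
1,S1,ACGTACGT
"""


def split_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group CSV rows under the most recent [Section] header."""
    sections: Dict[str, List[List[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and rows whose cells are all empty (e.g. ",,,").
        if not any(cell.strip() for cell in row):
            continue
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections


sections = split_sections(SAMPLE_SHEET_TEXT)
print(sections["Header"][1])  # ['Experiment Name', 'Run 1, flowcell A']
```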

Parameters:	path – Any path supported by `pathlib.Path` and/or `smart_open.smart_open` when smart_open is installed.

_repr_tty_() → str: Return a summary of this sample sheet in a TTY-compatible codec.

add_sample(sample: sample_sheet.Sample) → None

Add a Sample to this SampleSheet.

All samples are validated against the first sample added to the sample sheet to ensure there are no ID collisions or incompatible read structures (if supplied). All samples are also validated against the "[Reads]" section of the sample sheet if it has been defined.

The following validation is performed when adding a sample:

Read_Structure is identical in all samples, if supplied

Read_Structure is compatible with "[Reads]", if supplied

Samples on the same "Lane" cannot have the same "Sample_ID" and "Library_ID".

Samples cannot have the same "Sample_ID" if no "Lane" has been defined.

The same "index"/"index2" combination cannot appear more than once per flowcell, or per lane if lanes have been defined.

All samples must share the same index design ("index", "index2") per flowcell, or per lane if lanes have been defined.
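The Sample_ID and index-collision rules above can be sketched with per-lane sets of the keys seen so far. `find_collisions` is hypothetical and is not how `add_sample` is implemented. It uses plain dicts in place of `Sample` and reports problems as strings. It does not check read structures or index design.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def find_collisions(samples: List[Dict[str, str]]) -> List[str]:
    """Report Sample_ID and index collisions per lane (or per flowcell without lanes)."""
    problems: List[str] = []
    seen_ids: Dict[Optional[str], Set[Tuple[str, ...]]] = defaultdict(set)
    seen_indexes: Dict[Optional[str], Set[Tuple[str, str]]] = defaultdict(set)
    for sample in samples:
        # Samples without a "Lane" all share one flowcell-wide bucket (key None).
        lane = sample.get("Lane")
        scope = f"lane {lane}" if lane is not None else "flowcell"
        # With lanes, Sample_ID + Library_ID must be unique; without, Sample_ID alone.
        if lane is not None:
            id_key: Tuple[str, ...] = (sample["Sample_ID"], sample.get("Library_ID", ""))
        else:
            id_key = (sample["Sample_ID"],)
        index_key = (sample.get("index", ""), sample.get("index2", ""))
        if id_key in seen_ids[lane]:
            problems.append(f"duplicate sample {id_key} in {scope}")
        if index_key in seen_indexes[lane]:
            problems.append(f"duplicate index {index_key} in {scope}")
        seen_ids[lane].add(id_key)
        seen_indexes[lane].add(index_key)
    return problems


samples = [
    {"Lane": "1", "Sample_ID": "A", "Library_ID": "L1", "index": "ACGT"},
    {"Lane": "1", "Sample_ID": "A", "Library_ID": "L1", "index": "TTGG"},
    {"Lane": "2", "Sample_ID": "A", "Library_ID": "L1", "index": "ACGT"},
]
print(find_collisions(samples))  # ["duplicate sample ('A', 'L1') in lane 1"]
```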

Parameters:	sample – `Sample` to add to this `SampleSheet`.

Note

It is unclear whether the Illumina specification truly allows equivalent samples to exist on the same sample sheet. To suppress the warnings this library emits in such a case, use a code pattern like the following:

>>> import warnings
>>> from sample_sheet import SampleSheet
>>> with warnings.catch_warnings():
...     warnings.simplefilter("ignore")
...     sheet = SampleSheet('tests/resources/single-end-colliding-sample-ids.csv')
>>> sheet
SampleSheet('tests/resources/single-end-colliding-sample-ids.csv')

add_samples(samples: Iterable[sample_sheet.Sample]) → None: Add samples in an iterable to this SampleSheet.

add_section(section_name: str) → None: Add a section to the SampleSheet.

all_sample_keys

Return the unique keys of all samples in this SampleSheet.

Keys are ordered first by the order of the samples and then by the order of the keys within each sample.
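The ordering rule for `all_sample_keys` amounts to an order-preserving de-duplication. The sketch below uses plain dicts in place of `Sample` objects, and `unique_keys_in_order` is a hypothetical name rather than part of the library.

```python
from typing import Dict, Iterable, List


def unique_keys_in_order(samples: Iterable[Dict[str, str]]) -> List[str]:
    # dict.fromkeys keeps first-seen order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(key for sample in samples for key in sample))


samples = [
    {"Sample_ID": "1", "index": "ACGT"},
    {"Sample_ID": "2", "Sample_Name": "S2", "index": "TTGG"},
]
print(unique_keys_in_order(samples))  # ['Sample_ID', 'index', 'Sample_Name']
```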

experimental_design

Return a Markdown summary of the samples on this sample sheet.

Rendered Markdown is displayed only when running within an IPython interpreter; otherwise, the summary is returned as a plain-text table.

Returns:	A visual table of IDs and names for all samples.
Return type:	Markdown, str

is_paired_end: Return whether the samples are paired-end.

is_single_end: Return whether the samples are single-end.

samples: Return the samples present in this SampleSheet.

to_json(**kwargs) → str

Write this SampleSheet to JSON.

Returns:	The JSON dump of all entries in this sample sheet.
Return type:	str

to_picard_basecalling_params(directory: Union[str, pathlib.Path], bam_prefix: Union[str, pathlib.Path], lanes: Union[int, List[int]]) → None

Write sample and library information to a set of files for a given set of lanes.

BARCODE PARAMETERS FILES: Store information regarding the sample index sequences, sample index names, and, optionally, the library name. These files are used by Picard’s CollectIlluminaBasecallingMetrics and ExtractIlluminaBarcodes. The tab-separated output files are written to:

<directory>/barcode_params.<lane>.txt

LIBRARY PARAMETERS FILES: Store information regarding the sample index sequences, sample index names, and, optionally, the sample library and description. Each file also stores the path to the resulting demultiplexed BAM file. These files are used by Picard’s IlluminaBasecallsToSam. The tab-separated output files are written to:

<directory>/library_params.<lane>.txt

The BAM output paths in the library parameter files are formatted as:

<bam_prefix>/<Sample_Name>.<Sample_Library>/<Sample_Name>.<index><index2>.<lane>.bam

Two files are written to directory for each lane specified. If directory does not exist, it will be created.

Parameters:	directory – File path to the directory to write the parameter files. bam_prefix – Where the demultiplexed BAMs should be written. lanes – The lanes to write basecalling parameters for.
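The BAM path template above can be filled in with `pathlib`. The helper `bam_output_path` is hypothetical. It assumes the sample exposes `Sample_Name`, `Sample_Library`, `index`, and optionally `index2` under exactly those keys. The library's own writer may handle missing values differently.

```python
from pathlib import Path
from typing import Mapping, Union


def bam_output_path(bam_prefix: Union[str, Path], sample: Mapping[str, str], lane: int) -> Path:
    """Build <bam_prefix>/<Sample_Name>.<Sample_Library>/<Sample_Name>.<index><index2>.<lane>.bam."""
    name = sample["Sample_Name"]
    library = sample["Sample_Library"]
    # index2 is optional here; a missing value contributes nothing to the barcode.
    barcode = sample.get("index", "") + sample.get("index2", "")
    return Path(bam_prefix) / f"{name}.{library}" / f"{name}.{barcode}.{lane}.bam"


sample = {"Sample_Name": "S1", "Sample_Library": "Lib1", "index": "ACGT", "index2": "TTGG"}
print(bam_output_path("/data/bams", sample, 1).as_posix())
# /data/bams/S1.Lib1/S1.ACGTTTGG.1.bam
```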

write(handle: TextIO, blank_lines: int = 1) → None

Write this SampleSheet to a file-like object.

Parameters:	handle – File-like object to wrap with csv.writer. blank_lines – Number of blank lines to write between sections.
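The following is a simplified sketch of writing sections through `csv.writer`, with a configurable number of blank lines between them. `write_sections` and its dict-of-rows input are hypothetical. The library's `write` may differ in details such as row padding, which this sketch does not attempt to reproduce.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_sections(handle: TextIO, sections: Dict[str, List[List[str]]], blank_lines: int = 1) -> None:
    """Write each [Section] header and its rows, separating sections with blank lines."""
    # QUOTE_MINIMAL (the default) quotes only fields containing commas, quotes, or newlines.
    writer = csv.writer(handle, lineterminator="\n")
    for position, (name, rows) in enumerate(sections.items()):
        if position > 0:
            for _ in range(blank_lines):
                writer.writerow([])  # an empty row is written as a bare line terminator
        writer.writerow([f"[{name}]"])
        writer.writerows(rows)


buffer = io.StringIO()
write_sections(buffer, {"Header": [["Experiment Name", "Run 1, flowcell A"]], "Reads": [["151"]]})
print(buffer.getvalue())
```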

`sample_sheet.util` module

sample_sheet.util.is_ipython_interpreter() → bool: Return whether the code is running in an IPython interpreter.

sample_sheet.util.maybe_render_markdown(string: str) → Any: Render a string as Markdown only if in an IPython interpreter.
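One common way to implement these two helpers is to ask IPython for the active shell and fall back to a plain string when there is none. This sketch is not necessarily the library's code. It relies on `IPython.get_ipython` and `IPython.display.Markdown`, and it treats a missing IPython installation as "not in IPython".

```python
from typing import Any


def is_ipython_interpreter() -> bool:
    """Return True when running inside an IPython shell or Jupyter kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so we cannot be running inside it.
        return False
    # get_ipython() returns None when no IPython shell is active.
    return get_ipython() is not None


def maybe_render_markdown(string: str) -> Any:
    """Wrap the string in IPython's Markdown display object only inside IPython."""
    if is_ipython_interpreter():
        from IPython.display import Markdown
        return Markdown(string)
    return string
```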
