Quick Start
To demonstrate the features of this library we will use a test file at an HTTPS endpoint. To follow along, ensure you have the smart_open library installed!
>>> from sample_sheet import SampleSheet
>>> url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'
>>> sample_sheet = SampleSheet(url)
The metadata of the sample sheet can be accessed with the Header, Reads, and Settings attributes:
>>> sample_sheet.Header.Assay
'SureSelectXT'
>>> sample_sheet.Reads
[151, 151]
>>> sample_sheet.is_paired_end
True
>>> sample_sheet.Settings.BarcodeMismatches
'2'
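These attributes map onto the bracketed sections of an Illumina-style sample sheet. The following stdlib-only sketch shows one way to split such a file into its sections. It is not the library's parser: the inline sheet text, the `parse_sections` and `as_key_values` helpers, and the rule that two read lengths mean paired-end are all assumptions for illustration.

```python
import csv
import io

# Hypothetical inline sheet; values mirror the Quick Start output above.
SHEET_TEXT = """[Header]
Assay,SureSelectXT

[Reads]
151
151

[Settings]
BarcodeMismatches,2

[Data]
Sample_ID,Sample_Name,index
1823A,1823A-tissue,GAATCTGA
"""


def parse_sections(text):
    """Group the rows under each [Section] heading: {name: [row, ...]}."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        # Skip blank lines and rows made only of empty cells (e.g. trailing commas).
        if not any(cells):
            continue
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def as_key_values(rows):
    """Treat each row as key,value; values stay strings."""
    return {row[0]: (row[1] if len(row) > 1 else "") for row in rows}


sections = parse_sections(SHEET_TEXT)
header = as_key_values(sections["Header"])
settings = as_key_values(sections["Settings"])
reads = [int(row[0]) for row in sections["Reads"]]  # one read length per line
is_paired_end = len(reads) == 2  # assumption: two reads means paired-end

print(header["Assay"], reads, is_paired_end, settings["BarcodeMismatches"])
```

Note that the values stay strings, which matches why `BarcodeMismatches` comes back as `'2'` rather than `2` above.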
The samples can be accessed directly or via iteration:
>>> sample_sheet.samples
[Sample({'Sample_ID': '1823A', 'Sample_Name': '1823A-tissue', 'index': 'GAATCTGA'}),
Sample({'Sample_ID': '1823B', 'Sample_Name': '1823B-tissue', 'index': 'AGCAGGAA'}),
Sample({'Sample_ID': '1824A', 'Sample_Name': '1824A-tissue', 'index': 'GAGCTGAA'}),
Sample({'Sample_ID': '1825A', 'Sample_Name': '1825A-tissue', 'index': 'AAACATCG'}),
Sample({'Sample_ID': '1826A', 'Sample_Name': '1826A-tissue', 'index': 'GAGTTAGC'}),
Sample({'Sample_ID': '1826B', 'Sample_Name': '1823A-tissue', 'index': 'CGAACTTA'}),
Sample({'Sample_ID': '1829A', 'Sample_Name': '1823B-tissue', 'index': 'GATAGACA'})]
>>> first_sample, *other_samples = list(sample_sheet)
>>> first_sample
Sample({'Sample_ID': '1823A', 'Sample_Name': '1823A-tissue', 'index': 'GAATCTGA'})
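Direct access and iteration stay consistent when iterating simply delegates to the underlying list. Here is a minimal sketch of that design using a hypothetical `MiniSheet` class whose samples are plain `csv.DictReader` rows, not the library's `Sample` objects:

```python
import csv
import io

# Hypothetical [Data] section body (column header plus two samples).
DATA_TEXT = """Sample_ID,Sample_Name,index
1823A,1823A-tissue,GAATCTGA
1823B,1823B-tissue,AGCAGGAA
"""


class MiniSheet:
    """Hypothetical container exposing samples as a list and via iteration."""

    def __init__(self, data_text):
        # Each data row becomes a dict keyed by the column headers.
        self.samples = list(csv.DictReader(io.StringIO(data_text)))

    def __iter__(self):
        # Iterating the sheet yields the same objects as .samples.
        return iter(self.samples)

    def __len__(self):
        return len(self.samples)


sheet = MiniSheet(DATA_TEXT)
first_sample, *other_samples = list(sheet)
print(first_sample["Sample_ID"], len(other_samples))
```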
Defining Sample Read Structures
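If a column labeled Read_Structure is provided for each sample, its value is parsed into a ReadStructure object, enabling additional functionality such as splitting the structure into tokens and counting its total cycles: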
>>> first_sample, *_ = sample_sheet.samples
>>> first_sample.Read_Structure
ReadStructure(structure='151T8B151T')
>>> first_sample.Read_Structure.total_cycles
310
>>> first_sample.Read_Structure.tokens
['151T', '8B', '151T']
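For intuition, the tokenizing and cycle counting shown above can be sketched with a regular expression. This is not the library's `ReadStructure` implementation. It assumes fgbio-style segment codes (T = template, B = sample barcode, M = molecular barcode, S = skip) and handles only fixed-length segments; `parse_read_structure` is a hypothetical helper.

```python
import re

# Assumed fgbio-style segment codes: T=template, B=sample barcode,
# M=molecular barcode, S=skip. Variable-length "+" segments are not handled.
TOKEN = r"\d+[TBMS]"


def parse_read_structure(structure):
    """Return (tokens, total_cycles) for a fixed-length read structure string."""
    if not re.fullmatch(f"(?:{TOKEN})+", structure):
        raise ValueError(f"unsupported read structure: {structure!r}")
    tokens = re.findall(TOKEN, structure)
    # Each token's length is every character except the trailing segment code.
    total_cycles = sum(int(token[:-1]) for token in tokens)
    return tokens, total_cycles


tokens, total_cycles = parse_read_structure("151T8B151T")
print(tokens)        # ['151T', '8B', '151T']
print(total_cycles)  # 310
```

Validating the whole string with `re.fullmatch` before calling `re.findall` ensures that malformed input such as `'151X'` raises an error instead of being silently skipped.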
Sample Sheet Creation
Sample sheets can be created de novo and written to a file-like object. The following snippet shows how to add attributes to mandatory sections, add optional user-defined sections, and add samples before writing the sheet to os.devnull, which discards the output.
>>> import os
>>> from sample_sheet import SampleSheet, Sample
>>> sample_sheet = SampleSheet()
# [Header] section
# Keys containing spaces, such as 'Investigator Name', are set with item assignment
>>> sample_sheet.Header['IEM4FileVersion'] = 4
>>> sample_sheet.Header['Investigator Name'] = 'jdoe'
# [Settings] section
>>> sample_sheet.Settings['CreateFastqForIndexReads'] = 1
>>> sample_sheet.Settings['BarcodeMismatches'] = 2
# Optional sample sheet sections can be added and then accessed
>>> sample_sheet.add_section('Manifests')
>>> sample_sheet.Manifests['PoolDNA'] = "DNAMatrix.txt"
# Specify a paired-end kit with 151 template bases per read
>>> sample_sheet.Reads = [151, 151]
# Add a single-indexed sample with a name, an ID, and an index
>>> sample = Sample(dict(Sample_ID='1823A', Sample_Name='1823A-tissue', index='ACGT'))
>>> sample_sheet.add_sample(sample)
# Write the Sample Sheet! The with block closes the handle afterwards.
>>> with open(os.devnull, 'w') as handle:
...     sample_sheet.write(handle)
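To see roughly what the bracketed layout looks like on disk, here is a stdlib-only sketch that serializes the same values used above into a string instead of os.devnull. `render_sheet` is hypothetical, the section order is an assumption, and the library's own `write()` output may differ in ordering and detail.

```python
import csv
import io


def render_sheet(kv_sections, reads, samples):
    """Hypothetical writer: key/value sections, then [Reads], then [Data]."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, pairs in kv_sections.items():
        writer.writerow([f"[{name}]"])
        for key, value in pairs.items():
            writer.writerow([key, value])
        writer.writerow([])  # blank line between sections
    writer.writerow(["[Reads]"])
    for length in reads:
        writer.writerow([length])  # one read length per line
    writer.writerow([])
    # Assumes every sample shares the first sample's columns.
    columns = list(samples[0]) if samples else []
    writer.writerow(["[Data]"])
    writer.writerow(columns)
    for sample in samples:
        writer.writerow([sample.get(column, "") for column in columns])
    return buffer.getvalue()


text = render_sheet(
    {
        "Header": {"IEM4FileVersion": 4, "Investigator Name": "jdoe"},
        "Settings": {"CreateFastqForIndexReads": 1, "BarcodeMismatches": 2},
        "Manifests": {"PoolDNA": "DNAMatrix.txt"},
    },
    reads=[151, 151],
    samples=[{"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "ACGT"}],
)
print(text)
```

Writing to an `io.StringIO` buffer keeps the result inspectable. The same string could then be saved with an ordinary `open(..., 'w')` call.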