Introduction¶
What does ninetysix
do?¶
At its core, ninetysix
is built upon and inspired by the pandas
and holoviews
packages. The main functionality of ninetysix
is the Plate
class, which extends the functionality of pandas
in ways that are helpful for working up data from experiments that return well-value pairs. In other words, if you perform an assay that returns quantitative data in a 96-well plate, ninetysix
can help you analyze this array of 96 well-value pairs.
If you are familiar with the pandas
melt
and groupby
functions or the holoviews
catch-phrase "Stop plotting your data - annotate your data and let it visualize itself", then you will understand the fundamentals of ninetysix
's annotations
and values
-oriented processing.
But don't worry if you aren't familiar with those!
ninetysix
is straightforward to use and was written especially to simplify life for those who have little to no experience with pandas
or holoviews
. (Or with python
for that matter: the main python
structure that is used without explanation is a dictionary, which is simply a container for key-value pairs and is worth getting to know.)
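For readers new to dictionaries, the short sketch below shows the handful of operations that matter here: creating a dictionary, adding a pair, looking a value up by its key, and looping over the pairs. The well labels are invented for illustration and are not part of the example data.

```python
# A dictionary maps keys to values; here, well names map to (made-up) labels.
well_labels = {'A1': 'blank', 'A2': 'sample 1'}

# Add a new key-value pair
well_labels['A3'] = 'sample 2'

# Look up a value by its key
label_a2 = well_labels['A2']
print(label_a2)

# Loop over every key-value pair
for well, label in well_labels.items():
    print(well, label)
```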
Plate
basics¶
At minimum, a Plate
requires data containing 'well' information (e.g., 'A1' or 'A01') and each associated 'value', or measurement. (Additional information or values can of course be present in the data as well.) This is the simplest form of data that an instrument usually returns when it takes measurements on a plate. A Plate
object stores your data with each row describing a single well and each column providing information about that well.
From here, Plate
provides simple and convenient methods to add additional information to describe each well in the plate and further process or standardize the data.
To facilitate analysis, Plate
makes several assumptions about the data (many of which can be explicitly overridden when needed). The main assumption is that each column of your data can be described as a location
, annotation
, or value
, and Plate
always arrays the data in this order. For example, 'well', 'row', and 'column' are locations
, whereas your measurements are values
. Other generic information that is critical to understanding your data but neither a location
nor a value
is an annotation. Furthermore, Plate
assumes that you have one value that is particularly important at any given time, which is explicitly specified as the value_name
and placed in the right-most column of the data. When you call methods for processing or visualizing the data and do not specify a new value name in that method, ninetysix
automatically assumes you mean the right-most value
column.
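The location, annotation, value ordering can be sketched in plain pandas. This is only an illustration of the convention described above, not how Plate is implemented; the lists `locations` and `values` and the variable `value_name` are names chosen for this sketch, and the two activity values are the first two rows of the example data.

```python
import pandas as pd

# Columns deliberately start out of order
df = pd.DataFrame({
    'activity': [11.90, 6.87],
    'well': ['A1', 'A2'],
    'controls': ['experiment', 'experiment'],
})

# Derive the other location columns from the well name
df['row'] = df['well'].str[0]
df['column'] = df['well'].str[1:].astype(int)

# Assumed classification for this sketch
locations = ['well', 'row', 'column']
values = ['activity']
# Anything that is neither a location nor a value is an annotation
annotations = [c for c in df.columns if c not in locations + values]

# Arrange as locations, then annotations, then values
ordered = df[locations + annotations + values]

# The "important" value is the right-most column
value_name = ordered.columns[-1]
print(list(ordered.columns), value_name)
```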
Let's see how this works in practice.
import ninetysix as ns
import pandas as pd
First, we'll examine our example_data
as a pandas DataFrame
:
df = pd.read_csv('example_data.csv')
df
| | well | activity |
---|---|---|
0 | A1 | 11.90 |
1 | A2 | 6.87 |
2 | A3 | 8.30 |
3 | A4 | 8.57 |
4 | A5 | 7.84 |
... | ... | ... |
91 | H8 | 12.34 |
92 | H9 | 8.06 |
93 | H10 | 5.27 |
94 | H11 | 7.38 |
95 | H12 | 6.92 |
96 rows × 2 columns
We have a 96x2 DataFrame
, with each row representing one of 96 wells and each column representing information about that well, either the well name or its measured activity.
Next we'll create a Plate
object from this same data. (Note: Many data input formats are accepted by Plate
. See The Plate class page for more information.)
pt = ns.Plate('example_data.csv')
pt
| | well | row | column | activity |
---|---|---|---|---|
0 | A1 | A | 1 | 11.90 |
1 | A2 | A | 2 | 6.87 |
2 | A3 | A | 3 | 8.30 |
3 | A4 | A | 4 | 8.57 |
4 | A5 | A | 5 | 7.84 |
... | ... | ... | ... | ... |
91 | H8 | H | 8 | 12.34 |
92 | H9 | H | 9 | 8.06 |
93 | H10 | H | 10 | 5.27 |
94 | H11 | H | 11 | 7.38 |
95 | H12 | H | 12 | 6.92 |
96 rows × 4 columns
Overall this is very similar, but to aid in downstream processing, ninetysix
automatically adds 'row' and 'column' information, which is derived from the 'well' column. For example, this quickly lets us use the plot_hm()
method to make a Heat-Map from the data:
pt.plot_hm()
Of course, this is not too difficult to do with the pandas DataFrame
we generated above. We just have to add the appropriate columns and use the plot_hm()
function available from ns.viz
:
df['row'], df['column'] = zip(*df['well'].apply(
lambda well: (well[0], int(well[1:]))
))
ns.viz.plot_hm(df, value_name='activity')
Adding and using annotations
¶
But what about when we have lots of information to add about each well, and further processing we want to do to our 'activity' data?
ninetysix
simplifies these operations.
ns.parsers.well_regex
¶
Dictionaries in which each key-value pair maps a single well to information about it are a powerful way to add information to a plate, but writing 96 key-value pairs is cumbersome. To alleviate this, ninetysix
provides well_regex
in the parsers
module, which accepts well keys written in a simple regex-like form and expands them into one key per well.
well_info = {
'[A-D]10': 'standard',
'[A,H][1,12]': 'empty',
}
ns.parsers.well_regex(well_info)
{'A10': 'standard', 'B10': 'standard', 'C10': 'standard', 'D10': 'standard', 'A1': 'empty', 'A12': 'empty', 'H1': 'empty', 'H12': 'empty'}
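The kind of expansion that well_regex performs above can be approximated in a few lines of plain Python. The sketch below is a hand-rolled illustration, not the library's implementation: the helper names `expand_token`, `expand_well_key`, and `expand_well_dict` are hypothetical, and the sketch assumes that bracketed groups contain only comma-separated items or `start-stop` ranges.

```python
import itertools
import re


def expand_token(token):
    """Expand a bracket body such as 'A-D', 'A,H', or '1-12' into a list of strings."""
    items = []
    for part in token.split(','):
        part = part.strip()
        if '-' in part:
            start, stop = part.split('-')
            if start.isdigit():
                # Numeric range, e.g. '1-12' -> '1', '2', ..., '12'
                items.extend(str(i) for i in range(int(start), int(stop) + 1))
            else:
                # Letter range, e.g. 'A-D' -> 'A', 'B', 'C', 'D'
                items.extend(chr(c) for c in range(ord(start), ord(stop) + 1))
        else:
            items.append(part)
    return items


def expand_well_key(key):
    """Expand a key such as '[A,H][1-12]' into individual well names."""
    # Split into bracketed groups and literal runs: '[A-D]10' -> ['[A-D]', '10']
    pieces = re.findall(r'\[[^\]]*\]|[^\[\]]+', key)
    options = [
        expand_token(piece[1:-1]) if piece.startswith('[') else [piece]
        for piece in pieces
    ]
    return [''.join(combo) for combo in itertools.product(*options)]


def expand_well_dict(well_info):
    """Expand every key of a dictionary, repeating the value for each well."""
    expanded = {}
    for key, label in well_info.items():
        for well in expand_well_key(key):
            expanded[well] = label
    return expanded


well_info = {'[A-D]10': 'standard', '[A,H][1,12]': 'empty'}
expanded = expand_well_dict(well_info)
print(expanded)
```

Note that a key like `[1,12]` is read here as "1 or 12" rather than as a regular-expression character class, which matches the expanded output shown above.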
When this is used in conjunction with the Plate.annotate_wells()
method, it provides a simple way to label your wells with conditions.
# Specify control information
controls = {
'default': 'experiment',
'[A-D]10': 'standard',
'[E-H]10': 'negative',
}
# Label the edge wells
edges = {
'[A,H][1-12]': True,
'[A-H][1,12]': True,
'else': False,
}
# Pass into a new dictionary, where key = new column name
annotations = {
'controls': controls,
'edge well': edges,
}
# Call annotate_wells method with the nested dict
pt = pt.annotate_wells(annotations)
pt
| | well | row | column | controls | edge well | activity |
---|---|---|---|---|---|---|
0 | A1 | A | 1 | experiment | True | 11.90 |
1 | A2 | A | 2 | experiment | True | 6.87 |
2 | A3 | A | 3 | experiment | True | 8.30 |
3 | A4 | A | 4 | experiment | True | 8.57 |
4 | A5 | A | 5 | experiment | True | 7.84 |
... | ... | ... | ... | ... | ... | ... |
91 | H8 | H | 8 | experiment | True | 12.34 |
92 | H9 | H | 9 | experiment | True | 8.06 |
93 | H10 | H | 10 | negative | True | 5.27 |
94 | H11 | H | 11 | experiment | True | 7.38 |
95 | H12 | H | 12 | experiment | True | 6.92 |
96 rows × 6 columns
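The 'default' and 'else' keys in the dictionaries above appear to act as a fallback for every well that no other key names. A minimal plain-Python sketch of that behaviour follows; it assumes this fallback reading, uses already-expanded well names, and the helper `label_wells` is hypothetical rather than part of ninetysix.

```python
import string

# All 96 well names, A1 through H12
wells = [
    f'{row}{col}'
    for row in string.ascii_uppercase[:8]
    for col in range(1, 13)
]


def label_wells(wells, explicit, fallback):
    """Give each well its explicit label, or the fallback if none was given."""
    return {well: explicit.get(well, fallback) for well in wells}


# Explicit labels, written out as already-expanded keys
explicit = {
    **{f'{row}10': 'standard' for row in 'ABCD'},
    **{f'{row}10': 'negative' for row in 'EFGH'},
}
controls = label_wells(wells, explicit, fallback='experiment')

# Edge wells: first or last row, or first or last column; everything else is False
edges = {
    well: (well[0] in 'AH') or (int(well[1:]) in (1, 12))
    for well in wells
}

print(controls['H10'], controls['A1'], edges['A5'], edges['B2'])
```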
We can now use this information in the Heat-Map we made above:
# Make a declarative color map
cmap = {
'standard': ns.Colors.green,
'negative': ns.Colors.orange,
'experiment': ns.Colors.blue,
}
pt.plot_hm(
# Outline each well with the control information
outline='controls',
# Color the controls accordingly
outline_cmap=cmap,
# Ignore the majority group (experimental wells)
exclude_major=True
)
The annotate_wells
method also accepts an Excel spreadsheet as its argument; the spreadsheet can be made using the template found here and is described further on The Plate class page.
Additionally, just like a normal DataFrame
, you can place new columns directly into the data. Each new column is treated as an annotation
, and annotations sit between the locations
on the left and values
on the right:
pt['plate'] = 1
pt
| | well | row | column | controls | edge well | plate | activity |
---|---|---|---|---|---|---|---|
0 | A1 | A | 1 | experiment | True | 1 | 11.90 |
1 | A2 | A | 2 | experiment | True | 1 | 6.87 |
2 | A3 | A | 3 | experiment | True | 1 | 8.30 |
3 | A4 | A | 4 | experiment | True | 1 | 8.57 |
4 | A5 | A | 5 | experiment | True | 1 | 7.84 |
... | ... | ... | ... | ... | ... | ... | ... |
91 | H8 | H | 8 | experiment | True | 1 | 12.34 |
92 | H9 | H | 9 | experiment | True | 1 | 8.06 |
93 | H10 | H | 10 | negative | True | 1 | 5.27 |
94 | H11 | H | 11 | experiment | True | 1 | 7.38 |
95 | H12 | H | 12 | experiment | True | 1 | 6.92 |
96 rows × 7 columns
You can also delete columns:
del pt['edge well']
pt
| | well | row | column | controls | plate | activity |
---|---|---|---|---|---|---|
0 | A1 | A | 1 | experiment | 1 | 11.90 |
1 | A2 | A | 2 | experiment | 1 | 6.87 |
2 | A3 | A | 3 | experiment | 1 | 8.30 |
3 | A4 | A | 4 | experiment | 1 | 8.57 |
4 | A5 | A | 5 | experiment | 1 | 7.84 |
... | ... | ... | ... | ... | ... | ... |
91 | H8 | H | 8 | experiment | 1 | 12.34 |
92 | H9 | H | 9 | experiment | 1 | 8.06 |
93 | H10 | H | 10 | negative | 1 | 5.27 |
94 | H11 | H | 11 | experiment | 1 | 7.38 |
95 | H12 | H | 12 | experiment | 1 | 6.92 |
96 rows × 6 columns
normalize
based on well information¶
A value
in a Plate
object can be readily normalized in a couple of ways, each of which returns a new column named with the prefix 'normalized_' (e.g., 'normalized_activity').
# No arguments just sets the max value to 1
pt.normalize().plot_scatter(
color='controls',
cmap=cmap,
ranked=True,
value_name='normalized_activity'
)
# zero=True sets all data between 0 and 1
pt.normalize(
zero=True
).plot_scatter(
color='controls',
cmap=cmap,
ranked=True,
value_name='normalized_activity'
)
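As a rough guide to the arithmetic behind these two calls, the plain-pandas sketch below divides by the maximum (so the largest value becomes 1) and then applies min-max rescaling (so the data span 0 to 1). Whether normalize computes exactly these expressions is an assumption based on the comments above. The column name 'rescaled_activity' is made up for the sketch, and the values are the first five rows of the example data.

```python
import pandas as pd

df = pd.DataFrame({
    'well': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'activity': [11.90, 6.87, 8.30, 8.57, 7.84],
})
v = df['activity']

# Largest value becomes 1 (assumed equivalent of calling normalize with no arguments)
df['normalized_activity'] = v / v.max()

# Smallest value becomes 0, largest becomes 1 (assumed equivalent of zero=True)
df['rescaled_activity'] = (v - v.min()) / (v.max() - v.min())

print(df)
```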
Most powerfully, though, you can normalize to specific groups that should take a fixed normalized value, such as a standard in the plate (set to 1) or a negative control (set to 0). Normalizing to the standard, for example, gives you the fold-change relative to that standard.
# String arguments passed to 'to' and 'zero' specify groups to normalize to
# Compare to the .query() method from pandas.DataFrame
pt.normalize(
to='controls=standard',
zero='controls=negative'
).plot_scatter(
color='controls',
cmap=cmap,
ranked=True,
value_name='normalized_activity'
)
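Group-based normalization of this kind can be written out in plain pandas as a linear rescaling in which the negative-control group maps to 0 and the standard group maps to 1. The sketch assumes each group is summarized by its mean, which may differ from what normalize does internally, and the activity numbers are illustrative rather than taken from example_data.csv.

```python
import pandas as pd

# Illustrative values only (not taken from example_data.csv)
df = pd.DataFrame({
    'well': ['A1', 'A10', 'B10', 'E10', 'F10'],
    'controls': ['experiment', 'standard', 'standard', 'negative', 'negative'],
    'activity': [6.0, 9.0, 11.0, 1.0, 3.0],
})

# Assumption: each reference group is summarized by its mean
upper = df.loc[df['controls'] == 'standard', 'activity'].mean()  # maps to 1
lower = df.loc[df['controls'] == 'negative', 'activity'].mean()  # maps to 0

# Linear rescaling relative to the two reference groups
df['normalized_activity'] = (df['activity'] - lower) / (upper - lower)

group_means = df.groupby('controls')['normalized_activity'].mean()
print(group_means)
```

With these numbers, the experimental well sits halfway between the negative control and the standard, so its normalized value is 0.5.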
More information on getting the most out of the Plate
class, including using pandas
methods directly from a Plate
object, can be found on The Plate class page.
More information on optimizing your visualizations can be found on the Basic data visualization and Advanced data visualization pages.
Information on constructing and using multi-Plate
objects can be found on the Plates page.