Introduction
What does ninetysix do?
At its core, ninetysix is built upon and inspired by the pandas and holoviews packages. The main functionality of ninetysix is the Plate class, which extends the functionality of pandas in ways that are helpful for working up data from experiments that return well-value pairs. In other words, if you perform an assay that returns quantitative data in a 96-well plate, ninetysix can help you analyze this array of 96 well-value pairs.
If you are familiar with the pandas melt and groupby functions or the holoviews catch-phrase "Stop plotting your data - annotate your data and let it visualize itself", then you will understand the fundamentals of ninetysix's annotations and values-oriented processing.
But don't worry if you aren't familiar with those!
ninetysix is straightforward to use and was written especially to simplify life for those who have little to no experience with pandas or holoviews. (Or with Python, for that matter: the main Python structure that is used without explanation is the dictionary, which is simply a container for key-value pairs and is worth getting to know.)
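For readers curious about what "melt" and "groupby" mean in practice, here is a minimal pandas-only sketch. The measurements are made up for illustration, and nothing here is ninetysix code; it only shows the idea of reshaping a grid into one record per well and then summarizing by a descriptive column.

```python
import pandas as pd

# Made-up measurements in "wide" form: one row per plate row,
# one column per plate column.
wide = pd.DataFrame({
    'row': ['A', 'B'],
    1: [11.9, 7.1],
    2: [6.9, 8.3],
})

# melt reshapes the grid into one (row, column, value) record per well.
tidy = wide.melt(id_vars='row', var_name='column', value_name='activity')

# groupby then summarizes the values by any descriptive column.
mean_by_row = tidy.groupby('row')['activity'].mean()

print(tidy)
print(mean_by_row)
```

Once the data is in this one-record-per-well form, any extra column that describes a well can be used to group, color, or filter the measurements.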
Plate basics
At minimum, a Plate requires data containing 'well' information (e.g., 'A1' or 'A01') and each associated 'value', or measurement. (Additional information or values can of course be present in the data as well.) This is the simplest form of data that is usually returned by an instrument when it performs measurements on a plate. A Plate object stores your data with each row describing a single well and each column providing information about that well.
From here, Plate provides simple and convenient methods to add additional information to describe each well in the plate and further process or standardize the data.
To facilitate analysis, Plate makes several assumptions about the data (many of which can be explicitly overridden when needed). The main assumption is that each column of your data can be described as a location, an annotation, or a value, and Plate always arranges the columns in this order. For example, 'well', 'row', and 'column' are locations, whereas your measurements are values. Any other information that is critical to understanding your data but is neither a location nor a value is an annotation. Furthermore, Plate assumes that one value is particularly important at any given time; it is explicitly specified as the value_name and placed in the right-most column of the data. When you call a method for processing or visualizing the data without specifying a value name, ninetysix assumes you mean this right-most value column.
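The location, annotation, value ordering can be pictured with plain pandas. The sketch below is not how ninetysix implements it; `order_columns` is a hypothetical helper and the two wells are made-up data, used only to show the convention that the right-most column is the value of interest.

```python
import pandas as pd


def order_columns(df, locations, values):
    """Hypothetical helper: arrange columns as locations, annotations, values."""
    # Anything that is neither a location nor a value counts as an annotation.
    annotations = [
        col for col in df.columns
        if col not in locations and col not in values
    ]
    return df[[*locations, *annotations, *values]]


# Made-up data with the columns deliberately out of order.
df = pd.DataFrame({
    'activity': [11.90, 6.87],
    'controls': ['experiment', 'standard'],
    'well': ['A1', 'A2'],
})

ordered = order_columns(df, locations=['well'], values=['activity'])

# By the convention described above, the right-most column is the value of interest.
value_name = ordered.columns[-1]
print(list(ordered.columns), value_name)
```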
Let's see how this works in practice.
import ninetysix as ns
import pandas as pd
First, we'll examine our example_data as a pandas DataFrame:
df = pd.read_csv('example_data.csv')
df
| | well | activity |
|---|---|---|
| 0 | A1 | 11.90 |
| 1 | A2 | 6.87 |
| 2 | A3 | 8.30 |
| 3 | A4 | 8.57 |
| 4 | A5 | 7.84 |
| ... | ... | ... |
| 91 | H8 | 12.34 |
| 92 | H9 | 8.06 |
| 93 | H10 | 5.27 |
| 94 | H11 | 7.38 |
| 95 | H12 | 6.92 |
96 rows × 2 columns
We have a 96x2 DataFrame, with each row representing one of 96 wells and each column representing information about that well, either the well name or its measured activity.
Next we'll create a Plate object from this same data. (Note: Many data input formats are accepted by Plate. See The Plate class page for more information.)
pt = ns.Plate('example_data.csv')
pt
| | well | row | column | activity |
|---|---|---|---|---|
| 0 | A1 | A | 1 | 11.90 |
| 1 | A2 | A | 2 | 6.87 |
| 2 | A3 | A | 3 | 8.30 |
| 3 | A4 | A | 4 | 8.57 |
| 4 | A5 | A | 5 | 7.84 |
| ... | ... | ... | ... | ... |
| 91 | H8 | H | 8 | 12.34 |
| 92 | H9 | H | 9 | 8.06 |
| 93 | H10 | H | 10 | 5.27 |
| 94 | H11 | H | 11 | 7.38 |
| 95 | H12 | H | 12 | 6.92 |
96 rows × 4 columns
Overall this is very similar, but to aid in downstream processing, ninetysix automatically adds 'row' and 'column' information, which is derived from the 'well' column. For example, this quickly lets us use the plot_hm() method to make a Heat-Map from the data:
pt.plot_hm()
Of course, this is not too difficult to do from the pandas DataFrame we generated above. We just have to add the appropriate columns and use the plot_hm() function available from ns.viz:
df['row'], df['column'] = zip(*df['well'].apply(
    lambda well: (well[0], int(well[1:]))
))
ns.viz.plot_hm(df, value_name='activity')
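To see why 'row' and 'column' are useful, it can help to look at the grid a heat map is drawn from. The following pandas-only sketch uses four made-up wells and is an illustration rather than ninetysix's internals: it derives the two location columns with a regular expression (which also handles zero-padded names such as 'A01') and pivots the values into a row-by-column grid.

```python
import pandas as pd

# Four made-up wells; a real plate would have 96.
df = pd.DataFrame({
    'well': ['A1', 'A2', 'B01', 'B02'],
    'activity': [11.90, 6.87, 8.30, 8.57],
})

# Split each well name into a row letter and an integer column number.
parts = df['well'].str.extract(r'^([A-Za-z]+)(\d+)$')
df['row'] = parts[0]
df['column'] = parts[1].astype(int)

# One cell per well, laid out like the physical plate.
grid = df.pivot(index='row', columns='column', values='activity')
print(grid)
```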
Adding and using annotations
But what about when we have lots of information to add about each well, and further processing we want to do to our 'activity' data?
ninetysix simplifies these operations.
ns.parsers.well_regex
Dictionaries with key-value pairs that represent a single well and information about it are a powerful way to add information to a plate, but writing 96 key-value pairs is cumbersome. To alleviate this, ninetysix provides well_regex in the parsers module, which accepts well keys written in a simple regex form and expands them.
well_info = {
    '[A-D]10': 'standard',
    '[A,H][1,12]': 'empty',
}
ns.parsers.well_regex(well_info)
{'A10': 'standard',
'B10': 'standard',
'C10': 'standard',
'D10': 'standard',
'A1': 'empty',
'A12': 'empty',
'H1': 'empty',
'H12': 'empty'}
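To make the expansion less mysterious, here is a small standard-library sketch of how such shorthand keys could be expanded. It is not the ninetysix implementation: `expand_token` and `expand_wells` are hypothetical names, and the sketch assumes that a bracket group holds comma-separated items or simple `start-stop` ranges of upper-case letters or integers.

```python
import itertools
import re
import string


def expand_token(token):
    """Expand the inside of one bracket group, e.g. 'A-D' or '1,12'."""
    items = []
    for part in token.split(','):
        part = part.strip()
        if '-' in part:
            start, stop = (p.strip() for p in part.split('-'))
            if start.isdigit():
                # Numeric range, inclusive of both ends.
                items.extend(str(n) for n in range(int(start), int(stop) + 1))
            else:
                # Letter range, inclusive of both ends.
                letters = string.ascii_uppercase
                first, last = letters.index(start), letters.index(stop)
                items.extend(letters[first:last + 1])
        else:
            items.append(part)
    return items


def expand_wells(well_info):
    """Hypothetical expander: turn shorthand keys into one key per well."""
    expanded = {}
    for pattern, label in well_info.items():
        # Each piece is either a bracket group or literal text between groups.
        pieces = re.findall(r'\[([^\]]*)\]|([^\[\]]+)', pattern)
        choices = [
            expand_token(group) if group else [literal]
            for group, literal in pieces
        ]
        # Every combination of one choice per piece is one well name.
        for combo in itertools.product(*choices):
            expanded[''.join(combo)] = label
    return expanded


wells = expand_wells({'[A-D]10': 'standard', '[A,H][1,12]': 'empty'})
print(wells)
```

Under these assumptions the sketch produces the same eight keys as the output shown above.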
When well_regex is used in conjunction with the Plate.annotate_wells() method, it provides a simple way to label your wells with conditions.
# Specify control information
controls = {
    'default': 'experiment',
    '[A-D]10': 'standard',
    '[E-H]10': 'negative',
}
# Label the edge wells
edges = {
    '[A,H][1-12]': True,
    '[A-H][1,12]': True,
    'else': False,
}
# Pass into a new dictionary, where key = new column name
annotations = {
    'controls': controls,
    'edge well': edges,
}
# Call annotate_wells method with the nested dict
pt = pt.annotate_wells(annotations)
pt
| | well | row | column | controls | edge well | activity |
|---|---|---|---|---|---|---|
| 0 | A1 | A | 1 | experiment | True | 11.90 |
| 1 | A2 | A | 2 | experiment | True | 6.87 |
| 2 | A3 | A | 3 | experiment | True | 8.30 |
| 3 | A4 | A | 4 | experiment | True | 8.57 |
| 4 | A5 | A | 5 | experiment | True | 7.84 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | H8 | H | 8 | experiment | True | 12.34 |
| 92 | H9 | H | 9 | experiment | True | 8.06 |
| 93 | H10 | H | 10 | negative | True | 5.27 |
| 94 | H11 | H | 11 | experiment | True | 7.38 |
| 95 | H12 | H | 12 | experiment | True | 6.92 |
96 rows × 6 columns
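Conceptually, annotating wells amounts to looking up each well in a dictionary and falling back to a default label when the well is not listed. A pandas-only sketch of that idea, with three made-up wells and an already-expanded dictionary, might look as follows; it is an analogy, not the code behind annotate_wells().

```python
import pandas as pd

# Three made-up wells.
df = pd.DataFrame({
    'well': ['A1', 'A10', 'E10'],
    'activity': [11.90, 9.50, 5.10],
})

# Already-expanded well -> label pairs.
controls = {'A10': 'standard', 'E10': 'negative'}

# Wells missing from the dictionary receive the default label.
df['controls'] = df['well'].map(controls).fillna('experiment')
print(df)
```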
We can now use this information in the Heat-Map we made above:
# Make a declarative color map
cmap = {
    'standard': ns.Colors.green,
    'negative': ns.Colors.orange,
    'experiment': ns.Colors.blue,
}
pt.plot_hm(
    # Outline each well with the control information
    outline='controls',
    # Color the controls accordingly
    outline_cmap=cmap,
    # Ignore the majority group (experimental wells)
    exclude_major=True
)
The annotate_wells method also takes an Excel spreadsheet as its argument, which can be made using the template found here and described more on The Plate class page.
Additionally, just as with a normal DataFrame, you can assign new columns directly to the data. Each new column is treated as an annotation and placed between the locations on the left and the values on the right:
pt['plate'] = 1
pt
| | well | row | column | controls | edge well | plate | activity |
|---|---|---|---|---|---|---|---|
| 0 | A1 | A | 1 | experiment | True | 1 | 11.90 |
| 1 | A2 | A | 2 | experiment | True | 1 | 6.87 |
| 2 | A3 | A | 3 | experiment | True | 1 | 8.30 |
| 3 | A4 | A | 4 | experiment | True | 1 | 8.57 |
| 4 | A5 | A | 5 | experiment | True | 1 | 7.84 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 91 | H8 | H | 8 | experiment | True | 1 | 12.34 |
| 92 | H9 | H | 9 | experiment | True | 1 | 8.06 |
| 93 | H10 | H | 10 | negative | True | 1 | 5.27 |
| 94 | H11 | H | 11 | experiment | True | 1 | 7.38 |
| 95 | H12 | H | 12 | experiment | True | 1 | 6.92 |
96 rows × 7 columns
You can also delete columns:
del pt['edge well']
pt
| | well | row | column | controls | plate | activity |
|---|---|---|---|---|---|---|
| 0 | A1 | A | 1 | experiment | 1 | 11.90 |
| 1 | A2 | A | 2 | experiment | 1 | 6.87 |
| 2 | A3 | A | 3 | experiment | 1 | 8.30 |
| 3 | A4 | A | 4 | experiment | 1 | 8.57 |
| 4 | A5 | A | 5 | experiment | 1 | 7.84 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | H8 | H | 8 | experiment | 1 | 12.34 |
| 92 | H9 | H | 9 | experiment | 1 | 8.06 |
| 93 | H10 | H | 10 | negative | 1 | 5.27 |
| 94 | H11 | H | 11 | experiment | 1 | 7.38 |
| 95 | H12 | H | 12 | experiment | 1 | 6.92 |
96 rows × 6 columns
normalize based on well information
A value in a Plate object can be readily normalized in a couple of ways, returning a new column with the prefix 'normalized_'.
# No arguments just sets the max value to 1
pt.normalize().plot_scatter(
    color='controls',
    cmap=cmap,
    ranked=True,
    value_name='normalized_activity'
)
# zero=True sets all data between 0 and 1
pt.normalize(
    zero=True
).plot_scatter(
    color='controls',
    cmap=cmap,
    ranked=True,
    value_name='normalized_activity'
)
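The two calls above can be read as the two most common rescalings. The pandas sketch below uses three made-up values and assumes that "sets the max value to 1" means dividing by the maximum and that "between 0 and 1" means min-max scaling; the exact arithmetic inside normalize() is not shown in this page, so treat this as an interpretation.

```python
import pandas as pd

# Three made-up measurements.
activity = pd.Series([2.0, 4.0, 10.0], name='activity')

# Assumed reading of normalize(): divide by the maximum so it becomes 1.
scaled_to_max = activity / activity.max()

# Assumed reading of normalize(zero=True): the minimum maps to 0, the maximum to 1.
min_max = (activity - activity.min()) / (activity.max() - activity.min())

print(scaled_to_max.tolist())
print(min_max.tolist())
```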
Most powerfully, though, you can normalize against specific groups of wells whose normalized values are known, such as a standard in the plate (set to 1) or a negative control (set to 0). This gives you, for example, the fold change relative to the standard.
# String arguments passed to 'to' and 'zero' specify groups to normalize to
# Compare to the .query() method from pandas.DataFrame
pt.normalize(
    to='controls=standard',
    zero='controls=negative'
).plot_scatter(
    color='controls',
    cmap=cmap,
    ranked=True,
    value_name='normalized_activity'
)
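As a rough picture of group-based normalization, the pandas sketch below rescales made-up values so that the standard wells average 1 and the negative wells average 0. It assumes each control group is summarized by its mean, which this page does not state, and it uses pandas' own .query() to select the groups rather than any ninetysix call.

```python
import pandas as pd

# Made-up plate fragment with two standards, two negatives, and one sample.
df = pd.DataFrame({
    'controls': ['standard', 'standard', 'negative', 'negative', 'experiment'],
    'activity': [10.0, 12.0, 1.0, 3.0, 20.0],
})

# Assumption: each control group is summarized by its mean.
upper = df.query("controls == 'standard'")['activity'].mean()
lower = df.query("controls == 'negative'")['activity'].mean()

# Shift so the negative control sits at 0, then scale so the standard sits at 1.
df['normalized_activity'] = (df['activity'] - lower) / (upper - lower)
print(df)
```

With this scaling, a normalized value of 2 for an experimental well means twice the standard's signal above the negative control.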
More information on getting the most out of the Plate class, including using pandas methods directly from a Plate object, can be found on The Plate class page.
More information on optimizing your visualizations can be found on the Basic data visualization and Advanced data visualization pages.
Information on constructing and using multi-Plate objects can be found on the Plates page.