In January 2020 we released the fly “hemibrain” connectome, an online database that provides the morphological structure and synaptic connectivity of roughly half of the brain of a fruit fly (Drosophila melanogaster). This database and its supporting visualisation have reframed the way neural circuits are studied and understood in the fly brain. While the fruit fly brain is small enough to be mapped relatively completely with modern techniques, the insights gained are, at best, only partially informative for understanding the most interesting object in neuroscience: the human brain.
Today, in collaboration with the Lichtman Laboratory at Harvard University, we are releasing the “H01” dataset, a 1.4 petabyte rendering of a small sample of human brain tissue, along with a companion paper presenting a connectomic study of a petascale fragment of human cerebral cortex. The H01 sample was imaged at 4 nm resolution by serial section electron microscopy, reconstructed and annotated by automated computational techniques, and analysed for preliminary insights into the structure of the human cortex. The dataset comprises imaging data covering roughly one cubic millimetre of brain tissue, and includes tens of thousands of reconstructed neurons, millions of neuron fragments, 130 million annotated synapses, 104 proofread cells, and many additional subcellular annotations and structures, all easily accessible through the Neuroglancer browser interface. H01 is thus far the largest sample of brain tissue imaged and reconstructed at this level of detail in any species, and the first large-scale study of synaptic connectivity in the human cortex that spans multiple cell types across all layers of the cortex. The primary goals of the project are to provide a new resource for studying the human brain and to improve and scale the underlying connectomics technologies.
The Human Cortex
The cerebral cortex is the thin surface layer of the brain found in vertebrate animals that has evolved most recently, showing the greatest variation in size among mammals (it is especially large in humans). The cerebral cortex is organised into six layers (e.g., L2), each containing distinct kinds of nerve cells (e.g., spiny stellate). It plays a crucial role in most higher-level cognitive functions, such as thinking, memory, planning, perception, language, and attention. Although research has advanced our understanding of the macroscopic organisation of this remarkably complex tissue, its organisation at the level of individual nerve cells and their interconnecting synapses remains largely unknown.
Human Brain Connectomics: From Surgical Biopsy to a 3D Database
Mapping the structure of the brain at the resolution of individual synapses requires high-resolution microscopy techniques that can image biochemically stabilised (fixed) tissue. We worked with brain surgeons at Massachusetts General Hospital (MGH) in Boston, who sometimes remove pieces of normal human cerebral cortex during epilepsy surgery in order to reach a site deeper in the brain where seizures originate. Patients donated this tissue, which is normally discarded, to our colleagues in the Lichtman lab. The Harvard researchers cut the tissue into roughly 5,300 individual 30 nanometer sections using an automated tape-collecting ultramicrotome, mounted those sections onto silicon wafers, and then imaged the tissue at 4 nm resolution in a custom-built 61-beam parallelized scanning electron microscope.
Imaging the roughly 5,300 physical sections produced about 225 million individual 2D images. Our team then computationally stitched and aligned this data into a single 3D volume. While the quality of the data was generally high, the alignment pipelines had to robustly handle a number of challenges, including imaging artefacts, missing sections, variation in microscope parameters, and physical stretching and compression of the tissue. Once the data was aligned, a multiscale flood-filling network pipeline, run on thousands of Google Cloud TPUs, produced a 3D segmentation of each individual cell in the tissue. Additional machine learning pipelines were applied to identify and characterise 130 million synapses, to classify each 3D fragment into one of several basic subcompartments (e.g., axon, dendrite, or cell body), and to identify other structures of interest such as myelin and cilia. Because the automated reconstruction was not error-free, human experts proofread roughly one hundred cells to verify their reconstructions. Over time, we hope to expand the set of verified cells through additional manual effort and further advances in automation.
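The post does not describe the alignment algorithm, and real section alignment must cope with artefacts, missing sections, and nonlinear stretching. As a minimal illustration of the core idea only, the sketch below assumes two adjacent sections differ by a pure integer translation and recovers that offset by exhaustive search, scoring each candidate shift by the mean squared difference over the overlapping pixels. The helpers `shift_image` and `estimate_shift` are hypothetical names, not part of any H01 tooling.

```python
import random


def shift_image(img, dy, dx, fill=0):
    """Translate a 2D list image by (dy, dx); uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def estimate_shift(ref, moving, max_shift=4):
    """Return the integer (dy, dx) for which moving[y][x] best matches ref[y - dy][x - dx].

    Each candidate is scored by mean squared difference over the overlapping
    region only, so the padded border does not bias the result.
    """
    h, w = len(ref), len(ref[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    d = moving[y][x] - ref[y - dy][x - dx]
                    err += d * d
                    n += 1
            if n and err / n < best_err:
                best, best_err = (dy, dx), err / n
    return best


rng = random.Random(0)
ref = [[rng.randint(0, 255) for _ in range(16)] for _ in range(16)]
moving = shift_image(ref, 2, -3)  # simulate a section offset by 2 rows and -3 columns
print(estimate_shift(ref, moving))  # (2, -3)
```

Exhaustive search is only practical for small offsets; production pipelines typically use coarse-to-fine or learned methods and model non-rigid deformation, which this toy deliberately ignores.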
The imaging data, reconstruction results, and annotations are viewable through an interactive web-based 3D visualization interface called Neuroglancer, which was originally developed to visualize the fruit fly brain. Neuroglancer is available as open-source software and is widely used in the broader connectomics community. Several new features were introduced to support analysis of the H01 dataset, in particular the ability to search for specific neurons based on their type or other properties.
Analysis of the Human Cortex
In a companion preprint, we show how H01 has already been used to study several interesting aspects of the organization of the human cortex. In particular, new cell types have been discovered, as well as the presence of “outlier” axonal inputs, which establish powerful synaptic connections with target dendrites. While these findings are a promising start, the vastness of the H01 dataset will provide a basis for many years of further study by researchers interested in the human cortex.
In order to accelerate the analysis of H01, we also provide embeddings of the H01 data that were generated by a neural network trained using a variant of the SimCLR self-supervised learning technique. These embeddings provide highly informative representations of local parts of the dataset that can be used to rapidly annotate new structures and develop new ways of clustering and categorizing brain structures according to purely data-driven criteria. We trained these embeddings using Google Cloud TPU pods and then performed inference at roughly four billion data locations spread throughout the volume.
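The post says only that the embeddings came from a network trained with a variant of SimCLR; the variant, encoder, and augmentations are not described here. For orientation, the sketch below shows the NT-Xent contrastive loss from the original SimCLR formulation in plain Python: two augmented views of the same patch form a positive pair that is pulled together, while every other embedding in the batch acts as a negative. The function names and toy 2D embeddings are illustrative assumptions; a real pipeline would compute this on encoder outputs in a tensor library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(view_a, view_b, temperature=0.5):
    """Normalized temperature-scaled cross-entropy (NT-Xent) loss, as in SimCLR.

    view_a[i] and view_b[i] embed two augmented views of the same patch and
    form a positive pair; all other embeddings in the batch act as negatives.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        partner = (i + n) % (2 * n)  # index of i's positive partner
        # Temperature-scaled similarities to every other embedding (k != i).
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Cross-entropy term: negative log softmax probability of the positive.
        total += log_denominator - sims[partner]
    return total / (2 * n)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]     # each view close to its own anchor
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # views swapped between patches
print(nt_xent(anchors, matched), nt_xent(anchors, mismatched))
```

The matched batch yields a lower loss than the mismatched one, which is the signal that drives the encoder to map views of the same local structure to nearby embeddings.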
Managing Dataset Size with Improved Compression
H01 is a petabyte-scale dataset, but is only one-millionth the volume of an entire human brain. Serious technical challenges remain in scaling up synapse-level brain mapping to an entire mouse brain (500x bigger than H01), let alone an entire human brain. One of these challenges is data storage: a mouse brain could generate an exabyte worth of data, which is costly to store. To address this, we are today also releasing a paper, “Denoising-based Image Compression for Connectomics”, that details how a machine learning-based denoising strategy can be used to compress data, such as H01, at least 17-fold (dashed line in the figure below), with negligible loss of accuracy in the automated reconstruction.
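The scaling argument above can be made concrete with a back-of-the-envelope calculation. The sketch assumes raw storage grows linearly with tissue volume at H01's resolution and uses only figures given in this post: 1.4 PB for H01, a mouse brain about 500 times larger, a human brain about a million times larger, and a compression factor of at least 17. The constant and function names are illustrative.

```python
# Back-of-the-envelope storage estimates. Assumption: raw data size scales
# linearly with tissue volume at H01's imaging resolution.
H01_PB = 1.4              # H01 dataset size in petabytes
MOUSE_SCALE = 500         # mouse brain volume relative to H01
HUMAN_SCALE = 1_000_000   # human brain volume relative to H01
COMPRESSION = 17          # lower bound on the denoising-based compression factor


def estimate_pb(scale, compression=1):
    """Estimated storage (PB) for a volume `scale` times larger than H01."""
    return H01_PB * scale / compression


for name, scale in [("mouse", MOUSE_SCALE), ("human", HUMAN_SCALE)]:
    raw = estimate_pb(scale)
    packed = estimate_pb(scale, COMPRESSION)
    print(f"{name}: {raw:,.0f} PB raw, ~{packed:,.0f} PB at {COMPRESSION}x compression")
```

Under these assumptions a mouse brain comes to roughly 700 PB (about 0.7 EB) raw and about 41 PB after 17-fold compression, while a whole human brain is on the order of 1.4 million PB (1.4 ZB) raw.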
Random variations in the electron microscopy imaging process lead to image noise that is difficult to compress even in principle, as the noise lacks spatial correlations or other structure that could be described with fewer bytes. Therefore we acquired images of the same piece of tissue in both a “fast” acquisition regime (resulting in high amounts of noise) and a “slow” acquisition regime (resulting in low amounts of noise) and then trained a neural network to infer “slow” scans from “fast” scans. Standard image compression codecs were then able to (lossily) compress the “virtual” slow scans with fewer artifacts compared to the raw data. We believe this advance has the potential to significantly mitigate the costs associated with future large scale connectomics projects.
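The claim that noise is hard to compress can be checked on a toy example. The sketch below illustrates only that principle and is not the method from the paper: it builds a smooth synthetic image as a stand-in for a low-noise “slow” scan, adds independent per-pixel noise as a stand-in for a “fast” scan, and compares their sizes under lossless zlib (DEFLATE) compression rather than the lossy codecs the paper evaluates. `smooth_image`, `add_noise`, and `compressed_size` are hypothetical helpers.

```python
import random
import zlib


def smooth_image(h, w):
    """Stand-in for a low-noise "slow" scan: a smooth 8-bit intensity ramp."""
    return [[(x + y) % 256 for x in range(w)] for y in range(h)]


def add_noise(img, amplitude, seed=0):
    """Stand-in for a "fast" scan: the same image plus independent per-pixel noise."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, v + rng.randint(-amplitude, amplitude))) for v in row]
        for row in img
    ]


def compressed_size(img):
    """Size in bytes of the 8-bit image after lossless zlib (DEFLATE) compression."""
    raw = bytes(v for row in img for v in row)
    return len(zlib.compress(raw, 9))


clean = smooth_image(128, 128)
noisy = add_noise(clean, amplitude=16)
print("clean:", compressed_size(clean), "bytes")
print("noisy:", compressed_size(noisy), "bytes")
```

The noisy copy compresses much worse even though it carries no extra information about the underlying structure, because independent noise breaks the repeated patterns a compressor relies on. This is the motivation for removing noise before compression, which the paper does by predicting slow scans from fast ones with a trained neural network.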
But storage is not the only problem. The sheer size of future datasets will call for new strategies that let researchers organize and access the rich information inherent in connectomic data. These challenges will require new modes of interaction between humans and the brain mapping data to come.