
The age of big data

Managing big data is becoming increasingly important in the life science industry.

The field of life science is changing rapidly. Genomics and next-generation sequencing (NGS) technologies, for example, have revolutionized research. Genomes are sequenced in a jiffy, sequencing costs are steadily plummeting, and the volume of data generated is growing ever faster. Tailored treatments and personalized medicine are on the horizon, and the amount of data received from each experiment or diagnostic evaluation is rapidly increasing. Scientific institutes, healthcare systems, hospitals and companies all harvest a tremendous amount of data today. But do we have the tools to handle, store, share and analyze this large volume of sometimes sensitive information?

Combining ICT with the biosector
The Finnish Funding Agency for Technology and Innovation (Tekes) has started a new program, “BioIT – Solutions for biological information”, which it hopes will better arm the life science sector for these challenges. The aim is to help small and medium-sized biosector enterprises develop business activities and to bring together players in the biosector and the information and communication technology (ICT) sector.
“By bringing together experts in different areas of life science with IT experts we hope to create new value networks and hence, new enterprises that can grow and have international success,” says Teppo Tuomikoski, manager of the new program.

The program will focus on several areas of the biosector, including pharmaceutical development, genomics, nutrition science and environmental measurements.
“Bioinformatics management and handling information is hugely increasing. It is a challenge to handle, analyze and interpret all of this data,” says Teppo Tuomikoski. “We need both technical solutions and infrastructure solutions.”

The global market for bioinformatics is expected to reach more than USD 6 billion next year (Tekes) and the demand for new services and tools is high.
“In Finland I would say the capacity of people with skills in the ICT fields is good. We have strong expertise in IT and mobile solutions from Nokia and many are shifting their skills to other industries,” says Teppo Tuomikoski. “But we need to create a common understanding. IT experts need to understand what biology is all about and life science experts need to understand ICT.”

Personalized health
Teppo Tuomikoski mentions personalized health as an area where the need for solutions to handle data is great. “There is a trend towards tailored solutions for individual needs and this requires in-depth knowledge of the biological origins of illnesses. There is a need for high data-processing capacity,” he says.

At the Norwegian Cancer Genomics Consortium, work has started to incorporate genome sequencing into the national healthcare system. The country aims to sequence the DNA of each cancer patient’s tumor to provide personalized treatments. The researchers are focusing on the somatic DNA, i.e. concentrating on the cancer, and using the person’s genomic DNA sequence to help identify mutations in the cancer, since each individual carries many normal DNA variants that are not yet mapped. They are now in the process of sequencing all of the human genome’s protein-coding genes, the exome. The researchers will then get a better picture of all the changes in the DNA, and so be better able to identify the signaling systems in the cancer cells that can be attacked with treatment.

“As each cancer has its own set of mutations, this appears to be an important aspect of the personalized treatment concept that is now starting to have its impact on cancer management,” says Eivind Hovig, professor at the Department of Tumor biology at the Norwegian Radium Hospital, and one of the project’s key investigators.

He and his colleagues are primarily performing Illumina sequencing, based on the HiSeq. They are essentially sequencing sets of well-characterized pairs of samples, i.e. matched normal DNA and tumor DNA from the same individual, and in some cases more than one tumor sample from the same patient, in order to identify the mutations that may be present in the tumor.
“In fact, there may be many mutations in each tumor, in some cases these may be present in almost all the tumor cells, and in some cases only in a subset of the cells. The occurrence of mutations is a dynamic process, and thus it is of interest to observe also when monitoring treatment response,” explains Eivind Hovig.
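
The comparison of matched samples can be illustrated with a minimal sketch. It assumes that variants have already been called for both samples and are stored as simple (chromosome, position, reference, alternate) records. The names `Variant` and `somatic_candidates` and the toy data are hypothetical, chosen only for illustration. This is not the consortium's actual pipeline, which also has to account for sequencing errors, coverage and mutations present in only a subset of the tumor cells.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """A called variant; the field layout is an illustrative assumption."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumor: List[Variant], normal: List[Variant]) -> List[Variant]:
    """Return variants seen in the tumor sample but not in the matched normal.

    Variants shared with the normal sample are treated as inherited
    (germline) and filtered out; what remains are candidate tumor mutations.
    """
    normal_set = set(normal)
    return [v for v in tumor if v not in normal_set]


# Hypothetical toy data: two inherited variants plus one tumor-only variant.
normal = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
]
tumor = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
    Variant("chr7", 2000, "G", "A"),
]

candidates = somatic_candidates(tumor, normal)
print(candidates)
```

Subtracting the patient's own normal variants, rather than a generic reference list, matters because each individual carries many normal variants that are not yet mapped.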

There are a number of IT challenges with the project. First of all, Eivind Hovig mentions the output of sequencing, which necessitates storage solutions approaching petabyte scales. Another challenge is the raw data, which needs a data analysis stream to identify all the types of variation possible, in order to compare normal and tumor DNA and to provide quality information.
“This requires significant processing power to go with the data,” says Eivind Hovig. “Thirdly, as all of this data is sensitive in nature, the systems need to be properly secured. This is achieved through integration and collaborating with the national academic high-performance computing solutions, as well as the scientific computing group at the University of Oslo Center for IT. We are also working closely with data security officers at our institutions in order to ensure that the solutions we devise are sustainable. Data sharing is another challenging concept in light of the security restrictions that we are working to alleviate.”

Eivind Hovig and his colleagues are helping to develop the infrastructure needed to make cancer sequencing a clinical reality for the benefit of cancer patients.
“The main advantage we have in Norway is that the project represents the major academic institutions nationwide and thus may be able to enter into a fruitful dialog with the clinical systems, and serve as facilitators for the transition of treatment that is a likely result of the sequencing possibilities that are opening up now,” says Eivind Hovig.

Biobanks – the biological back-end of medicine
One area with perhaps the greatest data handling challenges is biobank management. With today’s fast sequencing methods and the large amounts of data generated, biobanks are goldmines for researchers. They have, for example, given us better cancer treatments, diagnostic markers and the HPV vaccine. However, the storage requirements for genomic information are huge, and analysis requires both massively parallel computing infrastructure and data-intensive computing tools and services. Researchers need help to locate the most useful samples, receive data and see the full potential; otherwise the biobanks will be underused.

At the Biobanking and Molecular Resource Infrastructure of Sweden (BBMRI), the largest investment that the Swedish Research Council has made so far within the area of medical infrastructure, the work of building a common national infrastructure for the storage and analysis of biobank samples has begun. Hosted by Karolinska Institutet, BBMRI collaborates with all medical faculties in Sweden.
“For the future handling of biobank data we need large national projects with a common structure. I think that foremost we need to have the will to do it together, to create quality registers and biobanks,” says Jan-Eric Litton, professor at Karolinska Institutet and Executive Manager at BBMRI.

Infrastructure and legal work takes time, but Jan-Eric Litton says he can already see biobanks becoming easier to use in Sweden. At the beginning of 2013 all seven medical faculties/universities in Sweden signed a consortium agreement for a Biobank and Molecular Resource Infrastructure. “For the first time we have joined forces for a common infrastructure,” says Jan-Eric Litton. BBMRI has, for example, established an Information Standard for Biobank Samples for blood and blood derivatives, with the purpose of supporting the standardization of sample management processes for biobanking, utilizing the healthcare infrastructure. Another solution is MolMeth, an online service for sharing protocols, which aims to promote harmonization by allowing researchers to share high-quality practices and form communities that enhance collaboration between biobanks and end users. The service is free, and authors are credited.

Jan-Eric Litton also mentions a project called BiobankCloud for scalable, secure storage of biobank data. The EU-funded project, coordinated together with The Royal Institute of Technology in Sweden, started in November last year and will continue for three years. A cloud computing platform-as-a-service (PaaS) for biobanking that supports highly-available, secure storage and analysis of petabytes of sensitive data will be developed. The PaaS framework will be designed to run primarily on private cloud platforms. The stack will provide biobanks with platform services for the storage and analysis of sequence data, as well as the interconnection of biobanks for data sharing.

A common biobank structure in the Nordic region
Work on a common biobank infrastructure for all Nordic countries is also underway.
“All Nordic countries have a long history of large-scale biobank-based research and have similar systems with population based health data registries, which can all be linked using unique personal identification numbers,” says Jan-Eric Litton.

The network BBMRI Nordic has, for example, established a joint biobank-based study: a pilot project on colon cancer. The project is halfway through and aims to prove that it is possible on a Nordic level to assemble a similar, jointly validated study base with samples and data that could be used for collaborative Nordic research. The project will result in joint Nordic sample handling reports and quality control and assurance protocols.

Initiatives, collaborations, projects and infrastructures like these, together with information and computing skills and the will to work together, are essential for successful life science research and future healthcare.