Data Analytics and Collaborations

The Digital Research Platform (DRP) was designed by the Research Informatics Group to make research more agile, efficient, collaborative, and secure, while helping us provide higher-quality data to researchers. The result: it is now easier to make scientific discoveries.

The DRP is a virtual research platform that centralizes enterprise research data in an environment designed to bring analytics to the data, reducing redundancy across enterprise data warehouses. It also lets us provide industry-standard data science tools to all our researchers, scale resources when needed, and apply security and governance consistently. The DRP is built on a hybrid infrastructure that relies heavily on Microsoft Azure and Databricks, but it includes other components as well (and in the future will incorporate an electronic lab notebook).

There are many ways the DRP can benefit your research. The following are a few examples:

Big data: Most data in the DRP are managed and analyzed with Spark, a distributed computing engine that handles very large datasets efficiently. If you have datasets too large for your own computer to load, they can be brought into the DRP for analysis.

Machine learning and AI: The DRP uses Databricks, a data science platform that lets you use tools such as PyTorch, TensorFlow, and MLflow without struggling to install them properly. They just work.

Scalable compute: Using the DRP, we can provide the computational power you need for a study, without purchasing it in advance or needing to maintain it permanently. Compute is provided on demand. We can provide a letter of support for your grant describing the infrastructure.

Privacy: Sensitive data can be stored in the DRP and easily managed in a HIPAA-compliant way. The DRP offers more granular control than folder-based permissions, because access can be restricted down to individual columns in the data.

Consistency: Instead of sending datasets to an analyst, we can grant access to your dataset in an environment that supports analysis. Not only does this help protect privacy, it also ensures that everyone is working from the same version of the data.

Availability: Your data and analyses are available wherever you can log in, so you don't have to worry if your computer goes down or you unexpectedly need to work from another machine.
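To make the Privacy point concrete: with folder permissions, a user can either open a whole file or not, whereas column-level control lets different people see different parts of the same table. The Python sketch below illustrates that idea by filtering rows by role so that each role sees only the columns it has been granted. The roles, column names, sample rows, and the `visible_rows` function are all hypothetical; this is a conceptual sketch, not the DRP's actual access-control mechanism.

```python
# Hypothetical sample rows; names and values are placeholders, not real data.
ROWS = [
    {"patient_id": "P001", "name": "Jane Doe", "age": 54, "diagnosis": "E11.9"},
    {"patient_id": "P002", "name": "John Roe", "age": 61, "diagnosis": "I10"},
]

# Hypothetical column-level policy: each role is granted a set of column names.
COLUMN_GRANTS = {
    "study_pi": {"patient_id", "name", "age", "diagnosis"},
    "analyst": {"age", "diagnosis"},
}


def visible_rows(role, rows):
    """Return copies of the rows containing only the columns granted to `role`.

    A role with no grants gets empty rows rather than an error.
    """
    allowed = COLUMN_GRANTS.get(role, set())
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


if __name__ == "__main__":
    # The analyst sees age and diagnosis, but no identifying columns.
    print(visible_rows("analyst", ROWS))
    # → [{'age': 54, 'diagnosis': 'E11.9'}, {'age': 61, 'diagnosis': 'I10'}]
```

Both roles work from the same underlying table, which is also why column-level access supports the Consistency point: no one needs a separately exported, trimmed-down copy.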
