Managing Large Amounts of Mass Spectrometry Data Using Agilent OpenLAB ECM
Abstract
Over the past few decades, the amount of data generated in mass spectrometry laboratories has increased exponentially because newer instruments process many more samples in the same amount of time and governments have expanded their regulatory requirements for electronic data management in certain industry sectors. The complexity and cost associated with data storage have also increased, adding to the challenges that laboratory managers and researchers face as they try to manage their data. OpenLAB ECM offers an effective solution that simplifies data management, saves time, and reduces costs while addressing data archival, data sharing, application integration, and regulatory requirements in the laboratory and around the globe. This technical overview describes key aspects of the OpenLAB ECM solution and provides a range of system recommendations that suit the varying data requirements of today’s mass spectrometry laboratories.
Introduction
Over the past few decades, the amount of data generated each day in mass spectrometry laboratories has continually increased. Researchers and laboratory managers are constantly faced with the challenge of collecting and storing very large amounts of data on a daily basis as the result of an increase in samples processed each day and stringent regulatory requirements.
Increasing sample numbers
Laboratories are constantly looking for instruments and systems that process more samples in less time to improve efficiency and productivity in discovery, development, and manufacturing. As a result, instrument manufacturers are developing high-speed analytical instruments and data systems which generate reliable data within shorter periods of time. These improvements, while increasing productivity, have also increased the amount of data generated each day; for example, today’s mass spectrometers easily generate several gigabytes (GB) of data every day.
Regulatory compliance
In addition, some industries are required to adhere to regulations in their laboratories; for example, US FDA 21 CFR Part 11, EU Annex 11, and SFDA.
These regulations define rules for how electronic records must be handled in the laboratory. They require that data integrity is maintained and that data is kept secure and traceable for much longer periods of time. They also require that additional data be generated and maintained in the form of electronic signatures, audit trails, and instrument qualification documentation. These requirements have added to the overall need to manage large amounts of data in the laboratory.
Increasing data storage costs
Data generation is also increasing with advances in technology. A single-quadrupole (SQ) mass spectrometer in a high-throughput laboratory can generate approximately 250 files of 5 MB each per day (1.25 GB/day); a high-end triple-quadrupole (QQQ) LC/MS system operating at high throughput generates about 1000 files of 1 MB each per day (1 GB/day); and a quadrupole time-of-flight (Q-TOF) LC/MS system generates single files up to 20 MB in size (20 GB/day). This means that the data generated by running one Q-TOF at high throughput for one day would fill four DVDs.
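The daily volumes quoted above follow from simple arithmetic, sketched below. Decimal units (1 GB = 1000 MB) and a nominal single-layer DVD capacity of 4.7 GB are assumptions added for illustration; the Q-TOF figure is taken directly as the quoted daily total because the text gives no file count for it.

```python
# Daily data volume per instrument, using the figures quoted in the text.
# Assumptions: decimal units (1 GB = 1000 MB) and a nominal 4.7 GB
# single-layer DVD; neither is stated in the source.
MB_PER_GB = 1000
DVD_CAPACITY_GB = 4.7


def daily_volume_gb(files_per_day: int, file_size_mb: float) -> float:
    """Return the data volume in GB produced by one instrument per day."""
    return files_per_day * file_size_mb / MB_PER_GB


sq_gb = daily_volume_gb(250, 5)    # single quadrupole: 250 files of 5 MB
qqq_gb = daily_volume_gb(1000, 1)  # triple quadrupole: 1000 files of 1 MB
qtof_gb = 20.0                     # Q-TOF: daily total as quoted in the text

# Number of DVDs needed to hold one day of Q-TOF data.
dvds_for_qtof = qtof_gb / DVD_CAPACITY_GB

print(f"SQ: {sq_gb} GB/day, QQQ: {qqq_gb} GB/day, Q-TOF: {qtof_gb} GB/day")
print(f"One day of Q-TOF data fills about {dvds_for_qtof:.1f} DVDs")
```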
Such massive quantities of data require powerful workstations for processing and faster, more expensive disk drives for storage. A storage configuration with a 500 GB or 1 TB RAID array of SCSI disks on such a workstation allows laboratories to keep only a few months’ worth of data. That is not nearly enough to comply with industry and government regulations. As a result, laboratories are investing in more efficient networking infrastructure and offline storage mechanisms that allow analysts to free up workstation space by moving data over the network to offline storage.
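To see why local arrays fill up quickly, the sketch below estimates how many days of data fit on a workstation array. The instrument mix (one SQ, one QQQ, and one Q-TOF feeding the same array) is a hypothetical scenario, not a configuration described in this overview, and 1 TB is taken as 1000 GB. Actual retention depends on the instruments and throughput of each laboratory.

```python
# Daily rates in GB/day as quoted in the text for each instrument type.
DAILY_GB = {"SQ": 1.25, "QQQ": 1.0, "Q-TOF": 20.0}


def retention_days(capacity_gb: float, daily_gb: float) -> float:
    """Return how many days of data fit in the given capacity."""
    return capacity_gb / daily_gb


# Hypothetical lab: one instrument of each type writing to one array.
lab_daily_gb = sum(DAILY_GB.values())

for capacity_gb in (500, 1000):  # 500 GB and 1 TB (taken as 1000 GB)
    days = retention_days(capacity_gb, lab_daily_gb)
    print(f"{capacity_gb} GB array: about {days:.0f} days of data")
```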
The Solution: Agilent OpenLAB ECM
The OpenLAB solution
OpenLAB ECM is part of the Agilent OpenLAB Suite, a well-integrated solution for a multitude of data management needs in today’s scientific laboratories.
The OpenLAB Laboratory Software Suite includes the following components:
• OpenLAB ECM, described here.
• The Business Process Management (BPM) add-on for OpenLAB ECM, which brings powerful process automation capabilities to improve the efficiency and productivity of laboratory operations and workflows.
• The Intelligent Reporter add-on for OpenLAB ECM, a cross-sequence, cross-technique, multivendor reporting system that brings you state-of-the-art capabilities for reporting scientific data.
• OpenLAB ELN, the well-integrated and highly adaptable Electronic Lab Notebook (ELN) that helps document and organize your experiments, while providing IP protection.
OpenLAB ECM
OpenLAB Enterprise Content Manager (ECM) is a software solution that helps you make better decisions faster than ever. By providing a secure, central repository and rich content services, OpenLAB ECM allows you to create, manage, collaborate, archive, and re-use all of your business-critical information with ease.
OpenLAB ECM manages raw data and human-readable documents of any type, from any supplier, and its simple web-based interface drastically reduces the learning curve for new users. It is a highly scalable system for the scientific world which can start as a data management solution for a single workgroup, and easily scale into a multisite, multicontinent solution for the entire enterprise.
OpenLAB ECM comes with out-of-the-box compatibility with leading storage solutions from vendors such as NetApp, EMC, IBM, and HP that are based on Windows shares (CIFS protocol). In addition, OpenLAB ECM has a published API that can be used to interface with other existing systems in the laboratory, such as a laboratory information management system (LIMS) or an enterprise resource planning (ERP) system.
Architecture
OpenLAB ECM is built on very simple architectural principles. Files are stored on one or more external NAS devices and OpenLAB ECM indexes, organizes, and keeps track of the files. In addition, the system makes use of a database for information such as folder structures, links to files in storage locations, security configurations, metadata, and indexes. ECM has three main components as shown in Figure 1:
• The ECM Web Application, which provides a user-interaction interface that displays a visualization of folders and files, and allows users to access the system using Internet Explorer.
• The File Transfer Service, which uploads files into the system and transfers them to an appropriate storage location. It also handles retrieving files from OpenLAB ECM and moving files between storage locations.
• The ECM Application Server, which is a file filtering service that scans through the uploaded file and filters key pieces of metadata. It extracts metadata from files based on their data type using multiple filters. OpenLAB ECM comes with filters for most popular data systems and exposes an SDK for extending its reach to other formats.
Figure 1. OpenLAB ECM high-level architecture.
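The division of labor among storage, database, and metadata filters can be illustrated with a small sketch. This is a toy model under stated assumptions, not the OpenLAB ECM implementation or its SDK: the `Repository` class, the `csv_header_filter` function, and the file path are all hypothetical, and in-memory dictionaries stand in for the NAS storage and the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# All names here are hypothetical; none of them belong to the OpenLAB ECM API.
Filter = Callable[[bytes], Dict[str, str]]


@dataclass
class Repository:
    storage: Dict[str, bytes] = field(default_factory=dict)   # stands in for NAS storage
    database: Dict[str, Dict[str, str]] = field(default_factory=dict)  # path -> metadata
    filters: Dict[str, Filter] = field(default_factory=dict)  # file extension -> filter

    def upload(self, path: str, content: bytes) -> None:
        """Store the file, then record its extracted metadata in the database."""
        self.storage[path] = content
        self.database[path] = self.extract_metadata(path, content)

    def extract_metadata(self, path: str, content: bytes) -> Dict[str, str]:
        """Pick a filter by file extension and merge its output with generic metadata."""
        extension = path.rsplit(".", 1)[-1].lower() if "." in path else ""
        metadata = {"size_bytes": str(len(content))}
        file_filter = self.filters.get(extension)
        if file_filter is not None:
            metadata.update(file_filter(content))
        return metadata


def csv_header_filter(content: bytes) -> Dict[str, str]:
    """Toy filter: report the header row of a CSV file as metadata."""
    lines = content.decode("utf-8").splitlines()
    return {"columns": lines[0] if lines else ""}


repo = Repository(filters={"csv": csv_header_filter})
content = b"sample_id,compound\nS1,caffeine\n"
repo.upload("studies/run1/results.csv", content)
print(repo.database["studies/run1/results.csv"])
```

Keeping filters in a lookup keyed by data type mirrors the idea that support for a new format is added by registering another filter rather than changing the upload path.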
Scalability and Availability
The OpenLAB ECM architecture can scale all the way from an all-in-one OpenLAB ECM Workgroup server to a multiserver OpenLAB ECM Enterprise system. In an OpenLAB ECM Workgroup edition installation (Figure 2), all components are deployed on a single server.
Figure 2. All-in-one solution for the small laboratory.
For the enterprise-level solution (Figure 3), ECM has the added ability to span components across multiple servers, increasing performance while providing redundancy at the same time. OpenLAB ECM Enterprise is capable of handling several million files.
Figure 3. Scalable architecture with components spanning multiple servers.
This architecture removes single points of failure in the system while bringing maximum availability. Figure 3 shows a single database; however, Oracle and SQL Server support clustering and replication mechanisms that provide maximum availability for the database component.
Support for Distributed Systems
Another key capability in the OpenLAB ECM Enterprise Edition is its ability to optimally serve geographically distributed users (Figure 4). OpenLAB ECM can be used to create separate accounts for each location, each with local storage, web servers, file transfer servers, and application servers. Actual files are moved over the WAN only when a download or upload attempt is made across accounts; local account operations do not require the files to be moved over the WAN.
Additionally, the file transfer server’s caching capabilities reduce network traffic even further. In a distributed installation, users who log on to different accounts through the same web server are still able to perform cross-account searches and retrieve files from other accounts.
Figure 4. Configuration for geographically-distributed users.
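The transfer rule described above (local operations stay local, cross-account requests cross the WAN, and caching avoids repeated transfers) can be sketched as follows. The `FileTransferServer` class, the site names, and the file identifiers are hypothetical and chosen only for illustration; the actual caching policy of the product is not described in this overview.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class FileTransferServer:
    """Hypothetical model of one site's file transfer server with a cache."""
    site: str
    cache: Set[Tuple[str, str]] = field(default_factory=set)
    wan_transfers: int = 0

    def download(self, file_site: str, file_id: str) -> str:
        """Return where the file was served from: 'local', 'cache', or 'wan'."""
        if file_site == self.site:
            return "local"        # same account and site: no WAN traffic
        key = (file_site, file_id)
        if key in self.cache:
            return "cache"        # fetched earlier: served from the local cache
        self.wan_transfers += 1   # first cross-account request crosses the WAN
        self.cache.add(key)
        return "wan"


server = FileTransferServer(site="Boston")
outcomes = [
    server.download("Boston", "study-001"),
    server.download("Tokyo", "study-002"),
    server.download("Tokyo", "study-002"),
]
print(outcomes, server.wan_transfers)
```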
Archiving Data from Agilent MassHunter to ECM
Interactive archival
MassHunter studies and batches can be archived into OpenLAB ECM directly from MassHunter Quant. Scientists can archive completed LC/MS studies and GC/MS batches using an action menu (Figure 5).
Figure 5. Archive into ECM with an action menu in MassHunter Quant.
This menu makes use of the ECM Send To tool by invoking it and passing the necessary parameters to it (Figure 6).
Figure 6. The OpenLAB ECM Send To tool.
In ECM Send To, Quant users select the desired destination location, choose the MassHunter Study Profile, and click the Upload button. The Send To tool then processes the study/batch folder and presents a preview of what it will do next.
The preview dialog (Figure 7) shows which folders will be created and what files will be uploaded. Subfolders in the study get uploaded as SSIZIP files and files in the study folder get uploaded as is. If you do not want to upload a certain part of the study, you have the choice to deselect it from the preview dialog.
Figure 7. The OpenLAB ECM Send To tool preview dialog.
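The preview logic (subfolders packaged, top-level files uploaded as is, deselected items skipped) can be sketched as a function that builds an upload plan. This is an illustrative sketch only: the function name, the `.ssizip` naming of packaged folders, and the sample file names are assumptions, and no SSIZIP archive is actually created here.

```python
import tempfile
from pathlib import Path
from typing import List, Set, Tuple


def build_upload_plan(study_dir: Path, excluded: Set[str] = frozenset()) -> List[Tuple[str, str]]:
    """List what would be uploaded from a study folder, without uploading anything."""
    plan = []
    for entry in sorted(study_dir.iterdir(), key=lambda p: p.name):
        if entry.name in excluded:
            continue  # item deselected in the preview
        if entry.is_dir():
            # Subfolders are packaged; the ".ssizip" suffix is an assumed naming.
            plan.append((entry.name + ".ssizip", "packaged subfolder"))
        elif entry.is_file():
            plan.append((entry.name, "file uploaded as is"))
    return plan


# Build a throwaway study folder with hypothetical contents and preview it.
with tempfile.TemporaryDirectory() as tmp:
    study = Path(tmp)
    (study / "sample01.d").mkdir()
    (study / "sample02.d").mkdir()
    (study / "batch.xml").write_text("<batch/>")
    plan = build_upload_plan(study, excluded={"sample02.d"})

for name, action in plan:
    print(f"{name}: {action}")
```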
OpenLAB ECM Send To is a general ECM utility that is available on the ECM downloads page. It can be installed on any ECM client machine. This utility integrates into the Microsoft Windows Explorer Send To menu and provides a simple way to send any folder or file right from Windows Explorer. The Send To function provides several prebuilt profiles or configurations specific to various use cases and data systems. Users can create custom profiles as well. The Send To tool has two profiles specific to MassHunter: "04. Upload a MassHunter Study or Batch – Standard Profile" and "05. Upload a MassHunter Data Folder – Standard Profile". The former is used from MassHunter Quant to archive studies and batches into ECM. The latter can be used to archive individual data folders into ECM.
The ECM Send To tool can also be used to archive studies and batches directly from Windows Explorer (Figure 8). With this approach, you do not need to open the studies in MassHunter, and you can select multiple studies at the same time and upload them into ECM.
Figure 8. The OpenLAB ECM Send To capability in Windows Explorer.
Automated archival
Users who wish to avoid any user interaction and completely automate the archiving of data from MassHunter workstations into ECM can make use of ECM Scheduler (Figure 9). ECM Scheduler is also an ECM add-in that can be installed from the ECM downloads page and configured on MassHunter workstations.
The scheduler service on the MassHunter workstation needs to be configured to look into the MassHunter data folder. Once configured, the scheduler monitors the MassHunter folder and will upload any new files that are saved there.
Figure 9. Scheduler options for uploading MassHunter data.
Figure 10. Mapping MassHunter folders to ECM using Winmapper.
The general add-in should be used for uploading MassHunter data. Users can use the add-in to specify the path as well as the file/folder specifications (Figure 10), and a schedule when the folders should be scanned. A typical schedule for sending data to ECM is shown in Figure 11.
Figure 11. Typical schedule for sending data to ECM.
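The scheduler's behavior (scan a configured path for files matching a specification and pick up only what is new since the last scan) can be sketched as below. The function, the `*.bin` pattern, and the folder and file names are hypothetical; the real add-in is configured through its own dialogs, and this sketch only reports new files rather than uploading them.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import List, Set


def scan_for_new_files(root: Path, pattern: str, seen: Set[str]) -> List[Path]:
    """Return files under root matching pattern that earlier scans have not seen."""
    new_files = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or not fnmatch.fnmatch(path.name, pattern):
            continue
        key = str(path)
        if key not in seen:
            seen.add(key)          # remember the file so the next scan skips it
            new_files.append(path)
    return new_files


# A scheduler would call scan_for_new_files at each scheduled time; three
# consecutive scans are simulated here against a temporary data folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    seen: Set[str] = set()
    (root / "sample01.d").mkdir()
    (root / "sample01.d" / "MSScan.bin").write_bytes(b"\x00")
    first = scan_for_new_files(root, "*.bin", seen)
    second = scan_for_new_files(root, "*.bin", seen)
    (root / "sample02.d").mkdir()
    (root / "sample02.d" / "MSScan.bin").write_bytes(b"\x00")
    third = scan_for_new_files(root, "*.bin", seen)

print(len(first), len(second), len(third))
```

Tracking already-seen paths keeps repeated scans from sending the same file twice; a production tool would also need to handle files that are still being written.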
In an alternate configuration, the scheduler can be installed on a remote machine (Figure 12). In this mode, the scheduler polls the MassHunter workstation for new data. ECM software does not need to be installed on the MassHunter workstation; only the root data folder needs to be shared so that the scheduler can access the data. Note that this configuration has slightly lower performance, since there is an extra hop on the network for the MassHunter files before they reach ECM.
Figure 12. ECM Scheduler polling multiple MassHunter workstations.
Indexing and Searching MassHunter Data
A file that is stored on a local or a network drive is typically found by browsing through folders or by searching for a file name or modified date. However, when you need to look for a study or a batch and you do not know the file names or other general information about the files, finding them can take much longer. OpenLAB ECM looks into the MassHunter files, extracts key scientific metadata, and then tags and indexes this information for searching (Figure 13).
As a result, OpenLAB ECM users can find their data quickly and easily using sample-specific information contained in the MassHunter files, such as sample ID, operator name, or compound. The MassHunter-specific keys that ECM is capable of extracting are shown in Table 1.
Figure 13. The application server extracts metadata as files are stored into ECM.
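Searching by extracted metadata rather than by file name can be illustrated with a small inverted index. This is a minimal sketch, not the ECM search engine: the `MetadataIndex` class is hypothetical, and the key names (`sample_id`, `operator`, `compound`), values, and paths are illustrative stand-ins rather than the actual MassHunter filter keys.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class MetadataIndex:
    """Hypothetical inverted index mapping (key, value) pairs to file paths."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._files: Set[str] = set()

    def add(self, path: str, metadata: Dict[str, str]) -> None:
        """Index one file under each of its metadata key/value pairs."""
        self._files.add(path)
        for key, value in metadata.items():
            self._index[(key.lower(), value.lower())].add(path)

    def search(self, **criteria: str) -> Set[str]:
        """Return the files that match every given key/value criterion."""
        results = set(self._files)
        for key, value in criteria.items():
            results &= self._index.get((key.lower(), value.lower()), set())
        return results


index = MetadataIndex()
index.add("study1/s1.d", {"sample_id": "S1", "operator": "jsmith", "compound": "caffeine"})
index.add("study1/s2.d", {"sample_id": "S2", "operator": "jsmith", "compound": "atrazine"})
index.add("study2/s3.d", {"sample_id": "S3", "operator": "mlee", "compound": "caffeine"})

print(index.search(compound="caffeine", operator="jsmith"))
```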
Table 1. MassHunter filter keys.
OpenLAB ECM has an easy-to-use query tool that can be used to perform simple searches on one or more general terms and conditions (Figure 14). This query tool also allows users to perform complex queries based on specific keys (Figure 15).
Figure 14. Quick search using a compound name and operator name.