Laboratory digitization platform

Context

It is common for a laboratory to house a wide variety of instruments covering the full complexity of the analyses to be carried out. The big brands (Agilent, Waters, Qiagen…) have developed very expensive programs to process the data generated by their instruments. In the best case, they also allow communication between instruments (as long as they are all of the same brand), so the results obtained on one can pre-fill part of the parameters another needs to start its analysis.

Generally, this type of software is very closed and its import / export possibilities are quite limited, so transferring data between different packages becomes a problem that creates more inconvenience than advantage. At best, the brands offered complementary applications that performed the transfer reliably, but these were only valid with their own products and quite expensive.

Goals

The project was intended to achieve the following objectives:

  • Transfer the results in a homogeneous way across different software packages (both in the laboratory and in other areas).
  • Avoid transcription errors.
  • Reduce time on repetitive tasks for laboratory staff.
  • Traceability of the entered results.
  • Proof of concept: if the expected results were obtained, the platform could serve as a starting point for other projects.

Requirements

Aside from meeting the goals, the solution had to do so appropriately:

  • It had to be able to ingest all the information generated.
  • It had to be flexible enough to work with the company’s different instruments (not exclusively the laboratory’s), including those for which no driver software was available.
  • It had to be easily scalable, both horizontally and vertically.
  • It had to guarantee maximum traceability and transparency of the data.
  • It had to be an open solution that did not imply being “tied” to a specific supplier.
  • Results had to be validated before the export file was generated; once entered in the ERP, the analysis had to be closed and validated.

Implementation

Finally, we chose to implement CDF (Cloudera DataFlow), based on the following main components:

  • Distributed file system (Apache Hadoop HDFS).
  • ETL (Apache NiFi).
  • Distributed data streaming / publish-subscribe platform (Apache Kafka).

On the ERP side, it was decided to receive the information through a web service requiring the minimum possible data, since it would be used to ingest data from different software packages, from different companies in the group, with varying mechanisms and degrees of integration with the ERP:

  • The parameters identifying “where” the result must be written (inspection plan identifier, operation code and sample identifier).
  • The result itself.
  • Metadata: user, instrument identifier and date / time.
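As a concrete illustration, a minimal request body for such a web service might look like the sketch below. All field names and values are assumptions for illustration; the actual SAP service contract is not described here.

```python
# Hypothetical payload for the ERP result-entry web service.
# Field names and values are illustrative, not the real SAP contract.
payload = {
    # "where" the result must be written:
    "inspection_plan_id": "QP-000123",
    "operation_code": "0010",
    "sample_id": "S-2019-0456",
    # the result itself:
    "result": "7.42",
    # metadata:
    "user": "lab_tech_01",
    "instrument_id": "HPLC-03",
    "timestamp": "2019-05-14T10:32:00Z",
}
```

Keeping the payload this small is what lets software with very different export capabilities feed the same service.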

Most of the laboratory software was configured to export a very simple report (a text file) with the most standard structure possible. The platform monitored specific folders on the server where the reports were written, looking for new files every few minutes. Depending on the location of these files, the platform expected a specific structure; if it matched, the data was extracted and the request to the ERP web service was generated.
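The folder-to-structure mapping can be sketched as follows. The folder name, column layout and tab-separated format are illustrative assumptions, not the real report specification (the production flow was built in NiFi, not Python).

```python
import csv
from pathlib import Path

# Each monitored folder maps to the column layout its reports must follow.
# Folder and column names are hypothetical examples.
EXPECTED_COLUMNS = {
    "hplc_reports": ["sample_id", "result", "instrument_id", "timestamp"],
}

def parse_report(path: Path, folder: str) -> list:
    """Return one record per data line, or raise if the layout differs."""
    expected = EXPECTED_COLUMNS[folder]
    with path.open(newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        if header != expected:
            raise ValueError(f"unexpected report layout: {header}")
        return [dict(zip(expected, row)) for row in reader]
```

Rejecting a file whose header does not match the expected layout is what turns a formatting mistake into an "operational" error that can be reported back to the laboratory.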

To ensure traceability, each line in the file was treated separately, so that if one line / result failed, the rest of the file would still be processed. For each line, the request made to SAP and the response obtained were saved on the distributed file system. Likewise, in case of error, an e-mail was sent to different distribution lists, depending on whether it was an “operational” error (poorly formatted file, missing data…) or a technical error (dropped services, communication errors, unexpected errors…). In the latter case, a flow was configured to retry automatically for a certain time. At the machine level, the entire platform was monitored with Nagios.
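A minimal sketch of this per-line handling, assuming an illustrative error taxonomy and stand-in `send_to_erp()` / `notify()` callables (the real flow was implemented in NiFi):

```python
class OperationalError(Exception):
    """Bad data in the line itself (malformed file, missing fields…)."""

class TechnicalError(Exception):
    """Infrastructure failure (service down, communication error…)."""

def process_file(lines, send_to_erp, notify, max_retries=3):
    """Process each line independently, classify failures, retry only
    technical ones. Callables and retry policy are illustrative."""
    for lineno, line in enumerate(lines, start=1):
        for attempt in range(1, max_retries + 1):
            try:
                send_to_erp(line)  # request + response also archived on HDFS
                break
            except OperationalError as exc:
                notify("lab-operations", f"line {lineno}: {exc}")
                break  # bad data: retrying cannot help
            except TechnicalError as exc:
                if attempt == max_retries:
                    notify("it-support", f"line {lineno}: {exc}")
                # otherwise retry the same line
```

The key property is that a failure on one line never blocks the remaining lines of the same report.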

My contribution

First, I was in charge of researching platforms that appeared to meet the established requirements and of writing a short study of the advantages, disadvantages and typical use cases of each. The recommended solution included a description of its possible components and a diagram of the interaction between them.

Once the recommended option was validated with the team, I was in charge of providing a list of possible providers together with my assessment (based on publicly available information) of each of them.

Later I wrote the RFI, the RFP and the detailed requirements (functional, architecture, security, training…).

Once the project started, my responsibility consisted of monitoring the implementation and acting as the liaison with the chosen provider.

Finally, as the project was wrapping up, I was in charge of writing the Project Administration SOP, designing the tests that would validate the implementation, and providing the evidence to the QA department.

Conclusions

Despite the difficulties discussed above, the project was ultimately a success. Not only was the transfer of results achieved in compliance with the established requirements, but the platform was also left deployed with its capabilities and operation validated, laying the foundations for future projects.

Possible improvements

As mentioned earlier, some parameters must be entered in the software that manages each instrument so that they later appear in the report the CDF platform reads. These parameters are subsequently sent to the ERP so that it knows where to place the data being provided, and they originate in the ERP itself.

To improve the user experience, a solution could be implemented to enter these parameters into the instrument software automatically. One possibility would be to include, on the report accompanying the sample to be tested, a 2D barcode (QR code) containing these parameters, so that it can be captured with a reader.
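A minimal sketch of what such a code's payload could look like, assuming the three “where” parameters are packed into one delimited string (the field order and separator are illustrative choices, not part of the project as built):

```python
# Hypothetical payload format for the 2D barcode: the three ERP "where"
# parameters joined by a separator, so a reader can capture them in one scan.
FIELDS = ["inspection_plan_id", "operation_code", "sample_id"]
SEP = "|"

def encode_params(params: dict) -> str:
    """Pack the parameters into the string a QR generator would encode."""
    return SEP.join(params[f] for f in FIELDS)

def decode_params(code: str) -> dict:
    """Recover the parameters on the instrument-software side."""
    values = code.split(SEP)
    if len(values) != len(FIELDS):
        raise ValueError("malformed code")
    return dict(zip(FIELDS, values))
```

Since the parameters originate in the ERP, the ERP could emit this string directly when printing the sample's accompanying report.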
