Organizing a scholarly database

Context

I work at a medical research organization, and we've got a data lake's worth of unlabeled scholarly articles, old and new.


How can I quickly organize these documents, so that I can start getting value out of all this information?

  1. Ingest

    Most of my documents are long PDFs, which I can upload to Pienso Ingest as a ZIP archive. If I want to break the longer documents into smaller 'chunks,' I can do that systematically with Ingest's 'Split' function.

  2. Fingerprinting

    I want to create an ontology that encompasses all the topics in my database, so I'm going to use my ingested data to train a Pienso Fingerprint model.

  3. Annotate

    Once I've finalized my Fingerprint model, I can apply it to a new, unlabeled data set using Pienso Annotate.

  4. Deep Learning

    Deep Learning models are well suited to hard-labeling new documents, so I'll spin up a Deep Learning model from the labeled training data I just created with Annotate.

  5. Deploy

    Next, I'll create a deployment that includes my Fingerprint model, my Deep Learning model, and a Named Entity Recognition model provided by Pienso. Now, each new document that hits this deployment will be scored by all three models at once.

  6. Dashboard

    Finally, I'll use the Pienso Dashboard associated with this deployment to watch in real time as new documents get scored by my models.

    From here, I can assess: Which topics are trending up or down? How do these trends compare to historic baselines?

  7. Outcomes

    As articles are hard-labeled by my deployed Deep Learning model, they can be fed back into our external data store via Pienso's RESTful API. Now I've got a database filled with tagged, easily searchable articles.
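
The trend questions raised in the Dashboard step (which topics are moving up or down relative to a historic baseline) can be made concrete with a small calculation. The sketch below is not Pienso code and does not call any Pienso API; it assumes you already hold one hard label per document for a baseline period and for a recent period. The function names and the topic labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def topic_shares(labels):
    """Return each topic's share of the labeled documents (0.0 to 1.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def trend_vs_baseline(baseline_labels, recent_labels):
    """Return the change in share per topic: positive means trending up."""
    base = topic_shares(baseline_labels)
    recent = topic_shares(recent_labels)
    topics = sorted(set(base) | set(recent))
    return {t: recent.get(t, 0.0) - base.get(t, 0.0) for t in topics}


# Hypothetical labels; in practice these would come from the scored documents.
baseline = ["oncology", "oncology", "cardiology", "genomics"]
recent = ["genomics", "genomics", "oncology", "genomics"]
changes = trend_vs_baseline(baseline, recent)
for topic, delta in changes.items():
    print(f"{topic}: {delta:+.2f}")
```

Comparing shares rather than raw counts keeps the result meaningful when the two periods contain different numbers of documents.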

01 Ingest

Pienso Ingest lets you prepare your raw text for use as training data, whether it's structured or unstructured.
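
One common preparation step, described in the walkthrough above as Ingest's 'Split' function, is breaking long documents into smaller chunks. How Pienso implements this is not stated here, so the following is only a minimal sketch of the general idea, assuming plain text that has already been extracted from the PDF. The function name `split_into_chunks` and its `max_words` and `overlap` parameters are hypothetical.

```python
def split_into_chunks(text, max_words=200, overlap=0):
    """Split text into consecutive chunks of at most max_words words.

    overlap is the number of words repeated between neighboring chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be at least 0 and less than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks


# Illustrative input: a ten-word "document" split into chunks of four words.
sample = " ".join(f"w{i}" for i in range(10))
print(split_into_chunks(sample, max_words=4))
```

Splitting on whitespace ignores sentence and section boundaries; a real pipeline might split on paragraphs or headings instead, at the cost of uneven chunk sizes.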

02 Explore

Pienso Explore lets you search a data set for documents that match your interest.

03 Annotate

Pienso Annotate empowers you to convert your data sets from unstructured to structured — without manually labeling a single document.

04 Fingerprinting

Pienso Fingerprinting is an interactive semi-supervised approach to training generative models.

05 Deep Learning

Pienso Deep Learning uses labeled training data to create a powerfully accurate topic model.

06 Analysis

Pienso Analysis lets you use a finalized Fingerprint or Deep Learning model to investigate a data set through an array of granular views.

07 Dashboard

Pienso Dashboard lets you monitor the results of your Fingerprint or Deep Learning models in real time as they score new documents in your Pienso API Deployment.

08 Deploy

Pienso Deploy is a UI-driven API deployment manager. Use it to put your models on an API production slot, then send your live data there for scoring.
