I work at a medical research organization, and we've got a data lake's worth of unlabeled scholarly articles, old and new.
How can I quickly organize these documents, so that I can start getting value out of all this information?
Most of my documents are long PDFs, which I can upload to Pienso Ingest as a ZIP file. And if I choose to break the longer documents into smaller 'chunks,' I can do that systematically using Ingest's 'Split' function.
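Conceptually, splitting works like a word-window chunker. The sketch below is a generic approximation, not Pienso's actual Split implementation, and the chunk size and overlap values are illustrative assumptions:

```python
def split_document(text: str, max_words: int = 300, overlap: int = 30) -> list:
    """Break a long document into overlapping word-window chunks."""
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        # Step forward, keeping a small overlap so no sentence is cut off context.
        start += max_words - overlap
    return chunks

# A 1,000-word stand-in for one long article.
article = ("word " * 1000).strip()
chunks = split_document(article, max_words=300, overlap=30)
print(len(chunks))  # 4
```

Overlapping windows are a common choice because they keep context that would otherwise be lost at hard chunk boundaries.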
I want to create an ontology that encompasses all the topics in my database, so I'm going to use my ingested data to train a Pienso Fingerprint model.
Once I've finalized my Fingerprint model, I can apply it to a new, unlabeled data set using Pienso Annotate.
Deep learning models are well suited to hard-labeling new documents, so I'll spin up a deep learning model from the labeled training data I just created using Annotate.
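Outside of Pienso, the same pattern, training a supervised classifier on freshly labeled documents, can be sketched with scikit-learn. This uses a lightweight TF-IDF plus logistic-regression pipeline as a stand-in for the deep learning model, and the tiny corpus and labels are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Annotate-labeled training set.
docs = [
    "cardiac arrest outcomes in trial patients",
    "tumor growth markers in oncology study",
    "heart failure therapy and survival",
    "chemotherapy response in cancer cohort",
]
labels = ["cardiology", "oncology", "cardiology", "oncology"]

# Vectorize the text and fit a classifier in one pipeline.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

# Hard-label a new, unseen document.
print(clf.predict(["heart failure survival"]))
```

The key idea is the same at any scale: the labels produced in the annotation step become the supervision signal for the classifier that will tag future documents automatically.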
Next, I'll create a deployment that includes my Fingerprint model, my Deep Learning model, and a Named Entity Recognition model provided by Pienso. Now, each new document that hits this deployment will be scored by all three models at once.
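The fan-out behavior of a deployment can be sketched as: each incoming document goes to every registered model, and the results are merged into one record. The model names and score shapes below are assumptions for illustration, not Pienso's real output format:

```python
def score_document(doc: str, models: dict) -> dict:
    """Send one document to every model in the deployment and merge the results."""
    return {name: model(doc) for name, model in models.items()}

# Stand-in scorers for the three deployed models (all hypothetical).
models = {
    "fingerprint": lambda d: {"topic": "cardiology", "weight": 0.82},
    "deep_learning": lambda d: {"label": "clinical_trial", "confidence": 0.91},
    "ner": lambda d: {"entities": ["aspirin", "Boston General"]},
}

record = score_document("Aspirin trial at Boston General ...", models)
print(record["deep_learning"]["label"])  # clinical_trial
```

Merging all three outputs into one record is what lets a downstream dashboard or data store treat a document's topics, hard labels, and entities as a single unit.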
Finally, I'll use the Pienso Dashboard associated with this deployment to watch in real time as new documents get scored by my models.
From here, I can assess: Which topics are trending up or down? How do these trends compare to historic baselines?
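A minimal version of that trend check compares each topic's share of current documents against its share in a historic baseline. The topic names and counts here are made up for illustration:

```python
def topic_trend(current_counts: dict, baseline_counts: dict) -> dict:
    """Percent change in each topic's document share vs. a historic baseline."""
    cur_total = sum(current_counts.values())
    base_total = sum(baseline_counts.values())
    trends = {}
    for topic, count in current_counts.items():
        cur_share = count / cur_total
        base_share = baseline_counts.get(topic, 0) / base_total
        # Positive = trending up vs. baseline; None = topic had no baseline.
        trends[topic] = (
            round(100 * (cur_share - base_share) / base_share, 1)
            if base_share else None
        )
    return trends

current = {"immunotherapy": 120, "telehealth": 40}
baseline = {"immunotherapy": 80, "telehealth": 60}
print(topic_trend(current, baseline))
```

Comparing shares rather than raw counts keeps the trend meaningful even when the overall document volume changes between periods.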
As articles are hard-labeled by my deployed deep learning model, they can then be fed back into our external data store via Pienso's RESTful API. Now I've got a database filled with tagged, easily searchable articles.
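A hedged sketch of that feedback step, assuming a hypothetical JSON endpoint on the external data store: the URL and payload shape below are illustrative, not Pienso's documented API. The request is built with the standard library and never sent, so the example stays self-contained:

```python
import json
import urllib.request

def push_labeled_article(article_id: str, labels: list, endpoint: str):
    """Build a POST request carrying a hard-labeled article to the data store.

    The endpoint URL and payload schema are illustrative assumptions.
    """
    payload = json.dumps({"id": article_id, "labels": labels}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = push_labeled_article(
    "doc-123",
    ["cardiology", "clinical_trial"],
    "https://datastore.example.com/articles",  # hypothetical endpoint
)
print(req.get_method(), req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) is all that remains once a real endpoint and credentials are in place.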