SORT IT: Build a PDF Processor

Presented by

Adam Jelley, Data Scientist

About this talk

As the world becomes ever more digital, many businesses need to process documents automatically. In this webinar, we'll walk through an example end-to-end project for extracting, classifying and summarising PDF documents, and show how you can combine cutting-edge open-source technologies with your own in-house expertise and requirements to build your own PDF Processor with Dataiku DSS.

Resources:

PDF2Image (https://pypi.org/project/pdf2image/)
Tesseract OCR (https://tesseract-ocr.github.io/tessdoc/Home.html)
Pytesseract (https://pypi.org/project/pytesseract/#description)
The Plugin Store (https://www.dataiku.com/product/plugins/)
The Text Summarisation Plugin (https://www.dataiku.com/product/plugins/text-summarization/)
scikit-learn 20 Newsgroups Dataset (https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html#)
"Surprising Findings in Document Classification" (https://towardsdatascience.com/surprising-findings-in-document-classification-7a79e30f1666)
Webinar (tomorrow): How to Reduce Data Labelling Costs (+ Increase Data Quality) With Active Learning (https://www.brighttalk.com/webcast/17108/394533?utm_campaign=channel-feed&utm_source=brighttalk-portal&utm_medium=web)
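For a rough idea of what the extraction step of such a pipeline might look like, here is a minimal Python sketch that renders each PDF page to an image with pdf2image and reads it with Tesseract through pytesseract. It is not the code shown in the webinar: the function name `extract_text_from_pdf` and its `to_images` and `ocr` parameters are hypothetical, introduced only for illustration, and the sketch assumes that Poppler and the Tesseract binary are installed alongside the two Python packages.

```python
from typing import Callable, Iterable, List, Optional


def extract_text_from_pdf(
    pdf_path: str,
    to_images: Optional[Callable[[str], Iterable[object]]] = None,
    ocr: Optional[Callable[[object], str]] = None,
) -> List[str]:
    """Return one OCR'd text string per page of the PDF.

    `to_images` and `ocr` are hypothetical hooks (not part of any library);
    they default to pdf2image and pytesseract and exist so that either
    step can be swapped out or stubbed.
    """
    if to_images is None:
        # Imported lazily; pdf2image needs Poppler installed on the system.
        from pdf2image import convert_from_path
        to_images = convert_from_path
    if ocr is None:
        # pytesseract is a wrapper and needs the Tesseract binary installed.
        import pytesseract
        ocr = pytesseract.image_to_string
    # Render each page to an image, OCR it, and trim surrounding whitespace.
    return [ocr(page).strip() for page in to_images(pdf_path)]
```

Calling `extract_text_from_pdf("report.pdf")` (the file name is a placeholder) would return a list with one string per page, which could then be passed on to a classifier or a summariser.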
Dataiku

59,773 subscribers, 285 talks
Everyday AI, Extraordinary People
Dataiku is the Universal AI Platform, uniting the technology, teams, and operations needed for companies to build intelligence into their daily operations, from modern analytics to generative AI. Together, they design, develop and deploy new AI capabilities, at all scales and in all industries. Organizations that use Dataiku enable their people to be extraordinary, creating the AI that will power their company into the future. More than 700 companies worldwide use Dataiku, driving diverse use cases from predictive maintenance and supply chain optimization, to quality control in precision engineering, to marketing optimization, generative AI customer proof, and everything in between.