Our client converts print ads to online ads for most of the top retailers in the USA.
Alongside other flex-based ad-processing tools, a rules-based extraction system was created. User-specified rules are used to ‘understand’, automatically extract, and parse unstructured data in print-ad files, and the results are saved in separate database fields for subsequent access by multiple applications. The system also extracts images and creates hotspots, reducing the workload of the graphics team.
Streamline the cumbersome manual effort currently required to extract data and images from PDF files and to create hotspots, replacing it with a semi-automated procedure supported by this Radicle-developed application.
A proof of concept was created during an initial 3-week analysis phase. Based on the findings from that phase, a rules-based extraction process was adopted in which user-specified rules “understand”, automatically extract, and parse unstructured data in PDF files, saving that information in separate database fields for subsequent access by multiple applications.
This process also extracts images and creates hotspots, reducing the workload of the graphics team. To improve system performance, the client front-end application works with low-resolution PDF files. The original high-quality images are then extracted on the server back end from the high-resolution PDF files, using the image information gathered during the front-end extraction process.
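One way to picture the low-resolution/high-resolution hand-off is as a coordinate mapping. The sketch below is illustrative only and is written in Python for brevity rather than the C# the project used. It assumes that the "image information" is a bounding box in page pixels and that both versions of a page share the same aspect ratio; the source does not say how the information was actually stored. The names `Box` and `scale_box` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels (hypothetical representation)."""
    left: float
    top: float
    width: float
    height: float


def scale_box(box: Box,
              low_size: tuple[float, float],
              high_size: tuple[float, float]) -> Box:
    """Map a box recorded on the low-resolution page onto the high-resolution page."""
    sx = high_size[0] / low_size[0]  # horizontal scale factor
    sy = high_size[1] / low_size[1]  # vertical scale factor
    return Box(box.left * sx, box.top * sy, box.width * sx, box.height * sy)


# Region picked by the user on a 612x792 low-resolution page.
selected = Box(left=100, top=50, width=200, height=150)

# The same region on a 2448x3168 high-resolution rendering of that page.
print(scale_box(selected, (612, 792), (2448, 3168)))
```

Under these assumptions the front end only needs to record the small box, and the batch process can crop the matching region from the high-resolution file later.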
- Rule Engine – Users can create rules based on text attributes, keywords, and patterns for different listing fields and specify the formatting to apply to the parsed text.
- Text Extraction – Users can apply retailer- or promotion-specific rules to extract, parse, and format text and images for a listing from PDF files.
- Hotspot Creation – As part of the listing extraction process, the user can choose to automatically create hotspots for a listing on a PDF page. These hotspots can be attached to one or more advertisement listings.
- Image Extraction – Users can extract listing images from a low-resolution PDF and tie one of the images to a listing. On the back end, a batch process extracts the corresponding high-resolution image from a high-resolution PDF, based on the image information gathered at the front end. The application also provides a simple workflow that lets graphics users correct the images.
- System Integration – The application was integrated with existing systems so that it could be in place by the 2006 holiday season without requiring major changes to those systems or retraining of their users.
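The rule-driven extraction described in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's C# implementation: it models only pattern-based rules (not text attributes such as font or position), and the `Rule` class, the `apply_rules` function, the field names, and the sample ad text are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: fill `field` with the first match of `pattern`."""
    field: str
    pattern: str
    formatter: Callable[[str], str] = str.strip  # formatting applied to the parsed text


def apply_rules(text: str, rules: list[Rule]) -> dict[str, str]:
    """Run each rule against the raw ad text and collect one value per field."""
    listing: dict[str, str] = {}
    for rule in rules:
        match = re.search(rule.pattern, text)
        if match and rule.field not in listing:
            # Prefer the first capture group when the pattern defines one.
            value = match.group(1) if match.groups() else match.group(0)
            listing[rule.field] = rule.formatter(value)
    return listing


# Example rules a user might define for one retailer (illustrative only).
rules = [
    Rule("price", r"\$(\d+(?:\.\d{2})?)"),
    Rule("brand", r"(?i)\b(acme|globex)\b", str.title),
    Rule("size", r"(\d+)\s*oz\b", lambda v: f"{v} oz"),
]

ad_text = "ACME Orange Juice 64 oz carton, now only $2.99 each"
print(apply_rules(ad_text, rules))
```

Keeping the rules as data rather than code is what would let a non-programmer add or adjust them per retailer or promotion; each extracted field would then map to its own database column.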
.NET, C#, SQL Server, and a third-party PDF extraction library.