Parsing Text to database, php / curl (ID:6024)
Project Creator: |
webprop
FC Member For 5663 Days
Credits: 20; Completed Proj. Num.: 0 / 1; Total payment: USD 0.00; Avg Daily Online: 0.00 h (From 21/5/2007); Available on MSN/Skype: No; Last Login: 8/25/2009; Peers Rating: 0.00% |
---|---|
Budget: | 250 - 500 |
Created: | 8/24/2009 6:59:50 PM EST |
Bidding Ends: | 8/27/2009 6:59:50 PM EST ( Expired ) |
Development Cycle: | 2 Days |
Bid Count: | 7 |
Average Bid: | 455.71 |
Project Description:
I need the following completed right away. The script will extract data from web pages at daily, weekly or monthly intervals, and it must run entirely automatically once I have entered the URL for a web page into the MySQL database.

There are basically three types of pages to extract data from:
1. A web page with no form.
2. A web page with a form that must be submitted to get the desired data.
3. A web page or pages that may require more than one form, or that may use AJAX or JavaScript to update the menus used to fill in the form.

This first project covers data extraction from pages with no form. The next project will add form filling.

Requirements: Linux, PHP 5 minimum and MySQL 5 minimum. The script will be run as a cron job. A database query will select the URLs to process based on the interval type (daily, weekly or monthly). After each URL is fetched, cookies and cache/session data must be deleted. (CURL??) I also need a separate script that pulls the table headers from a page, to help with field mapping when a URL is entered, before the main script is run.

How it runs (on a server with Linux/PHP/MySQL/cURL):
1. Run the query, fetch the URL and its field mapping from the query result, parse the text in the page's tables to extract data according to the field name mapping table, and upload the extracted data to the database tables.
2. Update the tables with the last run date for the extraction.
3. Create a log entry for errors where data was not extracted.

A total of two scripts is needed, immediately:
1. A script to pull table headers from a page and enter them into the database.
2. A script to parse table data and upload it to the MySQL database.

Variables will be used so that cron jobs can use the same script for different queries.

Newly added descriptions: The scripts will need some way to assign variables, for example if a different database is required.

1st script: From the URL, determine the number of tables and the table headers (XPath etc.) and save them to the database.

2nd script: After a person has reviewed the data saved by the first script, parse the tables and insert the data into another database, where the field names from the first script map to new field names. For example, if the XPath from script one contained a table header with the text "Pts/Discount", the script would insert that value into the database field "Points" when the extraction is run. I need to be able to edit the script variables. |
|
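For illustration only, here is a minimal sketch of how the first script's core might look, assuming PHP's bundled DOM and cURL extensions. The function names `fetchPage` and `extractTableHeaders`, the 30-second timeout, and the shape of the returned array are assumptions made for this example and are not part of the posting. Saving the result to MySQL is left out.

```php
<?php
// Hypothetical helpers; names and return shapes are illustrative assumptions.

/**
 * Fetch a page with a fresh cURL handle. No cookie file or cookie jar is
 * configured, so no cookies are kept between URLs.
 */
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_FRESH_CONNECT  => true, // do not reuse a cached connection
        CURLOPT_FORBID_REUSE   => true, // close the connection after the transfer
        CURLOPT_TIMEOUT        => 30,   // assumed value, in seconds
    ));
    $body  = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("Fetch failed for $url: $error");
    }
    return $body;
}

/**
 * List every table on the page with an XPath that selects it and the
 * trimmed text of its <th> cells, in document order.
 */
function extractTableHeaders($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $result = array();
    $number = 0;
    foreach ($xpath->query('//table') as $table) {
        $number++;
        $headers = array();
        // Note: .//th also picks up headers of tables nested inside this one.
        foreach ($xpath->query('.//th', $table) as $th) {
            $headers[] = trim($th->textContent);
        }
        $result[] = array(
            'xpath'   => '(//table)[' . $number . ']',
            'headers' => $headers,
        );
    }
    return $result;
}
```

A caller could pass the output of `fetchPage($url)` to `extractTableHeaders()` and store each `xpath`/`headers` pair for later review. Pages that mark headers with `<td>` cells in the first row instead of `<th>` would need a different query.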
Job Type | PHP |
Attached Files: | N/A |
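
As a rough sketch of the second script described above, the following shows one possible way to turn table rows into records keyed by the mapped field names, using the posting's "Pts/Discount" to "Points" example. The function name `extractMappedRows`, its parameters, and the rule that unmapped columns are skipped are assumptions for illustration, not requirements taken from the posting.

```php
<?php
// Hypothetical helper; the name, parameters and skip rule are assumptions.

/**
 * Parse one table (1-based position on the page) and return its data rows
 * as arrays keyed by database field name.
 *
 * $fieldMap maps header text to field name, e.g.
 * array('Pts/Discount' => 'Points'). Columns without a mapping are skipped.
 */
function extractMappedRows($html, $tableNumber, array $fieldMap)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $table = $xpath->query('(//table)[' . (int) $tableNumber . ']')->item(0);
    if ($table === null) {
        return array(); // table not found; the caller can write a log entry
    }

    // Column position => header text, in document order.
    $headers = array();
    foreach ($xpath->query('.//th', $table) as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = array();
    // Only rows that contain <td> cells are treated as data rows.
    foreach ($xpath->query('.//tr[td]', $table) as $tr) {
        $row = array();
        $col = 0;
        foreach ($xpath->query('./td', $tr) as $td) {
            $header = isset($headers[$col]) ? $headers[$col] : null;
            if ($header !== null && isset($fieldMap[$header])) {
                $row[$fieldMap[$header]] = trim($td->textContent);
            }
            $col++;
        }
        if ($row) {
            $rows[] = $row;
        }
    }
    return $rows;
}
```

Each returned row could then be written to MySQL, for example with a PDO prepared statement, and an empty result could trigger the error log entry the posting asks for. This sketch assumes simple tables without `colspan` or `rowspan`; merged cells would shift the column positions.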