Database Guru Needed (ID:2722)
Project Creator: |
interlincx
FC Member For 6559 Days
Credits 20 Completed Proj. Num. 0 / 1 Total payment USD Avg Daily Online 0.00 h (From 21/5/2007) Available on MSN/Skype No Last Login 8/2/2007 Peers Rating 0.00% ![]() ![]() ![]() |
---|---|
Budget: | 500 - 1000 |
Created: | 7/23/2007 12:17:55 AM EST |
Bidding Ends: | 8/7/2007 12:17:55 AM EST ( Expired ) |
Development Cycle: | 30 Days |
Bid Count: | 12
|
Average Bid: | 737.42 |
Project Description:
We are looking for someone who is familiar working in a LAMP environment, and working with MySQL. We have built a reporting engine that tracks web statistics. We want to figure out how to better line up our stats, between our partners. We are looking for someone who understands working with reporting engines, and experience with Google would help. Newly added descriptions: When we purchase traffic (clicks) from Google, we can either buy traffic from their search engine which is AdWords, or through the ads they put on websites, which is AdSense. We tend to buy a lot of traffic from their AdSense product. One of the limitating factors with AdSense traffic is that Google doesn't report clicks sold to us on an individual keyword basis. You only get Group level reporting. It would be helpful to have previous experience working with Google, as you will have a better foundation to understanding the project. When buying traffic from Google, you have: 1 Overall Account = which is the main account, login + password etc. 2 Campains = when you login you can have up to 26 ad campaigns, you can use this for different websites, products, etc... or how ever you'd like to organize your Account. 3 AdGroups = these are used to bundle together different keywords you buy, and an account will support about 100-300 AdGroups per campaign. 4 Keyword = when you buy traffic for a specific search term, i.e. dog collar. (We aren't 100% the best threshold of keywords per AdGroup.) Google let's you have about 1,000.. however we understand that it is better to have tight bundles. Like 5-50 keywords per AdGroup. You can see if we want to purchase traffic on hundreds of thousands of keywords that organizing our account becomes a very important factor. So if we purchase traffic on the keyword "dog collar" and it has it's own AdGroup, we can get actual clicks, reported from Google. Let's say Google tells us in their login that they sold us 10 clicks at $1 per click, for $10. Now if we had several keywords that we were buying. i.e. Dog Collar, Dog Leash, Dog Leashes etc.. and had all of them in 1 AdGroup, Google would only tell us top-line stats for the AdGroup. If we only had 1 keyword in that AdGroup then we'd get exact stats, and that is what we've been doing up to now. Our objective is to be able to rely on our own internal calculations. Thus for the AdGroup we purchase 100 clicks at an average of $1 per click at $100. If we had all that traffic coming in for Dog Leash, Dog Collar, Dog Leashes.. then how do we break down what each keyword did. That is why us decoding and lining up our numbers with Google is so important. Let's say that out of the 100 clicks 80 came from Dog Collar. We cannot get this data from Google as it only shows the top-line numbers per the AdGroup. We then use our own internal statistics that we record in our database to get these numbers. The big issue is that we aren't counting accurately. Since we've been buying 1 Keyword per AdGroup we can see that our internal statistics aren't lining up. We want to figure out how to get these numbers more accurate. This becomes very important as we are making micro decisions. Thus if we buy 4 clicks at $1 (Total Cost = $4) and we have one action per that which we get $4 then we break even on the transaction. If we over-counted at 5 clicks, when Google sells us 4, then we think the cost was $5 and if we made $4 we think we are losing and will want to change our bid on the keyword or even cut it. If we only counted 3 clicks purchased then we'd think we've made $1 (25% Profit) which in fact we just broke even. If the underlying data is inaccurate, then we will be making bad decisions. Getting our numbers to line up becomes very critical when we want to make such accurate decisions and also be able to work on a more micro decision basis. Over time we'd like to write in behavioral algorithms that allow use to have the system optimize for profitibilty. Having a background in mathematics will be very helpful. I hope this helps better describe our objective and why we want to make sure that all the impressions, clicks, etc are as accurate as possible in our system. I do know that with technology, it is never exact, however we want to minimize that window as much as possible. The same goes for clicks. If we count 4 clicks and our partner tells us they only pay us on 3.. the same issue can mess with us on the backend. So our goal with is to dive into the details and find a better way to make sure we are as accurate as we can be. My understanding is that you will need to look at the data we have recorded in the database to start and start to compare that to the days in the google login... This can be done on an AdGroup basis since the majority of what we do only has 1 keyword in the AdGoup. From there once you start seeing the discrepancies, then we can start to look at why these things are happening, and find a better solution to close the gap on the discrepancy. The basic idea of what you're going to do is attempt to match up internal stats with 2 external sources. The flow of the user essentially works like this: Click on a google ad ----> Land on adfox.net (our page) ----> click on an adfox ad ----> go to client page So the two external reporting sources are Google on the incoming, and our backend partners on the outgoing, with our internal reporting in the middle. The problem at hand is that our internal data doesnt seem to match up to the external data. We need you to look into this and attempt to develop a metric by which all data matches up, or gets within 5% accuracy. The matches we need are: Google Clicks = Internal Impressions Internal Clicks = Backend Clicks Don't worry about revenue for now. Basic tech instructions below. Please be sure you are capable of performing this project, before applying. Right click on the top two links in the first page and choose "save link as" or whatever is the equivalent in your browser. One is the table creation code (mysql 5), and the other is a slice of the stats database. Once you have created the table using the first file, you can use the standard mysql command "load data infile" to insert the rows from the second file. The linked csv files are the raw backend data. I have included half of May, let me know if you need the second half. The reports you see under the "All Keywords" tab have been generated from the raw stats file using this select (with a bit of post-select organization) The columns in our "site_stats" table are as follows: tracking_id - This is the id used to link data to the backend partner data. When attempting to make sense of backend data, keep in mind that some keywords are grouped under one tracking id. This is shown in the internal reports. access_date - the date of the action browser - the browser of the user performing the action os - the os of the user performing the action referer - the referer url (where the click / impression came from). ip - the users ip type - the type of action performed. The various possibilities are: B = bot/spider V = impression R = refresh (impression) C = click SELECT tracking_id, type, count(distinct ip_address), count(*), sum(value), sum(value_estimate), keyword FROM site_stats WHERE access_date >= "20070601000000" and access_date <= "20070601235959" AND (browser like "%Mozilla%" or browser not like "%bot%") GROUP BY keyword, type A few possible sources of the discrepancy between google numbers and internal numbers is the fact that they may not count bots as a click, while we do log it as an impression. Additionally, google may not count multiple clicks from the same IP as a valid click, while we are still logging it. |
|
Job Type | PHP |
Attached Files: | N/A |