CSV stands for Comma Separated Values. It is used to port files not only across different packages, but also across entirely different systems. Each value is separated by a comma, and in a spreadsheet, a file such as this...
182.45,65,765.34
243,37.6,13
76.3,54.65,13.4
...when imported in to a spreadsheet, comes out like this...
What the comma does is mark the end of the values so when the spreadsheet, "imports," the CSV file it knows that when it sees a comma it takes that as meaning a new column. Of course, the countries that use a "," as their decimal point end up with problems, as would a number which was formatted for human readable, like, "1,000" instead of "1000" ... but to be honest most inter-machine transfers don't need human readable and when CSV filters are written for countries that use a comma as the decimal value point, they just choose a different separator.
Now, you'll notice that in our log we have lines like this...
66.249.67.154 - - [27/Nov/2008:04:33:47 -0800] "GET /photos/displayimage.php?album=11&pos=10 HTTP/1.1" 200 14747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
... which is a Google bot, so I'm not going to hide their IP addy.
Now look at that line ... you should see that the "space" is actually our separator. So we have to open the file as a CSV file and tell the system to use space as the separator, which is quite easy. We just point at the file and select "CSV" from the import filter list.
Then it asks us some more details so we untick the comma and put a tick by the space.
Now that we have it within the spreadsheet, we can do a full sheet sort by various columns to get some order in to the results. The first thing I do is a sort by the referrer page (usually column I for me) and that quickly enables me to isolate the rows that came from people viewing my own web sites.
Within a few minutes, I've taken 6,138 mishmash rows down to 2619 ordered rows. That says that only a little over half of my traffic is easily regocnisable as legitimate browsing. Scary, isn't it! The majority of those are listing "-" under their source, so I can't tell where they've come from. 2,516 of them, to be precise. Unsocial bastards.
However, now that you've got the data in this format, it makes it easier for you to sort the columns by various sections and spot trends. There are other techniques you can use for whatever is left, such as lines like these...
http://209.85.135.104/translate_c?hl=cs&sl=en&u=http://shutter-fug.blogspot.com/
2007/07/sensor-cleaning.html&prev=/search%3Fq%3DCLEAN%2Bsensor%2Bcleaning%2B
for%2Bpentax%26hl%3Dcs%26lr%3D%26sa%3DG&usg=ALkJrhheBcC6GQh-Zkkx1OEp
W7bYSYvFbg
... someone used Google to translate a page of this very blog, in to what appears to be Checkoslovakian!
After you've gotten used to looking at the various pages of information you should be able to spot trends, like if you sort by IP address and see it constantly referring to a password protected page, trying to log on, or something.
The long story short is that once you've got used to handling the CSV import filter, sorting the columns gives you the ability to look through the data quickly to spot trends in your log files.



Post a Comment