Working on RSS feeds and fetching content from other websites.

TransAmDan

Forum Admin
Staff member
As you have probably seen, we have an RSS feeds section. The website reads files provided by other websites for content and generates threads from them. However, the data it gets is just a single one-liner. Wouldn't it be nice to have a paragraph and an image, just to entice you more into the article? There will be a link to the original article too.

So how do we go about doing this? Well, I had it running on the website some time ago, but it was hard to modify. Basically it will follow the link, read the website and grab snippets of info. These are stored in a database, and it generates an RSS file that the forum then reads in to create a thread. So basically a bit of software I write sits in the middle, fetching and filtering info for our website.

My new version of this will have an online admin section so I can add/remove sources more easily, and adapt what it searches for on the pages. I have written loads of things in Classic ASP, and I tend to use MSAccess databases with an ODBC connection via a DSN. This works fine on a Windows server, which I have, but eventually I want to move certain resources to the Linux website, the one this forum is on. So before I start the database, should I go the MySQL route, or MSAccess (this is how the database is stored for the forum)? Basically there are two main options for the database, MSAccess or MySQL, and MySQL is probably the better of the two.

I want to be able to develop it on my PC, and then test it on the web server. I don't want to test/develop on the web server; I want to keep it all local to begin with before going online.

As this website is on a Linux server, and ASP is Microsoft, we can run ASP on the Linux server as there is an option for this, but I will need to run further testing on it. I have downloaded and installed MySQL server on Windows so I can use that for testing.

Can Classic ASP connect to MySQL? Well, yes: ASP connection to MySQL.

I spent many hours on it last night and tried various ways of doing it, but never got it working. I will keep trying, as MySQL would be nice if it's working; if all else fails I will need to do it on the Windows server as I usually do.

<%
' Connection details for the MySQL database (placeholders)
Dim oConn, oRs
Dim qry, connectstr
Dim db_name, db_username, db_userpassword
Dim db_server, fieldname, tablename

db_server = "mysql.secureserver.net"
db_name = "your_dbname"
db_username = "your_dbusername"
db_userpassword = "your_dbpassword"
fieldname = "your_field"
tablename = "your_table"

' Build the ODBC connection string for the MySQL driver
connectstr = "Driver={MySQL ODBC 3.51 Driver};SERVER=" & db_server & ";DATABASE=" & db_name & ";UID=" & db_username & ";PWD=" & db_userpassword

Set oConn = Server.CreateObject("ADODB.Connection")
oConn.Open connectstr

' Read every row from the table and write one field out per line
qry = "SELECT * FROM " & tablename
Set oRs = oConn.Execute(qry)

If Not oRs.EOF Then
    While Not oRs.EOF
        Response.Write UCase(fieldname) & ": " & oRs.Fields(fieldname) & "<br>"
        oRs.MoveNext
    Wend
    oRs.Close
End If

' Tidy up the connection objects
oConn.Close
Set oRs = Nothing
Set oConn = Nothing
%>

Something like the above should do it, although the MySQL driver isn't playing ball. I also tried it through a Dreamweaver connection, and although the various MySQL connections were listed, none of them actually talked to the database. Perhaps it's an issue with the way MySQL is set up on my testing PC.

Anyway, this is what I am currently working on. :p
 
Still working on this one.
I have the MySQL testing server, I have ODBC drivers. Can I get ASP to talk to it? No. There must be a way; perhaps it's a configuration in the MySQL server to allow incoming connections, but it's on the local machine and the usernames and passwords are created. I thought I was getting somewhere.

My other reason for going down this route is to get rid of my Windows web server if I can do it all on the Linux one. So this was also a test to see if it was possible. I guess there is a reason why there are both Windows web servers and Linux ones, as each is good at its own areas of work.

So at the moment I'm tempted to go down the ASP with MSAccess database route on the Windows server, which is a route I am familiar with. :)
 
Still working on this one. I was going down the old classic route of ASP and Access and thought I would just try MySQL again. Creating a DSN, I loaded the ODBC application to configure it, and MySQL was on the list; however, opening ODBC32.exe, which contains the DSNs I gave access to, I don't get the same options. It would seem 64-bit and 32-bit applications are creating hassle, as one is not playing ball.
Okay, so tomorrow I may go further down the MySQL route. I checked and there is an ODBC driver for the web server, so there could be a chance it could work.
 
Good luck with it Dan ;)
 
I think I can get the MySQL server connection to work on this testing computer now. So I need to think ahead to other features I would require: if my code is running on Linux and not Microsoft, what add-ons do I require, and will they function?
Well, a couple of features come to mind. One I usually use is reading the contents of other web pages to strip out the info. This is a Microsoft call:
set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
However, this won't work on Linux, although there is probably an equivalent; it would just take extra time to change my existing code to use an equivalent feature.
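For reference, if this does end up on the Linux server, the rough PHP equivalent for fetching a remote page would be something like this. Just a sketch: the URL and function name are my own placeholders, and it assumes the cURL extension is installed.

<?php
// Sketch: fetch a remote page with cURL (rough equivalent of MSXML2.ServerXMLHTTP).
// The URL below is only an example placeholder.
function fetch_page($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);           // don't hang on slow sites
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

$html = fetch_page('https://www.example.com/articles/');
?>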
Another feature I use is called ASPJpeg; this fetches an image and can resize it. There is an equivalent on Linux, but it would probably cause further headaches.
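Again, only a sketch of what the Linux-side equivalent might look like, assuming PHP's GD extension is installed (the filenames and sizes are just examples):

<?php
// Sketch: resize a fetched JPEG with GD (rough equivalent of ASPJpeg).
$src = imagecreatefromjpeg('article_photo.jpg');   // placeholder filename
$w = imagesx($src);
$h = imagesy($src);
$newW = 400;                                       // example target width
$newH = (int) ($h * ($newW / $w));                 // keep the aspect ratio
$dst = imagecreatetruecolor($newW, $newH);
imagecopyresampled($dst, $src, 0, 0, 0, 0, $newW, $newH, $w, $h);
imagejpeg($dst, 'article_photo_small.jpg', 85);    // save at quality 85
imagedestroy($src);
imagedestroy($dst);
?>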
So those are a couple more points towards the choice of keeping the Windows server or not.
I also made a rather large website on the Windows server for an estate agent. So if that can't move to the Linux server too, then I've got to keep the Windows server.

So, if I am sticking with Windows server, I have a choice of MySQL or MSAccess....

I have the database in MSAccess format, and created a DSN connection to it in ODBC32.
Dreamweaver can see the DSN, however when I run the website, a connection to the DSN can't be made:
error '80004005'
So it would appear IIS (Internet Information Services, basically a web server) is having trouble with the connection.

So that is my mission at the mo.
 
This mod is coming along well.

I have created the database in MySQL, and I'm writing the code in PHP. As I'm developing I'm making subtle changes to the database, some for logging, some to make it easier to adjust for the various formats of pages it is fetching the data from.
The page fetch is working well. Next is to fetch the image and store it locally, giving it a name which will be the title of the article but with spaces changed to a valid file name character.
Once that's working well, I then need to create something to fetch the links to articles. This is almost done with the same function as the article fetch.
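Roughly what I have in mind for the naming and saving of the image, as a sketch. The replacement character, folder and example title are just placeholders, and fetch_page() is the fetch sketch from earlier:

<?php
// Sketch: turn an article title into a safe file name, then save the fetched image.
function title_to_filename($title)
{
    $name = strtolower(trim($title));
    $name = preg_replace('/[^a-z0-9]+/', '_', $name);  // spaces and punctuation become underscores
    return trim($name, '_') . '.jpg';
}

$filename = title_to_filename('New Corvette Z06 First Drive');   // example title
$image = fetch_page('https://www.example.com/photos/z06.jpg');   // placeholder image URL
file_put_contents('images/' . $filename, $image);                // example folder
?>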

Keeping me busy and brain active.
One place to fetch articles from is Automotive Articles - SAE International, nice and technical, full of useful information that is vehicle related. Another place is Automobile - New Car Drives, Automotive News, Exclusive Photos, and LS1Tech.com - GM LS Performance Forum. Also Classic American has some good news articles: Classic American Magazine.

There are many more too. If you find a website with good articles on it, let me know and I can add it to the list. The website will eventually visit the page, take a snippet of the article and create a forum thread for it, with a link to the full article....
 
Spent many days on this over the long weekend (Mondays and Fridays off work) so got a good many hours in on this.
The code now reads a webpage, recognises links to articles, and stores these links in the database.
It also reads the article page and picks out the title, who wrote the article, a link to the photo, and the content. This is currently stored in RAM ready to write to the database; however, I believe certain characters in the text are stopping it writing to the database. There are workarounds for this. When I write down what I have done it doesn't seem a lot, but there are some good solid functions I've written that can be reused.
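The usual workaround for awkward characters is to let the database driver handle the escaping. A minimal sketch with mysqli prepared statements; the table and column names here are just examples rather than my actual schema:

<?php
// Sketch: insert an article using a prepared statement so quotes and other
// special characters in the text can't break the SQL.
$db = new mysqli('localhost', 'db_user', 'db_pass', 'articles_db');  // placeholder credentials

$title    = 'Example article title';                                  // placeholders for the extracted values
$author   = 'Example author';
$imageUrl = 'https://www.example.com/photo.jpg';
$content  = "Article text with 'quotes' and other awkward characters...";

$stmt = $db->prepare('INSERT INTO articles (title, author, image_url, content) VALUES (?, ?, ?, ?)');
$stmt->bind_param('ssss', $title, $author, $imageUrl, $content);
$stmt->execute();
$stmt->close();
?>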

Looking forward to getting this going. There are some good articles out there that make interesting reading. It's nice to have a snippet of them on here, as they could make for some good comments/opinions on newly released cars, or American cars for sale.

It's a shame it won't increase my post count though. Don't think I will make 10k before Xmas.
 
Still working on this one; I've got many articles written into the database, fetched remotely. This is all on the local computer, so eventually I will move this to the website.

The part I'm working on at the moment is fetching the image from a website and storing it locally. It works as a single test; now I'm just writing code to fetch all the images that haven't been fetched before.
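Something along these lines, as a sketch. The image_fetched flag and the column names are assumptions for illustration, and fetch_page() and title_to_filename() are the earlier sketches:

<?php
// Sketch: loop over articles whose image hasn't been fetched yet,
// download each one and mark it as done.
$db = new mysqli('localhost', 'db_user', 'db_pass', 'articles_db');  // placeholder credentials

$result = $db->query('SELECT articleid, title, image_url FROM articles WHERE image_fetched = 0');
while ($row = $result->fetch_assoc()) {
    $image = fetch_page($row['image_url']);
    $file  = 'images/' . title_to_filename($row['title']);
    if ($image !== false && file_put_contents($file, $image) !== false) {
        $db->query('UPDATE articles SET image_fetched = 1 WHERE articleid = ' . (int) $row['articleid']);
    }
}
?>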

I wrote some code earlier that presents the data in XML format; this is perfect for the forum to read and add the info to a forum thread. There will be a lot of articles in one go, perhaps 100. After that maybe only a couple per day.
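A rough idea of the XML export, sketched with PHP's DOMDocument. The element names and query are simplified placeholders rather than the real feed layout:

<?php
// Sketch: export the stored articles as XML for the forum's RSS importer.
header('Content-Type: text/xml; charset=UTF-8');

$db  = new mysqli('localhost', 'db_user', 'db_pass', 'articles_db');  // placeholder credentials
$doc = new DOMDocument('1.0', 'UTF-8');

$rss     = $doc->appendChild($doc->createElement('rss'));
$channel = $rss->appendChild($doc->createElement('channel'));

$result = $db->query('SELECT articleid, title FROM articles');
while ($row = $result->fetch_assoc()) {
    $item = $channel->appendChild($doc->createElement('item'));
    $item->appendChild($doc->createElement('articleid', $row['articleid']));
    // A text node keeps characters like & in the title from breaking the XML.
    $title = $item->appendChild($doc->createElement('title'));
    $title->appendChild($doc->createTextNode($row['title']));
}

echo $doc->saveXML();
?>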
 
Well, it's running, serving data from the database on the web server; the images are on the Renegades website, with links to the source of the info too, like to the Top Gear website where the article was written, or, where it's the section with cars for sale, links to their website.
It's not fully automated yet; I manually run the pages to check the results, but soon this will be on a timer so they run themselves.

All looking great so far, maybe some minor formatting tweaks, but overall I'm happy with it all. It's taken a long time to get to this stage. The rss_auto_poster user account will soon have more posts than me (although it's excluded from the stats we see at the bottom of the main forum).
 
Seems to be working, a few more sources to add, but it is pulling in a main page, scanning for article links, then gathering information from those pages including an image.

Some interesting reading, all car related :)
 
I disabled it a day or so ago. It's fetching data, but not storing it; perhaps database permissions or file access due to moving servers. I will look at this and re-enable it. There were some interesting articles and cars for sale popping up in there.
 
Been looking at this all evening, debugging many areas. It's getting stored as files and in the database correctly. However, the generation of the XML file is adding <head/> near the top, for the fun of it, just to bugger things up.
 
Been dabbling in this again. The glitch is caused on the new server, and it is stopping the feeds coming into a thread.

So the list of processes is:
  • Fetch data from a URL to extract links to articles; add each link to the database if it doesn't already exist
  • Fetch the page data from the link we just extracted, and save the page locally
  • Examine the page and extract info, like the title, contents, and the URL for the image
  • Fetch the image and store it locally

All of the above is working perfectly.
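As a rough illustration of the first step, the link extraction is along these lines. A sketch only: the URL pattern is an example, in practice each source needs its own pattern, and fetch_page() is the earlier fetch sketch.

<?php
// Sketch: scan a fetched index page for links to article pages,
// and add each new link to the database.
$db = new mysqli('localhost', 'db_user', 'db_pass', 'articles_db');  // placeholder credentials

function extract_article_links($html, $pattern)
{
    // $pattern is a per-source regular expression matching article URLs
    preg_match_all($pattern, $html, $matches);
    return array_unique($matches[0]);
}

$html  = fetch_page('https://www.example.com/news/');
$links = extract_article_links($html, '~https?://www\.example\.com/news/[a-z0-9-]+~i');

foreach ($links as $link) {
    // INSERT IGNORE relies on a unique index on the link column,
    // so a link is only added if it doesn't already exist.
    $stmt = $db->prepare('INSERT IGNORE INTO articles (link) VALUES (?)');
    $stmt->bind_param('s', $link);
    $stmt->execute();
    $stmt->close();
}
?>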

Then, when a data feed is requested by the automatic RSS poster, it wants to read an XML file, so I have written an XML export: when the page is called it displays the data in XML. This worked fine before.

However at the top of the page I now get

<?xml version="1.0" encoding="UTF-8"?>
<head/><rss>
<channel>
<item>
<articleid>1743</articleid>
<feedid>2</feedid>

The <head/> should not be there; it is confusing the XML reader (part of vBulletin). So how did it get there? Well, it's not in the source code. I have even written it as plain text, but when it gets passed through the system to be displayed as a page, it somehow gets added. There are slight variations in PHP versions between my testing server and the web server.

I have an idea, and it will certainly make things more efficient too.

The articles are fetched and stored in the database to be exported in XML. However, as they are already in the database, and my code has access to the vBulletin database tables, I can write code to generate a thread and post the content directly. This will work out more efficient too. I just have to get the code right; with the speed of the server, I could create thousands of threads per second. I won't put any delete facility in my code. All will be fine. :)
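In outline it would be something like the sketch below. The vBulletin thread and post tables are real, but the column list here is a cut-down subset and all the values are placeholders, so treat it as an illustration of the approach rather than the finished code.

<?php
// Sketch only: create a thread and its first post straight in the vBulletin tables.
// Assumes no table prefix; the real tables have many more fields than shown here.
$db = new mysqli('localhost', 'db_user', 'db_pass', 'forum_db');   // placeholder credentials

$forumid = 99;                                             // placeholder: the RSS feeds forum id
$userid  = 12345;                                          // placeholder: the rss_auto_poster account
$title   = 'Example article title';                        // from the articles table
$text    = 'Snippet of the article, plus a link back...';  // snippet and link to the original
$now     = time();

// Thread row first, so we have a threadid for the post.
$stmt = $db->prepare('INSERT INTO thread (title, forumid, postuserid, dateline, open, visible) VALUES (?, ?, ?, ?, 1, 1)');
$stmt->bind_param('siii', $title, $forumid, $userid, $now);
$stmt->execute();
$threadid = $db->insert_id;
$stmt->close();

// Then the first post in that thread.
$stmt = $db->prepare('INSERT INTO post (threadid, userid, title, pagetext, dateline, visible) VALUES (?, ?, ?, ?, ?, 1)');
$stmt->bind_param('iissi', $threadid, $userid, $title, $text, $now);
$stmt->execute();
$stmt->close();

// The thread, forum and user post counters would also need updating.
?>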
 
Made good progress on this last night and today. It automatically creates a thread and post entry, and updates the counters. I've limited it to 10 posts per hour, as there are around 200 already fetched in the system.
One thing I have noticed is they don't appear under the 'What's New' button at the top. This must be another database table I need to update.
 

Dan, do you mean the "new posts" link up top? If so it does show them.
 
Ah yes, so it does. That's good then. :)

There is another button on the main menu, 'What's New'. I think that one is called the activity stream. It used to also show all new posts in there too.
 
I found the table that controls that bit, called Activity Stream. I have most of the info for it apart from ContentID; I have other IDs, but they're not relevant. I won't dive into this further at this time. Could be opening a can of wormies.
 
Are you leaving the RSS feeds to show in the "new posts" link?
 
Yes, I was thinking of leaving it how it is at the mo.
Unless you think it's not good to do so. At the moment the feeds are playing catch-up. I think it will be only 3 per hour once it's settled down.
 
Leave it as it is Dan. If it gets too much I can find another way to view new posts ;) ;)

Thanks
 