Content Management Made Easy with ASP Page 3


Chris Payne


    Step 2 - Manipulation

    Now that the file is uploaded and you have the filename, it's time to manipulate the file. We'll use the file system object here to do the grunt work.

    If you converted the document from a word processor's format to HTML, chances are it contains extraneous HTML tags that you don't want. For instance, Microsoft Word adds XML and CSS markup that it uses to convert the HTML document back into a Word document, should you ever choose to do so. That extra markup adds a lot of overhead to the HTML document, so if you never plan to convert the document back to the Word format, you should get rid of it. This can be done with a series of replaces.

    
    
    Const fsoForReading = 1
    Dim objFSO, objTextStream, txtFileContents

    ' Create the file system object
    Set objFSO = Server.CreateObject("Scripting.FileSystemObject")

    ' Open the uploaded file, read it into a string, and close the file
    Set objTextStream = objFSO.OpenTextFile("C:\SomeFile.html", fsoForReading)
    txtFileContents = objTextStream.ReadAll
    objTextStream.Close

    ' If <xml> is found in the text, remove it
    If InStr(1, txtFileContents, "<xml>", vbTextCompare) > 0 Then
        txtFileContents = Replace(txtFileContents, "<xml>", "", 1, -1, vbTextCompare)
    End If
    ' ... remove any other tags that we don't want

    ' Now write the changes we just made
    Const fsoForWriting = 2
    Set objTextStream = objFSO.OpenTextFile("C:\SomeFile.html", fsoForWriting)
    objTextStream.Write txtFileContents
    objTextStream.Close

    Set objTextStream = Nothing
    Set objFSO = Nothing
    
    

    Note that it may take a lot of these replaces, and some fine tuning, to get rid of all the junk HTML in the files. Though this step is fairly easy, it takes a while to get it right. You can also perform any other file manipulation here: for instance, changing the filename, doing a global replace, or adding a style sheet.
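
One way to cut down on the repeated If/Replace blocks is to drive the removal from a list of tag names. The sketch below is only an illustration: StripTags is a hypothetical helper, the tag names in the sample are examples rather than a complete list of what a word processor emits, and it relies on the VBScript RegExp object. It removes the tags themselves and leaves the text between them in place.

```vbscript
' Hypothetical helper: removes every opening and closing tag whose name
' appears in arrTagNames. Text between the tags is left untouched.
Function StripTags(strHtml, arrTagNames)
    Dim objRegExp, strName, strResult
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True   ' match <XML> as well as <xml>
    objRegExp.Global = True       ' replace every occurrence, not just the first
    strResult = strHtml
    For Each strName In arrTagNames
        ' Matches <name>, <name attr="...">, and </name>
        objRegExp.Pattern = "</?" & strName & "(\s[^>]*)?>"
        strResult = objRegExp.Replace(strResult, "")
    Next
    Set objRegExp = Nothing
    StripTags = strResult
End Function

' Sample input, invented for illustration
Dim strSample
strSample = "<p>Hi<o:p></o:p> <SPAN class=x>there</SPAN></p>"
strSample = StripTags(strSample, Array("o:p", "span"))
' strSample is now "<p>Hi there</p>"
```

In the page above you would call it on txtFileContents after reading the file and before writing it back.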

    NOTE: If you use the file system object (FSO) to change the filename, make sure that you also update the filename field in the database. Otherwise you'll have a database entry that points nowhere, and a file that doesn't belong to anything.
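
A minimal sketch of keeping the two in step might look like the following. RenameUploadedFile is a hypothetical helper, and the column names FileName and FileID are assumptions about tblFiles that you should replace with your own. It expects the objFSO created earlier and an already open ADO connection.

```vbscript
' Hypothetical helper: renames a file on disk and then updates its row in
' tblFiles. FileName and FileID are assumed column names.
Sub RenameUploadedFile(objFSO, objConn, lngFileID, strOldPath, strNewPath)
    Dim strSql

    ' MoveFile raises an error if the source is missing or the target
    ' already exists, so rename the file before touching the database.
    objFSO.MoveFile strOldPath, strNewPath

    ' Store only the file name, doubling any single quotes it contains
    strSql = "UPDATE tblFiles SET FileName = '" & _
             Replace(objFSO.GetFileName(strNewPath), "'", "''") & _
             "' WHERE FileID = " & CLng(lngFileID)
    objConn.Execute strSql
End Sub
```

A call such as RenameUploadedFile objFSO, objConn, 7, "C:\SomeFile.html", "C:\NewName.html" would then handle both halves in one place.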

    If you'd like to split the file into multiple pages, you could do that here too. Though I won't go into detail, I will list a few guidelines on how to do so:

    1. Create a 'pages table,' tblPages, that contains information about the pages in the document. This table would contain data such as: Document ID, which tells you which document in tblFiles this page belongs to; PageTitle, a title for the individual page, e.g. Page 2; and PageNumber, so you can see the order of the pages.

    2. Add a field in tblFiles called NumberOfPages and increment that number every time you add another page. This way, you'll know how many pages are in every document without having to see how many actual files there are.

    3. Name the new files based on the original file. For instance, if the original document was test.html, name pages 2 and 3 test2.html and test3.html.

    4. Parse the file for a logical separator, e.g. a paragraph break <p>. You could then split the page according to the number of paragraphs, or allow the user to select which paragraph to place the page break at. If you do the latter, you'll have to mark each spot the user selected somehow. A good way to do this would be to use a form with numbered checkboxes for each paragraph; i.e. if the user selects checkbox 2, there should be a page break at paragraph 2. For each page break, write the contents to a new file. This is the hardest step. Here is some logic and pseudo-code on how to do so:

    
    
    Dim CursorFirst, CursorLast
    strNextText = file contents
    For each paragraph in strNextText
     Set CursorFirst to the beginning of the paragraph (i.e. at the position of <p>)
     If there is more than one paragraph left in strNextText then
      If this paragraph is not marked with a page break then
       Append all the text up to the next paragraph to variable strText
       Point CursorLast to CursorFirst
      Else
       Update tblFiles and tblPages with the new page info
       Write a new file with strText
       Clear strText
       Put all the text up to the next paragraph in variable strText
       Put all the text after the current paragraph in variable strNextText
       Point CursorLast to the first <p> in strNextText
      End If
     Else (only one paragraph is left)
      Append it to strText and write strText to a file
     End If
    Next
    
    

    You'll probably want to put this functionality on a separate page. This should get you started on splitting the original document into multiple pages.
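
The pseudo-code above could be turned into VBScript along the following lines. This is a sketch under a few assumptions, not a finished implementation: every paragraph starts with a bare <p> tag (no attributes), a marked paragraph becomes the first paragraph of a new page, and the function names SplitIntoPages, IsMarked, and PageFileName are hypothetical. Writing the files and updating tblFiles and tblPages is left to the caller.

```vbscript
' Splits strHtml into pages. arrBreaks holds the 1-based numbers of the
' paragraphs that should start a new page. Returns an array of strings.
Function SplitIntoPages(strHtml, arrBreaks)
    Dim arrPages(), intCount, intPara, intPos, intNext, strText, strPara
    intCount = 0
    intPara = 0
    intPos = InStr(1, strHtml, "<p>", vbTextCompare)
    If intPos = 0 Then
        strText = strHtml                    ' no paragraphs: a single page
    Else
        strText = Left(strHtml, intPos - 1)  ' text before the first <p>
    End If
    Do While intPos > 0
        intPara = intPara + 1
        intNext = InStr(intPos + 1, strHtml, "<p>", vbTextCompare)
        If intNext = 0 Then
            strPara = Mid(strHtml, intPos)   ' last paragraph: take the rest
        Else
            strPara = Mid(strHtml, intPos, intNext - intPos)
        End If
        ' A marked paragraph closes the current page and starts the next one
        If intPara > 1 And IsMarked(arrBreaks, intPara) Then
            ReDim Preserve arrPages(intCount)
            arrPages(intCount) = strText
            intCount = intCount + 1
            strText = ""
        End If
        strText = strText & strPara
        intPos = intNext
    Loop
    ReDim Preserve arrPages(intCount)
    arrPages(intCount) = strText             ' whatever is left is the last page
    SplitIntoPages = arrPages
End Function

' True if paragraph number intPara appears in arrBreaks
Function IsMarked(arrBreaks, intPara)
    Dim varBreak
    IsMarked = False
    For Each varBreak In arrBreaks
        If CLng(varBreak) = intPara Then IsMarked = True
    Next
End Function

' Page 1 keeps the original name; page 2 of test.html becomes test2.html
Function PageFileName(strOriginal, intPage)
    Dim intDot
    intDot = InStrRev(strOriginal, ".")
    If intPage <= 1 Then
        PageFileName = strOriginal
    ElseIf intDot = 0 Then
        PageFileName = strOriginal & intPage
    Else
        PageFileName = Left(strOriginal, intDot - 1) & intPage & Mid(strOriginal, intDot)
    End If
End Function
```

For example, SplitIntoPages("<p>a<p>b<p>c", Array(3)) returns two pages, "<p>a<p>b" and "<p>c". Looping from 0 to UBound of the result and calling PageFileName("test.html", i + 1) gives the file name for each page.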

    NOTE: If you split the original document into pieces, be sure to keep track of those pieces as well. They should each receive an entry in tblPages. If you move, rename, or delete one, make sure you do the same with all the others.

    Step 3 - Layout

    By now you should have one or more pages with properly formatted HTML (with all extraneous tags removed). All pages should have an entry in tblPages, with a many-to-one relationship to tblFiles (i.e. there will be many entries in tblPages that correspond to one entry in tblFiles). Now we must apply the format and layout of the existing site to the new content.

    Choose or create a template that will be the basis of the new content. Separate out the content that will remain static (things that won't change from article to article) and the content that will change every time. Read here for a good short tutorial on templates. You can either store the static portions in a database, or in a file. Just make sure you know where the dynamic content goes - the title goes in the title section, the content goes in the body section, etc.
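
As a rough illustration of the idea, the static template can carry placeholder markers that are swapped for the dynamic content. The markers [[TITLE]] and [[CONTENT]] and the function name ApplyTemplate are made up for this sketch; use whatever convention suits your template.

```vbscript
' Hypothetical helper: fills the dynamic parts of a static template.
' [[TITLE]] and [[CONTENT]] are placeholder markers invented for this sketch.
Function ApplyTemplate(strTemplate, strTitle, strBody)
    Dim strPage
    strPage = Replace(strTemplate, "[[TITLE]]", strTitle)
    strPage = Replace(strPage, "[[CONTENT]]", strBody)
    ApplyTemplate = strPage
End Function

' A tiny template, invented for illustration
Dim strTemplate, strPage
strTemplate = "<html><head><title>[[TITLE]]</title></head>" & _
              "<body>[[CONTENT]]</body></html>"
strPage = ApplyTemplate(strTemplate, "News", "<p>Hi</p>")
' strPage is now a complete page with the title and body filled in
```

The template string itself could come from a file read with the file system object or from a database field, as described above.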

    Next, extract the portion of the new content that you need. After all, you don't want another set of <HTML> and <HEAD> tags, since those should already be in the static template. You can use the following function to extract the necessary HTML from the document:

    
    Function GetHTML(strContent, strStartTag, strEndTag)
        ' This procedure returns the portion of the HTML in strContent
        ' beginning with the HTML tag in the strStartTag variable and
        ' ending with the HTML tag in the strEndTag variable, not including the
        ' start and end HTML tags. It returns " " if either tag is missing.
        Dim strText, intStart, intEnd

        ' First get all of the HTML in the document.
        strText = strContent
        GetHTML = " "

        ' Search for the start tag without its closing bracket, so that a tag
        ' with attributes (such as <body bgcolor="white">) still matches.
        intStart = InStr(1, strText, Left(strStartTag, Len(strStartTag) - 1), vbTextCompare)
        If intStart <> 0 Then
            ' Move to the bracket that closes the start tag.
            intStart = InStr(intStart + 1, strText, Right(strStartTag, 1), vbTextCompare)
            If intStart <> 0 Then
                intEnd = InStr(intStart, strText, strEndTag, vbTextCompare)
                If intEnd <> 0 Then
                    GetHTML = Mid(strText, intStart + 1, intEnd - intStart - 1)
                End If
            End If
        End If
    End Function
    
    

     
