Importing Directly From a Text File

Instead of defining a consecutive file and then reading it, it is possible to read and parse text files directly in APPX. First, here is the actual code:
SAS-00 Spike & Sparky Sample App Page: 1
JPN 12/01/97 15:06 Technical Documentation (ILF Only)
Process Type: SUBROUTINE Process Name: IMPORT TEXT FILE
Description: Import Text File
STMT PROC KEY IMPORT TEXT FILE
Frm Class: RECORD Frm Seq: Alt Img No: Chld Seq
or Opt
Start Of Process
================================================================================
* See Note 1
CNV BIN --- CDF HEX00 = 0
CNV BIN --- CDF HEX0A = 10
SET --- CDF PATH & FILE NAME = /tmp/custin.txt
APPEND --- CDF PATH & FILE NAME 0 --- CDF HEX00
SET --- CDF TYPE = r
APPEND --- CDF TYPE 0 --- CDF HEX00
PASS --- CDF PATH & FILE NAME FIELD SHARE? Y
PASS --- CDF TYPE FIELD SHARE? Y
PASS --- CDF FILE POINTER FIELD SHARE? Y
CALL ,RT_OPEN_STREAM RESIDENT? Y END? N FAIL 0
IF --- RETURN CODE NE +1
T CANCEL Unable to Open Input File
* See Note 2
LABEL :READ A LINE
SET --- TEMP 256 =
PASS --- TEMP 256 FIELD SHARE? Y
SET --- LI = 256
PASS --- LI FIELD SHARE? Y
PASS --- CDF FILE POINTER FIELD SHARE? Y
CALL ,RT_READ_STREAM RESIDENT? Y END? N FAIL 0
IF --- RETURN CODE EQ +1
T GOSUB :PARSE LINE
T GOSUB :MOVE TO OUTPUT
T GOTO :READ A LINE
* Standard close routine
PASS --- CDF FILE POINTER FIELD SHARE? Y
CALL ,RT_CLOSE_STREAM RESIDENT? Y END? N FAIL 0
RETURN
*
* Subroutines
*
LABEL :PARSE LINE
* parses tab delimited input field, returns results in WORK ARRAY
* See Note 3
SET --- TEMP 1 =
BEG LOOP II = 001 TO 030 STEP 001
SET SAS WORK ARRAY II =
END LOOP II
* first strip trailing binary data
IF --- TEMP 256 IN --- CDF HEX00
T SET --- XI = --- TEXT AT POSITION
T COMPUTE --- XI - 1
T SET --- TEMP 512 =
T SET TEMP 512 AT 001 FOR XI FROM 001 OF --- TEMP 256
T SET --- TEMP 256 = --- TEMP 512
* reset CDF HEX0A to LF (10) on every pass; the CR test below reuses it
CNV BIN --- CDF HEX0A = 10
IF --- TEMP 256 IN --- CDF HEX0A
T SET --- XI = --- TEXT AT POSITION
T COMPUTE --- XI - 1
T SET --- TEMP 512 =
T SET TEMP 512 AT 001 FOR XI FROM 001 OF --- TEMP 256
T SET --- TEMP 256 = --- TEMP 512
CNV BIN --- CDF HEX0A = 13
IF --- TEMP 256 IN --- CDF HEX0A
T SET --- XI = --- TEXT AT POSITION
T COMPUTE --- XI - 1
T SET --- TEMP 512 =
T SET TEMP 512 AT 001 FOR XI FROM 001 OF --- TEMP 256
T SET --- TEMP 256 = --- TEMP 512
* begin parsing
CNV BIN --- TEMP 1 = 9
SET --- II = 1
LABEL :PARSE AGAIN
SET --- TEMP 512 =
IF --- TEMP 256 IN --- TEMP 1
T SET --- XI = --- TEXT AT POSITION
T COMPUTE --- XI - 1
T SET TEMP 512 AT 001 FOR XI FROM 001 OF --- TEMP 256
T SET SAS WORK ARRAY II = --- TEMP 512
T COMPUTE --- XI + 2
T SET --- TEMP 512 =
T SET TEMP 512 AT 001 FOR 512 FROM XI OF --- TEMP 256
T SET --- TEMP 256 = --- TEMP 512
T COMPUTE --- II + 1
T GOTO :PARSE AGAIN
SET SAS WORK ARRAY II = --- TEMP 256
RETURN
*
LABEL :MOVE TO OUTPUT
SET 1EX VENDOR NO = SAS WORK ARRAY 001
SET 1EX VENDOR NAME = SAS WORK ARRAY 002
WRITE 1EX VENDOR FAIL 0
RETURN
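For readers who do not work in ILF every day, the open, read and close flow of the main routine above can be sketched in Python. This is only an illustration under stated assumptions, not APPX code: `read_lines` and `strip_terminators` are hypothetical names, and Python's built-in `open`, `readline` and `close` merely stand in for RT_OPEN_STREAM, RT_READ_STREAM and RT_CLOSE_STREAM. The sketch cuts the buffer at a null, then at an LF, then at a CR, in the same order as the listing.

```python
def strip_terminators(raw: bytes) -> bytes:
    """Cut the buffer at the first null, then LF, then CR (same order as the listing)."""
    for marker in (b"\x00", b"\n", b"\r"):
        pos = raw.find(marker)
        if pos != -1:
            raw = raw[:pos]
    return raw


def read_lines(path, max_len=256):
    """Yield each line of a text file with its terminator removed.

    max_len plays the role of the numeric length parameter in the listing
    (an assumption: lines longer than max_len arrive in several pieces).
    """
    try:
        stream = open(path, "rb")  # stands in for opening with type 'r'
    except OSError as exc:
        # Misspelled file name, permissions problem, and so on.
        raise RuntimeError(f"Unable to open input file: {exc}") from exc
    try:
        while True:
            raw = stream.readline(max_len)
            if not raw:  # nothing more to read: leave the loop
                break
            yield strip_terminators(raw).decode("ascii", errors="replace")
    finally:
        stream.close()  # standard close routine
```

Calling `list(read_lines("/tmp/custin.txt"))` would return the cleaned lines of the file the listing opens, ready to be split on tabs.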
Note 1

The code marked "See Note 1" simply opens the text file for input. It is exactly the same code that is generated when you use the 'Generate Delimited Update' option in the Data Dictionary Toolbox, except that instead of passing a 'w' (for write) in CDF TYPE, we pass an 'r' (for read).

Sparky sez: "Note the test for a RETURN CODE of +1. This tells us we were actually able to open the file. Anything else means a failure of some kind. It could be as simple as a misspelled file name, or a permissions problem. Remember, you are executing as user APPX, and it must have permission to access the text file."

Note 2

This next section of code actually reads the text file. We pass RT_READ_STREAM three parameters: an alpha field to contain the data, a numeric field where we indicate the maximum length we can accept, and the file pointer that was returned to us when we opened the file. The parameters must be passed in that order.

Spike sez: "Note that we initialise the alpha and numeric fields before every call to the RT_READ_STREAM routine. This is necessary for two reasons: first, if a subsequent read returns less data, the old data may still be in the alpha field; and second, RT_READ_STREAM uses the numeric field for its own nefarious purposes, so we have to set it every time. Also note our test for a RETURN CODE equal to +1. Just as with RT_OPEN_STREAM, this indicates a successful read. Anything else means we have read all the data in the file and we can quit."

Note 3

In this section, we parse the TEMP 256 alpha field to separate all the fields. Our example is written for a tab-delimited input file. The WORK ARRAY is an alpha work field, 128 characters in length with 30 occurrences. This routine will parse the fields out of TEMP 256 and return the individual fields in the WORK ARRAY field for processing by the MOVE TO OUTPUT routine.

Sparky sez: "Tab-delimited fields are the easiest to work with, and the fastest to process, as we can use the IN condition of the IF statement to look for separators.
We don't have to know how long each field is; the Tab will tell us (so long as it's within the maximum length for WORK ARRAY, we are OK). If you have to process a comma-delimited file, that will be a lot slower, as you have to examine each character to see if it is a comma and, if so, whether it is in the middle of some quotes, in which case you ignore it because it's part of the field data."

Spike sez: "The first thing we do here is to initialise our WORK ARRAY field, just in case some lines contain fewer fields than other lines. This is just good programming practice. The next thing is to strip off the trailing binary data from TEMP 256. When APPX returns the data from the text file, it includes the CR/LF characters, or just an LF character, and a null (Hex '00') at the end of each line. Normally, under Unix the file would only have an LF/Hex 00 at the end, and under Windows it would have a CR/LF/Hex 00 at the end. However, since we don't know what platform we are on, we should check for all three, and remove whatever we find. Even if we knew what platform we were on, we still wouldn't know where the file came from; we could be processing a CR/LF terminated file under Unix, or an LF terminated file under DOS/Windows."

Sparky: "Good point, Spike. We wrote these tests as three complete sections of code, although we realise that this could have been turned into another subroutine. We did it this way for clarity. The next part of the program just looks for the Tab character (Hex 09), and moves each field to a separate occurrence of WORK ARRAY. We use TEMP 512 as a scratch pad to temporarily hold the contents of TEMP 256 as we manipulate it. Note that we are 'consuming' TEMP 256 as we go, i.e., once we find the first field, we remove it from TEMP 256. When there are no more Tabs in TEMP 256, then we have parsed all the fields, and whatever is left in TEMP 256 is the last field.
This is simpler than trying to keep track of which fields have already been processed, and allows us to use the fastest technique for finding field separators (the IN condition). Also notice how we always set TEMP 512 to blank before using it in a SET TEMP statement. This ensures that data from the previous field will not carry over to the next field."

You can download the above code by clicking here for any Intel-based platform, or here for HP/AIX platforms. To install the example code, define a new application SAS/00 in the APPX System Administration files, and then create the design files. Move the downloaded files to the $APPXPATH/00/SAS/Data directory and uncompress them (PKZIP for Intel, uncompress/tar for HP/AIX).

For additional information, contact tips@cwi-appx.com
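The 'consume as you go' parsing described in Note 3 can also be sketched outside ILF. The Python below is a minimal illustration under stated assumptions, not APPX code: `parse_tab_line` is a hypothetical name, and the defaults of 30 slots of 128 characters simply mirror the WORK ARRAY definition given in Note 3. One deliberate difference from the listing is that the sketch stops splitting when it reaches the last slot, so a line with too many fields cannot run past the end of the array.

```python
def parse_tab_line(line, slots=30, width=128):
    """Split a tab-delimited line into a fixed number of blank-initialised slots."""
    fields = [""] * slots  # blank every slot first, like the WORK ARRAY loop
    index = 0
    rest = line  # plays the role of TEMP 256, consumed as we go
    while index < slots - 1:  # guard added in this sketch, not in the listing
        pos = rest.find("\t")  # same idea as the IN condition of the IF
        if pos == -1:
            break  # no more tabs: whatever is left is the last field
        fields[index] = rest[:pos][:width]  # text before the tab
        rest = rest[pos + 1:]  # drop the field and its tab
        index += 1
    fields[index] = rest[:width]
    return fields
```

For example, `parse_tab_line("1001\tAcme Supply")` returns a 30-element list whose first two entries are `"1001"` and `"Acme Supply"` and whose remaining entries are empty strings. For comma-delimited input with quoted fields, Python's standard `csv` module already implements the quoting rules, so there is no need to examine each character by hand.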