Greetings from The Data Magician!

"Read Me" for version 1.5 (Release 1.0)
Lawrence Folland
Sept. 15, 1997

This file will provide last minute notes about the status of the
software, deviations from the manual, errata, known "bugs",
enhancements, etc.

PLEASE BE SURE TO FILL OUT YOUR SOFTWARE REGISTRATION FORM SO THAT YOU
WILL BE NOTIFIED OF ANY SOFTWARE UPGRADES.

You may register on-line at http://www.folland.com/register

You may leave this program by pressing the Esc key.
You may print these notes by pressing F2 (when using the README program).

============================================================================
Version 1.5, Release 1.0 has added the following new features:

New Authorization code system:
  - there will be a single executable file that will act as a demo or
    fully functional version. If the executable is not serialized, it
    will act as a demo version and prompt you to enter an Authorization
    code. Once authorized, the serial number will be assigned. That same
    authorization code can be used to authorize further releases of
    Version 1.5, which will be available from the Folland Software
    Services FTP site at folland.com. Visit the Web site at
    http://www.folland.com for more information.

Input File Handling:
  - now handles DB/TextWorks (INMAGIC for Windows) dump files with
    multi-word field tags (eg: 'Publication Date'), plus paragraph
    markers ("<").
  - can read DB/TextWorks structure files (.DBS).
    If no extension is given when specifying an INMAGIC structure
    filename, it will try .STR, then .DBS
  - can handle "blocked" or "spanned" MARC Communications files
  - STAR input routines improved
  - has built-in support for Library Master format files
  - can read Fixed Length Fields files
  - bug in CDS/ISIS file reading fixed (when end-of-record marker ("##")
    split)

Output File Handling:
  - provides more flexibility in defining ASCII delimited files
  - has a new option for defining customized Tagged output files
  - handles DB/TextWorks multi-word field tags by putting apostrophes
    around the entire field tag, when required.
  - now provides a Global "Post" Processing codes field. These are codes
    that will be applied to each Output field *after* the Pre-processing
    Codes and the field-specific codes have been processed.

New Processing Codes:
  - Break eXtract "text" (BX "text"): extracts a piece of text from a
    field and leaves the field blank if the text is not found
  - Break Marc "..." now handles many more features, eg:
      BM "$abc"    extract data from subfields a, b and c - drop the
                   subfield codes themselves
      BM "$xyz$"   extract subfields x, y, and z and retain subfield
                   markers
      BM "$*"      extract data from all the subfields - drop the
                   subfield markers
      BM "$.ab"    extract the two indicator codes (subfield "."), plus
                   subfields "a" and "b"
      BM "$!cba"   extract data from subfields *in the order specified*
  - Conditional Lower
      CL           sets the conditional flag to be True if the current
                   field is all lowercase (or non-alphabetic)
  - Conditional Upper
      CU           sets the conditional flag to be True if the current
                   field is all uppercase (or non-alphabetic)
  - F"fieldºoptions"
    many new options available in the Field code:
      F"SUBJº(1-3)"    take occurrences 1 to 3 of the SUBJ field
      F"ISBNº[5-14]"   take characters 5 to 14 of the ISBN field
      F"AUº/O"         take the *Output* field called "AU"
      F"245º$adh"      take subfields a, d, and h from the 245 tag
                       - also handles all the options of "BM" above
  - Number Count "text"
    Replaces the contents of the current field
    with a count of the number of occurrences of the text. This would
    generally be done on a *copy* of the field.
  - Substitute command new features:
      S"ºold"new"   substitutes all occurrences of the old text *if it
                    starts at the beginning of an occurrence* with the
                    new text
      S"oldº"new"   now it must *end* the occurrence
      S"x*y"A*B"    wildcards now allowed in the replacement text. The
                    wildcards are matched one-for-one with the search
                    text. Every "*" or "?" that matches text can be
                    included in the replacement text.
                    eg: Old data: x12345y67z
                        Codes:    s"x*y??z"a*??bc"
                        New data: a1234567bc
  - Text Strings
    you can now save and recall strings of text from one field or record
    to the next:
      TS1   will store the current field contents in "box" 1
      TR1   will recall the text in "box" 1 and append it to the current
            field
      TB1   will prefix the current data with the text in "box" 1
            if-and-only-if there is already some text in the current
            field

Editor:
  Most fields will now allow you to go into a full-screen editor mode by
  pressing Shift-F7. For global processing codes, for instance, this
  means you can enter an entire screenful of codes, not just 3 lines.

  Also this is a scrolling edit window, so the size is not limited by the
  screen. When the text is too long to display on the screen, it will be
  displayed with << at the left (if your cursor is at the end), or >> if
  there is text to the right.

Command Line options:
  - /F  automatically appends unrecognized fields to both the input and
        output list of fields. This is particularly useful when
        processing MARC records and you want to generate a MARC tag. A
        code like FC will then copy the data from the new input field to
        the new output field.
  - /H or /?  provides command-line help
  - /1=text or /2=text, etc.
        Pass a string of text into the conversion from the command line.
        This text can then be extracted using the TB and TR codes
        (described above)

============================================================================
Version 1.4, Release 3.0 has added the following new features:

  - new subfield code: BX "text" - Break eXtract
    This new code works similarly to the B"text" code except that where
    B"text" will not affect the data if "text" is not found, BX"text"
    will ONLY return "text", otherwise it clears the data.

    Examples:
      DATA:   Vol. 6 (No. 3)
      CODES:  BX"(No*)"
      RESULT: (No. 3)          ! Just leaves "(No*)"

      DATA:   Vol.17
      CODES:  BX"(No*)"
      RESULT:                  ! Field left blank, since "(No*)" not there

      DATA:   Vol. 17
      CODES:  B"(No*)"
      RESULT: Vol. 17          ! No effect when "text" not found

  - the BM"xxx" (Break Marc) code has been greatly expanded in its
    flexibility. In previous releases, it was only able to extract a
    single MARC subfield (eg: BM"$a"). You may now specify all the
    subfields you wish to extract at once, including the two indicator
    codes at the beginning of a field (by specifying subfield "."). You
    may extract all subfields (except the indicator codes) by specifying
    "*". You can get all the subfields plus the indicator codes by
    specifying ".*". The Data Magician extracts the subfields in the same
    order as the original data, by default. You can force it to extract
    them in any order you wish by putting a "!" as the first "subfield".
    Normally, it will NOT include the subfield delimiters themselves when
    extracting the subfields. But, if you need them, you can include the
    delimiter by appending the subfield character itself to the end of
    the list of subfields.

    Examples:
      DATA:   10$aBullfrog Press,$bGuelph, Ontario$c1994
      CODES:  BM"$ac"
      RESULT: Bullfrog Press, 1994     ! Just take subfields a & c

      DATA:   00$aAutomobiles$xSweden$y1992
      CODES:  BM"$*"                   ! Get all subfields
      RESULT: Automobiles Sweden 1992

      DATA:   01$a782.35$bREF$c1$d1987$x39877000045617
      CODES:  BM"$!ba"                 ! Take subfield b then a
      RESULT: REF 782.35

      DATA:   01$a782.35$bREF$c1$d1987$x39877000045617
      CODES:  BM"$.abd$"               ! include delimiters
      RESULT: 01$a782.35$bREF$d1987

  - the F"field" code has been greatly expanded. This is the key code to
    move data from the Input fields to the Output fields.

    (NOTE to NEW users -> one of the most frequently asked questions I
    get from novice users is why they don't seem to get any output when
    they run their conversion. You MUST specify in your Output fields
    where to get the data FROM! This is usually from one of your Input
    fields, and you get that with this F"field" code).

    The original code only allowed you to specify the name of the input
    field you wanted to get. It then extracted the entire Input field,
    including all repeat occurrences of the field. The new version allows
    you to specify a range of multiple occurrences of the field
    (eg: "(1-5)"), a range of characters to extract (from every
    occurrence) (eg: "[10-17]"), draw the data from an OUTPUT field
    (eg: "/o") and specify which MARC subfields you want included
    (allowing the same features as the new BM code, described above).
    To separate the field name from these special instructions, you are
    required to put The Data Magician's field separator character ("º",
    which you can get by pressing F6 in DM) after the field name, but
    before the special options.

    Examples:
      Input Field
      ("AU")   DATA:   Smith, JoshuaºWatson, WilliamºDavis, Vernon
               CODES:  F"AU"      ! Take all the data from the AU field, above
               RESULT: Smith, JoshuaºWatson, WilliamºDavis, Vernon

      ("AU")   DATA:   Smith, JoshuaºWatson, WilliamºDavis, Vernon
               CODES:  F"AUº(2-)" ! Take 2nd and subsequent authors
               RESULT: Watson, WilliamºDavis, Vernon

      ("008")  DATA:   940425s1989 cau a b eng
               CODES:  F"008º[8-11]"   ! Get 2nd date
               RESULT: 1989

      ("245")  DATA:   14$aSpace:$bthe final frontier$h[Video]
               CODES:  F"245º$ab" ! Get title and subtitle from 245
               RESULT: Space: the final frontier

      ("100")  DATA:   10$aSmith$hJoshuaº10$aWatson$hWilliamº$aDavis $hVernon
               CODES:  F"100º(1-2)$!ha"
                       ! Just take first two authors, subfield h, then a
               RESULT: Joshua SmithºWilliam Watson

  - another new processing code was added to make the Break Marc code
    (described above) more useful. The new code is SM"text" - Substitute
    Marc delimiters. This allows you to change the character(s)
    separating the subfields extracted with BM from the default space to
    whatever you wish to specify. This also works with extracting MARC
    subfields with the F"field" code.

    Examples:
               DATA:   00$aAutomobiles$zSweden$y1992
               CODES:  SM" -- " BM"$*"
                       ! Get all subfields, separate with " -- "
               RESULT: Automobiles -- Sweden -- 1992

      ("090")  DATA:   00$aREF$b45$clef$d1989
               CODES:  sm"/" f"090º$*"
                       ! Get 090 field, all subfields, separate with a "/"
               RESULT: REF/45/lef/1989

  - there is a new command line option "/F" which directs The Data
    Magician to automatically add any new fields encountered in the Input
    File to the list of Input and Output Fields. This is particularly
    useful when copying between two similar formats and where you have
    instructed The Data Magician to copy any fields from the Input Fields
    to the Output Fields with the same name. This is done in the Output
    Global Processing Codes with the "FC" code.

    For instance, if you wanted to convert MARC records from one format
    (eg: MicroLIF) into another (eg: MARC Communications), and you want
    to make sure it will include all fields in the Input file, set up the
    conversion settings with any fields you want to do some specific
    processing (eg: 090), but do not include all the other MARC fields.
    Then, start The Data Magician: DATAMAGE /F. When the conversion is
    performed, unknown fields are automatically added to the Input Fields
    and processed into the Output file.
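
    For readers who find it easier to follow the Break Marc rules as
    code, here is a rough sketch in Python. It is NOT The Data Magician's
    own code, only an illustration of the rules described in these notes.
    The function name break_marc and its "sep" argument (standing in for
    the SM code) are made up for this example. How the indicator codes
    are joined when delimiters are dropped, and how "!" combines with
    "*", are assumptions.

```python
def break_marc(data, spec, sep=" "):
    """Hypothetical sketch of the BM"$..." rules described in these notes.

    data -- one MARC field: two indicator characters, then $-coded subfields
    spec -- the text inside BM"...", e.g. "$ac", "$*", "$!ba", "$.abd$"
    sep  -- separator between extracted subfields (what SM"text" changes)
    """
    if not spec.startswith("$"):
        raise ValueError("spec must start with '$'")
    codes = spec[1:]

    # A trailing "$" means: keep the subfield delimiters in the result.
    keep_delims = len(codes) > 1 and codes.endswith("$")
    if keep_delims:
        codes = codes[:-1]

    # A leading "!" means: output in the order listed, not data order.
    forced_order = codes.startswith("!")
    if forced_order:
        codes = codes[1:]

    # Split into the indicator pair (treated as subfield ".") and subfields.
    head, *parts = data.split("$")
    subfields = [(".", head)] + [(p[0], p[1:]) for p in parts if p]

    if forced_order:
        # Assumption: "*" is not expanded when an explicit order is forced.
        picked = [(c, t) for want in codes for c, t in subfields if c == want]
    else:
        # "*" selects every subfield except the indicators.
        picked = [(c, t) for c, t in subfields
                  if c in codes or ("*" in codes and c != ".")]

    if keep_delims:
        return "".join(t if c == "." else "$" + c + t for c, t in picked)
    return sep.join(t for _, t in picked)


if __name__ == "__main__":
    field = "01$a782.35$bREF$c1$d1987$x39877000045617"
    print(break_marc(field, "$!ba"))     # REF 782.35
    print(break_marc(field, "$.abd$"))   # 01$a782.35$bREF$d1987
    print(break_marc("00$aAutomobiles$xSweden$y1992", "$*", sep=" -- "))
```

    The sample calls reuse the DATA and CODES lines from the examples
    above, so the printed values can be compared with the RESULT lines.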
  - when you are test-reading a record at a time with the F3 key, and it
    encounters an unknown field, you now have the option of [A]ppending
    it to the end of the Input Fields, [I]nserting it in a position after
    the last recognized field was found, [S]kipping that field, or
    [Q]uitting reading the current record (especially useful if you have
    not correctly specified how to recognize the start and end of a
    record). When adding new MARC tags, it will add them in the correct
    numerical sequence, if you choose Insert.

  - the settings files have all been updated to use these new codes,
    especially the MARC conversions. There may be some differences in the
    results using the new settings compared to the old, but generally the
    new settings are more robust and will include more of the data.

Bug Fixes in Version 1.4, Release 3.0:

  - Release 2.0 started in monochrome. Release 3.0 will start in colour,
    if detected.
  - the screen display was getting messed up if you tried to change drive
    and directory with the "=" sign when saving or loading settings. This
    has been fixed.
  - fixed problems with time estimation when a conversion would run past
    midnight, or start at a record beyond the first.
  - fixed a problem reading command line options with spaces
  - fixed display of MARC records when [V]iewing output during a
    conversion. The control characters for End-of-Record, End-of-Field,
    and the subfield delimiter are now shown as <029>, <030>, <031>,
    respectively.

========================================================================
Version 1.4, Release 2.0 added the following features:

  - a new input/output format has been added. Type 8 now refers to files
    created by or for Library Master from Balboa Software. The format is
    similar to other tagged files (eg: INMAGIC and STAR) but has some
    special features, especially a Record Type as the first field in
    curly brackets {}. Each field name is surrounded by square
    brackets [].
    The Library Master manual should be referenced for more detail on the
    Tagged format for importing and exporting. A sample file has been
    included called MARC2LM.SET to convert MARC records to the BIBLIO1
    data structure. When defining settings, you may specify a Library
    Master data structure (.STR), and The Data Magician will
    automatically read in all the valid field names for that file. The
    field name RECTYPE ***MUST*** be the first field name in an output
    file and should be filled with a valid Record Type for the file you
    are trying to create (eg: BOOK, BOOK IN SERIES, etc.).

  - When specifying search strings, you may now specify that the string
    MUST appear at the beginning and/or end of a field (or sub-entry) by
    including the double-bar (º) in the search string. Remember that you
    can use the F6 key to insert the double-bar symbol. For example, to
    remove the word "The" only at the beginning of a field, you could
    give the following codes:

      s"ºThe ""     (which means "substitute the characters 'The ' with
                    nothing, IF at the beginning of the field")

    Likewise, you may specify that the text must be at the end by giving
    a command like:

      c"1992º"      (which means "if 1992 appears at the end of the
                    field")

    Finally, you can combine the two, by including the double-bar at
    BOTH ends:

      s"º(19*)º""   (which would delete the entire field (or sub-entry)
                    if it starts with "(19" and ends with a ")" )

    Note that wildcards can be in the search string.

  - The ability to read "blocked" MARC Communications format files has
    been added. These are files, usually from larger library systems,
    like Dynix, where the records are written onto tape in fixed-length
    blocks. These blocks are usually in increments of 512 bytes (eg: 512,
    1024, 2048, 4096, etc.). Most MARC Communications format files are
    NOT in this format. When reading type 5 (MARC Communications format),
    you will be prompted to enter the "MARC Block Length", with a default
    of 0. Leave it with the value 0 unless you are sure otherwise.
    If you give the incorrect value, you may get error messages like
    "Invalid MARC Communications Format", or even System Error Messages.
    Try changing the value back to 0, or find out if the data really is
    blocked. We would be glad to offer any help with this, if you have
    any questions.

  - The Data Magician now converts any special codes entered with angle
    brackets (eg: <010>) back to the angle bracket notation when it saves
    settings files. In the past, it did not and that caused problems with
    special codes like end-of-line codes (ie: <010>, <013>) and possibly
    others, when trying to reload those settings. This should not
    otherwise affect the loading or saving of settings files from earlier
    versions.

  - Standard errors (eg: Field name not recognized) are now written to
    the log file (if you have specified one) as the conversion
    progresses. This may make the log file a little more useful in
    diagnosing any problems with a conversion.

=========================================================================
Version 1.4, Release 1.1 fixes a problem reading STAR records (Input
Type 6) that existed in Release 1.0.

Here are a couple of the highlights of version 1.4:

  - processing codes XP, XE and XB. These are all designed to remove
    punctuation from the End or Beginning of a field.
      XP   removes a fixed list of punctuation symbols from the end of a
           field (ie: /-,.space)
      XE   allows you to specify exactly which ones to strip (eg:
           XE ":,; " would remove any colons, commas, semi-colons, or
           spaces it found at the end of a field, in any order).
      XB   does the same except at the beginning of a field.

  - version 1.4 will now allow you to Store and Recall particular events
    (conditions) that have occurred. You may set a conditional processing
    flag with CE (Conditional Empty), or C"text" (look if "text" is
    there). These were both already available in version 1.3. You may now
    Store the current condition with CSn where "n" is replaced by a
    number from 1 to 99.
    Later you may recall that condition with CRn where "n" would be the
    same number that you stored before.

  - All settings files have been revised to include the new processing
    codes.
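
As a rough illustration of the XP, XE and XB codes described in the
version 1.4 highlights above, the following Python sketch shows the kind
of stripping they perform. It is not The Data Magician's own code. The
function names xp, xe and xb are made up for this example, and it assumes
that stripping continues until a character outside the list is reached.

```python
# Fixed list used by XP, as given in the notes: / - , . and space.
XP_CHARS = "/-,. "


def xp(field):
    """XP: strip the fixed punctuation list from the end of the field."""
    return field.rstrip(XP_CHARS)


def xe(field, chars):
    """XE "chars": strip any of the listed characters from the end."""
    return field.rstrip(chars)


def xb(field, chars):
    """XB "chars": strip any of the listed characters from the beginning."""
    return field.lstrip(chars)


if __name__ == "__main__":
    print(xp("Bullfrog Press, "))        # Bullfrog Press
    print(xe("Guelph, Ontario :; ", ":,; "))  # Guelph, Ontario
    print(xb("-- Automobiles", "- "))    # Automobiles
```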