I be taught issues like most of you:
- studying (blogs!)
- doing the factor
- and probably by doing the factor I examine in a weblog
I’ve been attempting to maintain up with AI Vector Search in 23ai (sure, I’m however a mere human), and one of many new ideas to me, is Hierarchical Navigable Small World (HNSW) Indexes.
The weblog I’m presently studying is from our product supervisor Andy Rivenes, Utilizing HNSW Vector Indexes in AI Vector Search.
I’m hoping to have the ability to perceive higher what my execution plans are telling me after I begin evaluating vectors utilizing these new kinds of indexes, for instance –

Andy’s HNSW submit is utilizing this knowledge set, let’s load it!
Chicago’s Knowledge Portal has a Crimes 2001-Current knowledge set, and I’m going to obtain the CSV model of that.
As soon as it’s downloaded, right here’s the overall take I do.
Spark up SQLcl, use the LOAD command to generate some DDL
I’m going to make use of the ‘CD’ command to leap to my downloads folder, after which I’m going to make use of the LOAD command with the ‘SHOW’ key phrase to only ‘present me’ what the LOAD command would do with this new CSV knowledge if it have been to be was a desk.
Be aware I’ve performed zero setup, that is simply my ‘hey, perhaps this will likely be ok, try.’
SQL> cd /customers/thatjeffsmith/downloads
SQL> load crime_data chicago-crimes.csv present
csv
column_names on
delimiter ,
enclosures ""
double
encoding UTF8
row_limit off
row_terminator default
skip_rows 0
skip_after_names
Present DDL for desk HR.CRIME_DATA
#INFO COLUMN 2: Case Quantity => CASE_NUMBER
#INFO COLUMN 3: Date => DATE$
#INFO COLUMN 4: Block => BLOCK
#INFO COLUMN 6: Main Sort => PRIMARY_TYPE
#INFO COLUMN 7: Description => DESCRIPTION
#INFO COLUMN 8: Location Description => LOCATION_DESCRIPTION
#INFO COLUMN 9: Arrest => ARREST
#INFO COLUMN 10: Home => DOMESTIC
#INFO COLUMN 11: Beat => BEAT
#INFO COLUMN 12: District => DISTRICT
#INFO COLUMN 13: Ward => WARD
#INFO COLUMN 14: Group Space => COMMUNITY_AREA
#INFO COLUMN 15: FBI Code => FBI_CODE
#INFO COLUMN 16: X Coordinate => X_COORDINATE
#INFO COLUMN 17: Y Coordinate => Y_COORDINATE
#INFO COLUMN 18: Yr => YEAR
#INFO COLUMN 19: Up to date On => UPDATED_ON
#INFO COLUMN 20: Latitude => LATITUDE
#INFO COLUMN 21: Longitude => LONGITUDE
#INFO COLUMN 22: Location => LOCATION
CREATE TABLE HR.CRIME_DATA
(
ID NUMBER(10),
CASE_NUMBER VARCHAR2(26),
DATE$ VARCHAR2(128),
BLOCK VARCHAR2(128),
IUCR VARCHAR2(26),
PRIMARY_TYPE VARCHAR2(128),
DESCRIPTION VARCHAR2(128),
LOCATION_DESCRIPTION VARCHAR2(128),
ARREST VARCHAR2(26),
DOMESTIC VARCHAR2(26),
BEAT VARCHAR2(26),
DISTRICT VARCHAR2(26),
WARD NUMBER(4),
COMMUNITY_AREA NUMBER(4),
FBI_CODE VARCHAR2(26),
X_COORDINATE NUMBER(9),
Y_COORDINATE NUMBER(9),
YEAR NUMBER(6),
UPDATED_ON VARCHAR2(128),
LATITUDE NUMBER(13, 9),
LONGITUDE NUMBER(13, 9),
LOCATION VARCHAR2(128)
)
;
SUCCESS: Processed with out errors
SQL>
🚩🚩🚩Purple Flag: DATES as VARCHARs
DATE$ VARCHAR2(128) -- ew, no no no
Why is that this a pink flag? You’ll wanting to question these dates as dates, and to take action you’ll want to make use of a perform to transform them, that’s each ugly and gradual(er) than if it have been simply saved correctly, a DATE.
To repair this, we have to inform the LOAD command what date format to anticipate when streaming the information via from the file to the desk.
Let’s ‘peek’ on the knowledge

Alright, now I understand how Chicago’s knowledge portal is selecting to speak when a criminal offense has occurred, now I would like to inform SQLcl that.
present load and set load
‘present load’ will inform me what the present settings are for doing a LOAD job.
SQL> present load
batch_rows 50
batches_per_commit 10
clean_names remodel
column_size rounded
commit on
date_format
errors 50
map_column_names off
technique insert
timestamp_format
timestamptz_format
locale
scan_rows 100
truncate off
unknown_columns_fail on
SQL> set load date_format MM/DD/YYYY HH:MI:SS AM
Now that I’ve performed this, let’s strive the LOAD command, once more.

I hate that new column identify, however altering column names is trivial, let’s load that sucker!
Creating/Loading the brand new desk
On the load command, as an alternative of utilizing ‘present’ key phrase on the top, I merely use ‘new.’
SQL> load crime_data chicago-crimes.csv new
csv
column_names on
delimiter ,
enclosures ""
double off
encoding UTF8
row_limit off
row_terminator default
skip_rows 0
skip_after_names
Create new desk and load knowledge into desk HR.CRIME_DATA
batch_rows 50
batches_per_commit 10
clean_names remodel
column_size rounded
commit on
date_format MM/DD/YYYY HH:MI:SS AM
errors 50
map_column_names off
technique insert
timestamp_format
timestamptz_format
locale English United States
scan_rows 100
truncate off
unknown_columns_fail on
#INFO COLUMN 2: Case Quantity => CASE_NUMBER
#INFO COLUMN 3: Date => DATE$
#INFO COLUMN 4: Block => BLOCK
#INFO COLUMN 6: Main Sort => PRIMARY_TYPE
#INFO COLUMN 7: Description => DESCRIPTION
#INFO COLUMN 8: Location Description => LOCATION_DESCRIPTION
#INFO COLUMN 9: Arrest => ARREST
#INFO COLUMN 10: Home => DOMESTIC
#INFO COLUMN 11: Beat => BEAT
#INFO COLUMN 12: District => DISTRICT
#INFO COLUMN 13: Ward => WARD
#INFO COLUMN 14: Group Space => COMMUNITY_AREA
#INFO COLUMN 15: FBI Code => FBI_CODE
#INFO COLUMN 16: X Coordinate => X_COORDINATE
#INFO COLUMN 17: Y Coordinate => Y_COORDINATE
#INFO COLUMN 18: Yr => YEAR
#INFO COLUMN 19: Up to date On => UPDATED_ON
#INFO COLUMN 20: Latitude => LATITUDE
#INFO COLUMN 21: Longitude => LONGITUDE
#INFO COLUMN 22: Location => LOCATION
#INFO DATE format detected: MM/DD/YYYY HH:MI:SS AM
CREATE TABLE HR.CRIME_DATA
(
ID NUMBER(10),
CASE_NUMBER VARCHAR2(26),
DATE$ DATE,
BLOCK VARCHAR2(128),
IUCR VARCHAR2(26),
PRIMARY_TYPE VARCHAR2(128),
DESCRIPTION VARCHAR2(128),
LOCATION_DESCRIPTION VARCHAR2(128),
ARREST VARCHAR2(26),
DOMESTIC VARCHAR2(26),
BEAT VARCHAR2(26),
DISTRICT VARCHAR2(26),
WARD NUMBER(4),
COMMUNITY_AREA NUMBER(4),
FBI_CODE VARCHAR2(26),
X_COORDINATE NUMBER(9),
Y_COORDINATE NUMBER(9),
YEAR NUMBER(6),
UPDATED_ON DATE,
LATITUDE NUMBER(13, 9),
LONGITUDE NUMBER(13, 9),
LOCATION VARCHAR2(128)
)
;
#INFO Desk created
#INFO Variety of rows processed: 28,268
#INFO Variety of rows in error: 0
#INFO Final row processed in remaining dedicated batch: 28,268
SUCCESS: Processed with out errors
SQL>
And is that the anticipated outcome? Let’s test the data loaded vs the file.

Yup, that tracks, I’ve an additional data, being the CSV header, which doesn’t get loaded as a row.
Let’s take a look at the desk
However first, let’s acquire the stats.

INFO+ is like DESC, however on steroids – the great variety! It exhibits the desk definition, however with statistics thrown in.
Let’s rename that column
I can use VS Code and our SQL Developer Extension for that.

Oh, my date format is off, let’s repair that too.

And now that I’ve obtained the information, I can begin taking part in AND studying from Andy’s getting began with AI Vector Search submit.

For those who made it this far…
You is perhaps occupied with seeing my different posts the place I speak about how you can load knowledge from CSV to your Oracle Database tables