Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks | Amazon Web Services

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. When working on ML tasks, data scientists typically start by discovering relevant data sources and connecting to them. They then use SQL to explore, analyze, visualize, and integrate data from various sources before using it in ML training and inference. Previously, data scientists often found themselves juggling multiple tools to support SQL in their workflow, which hindered productivity.

We're excited to announce that JupyterLab notebooks in SageMaker Studio now come with built-in support for SQL. Data scientists can now:

  • Connect to popular data services including Amazon Athena, Amazon Redshift, Amazon DataZone, and Snowflake directly within the notebooks
  • Browse and search for databases, schemas, tables, and views, and preview data within the notebook interface
  • Mix SQL and Python code in the same notebook for efficient exploration and transformation of data for use in ML projects
  • Use developer productivity features such as SQL command completion, code formatting assistance, and syntax highlighting to accelerate code development and improve overall developer productivity

In addition, administrators can securely manage connections to these data services, allowing data scientists to access authorized data without managing credentials manually.

In this post, we guide you through setting up this feature in SageMaker Studio and walk you through its capabilities. Then we show how you can enhance the in-notebook SQL experience using Text-to-SQL capabilities provided by advanced large language models (LLMs) to write complex SQL queries from natural language text. Finally, to enable a broader audience of users to generate SQL queries from natural language input in their notebooks, we show you how to deploy these Text-to-SQL models using Amazon SageMaker endpoints.

Solution overview

With the SQL integration in SageMaker Studio JupyterLab notebooks, you can connect to popular data sources such as Snowflake, Athena, Amazon Redshift, and Amazon DataZone. This new feature enables you to perform various functions.

For example, you can visually explore data sources such as databases, tables, and schemas directly from your JupyterLab ecosystem. If your notebook environments are running on SageMaker Distribution 1.6 or higher, look for a new widget on the left side of your JupyterLab interface. This addition enhances data accessibility and management within your development environment.

If you're not on the suggested SageMaker Distribution (that is, you're on 1.5 or lower) or you're in a custom environment, refer to the appendix for more information.

After you have set up connections (illustrated in the next section), you can list data connections, browse databases and tables, and inspect schemas.

The SageMaker Studio JupyterLab built-in SQL extension also enables you to run SQL queries directly from a notebook. Jupyter notebooks differentiate between SQL and Python code using the %%sm_sql magic command, which must be placed at the top of any cell that contains SQL code. This command signals to JupyterLab that the instructions that follow are SQL commands rather than Python code. The output of a query can be displayed directly within the notebook, which makes it straightforward to combine SQL and Python workflows in your data analysis.

The output of a query can be displayed visually as an HTML table, as shown in the following screenshot.

It can also be written to a pandas DataFrame.

Prerequisites

Make sure you satisfy the following prerequisites to use the SageMaker Studio notebook SQL experience:

  • SageMaker Studio V2 – Make sure you're running the most up-to-date version of your SageMaker Studio domain and user profiles. If you're currently on SageMaker Studio Classic, refer to Migrating from Amazon SageMaker Studio Classic.
  • IAM role – SageMaker requires an AWS Identity and Access Management (IAM) role to be assigned to a SageMaker Studio domain or user profile to manage permissions effectively. An execution role update may be required to enable the data browsing and SQL run features. The following example policy enables users to grant, list, and run AWS Glue, Athena, Amazon Simple Storage Service (Amazon S3), AWS Secrets Manager, and Amazon Redshift resources:
    {
       "Version":"2012-10-17",
       "Statement":[
          {
             "Sid":"SQLRelatedS3Permissions",
             "Effect":"Allow",
             "Action":[
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload",
                "s3:PutObject"
             ],
             "Resource":[
                "arn:aws:s3:::sagemaker*/*",
                "arn:aws:s3:::sagemaker*"
             ]
          },
          {
             "Sid":"GlueDataAccess",
             "Effect":"Allow",
             "Action":[
                "glue:GetDatabases",
                "glue:GetSchema",
                "glue:GetTables",
                "glue:GetDatabase",
                "glue:GetTable",
                "glue:ListSchemas",
                "glue:GetPartitions",
                "glue:GetConnections",
                "glue:GetConnection",
                "glue:CreateConnection"
             ],
             "Resource":[
                "arn:aws:glue:<region>:<account>:table/sagemaker*/*",
                "arn:aws:glue:<region>:<account>:database/sagemaker*",
                "arn:aws:glue:<region>:<account>:schema/sagemaker*",
                "arn:aws:glue:<region>:<account>:connection/sagemaker*",
                "arn:aws:glue:<region>:<account>:registry/sagemaker*",
                "arn:aws:glue:<region>:<account>:catalog"
             ]
          },
          {
             "Sid":"AthenaQueryExecution",
             "Effect":"Allow",
             "Action":[
                "athena:ListDataCatalogs",
                "athena:ListDatabases",
                "athena:ListTableMetadata",
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:RunQuery",
                "athena:StartSession",
                "athena:GetQueryResults",
                "athena:ListWorkGroups",
                "athena:GetDataCatalog",
                "athena:GetWorkGroup"
             ],
             "Resource":[
                "arn:aws:athena:<region>:<account>:workgroup/sagemaker*",
                "arn:aws:athena:<region>:<account>:datacatalog/sagemaker*"
             ]
          },
          {
             "Sid":"GetSecretsAndCredentials",
             "Effect":"Allow",
             "Action":[
                "secretsmanager:GetSecretValue",
                "redshift:GetClusterCredentials"
             ],
             "Resource":[
                "arn:aws:secretsmanager:<region>:<account>:secret:sagemaker*",
                "arn:aws:redshift:<region>:<account>:dbuser:sagemaker*/sagemaker*",
                "arn:aws:redshift:<region>:<account>:dbgroup:sagemaker*/sagemaker*",
                "arn:aws:redshift:<region>:<account>:dbname:sagemaker*/sagemaker*"
             ]
          }
       ]
    }

  • JupyterLab Space – You need access to the updated SageMaker Studio and a JupyterLab Space with SageMaker Distribution v1.6 or later image versions. If you're using custom images for JupyterLab Spaces or older versions of SageMaker Distribution (v1.5 or lower), refer to the appendix for instructions to install the packages needed to enable this feature in your environments. To learn more about SageMaker Studio JupyterLab Spaces, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools.
  • Data source access credentials – This SageMaker Studio notebook feature requires user name and password access to data sources such as Snowflake and Amazon Redshift. Create user name and password-based access to these data sources if you don't already have it. OAuth-based access to Snowflake is not supported as of this writing.
  • Load SQL magic – Before you run SQL queries from a Jupyter notebook cell, you must load the SQL magics extension. Use the command %load_ext amazon_sagemaker_sql_magic to enable this feature. Additionally, you can run the %sm_sql? command to view a comprehensive list of supported options for querying from a SQL cell. These options include setting a default query limit of 1,000, running a full extraction, and injecting query parameters, among others. This setup allows for flexible and efficient SQL data manipulation directly within your notebook environment.
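
The example policy above leaves `<region>` and `<account>` placeholders in its resource ARNs. As a minimal sketch, assuming you keep the policy as text, the following hypothetical helper `render_policy` (not part of any AWS tool) substitutes those placeholders and checks that the result still parses as JSON. The policy fragment is shortened, and the Region and account ID are illustrative values only.

```python
import json
import re

# A shortened policy fragment using the same placeholder style as the example
# policy above; in practice you would paste the full policy text here.
POLICY_TEMPLATE = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetSecretsAndCredentials",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": ["arn:aws:secretsmanager:<region>:<account>:secret:sagemaker*"]
    }
  ]
}
"""


def render_policy(template: str, region: str, account: str) -> dict:
    """Replace <region>/<account> placeholders and parse the policy as JSON."""
    rendered = template.replace("<region>", region).replace("<account>", account)
    leftover = re.findall(r"<[a-z_]+>", rendered)
    if leftover:
        raise ValueError(f"unresolved placeholders: {sorted(set(leftover))}")
    return json.loads(rendered)


# Illustrative values only; substitute your own Region and 12-digit account ID.
policy = render_policy(POLICY_TEMPLATE, region="us-east-2", account="123456789012")
print(policy["Statement"][0]["Resource"][0])
```

Parsing the rendered text before attaching it to a role catches a stray comma or an unreplaced placeholder early.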

Create database connections

The built-in SQL browsing and execution capabilities of SageMaker Studio are powered by AWS Glue connections. An AWS Glue connection is an AWS Glue Data Catalog object that stores essential data such as login credentials, URI strings, and virtual private cloud (VPC) information for specific data stores. These connections are used by AWS Glue crawlers, jobs, and development endpoints to access various types of data stores. You can use these connections for both source and target data, and even reuse the same connection across multiple crawlers or extract, transform, and load (ETL) jobs.

To explore SQL data sources in the left pane of SageMaker Studio, you first need to create AWS Glue connection objects. These connections provide access to different data sources and let you explore their schema elements.

In the following sections, we walk through the process of creating SQL-specific AWS Glue connectors. This enables you to access, view, and explore datasets across a variety of data stores. For more details about AWS Glue connections, refer to Connecting to data.

Create an AWS Glue connection

The only way to bring data sources into SageMaker Studio is with AWS Glue connections. You need to create AWS Glue connections with specific connection types. As of this writing, the only supported mechanism for creating these connections is the AWS Command Line Interface (AWS CLI).

Connection definition JSON file

When connecting to different data sources in AWS Glue, you must first create a JSON file that defines the connection properties, referred to as the connection definition file. This file is required to establish an AWS Glue connection and should detail all the configurations needed to access the data source. As a security best practice, use Secrets Manager to store sensitive information such as passwords. Other connection properties can be managed directly through AWS Glue connections. This approach keeps sensitive credentials protected while the rest of the connection configuration stays accessible and manageable.

The following is an example of a connection definition JSON:

{
    "ConnectionInput": {
        "Name": <GLUE_CONNECTION_NAME>,
        "Description": <GLUE_CONNECTION_DESCRIPTION>,
        "ConnectionType": "REDSHIFT | SNOWFLAKE | ATHENA",
        "ConnectionProperties": {
            "PythonProperties": "{\"aws_secret_arn\": <SECRET_ARN>, \"database\": <...>}"
        }
    }
}

When setting up AWS Glue connections for your data sources, follow these guidelines to provide both functionality and security:

  • Stringification of properties – Within the PythonProperties key, make sure all properties are stringified key-value pairs. You must escape double quotes with the backslash (\) character where necessary. This keeps the format correct and avoids syntax errors in your JSON.
  • Handling sensitive information – Although it's possible to include all connection properties within PythonProperties, don't include sensitive details such as passwords directly in these properties. Instead, use Secrets Manager to handle sensitive information. This approach secures your sensitive data by storing it in a controlled and encrypted environment, away from the main configuration files.
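
Escaping the inner quotes by hand is easy to get wrong. One way to avoid it, sketched here under the assumption that you generate the definition file from Python, is to serialize the properties with `json.dumps` and then serialize the enclosing document a second time. The helper name `build_connection_definition`, the connection name, and the database name are hypothetical; the ARN is left as a placeholder.

```python
import json


def build_connection_definition(name, description, connection_type, properties):
    """Return a Glue connection definition whose PythonProperties is a JSON string."""
    return {
        "ConnectionInput": {
            "Name": name,
            "Description": description,
            "ConnectionType": connection_type,
            "ConnectionProperties": {
                # First json.dumps: turn the properties dict into a string value.
                "PythonProperties": json.dumps(properties)
            },
        }
    }


definition = build_connection_definition(
    name="example-connection",  # hypothetical name
    description="example description",
    connection_type="SNOWFLAKE",
    properties={
        # Placeholder ARN; point this at your own Secrets Manager secret.
        "aws_secret_arn": "arn:aws:secretsmanager:<region>:<account>:secret:<name>",
        "database": "EXAMPLE_DB",
    },
)

# Second json.dumps: serializing the whole document escapes the inner quotes as \".
document = json.dumps(definition, indent=4)
print(document)
```

Because the password lives in the referenced secret, nothing sensitive ends up in the generated file.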

Create an AWS Glue connection using the AWS CLI

After you include all the necessary fields in your connection definition JSON file, you're ready to establish an AWS Glue connection for your data source using the AWS CLI and the following command:

aws --region <REGION> glue create-connection \
--cli-input-json file:///path/to/file/connection/definition/file.json

This command creates a new AWS Glue connection based on the specifications in your JSON file. The following is a quick breakdown of the command components:

  • --region <REGION> – This specifies the AWS Region where your AWS Glue connection will be created. Select the Region where your data sources and other services are located to minimize latency and comply with data residency requirements.
  • --cli-input-json file:///path/to/file/connection/definition/file.json – This parameter directs the AWS CLI to read the input configuration from a local file that contains your connection definition in JSON format.

You should be able to create AWS Glue connections with the preceding AWS CLI command from your Studio JupyterLab terminal. On the File menu, choose New and Terminal.

If the create-connection command runs successfully, you should see your data source listed in the SQL browser pane. If you don't see your data source listed, choose Refresh to update the cache.
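
If you prefer to drive the same CLI call from a notebook cell, the following is a minimal sketch rather than an official workflow. It writes a definition to a temporary file and assembles the `aws glue create-connection` arguments shown earlier. The helper names and the `dry_run` flag are hypothetical, and the definition passed at the bottom is a stub for illustration.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def build_create_connection_command(region: str, definition_path: Path) -> list[str]:
    """Build the argument list for `aws glue create-connection`."""
    return [
        "aws", "--region", region,
        "glue", "create-connection",
        # as_uri() needs an absolute path and yields a file:/// URI.
        "--cli-input-json", definition_path.resolve().as_uri(),
    ]


def create_connection(region: str, definition: dict, dry_run: bool = True) -> list[str]:
    """Write the definition to a temp file and (optionally) invoke the AWS CLI."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(definition, handle)
    command = build_create_connection_command(region, Path(handle.name))
    if not dry_run:
        # Requires the AWS CLI and credentials that allow glue:CreateConnection.
        subprocess.run(command, check=True)
    return command


# Stub definition for illustration; dry_run=True only prints the command.
command = create_connection("us-east-2", {"ConnectionInput": {"Name": "example"}})
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception if the CLI reports a failure.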

Create a Snowflake connection

In this section, we focus on integrating a Snowflake data source with SageMaker Studio. Creating Snowflake accounts, databases, and warehouses falls outside the scope of this post. To get started with Snowflake, refer to the Snowflake User Guide. In this post, we concentrate on creating a Snowflake definition JSON file and establishing a Snowflake data source connection using AWS Glue.

Create a Secrets Manager secret

You can connect to your Snowflake account either with a user ID and password or with private keys. To connect with a user ID and password, you need to store your credentials securely in Secrets Manager. As mentioned previously, although it's possible to embed this information under PythonProperties, storing sensitive information in plain text is not recommended. Always make sure that sensitive data is handled securely to avoid potential security risks.

To store information in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For Secret type, choose Other type of secret.
  3. For the key-value pair, choose Plaintext and enter the following:
    {
        "user":"TestUser",
        "password":"MyTestPassword",
        "account":"AWSSAGEMAKERTEST"
    }

  4. Enter a name for your secret, such as sm-sql-snowflake-secret.
  5. Leave the other settings as default or customize them if required.
  6. Create the secret.
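
Before pasting the plaintext value into the console, it can help to generate and check it programmatically. The sketch below assumes the three keys shown in step 3 (`user`, `password`, `account`) are the ones the connection expects. The helper names are hypothetical, and the credentials are the sample values from this post.

```python
import json

REQUIRED_SNOWFLAKE_KEYS = {"user", "password", "account"}


def build_snowflake_secret(user: str, password: str, account: str) -> str:
    """Return the plaintext JSON payload to paste into Secrets Manager."""
    payload = {"user": user, "password": password, "account": account}
    empty = sorted(key for key, value in payload.items() if not value)
    if empty:
        raise ValueError(f"empty secret fields: {empty}")
    return json.dumps(payload, indent=4)


def validate_secret(secret_string: str) -> None:
    """Raise if a secret string lacks any of the expected keys."""
    missing = REQUIRED_SNOWFLAKE_KEYS - set(json.loads(secret_string))
    if missing:
        raise KeyError(f"secret is missing keys: {sorted(missing)}")


secret_string = build_snowflake_secret("TestUser", "MyTestPassword", "AWSSAGEMAKERTEST")
validate_secret(secret_string)
print(secret_string)
```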

Create an AWS Glue connection for Snowflake

As discussed earlier, AWS Glue connections are required to access any connection from SageMaker Studio. You can find a list of all supported connection properties for Snowflake. The following is a sample connection definition JSON for Snowflake. Replace the placeholder values with the appropriate values before saving it to disk:

{
    "ConnectionInput": {
        "Name": "Snowflake-Airlines-Dataset",
        "Description": "SageMaker-Snowflake Airlines Dataset",
        "ConnectionType": "SNOWFLAKE",
        "ConnectionProperties": {
            "PythonProperties": "{\"aws_secret_arn\": \"arn:aws:secretsmanager:<region>:<account>:secret:sm-sql-snowflake-secret\", \"database\": \"SAGEMAKERDEMODATABASE1\"}"
        }
    }
}

To create an AWS Glue connection object for the Snowflake data source, use the following command:

aws --region <REGION> glue create-connection \
--cli-input-json file:///path/to/file/snowflake/definition/file.json

This command creates a new, browsable Snowflake data source connection in your SQL browser pane, and you can run SQL queries against it from your JupyterLab notebook cell.

Create an Amazon Redshift connection

Amazon Redshift is a fully managed, petabyte-scale data warehouse service that simplifies and reduces the cost of analyzing all your data using standard SQL. The procedure for creating an Amazon Redshift connection closely mirrors that for a Snowflake connection.

Create a Secrets Manager secret

Similar to the Snowflake setup, to connect to Amazon Redshift using a user ID and password, you need to store the credentials securely in Secrets Manager. Complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For Secret type, choose Credentials for Amazon Redshift cluster.
  3. Enter the credentials used to log in to Amazon Redshift as a data source.
  4. Choose the Redshift cluster associated with the secret.
  5. Enter a name for the secret, such as sm-sql-redshift-secret.
  6. Leave the other settings as default or customize them if required.
  7. Create the secret.

By following these steps, you make sure your connection credentials are handled securely, using the security features of AWS to manage sensitive data.

Create an AWS Glue connection for Amazon Redshift

To set up a connection with Amazon Redshift using a JSON definition, fill in the necessary fields and save the following JSON configuration to disk:

{
    "ConnectionInput": {
        "Name": "Redshift-US-Housing-Dataset",
        "Description": "sagemaker redshift us housing dataset connection",
        "ConnectionType": "REDSHIFT",
        "ConnectionProperties": {
            "PythonProperties": "{\"aws_secret_arn\": \"arn:aws:secretsmanager:<region>:<account>:secret:sm-sql-redshift-secret\", \"database\": \"us-housing-database\"}"
        }
    }
}

To create an AWS Glue connection object for the Redshift data source, use the following AWS CLI command:

aws --region <REGION> glue create-connection \
--cli-input-json file:///path/to/file/redshift/definition/file.json

This command creates a connection in AWS Glue linked to your Redshift data source. If the command runs successfully, you can see your Redshift data source in the SageMaker Studio JupyterLab notebook, ready for running SQL queries and performing data analysis.

Create an Athena connection

Athena is a fully managed SQL query service from AWS that enables analysis of data stored in Amazon S3 using standard SQL. To set up an Athena connection as a data source in the JupyterLab notebook's SQL browser, you need to create an Athena connection definition JSON. The following JSON structure configures the details needed to connect to Athena, specifying the data catalog, the S3 staging directory, and the Region:

{
    "ConnectionInput": {
        "Name": "Athena-Credit-Card-Fraud",
        "Description": "SageMaker-Athena Credit Card Fraud",
        "ConnectionType": "ATHENA",
        "ConnectionProperties": {
            "PythonProperties": "{\"catalog_name\": \"AwsDataCatalog\",\"s3_staging_dir\": \"s3://sagemaker-us-east-2-123456789/athena-data-source/credit-card-fraud/\", \"region_name\": \"us-east-2\"}"
        }
    }
}

To create an AWS Glue connection object for the Athena data source, use the following AWS CLI command:

aws --region <REGION> glue create-connection \
--cli-input-json file:///path/to/file/athena/definition/file.json

If the command is successful, you can access the Athena data catalog and tables directly from the SQL browser in your SageMaker Studio JupyterLab notebook.
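
Because PythonProperties is a JSON string nested inside JSON, a definition file can be syntactically valid on the outside and still broken on the inside. The following sketch decodes both layers and runs a few sanity checks before you call the CLI. The function name and the specific checks are assumptions for illustration, not validation that AWS Glue performs; the sample values mirror the Athena definition above.

```python
import json
from urllib.parse import urlparse

ATHENA_KEYS = ("catalog_name", "s3_staging_dir", "region_name")


def read_athena_properties(definition_text: str) -> dict:
    """Decode PythonProperties (a JSON string inside JSON) and sanity-check it."""
    definition = json.loads(definition_text)
    connection = definition["ConnectionInput"]
    if connection["ConnectionType"] != "ATHENA":
        raise ValueError("not an Athena connection definition")
    # Second decode: PythonProperties is itself a JSON-encoded string.
    properties = json.loads(connection["ConnectionProperties"]["PythonProperties"])
    missing = [key for key in ATHENA_KEYS if key not in properties]
    if missing:
        raise KeyError(f"missing properties: {missing}")
    staging = urlparse(properties["s3_staging_dir"])
    if staging.scheme != "s3" or not staging.netloc:
        raise ValueError("s3_staging_dir must look like s3://bucket/prefix/")
    return properties


definition_text = json.dumps({
    "ConnectionInput": {
        "Name": "Athena-Credit-Card-Fraud",
        "ConnectionType": "ATHENA",
        "ConnectionProperties": {
            "PythonProperties": json.dumps({
                "catalog_name": "AwsDataCatalog",
                "s3_staging_dir": "s3://sagemaker-us-east-2-123456789/athena-data-source/credit-card-fraud/",
                "region_name": "us-east-2",
            })
        },
    }
})
properties = read_athena_properties(definition_text)
print(properties["s3_staging_dir"])
```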

Query data from multiple sources

If you have multiple data sources integrated into SageMaker Studio through the built-in SQL browser and the notebook SQL feature, you can quickly run queries and switch between data source backends in subsequent cells of a notebook. This lets you move between different databases or data sources during your analysis workflow.

You can run queries against a diverse collection of data source backends and bring the results directly into the Python space for further analysis or visualization. This is made possible by the %%sm_sql magic command available in SageMaker Studio notebooks. To output the results of your SQL query into a pandas DataFrame, there are two options:

  • From your notebook cell toolbar, choose the output type DataFrame and name your DataFrame variable
  • Append the following parameter to your %%sm_sql command:
    --output '{"format": "DATAFRAME", "dataframe_name": "df"}'
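
If you generate notebook cells programmatically, you can build the `--output` argument instead of typing the JSON by hand. This is a small sketch with a hypothetical helper name. It assumes the magic accepts the single-quoted JSON form exactly as shown above.

```python
import json
import shlex


def dataframe_output_option(dataframe_name: str) -> str:
    """Build the --output argument that stores results in a named DataFrame."""
    if not dataframe_name.isidentifier():
        raise ValueError(f"{dataframe_name!r} is not a valid Python variable name")
    spec = {"format": "DATAFRAME", "dataframe_name": dataframe_name}
    # shlex.quote wraps the JSON in single quotes, matching the form shown above.
    return f"--output {shlex.quote(json.dumps(spec))}"


option = dataframe_output_option("df")
print(option)
```

The identifier check rejects names such as `my df` that could not be used as a Python variable in later cells.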

The following diagram illustrates this workflow and shows how you can run queries across various sources in subsequent notebook cells, as well as train a SageMaker model using training jobs or directly within the notebook using local compute. The diagram also highlights how the built-in SQL integration of SageMaker Studio simplifies extraction and model building directly within the familiar environment of a JupyterLab notebook cell.

Text-to-SQL: Using natural language to enhance query authoring

SQL is a complex language that requires an understanding of databases, tables, syntax, and metadata. Today, generative artificial intelligence (AI) can enable you to write complex SQL queries without in-depth SQL experience. The advancement of LLMs has significantly impacted natural language processing (NLP)-based SQL generation, allowing precise SQL queries to be created from natural language descriptions, a technique referred to as Text-to-SQL. However, it's essential to acknowledge the inherent differences between human language and SQL. Human language can be ambiguous or imprecise, whereas SQL is structured, explicit, and unambiguous. Bridging this gap and accurately converting natural language into SQL queries can be a formidable challenge. When provided with appropriate prompts, LLMs can help bridge this gap by understanding the intent behind the human language and generating accurate SQL queries accordingly.

With the release of the SageMaker Studio in-notebook SQL query feature, SageMaker Studio makes it straightforward to inspect databases and schemas, and to author, run, and debug SQL queries without leaving the Jupyter notebook IDE. This section explores how the Text-to-SQL capabilities of advanced LLMs can help generate SQL queries from natural language within Jupyter notebooks. We use the Text-to-SQL model defog/sqlcoder-7b-2 together with Jupyter AI, a generative AI assistant designed specifically for Jupyter notebooks, to create complex SQL queries from natural language, which enhances the SQL experience within notebooks.

Notebook prototyping using the Hugging Face Hub

To begin prototyping, you need the following:

  • GitHub code – The code presented in this section is available in the following GitHub repo and in the referenced example notebook.
  • JupyterLab Space – Access to a SageMaker Studio JupyterLab Space backed by GPU-based instances is essential. For the defog/sqlcoder-7b-2 model, a 7B parameter model, an ml.g5.2xlarge instance is recommended. Alternatives such as defog/sqlcoder-70b-alpha or defog/sqlcoder-34b-alpha are also viable for natural language to SQL conversion, but larger instance types may be required for prototyping. Make sure you have the quota to launch a GPU-backed instance by navigating to the Service Quotas console, searching for SageMaker, and searching for Studio JupyterLab Apps running on <instance type>.

Launch a new GPU-backed JupyterLab Space from SageMaker Studio. We recommend creating a new JupyterLab Space with at least 75 GB of Amazon Elastic Block Store (Amazon EBS) storage for a 7B parameter model.

  • Hugging Face Hub – If your SageMaker Studio domain has access to download models from the Hugging Face Hub, you can use the AutoModelForCausalLM class from huggingface/transformers to automatically download models and pin them to your local GPUs. The model weights are stored in your local machine's cache. See the following code:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "defog/sqlcoder-7b-2"  # or use "defog/sqlcoder-34b-alpha", "defog/sqlcoder-70b-alpha"

    # download model and tokenizer in fp16 and pin model to local notebook GPUs
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.float16
    )

    # reuse the end-of-sequence token as the padding token
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

After the model has been fully downloaded and loaded into memory, you should observe an increase in GPU utilization on your local machine. This indicates that the model is actively using the GPU resources for computation. You can verify this in your own JupyterLab Space by running nvidia-smi (for a one-time display) or nvidia-smi --loop=1 (to repeat every second) from your JupyterLab terminal.
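
As a rough cross-check for the memory figure that nvidia-smi reports, you can estimate the size of the weights alone. The sketch below assumes about 7 billion parameters and counts only the weights; activations, the KV cache, and CUDA overhead come on top. The 24 GB of the A10G is treated as 24 GiB for a back-of-the-envelope comparison.

```python
def weight_memory_gib(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return num_parameters * bytes_per_parameter / 2**30


# Assumption: roughly 7 billion parameters, stored as fp16 (2 bytes each).
fp16_gib = weight_memory_gib(7e9, 2)
# The same weights in fp32 (4 bytes each) for comparison.
fp32_gib = weight_memory_gib(7e9, 4)

GPU_MEMORY_GIB = 24  # nominal memory of the single A10G GPU, treated as GiB

print(f"fp16 weights: {fp16_gib:.1f} GiB, fits: {fp16_gib < GPU_MEMORY_GIB}")
print(f"fp32 weights: {fp32_gib:.1f} GiB, fits: {fp32_gib < GPU_MEMORY_GIB}")
```

Under these assumptions the fp16 weights take about 13 GiB, which is consistent with loading the model in `torch.float16` on a single 24 GB GPU, whereas fp32 weights would not fit.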

Text-to-SQL models excel at understanding the intent and context of a user's request, even when the language used is conversational or ambiguous. The process involves translating natural language inputs into the correct database schema elements, such as table names, column names, and conditions. However, an off-the-shelf Text-to-SQL model doesn't inherently know the structure of your data warehouse or the specific database schemas, and it can't accurately interpret the content of a table based solely on column names. To use these models effectively to generate practical and efficient SQL queries from natural language, you need to adapt the SQL text-generation model to your specific warehouse database schema. This adaptation is done through LLM prompts. The following is a recommended prompt template for the defog/sqlcoder-7b-2 Text-to-SQL model, divided into four parts:

  • Task – This section should specify a high-level task to be accomplished by the model. It should include the type of database backend (such as Amazon RDS, PostgreSQL, or Amazon Redshift) to make the model aware of any syntax differences that may affect the generation of the final SQL query.
  • Instructions – This section should define task boundaries and domain awareness for the model, and may include few-shot examples to guide the model in generating finely tuned SQL queries.
  • Database Schema – This section should detail your warehouse database schemas, outlining the relationships between tables and columns to help the model understand the database structure.
  • Answer – This section is reserved for the model to output the SQL query response to the natural language input.

An example of the database schema and prompt used in this section is available in the GitHub repo.

### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]

### Instructions
- If you cannot answer the question with the available database schema, return 'I do not know'

### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}

### Answer
Given the database schema, here is the SQL query that 
 [QUESTION]
    {user_question}
 [/QUESTION]

[SQL]
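
The template above has two placeholders, `{user_question}` and `{table_metadata_string_DDL_statements}`. As a minimal sketch of how they might be filled, the following code formats the template with a question and a DDL string. The `flights` table and the helper name `build_prompt` are hypothetical and exist only to make the example self-contained.

```python
PROMPT_TEMPLATE = """### Task
Generate a SQL query to answer [QUESTION]{user_question}[/QUESTION]

### Instructions
- If you cannot answer the question with the available database schema, return 'I do not know'

### Database Schema
The query will run on a database with the following schema:
{table_metadata_string_DDL_statements}

### Answer
Given the database schema, here is the SQL query that
 [QUESTION]
    {user_question}
 [/QUESTION]

[SQL]
"""

# Hypothetical schema, for illustration only.
EXAMPLE_DDL = """CREATE TABLE flights (
    flight_id INTEGER PRIMARY KEY,
    origin VARCHAR(3),
    destination VARCHAR(3),
    departure_delay_minutes INTEGER
);"""


def build_prompt(user_question: str, ddl: str) -> str:
    """Fill both placeholders of the template with the question and the schema."""
    return PROMPT_TEMPLATE.format(
        user_question=user_question.strip(),
        table_metadata_string_DDL_statements=ddl.strip(),
    )


prompt_text = build_prompt("Which origin airport has the longest average delay?", EXAMPLE_DDL)
print(prompt_text)
```

The prompt ends with the open `[SQL]` marker so that the model's continuation is the query itself.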

Prompt engineering is not just about forming questions or statements; it's a nuanced art and science that significantly affects the quality of interactions with an AI model. The way you craft a prompt can profoundly influence the nature and usefulness of the AI's response. This skill is pivotal in getting the most from AI interactions, especially for complex tasks that require specialized understanding and detailed responses.

It's important to be able to quickly build and test a model's response for a given prompt and to optimize the prompt based on that response. JupyterLab notebooks let you receive instant feedback from a model running on local compute, so you can refine the prompt, tune the model's response further, or change the model entirely. In this post, we use a SageMaker Studio JupyterLab notebook backed by the NVIDIA A10G 24 GB GPU of an ml.g5.2xlarge instance to run Text-to-SQL model inference in the notebook and interactively build our model prompt until the model's responses are directly executable in JupyterLab's SQL cells. To run model inference and simultaneously stream model responses, we use a combination of model.generate and TextIteratorStreamer, as defined in the following code:

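from transformers import TextIteratorStreamer

# streams decoded text as tokens are generated; skips the prompt and special tokens
streamer = TextIteratorStreamer(
    tokenizer=tokenizer,
    timeout=240.0,
    skip_prompt=True,
    skip_special_tokens=True
)


def llm_generate_query(user_question):
    """ Generate text-gen SQL responses"""

    # `prompt` is the prompt template string; the keyword passed to format()
    # must match the placeholder name used in that template
    updated_prompt = prompt.format(question=user_question)
    inputs = tokenizer(updated_prompt, return_tensors="pt").to("cuda")

    return model.generate(
        **inputs,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=1024,
        temperature=0.1,  # has no effect while do_sample=False (greedy decoding)
        do_sample=False,
        num_beams=1,
        streamer=streamer,
    )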

The model's output can be decorated with the SageMaker SQL magic %%sm_sql ..., which allows the JupyterLab notebook to identify the cell as a SQL cell.
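
A minimal sketch of that decoration step follows. The helper `to_sql_cell` is hypothetical, and `magic_options` is passed through as an opaque string because the available options are the ones listed by `%sm_sql?`, which this sketch does not try to enumerate. The sample query is illustrative.

```python
def to_sql_cell(generated_sql: str, magic_options: str = "") -> str:
    """Prefix model output with the %%sm_sql cell magic."""
    sql = generated_sql.strip()
    if not sql:
        raise ValueError("the model returned no SQL")
    # The magic must be the first line of the cell, followed by the query.
    header = f"%%sm_sql {magic_options}".rstrip()
    return f"{header}\n{sql}"


cell_source = to_sql_cell("  SELECT origin, COUNT(*) FROM flights GROUP BY origin;\n")
print(cell_source)
```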

Mai watsa shiri Tsarin Rubutu-zuwa-SQL azaman ƙarshen ƙarshen SageMaker

A ƙarshen matakin ƙirƙira, mun zaɓi fifikon Rubutu-zuwa-SQL LLM, ingantaccen tsari mai sauri, da nau'in misali da ya dace don ɗaukar samfurin (ko dai-GPU ɗaya ko Multi-GPU). SageMaker yana sauƙaƙe ƙaddamar da ƙima na samfuran al'ada ta hanyar amfani da ƙarshen ƙarshen SageMaker. Ana iya bayyana waɗannan wuraren ƙarshe bisa ga ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙayyadaddun ƙa'idodi, ke ba da izinin ƙaddamar da LLMs azaman wuraren ƙarshe. Wannan damar tana ba ku damar daidaita mafita zuwa ga mafi yawan masu sauraro, kyale masu amfani su samar da tambayoyin SQL daga abubuwan shigar da harshe na halitta ta amfani da LLMs da aka shirya. Zane na gaba yana kwatanta wannan gine-gine.

Don karɓar bakuncin LLM ɗinku azaman ƙarshen ƙarshen SageMaker, kuna samar da kayan tarihi da yawa.

Na farko kayan tarihi shine samfurin ma'aunin nauyi. SageMaker Deep Java Library (DJL) Hidima kwantena suna ba ku damar saita saiti ta hanyar meta hidima.kayayyaki fayil, wanda ke ba ku damar jagorantar yadda ake samo samfura-ko dai kai tsaye daga Hugging Face Hub ko ta zazzage kayan tarihi daga Amazon S3. Idan ka saka model_id=defog/sqlcoder-7b-2, DJL Serving zai yi ƙoƙarin sauke wannan samfurin kai tsaye daga Hugging Face Hub. Koyaya, ƙila za ku iya haifar da cajin ingress/egress na hanyar sadarwa a duk lokacin da aka ƙaddamar da ƙarshen ƙarshen ko auna ma'auni. Don guje wa waɗannan cajin da yuwuwar saurin zazzage kayan ƙirar ƙira, ana ba da shawarar tsallake amfani model_id in serving.properties da adana ma'aunin ƙira azaman kayan tarihi na S3 kuma kawai saka su da s3url=s3://path/to/model/bin.

Saving a model (with its tokenizer) to disk and uploading it to Amazon S3 takes just a few lines of code:

# save model and tokenizer to local disk
model.save_pretrained(local_model_path)
tokenizer.save_pretrained(local_model_path)
...
...
...
# upload file to s3
s3_bucket_name = "<my llm artifact bucket name>"
# s3 prefix to save model weights and tokenizer defs
model_s3_prefix = "sqlcoder-7b-instruct/weights"
# s3 prefix to store the meta-model artifacts
meta_model_s3_prefix = "sqlcoder-7b-instruct/meta-model"

sagemaker.s3.S3Uploader.upload(local_model_path,  f"s3://{s3_bucket_name}/{model_s3_prefix}")

You also use a database prompt file. In this setup, the database prompt is composed of Task, Instructions, Database Schema, and Answer sections. For the current architecture, we allocate a separate prompt file for each database schema. However, you can extend this setup to include multiple databases per prompt file, which allows the model to run composite joins across databases on the same server. During our prototyping stage, we save the database prompt as a text file named <Database-Glue-Connection-Name>.prompt, where Database-Glue-Connection-Name corresponds to the connection name visible in your JupyterLab environment. For instance, this post refers to a Snowflake connection named Airlines_Dataset, so the database prompt file is named Airlines_Dataset.prompt. This file is then stored on Amazon S3 and subsequently read and cached by our model serving logic.
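
The following minimal sketch shows one way such a prompt file could be assembled and named after the connection. The "###" section markers, the instruction text, the {question} placeholder, and the flights schema are all assumptions made for illustration; only the four section names and the <connection-name>.prompt naming convention come from this post.

```python
import tempfile
from pathlib import Path


def build_db_prompt(schema_ddl: str) -> str:
    """Assemble the four prompt sections; {question} is filled in at inference time."""
    sections = [
        "### Task\nGenerate a SQL query to answer the following question: {question}",
        "### Instructions\nUse only the tables and columns listed in the schema.",
        "### Database Schema\n" + schema_ddl.strip(),
        "### Answer\n",
    ]
    return "\n\n".join(sections)


def save_db_prompt(connection_name: str, schema_ddl: str, out_dir: str) -> Path:
    """Write the prompt to <connection_name>.prompt so the file name matches the connection."""
    path = Path(out_dir) / f"{connection_name}.prompt"
    path.write_text(build_db_prompt(schema_ddl), encoding="utf-8")
    return path


# Hypothetical schema for the Airlines_Dataset connection.
schema = "CREATE TABLE flights (flight_id INT, origin VARCHAR, dest VARCHAR);"
prompt_path = save_db_prompt("Airlines_Dataset", schema, tempfile.mkdtemp())
print(prompt_path.name)
```

Deriving the file name from the connection name means the serving logic can locate the right prompt from the connection name alone.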

Moreover, this architecture allows any authorized user of this endpoint to define, store, and generate natural language to SQL queries without redeploying the model multiple times. We use the following database prompt example to demonstrate the Text-to-SQL functionality.

Next, you generate the custom model service logic. In this section, you outline custom inference logic in a file named model.py. This script is designed to optimize the performance and integration of our Text-to-SQL services:

  • Define the database prompt file caching logic - To minimize latency, we implement custom logic for downloading and caching database prompt files. This mechanism keeps prompts readily available and reduces the overhead of repeated downloads.
  • Define custom model inference logic - To improve inference speed, our Text-to-SQL model is loaded in a 16-bit floating-point precision format (the code that follows uses bfloat16) and then converted into a DeepSpeed model. This step allows for more efficient computation. Additionally, within this logic, you specify which parameters users can adjust during inference calls to tailor the behavior to their needs.
  • Define custom input and output logic - Establishing clear, customized input/output formats is essential for smooth integration with downstream applications. One such application is Jupyter AI, which we discuss in the next section.
%%writefile {meta_model_filename}/model.py
...

predictor = None
prompt_for_db_dict_cache = {}

def download_prompt_from_s3(prompt_filename):

    print(f"downloading prompt file: {prompt_filename}")
    s3 = boto3.resource('s3')
    ...


def get_model(properties):
    
    ...
    print(f"Loading model from {cwd}")
    model = AutoModelForCausalLM.from_pretrained(
        cwd, 
        low_cpu_mem_usage=True, 
        torch_dtype=torch.bfloat16
    )
    model = deepspeed.init_inference(
        model, 
        mp_size=properties["tensor_parallel_degree"]
    )
    
    ...


def handle(inputs: Input) -> None:

    ...

    global predictor
    if not predictor:
        predictor = get_model(inputs.get_properties())

    ...
    result = f"""%%sm_sql --metastore-id {prompt_for_db_key.split('.')[0]} --metastore-type GLUE_CONNECTION\n\n{result}\n"""
    result = [{'generated_text': result}]
    
    return Output().add(result)
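
The prompt caching step that is elided in the preceding listing could look like the following minimal sketch. It is not the authors' implementation: the function names and the injectable fetch argument are assumptions, introduced so that the caching behavior can be exercised without AWS credentials. Only the idea of an in-memory dictionary cache keyed by prompt file name comes from the listing.

```python
from typing import Callable, Dict

# In-memory cache keyed by prompt file name, shared across invocations.
prompt_cache: Dict[str, str] = {}


def fetch_prompt_from_s3(bucket: str, key: str) -> str:
    """Download one prompt file from Amazon S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the cache logic can be exercised offline

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return body.read().decode("utf-8")


def get_db_prompt(prompt_filename: str, fetch: Callable[[str], str]) -> str:
    """Return the cached prompt, calling fetch only on the first request for a file."""
    if prompt_filename not in prompt_cache:
        prompt_cache[prompt_filename] = fetch(prompt_filename)
    return prompt_cache[prompt_filename]


# Offline demonstration with a stand-in fetcher that records each download.
calls = []


def fake_fetch(name: str) -> str:
    calls.append(name)
    return f"prompt text for {name}"


first = get_db_prompt("Airlines_Dataset.prompt", fake_fetch)
second = get_db_prompt("Airlines_Dataset.prompt", fake_fetch)
print(len(calls))  # 1: the second call is served from the cache
```

In a real handler, fetch would wrap fetch_prompt_from_s3 with your own bucket and prefix, for example with functools.partial or a lambda.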

Additionally, we include a serving.properties file, which acts as a global configuration file for models hosted using DJL Serving. For more information, refer to Configurations and settings.

Lastly, you can also include a requirements.txt file to define additional modules required for inference, and then package everything into a tarball for deployment.

See the following code:

os.system(f"tar czvf {meta_model_filename}.tar.gz ./{meta_model_filename}/")

>>>./deepspeed-djl-serving-7b/
>>>./deepspeed-djl-serving-7b/serving.properties
>>>./deepspeed-djl-serving-7b/model.py
>>>./deepspeed-djl-serving-7b/requirements.txt
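
As an alternative to shelling out with os.system, the same packaging step can be sketched with Python's standard tarfile module. The helper name is an assumption, and the three files are created here as empty placeholders in a temporary directory purely for demonstration.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model_dir(model_dir: Path) -> Path:
    """Create <model_dir>.tar.gz next to model_dir, with the directory as the archive root."""
    archive_path = model_dir.parent / f"{model_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(model_dir, arcname=model_dir.name)
    return archive_path


# Demonstration with empty placeholder files in a temporary directory.
work_dir = Path(tempfile.mkdtemp())
model_dir = work_dir / "deepspeed-djl-serving-7b"
model_dir.mkdir()
for name in ("serving.properties", "model.py", "requirements.txt"):
    (model_dir / name).touch()

archive = package_model_dir(model_dir)
with tarfile.open(archive) as tar:
    archive_names = sorted(tar.getnames())
print(archive_names)
```

Using tarfile avoids depending on a tar binary in the notebook image and raises a Python exception if a file cannot be added, which os.system does not do.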

Integrate your endpoint with the SageMaker Studio Jupyter AI assistant

Jupyter AI is an open source tool that brings generative AI to Jupyter notebooks, offering a robust and user-friendly platform for exploring generative AI models. It improves productivity in JupyterLab and Jupyter notebooks by providing features such as the %%ai magic for creating a generative AI playground inside notebooks, a native chat UI in JupyterLab for interacting with AI as a conversational assistant, and support for a wide array of LLMs from providers such as Amazon Titan, AI21, Anthropic, Cohere, and Hugging Face, or managed services such as Amazon Bedrock and SageMaker endpoints. For this post, we use Jupyter AI's out-of-the-box integration with SageMaker endpoints to bring the Text-to-SQL capability into JupyterLab notebooks. The Jupyter AI tool comes pre-installed in all SageMaker Studio JupyterLab Spaces backed by SageMaker Distribution images; end-users are not required to make any additional configurations to start using the Jupyter AI extension with a SageMaker hosted endpoint. In this section, we discuss the two ways to use the integrated Jupyter AI tool.

Jupyter AI inside a notebook using magics

Jupyter AI's %%ai magic command lets you turn your SageMaker Studio JupyterLab notebooks into a reproducible generative AI environment. To begin using AI magics, load the jupyter_ai_magics extension to use the %%ai magic, and additionally load amazon_sagemaker_sql_magic to use the %%sm_sql magic:

# load sm_sql magic extension and ai magic extension
%load_ext jupyter_ai_magics
%load_ext amazon_sagemaker_sql_magic

To run a call to your SageMaker endpoint from your notebook using the %%ai magic command, provide the following parameters and structure the command as follows:

  • --region-name - Specify the Region where your endpoint is deployed. This makes sure that the request is routed to the correct geographic location.
  • --request-schema - Include the schema of the input data. This schema outlines the expected format and types of the input data that your model needs to process the request.
  • --response-path - Define the path within the response object where the output of your model is located. This path is used to extract the relevant data from the response returned by your model.
  • -f (optional) - This is an output formatter flag that indicates the type of output returned by the model. In the context of a Jupyter notebook, if the output is code, set this flag accordingly to format the output as executable code at the top of a Jupyter notebook cell, followed by a free text input area for user interaction.

For example, the command in a Jupyter notebook cell might look like the following code:

%%ai sagemaker-endpoint:<endpoint-name> --region-name=us-east-1 
--request-schema={
    "inputs":"<prompt>", 
    "parameters":{
        "temperature":0.1,
        "top_p":0.2,
        "max_new_tokens":1024,
        "return_full_text":false
    }, 
    "db_prompt":"Airlines_Dataset.prompt"
  } 
--response-path=[0].generated_text -f code

My natural language query goes here...
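
To illustrate what the request schema and response path describe, the following minimal sketch substitutes a user prompt into the schema and extracts the generated text from a response. It is a hand-written illustration, not Jupyter AI's internal implementation; the helper names, the example question, and the sample response are assumptions made up for the demonstration.

```python
import json

# Same structure as the --request-schema argument in the preceding command.
REQUEST_SCHEMA = {
    "inputs": "<prompt>",
    "parameters": {
        "temperature": 0.1,
        "top_p": 0.2,
        "max_new_tokens": 1024,
        "return_full_text": False,
    },
    "db_prompt": "Airlines_Dataset.prompt",
}


def build_request_body(schema: dict, prompt: str) -> str:
    """Serialize the schema to JSON with the <prompt> placeholder replaced by the user's text."""

    def substitute(value):
        if isinstance(value, dict):
            return {key: substitute(item) for key, item in value.items()}
        if isinstance(value, list):
            return [substitute(item) for item in value]
        return prompt if value == "<prompt>" else value

    return json.dumps(substitute(schema))


def extract_generated_text(response_body: str) -> str:
    """Follow the response path [0].generated_text in a JSON response."""
    return json.loads(response_body)[0]["generated_text"]


body = build_request_body(REQUEST_SCHEMA, "How many flights departed from SEA?")
# Made-up sample response in the shape returned by the handler in model.py.
sample_response = json.dumps([{"generated_text": "%%sm_sql ...\n\nSELECT 1;\n"}])
print(extract_generated_text(sample_response))
```

The response path [0].generated_text works because the handler returns a list containing a single dictionary with a generated_text key.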

Jupyter AI chat window

Alternatively, you can interact with SageMaker endpoints through a built-in user interface, which simplifies generating queries or engaging in dialogue. Before you start chatting with your SageMaker endpoint, configure the relevant settings in Jupyter AI for the SageMaker endpoint, as shown in the following screenshot.

Conclusion

SageMaker Studio now simplifies and streamlines the data scientist workflow by integrating SQL support into JupyterLab notebooks. This allows data scientists to focus on their tasks without the need to manage multiple tools. Furthermore, the new built-in SQL integration in SageMaker Studio lets users generate SQL queries from natural language text, which accelerates their workflow.

We encourage you to explore these features in SageMaker Studio. For more information, refer to Prepare data with SQL in Studio.

Appendix

Enable the SQL browser and notebook SQL cell in custom environments

If you're not using a SageMaker Distribution image, or you're using Distribution images 1.5 or below, run the following commands to enable the SQL browsing feature inside your JupyterLab environment:

npm install -g vscode-jsonrpc
npm install -g sql-language-server
pip install amazon-sagemaker-sql-execution==0.1.0
pip install amazon-sagemaker-sql-editor
restart-jupyter-server

Relocate the SQL browser widget

JupyterLab widgets can be relocated. Depending on your preference, you can move widgets to either side of the JupyterLab widgets pane. To move the SQL widget to the opposite side of the sidebar (right to left), right-click the widget icon and choose Switch Sidebar Side.


About the authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy, and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state-of-the-art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Varun Shah is a Software Engineer working on Amazon SageMaker Studio at Amazon Web Services. He is focused on building interactive ML solutions that simplify data processing and data preparation journeys. In his spare time, Varun enjoys outdoor activities including hiking and skiing, and is always up for discovering new, exciting places.

Sumedha Swamy is a Principal Product Manager at Amazon Web Services, where he leads the SageMaker Studio team in its mission to develop the IDE of choice for data science and machine learning. He has dedicated the past 15 years to building machine learning based consumer and enterprise products.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS and has over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped technology companies design and implement data analytics solutions and products.
