SQOOP : Data transfer between Hadoop and RDBMS

Apache Sqoop is a tool in the Hadoop ecosystem designed to transfer data between HDFS (Hadoop storage) and RDBMS (relational database) servers such as MySQL, Oracle, Postgres, Netezza, and Teradata. Apache Sqoop imports data from relational databases into HDFS and exports data from HDFS back to relational databases. It efficiently transfers bulk data between Hadoop and external data stores such as enterprise data warehouses and relational databases.

This is how Sqoop got its name – “SQL to Hadoop & Hadoop to SQL”.

Additionally, Sqoop is used to import data from external datastores into Hadoop ecosystem’s tools like Hive & HBase.

Sqoop architecture
Why Sqoop?

When data residing in relational database management systems needs to be transferred to HDFS, writing MapReduce code for importing and exporting it is uninteresting and tedious. This is where Apache Sqoop comes to the rescue and removes that pain: it automates the process of importing and exporting the data.

Sqoop makes the life of developers easy by providing a CLI for importing and exporting data. They just have to provide basic information such as database authentication, source, destination, and the operation to perform; Sqoop takes care of the rest.

Sqoop internally converts the command into MapReduce tasks, which are then executed over HDFS. It uses the YARN framework to import and export the data, which provides fault tolerance on top of parallelism.

The two Sqoop commands programmers use most are sqoop import and sqoop export. Let’s discuss them one by one.

SQOOP IMPORT Command

The sqoop import command imports a table from an RDBMS into HDFS. In our case, we will import tables from a MySQL database into HDFS.

Syntax
$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)

As you can see in the image below, we have a student table in the mydb database which we will import into HDFS.

The command for importing table is:

sqoop import --connect jdbc:mysql://localhost/mydb --username root --password root --table student -m 1 --target-dir /sqoop_import_01

As you can see in the images below, after executing this command the SQL statements and map tasks are executed in the background.

After the command is executed, you can check the HDFS target folder (in our case sqoop_import_01) where the data is imported.

Import control arguments:

--append – Append data to an existing dataset in HDFS
--as-avrodatafile – Imports data to Avro Data Files
--as-sequencefile – Imports data to SequenceFiles
--as-textfile – Imports data as plain text (default)
--as-parquetfile – Imports data to Parquet Files
--boundary-query <statement> – Boundary query to use for creating splits
--columns <col,col,col…> – Columns to import from table
--delete-target-dir – Delete the import target directory if it exists
--direct – Use direct connector if it exists for the database
--fetch-size <n> – Number of entries to read from database at once.
--inline-lob-limit <n> – Set the maximum size for an inline LOB
-m,--num-mappers <n> – Use n map tasks to import in parallel
-e,--query <statement> – Import the results of statement.
--split-by <column-name> – Column of the table used to split work units. Cannot be used with --autoreset-to-one-mapper option.
--split-limit <n> – Upper limit for each split size. This only applies to Integer and Date columns. For date or timestamp fields it is calculated in seconds.
--autoreset-to-one-mapper – Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with --split-by <col> option.
--table <table-name> – Table to read
--target-dir <dir> – HDFS destination dir
--temporary-rootdir <dir> – HDFS directory for temporary files created during import (overrides default “_sqoop”)
--warehouse-dir <dir> – HDFS parent for table destination
--where <where clause> – WHERE clause to use during import
-z,--compress – Enable compression
--compression-codec <c> – Use Hadoop codec (default gzip)
--null-string <null-string> – The string to be written for a null value for string columns
--null-non-string <null-string> – The string to be written for a null value for non-string columns

Now, let’s increase the number of map tasks from 1 to 2 in the command so that the import runs as two parallel tasks and see the result.

sqoop import --connect jdbc:mysql://localhost/mydb --username root --password root --table student -m 2 --target-dir /sqoop_import_02

So, as you can see, if more than one map task (-m 2) is used in the import command, the table must have a primary key or you must specify --split-by in your command.
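
For example, a sketch of the same import with two mappers, assuming the student table has a numeric id column to split on (the column name here is hypothetical):

sqoop import --connect jdbc:mysql://localhost/mydb --username root --password root --table student --split-by id -m 2 --target-dir /sqoop_import_03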

There are lots of other import arguments available which you can use as per your requirement.

SQOOP Export Command

The sqoop export command exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters.

The default operation is to transform these into a set of INSERT statements that inject the records into the database. In “update mode,” Sqoop will generate UPDATE statements that replace existing records in the database, and in “call mode” Sqoop will make a stored procedure call for each record.
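
For example, a sketch of an update-mode export, assuming the target emp table has an id primary-key column (the column name here is hypothetical):

sqoop export --connect jdbc:mysql://localhost/mydb --username root --password root --table emp --export-dir /sqoop_export_01 --update-key id --update-mode allowinsert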

Syntax
$ sqoop export (generic-args) (export-args) 
$ sqoop-export (generic-args) (export-args)

So, first we create an empty table into which we will export our data.

Creating the table for Sqoop export

The command to export data from HDFS to the relational database is:

sqoop export --connect jdbc:mysql://localhost/mydb --username root --password root --table emp --export-dir /sqoop_export_01
Data in the table after Sqoop export

Export control arguments:

--columns <col,col,col…> – Columns to export to table
--direct – Use direct export fast path
--export-dir <dir> – HDFS source path for the export
-m,--num-mappers <n> – Use n map tasks to export in parallel
--table <table-name> – Table to populate
--call <stored-proc-name> – Stored procedure to call
--update-key <col-name> – Anchor column to use for updates. Use a comma separated list of columns if there is more than one column.
--update-mode <mode> – Specify how updates are performed when new rows are found with non-matching keys in database. Legal values for mode include updateonly (default) and allowinsert.
--input-null-string <null-string> – The string to be interpreted as null for string columns
--input-null-non-string <null-string> – The string to be interpreted as null for non-string columns
--staging-table <staging-table-name> – The table in which data will be staged before being inserted into the destination table.
--clear-staging-table – Indicates that any data present in the staging table can be deleted.
--batch – Use batch mode for underlying statement execution.

For an import or export to work correctly, the order of columns on both sides (for example, the MySQL table and the Hive table) should be the same.

I hope you have enjoyed this post and it helped you understand the Sqoop import and export commands. Please like and share and feel free to comment if you have any suggestions or feedback.


Generate Sequential and Unique IDs in a Spark Dataframe

When the data is in one table or dataframe (on one machine), adding sequential/unique ids is pretty straightforward. What happens, though, when you have distributed data, split into partitions that might reside on different machines, as in Apache Spark?
Coming from traditional relational databases, one may be used to working with (usually auto-incremented) ids for identification and ordering, and to using them as references in data constraints. Depending on the need, we might benefit from having similar unique, auto-incrementing-id behavior in a Spark dataframe. Let’s discuss the options and the catch behind each of them in detail.

monotonically_increasing_id

Since Spark 1.6 there is a function called monotonically_increasing_id().
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. It generates a new column with a unique 64-bit monotonic index for each row. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the Spark DataFrame has less than 1 billion partitions, and each partition has less than 8 billion records.

As an example, consider a Spark DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.

val dfWithUniqueId = df.withColumn("unique_id", monotonically_increasing_id())

Remember that for any partition after the first, the generated values start at 8589934592, so you will typically see 10-digit numeric values even if the dataframe has only a few records. Also, these ids are unique but not sequential.
So use it in cases where you only need unique ids and have no constraint on their length or sequence.

This is equivalent to the MONOTONICALLY_INCREASING_ID function in SQL.
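
For instance, a quick sketch (assuming a SparkSession named spark) that makes the partition-dependent gaps visible:

import org.apache.spark.sql.functions.monotonically_increasing_id

// six rows spread over two partitions
val demoDf = spark.range(6).repartition(2)
demoDf.withColumn("unique_id", monotonically_increasing_id()).show()
// rows that land in the second partition get ids starting at 8589934592 (1L << 33)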

zipWithIndex – RDD

Another option is to fall back to RDDs (Resilient Distributed Datasets) with df.rdd.zipWithIndex(). An RDD is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.

Zips the RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index.

This is similar to Scala’s zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a Spark job when the RDD contains more than one partition.

// use zipWithIndex to add the indexes to the RDD and then toDF to get back
// to a dataframe; you will then get a structure like the one below:
+--------+---+
| _1     | _2|
+--------+---+
| [1, 2] | 0 |
|[15, 21]| 1 |
+--------+---+

Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The index assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.

The indexes start from 0 and the ordering is done by partition. Below is a Scala example that uses zipWithIndex to build a dataframe from the RDD, where you can set the starting offset (defaults to 1) and the index column name (defaults to “id”).

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import org.apache.spark.sql.Row


def dfZipWithIndex(
  df: DataFrame,
  offset: Int = 1,
  colName: String = "id") : DataFrame = {
  df.sqlContext.createDataFrame(
    df.rdd.zipWithIndex.map(element =>
      Row.fromSeq(Seq(element._2 + offset) ++ element._1.toSeq)
    ),
    StructType(
      Array(StructField(colName,LongType,false)) ++ df.schema.fields
    )
  ) 
}
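
For instance, a quick usage sketch (assuming an existing dataframe df):

// index column "row_id" starting at 100, increasing by 1 in partition order
val indexedDf = dfZipWithIndex(df, offset = 100, colName = "row_id")
indexedDf.show()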

Keep in mind that falling back to RDDs and then to a dataframe can be quite expensive.

row_number()

Starting in Spark 1.5, window expressions were added to Spark. Instead of having to convert the DataFrame to an RDD, you can now use the row_number window function (org.apache.spark.sql.functions.row_number) together with org.apache.spark.sql.expressions.Window. Note that I found the above dfZipWithIndex to perform significantly faster than the approach below.

It returns a sequential number starting at 1 within a window partition.

In order to use row_number(), we need to move our data into one partition. The window in both cases (sortable and non-sortable data) consists basically of all the rows we currently have, so that the row_number() function can go over them and increment the row number. This can cause performance and memory issues: we can easily go OOM, depending on how much data and how much memory we have. Below is a small code snippet:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, row_number}

df.withColumn("row_num", row_number().over(Window.partitionBy(lit(1)).orderBy(lit(1))))

Note that lit(1) is used for both the partitioning and the ordering: this forces everything into the same partition, and seems to preserve the original ordering of the DataFrame.

In other dialects like Hive, ORDER BY is not mandatory when using a window function, but it is mandatory before Spark 2.4.5; otherwise you will get the error below for the following query:

select row_number()over() from test1

Error: org.apache.spark.sql.AnalysisException: Window function row_number() requires window to be ordered, please add ORDER BY clause. For example SELECT row_number()(value_expr) OVER (PARTITION BY window_partition ORDER BY window_ordering) from table; (state=,code=0)

ORDER BY was made optional for the row_number window function in Spark 2.4.5. Please refer to the link below:
https://issues.apache.org/jira/browse/SPARK-31512

I hope you have enjoyed this post and it helped you understand how to generate sequential and unique ids in a Spark dataframe. Please like and share and feel free to comment if you have any suggestions or feedback.


Locking in Hibernate using Java

In Hibernate, the locking strategy can be either optimistic or pessimistic. Let’s first look at the definitions.

Optimistic
Optimistic locking assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect. Before committing, each transaction verifies that no other transaction has modified its data. If the check reveals conflicting modifications, the committing transaction rolls back.

Pessimistic
Pessimistic locking assumes that concurrent transactions will conflict with each other, and requires resources to be locked after they are read and only unlocked after the application has finished using the data.
Hibernate provides mechanisms for implementing both types of locking in your applications. Let’s understand each in detail:

Optimistic

When your application uses long transactions or conversations that span several database transactions, you can store versioning data so that if the same entity is updated by two conversations, the last to commit changes is informed of the conflict, and does not override the other conversation’s work. This approach guarantees some isolation, but scales well and works particularly well in read-often-write-sometimes situations.

Hibernate provides two different mechanisms for storing versioning information, a dedicated version number or a timestamp.

A version or timestamp property can never be null for a detached instance. Hibernate detects any instance with a null version or timestamp as transient, regardless of other unsaved-value strategies that you specify. Declaring a nullable version or timestamp property is an easy way to avoid problems with transitive reattachment in Hibernate, especially useful if you use assigned identifiers or composite keys.

Dedicated version number

The version number mechanism for optimistic locking is provided through a @Version annotation.

Example : The @Version annotation

@Entity
public class Flight implements Serializable {
...
    @Version
    @Column(name="OPTLOCK")
    public Integer getVersion() { ... }
}

Here, the version property is mapped to the OPTLOCK column, and the entity manager uses it to detect conflicting updates, and prevent the loss of updates that would be overwritten by a last-commit-wins strategy.
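
To make the version check concrete, here is a rough sketch of a conflicting update. It assumes a plain JPA setup with two EntityManagers standing in for two concurrent conversations and a hypothetical setName() on Flight; the exact exception wrapping can differ between the Session and EntityManager APIs.

import javax.persistence.EntityManager;
import javax.persistence.RollbackException;

public class VersionConflictDemo {

    static void demo(EntityManager em1, EntityManager em2, Long flightId) {
        Flight first = em1.find(Flight.class, flightId);    // reads version n
        Flight second = em2.find(Flight.class, flightId);   // also reads version n

        em1.getTransaction().begin();
        first.setName("AF-101");                 // hypothetical setter
        em1.getTransaction().commit();           // succeeds, OPTLOCK goes from n to n+1

        em2.getTransaction().begin();
        second.setName("AF-102");                // still carries the stale version n
        try {
            em2.getTransaction().commit();       // version check fails on the UPDATE
        } catch (RollbackException e) {
            // the cause is typically an OptimisticLockException: retry or report the conflict
        }
    }
}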

The version column can be any kind of type, as long as you define and implement the appropriate UserVersionType.

Your application is forbidden from altering the version number set by Hibernate. To artificially increase the version number, check LockModeType.OPTIMISTIC_FORCE_INCREMENT or LockModeType.PESSIMISTIC_FORCE_INCREMENT in the Hibernate Entity Manager reference documentation.

Database-generated version numbers
If the version number is generated by the database, such as a trigger, use the annotation @org.hibernate.annotations.Generated(GenerationTime.ALWAYS).

Example: Declaring a version property in hbm.xml

<version
        column="version_column"
        name="propertyName"
        type="typename"
        access="field|property|ClassName"
        unsaved-value="null|negative|undefined"
        generated="never|always"
        insert="true|false"
        node="element-name|@attribute-name|element/@attribute|."
/>
column – The name of the column holding the version number. Optional, defaults to the property name.
name – The name of a property of the persistent class.
type – The type of the version number. Optional, defaults to integer.
access – Hibernate’s strategy for accessing the property value. Optional, defaults to property.
unsaved-value – Indicates that an instance is newly instantiated and thus unsaved. This distinguishes it from detached instances that were saved or loaded in a previous session. The default value, undefined, indicates that the identifier property value should be used. Optional.
generated – Indicates that the version property value is generated by the database. Optional, defaults to never.
insert – Whether or not to include the version column in SQL insert statements. Defaults to true, but you can set it to false if the database column is defined with a default value of 0.
Timestamp

Timestamps are a less reliable way of optimistic locking than version numbers, but can be used by applications for other purposes as well. Timestamping is automatically used if you use the @Version annotation on a Date or Calendar property.

Example: Using timestamps for optimistic locking

@Entity
public class Flight implements Serializable {
...
    @Version
    public Date getLastUpdate() { ... }
}

Hibernate can retrieve the timestamp value from the database or the JVM, by reading the value you specify for the @org.hibernate.annotations.Source annotation. The value can be either org.hibernate.annotations.SourceType.DB or org.hibernate.annotations.SourceType.VM. The default behavior is to use the database, and is also used if you don’t specify the annotation at all.

The timestamp can also be generated by the database instead of Hibernate, if you use the @org.hibernate.annotations.Generated(GenerationTime.ALWAYS) annotation.

Example: The timestamp element in hbm.xml

<timestamp
        column="timestamp_column"
        name="propertyName"
        access="field|property|ClassName"
        unsaved-value="null|undefined"
        source="vm|db"
        generated="never|always"
        node="element-name|@attribute-name|element/@attribute|."
/>
column – The name of the column which holds the timestamp. Optional, defaults to the property name.
name – The name of a JavaBeans style property of Java type Date or Timestamp of the persistent class.
access – The strategy Hibernate uses to access the property value. Optional, defaults to property.
unsaved-value – A version property which indicates that an instance is newly instantiated, and unsaved. This distinguishes it from detached instances that were saved or loaded in a previous session. The default value of undefined indicates that Hibernate uses the identifier property value.
source – Whether Hibernate retrieves the timestamp from the database or the current JVM. Database-based timestamps incur an overhead because Hibernate needs to query the database each time to determine the incremental next value. However, database-derived timestamps are safer to use in a clustered environment. Not all database dialects are known to support the retrieval of the database’s current timestamp. Others may also be unsafe for locking, because of lack of precision.
generated – Whether the timestamp property value is generated by the database. Optional, defaults to never.
Versionless optimistic locking

Although the default @Version property optimistic locking mechanism is sufficient in many situations, you sometimes need to rely on the actual database row column values to prevent lost updates.

Hibernate supports a form of optimistic locking that does not require a dedicated “version attribute”. This is also useful for use with modeling legacy schemas.

The idea is that you can get Hibernate to perform “version checks” using either all of the entity’s attributes, or just the attributes that have changed. This is achieved through the @OptimisticLocking annotation, which defines a single attribute of type org.hibernate.annotations.OptimisticLockType.

There are 4 available OptimisticLockTypes:

NONE
optimistic locking is disabled even if there is a @Version annotation present

VERSION (the default)
performs optimistic locking based on a @Version as described above

ALL
performs optimistic locking based on all fields as part of an expanded WHERE clause restriction for the UPDATE/DELETE SQL statements

DIRTY
performs optimistic locking based on dirty fields as part of an expanded WHERE clause restriction for the UPDATE/DELETE SQL statements

Example: OptimisticLockType.ALL mapping example

@Entity(name = "Person")
@OptimisticLocking(type = OptimisticLockType.ALL)
@DynamicUpdate
public static class Person {

    @Id
    private Long id;

    @Column(name = "`name`")
    private String name;

    private String country;

    private String city;

    @Column(name = "created_on")
    private Timestamp createdOn;

    . . .
}

Pessimistic

Typically, you only need to specify an isolation level for the JDBC connections and let the database handle locking issues. If you do need to obtain exclusive pessimistic locks or re-obtain locks at the start of a new transaction, Hibernate gives you the tools you need.

Hibernate always uses the locking mechanism of the database, and never locks objects in memory.

The LockMode class

The LockMode class defines the different lock levels that Hibernate can acquire.

LockMode.WRITE – acquired automatically when Hibernate updates or inserts a row.
LockMode.UPGRADE – acquired upon explicit user request using SELECT ... FOR UPDATE on databases which support that syntax.
LockMode.UPGRADE_NOWAIT – acquired upon explicit user request using a SELECT ... FOR UPDATE NOWAIT in Oracle.
LockMode.READ – acquired automatically when Hibernate reads data under Repeatable Read or Serializable isolation level. It can be re-acquired by explicit user request.
LockMode.NONE – the absence of a lock. All objects switch to this lock mode at the end of a Transaction. Objects associated with the session via a call to update() or saveOrUpdate() also start out in this lock mode.

The explicit user request mentioned above occurs as a consequence of any of the following actions:

  • A call to Session.load(), specifying a LockMode.
  • A call to Session.lock().
  • A call to Query.setLockMode().

If you call Session.load() with option UPGRADE or UPGRADE_NOWAIT, and the requested object is not already loaded by the session, the object is loaded using SELECT ... FOR UPDATE. If you call load() for an object that is already loaded with a less restrictive lock than the one you request, Hibernate calls lock() for that object.

Session.lock() performs a version number check if the specified lock mode is READ, UPGRADE, or UPGRADE_NOWAIT. In the case of UPGRADE or UPGRADE_NOWAIT, SELECT ... FOR UPDATE syntax is used.
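
As a rough sketch using the legacy Session API described above (reusing the Flight entity from the optimistic examples and a hypothetical setName() method):

import org.hibernate.LockMode;
import org.hibernate.Session;
import org.hibernate.Transaction;

public class PessimisticLockDemo {

    static void demo(Session session, Long flightId) {
        Transaction tx = session.beginTransaction();

        // Loads the row with SELECT ... FOR UPDATE on databases that support it,
        // so other writers block until this transaction commits or rolls back
        Flight flight = (Flight) session.load(Flight.class, flightId, LockMode.UPGRADE);

        flight.setName("AF-101");   // hypothetical setter
        tx.commit();                // the database lock is released when the transaction ends
    }
}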

If the requested lock mode is not supported by the database, Hibernate uses an appropriate alternate mode instead of throwing an exception. This ensures that applications are portable.

I hope you have enjoyed this post and it helped you to understand the locking mechanisms in Hibernate using Java. Please like and share and feel free to comment if you have any suggestions or feedback.


curl – Unix, Linux Command with examples

curl is a cross-platform utility, meaning you can use it on Windows, macOS, and UNIX. It offers proxy support, user authentication, FTP uploading, HTTP posting, SSL connections, cookies, file transfer resume, Metalink, and other features.

Syntax:

curl [options] [URL...]

The following are some of the most used syntaxes with an example to help you.

connect to URL

The most basic use of curl is typing the command followed by a URL.

curl https://www.geeksforgeeks.org

This should display the content of the URL on the terminal. Multiple URLs or parts of URLs can be specified by writing part sets within braces as in:

http://site.{one,two,three}.com 
or get sequences of alphanumeric series by using [] as in:
ftp://ftp.numerical.com/file[1-100].txt
ftp://ftp.numerical.com/file[001-100].txt (with leading zeros)
ftp://ftp.letter.com/file[a-z].txt

If the host cannot be resolved, you will get an error such as “could not resolve host”.

C:>curl http://jjsdsldkjksl.com/
curl: (6) Could not resolve host: jjsdsldkjksl.com
Save URL/URI output to file

If you have to save the URL or URI contents to a specific file, you can use the following syntax

curl https://yoururl.com > yoururl.html

Alternatively, to save the result of the curl command, use either the -o or -O option.

Lowercase -o saves the file under the filename you specify, which in the example below is vue-v2.6.10.js:

C:\tmp>curl -o vue-v2.6.10.js https://cdn.jsdelivr.net/npm/vue/dist/vue.js

Uppercase -O saves the file with its original filename:

C:\tmp>curl -O https://cdn.jsdelivr.net/npm/vue/dist/vue.js
View curl Version

The -V or --version options will not only return the version, but also the supported protocols and features in your current version.

C:>curl -V
curl 7.55.1 (Windows) libcurl/7.55.1 WinSSL
Release-Date: 2017-11-14, security patched: 2019-11-05
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile SSPI Kerberos SPNEGO NTLM SSL
Limit Download Rate

To prevent curl from consuming all the available bandwidth, you can limit the download rate to 100 KB/s as follows.

curl --limit-rate 100K https://gf.dev
Query HTTP Headers

HTTP headers allow the remote web server to send additional information about itself along with the actual request. This provides the client with details on how the request is being handled. To query the HTTP headers from a website, use:

C:\>curl -I www.thecodersstop.com
HTTP/1.1 301 Moved Permanently
Date: Tue, 01 Dec 2020 09:25:32 GMT
Server: Apache
X-Powered-By: PHP/7.3.21
X-Redirect-By: WordPress
Upgrade: h2,h2c
Connection: Upgrade
Location: http://thecodersstop.com/
Vary: User-Agent
Content-Type: text/html; charset=UTF-8
Resume a Download

You can resume a download by using the -C - option. This is useful if your connection drops during the download of a large file, and instead of starting the download from scratch, you can continue the previous one.

For example, if you are downloading the Ubuntu 18.04 iso file using the following command:

curl -O http://releases.ubuntu.com/18.04/ubuntu-18.04-live-server-amd64.iso

and suddenly your connection drops you can resume the download with:

curl -C - -O http://releases.ubuntu.com/18.04/ubuntu-18.04-live-server-amd64.iso
Change the User-Agent

Sometimes when downloading a file, the remote server may be set to block the Curl User-Agent or to return different contents depending on the visitor device and browser.

In situations like this to emulate a different browser, use the -A option.

For example, to emulate Firefox 60 you would use:

C:\tmp>curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" https://getfedora.org/
Transfer Files via FTP

To access a protected FTP server with curl, use the -u option and specify the username and password as shown below:

curl -u USERNAME:PASSWORD ftp://ftp.example.com/

Once logged in, the command lists all files and directories in the user’s home directory.

You can download a single file from the FTP server using the following syntax:

curl -u USERNAME:PASSWORD ftp://ftp.example.com/file.tar.gz

To upload a file to the FTP server, use the -T option followed by the name of the file you want to upload:

curl -T newfile.tar.gz -u USERNAME:PASSWORD ftp://ftp.example.com/
Using Proxies

curl supports different types of proxies, including HTTP, HTTPS and SOCKS. To transfer data through a proxy server, use the -x (--proxy) option, followed by the proxy URL.

The following command downloads the specified resource using a proxy on 192.168.44.2 port 8888:

curl -x 192.168.44.2:8888 http://linux.com/

If the proxy server requires authentication, use the -U (--proxy-user) option followed by the user name and password separated by a colon (user:password):

curl -U username:password -x 192.168.44.2:8888 http://linux.com/
Connect HTTPS/SSL URL and ignore any SSL certificate error

When you try to access an SSL/TLS-secured URL that has the wrong certificate, or whose CN does not match, you will get the following error.

curl: (51) Unable to communicate securely with peer: requested domain name does not match the server's certificate.

You can instruct curl to ignore the cert error with --insecure or -k flag.

curl --insecure https://yoururl.com
View all curl information

Try curl --help for more information. It lists all the options and their descriptions on the terminal.

 C:\>curl --help
Usage: curl [options...] <url>
     --abstract-unix-socket <path> Connect via abstract Unix domain socket
     --anyauth       Pick any authentication method
 -a, --append        Append to target file when uploading
     --basic         Use HTTP Basic Authentication
     --cacert <CA certificate> CA certificate to verify peer against
     --capath <dir>  CA directory to verify peer against
 -E, --cert <certificate[:password]> Client certificate file and password
     --cert-status   Verify the status of the server certificate
     --cert-type <type> Certificate file type (DER/PEM/ENG)
     --ciphers <list of ciphers> SSL ciphers to use
     --compressed    Request compressed response
 -K, --config <file> Read config from a file
     --connect-timeout <seconds> Maximum time allowed for connection
     --connect-to <HOST1:PORT1:HOST2:PORT2> Connect to host
 -C, --continue-at <offset> Resumed transfer offset
 -b, --cookie <data> Send cookies from string/file
 -c, --cookie-jar <filename> Write cookies to <filename> after operation
     --create-dirs   Create necessary local directory hierarchy
     --crlf          Convert LF to CRLF in upload
     --crlfile <file> Get a CRL list in PEM format from the given file
 -d, --data <data>   HTTP POST data
     --data-ascii <data> HTTP POST ASCII data
     --data-binary <data> HTTP POST binary data
     --data-raw <data> HTTP POST data, '@' allowed
     --data-urlencode <data> HTTP POST data url encoded
     --delegation <LEVEL> GSS-API delegation permission
     --digest        Use HTTP Digest Authentication
 -q, --disable       Disable .curlrc
     --disable-eprt  Inhibit using EPRT or LPRT
     --disable-epsv  Inhibit using EPSV
     --dns-interface <interface> Interface to use for DNS requests
     --dns-ipv4-addr <address> IPv4 address to use for DNS requests
     --dns-ipv6-addr <address> IPv6 address to use for DNS requests
     --dns-servers <addresses> DNS server addrs to use
 -D, --dump-header <filename> Write the received headers to <filename>
     --egd-file <file> EGD socket path for random data
     --engine <name> Crypto engine to use
     --expect100-timeout <seconds> How long to wait for 100-continue
 -f, --fail          Fail silently (no output at all) on HTTP errors
     --fail-early    Fail on first transfer error, do not continue
     --false-start   Enable TLS False Start
 -F, --form <name=content> Specify HTTP multipart POST data
     --form-string <name=string> Specify HTTP multipart POST data
     --ftp-account <data> Account data string
     --ftp-alternative-to-user <command> String to replace USER [name]
     --ftp-create-dirs Create the remote dirs if not present
     --ftp-method <method> Control CWD usage
     --ftp-pasv      Use PASV/EPSV instead of PORT
 -P, --ftp-port <address> Use PORT instead of PASV
     --ftp-pret      Send PRET before PASV
     --ftp-skip-pasv-ip Skip the IP address for PASV
     --ftp-ssl-ccc   Send CCC after authenticating
     --ftp-ssl-ccc-mode <active/passive> Set CCC mode
     --ftp-ssl-control Require SSL/TLS for FTP login, clear for transfer
 -G, --get           Put the post data in the URL and use GET
 -g, --globoff       Disable URL sequences and ranges using {} and []
 -I, --head          Show document info only
 -H, --header <header/@file> Pass custom header(s) to server
 -h, --help          This help text
     --hostpubmd5 <md5> Acceptable MD5 hash of the host public key
 -0, --http1.0       Use HTTP 1.0
     --http1.1       Use HTTP 1.1
     --http2         Use HTTP 2
     --http2-prior-knowledge Use HTTP 2 without HTTP/1.1 Upgrade
     --ignore-content-length Ignore the size of the remote resource
 -i, --include       Include protocol response headers in the output
 -k, --insecure      Allow insecure server connections when using SSL
     --interface <name> Use network INTERFACE (or address)
 -4, --ipv4          Resolve names to IPv4 addresses
 -6, --ipv6          Resolve names to IPv6 addresses
 -j, --junk-session-cookies Ignore session cookies read from file
     --keepalive-time <seconds> Interval time for keepalive probes
     --key <key>     Private key file name
     --key-type <type> Private key file type (DER/PEM/ENG)
     --krb <level>   Enable Kerberos with security <level>
     --libcurl <file> Dump libcurl equivalent code of this command line
     --limit-rate <speed> Limit transfer speed to RATE
 -l, --list-only     List only mode
     --local-port <num/range> Force use of RANGE for local port numbers
 -L, --location      Follow redirects
     --location-trusted Like --location, and send auth to other hosts
     --login-options <options> Server login options
     --mail-auth <address> Originator address of the original email
     --mail-from <address> Mail from this address
     --mail-rcpt <address> Mail from this address
 -M, --manual        Display the full manual
     --max-filesize <bytes> Maximum file size to download
     --max-redirs <num> Maximum number of redirects allowed
 -m, --max-time <time> Maximum time allowed for the transfer
     --metalink      Process given URLs as metalink XML file
     --negotiate     Use HTTP Negotiate (SPNEGO) authentication
 -n, --netrc         Must read .netrc for user name and password
     --netrc-file <filename> Specify FILE for netrc
     --netrc-optional Use either .netrc or URL
 -:, --next          Make next URL use its separate set of options
     --no-alpn       Disable the ALPN TLS extension
 -N, --no-buffer     Disable buffering of the output stream
     --no-keepalive  Disable TCP keepalive on the connection
     --no-npn        Disable the NPN TLS extension
     --no-sessionid  Disable SSL session-ID reusing
     --noproxy <no-proxy-list> List of hosts which do not use proxy
     --ntlm          Use HTTP NTLM authentication
     --ntlm-wb       Use HTTP NTLM authentication with winbind
     --oauth2-bearer <token> OAuth 2 Bearer Token
 -o, --output <file> Write to file instead of stdout
     --pass <phrase> Pass phrase for the private key
     --path-as-is    Do not squash .. sequences in URL path
     --pinnedpubkey <hashes> FILE/HASHES Public key to verify peer against
     --post301       Do not switch to GET after following a 301
     --post302       Do not switch to GET after following a 302
     --post303       Do not switch to GET after following a 303
     --preproxy [protocol://]host[:port] Use this proxy first
 -#, --progress-bar  Display transfer progress as a bar
     --proto <protocols> Enable/disable PROTOCOLS
     --proto-default <protocol> Use PROTOCOL for any URL missing a scheme
     --proto-redir <protocols> Enable/disable PROTOCOLS on redirect
 -x, --proxy [protocol://]host[:port] Use this proxy
     --proxy-anyauth Pick any proxy authentication method
     --proxy-basic   Use Basic authentication on the proxy
     --proxy-cacert <file> CA certificate to verify peer against for proxy
     --proxy-capath <dir> CA directory to verify peer against for proxy
     --proxy-cert <cert[:passwd]> Set client certificate for proxy
     --proxy-cert-type <type> Client certificate type for HTTS proxy
     --proxy-ciphers <list> SSL ciphers to use for proxy
     --proxy-crlfile <file> Set a CRL list for proxy
     --proxy-digest  Use Digest authentication on the proxy
     --proxy-header <header/@file> Pass custom header(s) to proxy
     --proxy-insecure Do HTTPS proxy connections without verifying the proxy
     --proxy-key <key> Private key for HTTPS proxy
     --proxy-key-type <type> Private key file type for proxy
     --proxy-negotiate Use HTTP Negotiate (SPNEGO) authentication on the proxy
     --proxy-ntlm    Use NTLM authentication on the proxy
     --proxy-pass <phrase> Pass phrase for the private key for HTTPS proxy
     --proxy-service-name <name> SPNEGO proxy service name
     --proxy-ssl-allow-beast Allow security flaw for interop for HTTPS proxy
     --proxy-tlsauthtype <type> TLS authentication type for HTTPS proxy
     --proxy-tlspassword <string> TLS password for HTTPS proxy
     --proxy-tlsuser <name> TLS username for HTTPS proxy
     --proxy-tlsv1   Use TLSv1 for HTTPS proxy
 -U, --proxy-user <user:password> Proxy user and password
     --proxy1.0 <host[:port]> Use HTTP/1.0 proxy on given port
 -p, --proxytunnel   Operate through a HTTP proxy tunnel (using CONNECT)
     --pubkey <key>  SSH Public key file name
 -Q, --quote         Send command(s) to server before transfer
     --random-file <file> File for reading random data from
 -r, --range <range> Retrieve only the bytes within RANGE
     --raw           Do HTTP "raw"; no transfer decoding
 -e, --referer <URL> Referrer URL
 -J, --remote-header-name Use the header-provided filename
 -O, --remote-name   Write output to a file named as the remote file
     --remote-name-all Use the remote file name for all URLs
 -R, --remote-time   Set the remote file's time on the local output
 -X, --request <command> Specify request command to use
     --request-target Specify the target for this request
     --resolve <host:port:address> Resolve the host+port to this address
     --retry <num>   Retry request if transient problems occur
     --retry-connrefused Retry on connection refused (use with --retry)
     --retry-delay <seconds> Wait time between retries
     --retry-max-time <seconds> Retry only within this period
     --sasl-ir       Enable initial response in SASL authentication
     --service-name <name> SPNEGO service name
 -S, --show-error    Show error even when -s is used
 -s, --silent        Silent mode
     --socks4 <host[:port]> SOCKS4 proxy on given host + port
     --socks4a <host[:port]> SOCKS4a proxy on given host + port
     --socks5 <host[:port]> SOCKS5 proxy on given host + port
     --socks5-basic  Enable username/password auth for SOCKS5 proxies
     --socks5-gssapi Enable GSS-API auth for SOCKS5 proxies
     --socks5-gssapi-nec Compatibility with NEC SOCKS5 server
     --socks5-gssapi-service <name> SOCKS5 proxy service name for GSS-API
     --socks5-hostname <host[:port]> SOCKS5 proxy, pass host name to proxy
 -Y, --speed-limit <speed> Stop transfers slower than this
 -y, --speed-time <seconds> Trigger 'speed-limit' abort after this time
     --ssl           Try SSL/TLS
     --ssl-allow-beast Allow security flaw to improve interop
     --ssl-no-revoke Disable cert revocation checks (WinSSL)
     --ssl-reqd      Require SSL/TLS
 -2, --sslv2         Use SSLv2
 -3, --sslv3         Use SSLv3
     --stderr        Where to redirect stderr
     --suppress-connect-headers Suppress proxy CONNECT response headers
     --tcp-fastopen  Use TCP Fast Open
     --tcp-nodelay   Use the TCP_NODELAY option
 -t, --telnet-option <opt=val> Set telnet option
     --tftp-blksize <value> Set TFTP BLKSIZE option
     --tftp-no-options Do not send any TFTP options
 -z, --time-cond <time> Transfer based on a time condition
     --tls-max <VERSION> Use TLSv1.0 or greater
     --tlsauthtype <type> TLS authentication type
     --tlspassword   TLS password
     --tlsuser <name> TLS user name
 -1, --tlsv1         Use TLSv1.0 or greater
     --tlsv1.0       Use TLSv1.0
     --tlsv1.1       Use TLSv1.1
     --tlsv1.2       Use TLSv1.2
     --tlsv1.3       Use TLSv1.3
     --tr-encoding   Request compressed transfer encoding
     --trace <file>  Write a debug trace to FILE
     --trace-ascii <file> Like --trace, but without hex output
     --trace-time    Add time stamps to trace/verbose output
     --unix-socket <path> Connect through this Unix domain socket
 -T, --upload-file <file> Transfer local FILE to destination
     --url <url>     URL to work with
 -B, --use-ascii     Use ASCII/text transfer
 -u, --user <user:password> Server user and password
 -A, --user-agent <name> Send User-Agent <name> to server
 -v, --verbose       Make the operation more talkative
 -V, --version       Show version number and quit
 -w, --write-out <format> Use output FORMAT after completion
     --xattr         Store metadata in extended file attributes

I hope you have enjoyed this post and it helped you to understand the curl command usage. Please like and share and feel free to comment if you have any suggestions or feedback.


Make your own Java Annotations

What’s the use of Annotations?

The first question that comes to mind is: what is the use case of annotations, and why are they considered a powerful part of Java? Annotations have a number of uses, among them:

  • Information for the compiler — There are three built-in annotations available in Java (@Deprecated, @Override & @SuppressWarnings) that can be used for giving certain instructions to the compiler. For example, the @Override annotation instructs the compiler that the annotated method overrides a superclass method.

 The @FunctionalInterface annotation, introduced in Java SE 8, indicates that the type declaration is intended to be a functional interface, as defined by the Java Language Specification.

  • Compile-time and deployment-time processing — Software tools can process annotation information to generate code, XML files, and so forth.
  • Runtime processing — We can define annotations to be available at runtime which we can access using java reflection and can be used to give instructions to the program at runtime.

Creating Custom Annotations

  • Annotations are created by using @interface, followed by the annotation name, as shown in the example below.
  • An annotation can have elements as well. They look like methods. For example, in the code below we have six elements. We should not provide an implementation for these elements.
  • All annotations extend the java.lang.annotation.Annotation interface. Annotations cannot include an extends clause.
@interface ClassInfo {
   String author();
   String date();
   int currentRevision() default 1;
   String lastModified() default "N/A";
   String lastModifiedBy() default "N/A";
   // Note use of array
   String[] reviewers();
}

Note: All the elements that have default values set while creating an annotation can be skipped while using the annotation, and we can also have array elements in an annotation. For example, if I’m applying the above annotation to a class, I would do it like this:

@ClassInfo(
    author="TheCodersStop",
    date = "05-10-2020",
    reviewers={"Me", "You"}
)
public class AnyClass {

}

As you can see, we have not given any value to the currentRevision, lastModified and lastModifiedBy elements, as it is optional to set the values of these elements (default values have already been set in the annotation definition, but if you want you can assign a new value while using the annotation, just the same way as we did for the other elements). However, we have to provide values for the other elements (the elements that do not have default values set) while using the annotation.

Annotations That Apply to Other Annotations

Annotations that apply to other annotations are called meta-annotations. There are several meta-annotation types defined in java.lang.annotation.

import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
 
@Documented
@Target({ElementType.TYPE, ElementType.METHOD})
@Inherited
@Retention(RetentionPolicy.RUNTIME)
public @interface StudentAnnotation{
    int studentAge() default 15;
    String studentName();
    String studentAddress();
    String studentStream() default "PCM";
}

In the above custom annotation example we have used these four meta-annotations: @Documented, @Target, @Inherited & @Retention. Let’s discuss them one by one.

@Documented

@Documented annotation indicates that whenever the specified annotation is used those elements should be documented using the Javadoc tool. (By default, annotations are not included in Javadoc.) For more information, see the Javadoc tools page.

@StudentAnnotation(studentName = "John", studentAddress = "Delhi")
public class AnyClass { 
     //Class body
}

So while generating the Javadoc for class AnyClass, the @StudentAnnotation annotation would be included in it.

@Target

The @Target annotation marks another annotation to restrict what kind of Java elements the annotation can be applied to. In our case, we have defined the target types as TYPE and METHOD, which means the annotation can only be applied to type declarations (such as classes) and to methods, as in the method example below.

public class AnyClass {
   @StudentAnnotation(studentName = "John", studentAddress = "Delhi")
   public void anyMethod()
   {
       //Doing something
   }
}

If you do not define any target type, the annotation can be applied to any element. A target annotation specifies one of the following element types as its value:

  • ElementType.ANNOTATION_TYPE can be applied to an annotation type.
  • ElementType.CONSTRUCTOR can be applied to a constructor.
  • ElementType.FIELD can be applied to a field or property.
  • ElementType.LOCAL_VARIABLE can be applied to a local variable.
  • ElementType.METHOD can be applied to a method-level annotation.
  • ElementType.PACKAGE can be applied to a package declaration.
  • ElementType.PARAMETER can be applied to the parameters of a method.
  • ElementType.TYPE can be applied to a class, interface, annotation type, or enum declaration.
@Inherited

The @Inherited annotation indicates that a custom annotation used in a class should be inherited by all of its sub classes. This is not true by default.

@StudentAnnotation(studentName = "John", studentAddress = "Delhi")
public class AnyParentClass { 
  ... 
}
public class AnyChildClass extends AnyParentClass { 
   ... 
}

Here the class AnyParentClass uses the @StudentAnnotation annotation, which is marked with the @Inherited meta-annotation. This means that the subclass AnyChildClass inherits @StudentAnnotation.

@Retention

@Retention annotation specifies how the marked annotation is stored:

  • RetentionPolicy.SOURCE – The marked annotation is retained only in the source level and is ignored by the compiler.
  • RetentionPolicy.CLASS – The marked annotation is retained by the compiler at compile time, but is ignored by the Java Virtual Machine (JVM).
  • RetentionPolicy.RUNTIME – The marked annotation is retained by the JVM so it can be used by the runtime environment, as in the sketch below.
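
Because the custom annotation above uses RetentionPolicy.RUNTIME, it can be read back via reflection. The following is a minimal sketch that reuses the hypothetical StudentAnnotation and AnyClass from the earlier examples:

import java.lang.reflect.Method;

public class AnnotationProcessor {
    public static void main(String[] args) {
        // Scan AnyClass (from the @Target example) for methods carrying @StudentAnnotation
        for (Method method : AnyClass.class.getDeclaredMethods()) {
            StudentAnnotation info = method.getAnnotation(StudentAnnotation.class);
            if (info != null) {
                // Reading the values only works because the annotation is retained at RUNTIME
                System.out.println(method.getName() + " -> " + info.studentName()
                        + ", age " + info.studentAge() + ", stream " + info.studentStream());
            }
        }
    }
}
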
@Repeatable

 The @Repeatable annotation, introduced in Java SE 8, indicates that the marked annotation can be applied more than once to the same declaration or type use, as in the sketch below. For more information, see Repeating Annotations.
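
As a minimal sketch (the Schedule and Schedules names here are made up for illustration), a repeatable annotation needs a containing annotation type that holds an array of it:

import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@Repeatable(Schedules.class)
@interface Schedule {
    String day();
}

// Container annotation required by @Repeatable
@Retention(RetentionPolicy.RUNTIME)
@interface Schedules {
    Schedule[] value();
}

// The same annotation applied twice to one declaration
@Schedule(day = "Monday")
@Schedule(day = "Friday")
class BatchJob {
}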

I hope you have enjoyed this post and it helped you to create a custom Java annotation. Please like and share and feel free to comment if you have any suggestions or feedback.
